bipartite_learn.matrix_factorization package

Module contents

class bipartite_learn.matrix_factorization.DNILMF(positive_importance=6, n_components_rows=90, n_components_cols='same', learning_rate=1.0, alpha=None, beta=None, gamma=None, lambda_rows=2, lambda_cols='same', n_neighbors=5, max_iter=100, tol=1e-05, keep_positives=False, resample_X=False, verbose=False, random_state=None)

Bases: BaseMultipartiteSampler, RegressorMixin

Dual-Network Integrated Logistic Matrix Factorization

Note: the kernel fusion pre-processing procedure described by [1] is implemented as preprocessing.monopartite_transformers.TargetKernelDiffuser and can be applied together with DNILMF in a pipeline.

Parameters:
  • positive_importance (int, default=6) – Multiplier applied to positive (known) interactions. Each positive interaction (y == 1) weighs positive_importance times more than a negative one, as if each positive instance were repeated (oversampled) positive_importance times in the dataset. Called c in the original paper [1].

  • n_components_rows (int, default=90) – Number of components of the X[0] latent vectors, that is, the number of columns of U.

  • n_components_cols (int or "same", default="same") – Number of components of the X[1] latent vectors, that is, the number of columns of V. If “same”, it takes the value of n_components_rows.

  • learning_rate (float or sequence of floats, default=1.0) – Multiplicative factor for each gradient step.

  • alpha (float or None, default=None) – Constant that multiplies the y matrix when computing the loss function. The greater it is, the more supervised the algorithm is. If None, it is replaced by 1 - beta - gamma.

  • beta (float or None, default=None) – Constant that multiplies the row similarity matrix when computing the loss function. It controls how much importance the algorithm gives to the rows’ unsupervised information. If None, it is replaced by 1 - alpha - gamma.

  • gamma (float or None, default=None) – Constant that multiplies the column similarity matrix when computing the loss function. It controls how much importance the algorithm gives to the columns’ unsupervised information. If None, it is replaced by 1 - alpha - beta.

  • lambda_rows (float, default=2) – Corresponds to the inverse of the assumed prior variance of U. It multiplies the regularization term of U.

  • lambda_cols (float or "same", default="same") – Corresponds to the inverse of the assumed prior variance of V. It multiplies the regularization term of V. If “same”, it takes the same value of lambda_rows.

  • n_neighbors (int, default=5) – Number of nearest neighbors to consider when predicting new samples.

  • max_iter (int, default=100) – Maximum number of iterations.

  • tol (float, default=1e-5) – Minimum relative loss improvement to continue iteration.

  • keep_positives (bool, default=False) – If True, the 1s of the original y are kept in the transformed y. This has no effect when only predict() is called, so with keep_positives=True fit_predict() no longer yields the same result as fit().predict().

  • resample_X (bool, default=False) – If True, return [U, V] as resampled X in fit_resample.

  • verbose (bool, default=False) – Whether to display training status information.

  • random_state (int, RandomState instance or None, default=None) – Used for initialisation of U and V. Pass an int for reproducible results across multiple function calls. See Glossary.
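The "same" and None defaults in the list above can be read as simple substitution rules. The following is a minimal sketch of those rules only, not the library's code: `resolve_same` and `resolve_weights` are hypothetical helpers, and because the documentation does not say how several None weights (including the all-None default) are resolved, the sketch covers only the case where exactly one weight is missing.

```python
def resolve_same(value, reference):
    """Return `reference` when `value` is the string "same", else `value`."""
    return reference if value == "same" else value


def resolve_weights(alpha=None, beta=None, gamma=None):
    """Fill in a single missing weight so that alpha + beta + gamma == 1."""
    weights = {"alpha": alpha, "beta": beta, "gamma": gamma}
    missing = [name for name, value in weights.items() if value is None]
    if len(missing) > 1:
        # Assumption of this sketch: the documented rule (1 minus the other
        # two) only determines one weight, so several None values are refused.
        raise ValueError(f"Cannot infer more than one weight: {missing}")
    if missing:
        known = sum(v for v in weights.values() if v is not None)
        weights[missing[0]] = 1.0 - known
    return weights["alpha"], weights["beta"], weights["gamma"]


# gamma is inferred as 1 - alpha - beta
alpha, beta, gamma = resolve_weights(alpha=0.5, beta=0.25)

# "same" falls back to the corresponding row parameter
n_components_cols = resolve_same("same", 90)
lambda_cols = resolve_same("same", 2)
```

With these rules the three weights always sum to 1, so raising alpha (more supervision) necessarily lowers the total weight given to the two similarity matrices.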

References

[1] Hao, M., Bryant, S. & Wang, Y. Sci Rep 7, 40376 (2017).

[2] Yong Liu, Min Wu, Chunyan Miao, Peilin Zhao, Xiao-Li Li, (2016)

fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:
  • X ({array-like, dataframe, sparse matrix} of shape (n_samples, n_features)) – Data array.

  • y (array-like of shape (n_samples,)) – Target array.

Returns:

self – Return the instance itself.

Return type:

object

fit_predict(X, y)

predict(X)

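The keep_positives parameter explains why fit_predict(X, y) and fit(X, y).predict(X) can differ. The sketch below illustrates the idea with plain NumPy, assuming the transformed y is a matrix of reconstructed scores. `restore_known_positives` is a hypothetical helper, not a function of the library, and the score values are made up.

```python
import numpy as np


def restore_known_positives(y_true, y_scores, keep_positives=False):
    """Optionally overwrite scores with 1 wherever y_true == 1."""
    y_scores = np.asarray(y_scores, dtype=float)
    if not keep_positives:
        return y_scores
    return np.where(np.asarray(y_true) == 1, 1.0, y_scores)


y_true = np.array([[1, 0], [0, 1]])
y_scores = np.array([[0.8, 0.3], [0.1, 0.6]])  # made-up reconstructed scores

# Known interactions are forced back to 1; other entries keep their score
kept = restore_known_positives(y_true, y_scores, keep_positives=True)
```

predict(X) receives no y, so there are no known positives to restore at that point. This is consistent with the documented note that the option does not apply when only predict() is called.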
class bipartite_learn.matrix_factorization.NRLMF(positive_importance=5, n_components_rows=10, n_components_cols='same', alpha_rows=0.1, alpha_cols='same', lambda_rows=0.625, lambda_cols='same', n_neighbors=5, learning_rate=1.0, max_iter=100, tol=1e-05, keep_positives=False, resample_X=False, verbose=False, random_state=None)

Bases: BaseMultipartiteSampler, RegressorMixin

Neighborhood Regularized Logistic Matrix Factorization.

[1] Yong Liu, Min Wu, Chunyan Miao, Peilin Zhao, Xiao-Li Li, “Neighborhood Regularized Logistic Matrix Factorization for Drug-target Interaction Prediction” DOI: 10.1371/journal.pcbi.1004760

Parameters:
  • positive_importance (int, default=5) – Multiplier applied to positive (known) interactions. Each positive interaction (y == 1) weighs positive_importance times more than a negative one, as if each positive instance were repeated (oversampled) positive_importance times in the dataset. Called c in the original paper [1].

  • n_components_rows (int, default=10) – Number of components of the X[0] latent vectors, that is, the number of columns of U.

  • n_components_cols (int or "same", default="same") – Number of components of the X[1] latent vectors, that is, the number of columns of V. If “same”, it takes the value of n_components_rows.

  • alpha_rows (float, default=0.1) – Constant that multiplies the local similarity matrix of row instances, weighting their neighborhood information when calculating the loss.

  • alpha_cols (float or "same", default="same") – Constant that multiplies the local similarity matrix of column instances, weighting their neighborhood information when calculating the loss. Originally called \(\beta\) by [1]. If “same”, it takes the same value of alpha_rows.

  • lambda_rows (float, default=0.625) – Corresponds to the inverse of the assumed prior variance of U. It multiplies the regularization term of U.

  • lambda_cols (float or "same", default="same") – Corresponds to the inverse of the assumed prior variance of V. It multiplies the regularization term of V. If “same”, it takes the same value of lambda_rows.

  • n_neighbors (int, default=5) – Number of nearest neighbors to consider when predicting new samples and building the local similarity (laplacian) matrices.

  • learning_rate (float, default=1.0) – Multiplicative factor for each gradient step.

  • max_iter (int, default=100) – Maximum number of iterations.

  • tol (float, default=1e-5) – Minimum relative loss improvement to continue iteration.

  • keep_positives (bool, default=False) – If True, the 1s of the original y are kept in the transformed y. This has no effect when only predict() is called, so with keep_positives=True fit_predict() no longer yields the same result as fit().predict().

  • resample_X (bool, default=False) – If True, return [U, V] as resampled X in fit_resample.

  • verbose (bool, default=False) – Whether to display training status information.

  • random_state (int, RandomState instance or None, default=None) – Used for initialisation of U and V. Pass an int for reproducible results across multiple function calls. See Glossary.
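The description of positive_importance states that weighting positives by c is equivalent to repeating each positive instance c times. The sketch below checks that equivalence numerically on a plain Bernoulli log-loss. It is an illustration, not the library's loss: `weighted_log_loss` is a hypothetical helper, the probabilities are made up, and the regularization and neighborhood terms of the full objective are left out.

```python
import numpy as np


def weighted_log_loss(y, p, positive_importance=1):
    """Negative log-likelihood where each positive counts `positive_importance` times."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    weights = np.where(y == 1, positive_importance, 1)
    return -np.sum(weights * (y * np.log(p) + (1 - y) * np.log(1 - p)))


y = np.array([1, 0, 0, 1, 0])
p = np.array([0.9, 0.2, 0.4, 0.6, 0.1])  # made-up predicted probabilities
c = 5

# Option 1: weight each positive c times in the loss
weighted = weighted_log_loss(y, p, positive_importance=c)

# Option 2: repeat each positive c times and use the unweighted loss
repeats = np.where(y == 1, c, 1)
oversampled = weighted_log_loss(np.repeat(y, repeats), np.repeat(p, repeats))
```

Both quantities agree up to floating-point error, which is why the weight can be used instead of physically duplicating the positive entries of y.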

fit(X, y)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters:
  • X ({array-like, dataframe, sparse matrix} of shape (n_samples, n_features)) – Data array.

  • y (array-like of shape (n_samples,)) – Target array.

Returns:

self – Return the instance itself.

Return type:

object

fit_predict(X, y)

predict(X)
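The n_neighbors parameter is described as the number of nearest neighbors considered when predicting new samples. One way to read this, following the neighborhood smoothing idea of [1], is that a new row instance gets a latent vector equal to the similarity-weighted average of the latent vectors of its most similar training rows. The sketch below assumes that reading; whether the library does exactly this is not confirmed here, and `latent_from_neighbors` and the toy matrices are hypothetical.

```python
import numpy as np


def latent_from_neighbors(similarities, U, n_neighbors=5):
    """Similarity-weighted average of the latent vectors of the nearest training rows."""
    similarities = np.asarray(similarities, dtype=float)
    # Indices of the n_neighbors most similar training rows
    nearest = np.argsort(similarities)[::-1][:n_neighbors]
    weights = similarities[nearest]
    return weights @ U[nearest] / weights.sum()


# Toy latent vectors of four training rows (n_components_rows == 2)
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])

# Similarities between one new row and the four training rows
s_new = np.array([0.75, 0.25, 0.0, 0.0])

u_new = latent_from_neighbors(s_new, U, n_neighbors=2)
```

Only the two most similar training rows contribute here, so the distant fourth row with large latent values does not affect the result.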