bipartite_learn.preprocessing package#

Submodules#

bipartite_learn.preprocessing.monopartite module#

class bipartite_learn.preprocessing.monopartite.PositiveSemidefiniteEnforcer(tol=1e-05)#

Bases: BaseEstimator, TransformerMixin

Modify main diagonal to enforce positive semidefiniteness.

Adds the absolute value of the most negative eigenvalue to the main diagonal, returning a positive-semidefinite version of X.

Parameters:

tol (float, default=1e-5) – To avoid small negative values due to numerical precision, tol is also added to the diagonal.

fit(X, y=None, **fit_params)#
fit_transform(X, y=None, **fit_params)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

class bipartite_learn.preprocessing.monopartite.SimilarityDistanceSwitcher#

Bases: BaseEstimator, TransformerMixin

Transforms x into (1 - x).
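
The mapping can be illustrated with plain NumPy. This is only a sketch of the arithmetic, not the class's implementation; the helper name `switch` is hypothetical, and the example assumes similarities scaled to the [0, 1] range so that the result reads as a distance.

```python
import numpy as np


def switch(X):
    """Map each entry x to 1 - x (hypothetical helper, not the library API)."""
    return 1.0 - np.asarray(X, dtype=float)


# Toy similarity matrix with values in [0, 1] (assumed scaling).
similarity = np.array([[1.0, 0.25],
                       [0.25, 1.0]])

distance = switch(similarity)   # similarities become distances
restored = switch(distance)     # applying the map twice recovers the input
```

Because 1 - (1 - x) = x, the map is its own inverse, which is consistent with the class exposing both transform and inverse_transform.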

fit(X, y=None)#
inverse_transform(X)#
transform(X)#
class bipartite_learn.preprocessing.monopartite.SymmetryEnforcer(sampling_strategy='auto')#

Bases: BaseSampler

Make matrix symmetric by averaging it with its transpose.
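
As a rough illustration of the averaging step, the following NumPy sketch computes (X + X.T) / 2. The function name `symmetrize` is an assumption made for this example and is not part of bipartite_learn.

```python
import numpy as np


def symmetrize(X):
    """Average a square matrix with its transpose (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        raise ValueError("X must be a square two-dimensional array.")
    return (X + X.T) / 2


# Slightly asymmetric similarity matrix.
A = np.array([[1.0, 0.2],
              [0.6, 1.0]])

S = symmetrize(A)  # off-diagonal entries become the mean of 0.2 and 0.6
```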

class bipartite_learn.preprocessing.monopartite.TargetKernelDiffuser(n_iter: int = 2, n_neighbors: int = 4, metric: str | Callable = 'rbf', gamma: float = 1.0, gamma_scale: Literal['constant', 'squares', 'squared_errors', 'size'] = 'squares', filter_params: bool = False, n_jobs: int | None = None, **kwds)#

Bases: BaseSampler

Calculates kernel on y and performs non-linear kernel diffusion.

Hao et al., 2016. DOI: https://doi.org/10.1016/j.aca.2016.01.014

X is assumed to be a precomputed kernel matrix. The target kernel will be calculated with sklearn.metrics.pairwise.pairwise_kernels and combined with X by a kernel diffusion procedure [1].

The default kernel is RBF, so that it calculates the ‘gaussian interaction profile’ as described by [2].

Valid values for metric are:

[‘additive_chi2’, ‘chi2’, ‘linear’, ‘poly’, ‘polynomial’, ‘rbf’, ‘laplacian’, ‘sigmoid’, ‘cosine’]

Parameters:
  • n_iter (int, default=2) – Number of diffusion iterations.

  • n_neighbors (int, default=4) – n_neighbors parameter passed to kneighbors_graph for local similarity calculation.

  • metric (str or callable, default="rbf") – The metric to use when calculating kernel between instances in a feature array. If metric is a string, it must be one of the metrics in sklearn.metrics.pairwise.PAIRWISE_KERNEL_FUNCTIONS. If metric is “precomputed”, y is assumed to be a kernel matrix. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from y as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.

  • gamma (float, default=1.0) – gamma parameter of kernel function if metric is callable, ‘chi2’, ‘polynomial’, ‘rbf’, ‘laplacian’ or ‘sigmoid’.

  • gamma_scale ({'constant', 'squares', 'squared_errors', 'size'}, default='squares') – How gamma is scaled. If 'constant', gamma is used as given. Otherwise gamma is divided by S / y.shape[0], where S is (y**2).sum() for 'squares', ((y - y.mean()) ** 2).sum() for 'squared_errors', and y.size for 'size'.

  • filter_params (bool, default=False) – Whether to filter invalid kernel parameters or not.

  • n_jobs (int, default=None) – The number of jobs to use for the kernel computation. This works by breaking down the y matrix into n_jobs even slices and computing them in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the kernel function.
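
The gamma_scale options described above can be sketched in NumPy as follows. This is a reading of the parameter description, not the library's code: the helpers `scaled_gamma` and `rbf_kernel_on_rows` are hypothetical names, and the RBF kernel is written out by hand rather than obtained from sklearn.metrics.pairwise.pairwise_kernels.

```python
import numpy as np


def scaled_gamma(y, gamma=1.0, gamma_scale="squares"):
    """Scale gamma as described for gamma_scale (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    if gamma_scale == "constant":
        return gamma
    if gamma_scale == "squares":
        S = (y ** 2).sum()
    elif gamma_scale == "squared_errors":
        S = ((y - y.mean()) ** 2).sum()
    elif gamma_scale == "size":
        S = y.size
    else:
        raise ValueError(f"Unknown gamma_scale: {gamma_scale!r}")
    # Divide gamma by S / n_rows, following the parameter description.
    return gamma / (S / y.shape[0])


def rbf_kernel_on_rows(y, gamma):
    """RBF kernel between rows of y: exp(-gamma * ||y_i - y_j||^2)."""
    y = np.asarray(y, dtype=float)
    sq_norms = (y ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * (y @ y.T)
    # Clip tiny negative distances caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


# Toy binary interaction matrix: 4 rows, 3 columns.
y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)

g = scaled_gamma(y)              # 'squares': 1 / (7 / 4) = 4 / 7
K = rbf_kernel_on_rows(y, g)     # target kernel over the rows of y
```

With the default 'squares' scaling, gamma is normalized by the mean squared norm of the rows of y, so the kernel bandwidth adapts to how dense the interaction profiles are.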

References

class bipartite_learn.preprocessing.monopartite.TargetKernelLinearCombiner(alpha: float = 0.5, metric: str | Callable = 'rbf', gamma: float = 1.0, gamma_scale: Literal['constant', 'squares', 'squared_errors', 'size'] = 'squares', filter_params: bool = False, n_jobs: int | None = None, **kernel_params)#

Bases: BaseSampler

Combines the provided similarity matrix X with a kernel calculated over y.

X is assumed to be a precomputed kernel matrix. The target kernel will be calculated with sklearn.metrics.pairwise.pairwise_kernels and combined with X simply by taking alpha*X + (1-alpha)*y_kernel.

The default kernel is RBF, so that it calculates the ‘gaussian interaction profile’ as described by [1].

Valid values for metric are:

[‘additive_chi2’, ‘chi2’, ‘linear’, ‘poly’, ‘polynomial’, ‘rbf’, ‘laplacian’, ‘sigmoid’, ‘cosine’]

Parameters:
  • alpha (float, default=0.5) – Weight given to the provided similarities X in the linear combination; the target kernel receives weight 1 - alpha. alpha=1 leaves X unchanged, and alpha=0 discards the original X data entirely.

  • metric (str or callable, default="rbf") – The metric to use when calculating kernel between instances in a feature array. If metric is a string, it must be one of the metrics in sklearn.metrics.pairwise.PAIRWISE_KERNEL_FUNCTIONS. If metric is “precomputed”, y is assumed to be a kernel matrix. Alternatively, if metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two rows from y as input and return the corresponding kernel value as a single number. This means that callables from sklearn.metrics.pairwise are not allowed, as they operate on matrices, not single samples. Use the string identifying the kernel instead.

  • gamma (float, default=1.0) – gamma parameter of kernel function if metric is callable, ‘chi2’, ‘polynomial’, ‘rbf’, ‘laplacian’ or ‘sigmoid’.

  • gamma_scale ({'constant', 'squares', 'squared_errors', 'size'}, default='squares') – How gamma is scaled. If 'constant', gamma is used as given. Otherwise gamma is divided by S / y.shape[0], where S is (y**2).sum() for 'squares', ((y - y.mean()) ** 2).sum() for 'squared_errors', and y.size for 'size'.

  • filter_params (bool, default=False) – Whether to filter invalid kernel parameters or not.

  • n_jobs (int, default=None) – The number of jobs to use for the kernel computation. This works by breaking down the y matrix into n_jobs even slices and computing them in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

  • **kernel_params (optional keyword parameters) – Any further parameters are passed directly to the kernel function.
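
The combination rule alpha*X + (1-alpha)*y_kernel stated above can be sketched directly in NumPy. The function `combine_kernels` and the toy matrices are assumptions made for illustration; the target kernel is supplied as a ready-made matrix here instead of being computed from y.

```python
import numpy as np


def combine_kernels(X, y_kernel, alpha=0.5):
    """Return alpha * X + (1 - alpha) * y_kernel (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    y_kernel = np.asarray(y_kernel, dtype=float)
    if X.shape != y_kernel.shape:
        raise ValueError("X and y_kernel must have the same shape.")
    return alpha * X + (1 - alpha) * y_kernel


# Precomputed similarity matrix (identity: each sample is similar only to itself).
X = np.eye(3)

# Stand-in target kernel: 1 on the diagonal, 0.5 elsewhere.
y_kernel = np.full((3, 3), 0.5)
np.fill_diagonal(y_kernel, 1.0)

combined = combine_kernels(X, y_kernel, alpha=0.25)
# Off-diagonal entries: 0.25 * 0 + 0.75 * 0.5 = 0.375
```

Setting alpha=1 returns X unchanged and alpha=0 returns only the target kernel, matching the description of alpha.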

References

bipartite_learn.preprocessing.monopartite.enforce_positive_semidefiniteness(X, tol=1e-05)#

Enforce positive-semidefiniteness of kernel matrix.

Modifies only the main diagonal of X.

Adds the absolute value of the most negative eigenvalue to the main diagonal, returning a positive-semidefinite version of X.

Parameters:
  • X (square two-dimensional ndarray) – Kernel matrix to transform.

  • tol (float, default=1e-5) – To avoid small negative values due to numerical precision, tol is also added to the diagonal.

Returns:

Symmetric positive-semidefinite version of X.

Return type:

ndarray with same shape as X
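
The diagonal shift can be sketched as below. This is a minimal illustration under stated assumptions, not the function's actual source: the name `shift_diagonal_to_psd` is hypothetical, X is assumed symmetric (so that `numpy.linalg.eigvalsh` applies), and the sketch leaves X untouched when no eigenvalue is negative, which the documentation does not specify.

```python
import numpy as np


def shift_diagonal_to_psd(X, tol=1e-5):
    """Shift the diagonal so that all eigenvalues become non-negative.

    Illustrative helper; assumes X is a symmetric square matrix.
    """
    X = np.asarray(X, dtype=float)
    min_eig = np.linalg.eigvalsh(X).min()
    if min_eig >= 0:
        # Assumption: nothing is added when X is already PSD.
        return X.copy()
    # Adding c * I shifts every eigenvalue by c, so |min_eig| + tol
    # moves the smallest eigenvalue to approximately tol.
    return X + (abs(min_eig) + tol) * np.eye(X.shape[0])


# Symmetric matrix with eigenvalues -1 and 3, hence not PSD.
K = np.array([[1.0, 2.0],
              [2.0, 1.0]])

K_psd = shift_diagonal_to_psd(K)
```

Only the diagonal changes, so the off-diagonal similarities are preserved exactly; the cost is that every self-similarity is inflated by the same amount.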

bipartite_learn.preprocessing.monopartite.nearest_positive_semidefinite(X)#

Get the nearest (in Frobenius norm) positive semidefinite matrix to X.

See Equations (2.1) and (2.2) of [1]. See also https://stackoverflow.com/q/43238173/11286509.

[1] N.J. Higham, “Computing a nearest symmetric positive semidefinite matrix” (1988): https://doi.org/10.1016/0024-3795(88)90223-6
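
One common way to compute the Frobenius-nearest symmetric positive semidefinite matrix is to symmetrize the input and clip its negative eigenvalues to zero, which is equivalent to Higham's polar-decomposition formulation. The sketch below follows that route; the name `nearest_psd` is hypothetical and the library's own implementation may differ in detail.

```python
import numpy as np


def nearest_psd(A):
    """Nearest symmetric PSD matrix in Frobenius norm (illustrative helper)."""
    A = np.asarray(A, dtype=float)
    # Symmetric part of A.
    B = (A + A.T) / 2
    # Eigendecomposition of the symmetric part: B = Q diag(w) Q^T.
    eigvals, eigvecs = np.linalg.eigh(B)
    # Drop the negative part of the spectrum.
    clipped = np.clip(eigvals, 0.0, None)
    return (eigvecs * clipped) @ eigvecs.T


# Symmetric matrix with eigenvalues -1 and 3.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

P = nearest_psd(A)  # keeps only the eigenvalue-3 component
```

Unlike a diagonal shift, this projection changes off-diagonal entries as well, but it yields the closest PSD matrix rather than just a valid one.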

bipartite_learn.preprocessing.multipartite module#

class bipartite_learn.preprocessing.multipartite.DTHybridSampler(lamb=0.5, alpha=0.5)#

Bases: BaseMultipartiteSampler

Module contents#