bipartite_learn package#

Subpackages#

Submodules#

bipartite_learn.base module#

class bipartite_learn.base.BaseBipartiteEstimator#

Bases: BaseMultipartiteEstimator

class bipartite_learn.base.BaseMultipartiteEstimator#

Bases: BaseEstimator

Base class for multipartite estimators.

score(X, y, sample_weight=None)#
class bipartite_learn.base.BaseMultipartiteSampler#

Bases: MultipartiteSamplerMixin

Base class for multipartite samplers.

fit_resample(X, y)#

Resample the dataset.

Parameters:
  • X ({array-like, dataframe, sparse matrix} of shape (n_samples, n_features)) – Matrix containing the data which have to be sampled.

  • y (array-like of shape (n_samples,)) – Corresponding label for each sample in X.

Returns:

  • X_resampled ({array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)) – The array containing the resampled data.

  • y_resampled (array-like of shape (n_samples_new,)) – The corresponding label of X_resampled.

sampling_strategy = 'auto'#
class bipartite_learn.base.MultipartiteSamplerMixin#

Bases: BaseMultipartiteEstimator, SamplerMixin

class bipartite_learn.base.MultipartiteTransformerMixin#

Bases: TransformerMixin

Mixin for multipartite transformers.

bipartite_learn.melter module#

class bipartite_learn.melter.BipartiteMelter#

Bases: BaseMultipartiteSampler, BaseBipartiteEstimator

Convert a bipartite dataset to a simpler global-single output format.

Convert a bipartite interaction problem, in which X holds two feature matrices (one for each axis) and y is an interaction matrix, into the usual format in which each sample is a combination of one sample from X[0] and one sample from X[1].

Slightly faster than MultipartiteMelter.
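The melting operation described above can be sketched with NumPy. This is only an illustration: the helper name melt_bipartite_sketch is hypothetical, and the row-major ordering of the melted samples is an assumption, not something this page confirms about the library.

```python
import numpy as np


def melt_bipartite_sketch(X, y=None):
    """Hypothetical illustration of melting; not the library's implementation."""
    X_rows, X_cols = np.asarray(X[0]), np.asarray(X[1])
    n_rows, n_cols = X_rows.shape[0], X_cols.shape[0]
    # Repeat each row sample n_cols times and tile the column samples, so that
    # melted sample i * n_cols + j pairs X_rows[i] with X_cols[j] (assumed order).
    X_melted = np.hstack([
        np.repeat(X_rows, n_cols, axis=0),
        np.tile(X_cols, (n_rows, 1)),
    ])
    if y is None:
        return X_melted
    # Row-major flattening keeps y_melted[i * n_cols + j] == y[i, j].
    return X_melted, np.asarray(y).reshape(-1)


X = [np.array([[1, 2], [3, 4]]), np.array([[10], [20], [30]])]
y = np.array([[0, 1, 0], [1, 0, 1]])
X_melted, y_melted = melt_bipartite_sketch(X, y)
```

Here X_melted has one row per (row sample, column sample) pair and y_melted holds the matching interaction labels as a one-dimensional vector.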

bipartite_learn.melter.melt_multipartite_dataset(X, y=None)#

Melt bipartite input.

If X is a list of Xi feature matrices, one for each bipartite group, convert it to traditional data format by generating concatenations of rows from X[0] with rows from X[1].

bipartite_learn.melter.row_cartesian_product(X)#

Row cartesian product of 2D arrays in X.

Pick one row from each of the 2D arrays in X, in their presented order, and concatenate them. Repeat for every combination. Return a 2D array whose rows are all the possible combinations of rows in X.

Parameters:

X (list-like of 2D np.ndarrays) –

Returns:

result – Cartesian product of X’s 2D arrays, row-wise.

Return type:

2D np.ndarray
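A minimal sketch of a row-wise Cartesian product for any number of 2D arrays follows. The function name row_cartesian_product_sketch is hypothetical, and the order of the output rows (last array varying fastest) is an assumption rather than documented behavior.

```python
import itertools

import numpy as np


def row_cartesian_product_sketch(X):
    """Hypothetical illustration; not the library's implementation."""
    arrays = [np.asarray(Xi) for Xi in X]
    # itertools.product yields one row from each array, in the given order,
    # with the last array varying fastest (assumed ordering).
    return np.array([
        np.concatenate(rows) for rows in itertools.product(*arrays)
    ])


A = np.array([[1, 2], [3, 4]])
B = np.array([[5], [6], [7]])
result = row_cartesian_product_sketch([A, B])
```

The result has A.shape[0] * B.shape[0] rows and A.shape[1] + B.shape[1] columns.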

bipartite_learn.neighbors module#

Distance Weighted Neighbors Regression.

class bipartite_learn.neighbors.WeightedNeighborsRegressor(*, weights='distance', p=2, metric='minkowski', metric_params=None, n_jobs=None)#

Bases: KNeighborsMixin, RegressorMixin, NeighborsBase

fit(X, y)#

Fit the k-nearest neighbors regressor from the training dataset.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_samples) if metric='precomputed') – Training data.

  • y ({array-like, sparse matrix} of shape (n_samples,) or (n_samples, n_outputs)) – Target values.

Returns:

self – The fitted k-nearest neighbors regressor.

Return type:

WeightedNeighborsRegressor

predict(X)#

Predict the target for the provided data.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_queries, n_features), or (n_queries, n_indexed) if metric == 'precomputed') – Test samples.

Returns:

y – Target values.

Return type:

ndarray of shape (n_queries,) or (n_queries, n_outputs), dtype=int
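The idea of distance-weighted regression can be sketched as follows. Two assumptions are made here and are not confirmed by this page: since the constructor takes no n_neighbors argument, every training sample is assumed to contribute to each prediction, and weights='distance' is assumed to mean inverse-distance weights as in scikit-learn. The function name distance_weighted_predict is hypothetical.

```python
import numpy as np


def distance_weighted_predict(X_train, y_train, X_query, p=2):
    """Hypothetical sketch of inverse-distance weighted regression."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise Minkowski distances of order p, shape (n_queries, n_train).
    diff = np.abs(X_query[:, None, :] - X_train[None, :, :])
    dist = (diff ** p).sum(axis=2) ** (1.0 / p)
    y_pred = np.empty((X_query.shape[0],) + y_train.shape[1:])
    for i, d in enumerate(dist):
        exact = d == 0
        if exact.any():
            # A query coinciding with training points takes their mean target,
            # which avoids dividing by zero.
            y_pred[i] = y_train[exact].mean(axis=0)
        else:
            # Assumed weighting: each training sample weighs 1 / distance.
            y_pred[i] = np.average(y_train, axis=0, weights=1.0 / d)
    return y_pred


X_train = [[0.0], [1.0], [3.0]]
y_train = [0.0, 1.0, 3.0]
y_pred = distance_weighted_predict(X_train, y_train, [[2.0], [1.0]])
```

For the query at 2.0 the distances are 2, 1 and 1, so the prediction is (0 * 0.5 + 1 * 1 + 3 * 1) / 2.5 = 1.6; the query at 1.0 matches a training point exactly and returns its target.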

bipartite_learn.pipeline module#

bipartite_learn.pipeline.make_multipartite_pipeline(*steps, ndim=2, memory=None, verbose=False)#

Utility function to create pipelines for multipartite data.

It wraps monopartite transformers with MultipartiteTransformerWrapper.

bipartite_learn.wrappers module#

Set of tools to apply standard estimators to bipartite datasets.

TODO: Docs. TODO: check fit inputs.

class bipartite_learn.wrappers.GlobalSingleOutputWrapper(estimator: BaseEstimator, under_sampler: BaseSampler | None = None)#

Bases: BaseMultipartiteEstimator, MetaEstimatorMixin

Employ the GSO strategy to adapt standard estimators to bipartite data.

In this strategy, the estimator is applied to concatenations of a feature vector from the first sample domain with a feature vector from the second domain, while y is considered a unidimensional vector.


Read more in the User Guide.
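The GSO strategy described above can be sketched end to end with NumPy. Everything named here (melt, LeastSquares) is a hypothetical stand-in: the least-squares model takes the place of any standard single-output estimator, and the ordering of the melted samples is an assumption. The sketch does not show how the wrapper itself is implemented.

```python
import numpy as np


def melt(X):
    """Concatenate every row of X[0] with every row of X[1] (assumed order)."""
    X_rows, X_cols = X
    return np.hstack([
        np.repeat(X_rows, len(X_cols), axis=0),
        np.tile(X_cols, (len(X_rows), 1)),
    ])


class LeastSquares:
    """Hypothetical stand-in for any standard single-output estimator."""

    def fit(self, X, y):
        X1 = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.coef_


rng = np.random.default_rng(0)
X_train = [rng.normal(size=(5, 3)), rng.normal(size=(4, 2))]
y_train = rng.normal(size=(5, 4))  # interaction matrix

# Fit on concatenated feature vectors, treating y as a one-dimensional vector.
model = LeastSquares().fit(melt(X_train), y_train.reshape(-1))

X_test = [rng.normal(size=(2, 3)), rng.normal(size=(3, 2))]
y_pred = model.predict(melt(X_test))  # one value per (row, column) pair
y_pred_matrix = y_pred.reshape(2, 3)  # reshaped only for inspection
```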


property classes_#

The class labels. Only exists if the estimator is a classifier.

decision_function(X)#
property feature_names_in_#

Names of features seen during fit.

fit(X, y=None, **fit_params)#
fit_predict(X, y=None, **fit_params)#
fit_transform(X, y=None, **fit_params)#
inverse_transform(Xt)#
property n_features_in_#

Number of features seen during fit.

predict(X, **predict_params)#
predict_log_proba(X, **predict_log_proba_params)#
predict_proba(X, **predict_proba_params)#
score(X, y=None, sample_weight=None)#
score_samples(X)#
transform(X)#
class bipartite_learn.wrappers.LocalMultiOutputWrapper(primary_rows_estimator: ~sklearn.base.BaseEstimator, primary_cols_estimator: ~sklearn.base.BaseEstimator, secondary_rows_estimator: ~sklearn.base.BaseEstimator, secondary_cols_estimator: ~sklearn.base.BaseEstimator, combine_predictions_func: ~typing.Callable = <function mean>, combine_func_kwargs: dict | None = None, independent_labels: bool = True)#

Bases: BaseBipartiteEstimator

Implements the Local Multi-Output strategy for adapting estimators.

This wrapper facilitates the implementation of the local multi-output approach to adapt monopartite estimators to bipartite scenarios. In this approach, four multi-output estimators are aggregated.

The training procedure (calling fit(X_train, y_train)) consists simply of:

  1. Train a primary rows estimator on X_train[0] and y_train.

  2. Train a primary columns estimator on X_train[1] and y_train.T.

The prediction procedure then uses the predictions of the primary estimators to make predictions on completely new interactions. predict(X_test) performs the following steps:

  1. Use self.primary_cols_estimator_ to predict new columns for the interaction matrix, which correspond to the targets of X_test[0].

  2. Use self.primary_rows_estimator_ to predict new rows for the interaction matrix, which correspond to the targets of X_test[1].

  3. Fit the secondary rows estimator on the newly predicted columns and X_test[0].

  4. Fit the secondary columns estimator on the newly predicted rows and X_test[1].

  5. Combine the predictions of the secondary estimators using self.combine_predictions_func(rows_pred, cols_pred).

If self.independent_labels is False, the original training data is appended to the training data of the secondary estimators (the two fitting steps above), allowing the secondary estimators to explore inter-output correlations.
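The training and prediction procedure can be sketched as below. This is one reading of the local multi-output idea, not the wrapper's actual code. The text does not fully specify which feature matrix each secondary estimator is fitted on, so the sketch assumes each one is fitted on training-set features, with the primary predictions as targets, and then predicts for the test samples. TinyRidge, lmo_fit and lmo_predict are hypothetical names, and the independent_labels=False variant is not covered.

```python
import numpy as np


class TinyRidge:
    """Hypothetical stand-in for any multi-output regressor."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, Y):
        X = np.asarray(X, dtype=float)
        A = X.T @ X + self.alpha * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, X.T @ np.asarray(Y, dtype=float))
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_


def lmo_fit(X_train, y_train):
    # Primary estimators: one output per training column / training row.
    primary_rows = TinyRidge().fit(X_train[0], y_train)
    primary_cols = TinyRidge().fit(X_train[1], y_train.T)
    return primary_rows, primary_cols


def lmo_predict(primary_rows, primary_cols, X_train, X_test):
    # New columns: training rows vs. test columns, (n_train_rows, n_test_cols).
    new_cols = primary_cols.predict(X_test[1]).T
    # New rows: test rows vs. training columns, (n_test_rows, n_train_cols).
    new_rows = primary_rows.predict(X_test[0])
    # Secondary estimators are refit at prediction time (assumed data flow).
    rows_pred = TinyRidge().fit(X_train[0], new_cols).predict(X_test[0])
    cols_pred = TinyRidge().fit(X_train[1], new_rows.T).predict(X_test[1]).T
    # Combine both views of the (n_test_rows, n_test_cols) block by averaging.
    return np.mean([rows_pred, cols_pred], axis=0)


rng = np.random.default_rng(0)
X_train = [rng.normal(size=(6, 3)), rng.normal(size=(5, 2))]
y_train = rng.normal(size=(6, 5))
X_test = [rng.normal(size=(2, 3)), rng.normal(size=(4, 2))]

primary_rows, primary_cols = lmo_fit(X_train, y_train)
y_pred = lmo_predict(primary_rows, primary_cols, X_train, X_test)
```

The output has one entry per pair of a test row sample and a test column sample. Because the secondary estimators are fitted inside lmo_predict, every prediction call pays the cost of two extra fits.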

See the User Guide for a diagram and more information.

primary_rows_estimator_#

The fitted primary rows estimator.

Type:

BaseEstimator

primary_cols_estimator_#

The fitted primary columns estimator.

Type:

BaseEstimator

secondary_rows_estimator_#

The fitted secondary rows estimator.

Type:

BaseEstimator

secondary_cols_estimator_#

The fitted secondary columns estimator.

Type:

BaseEstimator

Notes

Note that the secondary estimators must be refit every time the wrapper’s predict() method is called, which may increase prediction time depending on the type of secondary estimators chosen by the user.

Compositions of single-output estimators, built with scikit-learn wrappers such as MultiOutputRegressor or MultiOutputClassifier, can be used in place of natively multi-output estimators. This is a useful option when the base estimators do not natively support multiple outputs.
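As a small illustration of such a composition, the snippet below wraps scikit-learn's single-output SVR in MultiOutputRegressor. The random data is made up for the example, and whether this particular combination suits a given bipartite task is left to the reader.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = rng.normal(size=(20, 4))  # four output columns

# SVR handles a single output, so the wrapper fits one SVR per column of Y.
multi_svr = MultiOutputRegressor(SVR()).fit(X, Y)
Y_pred = multi_svr.predict(X)
```

The wrapped estimator exposes the usual fit and predict methods with two-dimensional targets, so it could be passed wherever a multi-output estimator is expected.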

See also

GlobalSingleOutputWrapper

A wrapper that fits a single-output estimator to bipartite datasets.

MultiOutputRegressor

A scikit-learn wrapper that fits a separate regressor for each output variable.

MultiOutputClassifier

A scikit-learn wrapper that fits a separate classifier for each output variable.

Examples

from bipartite_learn.datasets import NuclearReceptorsLoader
from bipartite_learn.wrappers import LocalMultiOutputWrapper
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier

X, y = NuclearReceptorsLoader().load()  # X is a list of two matrices
bipartite_clf = LocalMultiOutputWrapper(
    primary_rows_estimator=MultiOutputClassifier(SVC()),
    primary_cols_estimator=MultiOutputClassifier(SVC()),
    secondary_rows_estimator=KNeighborsClassifier(),
    secondary_cols_estimator=KNeighborsClassifier(),
)
bipartite_clf.fit(X, y)


property classes_#

The class labels. Only exists if the estimator is a classifier.

decision_function(X, **decision_function_params)#
property feature_names_in_#

Names of features seen during fit.

fit(X, y, **fit_params)#

Fits the wrapper to the training data.

Raises:

IncompatibleEstimatorsError –
  • If any of the estimators passed as arguments does not support multi-output functionality.

  • If the secondary estimators are not of the same type (e.g., regressor, classifier).

  • If only one of the primary estimators is pairwise.

fit_predict(X, y=None, **fit_params)#
property n_features_in_#

Number of features seen during fit.

predict(X, **predict_params)#
predict_log_proba(X, **predict_log_proba_params)#
predict_proba(X, **predict_proba_params)#
score(X, y=None)#
class bipartite_learn.wrappers.MultipartiteSamplerWrapper(samplers: BaseEstimator | Sequence[BaseEstimator], ndim: int | None = 2)#

Bases: BaseMultipartiteSampler

Manages a sampler for each feature space in multipartite datasets.

class bipartite_learn.wrappers.MultipartiteTransformerWrapper(transformers: BaseEstimator | Sequence[BaseEstimator], ndim: int | None = 2)#

Bases: BaseMultipartiteEstimator, TransformerMixin

Manages a transformer for each feature space in multipartite datasets.
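The per-axis idea can be sketched as follows, assuming the wrapper simply keeps one transformer for each feature matrix in X and applies each to its own matrix. CenterColumns and fit_transform_per_axis are hypothetical names used only for this illustration.

```python
import numpy as np


class CenterColumns:
    """Hypothetical minimal monopartite transformer (subtracts column means)."""

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_


def fit_transform_per_axis(transformers, X):
    # One transformer per feature matrix: transformers[i] only ever sees X[i].
    return [t.fit(Xi).transform(Xi) for t, Xi in zip(transformers, X)]


X = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[10.0], [20.0], [30.0]])]
Xt = fit_transform_per_axis([CenterColumns(), CenterColumns()], X)
```

Each matrix is centered with its own statistics, so the two feature spaces never mix.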

fit(X, y=None)#
fit_transform(X, y=None)#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray of shape (n_samples, n_features_new)

transform(X, y=None)#