bipartite_learn.datasets package#

Module contents#

class bipartite_learn.datasets.BaseFileLoader(*, filepath: str | Path, description: str | BaseFileLoader = '')#

Bases: object

Abstract base class for file loader objects.

filepath#

A Path object representing the location of the file to be loaded.

Type:

Path

description#

A string describing the file being loaded or a BaseFileLoader instance that provides additional details about the file.

Type:

str or BaseFileLoader

get_description() str#

Return the text description of the file being loaded.

abstract load(as_frame: bool = False) ndarray | str#

Load the file and return its contents.

Parameters:

as_frame (bool, default=False) – Whether to return the data as a pandas DataFrame.

Returns:

The contents of the file.

Return type:

str or np.ndarray or pandas.DataFrame

property local_path: Path#

The complete path to the local file.

set_description(description: str | BaseFileLoader) None#

Set the description of the file being loaded.

class bipartite_learn.datasets.BaseRemoteFileLoader(*, url: str, filepath: str | Path | None = None, description: str | BaseRemoteFileLoader = '', base_dir: str | Path | None = None, checksum: str | None = None, hash_function: str | None = <function _sha256>)#

Bases: BaseFileLoader

Abstract base class for loaders of remote files.

url#

The URL address of the file to be fetched.

Type:

str

filepath#

A Path object representing the location of the file to be loaded, relative to self.base_dir.

Type:

Path

description#

A string describing the file being loaded or a BaseFileLoader instance that loads such a string.

Type:

str or BaseFileLoader

base_dir#

A Path object representing the root directory for self.filepath.

Type:

Path

checksum#

The expected checksum of the file. If None, no integrity check is performed after download.

Type:

str or None

hash_function#

A function that receives the local filepath and returns its checksum. If None, no integrity check is performed after download.

Type:

Callable or None
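As an illustration of the checksum and hash_function attributes, the following is a minimal standalone sketch of a function that receives a local file path and returns its checksum. It does not reproduce the library's own default (_sha256, whose implementation is not shown here). The names sha256_checksum and verify are hypothetical, and returning False on a mismatch is an assumption, since this page does not say how a failed check is reported.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_checksum(path, chunk_size=65536):
    """Return the hexadecimal SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected, hash_function=sha256_checksum):
    """Return True if the check passes or is skipped, False on mismatch."""
    if expected is None or hash_function is None:
        # Mirrors the documented rule: no integrity check when either is None.
        return True
    return hash_function(path) == expected


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    sample_digest = sha256_checksum(sample)
    check_ok = verify(sample, sample_digest)
    check_skipped = verify(sample, None)
```

Any callable with the same shape (path in, checksum string out) could be used in place of sha256_checksum in this sketch.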

clear_local() None#

Delete the local copy of the file.

download() Path#

Download the file to self.local_path and verify its integrity.

load(as_frame: bool = False) ndarray | str#

Load the local file if available, otherwise download it first.

Parameters:

as_frame (bool, default=False) – Whether to return the data as a pandas DataFrame.

Returns:

The contents of the file.

Return type:

str or np.ndarray or pandas.DataFrame

abstract load_local(as_frame: bool = False) ndarray | str#

Load the local file and raise an error if it is not available.

Parameters:

as_frame (bool, default=False) – Whether to return the data as a pandas DataFrame.

Returns:

The contents of the file.

Return type:

str or np.ndarray or pandas.DataFrame
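The relationship between download(), load() and load_local() can be sketched with a small standalone class. This is not the library's implementation: CachedTextLoader and fake_fetch are hypothetical stand-ins, the network request is replaced by a callable, and composing local_path as base_dir / filepath is an assumption based on the attribute descriptions above.

```python
import tempfile
from pathlib import Path


class CachedTextLoader:
    """Hypothetical stand-in mirroring the download/load/load_local split."""

    def __init__(self, fetch, filepath, base_dir):
        self.fetch = fetch              # callable returning the file's bytes
        self.filepath = Path(filepath)  # relative to base_dir
        self.base_dir = Path(base_dir)

    @property
    def local_path(self):
        # Assumed composition: root directory joined with the relative path.
        return self.base_dir / self.filepath

    def download(self):
        self.local_path.parent.mkdir(parents=True, exist_ok=True)
        self.local_path.write_bytes(self.fetch())
        return self.local_path

    def load_local(self):
        # Raises FileNotFoundError when no local copy exists.
        return self.local_path.read_text()

    def load(self):
        # Download only when the local copy is missing.
        if not self.local_path.exists():
            self.download()
        return self.load_local()

    def clear_local(self):
        self.local_path.unlink()


calls = []


def fake_fetch():
    """Pretend download that records how often it is invoked."""
    calls.append(1)
    return b"1,0\n0,1\n"


with tempfile.TemporaryDirectory() as tmp:
    loader = CachedTextLoader(fake_fetch, "toy/y.csv", tmp)
    first = loader.load()   # no local copy yet, so one fetch happens
    second = loader.load()  # served from the local copy, no second fetch
```

In this sketch the second call to load() never reaches fake_fetch, which is the caching behaviour the method descriptions suggest.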

property local_path: Path#

The complete path to the local file.

rebase_dir(new_base: str | Path) None#

Redirect the target download directory to be under new_base.

set_base_dir(base_dir: str | Path | None) None#

Set the root directory for self.filepath.

set_description(description: str | BaseFileLoader) None#

Set the description of the file being loaded.

class bipartite_learn.datasets.BipartiteDatasetLoader(*, X_loader: list[bipartite_learn.datasets.loader.BaseFileLoader], y_loader: BaseFileLoader, filepath: str | Path, base_dir: str | Path = None, description: str | BaseFileLoader = '')#

Bases: BaseRemoteFileLoader

Basic loader for bipartite datasets.

This class groups data loaders for each file of a bipartite dataset.

X_loader#

List of remote file loaders for the feature matrices.

Type:

list[BaseRemoteFileLoader]

y_loader#

Remote file loader for the interaction matrix.

Type:

BaseRemoteFileLoader

filepath#

Name for the local directory where the downloaded files will be stored. The directory will be created if it does not exist.

Type:

Path

base_dir#

Base directory for the filepath. If None, uses the current directory.

Type:

Path

description#

String description of the dataset or a file loader that loads it.

Type:

str or BaseRemoteFileLoader

clear_local() None#

Delete the local copies of the files.

download() Path#

Download the dataset files and verify their integrity.

load(as_frame: bool = False) tuple[list[numpy.ndarray | str], numpy.ndarray | str]#

Load the local files if available, otherwise download them first.

Parameters:

as_frame (bool, default=False) – Whether to return the data as pandas DataFrames (True) or numpy arrays (False).

Returns:

X, y – A tuple with the list of feature matrices, one per axis, and the corresponding interaction matrix. Data is either pandas.DataFrame or numpy.ndarray.

Return type:

tuple[list[Data], Data]
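To make the shape of the returned (X, y) pair concrete, here is a toy sketch using plain Python lists instead of numpy arrays or DataFrames. The values are invented, the helper shapes_agree is hypothetical, and the convention that y has one row per sample in X[0] and one column per sample in X[1] is an assumption about how the two axes line up.

```python
# Toy stand-ins: 3 samples on the row axis and 2 samples on the column axis.
X_rows = [[0.9, 0.1], [0.4, 0.7], [0.2, 0.3]]            # 3 samples, 2 features
X_cols = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.6]]    # 2 samples, 4 features
y = [[1, 0], [0, 0], [0, 1]]                             # 3 x 2 interaction matrix

X = [X_rows, X_cols]


def shapes_agree(X, y):
    """Check that y has one row per sample in X[0] and one column per sample in X[1]."""
    n_rows, n_cols = len(y), len(y[0])
    return (
        len(X) == 2
        and len(X[0]) == n_rows
        and len(X[1]) == n_cols
        and all(len(row) == n_cols for row in y)
    )


consistent = shapes_agree(X, y)
```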

load_local(as_frame: bool = False) tuple[list[numpy.ndarray | str], numpy.ndarray | str]#

Load the local files and raise an error if one is not available.

Parameters:

as_frame (bool, default=False) – Whether to return the data as pandas DataFrames (True) or numpy arrays (False).

Returns:

X, y – A tuple with the list of feature matrices, one per axis, and the corresponding interaction matrix. Data is either pandas.DataFrame or numpy.ndarray.

Return type:

tuple[list[Data], Data]

set_base_dir(base_dir: str | Path | None) None#

Set the root directory for self.filepath and update loaders.

class bipartite_learn.datasets.EnzymesLoader(base_dir: str | Path | None = None)#

Bases: BipartiteDatasetLoader

Binary interaction prediction between enzymes and drug molecules.

This is one of four gold-standard datasets for drug-protein interaction prediction introduced by Yamanishi et al., 2008 [1].

The input features are similarity matrices among the instances on each axis, and the target is a binary interaction matrix indicating either the existence of an experimentally validated interaction (value 1) or the absence of information about an interaction (value 0).

The score of a Smith-Waterman pairwise alignment is taken as the similarity between proteins, whereas the SIMCOMP score is used for the similarity between drug molecules.

The original files can be downloaded from:

http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/

References

[1] Yoshihiro Yamanishi, Michihiro Araki, Alex Gutteridge, Wataru Honda, Minoru Kanehisa, Prediction of drug–target interaction networks from the integration of chemical and genomic spaces, Bioinformatics, Volume 24, Issue 13, July 2008, Pages i232–i240, https://doi.org/10.1093/bioinformatics/btn162

class bipartite_learn.datasets.GPCRLoader(base_dir: str | Path | None = None)#

Bases: BipartiteDatasetLoader

Binary interactions between G-protein coupled receptors and drug molecules.

This is one of four gold-standard datasets for drug-protein interaction prediction introduced by Yamanishi et al., 2008 [1].

The input features are similarity matrices among the instances on each axis, and the target is a binary interaction matrix indicating either the existence of an experimentally validated interaction (value 1) or the absence of information about an interaction (value 0).

The score of a Smith-Waterman pairwise alignment is taken as the similarity between proteins, whereas the SIMCOMP score is used for the similarity between drug molecules.

The original files can be downloaded from:

http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/

References

[1] Yoshihiro Yamanishi, Michihiro Araki, Alex Gutteridge, Wataru Honda, Minoru Kanehisa, Prediction of drug–target interaction networks from the integration of chemical and genomic spaces, Bioinformatics, Volume 24, Issue 13, July 2008, Pages i232–i240, https://doi.org/10.1093/bioinformatics/btn162

class bipartite_learn.datasets.IonChannelsLoader(base_dir: str | Path | None = None)#

Bases: BipartiteDatasetLoader

Binary interactions between ion channels and drug molecules.

This is one of four gold-standard datasets for drug-protein interaction prediction introduced by Yamanishi et al., 2008 [1].

The input features are similarity matrices among the instances on each axis, and the target is a binary interaction matrix indicating either the existence of an experimentally validated interaction (value 1) or the absence of information about an interaction (value 0).

The score of a Smith-Waterman pairwise alignment is taken as the similarity between proteins, whereas the SIMCOMP score is used for the similarity between drug molecules.

The original files can be downloaded from:

http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/

References

[1] Yoshihiro Yamanishi, Michihiro Araki, Alex Gutteridge, Wataru Honda, Minoru Kanehisa, Prediction of drug–target interaction networks from the integration of chemical and genomic spaces, Bioinformatics, Volume 24, Issue 13, July 2008, Pages i232–i240, https://doi.org/10.1093/bioinformatics/btn162

class bipartite_learn.datasets.NuclearReceptorsLoader(base_dir: str | Path | None = None)#

Bases: BipartiteDatasetLoader

Binary interactions between nuclear receptors and drug molecules.

This is one of four gold-standard datasets for drug-protein interaction prediction introduced by Yamanishi et al., 2008 [1].

The input features are similarity matrices among the instances on each axis, and the target is a binary interaction matrix indicating either the existence of an experimentally validated interaction (value 1) or the absence of information about an interaction (value 0).

The score of a Smith-Waterman pairwise alignment is taken as the similarity between proteins, whereas the SIMCOMP score is used for the similarity between drug molecules.

The original files can be downloaded from:

http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/

References

[1] Yoshihiro Yamanishi, Michihiro Araki, Alex Gutteridge, Wataru Honda, Minoru Kanehisa, Prediction of drug–target interaction networks from the integration of chemical and genomic spaces, Bioinformatics, Volume 24, Issue 13, July 2008, Pages i232–i240, https://doi.org/10.1093/bioinformatics/btn162

bipartite_learn.datasets.get_data_home(data_home: str | Path | None = None) Path#

Return the path of the bipartite_learn data directory.

This folder is used by some large dataset loaders to avoid downloading the data several times.

By default the data directory is set to a folder named ‘bipartite_learn_data’ in the user home folder. Alternatively, it can be set by the ‘BIPARTITE_LEARN_DATA’ environment variable or programmatically by giving an explicit folder path. The ‘~’ symbol is expanded to the user home folder.

If the folder does not already exist, it is automatically created.

Parameters:

data_home (str or Path, default=None) – The path to the bipartite_learn data directory. If None, the default path is ~/bipartite_learn_data.

Returns:

data_home – The path to the bipartite_learn data directory.

Return type:

Path
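The lookup described for get_data_home can be sketched as a standalone function. This is a hypothetical re-implementation (resolve_data_home is not part of the library), and the precedence it uses, explicit argument first, then the BIPARTITE_LEARN_DATA environment variable, then ~/bipartite_learn_data, is an assumption drawn from the description rather than from the source code.

```python
import os
import tempfile
from pathlib import Path


def resolve_data_home(data_home=None):
    """Hypothetical sketch of the data directory lookup described above."""
    if data_home is None:
        # Fall back to the environment variable, then to the default folder.
        data_home = os.environ.get(
            "BIPARTITE_LEARN_DATA", Path("~") / "bipartite_learn_data"
        )
    path = Path(data_home).expanduser()  # expand a leading '~'
    path.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    return path


# Usage with an explicit folder inside a temporary directory,
# so the example does not touch the real home folder.
with tempfile.TemporaryDirectory() as tmp:
    explicit = resolve_data_home(Path(tmp) / "cache")
    created = explicit.is_dir()
```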