bipartite_learn.tree package#

Module contents#

class bipartite_learn.tree.BipartiteDecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, min_rows_split=1, min_cols_split=1, min_rows_leaf=1, min_cols_leaf=1, min_row_weight_fraction_leaf=0.0, min_col_weight_fraction_leaf=0.0, max_row_features=None, max_col_features=None, bipartite_adapter='gmosa', prediction_weights=None)#

Bases: BaseBipartiteDecisionTree, DecisionTreeClassifier

Decision tree classifier tailored to bipartite input.

Implements optimized global single-output (GSO) and global multi-output (GMO) trees for interaction prediction. The latter was proposed by [1] under the name Predictive Bi-Clustering Trees. The former is an optimized algorithm for growing GSO trees, which treat concatenated pairs of row and column instances in a bipartite dataset as the actual instances.

GSO trees (bipartite_adapter=“gso”) yield exactly the same tree structure as if all possible combinations of row and column instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but in much shorter time.
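The naive dataset that a GSO tree is described as being equivalent to can be sketched in plain Python. The helper name flatten_bipartite and the toy data are illustrative assumptions, not part of bipartite_learn; the sketch only shows how each (row, column) pair becomes one flat sample.

```python
from itertools import product


def flatten_bipartite(X_rows, X_cols, Y):
    """Build the pairwise dataset: one sample per (row, column) pair.

    Hypothetical helper for illustration only.
    X_rows: list of row feature vectors (n_rows items)
    X_cols: list of column feature vectors (n_cols items)
    Y: nested list of labels with shape n_rows x n_cols
    """
    X_flat, y_flat = [], []
    for i, j in product(range(len(X_rows)), range(len(X_cols))):
        # Concatenate the features of row i with the features of column j.
        X_flat.append([*X_rows[i], *X_cols[j]])
        y_flat.append(Y[i][j])
    return X_flat, y_flat


X_rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 row instances, 2 features
X_cols = [[1.0], [2.0]]                        # 2 column instances, 1 feature
Y = [[0, 1], [1, 1], [0, 0]]                   # 3 x 2 interaction matrix

X_flat, y_flat = flatten_bipartite(X_rows, X_cols, Y)
print(len(X_flat), len(X_flat[0]))  # 6 3
```

The flat dataset grows with n_rows * n_cols, which is why avoiding its explicit construction matters for larger interaction matrices.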

See [1] and [2] and also the User Guide. (TODO)

Parameters:
  • criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “log_loss” and “entropy”, both for the Shannon information gain; see the “Mathematical formulation” section of the scikit-learn decision tree user guide.

  • splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

  • min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node. Analogous to min_samples_split.

  • min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node. Analogous to min_samples_split.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

  • min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all the input rows) required to be at a leaf node. Rows have equal weight when sample_weight is not provided.

  • min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of all the input columns) required to be at a leaf node. Columns have equal weight when sample_weight is not provided.

  • max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_row_features features at each row split.

    • If float, then max_row_features is a fraction and int(max_row_features * n_row_features) features are considered at each split.

    • If “auto”, then max_row_features=n_row_features.

    • If “sqrt”, then max_row_features=sqrt(n_row_features).

    • If “log2”, then max_row_features=log2(n_row_features).

    • If None or 1.0, then max_row_features=n_row_features.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_row_features row features.

  • max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_(row/col)_features < n_(row/col)_features, the algorithm will select max_(row/col)_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_(row/col)_features=n_(row/col)_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

  • ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See the “Minimal Cost-Complexity Pruning” section of the scikit-learn decision tree user guide for details.

  • bipartite_adapter ({"gso", "gmo", "gmosa"}, default="gmosa") –

    Which strategy to employ when searching for the best split. The global single-output strategy (“gso”) is equivalent to growing a tree on all combinations of row samples with column samples and their corresponding labels: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However, an optimized algorithm lets these trees exploit the bipartite dataset directly and grow much faster than a naive implementation of this idea.

    The splitting procedure of the global multi-output strategy (“gmo”), unlike the GSO adaptation, treats each sample on the other axis as a different output: when splitting on rows, each column is an output; when splitting on columns, each row is an output.

    In other words, while GMO calculates impurity as a deviance metric (call it d) relative to each output’s mean (d(y[i, j], y.mean(0)).mean(0)), the GSO adaptation uses deviance relative to the total average label (d(y[i, j], y.mean()).mean()).

    The GMO strategy with single label average (“gmosa”) works exactly as explained for GMO, but assumes the tree’s output value will be y_leaf.mean(). Using “gmo” allows for combining each column or row of the leaf’s partition in multiple ways (see the prediction_weights parameter), but results in much larger memory usage, since all outputs must be stored. Using “gmosa” solves this memory issue by storing only the leaf’s total average in each node, at the cost of losing the ability to specify prediction weights.

    See [1] for more information.

  • prediction_weights ({"uniform", "raw", "precomputed", "square", "softmax"}, 1D-array or callable, default="uniform") –

    Determines how to compute the final predicted value. Initially, all predictions for each row and column instance from the training set that share the leaf node with the predicting sample are obtained.

    • “raw” instructs the estimator to return this vector, with a value for each training row and training column, and np.nan for instances not in the same leaf.

    • “uniform” returns the mean value of the leaf for new instances but, for instances present in the training set, it uses the corresponding row or column mean of the leaf. Known instances are recognized by a similarity value of 1. This is the main approach presented by [1], named global multi-output (GMO).

    Other options return the weighted average of the leaf values:

    • A 1D-array may be provided to specify training sample weights explicitly, with weights for training row samples followed by weights for training column samples (length == sum(y_train.shape)).

    • “precomputed” instructs the estimator to treat the X values as similarities to each row and column sample in the training set (row similarities followed by column similarities), using them as weights to average the leaf outputs.

    • A callable, if provided, takes all the X being predicted and must return an array of weights for each predicting sample with the same shape as the X array given to predict().

    • “square” is equivalent to prediction_weights=np.square.

    • “softmax” is equivalent to prediction_weights=np.exp.
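The difference between the GSO and GMO impurities described under bipartite_adapter can be illustrated with a small sketch. It assumes the deviance d is the squared difference and that rows are being split, so each column is an output; the function names are hypothetical and the library's compiled criteria may differ in detail.

```python
from statistics import fmean


def gso_impurity(Y):
    """Squared deviation from the single overall mean, averaged over all cells."""
    cells = [v for row in Y for v in row]
    overall_mean = fmean(cells)
    return fmean((v - overall_mean) ** 2 for v in cells)


def gmo_impurity_per_output(Y):
    """Squared deviation from each column's own mean, one value per column."""
    impurities = []
    for column in zip(*Y):  # iterate over the columns (outputs) of Y
        column_mean = fmean(column)
        impurities.append(fmean((v - column_mean) ** 2 for v in column))
    return impurities


# Every column is constant, but the two columns differ from each other.
Y = [[0, 1],
     [0, 1],
     [0, 1]]

print(gso_impurity(Y))             # 0.25
print(gmo_impurity_per_output(Y))  # [0.0, 0.0]
```

In this toy matrix the node is already pure from the multi-output point of view, while the single-output view still sees variance between the two columns.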

feature_importances_#

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)

max_row_features_#

The inferred value of max_row_features.

Type:

int

max_col_features_#

The inferred value of max_col_features.

Type:

int
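How max_row_features and max_col_features might be resolved to the integers stored in max_row_features_ and max_col_features_ can be sketched as follows. The function name is hypothetical, and truncating to an integer with a floor of 1 is an assumption borrowed from scikit-learn's usual behaviour, not something this page states.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_row_features / max_col_features setting to an integer count.

    Illustrative sketch; int truncation and the minimum of 1 are assumptions.
    """
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return max(1, int(max_features * n_features))
    raise ValueError(f"Unsupported max_features value: {max_features!r}")


print(resolve_max_features("sqrt", 10))  # 3
print(resolve_max_features(0.5, 10))     # 5
print(resolve_max_features(None, 10))    # 10
```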

n_features_in_#

Number of features seen during fit.

Type:

int

n_row_features_in_#

Number of row features seen during fit.

Type:

int

n_col_features_in_#

Number of column features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example “Understanding the decision tree structure” for basic usage of these attributes.

Type:

Tree instance

See also

BipartiteDecisionTreeRegressor

A bipartite decision tree regressor.

BipartiteExtraTreeRegressor

An extremely randomized bipartite tree regressor.

BipartiteExtraTreeClassifier

An extremely randomized bipartite tree classifier.

bipartite_learn.ensemble.BipartiteExtraTreesClassifier

A bipartite extra-trees classifier.

bipartite_learn.ensemble.BipartiteExtraTreesRegressor

A bipartite extra-trees regressor.

bipartite_learn.ensemble.BipartiteRandomForestClassifier

A bipartite random forest classifier.

bipartite_learn.ensemble.BipartiteRandomForestRegressor

A bipartite random forest regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
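When size-controlling parameters such as min_samples_split or min_samples_leaf are given as floats, the parameter list describes them as fractions converted with ceil. A minimal sketch of that conversion is shown here; the helper name is hypothetical, and what exactly counts as n_samples for a bipartite dataset is left as an input rather than assumed.

```python
import math


def resolve_min_samples(value, n_samples):
    """Convert an int-or-float minimum into an absolute sample count."""
    if isinstance(value, float):
        # Floats are fractions of n_samples, rounded up.
        return math.ceil(value * n_samples)
    return value


print(resolve_min_samples(0.1, 95))  # 10
print(resolve_min_samples(2, 95))    # 2
```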

References

fit(X, y, sample_weight=None, check_input=True)#

Build a decision tree classifier from the training set (X, y).

Parameters:
  • X – The training input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The target values (class labels) as integers or strings.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.

  • check_input (bool, default=True) – Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.

Returns:

self – Fitted estimator.

Return type:

DecisionTreeClassifier

predict_proba(X, check_input=True)#

Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:
  • X – The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.

  • check_input (bool, default=True) – Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.

Returns:

proba – The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Return type:

ndarray of shape (n_samples, n_classes) or list of n_outputs such arrays if n_outputs > 1
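The statement that a predicted probability is the fraction of samples of each class in a leaf can be made concrete with a short sketch. The function and its arguments are illustrative assumptions; the estimator itself reads these fractions from the fitted tree_ structure rather than from raw labels.

```python
from collections import Counter


def leaf_class_probabilities(leaf_labels, classes):
    """Return the fraction of leaf samples belonging to each class.

    leaf_labels: labels of the training samples that ended up in one leaf
    classes: class labels in the order used for the output columns
    """
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    return [counts[c] / total for c in classes]


print(leaf_class_probabilities([0, 1, 1, 1], classes=[0, 1]))  # [0.25, 0.75]
```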

class bipartite_learn.tree.BipartiteDecisionTreeRegressor(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, min_rows_split=1, min_cols_split=1, min_rows_leaf=1, min_cols_leaf=1, min_row_weight_fraction_leaf=0.0, min_col_weight_fraction_leaf=0.0, max_row_features=None, max_col_features=None, bipartite_adapter='gmosa', prediction_weights=None)#

Bases: BaseBipartiteDecisionTree, DecisionTreeRegressor

Decision tree regressor tailored to bipartite input.

Implements optimized global single-output (GSO) and global multi-output (GMO) trees for interaction prediction. The latter was proposed by [1] under the name Predictive Bi-Clustering Trees. The former is an optimized algorithm for growing GSO trees, which treat concatenated pairs of row and column instances in a bipartite dataset as the actual instances.

GSO trees (bipartite_adapter=“gso”) yield exactly the same tree structure as if all possible combinations of row and column instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but in much shorter time.

See [1] and [2] and also the User Guide. (TODO)

Parameters:
  • criterion ({"squared_error", "friedman_mse"}, default="squared_error") – The function to measure the quality of a split. Supported criteria are “squared_error” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node and “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits.

  • splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

  • min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node. Analogous to min_samples_split.

  • min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node. Analogous to min_samples_split.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

  • min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all the input rows) required to be at a leaf node. Rows have equal weight when sample_weight is not provided.

  • min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of all the input columns) required to be at a leaf node. Columns have equal weight when sample_weight is not provided.

  • max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_row_features features at each row split.

    • If float, then max_row_features is a fraction and int(max_row_features * n_row_features) features are considered at each split.

    • If “auto”, then max_row_features=n_row_features.

    • If “sqrt”, then max_row_features=sqrt(n_row_features).

    • If “log2”, then max_row_features=log2(n_row_features).

    • If None or 1.0, then max_row_features=n_row_features.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_row_features row features.

  • max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_(row/col)_features < n_(row/col)_features, the algorithm will select max_(row/col)_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_(row/col)_features=n_(row/col)_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

  • ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See the “Minimal Cost-Complexity Pruning” section of the scikit-learn decision tree user guide for details.

  • bipartite_adapter ({"gso", "gmo", "gmosa"}, default="gmosa") –

    Which strategy to employ when searching for the best split. The global single-output strategy (“gso”) is equivalent to growing a tree on all combinations of row samples with column samples and their corresponding labels: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However, an optimized algorithm lets these trees exploit the bipartite dataset directly and grow much faster than a naive implementation of this idea.

    The splitting procedure of the global multi-output strategy (“gmo”), unlike the GSO adaptation, treats each sample on the other axis as a different output: when splitting on rows, each column is an output; when splitting on columns, each row is an output.

    In other words, while GMO calculates impurity as a deviance metric (call it d) relative to each output’s mean (d(y[i, j], y.mean(0)).mean(0)), the GSO adaptation uses deviance relative to the total average label (d(y[i, j], y.mean()).mean()).

    The GMO strategy with single label average (“gmosa”) works exactly as explained for GMO, but assumes the tree’s output value will be y_leaf.mean(). Using “gmo” allows for combining each column or row of the leaf’s partition in multiple ways (see the prediction_weights parameter), but results in much larger memory usage, since all outputs must be stored. Using “gmosa” solves this memory issue by storing only the leaf’s total average in each node, at the cost of losing the ability to specify prediction weights.

    See [1] for more information.

  • prediction_weights ({"uniform", "raw", "precomputed", "square", "softmax"}, 1D-array or callable, default="uniform") –

    Determines how to compute the final predicted value. Initially, all predictions for each row and column instance from the training set that share the leaf node with the predicting sample are obtained.

    • “raw” instructs the estimator to return this vector, with a value for each training row and training column, and np.nan for instances not in the same leaf.

    • “uniform” returns the mean value of the leaf for new instances but, for instances present in the training set, it uses the corresponding row or column mean of the leaf. Known instances are recognized by a similarity value of 1. This is the main approach presented by [1], named global multi-output (GMO).

    Other options return the weighted average of the leaf values:

    • A 1D-array may be provided to specify training sample weights explicitly, with weights for training row samples followed by weights for training column samples (length == sum(y_train.shape)).

    • “precomputed” instructs the estimator to treat the X values as similarities to each row and column sample in the training set (row similarities followed by column similarities), using them as weights to average the leaf outputs.

    • A callable, if provided, takes all the X being predicted and must return an array of weights for each predicting sample with the same shape as the X array given to predict().

    • “square” is equivalent to prediction_weights=np.square.

    • “softmax” is equivalent to prediction_weights=np.exp.
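The weighted-average options of prediction_weights can be pictured with the sketch below. It assumes that the weight function is applied to similarity values and that the prediction is the normalized weighted mean of the leaf values; handling of np.nan for instances outside the leaf is not modeled, and the function name is hypothetical.

```python
import math


def weighted_leaf_prediction(leaf_values, similarities, weight_fn):
    """Average leaf values using weights derived from similarities.

    leaf_values: outputs stored in the leaf, one per training instance
    similarities: similarity of the new sample to each of those instances
    weight_fn: maps a similarity to a non-negative weight
    """
    weights = [weight_fn(s) for s in similarities]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, leaf_values)) / total


leaf_values = [1.0, 3.0]
similarities = [1.0, 2.0]

# Counterpart of "square": weights are the squared similarities.
print(weighted_leaf_prediction(leaf_values, similarities, lambda s: s ** 2))

# Counterpart of "softmax": exponential weights, normalized by their sum.
print(weighted_leaf_prediction(leaf_values, similarities, math.exp))
```

Normalizing exponential weights by their sum is what makes the np.exp option behave like a softmax over the similarities.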

feature_importances_#

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)

max_row_features_#

The inferred value of max_row_features.

Type:

int

max_col_features_#

The inferred value of max_col_features.

Type:

int

n_features_in_#

Number of features seen during fit.

Type:

int

n_row_features_in_#

Number of row features seen during fit.

Type:

int

n_col_features_in_#

Number of column features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example “Understanding the decision tree structure” for basic usage of these attributes.

Type:

Tree instance

See also

BipartiteExtraTreeRegressor

An extremely randomized bipartite tree regressor.

BipartiteDecisionTreeClassifier

A bipartite decision tree classifier.

BipartiteExtraTreeClassifier

An extremely randomized bipartite tree classifier.

bipartite_learn.ensemble.BipartiteExtraTreesClassifier

A bipartite extra-trees classifier.

bipartite_learn.ensemble.BipartiteExtraTreesRegressor

A bipartite extra-trees regressor.

bipartite_learn.ensemble.BipartiteRandomForestClassifier

A bipartite random forest classifier.

bipartite_learn.ensemble.BipartiteRandomForestRegressor

A bipartite random forest regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
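Among the size-controlling parameters, min_impurity_decrease thresholds the weighted impurity decrease given in the parameter list. The sketch below evaluates that expression directly; the function name and the numbers are illustrative only.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease of a candidate split.

    N: total (weighted) number of samples
    N_t: (weighted) number of samples at the current node
    N_t_L, N_t_R: (weighted) number of samples in the left and right children
    """
    return N_t / N * (
        impurity
        - N_t_R / N_t * right_impurity
        - N_t_L / N_t * left_impurity
    )


decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
# The split is accepted when decrease >= min_impurity_decrease.
print(round(decrease, 6))  # 0.08
```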

References

fit(X, y, sample_weight=None, check_input=True)#

Build a decision tree regressor from the training set (X, y).

Parameters:
  • X (list-like of {array-like, sparse matrix} of shapes (n_axis_samples, n_axis_features)) – The training input samples for each axis. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.

  • y (array-like of shape (n_row_samples, n_col_samples)) – The target values (real numbers). Use dtype=np.float64 and order='C' for maximum efficiency.

  • sample_weight (array-like of shape (n_row_samples + n_col_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Row sample weights and column sample weights must be provided in one concatenated array.

  • check_input (bool, default=True) – Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.

Returns:

self – Fitted estimator.

Return type:

BipartiteDecisionTreeRegressor
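The input layout expected by fit can be checked with a small stand-alone sketch: X holds one feature matrix per axis, y is the interaction matrix, and sample_weight concatenates row weights and column weights. The helper check_bipartite_shapes is hypothetical and uses nested lists instead of arrays to stay dependency-free.

```python
def check_bipartite_shapes(X, y, sample_weight=None):
    """Validate that X, y and sample_weight agree on the bipartite shapes."""
    X_rows, X_cols = X  # one feature matrix per axis
    n_rows, n_cols = len(X_rows), len(X_cols)
    if len(y) != n_rows or any(len(row) != n_cols for row in y):
        raise ValueError(f"y must have shape ({n_rows}, {n_cols})")
    if sample_weight is not None and len(sample_weight) != n_rows + n_cols:
        raise ValueError(f"sample_weight must have length {n_rows + n_cols}")
    return n_rows, n_cols


X = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # row features: 3 samples
    [[1.0], [2.0]],                        # column features: 2 samples
]
y = [[0.0, 1.5], [2.0, 0.5], [1.0, 1.0]]   # shape (3, 2)

row_weights = [1.0, 1.0, 2.0]
col_weights = [0.5, 1.5]
sample_weight = row_weights + col_weights  # rows first, then columns

print(check_bipartite_shapes(X, y, sample_weight))  # (3, 2)
```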

class bipartite_learn.tree.BipartiteDecisionTreeRegressorSS(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, min_rows_split=1, min_cols_split=1, min_rows_leaf=1, min_cols_leaf=1, min_row_weight_fraction_leaf=0.0, min_col_weight_fraction_leaf=0.0, max_row_features=None, max_col_features=None, bipartite_adapter='gmosa', prediction_weights=None, supervision=0.5, ss_adapter=None, unsupervised_criterion_rows='squared_error', unsupervised_criterion_cols='squared_error', update_supervision=None, axis_decision_only=False, preprocess_X_targets=None)#

Bases: BaseBipartiteDecisionTreeSS, BipartiteDecisionTreeRegressor

class bipartite_learn.tree.BipartiteExtraTreeClassifier(*, criterion='gini', splitter='random', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, min_rows_split=1, min_cols_split=1, min_rows_leaf=1, min_cols_leaf=1, min_row_weight_fraction_leaf=0.0, min_col_weight_fraction_leaf=0.0, max_row_features=None, max_col_features=None, bipartite_adapter='gmosa', prediction_weights=None)#

Bases: BipartiteDecisionTreeClassifier

Extremely randomized tree classifier tailored to bipartite input.

Implements optimized global single-output (GSO) and global multi-output (GMO) trees for interaction prediction. The latter was proposed by [1] under the name Predictive Bi-Clustering Trees. The former is an optimized algorithm for growing GSO trees, which treat concatenated pairs of row and column instances in a bipartite dataset as the actual instances.

GSO trees (bipartite_adapter=“gso”) yield exactly the same tree structure as if all possible combinations of row and column instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but in much shorter time.

See [1] and [2] and also the User Guide. (TODO)

Parameters:
  • criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “log_loss” and “entropy”, both for the Shannon information gain; see the “Mathematical formulation” section of the scikit-learn decision tree user guide.

  • splitter ({"best", "random"}, default="random") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

  • min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node. Analogous to min_samples_split.

  • min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node. Analogous to min_samples_split.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

  • min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all the input rows) required to be at a leaf node. Rows have equal weight when sample_weight is not provided.

  • min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of all the input columns) required to be at a leaf node. Columns have equal weight when sample_weight is not provided.

  • max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_row_features features at each row split.

    • If float, then max_row_features is a fraction and int(max_row_features * n_row_features) features are considered at each split.

    • If “auto”, then max_row_features=n_row_features.

    • If “sqrt”, then max_row_features=sqrt(n_row_features).

    • If “log2”, then max_row_features=log2(n_row_features).

    • If None or 1.0, then max_row_features=n_row_features.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_row_features row features.
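The options listed above can be summarized in a small resolver. This is a sketch, not the library's code: resolve_max_features is a hypothetical name, and clamping the result to at least one feature is an assumption.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_row_features / max_col_features setting to a feature count.

    Hypothetical helper; clamping the result to at least 1 is an assumption.
    """
    if max_features is None or max_features == "auto":
        count = n_features
    elif max_features == "sqrt":
        count = int(math.sqrt(n_features))
    elif max_features == "log2":
        count = int(math.log2(n_features))
    elif isinstance(max_features, int):
        count = max_features
    else:
        # Floats are fractions of the features available on that axis.
        count = int(max_features * n_features)
    return max(1, count)


for setting in (None, "sqrt", "log2", 3, 0.5):
    print(setting, resolve_max_features(setting, 64))
```

The same function would be called once with n_row_features and once with n_col_features, since each axis has its own feature set.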

  • max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_(row/col)_features < n_(row/col)_features, the algorithm will select max_(row/col)_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_(row/col)_features=n_(row/col)_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.
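The tie-breaking behaviour described above can be illustrated with the standard library alone. The function pick_best_split is hypothetical; it only shows why equally good splits make the result depend on the random seed.

```python
import random


def pick_best_split(improvements, seed=None):
    """Pick one index among the splits with the highest improvement.

    Hypothetical helper: ties are broken at random, so the result is only
    reproducible when the seed is fixed.
    """
    rng = random.Random(seed)
    best = max(improvements)
    tied = [i for i, value in enumerate(improvements) if value == best]
    return rng.choice(tied)


improvements = [0.2, 0.5, 0.5, 0.1, 0.5]
first = pick_best_split(improvements, seed=0)
second = pick_best_split(improvements, seed=0)
print(first == second)  # same seed, same choice
```

With seed=None the chosen index may differ between runs, which is the situation the random_state description warns about.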

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
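The formula above translates directly into code. The function name and the example numbers are illustrative only.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease formula given above."""
    return n_t / n * (
        impurity
        - n_t_r / n_t * right_impurity
        - n_t_l / n_t * left_impurity
    )


decrease = weighted_impurity_decrease(
    n=100, n_t=40, n_t_l=10, n_t_r=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
# The node is split only if decrease >= min_impurity_decrease.
print(decrease >= 0.05)
```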

  • ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See the “Minimal Cost-Complexity Pruning” section of the scikit-learn User Guide for details.

  • bipartite_adapter ({"gmo", "gmosa"}, default="gmosa") –

    Which strategy to employ when searching for the best split. The global single-output strategy (“gso”) is equivalent to growing a tree on all combinations of row samples with column samples and their corresponding labels: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However, an optimized algorithm lets these trees exploit the bipartite dataset directly and grow much faster than a naive implementation of this idea.

    Unlike the GSO adaptation, the splitting procedure of the global multi-output strategy (“gmo”) treats each sample on the other axis as a different output: when splitting on rows, each column is an output; when splitting on columns, each row is an output.

    In other words, while GMO calculates impurity as a deviance metric (call it d) relative to each output’s mean, (d(y[i, j], y.mean(0)).mean(0)), the GSO adaptation uses deviance relative to the total average label, (d(y[i, j], y.mean()).mean()).

    The GMO strategy with single label average (“gmosa”) works exactly as explained for GMO, but assumes the tree’s output value will be y_leaf.mean(). Using “gmo” allows for combining each column or row of the leaf’s partition in multiple ways (see the prediction_weights parameter), but results in much larger memory usage, since all outputs must be stored. Using “gmosa” solves this memory issue by storing only the leaf’s total average in each node, at the cost of losing the ability to specify prediction weights.

    See [1] for more information.
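To make the GSO equivalence concrete, the naive expansion x = [*X[0][i], *X[1][j]], y = Y[i, j] can be written out as below. This is only the reference construction the description compares against; melt_bipartite is a hypothetical name, and the optimized algorithm is described as avoiding this expansion.

```python
def melt_bipartite(X, Y):
    """Expand a bipartite dataset into one sample per (row, column) pair.

    X is [row_features, col_features]; Y[i][j] labels the pair (i, j).
    """
    row_features, col_features = X
    pairs, labels = [], []
    for i, row in enumerate(row_features):
        for j, col in enumerate(col_features):
            pairs.append([*row, *col])  # x = [*X[0][i], *X[1][j]]
            labels.append(Y[i][j])      # y = Y[i, j]
    return pairs, labels


X = [
    [[1, 2], [3, 4]],    # two row instances with two features each
    [[10], [20], [30]],  # three column instances with one feature each
]
Y = [[0, 1, 0],
     [1, 0, 1]]
pairs, labels = melt_bipartite(X, Y)
print(len(pairs))  # one sample per (row, column) combination
```

The expanded dataset has n_rows * n_cols samples, which is why building it explicitly becomes expensive for large interaction matrices.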

  • prediction_weights ({"uniform", "raw", "precomputed", "square", "softmax" }, 1D-array or callable, default="uniform") –

    Determines how to compute the final predicted value. Initially, all predictions for each row and column instance from the training set that share the leaf node with the predicting sample are obtained.

    • “raw” returns this vector, with a value for each training row and training column, and np.nan for instances not in the same leaf.

    • “uniform” returns the mean value of the leaf for new instances but, for instances present in the training set, it uses the corresponding row or column mean of the leaf. Known instances are recognized by a similarity value of 1. This is the main approach presented by [1], named global multi-output (GMO).

    Other options return the weighted average of the leaf values:

    • A 1D-array may be provided to specify training sample weights explicitly, with weights for training row samples followed by weights for training column samples (length == sum(y_train.shape)).

    • “precomputed” instructs the estimator to consider x values as similarities to each row and column sample in the training set (row similarities followed by column similarities), using them as weights to average the leaf outputs.

    • A callable, if provided, takes all the X being predicted and must return an array of weights for each predicting sample with the same shape as the X array given to predict().

    • “square” is equivalent to prediction_weights=np.square.

    • “softmax” is equivalent to prediction_weights=np.exp.
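The weighted-average options can be pictured with a minimal sketch. It assumes that the weights are obtained by applying a function to the similarities and that the prediction is the normalized weighted mean of the leaf values; how the library combines row and column weights internally is not shown here, and weighted_leaf_prediction is a hypothetical name.

```python
import math


def weighted_leaf_prediction(leaf_values, similarities, weight_func=None):
    """Average the leaf values using weights derived from similarities.

    Hypothetical sketch: weight_func plays the role of a callable
    prediction_weights (e.g. squaring or exponentiating the similarities);
    None uses the similarities themselves as weights.
    """
    if weight_func is None:
        weights = list(similarities)
    else:
        weights = [weight_func(s) for s in similarities]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, leaf_values)) / total


leaf_values = [1.0, 0.0]
similarities = [2.0, 1.0]
precomputed = weighted_leaf_prediction(leaf_values, similarities)
squared = weighted_leaf_prediction(leaf_values, similarities, lambda s: s ** 2)
softmax = weighted_leaf_prediction(leaf_values, similarities, math.exp)
print(precomputed, squared, softmax)
```

Exponentiating before normalizing is what makes the np.exp option behave like a softmax over the similarities.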

feature_importances_#

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)
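The normalization mentioned above can be sketched as follows. The (feature_index, weighted_reduction) record format is hypothetical and is not how the underlying Tree object stores its nodes.

```python
def feature_importances(split_records, n_features):
    """Sum each feature's criterion reduction and normalize to sum to 1.

    split_records is a list of (feature_index, weighted_reduction) pairs,
    one per internal node; the record format is hypothetical.
    """
    totals = [0.0] * n_features
    for feature, reduction in split_records:
        totals[feature] += reduction
    overall = sum(totals)
    if overall == 0.0:
        return totals  # no split reduced the criterion
    return [t / overall for t in totals]


importances = feature_importances([(0, 0.3), (1, 0.1), (0, 0.1)], n_features=3)
print(importances)
```

A feature that is never used for splitting, like the third one here, receives an importance of zero.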

max_row_features_#

The inferred value of max_row_features.

Type:

int

max_col_features_#

The inferred value of max_col_features.

Type:

int

n_features_in_#

Number of features seen during fit.

Type:

int

n_row_features_in_#

Number of row features seen during fit.

Type:

int

n_col_features_in_#

Number of column features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example “Understanding the decision tree structure” for basic usage of these attributes.

Type:

Tree instance

See also

BipartiteDecisionTreeClassifier

A bipartite decision tree classifier.

BipartiteExtraTreeRegressor

An extremely randomized bipartite tree regressor.

BipartiteDecisionTreeRegressor

A bipartite decision tree regressor.

bipartite_learn.ensemble.BipartiteExtraTreesClassifier

A bipartite extra-trees classifier.

bipartite_learn.ensemble.BipartiteExtraTreesRegressor

A bipartite extra-trees regressor.

bipartite_learn.ensemble.BipartiteRandomForestClassifier

A bipartite random forest classifier.

bipartite_learn.ensemble.BipartiteRandomForestRegressor

A bipartite random forest regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

class bipartite_learn.tree.BipartiteExtraTreeRegressor(*, criterion='squared_error', splitter='random', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, min_rows_split=1, min_cols_split=1, min_rows_leaf=1, min_cols_leaf=1, min_row_weight_fraction_leaf=0.0, min_col_weight_fraction_leaf=0.0, max_row_features=None, max_col_features=None, bipartite_adapter='gmosa', prediction_weights=None)#

Bases: BipartiteDecisionTreeRegressor

Extremely randomized trees tailored to bipartite input.

Implements optimized global single output (GSO) and multi-output (GMO) trees for interaction prediction. The latter is proposed by [1] under the name of Predictive Bi-Clustering Trees. The former implements an optimized algorithm for growing GSO trees, which consider concatenated pairs of row and column instances in a bipartite dataset as the actual instances.

GSO trees (bipartite_adapter=“gso”) will yield exactly the same tree structure as if all possible combinations of row and column instances were provided to a regular sklearn.tree.DecisionTreeRegressor, but in much shorter time.

See [1] and [2] and also the User Guide. (TODO)

Parameters:
  • criterion ({"squared_error", "friedman_mse"}, default="squared_error") – The function to measure the quality of a split. Supported criteria are “squared_error” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node and “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits.

  • splitter ({"best", "random"}, default="random") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

  • min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node. Analogous to min_samples_split.

  • min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node. Analogous to min_samples_split.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

  • min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node. Analogous to min_samples_leaf.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all the input rows) required to be at a leaf node. Rows have equal weight when sample_weight is not provided.

  • min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of all the input columns) required to be at a leaf node. Columns have equal weight when sample_weight is not provided.

  • max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_row_features features at each row split.

    • If float, then max_row_features is a fraction and int(max_row_features * n_row_features) features are considered at each split.

    • If “auto”, then max_row_features=n_row_features.

    • If “sqrt”, then max_row_features=sqrt(n_row_features).

    • If “log2”, then max_row_features=log2(n_row_features).

    • If None or 1.0, then max_row_features=n_row_features.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_row_features row features.

  • max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.

  • random_state (int, RandomState instance or None, default=None) – Used to pick randomly the max_row_features and max_col_features used at each split. See Glossary for details.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

  • ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See the “Minimal Cost-Complexity Pruning” section of the scikit-learn User Guide for details.

  • bipartite_adapter ({"gmo", "gmosa"}, default="gmosa") –

    Which strategy to employ when searching for the best split. The global single-output strategy (“gso”) is equivalent to growing a tree on all combinations of row samples with column samples and their corresponding labels: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However, an optimized algorithm lets these trees exploit the bipartite dataset directly and grow much faster than a naive implementation of this idea.

    Unlike the GSO adaptation, the splitting procedure of the global multi-output strategy (“gmo”) treats each sample on the other axis as a different output: when splitting on rows, each column is an output; when splitting on columns, each row is an output.

    In other words, while GMO calculates impurity as a deviance metric (call it d) relative to each output’s mean, (d(y[i, j], y.mean(0)).mean(0)), the GSO adaptation uses deviance relative to the total average label, (d(y[i, j], y.mean()).mean()).

    The GMO strategy with single label average (“gmosa”) works exactly as explained for GMO, but assumes the tree’s output value will be y_leaf.mean(). Using “gmo” allows for combining each column or row of the leaf’s partition in multiple ways (see the prediction_weights parameter), but results in much larger memory usage, since all outputs must be stored. Using “gmosa” solves this memory issue by storing only the leaf’s total average in each node, at the cost of losing the ability to specify prediction weights.

    See [1] for more information.
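The difference between the two impurity definitions can be seen on a tiny label matrix. The sketch assumes the deviance d is the squared deviation and that the split is on rows, so each column is treated as an output; the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def gso_impurity(Y):
    """Squared deviation of every label from the overall mean label."""
    flat = [y for row in Y for y in row]
    total_mean = mean(flat)
    return mean([(y - total_mean) ** 2 for y in flat])


def gmo_impurity(Y):
    """Squared deviation from each column's own mean, averaged over columns.

    This treats each column as a separate output, as when splitting on rows.
    """
    per_column = []
    for column in zip(*Y):
        column_mean = mean(column)
        per_column.append(mean([(y - column_mean) ** 2 for y in column]))
    return mean(per_column)


Y = [[0, 1],
     [0, 1]]
print(gso_impurity(Y), gmo_impurity(Y))
```

Here every column is constant, so the multi-output impurity is zero, while the single-output impurity still sees a mix of zeros and ones.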

  • prediction_weights ({"uniform", "raw", "precomputed", "square", "softmax" }, 1D-array or callable, default="uniform") –

    Determines how to compute the final predicted value. Initially, all predictions for each row and column instance from the training set that share the leaf node with the predicting sample are obtained.

    • “raw” returns this vector, with a value for each training row and training column, and np.nan for instances not in the same leaf.

    • “uniform” returns the mean value of the leaf for new instances but, for instances present in the training set, it uses the corresponding row or column mean of the leaf. Known instances are recognized by a similarity value of 1. This is the main approach presented by [1], named global multi-output (GMO).

    Other options return the weighted average of the leaf values:

    • A 1D-array may be provided to specify training sample weights explicitly, with weights for training row samples followed by weights for training column samples (length == sum(y_train.shape)).

    • “precomputed” instructs the estimator to consider x values as similarities to each row and column sample in the training set (row similarities followed by column similarities), using them as weights to average the leaf outputs.

    • A callable, if provided, takes all the X being predicted and must return an array of weights for each predicting sample with the same shape as the X array given to predict().

    • “square” is equivalent to prediction_weights=np.square.

    • “softmax” is equivalent to prediction_weights=np.exp.

feature_importances_#

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)

max_row_features_#

The inferred value of max_row_features.

Type:

int

max_col_features_#

The inferred value of max_col_features.

Type:

int

n_features_in_#

Number of features seen during fit.

Type:

int

n_row_features_in_#

Number of row features seen during fit.

Type:

int

n_col_features_in_#

Number of column features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example “Understanding the decision tree structure” for basic usage of these attributes.

Type:

Tree instance

See also

BipartiteExtraTreeClassifier

An extremely randomized bipartite tree classifier.

BipartiteDecisionTreeRegressor

A bipartite decision tree regressor.

BipartiteDecisionTreeClassifier

A bipartite decision tree classifier.

bipartite_learn.ensemble.BipartiteExtraTreesClassifier

A bipartite extra-trees classifier.

bipartite_learn.ensemble.BipartiteExtraTreesRegressor

A bipartite extra-trees regressor.

bipartite_learn.ensemble.BipartiteRandomForestClassifier

A bipartite random forest classifier.

bipartite_learn.ensemble.BipartiteRandomForestRegressor

A bipartite random forest regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

class bipartite_learn.tree.DecisionTreeClassifierSS(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, supervision=0.5, ss_adapter=None, unsupervised_criterion='squared_error', update_supervision=None, preprocess_X_targets=None, _X_targets=None)#

Bases: BaseDecisionTreeSS, DecisionTreeClassifier

A decision tree classifier (semi-supervised version).

Read more in the User Guide.

Parameters:
  • criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “log_loss” and “entropy” both for the Shannon information gain; see the “Mathematical formulation” section of the scikit-learn decision trees User Guide.
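For reference, the two impurity measures named above can be computed from per-class sample counts as in this sketch. The function names are illustrative, and the base-2 logarithm for the entropy is an assumption about units.

```python
import math


def class_proportions(counts):
    """Return the non-zero class proportions for a list of class counts."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def gini(counts):
    """Gini impurity: 1 - sum(p_k ** 2)."""
    return 1.0 - sum(p * p for p in class_proportions(counts))


def entropy(counts):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    return -sum(p * math.log2(p) for p in class_proportions(counts))


print(gini([5, 5]), entropy([5, 5]))    # evenly mixed node
print(gini([10, 0]), entropy([10, 0]))  # pure node
```

Both measures are zero for a pure node and largest when the classes are evenly mixed.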

  • splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in version 0.18: Added float values for fractions.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

    Changed in version 0.18: Added float values for fractions.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features (int, float or {"auto", "sqrt", "log2"}, default=None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=sqrt(n_features).

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None or 1.0, then max_features=n_features.

    Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed in 1.3.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm will select max_features at random at each split before finding the best split among them. But the best found split may vary across different runs, even if max_features=n_features. That is the case, if the improvement of the criterion is identical for several splits and one split has to be selected at random. To obtain a deterministic behaviour during fitting, random_state has to be fixed to an integer. See Glossary for details.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    New in version 0.19.

  • class_weight (dict, list of dict or "balanced", default=None) –

    Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
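The “balanced” formula n_samples / (n_classes * np.bincount(y)) can be reproduced for a single output with the standard library, as in this sketch. The function name is illustrative.

```python
from collections import Counter


def balanced_class_weights(y):
    """Compute n_samples / (n_classes * count) for each class label."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

The rarer class receives the larger weight, so both classes contribute equally to the total weight.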

  • ccp_alpha (non-negative float, default=0.0) –

    Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See the “Minimal Cost-Complexity Pruning” section of the scikit-learn User Guide for details.

    New in version 0.22.
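As a reminder of what the parameter controls, minimal cost-complexity pruning scores a tree T as R_alpha(T) = R(T) + alpha * |leaves of T|. The sketch below uses that definition, as given in the scikit-learn User Guide; the function names and numbers are illustrative.

```python
def cost_complexity(total_leaf_impurity, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * (number of leaves)."""
    return total_leaf_impurity + alpha * n_leaves


def effective_alpha(node_impurity, subtree_leaf_impurity, n_subtree_leaves):
    """Alpha at which collapsing a subtree into its root node breaks even."""
    return (node_impurity - subtree_leaf_impurity) / (n_subtree_leaves - 1)


alpha = effective_alpha(node_impurity=0.3, subtree_leaf_impurity=0.1,
                        n_subtree_leaves=3)
kept = cost_complexity(0.1, 3, alpha)       # keep the three-leaf subtree
collapsed = cost_complexity(0.3, 1, alpha)  # replace it with a single leaf
print(alpha, kept, collapsed)
```

Subtrees with the smallest effective alpha are pruned first, and pruning continues while that value stays below ccp_alpha.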

classes_#

The class labels (single output problem), or a list of arrays of class labels (multi-output problem).

Type:

ndarray of shape (n_classes,) or list of ndarray

feature_importances_#

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)

max_features_#

The inferred value of max_features.

Type:

int

n_classes_#

The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

Type:

int or list of int

n_features_in_#

Number of features seen during fit.

New in version 0.24.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of the Tree object and to the scikit-learn example “Understanding the decision tree structure” for basic usage of these attributes.

Type:

Tree instance

See also

DecisionTreeRegressor

A decision tree regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The predict() method operates using the numpy.argmax() function on the outputs of predict_proba(). This means that in case the highest predicted probabilities are tied, the classifier will predict the tied class with the lowest index in classes_.
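The tie-breaking rule can be reproduced without NumPy, since Python's max() also keeps the first maximal element. The function predict_label is a hypothetical stand-in for the argmax step only.

```python
def predict_label(classes, probabilities):
    """Return the class with the highest probability.

    Like numpy.argmax, max() keeps the first maximal index, so ties go to
    the class with the lowest index.
    """
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best_index]


print(predict_label(["a", "b", "c"], [0.4, 0.4, 0.2]))  # tie between a and b
```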

References

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])
class bipartite_learn.tree.DecisionTreeRegressorSS(*, criterion='squared_error', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, ccp_alpha=0.0, supervision=0.5, ss_adapter=None, unsupervised_criterion='squared_error', update_supervision=None, preprocess_X_targets=None, _X_targets=None)#

Bases: BaseDecisionTreeSS, DecisionTreeRegressor

A decision tree regressor (semi-supervised version).

Read more in the User Guide.

Parameters:
  • criterion ({"squared_error", "friedman_mse", "absolute_error", "poisson"}, default="squared_error") –

    The function to measure the quality of a split. Supported criteria are “squared_error” for the mean squared error, which is equal to variance reduction as feature selection criterion and minimizes the L2 loss using the mean of each terminal node, “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits, “absolute_error” for the mean absolute error, which minimizes the L1 loss using the median of each terminal node, and “poisson” which uses reduction in Poisson deviance to find splits.

    New in version 0.18: Mean Absolute Error (MAE) criterion.

    New in version 0.24: Poisson deviance criterion.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.0 and will be removed in version 1.2. Use criterion=“squared_error”, which is equivalent.

    Deprecated since version 1.0: Criterion “mae” was deprecated in v1.0 and will be removed in version 1.2. Use criterion=“absolute_error”, which is equivalent.
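The pairing of each criterion with its leaf value (mean for “squared_error”, median for “absolute_error”) can be checked on a small node, as in this sketch. The function names are illustrative, and sample weights are ignored.

```python
import statistics


def squared_error_impurity(values):
    """Mean squared deviation from the node mean (the L2-optimal value)."""
    center = statistics.fmean(values)
    return statistics.fmean([(v - center) ** 2 for v in values])


def absolute_error_impurity(values):
    """Mean absolute deviation from the node median (the L1-optimal value)."""
    center = statistics.median(values)
    return statistics.fmean([abs(v - center) for v in values])


node_targets = [1, 2, 10]
print(squared_error_impurity(node_targets))
print(absolute_error_impurity(node_targets))
```

The outlier at 10 pulls the mean well above the median, which is why the absolute-error criterion is less sensitive to extreme targets.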

  • splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

    Changed in version 0.18: Added float values for fractions.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

    Changed in version 0.18: Added float values for fractions.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features (int, float or {"auto", "sqrt", "log2"}, default=None) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=n_features.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None or 1.0, then max_features=n_features.

    Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed in 1.3.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always randomly permuted at each split, even if splitter is set to "best". When max_features < n_features, the algorithm selects max_features features at random at each split before finding the best split among them. However, the best split found may vary across runs even if max_features=n_features. This happens when the improvement of the criterion is identical for several splits and one of them has to be selected at random. To obtain deterministic behaviour during fitting, random_state must be fixed to an integer. See Glossary for details.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    New in version 0.19.

  • ccp_alpha (non-negative float, default=0.0) –

    Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See Minimal Cost-Complexity Pruning in the User Guide for details.

    New in version 0.22.
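
As an illustration of the weighted impurity decrease equation given under min_impurity_decrease, the following minimal sketch evaluates it for a single candidate split. The function name weighted_impurity_decrease and all numbers are hypothetical; this is not the library's internal implementation.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """Evaluate the documented weighted impurity decrease for one split."""
    return n_t / n * (
        impurity
        - n_t_r / n_t * right_impurity
        - n_t_l / n_t * left_impurity
    )


# Made-up node statistics: 40 of 100 samples reach the node,
# and the candidate split sends 10 to the left and 30 to the right.
decrease = weighted_impurity_decrease(
    n=100, n_t=40, n_t_l=10, n_t_r=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)

# The node is split only if the decrease reaches the threshold.
min_impurity_decrease = 0.05
should_split = decrease >= min_impurity_decrease
print(decrease, should_split)
```

With sample weights, the four counts would be replaced by the corresponding weighted sums, as noted in the parameter description.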

feature_importances_#

The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)
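
The definition above can be made concrete with a small sketch: sum the criterion reduction contributed by every split on a feature, then normalize so that the importances sum to one. The (feature index, weighted impurity decrease) pairs are made-up values, and impurity_importances is a hypothetical helper, not part of bipartite_learn or scikit-learn.

```python
def impurity_importances(split_decreases, n_features):
    """Aggregate per-split criterion reductions into normalized importances."""
    totals = [0.0] * n_features
    for feature, decrease in split_decreases:
        totals[feature] += decrease
    grand_total = sum(totals)
    if grand_total == 0.0:
        # No split reduced the criterion: every importance stays at zero.
        return totals
    return [total / grand_total for total in totals]


# Hypothetical tree with three internal nodes: two split on feature 0,
# one on feature 1, and feature 2 is never used.
splits = [(0, 0.3), (1, 0.1), (0, 0.1)]
importances = impurity_importances(splits, n_features=3)
print(importances)
```

A feature that is never chosen for a split receives an importance of zero, regardless of how informative it might be in combination with others.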

max_features_#

The inferred value of max_features.

Type:

int

n_features_in_#

Number of features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and to the “Understanding the decision tree structure” example for basic usage of these attributes.

Type:

Tree instance

See also

DecisionTreeClassifier

A decision tree classifier.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> regressor = DecisionTreeRegressor(random_state=0)
>>> cross_val_score(regressor, X, y, cv=10)
array([-0.39..., -0.46...,  0.02...,  0.06..., -0.50...,
       0.16...,  0.11..., -0.73..., -0.30..., -0.00...])
class bipartite_learn.tree.ExtraTreeClassifierSS(*, criterion='gini', splitter='random', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight=None, ccp_alpha=0.0, supervision=0.5, ss_adapter=None, unsupervised_criterion='squared_error', update_supervision=None, preprocess_X_targets=None, _X_targets=None)#

Bases: DecisionTreeClassifierSS

An extremely randomized tree classifier (semi-supervised version).

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
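
The split search described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: thresholds are drawn uniformly between the minimum and maximum of each candidate feature, candidates are scored with the Gini impurity, and the names gini and random_split are hypothetical rather than part of the library.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def random_split(X, y, max_features, rng):
    """Draw one random threshold per candidate feature and keep the best one."""
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), max_features)
    best = None  # (weighted child impurity, feature index, threshold)
    for feature in candidates:
        column = [row[feature] for row in X]
        low, high = min(column), max(column)
        if low == high:
            continue  # constant feature: no valid partition
        threshold = rng.uniform(low, high)
        left = [label for value, label in zip(column, y) if value <= threshold]
        right = [label for value, label in zip(column, y) if value > threshold]
        if not left or not right:
            continue  # degenerate draw: one side is empty
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, feature, threshold)
    return best


# Toy node: feature 0 separates the classes, feature 1 is constant.
X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y = [0, 0, 1, 1]
result = random_split(X, y, max_features=2, rng=random.Random(0))
print(result)
```

With max_features=1 only a single randomly chosen feature and threshold are considered, so the labels no longer influence which split is taken.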

Warning: Extra-trees should only be used within ensemble methods.

Read more in the User Guide.

Parameters:
  • criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “log_loss” and “entropy” both for the Shannon information gain; see the Mathematical formulation section of the User Guide.

  • splitter ({"random", "best"}, default="random") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

    Changed in version 0.18: Added float values for fractions.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

    Changed in version 0.18: Added float values for fractions.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features (int, float, {"auto", "sqrt", "log2"} or None, default="sqrt") –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=sqrt(n_features).

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

    Changed in version 1.1: The default of max_features changed from “auto” to “sqrt”.

    Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed in 1.3.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

  • random_state (int, RandomState instance or None, default=None) – Used to randomly pick the max_features features considered at each split. See Glossary for details.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    New in version 0.19.

  • class_weight (dict, list of dict or "balanced", default=None) –

    Weights associated with classes in the form {class_label: weight}. If None, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

    Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].

    The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).

    For multi-output, the weights of each column of y will be multiplied.

    Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • ccp_alpha (non-negative float, default=0.0) –

    Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See Minimal Cost-Complexity Pruning in the User Guide for details.

    New in version 0.22.
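
The “balanced” formula given under class_weight, n_samples / (n_classes * np.bincount(y)), can be checked by hand with a short sketch. It uses only the standard library instead of NumPy, and balanced_class_weights is a hypothetical helper name introduced for illustration.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {label: n_samples / (n_classes * count of label)}."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Imbalanced toy target: three samples of class 0, one of class 1.
y = [0, 0, 0, 1]
weights = balanced_class_weights(y)
print(weights)
```

The rarer class receives the larger weight, so each class contributes the same total weight to the impurity computation.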

classes_#

The class labels (single output problem), or a list of arrays of class labels (multi-output problem).

Type:

ndarray of shape (n_classes,) or list of ndarray

max_features_#

The inferred value of max_features.

Type:

int

n_classes_#

The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).

Type:

int or list of int

feature_importances_#

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)

n_features_in_#

Number of features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and to the “Understanding the decision tree structure” example for basic usage of these attributes.

Type:

Tree instance

See also

ExtraTreeRegressor

An extremely randomized tree regressor.

sklearn.ensemble.ExtraTreesClassifier

An extra-trees classifier.

sklearn.ensemble.ExtraTreesRegressor

An extra-trees regressor.

sklearn.ensemble.RandomForestClassifier

A random forest classifier.

sklearn.ensemble.RandomForestRegressor

A random forest regressor.

sklearn.ensemble.RandomTreesEmbedding

An ensemble of totally random trees.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import BaggingClassifier
>>> from sklearn.tree import ExtraTreeClassifier
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...    X, y, random_state=0)
>>> extra_tree = ExtraTreeClassifier(random_state=0)
>>> cls = BaggingClassifier(extra_tree, random_state=0).fit(
...    X_train, y_train)
>>> cls.score(X_test, y_test)
0.8947...
class bipartite_learn.tree.ExtraTreeRegressorSS(*, criterion='squared_error', splitter='random', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=1.0, random_state=None, min_impurity_decrease=0.0, max_leaf_nodes=None, ccp_alpha=0.0, supervision=0.5, ss_adapter=None, unsupervised_criterion='squared_error', update_supervision=None, preprocess_X_targets=None, _X_targets=None)#

Bases: DecisionTreeRegressorSS

An extremely randomized tree regressor (semi-supervised version).

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.

Warning: Extra-trees should only be used within ensemble methods.

Read more in the User Guide.

Parameters:
  • criterion ({"squared_error", "friedman_mse"}, default="squared_error") –

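    The function to measure the quality of a split. Supported criteria are “squared_error” for the mean squared error, which is equivalent to variance reduction as a feature selection criterion; “friedman_mse”, which uses mean squared error with Friedman’s improvement score for potential splits; and “absolute_error” (formerly “mae”) for the mean absolute error.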

    New in version 0.18: Mean Absolute Error (MAE) criterion.

    New in version 0.24: Poisson deviance criterion.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.0 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

    Deprecated since version 1.0: Criterion “mae” was deprecated in v1.0 and will be removed in version 1.2. Use criterion="absolute_error", which is equivalent.

  • splitter ({"random", "best"}, default="random") – The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.

  • max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int or float, default=2) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.

    Changed in version 0.18: Added float values for fractions.

  • min_samples_leaf (int or float, default=1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

    Changed in version 0.18: Added float values for fractions.

  • min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_features (int, float, {"auto", "sqrt", "log2"} or None, default=1.0) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=n_features.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

    Changed in version 1.1: The default of max_features changed from “auto” to 1.0.

    Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed in 1.3.

    Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

  • random_state (int, RandomState instance or None, default=None) – Used to randomly pick the max_features features considered at each split. See Glossary for details.

  • min_impurity_decrease (float, default=0.0) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

    The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

    N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

    New in version 0.19.

  • max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined by their relative reduction in impurity. If None, the number of leaf nodes is unlimited.

  • ccp_alpha (non-negative float, default=0.0) –

    Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed. See Minimal Cost-Complexity Pruning in the User Guide for details.

    New in version 0.22.
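
The int-versus-float rules listed for min_samples_split, min_samples_leaf and max_features can be summarized in a short sketch. The helper names resolve_min_samples and resolve_max_features are hypothetical, the deprecated “auto” option is left out, and truncating the “sqrt” and “log2” results to an integer is an assumption made here for illustration.

```python
import math


def resolve_min_samples(value, n_samples):
    """An int is used as is; a float is a fraction of n_samples, rounded up."""
    if isinstance(value, int):
        return value
    return math.ceil(value * n_samples)


def resolve_max_features(value, n_features):
    """Translate a max_features setting into a number of features."""
    if value is None:
        return n_features
    if value == "sqrt":
        return int(math.sqrt(n_features))  # truncation is an assumption
    if value == "log2":
        return int(math.log2(n_features))  # truncation is an assumption
    if isinstance(value, int):
        return value
    return int(value * n_features)  # float: fraction of the features


print(resolve_min_samples(0.25, 10))   # fraction of 10 samples, rounded up
print(resolve_max_features(1.0, 10))   # float 1.0: all features
print(resolve_max_features(1, 10))     # int 1: a single feature
```

The last two calls highlight an easy mistake: the float 1.0 (the default here) means all features, whereas the integer 1 means a single feature per split, which yields a totally random tree.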

max_features_#

The inferred value of max_features.

Type:

int

n_features_in_#

Number of features seen during fit.

Type:

int

feature_names_in_#

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

feature_importances_#

Return impurity-based feature importances (the higher, the more important the feature).

Warning: impurity-based feature importances can be misleading for high cardinality features (many unique values). See sklearn.inspection.permutation_importance() as an alternative.

Type:

ndarray of shape (n_features,)

n_outputs_#

The number of outputs when fit is performed.

Type:

int

tree_#

The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and to the “Understanding the decision tree structure” example for basic usage of these attributes.

Type:

Tree instance

See also

ExtraTreeClassifier

An extremely randomized tree classifier.

sklearn.ensemble.ExtraTreesClassifier

An extra-trees classifier.

sklearn.ensemble.ExtraTreesRegressor

An extra-trees regressor.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

References

Examples

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.ensemble import BaggingRegressor
>>> from sklearn.tree import ExtraTreeRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=0)
>>> extra_tree = ExtraTreeRegressor(random_state=0)
>>> reg = BaggingRegressor(extra_tree, random_state=0).fit(
...     X_train, y_train)
>>> reg.score(X_test, y_test)
0.33...