Decision tree classifier tailored to bipartite input.
Implements optimized global single output (GSO) and multi-output (GMO)
trees for interaction prediction. The latter is proposed by [1] under the
name of Predictive Bi-Clustering Trees. The former implements an optimized
algorithm for growing GSO trees, which consider concatenated pairs of row
and column instances in a bipartite dataset as the actual instances.
GSO trees (bipartite_adapter=“gso”) will yield exactly the
same tree structure as if all possible combinations of row and column
instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but
in much shorter time.
See [1] and [2] and also the User Guide. (TODO)
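To make the GSO equivalence concrete, the following sketch builds the naive dataset that a GSO tree is described as being equivalent to: one sample per (row, column) pair, with the two feature vectors concatenated. The helper name make_global_single_output and the toy shapes are illustrative assumptions, not part of this library.

```python
import numpy as np

def make_global_single_output(X_rows, X_cols, Y):
    """Naively expand a bipartite dataset into one sample per (row, column) pair.

    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = Y.shape
    # Repeat each row instance n_cols times: r0, r0, ..., r1, r1, ...
    rows_part = np.repeat(X_rows, n_cols, axis=0)
    # Tile the column instances n_rows times: c0, c1, ..., c0, c1, ...
    cols_part = np.tile(X_cols, (n_rows, 1))
    X_global = np.hstack([rows_part, cols_part])
    # C-order flattening, so entry i * n_cols + j holds Y[i, j].
    y_global = Y.reshape(-1)
    return X_global, y_global

rng = np.random.default_rng(0)
X_rows = rng.random((3, 2))  # 3 row instances, 2 features each
X_cols = rng.random((4, 5))  # 4 column instances, 5 features each
Y = rng.random((3, 4))       # one label per (row, column) pair

X_global, y_global = make_global_single_output(X_rows, X_cols, Y)
print(X_global.shape, y_global.shape)  # (12, 7) (12,)
```

The expanded arrays have n_rows * n_cols samples, which is why fitting an ordinary tree on them becomes expensive for large interaction matrices.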
Parameters:
criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are
“gini” for the Gini impurity and “log_loss” and “entropy” both for the
Shannon information gain, see tree_mathematical_formulation.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node.
Analogous to min_samples_split.
min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node.
Analogous to min_samples_split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all
the input rows) required to be at a leaf node. Rows have
equal weight when sample_weight is not provided.
min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of
all the input columns) required to be at a leaf node. Columns have
equal weight when sample_weight is not provided.
max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_row_features features at each row split.
If float, then max_row_features is a fraction and
int(max_row_features * n_row_features) features are considered at
each split.
If “auto”, then max_row_features=n_row_features.
If “sqrt”, then max_row_features=sqrt(n_row_features).
If “log2”, then max_row_features=log2(n_row_features).
If None or 1.0, then max_row_features=n_row_features.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than max_row_features row features.
max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.
random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always
randomly permuted at each split, even if splitter is set to
"best". When max_(row/col)_features<n_(row/col)_features,
the algorithm will select max_(row/col)_features at random at each
split before finding the best split among them. But the best found
split may vary across different runs, even if
max_(row/col)_features=n_(row/col)_features. That is the case, if
the improvement of the criterion is identical for several splits and
one split has to be selected at random. To obtain a deterministic
behaviour during fitting, random_state has to be fixed to an
integer. See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
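The quantity compared against min_impurity_decrease can be computed directly from the symbols defined above. The sketch below assumes the formula documented for scikit-learn's decision trees; the function name and the example numbers are illustrative.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Impurity decrease of a candidate split, weighted by the node's share of samples.

    Assumes the formula from scikit-learn's decision tree documentation.
    """
    return N_t / N * (
        impurity
        - N_t_R / N_t * right_impurity
        - N_t_L / N_t * left_impurity
    )

# A node holding 40 of 100 samples, split into children of 10 and 30 samples.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
print(round(decrease, 6))  # 0.08
```

With min_impurity_decrease=0.1 this split would be rejected, since 0.08 is below the threshold.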
ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
bipartite_adapter ({"gso", "gmo", "gmosa"}) – Which strategy to employ when searching for the best split. The global
single-output strategy (“gso”) is equivalent to growing a tree on all
combinations of row samples with column samples and their corresponding
label: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However,
an optimized algorithm allows for these trees to exploit the bipartite
dataset directly and grow much faster than a naive implementation of
this idea.
The splitting procedure of the global multi-output strategy (“gmo”),
unlike the GSO adaptation, treats each sample on the other
axis as a different output: when splitting on rows, each column is an
output; when splitting on columns, each row is a different output.
In other words, while GMO calculates impurity as a deviance metric
(call it d) relative to each output’s mean (d(y[i, j], y.mean(0)).mean(0)),
the GSO adaptation uses deviance relative to the total average label
(d(y[i, j], y.mean()).mean()).
The GMO strategy with single label average (“gmosa”) works exactly the
same as explained for GMO, but assumes the tree’s output value will be
y_leaf.mean(). Using “gmo” allows for combining each column or row of
the leaf’s partition in multiple ways (see the prediction_weights
parameter), but results in much larger memory usage, since all outputs
must be stored. Using “gmosa” solves this memory issue by storing only
the leaf total average in each node, at the cost of losing the ability
to specify prediction weights.
prediction_weights ({"raw", "uniform", "precomputed", "square", "softmax"}, 1D-array or callable) – Determines how to compute the final predicted value. Initially, all
predictions for each row and column instance from the training set that
share the leaf node with the predicting sample are obtained.
“raw” instructs the estimator to return this vector, with a value for each
training row and training column, and np.nan for instances not in the same
leaf.
“uniform” returns the mean value of the leaf for new instances but,
for instances present in the training set, it uses the corresponding
row or column mean of the leaf. Known instances are recognized by
a similarity value of 1. This is the main approach presented by [1],
named global multi-output (GMO).
Other options return the weighted average of the leaf values:
A 1D-array may be provided to specify training sample weights
explicitly, with weights for training row samples followed by weights
for training column samples (length==`sum(y_train.shape)`).
“precomputed” instructs the estimator to consider x values as
similarities to each row and column sample in the training set (row
similarities followed by column similarities), using them as
weights to average the leaf outputs.
A callable, if provided, takes all the X being predicted and must
return an array of weights for each predicting sample with the same
shape as the X array given to predict().
“square” is equivalent to prediction_weights=np.square.
“softmax” is equivalent to prediction_weights=np.exp.
The feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the
(normalized) total reduction of the criterion brought
by that feature. It is also known as the Gini importance [4].
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
Build a decision tree classifier from the training set (X, y).
Parameters:
X – The training input samples. Internally, it will be converted to
dtype=np.float32 and, if a sparse matrix is provided,
to a sparse csc_matrix.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The target values (class labels) as integers or strings.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Splits
that would create child nodes with net zero or negative weight are
ignored while searching for a split in each node. Splits are also
ignored if they would result in any single class carrying a
negative weight in either child node.
check_input (bool, default=True) – Allows bypassing several input checks.
Don’t use this parameter unless you know what you’re doing.
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
Parameters:
X – The input samples. Internally, it will be converted to
dtype=np.float32 and, if a sparse matrix is provided,
to a sparse csr_matrix.
check_input (bool, default=True) – Allows bypassing several input checks.
Don’t use this parameter unless you know what you’re doing.
Returns:
proba – The class probabilities of the input samples. The order of the
classes corresponds to that in the attribute classes_.
Return type:
ndarray of shape (n_samples, n_classes) or list of n_outputs such arrays if n_outputs > 1
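As a small illustration of "the fraction of samples of the same class in a leaf", the sketch below computes class probabilities from the integer labels of the training samples that fell into one leaf. The function name and the toy labels are assumptions made for this example; it does not reproduce the estimator's internal code.

```python
import numpy as np

def leaf_class_probabilities(leaf_labels, n_classes):
    """Fraction of training samples of each class within a single leaf."""
    counts = np.bincount(leaf_labels, minlength=n_classes)
    return counts / counts.sum()

# Four training samples ended up in this leaf: one of class 0, three of class 1.
leaf_labels = np.array([0, 1, 1, 1])
proba = leaf_class_probabilities(leaf_labels, n_classes=2)
print(proba)  # [0.25 0.75]
```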
Decision tree regressor tailored to bipartite input.
Implements optimized global single output (GSO) and multi-output (GMO)
trees for interaction prediction. The latter is proposed by [1] under the
name of Predictive Bi-Clustering Trees. The former implements an optimized
algorithm for growing GSO trees, which consider concatenated pairs of row
and column instances in a bipartite dataset as the actual instances.
GSO trees (bipartite_adapter=“gso”) will yield exactly the
same tree structure as if all possible combinations of row and column
instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but
in much shorter time.
See [1] and [2] and also the User Guide. (TODO)
Parameters:
criterion ({"squared_error", "friedman_mse"}, default="squared_error") – The function to measure the quality of a split. Supported criteria
are “squared_error” for the mean squared error, which is equal to
variance reduction as feature selection criterion and minimizes the L2
loss using the mean of each terminal node and “friedman_mse”, which uses
mean squared error with Friedman’s improvement score for potential
splits.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node.
Analogous to min_samples_split.
min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node.
Analogous to min_samples_split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all
the input rows) required to be at a leaf node. Rows have
equal weight when sample_weight is not provided.
min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of
all the input columns) required to be at a leaf node. Columns have
equal weight when sample_weight is not provided.
max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_row_features features at each row split.
If float, then max_row_features is a fraction and
int(max_row_features * n_row_features) features are considered at
each split.
If “auto”, then max_row_features=n_row_features.
If “sqrt”, then max_row_features=sqrt(n_row_features).
If “log2”, then max_row_features=log2(n_row_features).
If None or 1.0, then max_row_features=n_row_features.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than max_row_features row features.
max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.
random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always
randomly permuted at each split, even if splitter is set to
"best". When max_(row/col)_features<n_(row/col)_features,
the algorithm will select max_(row/col)_features at random at each
split before finding the best split among them. But the best found
split may vary across different runs, even if
max_(row/col)_features=n_(row/col)_features. That is the case, if
the improvement of the criterion is identical for several splits and
one split has to be selected at random. To obtain a deterministic
behaviour during fitting, random_state has to be fixed to an
integer. See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
bipartite_adapter ({"gso", "gmo", "gmosa"}) – Which strategy to employ when searching for the best split. The global
single-output strategy (“gso”) is equivalent to growing a tree on all
combinations of row samples with column samples and their corresponding
label: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However,
an optimized algorithm allows for these trees to exploit the bipartite
dataset directly and grow much faster than a naive implementation of
this idea.
The splitting procedure of the global multi-output strategy (“gmo”),
unlike the GSO adaptation, treats each sample on the other
axis as a different output: when splitting on rows, each column is an
output; when splitting on columns, each row is a different output.
In other words, while GMO calculates impurity as a deviance metric
(call it d) relative to each output’s mean (d(y[i, j], y.mean(0)).mean(0)),
the GSO adaptation uses deviance relative to the total average label
(d(y[i, j], y.mean()).mean()).
The GMO strategy with single label average (“gmosa”) works exactly the
same as explained for GMO, but assumes the tree’s output value will be
y_leaf.mean(). Using “gmo” allows for combining each column or row of
the leaf’s partition in multiple ways (see the prediction_weights
parameter), but results in much larger memory usage, since all outputs
must be stored. Using “gmosa” solves this memory issue by storing only
the leaf total average in each node, at the cost of losing the ability
to specify prediction weights.
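The difference between the two impurity definitions above can be seen on a tiny label matrix. The sketch assumes that the deviance d is the squared error and that the node is being split on rows, so each column is one output; both choices are assumptions made for illustration.

```python
import numpy as np

def gmo_impurity(y):
    """Squared deviation from each output's (column's) own mean, one value per output."""
    return ((y - y.mean(0)) ** 2).mean(0)

def gso_impurity(y):
    """Squared deviation from the single overall mean of the label matrix."""
    return ((y - y.mean()) ** 2).mean()

# Every column is constant, but the two columns differ from each other.
y = np.array([[0.0, 1.0],
              [0.0, 1.0],
              [0.0, 1.0]])

print(gmo_impurity(y))  # [0. 0.]
print(gso_impurity(y))  # 0.25
```

Under the multi-output view this node is already pure, because each output is constant; under the single-output view it is not, because the labels deviate from the overall mean of 0.5.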
prediction_weights ({"raw", "uniform", "precomputed", "square", "softmax"}, 1D-array or callable) – Determines how to compute the final predicted value. Initially, all
predictions for each row and column instance from the training set that
share the leaf node with the predicting sample are obtained.
“raw” instructs the estimator to return this vector, with a value for each
training row and training column, and np.nan for instances not in the same
leaf.
“uniform” returns the mean value of the leaf for new instances but,
for instances present in the training set, it uses the corresponding
row or column mean of the leaf. Known instances are recognized by
a similarity value of 1. This is the main approach presented by [1],
named global multi-output (GMO).
Other options return the weighted average of the leaf values:
A 1D-array may be provided to specify training sample weights
explicitly, with weights for training row samples followed by weights
for training column samples (length==`sum(y_train.shape)`).
“precomputed” instructs the estimator to consider x values as
similarities to each row and column sample in the training set (row
similarities followed by column similarities), using them as
weights to average the leaf outputs.
A callable, if provided, takes all the X being predicted and must
return an array of weights for each predicting sample with the same
shape as the X array given to predict().
“square” is equivalent to prediction_weights=np.square.
“softmax” is equivalent to prediction_weights=np.exp.
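The weighted options can be pictured as follows. This is only a rough sketch under stated assumptions: the "raw" vector is taken to hold one value per training instance with np.nan outside the leaf, the x values are taken to be similarities to those instances, and the function weighted_leaf_prediction is hypothetical rather than the estimator's actual prediction routine.

```python
import numpy as np

def weighted_leaf_prediction(leaf_values, similarities, weight_fn=None):
    """Average the non-NaN leaf values, weighted by (transformed) similarities.

    Hypothetical helper: weight_fn=None mimics "precomputed", np.square
    mimics "square", and np.exp mimics "softmax".
    """
    in_leaf = ~np.isnan(leaf_values)
    weights = similarities if weight_fn is None else weight_fn(similarities)
    # Instances outside the leaf contribute nothing to the average.
    weights = np.where(in_leaf, weights, 0.0)
    values = np.where(in_leaf, leaf_values, 0.0)
    return float((weights * values).sum() / weights.sum())

leaf_values = np.array([1.0, np.nan, 0.0, 1.0])  # NaN: instance not in the leaf
similarities = np.array([0.5, 0.9, 1.0, 0.5])

precomputed = weighted_leaf_prediction(leaf_values, similarities)
squared = weighted_leaf_prediction(leaf_values, similarities, np.square)
softmax = weighted_leaf_prediction(leaf_values, similarities, np.exp)
print(precomputed)  # 0.5
```

Squaring the similarities shifts the average toward the most similar training instance (here the one with value 0.0), while exponentiating flattens the differences between weights.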
The feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the
(normalized) total reduction of the criterion brought
by that feature. It is also known as the Gini importance [4].
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
Build a decision tree regressor from the training set (X, y).
Parameters:
X (list-like of {array-like, sparse matrix} of shapes (n_axis_samples, n_axis_features)) –
The training input samples for each axis. Internally, it will be
converted to dtype=np.float32 and, if a sparse matrix is provided,
to a sparse csc_matrix.
y (array-like of shape (n_row_samples, n_col_samples)) – The target values (real numbers). Use dtype=np.float64 and
order='C' for maximum efficiency.
sample_weight (array-like of shape (n_row_samples + n_col_samples,), default=None) –
Sample weights. If None, then samples are equally weighted. Splits
that would create child nodes with net zero or negative weight are
ignored while searching for a split in each node.
Row sample weights and column sample weights must be provided in
one concatenated array.
check_input (bool, default=True) – Allows bypassing several input checks.
Don’t use this parameter unless you know what you’re doing.
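The expected layout of the inputs to fit can be sketched with plain NumPy arrays. The example assumes that row weights come first in the concatenated sample_weight array, which matches the (n_row_samples + n_col_samples,) shape notation but is not stated explicitly above; all variable names and sizes are illustrative.

```python
import numpy as np

n_row_samples, n_col_samples = 3, 4
rng = np.random.default_rng(0)

# One feature matrix per axis, collected in a list.
X = [
    rng.random((n_row_samples, 2)),  # row instances, 2 features each
    rng.random((n_col_samples, 5)),  # column instances, 5 features each
]
# One target per (row, column) pair, C-ordered float64 as recommended.
y = np.ascontiguousarray(rng.random((n_row_samples, n_col_samples)), dtype=np.float64)

# Row weights followed by column weights (ordering is an assumption).
row_weights = np.array([1.0, 2.0, 1.0])
col_weights = np.ones(n_col_samples)
sample_weight = np.concatenate([row_weights, col_weights])

print(sample_weight.shape)  # (7,)
```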
Extremely randomized tree classifier tailored to bipartite input.
Implements optimized global single output (GSO) and multi-output (GMO)
trees for interaction prediction. The latter is proposed by [1] under the
name of Predictive Bi-Clustering Trees. The former implements an optimized
algorithm for growing GSO trees, which consider concatenated pairs of row
and column instances in a bipartite dataset as the actual instances.
GSO trees (bipartite_adapter=“gso”) will yield exactly the
same tree structure as if all possible combinations of row and column
instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but
in much shorter time.
See [1] and [2] and also the User Guide. (TODO)
Parameters:
criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are
“gini” for the Gini impurity and “log_loss” and “entropy” both for the
Shannon information gain, see tree_mathematical_formulation.
splitter ({"best", "random"}, default="random") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node.
Analogous to min_samples_split.
min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node.
Analogous to min_samples_split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all
the input rows) required to be at a leaf node. Rows have
equal weight when sample_weight is not provided.
min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of
all the input columns) required to be at a leaf node. Columns have
equal weight when sample_weight is not provided.
max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_row_features features at each row split.
If float, then max_row_features is a fraction and
int(max_row_features * n_row_features) features are considered at
each split.
If “auto”, then max_row_features=n_row_features.
If “sqrt”, then max_row_features=sqrt(n_row_features).
If “log2”, then max_row_features=log2(n_row_features).
If None or 1.0, then max_row_features=n_row_features.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than max_row_features row features.
max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.
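The rules for max_row_features and max_col_features can be summarized as one resolution function. This sketch follows the cases listed above; truncating "sqrt" and "log2" to an integer and never returning fewer than one feature are assumptions borrowed from scikit-learn's behavior, and the function name is hypothetical.

```python
import math

def resolve_max_features(max_features, n_features):
    """Turn a max_row_features / max_col_features setting into a feature count."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):
        # Fraction of the available features on this axis.
        return max(1, int(max_features * n_features))
    return int(max_features)

print(resolve_max_features("sqrt", 100))  # 10
print(resolve_max_features(0.25, 100))    # 25
```

The function would be applied once per axis, with n_features set to n_row_features or n_col_features respectively.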
random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always
randomly permuted at each split, even if splitter is set to
"best". When max_(row/col)_features<n_(row/col)_features,
the algorithm will select max_(row/col)_features at random at each
split before finding the best split among them. But the best found
split may vary across different runs, even if
max_(row/col)_features=n_(row/col)_features. That is the case, if
the improvement of the criterion is identical for several splits and
one split has to be selected at random. To obtain a deterministic
behaviour during fitting, random_state has to be fixed to an
integer. See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
bipartite_adapter ({"gso", "gmo", "gmosa"}) – Which strategy to employ when searching for the best split. The global
single-output strategy (“gso”) is equivalent to growing a tree on all
combinations of row samples with column samples and their corresponding
label: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However,
an optimized algorithm allows for these trees to exploit the bipartite
dataset directly and grow much faster than a naive implementation of
this idea.
The splitting procedure of the global multi-output strategy (“gmo”),
unlike the GSO adaptation, treats each sample on the other
axis as a different output: when splitting on rows, each column is an
output; when splitting on columns, each row is a different output.
In other words, while GMO calculates impurity as a deviance metric
(call it d) relative to each output’s mean (d(y[i, j], y.mean(0)).mean(0)),
the GSO adaptation uses deviance relative to the total average label
(d(y[i, j], y.mean()).mean()).
The GMO strategy with single label average (“gmosa”) works exactly the
same as explained for GMO, but assumes the tree’s output value will be
y_leaf.mean(). Using “gmo” allows for combining each column or row of
the leaf’s partition in multiple ways (see the prediction_weights
parameter), but results in much larger memory usage, since all outputs
must be stored. Using “gmosa” solves this memory issue by storing only
the leaf total average in each node, at the cost of losing the ability
to specify prediction weights.
prediction_weights ({"raw", "uniform", "precomputed", "square", "softmax"}, 1D-array or callable) – Determines how to compute the final predicted value. Initially, all
predictions for each row and column instance from the training set that
share the leaf node with the predicting sample are obtained.
“raw” instructs the estimator to return this vector, with a value for each
training row and training column, and np.nan for instances not in the same
leaf.
“uniform” returns the mean value of the leaf for new instances but,
for instances present in the training set, it uses the corresponding
row or column mean of the leaf. Known instances are recognized by
a similarity value of 1. This is the main approach presented by [1],
named global multi-output (GMO).
Other options return the weighted average of the leaf values:
A 1D-array may be provided to specify training sample weights
explicitly, with weights for training row samples followed by weights
for training column samples (length==`sum(y_train.shape)`).
“precomputed” instructs the estimator to consider x values as
similarities to each row and column sample in the training set (row
similarities followed by column similarities), using them as
weights to average the leaf outputs.
A callable, if provided, takes all the X being predicted and must
return an array of weights for each predicting sample with the same
shape as the X array given to predict().
“square” is equivalent to prediction_weights=np.square.
“softmax” is equivalent to prediction_weights=np.exp.
The feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the
(normalized) total reduction of the criterion brought
by that feature. It is also known as the Gini importance [4].
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
Extremely randomized trees tailored to bipartite input.
Implements optimized global single output (GSO) and multi-output (GMO)
trees for interaction prediction. The latter is proposed by [1] under the
name of Predictive Bi-Clustering Trees. The former implements an optimized
algorithm for growing GSO trees, which consider concatenated pairs of row
and column instances in a bipartite dataset as the actual instances.
GSO trees (bipartite_adapter=“gso”) will yield exactly the
same tree structure as if all possible combinations of row and column
instances were provided to a usual sklearn.tree.DecisionTreeRegressor, but
in much shorter time.
See [1] and [2] and also the User Guide. (TODO)
Parameters:
criterion ({"squared_error", "friedman_mse"}, default="squared_error") – The function to measure the quality of a split. Supported criteria
are “squared_error” for the mean squared error, which is equal to
variance reduction as feature selection criterion and minimizes the L2
loss using the mean of each terminal node and “friedman_mse”, which uses
mean squared error with Friedman’s improvement score for potential
splits.
splitter ({"best", "random"}, default="random") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
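The int/float rule above can be sketched as a small helper. The function name `resolve_min_samples` is hypothetical and only mirrors the rule as stated in the text; the estimator's internal validation may differ.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn an int or float setting into an absolute sample count."""
    if isinstance(value, int):
        # Integers are taken as the minimum number directly.
        return value
    # Floats are fractions of the number of samples, rounded up.
    return math.ceil(value * n_samples)


absolute = resolve_min_samples(2, 10)
fractional = resolve_min_samples(0.25, 10)  # ceil(2.5)
```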
min_rows_split (int or float, default=1) – The minimum number of row samples required to split an internal node.
Analogous to min_samples_split.
min_cols_split (int or float, default=1) – The minimum number of column samples required to split an internal node.
Analogous to min_samples_split.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
min_rows_leaf (int or float, default=1) – The minimum number of row samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_cols_leaf (int or float, default=1) – The minimum number of column samples required to be at a leaf node.
Analogous to min_samples_leaf.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
min_row_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of row weights (of all
the input rows) required to be at a leaf node. Rows have
equal weight when sample_weight is not provided.
min_col_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of column weights (of
all the input columns) required to be at a leaf node. Columns have
equal weight when sample_weight is not provided.
max_row_features (int, float or {"auto", "sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_row_features features at each row split.
If float, then max_row_features is a fraction and
int(max_row_features * n_row_features) features are considered at
each split.
If “auto”, then max_row_features=n_row_features.
If “sqrt”, then max_row_features=sqrt(n_row_features).
If “log2”, then max_row_features=log2(n_row_features).
If None or 1.0, then max_row_features=n_row_features.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if this requires
effectively inspecting more than max_row_features row features.
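The option table above can be read as the following sketch, which resolves a max_row_features (or max_col_features) setting to a feature count. The helper name `resolve_max_features` is hypothetical, and the floor of one feature applied to the "sqrt", "log2" and float cases is an assumption borrowed from how scikit-learn treats these options, not something stated in the text.

```python
import math


def resolve_max_features(max_features, n_features):
    """Resolve a max_*_features setting to a number of features (sketch)."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # Remaining case: a float fraction of the available features.
    return max(1, int(max_features * n_features))


resolved = {
    option: resolve_max_features(option, 16)
    for option in (None, "auto", "sqrt", "log2", 3, 0.5, 1.0)
}
```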
max_col_features (int, float or {"auto", "sqrt", "log2"}, default=None) – Analogous to max_row_features, but for column features.
random_state (int, RandomState instance or None, default=None) – Used to randomly pick the max_row_features row features and the
max_col_features column features considered at each split. See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
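As a worked example, the sketch below evaluates the weighted impurity decrease in the form used by scikit-learn, N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity). The function name and the sample counts are illustrative assumptions.

```python
def weighted_impurity_decrease(
    N, N_t, N_t_L, N_t_R, impurity, left_impurity, right_impurity
):
    """Impurity decrease of a split, weighted by the node's share of samples."""
    return N_t / N * (
        impurity
        - N_t_R / N_t * right_impurity
        - N_t_L / N_t * left_impurity
    )


# A node holding 40 of 100 samples, split into 10 (pure) and 30 samples.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
# The split is accepted only if the decrease reaches min_impurity_decrease.
accepted = decrease >= 0.05
```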
ccp_alpha (non-negative float, default=0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
Which strategy to employ when searching for the best split. The global
single-output strategy (“gso”) is equivalent to growing a tree on all
combinations of row samples with column samples and their corresponding
labels: x = [*X[0][i], *X[1][j]], y = Y[i, j] for all i and j. However,
an optimized algorithm allows for these trees to exploit the bipartite
dataset directly and grow much faster than a naive implementation of
this idea.
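The naive construction that the GSO strategy is described as being equivalent to can be sketched with NumPy. The helper name `melt_bipartite` and the toy arrays are assumptions for illustration; this is the slow baseline, not the optimized algorithm.

```python
import numpy as np


def melt_bipartite(X_rows, X_cols, Y):
    """Build x = [*X_rows[i], *X_cols[j]], y = Y[i, j] for all i and j."""
    n_rows, n_cols = Y.shape
    # Repeat each row instance once per column, and cycle through the columns.
    left = np.repeat(X_rows, n_cols, axis=0)
    right = np.tile(X_cols, (n_rows, 1))
    X_pairs = np.hstack([left, right])
    y_pairs = Y.reshape(-1)  # pair (i, j) lands at index i * n_cols + j
    return X_pairs, y_pairs


X_rows = np.array([[0.0, 1.0], [2.0, 3.0]])   # 2 row instances, 2 features
X_cols = np.array([[10.0], [20.0], [30.0]])   # 3 column instances, 1 feature
Y = np.array([[1, 0, 1], [0, 0, 1]])          # interaction matrix
X_pairs, y_pairs = melt_bipartite(X_rows, X_cols, Y)
```

Fitting a standard regression tree on X_pairs and y_pairs is the baseline whose structure the GSO trees are said to reproduce. The pair matrix has n_rows * n_cols rows, which is what makes the naive route expensive.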
The splitting procedure of the global multi-output strategy (“gmo”),
unlike the GSO adaptation, treats each sample on the other
axis as a different output: when splitting on rows, each column is an
output; when splitting on columns, each row is a different output.
In other words, while GMO calculates impurity as a deviance metric
(call it d) relative to each output’s mean (d(y[i, j], y.mean(0)).mean(0)),
the GSO adaptation uses deviance relative to the total average label
(d(y[i, j], y.mean()).mean()).
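The difference between the two impurity definitions can be illustrated numerically. The sketch assumes squared difference as the deviance d and, for GMO, averages the per-output impurities into a single number; both choices are assumptions made for illustration.

```python
import numpy as np


def gmo_impurity(y):
    """Deviance of each label from its own output (column) mean."""
    per_output = ((y - y.mean(axis=0)) ** 2).mean(axis=0)
    return float(per_output.mean())  # averaged across outputs (assumption)


def gso_impurity(y):
    """Deviance of each label from the overall mean label."""
    return float(((y - y.mean()) ** 2).mean())


# Two outputs that are each constant but differ from one another.
y = np.array([[0.0, 1.0], [0.0, 1.0]])
gmo = gmo_impurity(y)
gso = gso_impurity(y)
```

In this toy partition GMO sees a pure node, because each output is constant, while GSO still measures spread around the global mean of 0.5.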
The GMO strategy with single label average (“gmosa”) works exactly the
same as explained for GMO, but assumes the tree’s output value will be
y_leaf.mean(). Using “gmo” allows for combining each column or row of
the leaf’s partition in multiple ways (see the prediction_weights
parameter), but results in much larger memory usage, since all outputs
must be stored. Using “gmosa” solves this memory issue by storing only
the leaf total average in each node, at the cost of losing the ability
to specify prediction weights.
Determines how to compute the final predicted value. Initially, all
predictions for each row and column instance from the training set that
share the leaf node with the predicting sample are obtained.
“raw” instructs to return this vector, with a value for each training
row and training column, and np.nan for instances not in the same
leaf.
“uniform” returns the mean value of the leaf for new instances but,
for instances present in the training set, it uses the leaf’s row or
column mean corresponding to it. Known instances are recognized by
a similarity value of 1. This is the main approach presented by [1],
named global multi-output (GMO).
Other options return the weighted average of the leaf values:
A 1D-array may be provided to specify training sample weights
explicitly, with weights for training row samples followed by weights
for training column samples (length==`sum(y_train.shape)`).
“precomputed” instructs the estimator to consider x values as
similarities to each row and column sample in the training set (row
similarities followed by column similarities), using them as
weights to average the leaf outputs.
A callable, if provided, takes all the X being predicted and must
return an array of weights for each predicting sample with the same
shape as the X array given to predict().
“square” is equivalent to prediction_weights=np.square.
“softmax” is equivalent to prediction_weights=np.exp.
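The two named callables can be compared on a toy case. The sketch assumes that the values being weighted are similarities to training instances, as in the “precomputed” setting; the arrays and the helper `weighted_prediction` are hypothetical.

```python
import numpy as np

# Similarity of one predicting sample to three training instances in its leaf,
# and the leaf outputs associated with those instances (made-up numbers).
similarities = np.array([0.1, 0.5, 0.9])
leaf_values = np.array([0.0, 1.0, 1.0])


def weighted_prediction(values, sims, weight_func):
    """Average `values` with weights computed elementwise from `sims`."""
    weights = weight_func(sims)
    return float(np.average(values, weights=weights))


pred_square = weighted_prediction(leaf_values, similarities, np.square)
pred_exp = weighted_prediction(leaf_values, similarities, np.exp)
```

Because np.average normalizes the weights, using np.exp amounts to a softmax over the similarities. Squaring drives the weight of low-similarity instances toward zero faster than exponentiation, so the two options can give visibly different predictions.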
The feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the
(normalized) total reduction of the criterion brought
by that feature. It is also known as the Gini importance [4].
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
A decision tree classifier (semi-supervised version).
Read more in the User Guide.
Parameters:
criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are
“gini” for the Gini impurity and “log_loss” and “entropy” both for the
Shannon information gain, see tree_mathematical_formulation.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
Changed in version 0.18: Added float values for fractions.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
Changed in version 0.18: Added float values for fractions.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
max_features (int, float or {"auto", "sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and
int(max_features * n_features) features are considered at each
split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None or 1.0, then max_features=n_features.
Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed
in 1.3.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if this requires
effectively inspecting more than max_features features.
random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always
randomly permuted at each split, even if splitter is set to
"best". When max_features<n_features, the algorithm will
select max_features at random at each split before finding the best
split among them. But the best found split may vary across different
runs, even if max_features=n_features. That is the case, if the
improvement of the criterion is identical for several splits and one
split has to be selected at random. To obtain a deterministic behaviour
during fitting, random_state has to be fixed to an integer.
See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
New in version 0.19.
class_weight (dict, list of dict or "balanced", default=None) –
Weights associated with classes in the form {class_label:weight}.
If None, all classes are supposed to have weight one. For
multi-output problems, a list of dicts can be provided in the same
order as the columns of y.
Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be
[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
[{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as n_samples / (n_classes * np.bincount(y)).
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
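The “balanced” formula quoted above can be checked by hand with NumPy on a small, made-up label vector:

```python
import numpy as np

y = np.array([0, 0, 0, 0, 1, 1])  # class 0 is twice as frequent as class 1

n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]

# One weight per class, inversely proportional to its frequency.
balanced_weights = n_samples / (n_classes * np.bincount(y))
```

The rarer class receives the larger weight, and the weights sum to n_samples when each is multiplied by its class count.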
ccp_alpha (non-negative float, default=0.0) –
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
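The effect of ccp_alpha can be explored with scikit-learn's plain DecisionTreeClassifier, used here as a stand-in on the assumption that the semi-supervised version prunes the same way. The sketch computes the candidate alphas for a dataset and refits one tree per alpha.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Effective alphas at which the fully grown tree loses a subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

node_counts = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding noise
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    node_counts.append(tree.tree_.node_count)
```

Larger values of ccp_alpha prune more aggressively, so node_counts does not increase along the path.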
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance [4].
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
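To compare the two kinds of importance side by side, the sketch below again uses scikit-learn's plain DecisionTreeClassifier as a stand-in (an assumption, since the semi-supervised estimator is not exercised here) on synthetic data where only the first feature carries signal.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # only feature 0 is informative

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Normalized total reduction of the criterion per feature.
impurity_based = clf.feature_importances_

# Drop in score when each feature is shuffled, averaged over repeats.
perm = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
permutation_based = perm.importances_mean
```

On real data, computing the permutation importance on a held-out set is usually the more informative choice.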
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
Type:
Tree instance
See also
DecisionTreeRegressor
A decision tree regressor.
Notes
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The predict() method operates using the numpy.argmax()
function on the outputs of predict_proba(). This means that in
case the highest predicted probabilities are tied, the classifier will
predict the tied class with the lowest index in classes_.
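The tie-breaking rule follows directly from how numpy.argmax behaves, as this small sketch with made-up class names and probabilities shows:

```python
import numpy as np

classes_ = np.array(["cat", "dog", "owl"])
proba = np.array([
    [0.4, 0.4, 0.2],  # tie between "cat" and "dog"
    [0.1, 0.3, 0.6],
])

# argmax returns the first index among tied maxima.
predicted = classes_[np.argmax(proba, axis=1)]
```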
The function to measure the quality of a split. Supported criteria
are “squared_error” for the mean squared error, which is equal to
variance reduction as feature selection criterion and minimizes the L2
loss using the mean of each terminal node, “friedman_mse”, which uses
mean squared error with Friedman’s improvement score for potential
splits, “absolute_error” for the mean absolute error, which minimizes
the L1 loss using the median of each terminal node, and “poisson” which
uses reduction in Poisson deviance to find splits.
New in version 0.18: Mean Absolute Error (MAE) criterion.
New in version 0.24: Poisson deviance criterion.
Deprecated since version 1.0: Criterion “mse” was deprecated in v1.0 and will be removed in
version 1.2. Use criterion="squared_error", which is equivalent.
Deprecated since version 1.0: Criterion “mae” was deprecated in v1.0 and will be removed in
version 1.2. Use criterion="absolute_error", which is equivalent.
splitter ({"best", "random"}, default="best") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
Changed in version 0.18: Added float values for fractions.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
Changed in version 0.18: Added float values for fractions.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
max_features (int, float or {"auto", "sqrt", "log2"}, default=None) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and
int(max_features * n_features) features are considered at each
split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None or 1.0, then max_features=n_features.
Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed
in 1.3.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if this requires
effectively inspecting more than max_features features.
random_state (int, RandomState instance or None, default=None) – Controls the randomness of the estimator. The features are always
randomly permuted at each split, even if splitter is set to
"best". When max_features<n_features, the algorithm will
select max_features at random at each split before finding the best
split among them. But the best found split may vary across different
runs, even if max_features=n_features. That is the case, if the
improvement of the criterion is identical for several splits and one
split has to be selected at random. To obtain a deterministic behaviour
during fitting, random_state has to be fixed to an integer.
See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
New in version 0.19.
ccp_alpha (non-negative float, default=0.0) –
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
The feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the
(normalized) total reduction of the criterion brought
by that feature. It is also known as the Gini importance [4].
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
Type:
Tree instance
See also
DecisionTreeClassifier
A decision tree classifier.
Notes
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
An extremely randomized tree classifier (semi-supervised version).
Extra-trees differ from classic decision trees in the way they are built.
When looking for the best split to separate the samples of a node into two
groups, random splits are drawn for each of the max_features randomly
selected features and the best split among those is chosen. When
max_features is set to 1, this amounts to building a totally random
decision tree.
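The split search described above can be sketched as follows: pick max_features features at random, draw one random threshold per feature, and keep the candidate with the largest Gini decrease. This is an illustrative reading of the paragraph, not the library's implementation; the function names and the synthetic data are assumptions.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a 1D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def random_split(X, y, max_features, rng):
    """Return (feature, threshold, gain) for the best of the random candidates."""
    features = rng.choice(X.shape[1], size=max_features, replace=False)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature: no valid partition
        threshold = rng.uniform(lo, hi)  # one random cut per selected feature
        left = X[:, f] <= threshold
        n, n_left = len(y), int(left.sum())
        if n_left == 0 or n_left == n:
            continue
        gain = (
            gini(y)
            - n_left / n * gini(y[left])
            - (n - n_left) / n * gini(y[~left])
        )
        if best is None or gain > best[2]:
            best = (int(f), float(threshold), gain)
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = (X[:, 2] > 0).astype(int)
best = random_split(X, y, max_features=2, rng=rng)
```

With max_features=1 there is a single candidate, so the chosen split is entirely random, which matches the “totally random decision tree” remark.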
Warning: Extra-trees should only be used within ensemble methods.
Read more in the User Guide.
Parameters:
criterion ({"gini", "entropy", "log_loss"}, default="gini") – The function to measure the quality of a split. Supported criteria are
“gini” for the Gini impurity and “log_loss” and “entropy” both for the
Shannon information gain, see tree_mathematical_formulation.
splitter ({"random", "best"}, default="random") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
Changed in version 0.18: Added float values for fractions.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
Changed in version 0.18: Added float values for fractions.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
max_features (int, float, {"auto", "sqrt", "log2"} or None, default="sqrt") –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and
int(max_features * n_features) features are considered at each
split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
Changed in version 1.1: The default of max_features changed from “auto” to “sqrt”.
Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed
in 1.3.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if this requires
effectively inspecting more than max_features features.
random_state (int, RandomState instance or None, default=None) – Used to randomly pick the max_features features considered at each split.
See Glossary for details.
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
New in version 0.19.
class_weight (dict, list of dict or "balanced", default=None) –
Weights associated with classes in the form {class_label:weight}.
If None, all classes are supposed to have weight one. For
multi-output problems, a list of dicts can be provided in the same
order as the columns of y.
Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be
[{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of
[{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as n_samples / (n_classes * np.bincount(y)).
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
ccp_alpha (non-negative float, default=0.0) –
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
minimal_cost_complexity_pruning for details.
The impurity-based feature importances.
The higher, the more important the feature.
The importance of a feature is computed as the (normalized)
total reduction of the criterion brought by that feature. It is also
known as the Gini importance.
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
Type:
Tree instance
See also
ExtraTreeRegressor
An extremely randomized tree regressor.
sklearn.ensemble.ExtraTreesClassifier
An extra-trees classifier.
sklearn.ensemble.ExtraTreesRegressor
An extra-trees regressor.
sklearn.ensemble.RandomForestClassifier
A random forest classifier.
sklearn.ensemble.RandomForestRegressor
A random forest regressor.
sklearn.ensemble.RandomTreesEmbedding
An ensemble of totally random trees.
Notes
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
An extremely randomized tree regressor (semi-supervised version).
Extra-trees differ from classic decision trees in the way they are built.
When looking for the best split to separate the samples of a node into two
groups, random splits are drawn for each of the max_features randomly
selected features and the best split among those is chosen. When
max_features is set to 1, this amounts to building a totally random
decision tree.
Warning: Extra-trees should only be used within ensemble methods.
The function to measure the quality of a split. Supported criteria
are “squared_error” for the mean squared error, which is equal to
variance reduction as feature selection criterion, and “absolute_error”
for the mean absolute error.
New in version 0.18: Mean Absolute Error (MAE) criterion.
New in version 0.24: Poisson deviance criterion.
Deprecated since version 1.0: Criterion “mse” was deprecated in v1.0 and will be removed in
version 1.2. Use criterion="squared_error", which is equivalent.
Deprecated since version 1.0: Criterion “mae” was deprecated in v1.0 and will be removed in
version 1.2. Use criterion="absolute_error", which is equivalent.
splitter ({"random", "best"}, default="random") – The strategy used to choose the split at each node. Supported
strategies are “best” to choose the best split and “random” to choose
the best random split.
max_depth (int, default=None) – The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.
min_samples_split (int or float, default=2) –
The minimum number of samples required to split an internal node:
If int, then consider min_samples_split as the minimum number.
If float, then min_samples_split is a fraction and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.
Changed in version 0.18: Added float values for fractions.
min_samples_leaf (int or float, default=1) –
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at
least min_samples_leaf training samples in each of the left and
right branches. This may have the effect of smoothing the model,
especially in regression.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.
Changed in version 0.18: Added float values for fractions.
min_weight_fraction_leaf (float, default=0.0) – The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.
max_features (int, float, {"auto", "sqrt", "log2"} or None, default=1.0) –
The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and
int(max_features * n_features) features are considered at each
split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
Changed in version 1.1: The default of max_features changed from “auto” to 1.0.
Deprecated since version 1.1: The “auto” option was deprecated in 1.1 and will be removed
in 1.3.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if that requires
inspecting more than max_features features.
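The accepted values of max_features can be summarized by the following sketch. The helper resolve_max_features is hypothetical, and two details are assumptions rather than documented behavior: the "sqrt" and "log2" results are truncated to an integer, and fractional results are clamped to at least one feature.

```python
import math


def resolve_max_features(max_features, n_features):
    """Translate a max_features setting into a number of features.

    Hypothetical helper; truncation and the lower bound of 1 are assumptions.
    """
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        # An int is the number of features itself.
        return max_features
    # A float is a fraction of the available features.
    return max(1, int(max_features * n_features))


for setting in (None, "sqrt", "log2", 5, 0.5, 1.0):
    print(setting, resolve_max_features(setting, 20))
```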
random_state (int, RandomState instance or None, default=None) – Used to randomly pick the max_features features considered at each split.
See Glossary for details.
min_impurity_decrease (float, default=0.0) –
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following:
    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.
New in version 0.19.
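As a worked example of the quantity compared against min_impurity_decrease, the sketch below evaluates the weighted impurity decrease for one candidate split. The function and argument names are illustrative, and the impurity values are made-up inputs rather than results from a fitted tree.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """Weighted impurity decrease of a split, as used by scikit-learn trees.

    n     : total (weighted) number of samples
    n_t   : samples at the current node
    n_t_l : samples sent to the left child
    n_t_r : samples sent to the right child
    """
    return (n_t / n) * (
        impurity
        - (n_t_r / n_t) * right_impurity
        - (n_t_l / n_t) * left_impurity
    )


# A node holding 40 of 100 samples, split into children of 10 and 30 samples.
decrease = weighted_impurity_decrease(
    n=100, n_t=40, n_t_l=10, n_t_r=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)
min_impurity_decrease = 0.05
print(decrease, decrease >= min_impurity_decrease)
```

Here the decrease is 0.4 * (0.5 - 0.3 - 0.05) = 0.06, so the split would be accepted with min_impurity_decrease=0.05 and rejected with, say, 0.1.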
max_leaf_nodes (int, default=None) – Grow a tree with max_leaf_nodes in best-first fashion.
The best nodes are those with the largest relative reduction in impurity.
If None, the number of leaf nodes is unlimited.
ccp_alpha (non-negative float, default=0.0) –
Complexity parameter used for Minimal Cost-Complexity Pruning. The
subtree with the largest cost complexity that is smaller than
ccp_alpha will be chosen. By default, no pruning is performed. See
the Minimal Cost-Complexity Pruning section of the scikit-learn User Guide for details.
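The effect of ccp_alpha can be explored as in the sketch below. It uses scikit-learn's DecisionTreeRegressor on synthetic data as a stand-in, on the assumption that the parameter behaves the same way in the bipartite estimators; the data and the choice of the middle alpha are arbitrary.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=200)

# Candidate alphas at which the pruned subtree changes.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]

full = DecisionTreeRegressor(random_state=0).fit(X, y)  # ccp_alpha=0.0
pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)

print("nodes without pruning:", full.tree_.node_count)
print("nodes with ccp_alpha =", alpha, ":", pruned.tree_.node_count)
```

Larger values of ccp_alpha prune more aggressively; in practice the value is usually chosen by cross-validation over path.ccp_alphas.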
feature_importances_ – Return impurity-based feature importances (the higher, the more
important the feature).
Warning: impurity-based feature importances can be misleading for
high cardinality features (many unique values). See
sklearn.inspection.permutation_importance() as an alternative.
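The two kinds of importance can be compared side by side, as sketched below. The example uses scikit-learn's ExtraTreeRegressor on synthetic data as a stand-in for the bipartite estimator, which is an assumption; only the first feature carries signal.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.tree import ExtraTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)  # only feature 0 matters

model = ExtraTreeRegressor(random_state=0).fit(X, y)

# Impurity-based importances, normalized to sum to 1.
impurity_based = model.feature_importances_

# Permutation importances: score drop when a feature's values are shuffled.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
permutation_based = result.importances_mean

print("impurity-based:   ", impurity_based)
print("permutation-based:", permutation_based)
```

Permutation importance is ideally computed on held-out data; the training set is reused here only to keep the sketch short.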
tree_ – The underlying Tree object. Please refer to
help(sklearn.tree._tree.Tree) for attributes of the Tree object and to
the scikit-learn example “Understanding the decision tree structure”
for basic usage of these attributes.
Type:
Tree instance
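A minimal sketch of inspecting the Tree object follows. It relies on the array attributes of scikit-learn's Tree (children_left, children_right, feature, threshold) and uses DecisionTreeRegressor as a stand-in, assuming the bipartite estimators expose the same structure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = X[:, 0] + X[:, 1]

model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
tree = model.tree_

# Leaves have no children: both child indices are -1 for them.
is_leaf = tree.children_left == tree.children_right
n_leaves = int(is_leaf.sum())
print("nodes:", tree.node_count, "leaves:", n_leaves, "depth:", tree.max_depth)

# Print the test applied at each internal node.
for node in np.flatnonzero(~is_leaf):
    print(f"node {node}: X[:, {tree.feature[node]}] <= {tree.threshold[node]:.3f}")
```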
See also
ExtraTreeClassifier
An extremely randomized tree classifier.
sklearn.ensemble.ExtraTreesClassifier
An extra-trees classifier.
sklearn.ensemble.ExtraTreesRegressor
An extra-trees regressor.
Notes
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
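The difference in size can be checked directly, as in the sketch below. It uses scikit-learn's ExtraTreeRegressor on random data as a stand-in for the bipartite estimator, which is an assumption; the specific limits (max_depth=4, min_samples_leaf=10) are arbitrary.

```python
import numpy as np
from sklearn.tree import ExtraTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))
y = rng.normal(size=500)

# Default settings: the tree grows until every leaf is pure.
unbounded = ExtraTreeRegressor(random_state=0).fit(X, y)

# Size-limiting settings: shallow tree with reasonably populated leaves.
bounded = ExtraTreeRegressor(
    max_depth=4, min_samples_leaf=10, random_state=0
).fit(X, y)

print("default node count:", unbounded.tree_.node_count)
print("bounded node count:", bounded.tree_.node_count)
```

A binary tree of depth 4 has at most 31 nodes, whereas the fully grown tree needs roughly one leaf per distinct target value.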