
[MRG] FIX use a stump as base estimator in RUSBoostClassifier #545


Merged
5 changes: 3 additions & 2 deletions doc/ensemble.rst
@@ -93,12 +93,13 @@ Several methods taking advantage of boosting have been designed.
a boosting iteration [SKHN2010]_::

>>> from imblearn.ensemble import RUSBoostClassifier
>>> rusboost = RUSBoostClassifier(random_state=0)
>>> rusboost = RUSBoostClassifier(n_estimators=200, algorithm='SAMME.R',
... random_state=0)
>>> rusboost.fit(X_train, y_train) # doctest: +ELLIPSIS
RUSBoostClassifier(...)
>>> y_pred = rusboost.predict(X_test)
>>> balanced_accuracy_score(y_test, y_pred) # doctest: +ELLIPSIS
0.74...
0.66...
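
The snippet assumes that ``X_train``, ``y_train``, and
``balanced_accuracy_score`` are already defined earlier in the guide. A
minimal, self-contained sketch of such a setup (the dataset parameters here
are illustrative assumptions, not the ones used in the guide)::

    from sklearn.datasets import make_classification
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import train_test_split
    from imblearn.ensemble import RUSBoostClassifier

    # Illustrative imbalanced three-class problem; the guide builds its own.
    X, y = make_classification(n_samples=5000, n_classes=3, n_informative=4,
                               weights=[0.01, 0.05, 0.94], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                        random_state=0)

    rusboost = RUSBoostClassifier(n_estimators=200, algorithm='SAMME.R',
                                  random_state=0)
    y_pred = rusboost.fit(X_train, y_train).predict(X_test)
    print(balanced_accuracy_score(y_test, y_pred))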

A specific method which uses ``AdaBoost`` as learners in the bagging classifier
is called EasyEnsemble. The :class:`EasyEnsembleClassifier` allows bagging
15 changes: 15 additions & 0 deletions doc/whats_new/v0.5.rst
@@ -6,6 +6,16 @@ Version 0.5 (under development)
Changelog
---------

Changed models
..............

The following models or functions might give different results even if the
same data ``X`` and ``y`` are used.

* :class:`imblearn.ensemble.RUSBoostClassifier` default estimator changed from
:class:`sklearn.tree.DecisionTreeClassifier` with full depth to a decision
stump (i.e., tree with ``max_depth=1``).
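
  If the previous behaviour is needed, a fully grown tree can still be passed
  explicitly through the public ``base_estimator`` parameter; a minimal
  sketch::

      from sklearn.tree import DecisionTreeClassifier
      from imblearn.ensemble import RUSBoostClassifier

      # Restore the pre-0.5 default: a fully grown tree instead of a stump.
      rusboost = RUSBoostClassifier(
          base_estimator=DecisionTreeClassifier(),  # max_depth=None, full depth
          random_state=0,
      )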

Documentation
.............

@@ -53,3 +63,8 @@ Bug
- Fix bug in :class:`imblearn.pipeline.Pipeline` where None could be the final
estimator.
:pr:`554` by :user:`Oliver Rausch <orausch>`.

- Fix bug by changing the default depth in
:class:`imblearn.ensemble.RUSBoostClassifier` to get a decision stump as a
weak learner as in the original paper.
:pr:`545` by :user:`Christos Aridas <chkoar>`.
22 changes: 6 additions & 16 deletions imblearn/ensemble/_weight_boosting.py
@@ -30,10 +30,11 @@ class RUSBoostClassifier(AdaBoostClassifier):

Parameters
----------
base_estimator : object, optional (default=DecisionTreeClassifier)
base_estimator : object, optional (default=None)
The base estimator from which the boosted ensemble is built.
Support for sample weighting is required, as well as proper `classes_`
and `n_classes_` attributes.
Support for sample weighting is required, as well as proper
``classes_`` and ``n_classes_`` attributes. If ``None``, then
the base estimator is ``DecisionTreeClassifier(max_depth=1)``.

n_estimators : integer, optional (default=50)
The maximum number of estimators at which boosting is terminated.
@@ -151,21 +152,10 @@ def fit(self, X, y, sample_weight=None):
super().fit(X, y, sample_weight)
return self

def _validate_estimator(self, default=DecisionTreeClassifier()):
def _validate_estimator(self):
"""Check the estimator and the n_estimator attribute, set the
`base_estimator_` attribute."""
if not isinstance(self.n_estimators, (numbers.Integral, np.integer)):
raise ValueError("n_estimators must be an integer, "
"got {}.".format(type(self.n_estimators)))

if self.n_estimators <= 0:
raise ValueError("n_estimators must be greater than zero, "
"got {}.".format(self.n_estimators))

if self.base_estimator is not None:
self.base_estimator_ = clone(self.base_estimator)
else:
self.base_estimator_ = clone(default)
super()._validate_estimator()

self.base_sampler_ = RandomUnderSampler(
sampling_strategy=self.sampling_strategy,
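
The deleted checks are not lost: they are performed by the parent class,
which also supplies the stump default. A simplified sketch of what
``super()._validate_estimator()`` provides, based on scikit-learn's
``AdaBoostClassifier``/``BaseEnsemble`` at the time of this PR (not the
verbatim implementation)::

    import numbers

    import numpy as np
    from sklearn.base import clone
    from sklearn.tree import DecisionTreeClassifier

    def _validate_estimator(self, default=DecisionTreeClassifier(max_depth=1)):
        # Simplified sketch, not verbatim scikit-learn code.
        if not isinstance(self.n_estimators, (numbers.Integral, np.integer)):
            raise ValueError("n_estimators must be an integer, "
                             "got {}.".format(type(self.n_estimators)))
        if self.n_estimators <= 0:
            raise ValueError("n_estimators must be greater than zero, "
                             "got {}.".format(self.n_estimators))
        if self.base_estimator is not None:
            self.base_estimator_ = clone(self.base_estimator)
        else:
            # The decision stump that this PR makes the effective default.
            self.base_estimator_ = clone(default)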
6 changes: 4 additions & 2 deletions imblearn/ensemble/tests/test_weight_boosting.py
@@ -11,7 +11,7 @@

@pytest.fixture
def imbalanced_dataset():
return make_classification(n_samples=10000, n_features=2, n_informative=2,
return make_classification(n_samples=10000, n_features=3, n_informative=2,
n_redundant=0, n_repeated=0, n_classes=3,
n_clusters_per_class=1,
weights=[0.01, 0.05, 0.94], class_sep=0.8,
@@ -33,7 +33,9 @@ def test_balanced_random_forest_error(imbalanced_dataset, boosting_params,
@pytest.mark.parametrize('algorithm', ['SAMME', 'SAMME.R'])
def test_rusboost(imbalanced_dataset, algorithm):
X, y = imbalanced_dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)
X_train, X_test, y_train, y_test = train_test_split(X, y,
stratify=y,
random_state=1)
classes = np.unique(y)

n_estimators = 500
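
For reference, the behaviour exercised by the updated test can be mirrored
outside of pytest; a hypothetical standalone check (the fixture's
``random_state`` is not visible in this diff and is assumed here)::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from imblearn.ensemble import RUSBoostClassifier

    X, y = make_classification(n_samples=10000, n_features=3, n_informative=2,
                               n_redundant=0, n_repeated=0, n_classes=3,
                               n_clusters_per_class=1,
                               weights=[0.01, 0.05, 0.94], class_sep=0.8,
                               random_state=0)  # random_state assumed
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                        random_state=1)

    for algorithm in ['SAMME', 'SAMME.R']:
        clf = RUSBoostClassifier(n_estimators=500, algorithm=algorithm,
                                 random_state=0)
        y_pred = clf.fit(X_train, y_train).predict(X_test)
        # Predictions should stay within the known classes.
        assert set(np.unique(y_pred)) <= set(np.unique(y))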