Commit 2f0a4b2

EHN: Implementation of BalancedRandomForestClassifier (#459)
1 parent 839df67 commit 2f0a4b2

13 files changed, +847 -154 lines changed

README.rst

Lines changed: 5 additions & 1 deletion
@@ -159,9 +159,11 @@ Below is a list of the methods currently implemented in this module.
 1. SMOTE + Tomek links [12]_
 2. SMOTE + ENN [11]_

-* Ensemble sampling
+* Ensemble classifier using samplers internally
 1. EasyEnsemble [13]_
 2. BalanceCascade [13]_
+3. Balanced Random Forest [16]_
+4. Balanced Bagging

 The different algorithms are presented in the sphinx-gallery_.

@@ -200,3 +202,5 @@ References:
 .. [14] : I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6(6), pp. 448-452, 1976. [`bib <references.bib#L158>`_]

 .. [15] : H. He, Y. Bai, E. A. Garcia, S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” In Proceedings of the 5th IEEE International Joint Conference on Neural Networks, pp. 1322-1328, 2008. [`pdf <https://pdfs.semanticscholar.org/4823/4756b7cf798bfeb47328f7c5d597fd4838c2.pdf>`_] [`bib <references.bib#L62>`_]
+
+.. [16] : C. Chen, A. Liaw, and L. Breiman. "Using random forest to learn imbalanced data." University of California, Berkeley 110 (2004): 1-12.

doc/api.rst

Lines changed: 1 addition & 0 deletions
@@ -109,6 +109,7 @@ Prototype selection

 ensemble.BalanceCascade
 ensemble.BalancedBaggingClassifier
+ensemble.BalancedRandomForestClassifier
 ensemble.EasyEnsemble
 ensemble.EasyEnsembleClassifier

doc/ensemble.rst

Lines changed: 18 additions & 15 deletions
@@ -116,24 +116,24 @@ random under-sampler::
 [ 0, 55, 4],
 [ 42, 46, 1091]])

-It also possible to turn a balanced bagging classifier into a balanced random
-forest using a decision tree classifier and setting the parameter
-``max_features='auto'``. It allows to randomly select a subset of features for
-each tree::
-
->>> brf = BalancedBaggingClassifier(
-... base_estimator=DecisionTreeClassifier(max_features='auto'),
-... random_state=0)
+:class:`BalancedRandomForestClassifier` is another ensemble method in which
+each tree of the forest will be provided a balanced bootstrap sample. This class
+provides all functionality of the
+:class:`sklearn.ensemble.RandomForestClassifier` and notably the
+`feature_importances_` attribute::
+
+
+>>> from imblearn.ensemble import BalancedRandomForestClassifier
+>>> brf = BalancedRandomForestClassifier(n_estimators=10, random_state=0)
 >>> brf.fit(X_train, y_train) # doctest: +ELLIPSIS
-BalancedBaggingClassifier(...)
+BalancedRandomForestClassifier(...)
 >>> y_pred = brf.predict(X_test)
 >>> confusion_matrix(y_test, y_pred)
 array([[ 9, 1, 2],
-[ 0, 54, 5],
-[ 31, 34, 1114]])
-
-See
-:ref:`sphx_glr_auto_examples_ensemble_plot_comparison_bagging_classifier.py`.
+[ 3, 54, 2],
+[ 113, 47, 1019]])
+>>> brf.feature_importances_
+array([ 0.63501243, 0.36498757])

 A specific method which uses ``AdaBoost`` as learners in the bagging
 classifier is called EasyEnsemble. The :class:`EasyEnsembleClassifier` allows
@@ -149,4 +149,7 @@ the ensemble as::
 >>> confusion_matrix(y_test, y_pred)
 array([[ 9, 1, 2],
 [ 5, 52, 2],
-[252, 45, 882]])
+[252, 45, 882]])
+
+See
+:ref:`sphx_glr_auto_examples_ensemble_plot_comparison_ensemble_classifier.py`.
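
The documentation added in this file says that each tree of the forest is given a balanced bootstrap sample. As a rough illustration of that idea only, and not the library's actual implementation, the standard-library sketch below draws the same number of indices, with replacement, from every class. The helper name ``balanced_bootstrap_indices`` is hypothetical, and the label counts (12, 59, 1179) are simply borrowed from the row totals of the confusion matrices shown in the diff.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap_indices(y, rng):
    """Return sample indices for one balanced bootstrap draw over labels ``y``.

    Hypothetical helper for illustration; imbalanced-learn may balance
    its bootstrap samples differently.
    """
    # Group the sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    # Every class contributes as many draws as the smallest class has samples.
    n_per_class = min(len(indices) for indices in by_class.values())

    sample = []
    for label in sorted(by_class):
        # Draw with replacement, as in an ordinary bootstrap.
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    return sample


# Illustrative imbalanced labels: 12, 59 and 1179 samples per class.
y = [0] * 12 + [1] * 59 + [2] * 1179
rng = random.Random(0)

indices = balanced_bootstrap_indices(y, rng)
counts = Counter(y[i] for i in indices)
print(counts)  # each class appears 12 times in the drawn sample
```

In a forest, one such draw would be made per tree, so that no single tree is trained on data dominated by the majority class.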

doc/whats_new/v0.0.4.rst

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,10 @@ New features
 AdaBoost classifier trained on balanced bootstrap samples.
 :issue:`455` by :user:`Guillaume Lemaitre <glemaitre>`.

+- Add :class:`imblearn.ensemble.BalancedRandomForestClassifier` which balances
+each bootstrap sample provided to each tree of the forest.
+:issue:`459` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 Enhancement
 ...........

examples/ensemble/plot_comparison_bagging_classifier.py

Lines changed: 0 additions & 124 deletions
This file was deleted.
