Commit eafae67

EHN: split and factorize SMOTE classes (#440)
1 parent 2fed48f commit eafae67

8 files changed: +667 -404 lines changed
doc/over_sampling.rst

Lines changed: 12 additions & 14 deletions
@@ -127,11 +127,11 @@ nearest neighbors class. Those variants are presented in the figure below.
    :align: center


-The parameter ``kind`` is controlling this feature and the following types are
-available: (i) ``'borderline1'``, (ii) ``'borderline2'``, and (iii) ``'svm'``::
+The :class:`BorderlineSMOTE` and :class:`SVMSMOTE` offer some variant of the SMOTE
+algorithm::

-   >>> from imblearn.over_sampling import SMOTE, ADASYN
-   >>> X_resampled, y_resampled = SMOTE(kind='borderline1').fit_sample(X, y)
+   >>> from imblearn.over_sampling import BorderlineSMOTE
+   >>> X_resampled, y_resampled = BorderlineSMOTE().fit_sample(X, y)
    >>> print(sorted(Counter(y_resampled).items()))
    [(0, 4674), (1, 4674), (2, 4674)]

@@ -168,12 +168,11 @@ interpolation will create a sample on the line between :math:`x_{i}` and
 Each SMOTE variant and ADASYN differ from each other by selecting the samples
 :math:`x_i` ahead of generating the new samples.

-The **regular** SMOTE algorithm --- cf. to ``kind='regular'`` when
-instantiating a :class:`SMOTE` object --- does not impose any rule and will
-randomly pick-up all possible :math:`x_i` available.
+The **regular** SMOTE algorithm --- cf. to the :class:`SMOTE` object --- does not
+impose any rule and will randomly pick-up all possible :math:`x_i` available.

-The **borderline** SMOTE --- cf. to ``kind='borderline1'`` and
-``kind='borderline2'`` when instantiating a :class:`SMOTE` object --- will
+The **borderline** SMOTE --- cf. to the :class:`BorderlineSMOTE` with the
+parameters ``kind='borderline-1'`` and ``kind='borderline-2'`` --- will
 classify each sample :math:`x_i` to be (i) noise (i.e. all nearest-neighbors
 are from a different class than the one of :math:`x_i`), (ii) in danger
 (i.e. at least half of the nearest neighbors are from the same class than
@@ -184,10 +183,9 @@ samples *in danger* to generate new samples. In **Borderline-1** SMOTE,
 :math:`x_i`. On the contrary, **Borderline-2** SMOTE will consider
 :math:`x_{zi}` which can be from any class.

-**SVM** SMOTE --- cf. to ``kind='svm'`` when instantiating a :class:`SMOTE`
-object --- uses an SVM classifier to find support vectors and generate samples
-considering them. Note that the ``C`` parameter of the SVM classifier allows to
-select more or less support vectors.
+**SVM** SMOTE --- cf. to :class:`SVMSMOTE` --- uses an SVM classifier to find
+support vectors and generate samples considering them. Note that the ``C``
+parameter of the SVM classifier allows to select more or less support vectors.

 For both borderline and SVM SMOTE, a neighborhood is defined using the
 parameter ``m_neighbors`` to decide if a sample is in danger, safe, or noise.
@@ -196,7 +194,7 @@ ADASYN is working similarly to the regular SMOTE. However, the number of
 samples generated for each :math:`x_i` is proportional to the number of samples
 which are not from the same class than :math:`x_i` in a given
 neighborhood. Therefore, more samples will be generated in the area that the
-nearest neighbor rule is not respected. The parameter ``n_neighbors`` is
+nearest neighbor rule is not respected. The parameter ``m_neighbors`` is
 equivalent to ``k_neighbors`` in :class:`SMOTE`.

 Multi-class management

doc/whats_new/v0.0.4.rst

Lines changed: 10 additions & 0 deletions
@@ -30,6 +30,10 @@ Enhancement
 - Add support for one-vs-all encoded target to support keras. :issue:`409` by
   :user:`Guillaume Lemaitre <glemaitre>`.

+- Adding specific class for borderline and SVM SMOTE using
+  :class:`BorderlineSMOTE` and :class:`SVMSMOTE`.
+  :issue:`440` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 Bug fixes
 .........

@@ -63,3 +67,9 @@ Deprecation
   :class:`imblearn.under_sampling.NeighbourhoodCleaningRule`,
   :class:`imblearn.under_sampling.InstanceHardnessThreshold`,
   :class:`imblearn.under_sampling.CondensedNearestNeighbours`.
+
+- Deprecate ``kind``, ``out_step``, ``svm_estimator``, ``m_neighbors`` in
+  :class:`imblearn.over_sampling.SMOTE`. User should use
+  :class:`imblearn.over_sampling.SVMSMOTE` and
+  :class:`imblearn.over_sampling.BorderlineSMOTE`.
+  :issue:`440` by :user:`Guillaume Lemaitre <glemaitre>`.

examples/over-sampling/plot_comparison_over_sampling.py

Lines changed: 11 additions & 12 deletions
@@ -20,7 +20,9 @@
 from sklearn.svm import LinearSVC

 from imblearn.pipeline import make_pipeline
-from imblearn.over_sampling import ADASYN, SMOTE, RandomOverSampler
+from imblearn.over_sampling import ADASYN
+from imblearn.over_sampling import SMOTE, BorderlineSMOTE, SVMSMOTE
+from imblearn.over_sampling import RandomOverSampler
 from imblearn.base import SamplerMixin
 from imblearn.utils import hash_X_y

@@ -220,21 +222,18 @@ def fit_sample(self, X, y):
                       class_sep=0.8)

 ax_arr = ((ax1, ax2), (ax3, ax4), (ax5, ax6), (ax7, ax8))
-string_add = ['regular', 'borderline-1', 'borderline-2', 'SVM']
-for str_add, ax, sampler in zip(string_add,
-                                ax_arr,
-                                (SMOTE(random_state=0),
-                                 SMOTE(random_state=0, kind='borderline1'),
-                                 SMOTE(random_state=0, kind='borderline2'),
-                                 SMOTE(random_state=0, kind='svm'))):
+for ax, sampler in zip(ax_arr,
+                       (SMOTE(random_state=0),
+                        BorderlineSMOTE(random_state=0, kind='borderline-1'),
+                        BorderlineSMOTE(random_state=0, kind='borderline-2'),
+                        SVMSMOTE(random_state=0))):
     clf = make_pipeline(sampler, LinearSVC())
     clf.fit(X, y)
     plot_decision_function(X, y, clf, ax[0])
-    ax[0].set_title('Decision function for {} {}'.format(
-        str_add, sampler.__class__.__name__))
+    ax[0].set_title('Decision function for {}'.format(
+        sampler.__class__.__name__))
     plot_resampling(X, y, sampler, ax[1])
-    ax[1].set_title('Resampling using {} {}'.format(
-        str_add, sampler.__class__.__name__))
+    ax[1].set_title('Resampling using {}'.format(sampler.__class__.__name__))
 fig.tight_layout()

 plt.show()

examples/over-sampling/plot_smote.py

Lines changed: 7 additions & 4 deletions
@@ -17,6 +17,8 @@
 from sklearn.decomposition import PCA

 from imblearn.over_sampling import SMOTE
+from imblearn.over_sampling import BorderlineSMOTE
+from imblearn.over_sampling import SVMSMOTE

 print(__doc__)

@@ -49,8 +51,8 @@ def plot_resampling(ax, X, y, title):
 X_vis = pca.fit_transform(X)

 # Apply regular SMOTE
-kind = ['regular', 'borderline1', 'borderline2', 'svm']
-sm = [SMOTE(kind=k) for k in kind]
+sm = [SMOTE(), BorderlineSMOTE(kind='borderline-1'),
+      BorderlineSMOTE(kind='borderline-2'), SVMSMOTE()]
 X_resampled = []
 y_resampled = []
 X_res_vis = []
@@ -67,9 +69,10 @@ def plot_resampling(ax, X, y, title):
 ax_res = [ax3, ax4, ax5, ax6]

 c0, c1 = plot_resampling(ax1, X_vis, y, 'Original set')
-for i in range(len(kind)):
+for i, name in enumerate(['SMOTE', 'SMOTE Borderline-1',
+                          'SMOTE Borderline-2', 'SMOTE SVM']):
     plot_resampling(ax_res[i], X_res_vis[i], y_resampled[i],
-                    'SMOTE {}'.format(kind[i]))
+                    '{}'.format(name))

 ax2.legend((c0, c1), ('Class #0', 'Class #1'), loc='center',
            ncol=1, labelspacing=0.)

imblearn/over_sampling/__init__.py

Lines changed: 4 additions & 1 deletion
@@ -6,5 +6,8 @@
 from .adasyn import ADASYN
 from .random_over_sampler import RandomOverSampler
 from .smote import SMOTE
+from .smote import BorderlineSMOTE
+from .smote import SVMSMOTE

-__all__ = ['ADASYN', 'RandomOverSampler', 'SMOTE']
+__all__ = ['ADASYN', 'RandomOverSampler',
+           'SMOTE', 'BorderlineSMOTE', 'SVMSMOTE']
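
Migration note (illustrative only): the before/after lines in `plot_comparison_over_sampling.py` and `plot_smote.py` suggest a one-to-one mapping from each old `SMOTE(kind=...)` value to a new class. The sketch below writes that mapping down as a plain lookup. `KIND_TO_REPLACEMENT` and `replacement_call` are hypothetical names introduced here; they are not part of imbalanced-learn, and the code only builds the text of the replacement call rather than importing the library.

```python
# Hypothetical migration aid; NOT part of imbalanced-learn.
# Maps each old ``SMOTE(kind=...)`` value to the class name and keyword
# arguments that replace it in the example diffs of this commit.
KIND_TO_REPLACEMENT = {
    'regular': ('SMOTE', {}),
    'borderline1': ('BorderlineSMOTE', {'kind': 'borderline-1'}),
    'borderline2': ('BorderlineSMOTE', {'kind': 'borderline-2'}),
    'svm': ('SVMSMOTE', {}),
}


def replacement_call(old_kind, **extra_kwargs):
    """Return the source text of the call that replaces SMOTE(kind=old_kind)."""
    class_name, kwargs = KIND_TO_REPLACEMENT[old_kind]
    # Merge the kwargs implied by the mapping with any the caller already used.
    merged = dict(kwargs, **extra_kwargs)
    args = ', '.join('{}={!r}'.format(key, value)
                     for key, value in sorted(merged.items()))
    return '{}({})'.format(class_name, args)


if __name__ == '__main__':
    for old in ('regular', 'borderline1', 'borderline2', 'svm'):
        print('SMOTE(kind={!r}) -> {}'.format(
            old, replacement_call(old, random_state=0)))
```

An unknown `kind` raises `KeyError`, which is likely the behaviour one wants from a one-off migration script: it flags call sites that need a manual look instead of guessing.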
