
Commit cc66b75

Merge remote-tracking branch 'origin/master' into keras_batch_generator

2 parents: 032c791 + eafae67

26 files changed (+882 / -437 lines)

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -42,6 +42,7 @@ htmlcov/
 nosetests.xml
 coverage.xml
 *,cover
+.pytest_cache/

 # Translations
 *.mo
@@ -66,4 +67,7 @@ target/
 *.sln
 *.pyproj
 *.suo
-*.vs
+*.vs
+
+# PyCharm
+.idea/

README.rst

Lines changed: 15 additions & 15 deletions
@@ -166,32 +166,32 @@ The different algorithms are presented in the sphinx-gallery_.
 References:
 -----------

-.. [1] : I. Tomek, “Two modifications of CNN,” In Systems, Man, and Cybernetics, IEEE Transactions on, vol. 6, pp 769-772, 2010.
+.. [1] : I. Tomek, “Two modifications of CNN,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, pp. 769-772, 1976. [`bib <references.bib#L148>`_]

-.. [2] : I. Mani, I. Zhang. “kNN approach to unbalanced data distributions: a case study involving information extraction,” In Proceedings of workshop on learning from imbalanced datasets, 2003.
+.. [2] : I. Mani, J. Zhang. “kNN approach to unbalanced data distributions: A case study involving information extraction,” In Proceedings of the Workshop on Learning from Imbalanced Data Sets, pp. 1-7, 2003. [`pdf <https://www.site.uottawa.ca/~nat/Workshop2003/jzhang.pdf>`_] [`bib <references.bib#L113>`_]

-.. [3] : P. Hart, “The condensed nearest neighbor rule,” In Information Theory, IEEE Transactions on, vol. 14(3), pp. 515-516, 1968.
+.. [3] : P. E. Hart, “The condensed nearest neighbor rule,” IEEE Transactions on Information Theory, vol. 14(3), pp. 515-516, 1968. [`pdf <http://sci2s.ugr.es/keel/pdf/algorithm/articulo/hart1968.pdf>`_] [`bib <references.bib#L51>`_]

-.. [4] : M. Kubat, S. Matwin, “Addressing the curse of imbalanced training sets: one-sided selection,” In ICML, vol. 97, pp. 179-186, 1997.
+.. [4] : M. Kubat, S. Matwin, “Addressing the curse of imbalanced training sets: One-sided selection,” In Proceedings of the 14th International Conference on Machine Learning, vol. 97, pp. 179-186, 1997. [`pdf <http://sci2s.ugr.es/keel/pdf/algorithm/congreso/kubat97addressing.pdf>`_] [`bib <references.bib#L76>`_]

-.. [5] : J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution,” Springer Berlin Heidelberg, 2001.
+.. [5] : J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution,” Proceedings of the 8th Conference on Artificial Intelligence in Medicine in Europe, pp. 63-66, 2001. [`pdf <https://pdfs.semanticscholar.org/0e75/4db8253e84cde4ade4b6f5ba768a6150569a.pdf>`_] [`bib <references.bib#L89>`_]

-.. [6] : D. Wilson, “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,” In IEEE Transactions on Systems, Man, and Cybernetrics, vol. 2 (3), pp. 408-421, 1972.
+.. [6] : D. Wilson, “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,” IEEE Transactions on Systems, Man, and Cybernetrics, vol. 2(3), pp. 408-421, 1972. [`pdf <http://sci2s.ugr.es/keel/pdf/algorithm/articulo/1972-Wilson-IEEETSMC.pdf>`_] [`bib <references.bib#L168>`_]

-.. [7] : D. Smith, Michael R., Tony Martinez, and Christophe Giraud-Carrier. “An instance level analysis of data complexity.” Machine learning 95.2 (2014): 225-256.
+.. [7] : M. R. Smith, T. Martinez, C. Giraud-Carrier, “An instance level analysis of data complexity,” Machine learning, vol. 95(2), pp. 225-256, 2014. [`pdf <https://pdfs.semanticscholar.org/5796/8c07abe6a734977db47b08cf4c567733aede.pdf>`_] [`bib <references.bib#L136>`_]

-.. [8] : N. V. Chawla, K. W. Bowyer, L. O.Hall, W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of artificial intelligence research, 321-357, 2002.
+.. [8] : N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002. [`pdf <http://www.jair.org/media/953/live-953-2037-jair.pdf>`_] [`bib <references.bib#L28>`_]

-.. [9] : H. Han, W. Wen-Yuan, M. Bing-Huan, “Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning,” Advances in intelligent computing, 878-887, 2005.
+.. [9] : H. Han, W.-Y. Wang, B.-H. Mao, “Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,” In Proceedings of the 1st International Conference on Intelligent Computing, pp. 878-887, 2005. [`pdf <http://sci2s.ugr.es/keel/pdf/specific/congreso/han_borderline_smote.pdf>`_] [`bib <references.bib#L38>`_]

-.. [10] : H. M. Nguyen, E. W. Cooper, K. Kamei, “Borderline over-sampling for imbalanced data classification,” International Journal of Knowledge Engineering and Soft Data Paradigms, 3(1), pp.4-21, 2001.
+.. [10] : H. M. Nguyen, E. W. Cooper, K. Kamei, “Borderline over-sampling for imbalanced data classification,” In Proceedings of the 5th International Workshop on computational Intelligence and Applications, pp. 24-29, 2009. [`pdf <http://ousar.lib.okayama-u.ac.jp/files/public/1/19617/20160528004522391723/IWCIA2009_A1005.pdf>`_] [`bib <references.bib#L126>`_]

-.. [11] : G. Batista, R. C. Prati, M. C. Monard. “A study of the behavior of several methods for balancing machine learning training data,” ACM Sigkdd Explorations Newsletter 6 (1), 20-29, 2004.
+.. [11] : G. E. A. P. A. Batista, R. C. Prati, M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM Sigkdd Explorations Newsletter, vol. 6(1), pp. 20-29, 2004. [`pdf <http://sci2s.ugr.es/keel/dataset/includes/catImbFiles/2004-Batista-SIGKDD.pdf>`_] [`bib <references.bib#L15>`_]

-.. [12] : G. Batista, B. Bazzan, M. Monard, [“Balancing Training Data for Automated Annotation of Keywords: a Case Study,” In WOB, 10-18, 2003.
+.. [12] : G. E. A. P. A. Batista, A. L. C. Bazzan, M. C. Monard, “Balancing training data for automated annotation of keywords: A case study,” In Proceedings of the 2nd Brazilian Workshop on Bioinformatics, pp. 10-18, 2003. [`pdf <http://www.inf.ufrgs.br/maslab/pergamus/pubs/balancing-training-data-for.pdf>`_] [`bib <references.bib#L2>`_]

-.. [13] : X. Y. Liu, J. Wu and Z. H. Zhou, “Exploratory Undersampling for Class-Imbalance Learning,” in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 2, pp. 539-550, April 2009.
+.. [13] : X.-Y. Liu, J. Wu and Z.-H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 39(2), pp. 539-550, 2009. [`pdf <https://pdfs.semanticscholar.org/beac/3afc6a2cbdefe8dae03de25a139193ef6021.pdf>`_] [`bib <references.bib#L102>`_]

-.. [14] : I. Tomek, “An Experiment with the Edited Nearest-Neighbor Rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6(6), pp. 448-452, June 1976.
+.. [14] : I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6(6), pp. 448-452, 1976. [`bib <references.bib#L158>`_]

-.. [15] : He, Haibo, Yang Bai, Edwardo A. Garcia, and Shutao Li. “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” In IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322-1328, 2008.
+.. [15] : H. He, Y. Bai, E. A. Garcia, S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” In Proceedings of the 5th IEEE International Joint Conference on Neural Networks, pp. 1322-1328, 2008. [`pdf <https://pdfs.semanticscholar.org/4823/4756b7cf798bfeb47328f7c5d597fd4838c2.pdf>`_] [`bib <references.bib#L62>`_]

doc/over_sampling.rst

Lines changed: 12 additions & 14 deletions
@@ -127,11 +127,11 @@ nearest neighbors class. Those variants are presented in the figure below.
    :align: center


-The parameter ``kind`` is controlling this feature and the following types are
-available: (i) ``'borderline1'``, (ii) ``'borderline2'``, and (iii) ``'svm'``::
+The :class:`BorderlineSMOTE` and :class:`SVMSMOTE` offer some variant of the SMOTE
+algorithm::

-    >>> from imblearn.over_sampling import SMOTE, ADASYN
-    >>> X_resampled, y_resampled = SMOTE(kind='borderline1').fit_sample(X, y)
+    >>> from imblearn.over_sampling import BorderlineSMOTE
+    >>> X_resampled, y_resampled = BorderlineSMOTE().fit_sample(X, y)
     >>> print(sorted(Counter(y_resampled).items()))
     [(0, 4674), (1, 4674), (2, 4674)]

@@ -168,12 +168,11 @@ interpolation will create a sample on the line between :math:`x_{i}` and
 Each SMOTE variant and ADASYN differ from each other by selecting the samples
 :math:`x_i` ahead of generating the new samples.

-The **regular** SMOTE algorithm --- cf. to ``kind='regular'`` when
-instantiating a :class:`SMOTE` object --- does not impose any rule and will
-randomly pick-up all possible :math:`x_i` available.
+The **regular** SMOTE algorithm --- cf. to the :class:`SMOTE` object --- does not
+impose any rule and will randomly pick-up all possible :math:`x_i` available.

-The **borderline** SMOTE --- cf. to ``kind='borderline1'`` and
-``kind='borderline2'`` when instantiating a :class:`SMOTE` object --- will
+The **borderline** SMOTE --- cf. to the :class:`BorderlineSMOTE` with the
+parameters ``kind='borderline-1'`` and ``kind='borderline-2'`` --- will
 classify each sample :math:`x_i` to be (i) noise (i.e. all nearest-neighbors
 are from a different class than the one of :math:`x_i`), (ii) in danger
 (i.e. at least half of the nearest neighbors are from the same class than

@@ -184,10 +183,9 @@ samples *in danger* to generate new samples. In **Borderline-1** SMOTE,
 :math:`x_i`. On the contrary, **Borderline-2** SMOTE will consider
 :math:`x_{zi}` which can be from any class.

-**SVM** SMOTE --- cf. to ``kind='svm'`` when instantiating a :class:`SMOTE`
-object --- uses an SVM classifier to find support vectors and generate samples
-considering them. Note that the ``C`` parameter of the SVM classifier allows to
-select more or less support vectors.
+**SVM** SMOTE --- cf. to :class:`SVMSMOTE` --- uses an SVM classifier to find
+support vectors and generate samples considering them. Note that the ``C``
+parameter of the SVM classifier allows to select more or less support vectors.

 For both borderline and SVM SMOTE, a neighborhood is defined using the
 parameter ``m_neighbors`` to decide if a sample is in danger, safe, or noise.

@@ -196,7 +194,7 @@ ADASYN is working similarly to the regular SMOTE. However, the number of
 samples generated for each :math:`x_i` is proportional to the number of samples
 which are not from the same class than :math:`x_i` in a given
 neighborhood. Therefore, more samples will be generated in the area that the
-nearest neighbor rule is not respected. The parameter ``n_neighbors`` is
+nearest neighbor rule is not respected. The parameter ``m_neighbors`` is
 equivalent to ``k_neighbors`` in :class:`SMOTE`.

 Multi-class management
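The hunks above refer to SMOTE's interpolation step: a synthetic sample is drawn on the segment between :math:`x_i` and one of its nearest neighbors :math:`x_{zi}`, i.e. ``x_new = x_i + lam * (x_zi - x_i)`` with ``lam`` in [0, 1]. A minimal stand-alone sketch of that rule (plain Python, not the library's implementation; the function name is made up for illustration):

```python
import random

def smote_interpolate(x_i, x_zi, lam=None):
    """Return one synthetic sample on the segment between x_i and x_zi:
    x_new = x_i + lam * (x_zi - x_i), with lam drawn uniformly from [0, 1]."""
    if lam is None:
        lam = random.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_zi)]

# With lam = 0.5 the synthetic point is the midpoint of the segment.
x_new = smote_interpolate([0.0, 0.0], [1.0, 2.0], lam=0.5)
```

The variants differ only in how :math:`x_i` is chosen (all minority samples, *in danger* samples, or support vectors), not in this interpolation rule.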

doc/whats_new/v0.0.4.rst

Lines changed: 10 additions & 0 deletions
@@ -36,6 +36,10 @@ Enhancement
 - Add support for one-vs-all encoded target to support keras. :issue:`409` by
   :user:`Guillaume Lemaitre <glemaitre>`.

+- Adding specific class for borderline and SVM SMOTE using
+  :class:`BorderlineSMOTE` and :class:`SVMSMOTE`.
+  :issue:`440` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 Bug fixes
 .........

@@ -69,3 +73,9 @@ Deprecation
 :class:`imblearn.under_sampling.NeighbourhoodCleaningRule`,
 :class:`imblearn.under_sampling.InstanceHardnessThreshold`,
 :class:`imblearn.under_sampling.CondensedNearestNeighbours`.
+
+- Deprecate ``kind``, ``out_step``, ``svm_estimator``, ``m_neighbors`` in
+  :class:`imblearn.over_sampling.SMOTE`. User should use
+  :class:`imblearn.over_sampling.SVMSMOTE` and
+  :class:`imblearn.over_sampling.BorderlineSMOTE`.
+  :issue:`440` by :user:`Guillaume Lemaitre <glemaitre>`.
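The ``kind`` deprecation listed above typically follows the scikit-learn convention: the old parameter is kept for a release cycle and emits a warning pointing at the replacement classes. A generic, stdlib-only sketch of that pattern (the class below is a stub for illustration, not imblearn's actual ``SMOTE``):

```python
import warnings

class SMOTE:
    """Stub illustrating the deprecated-parameter pattern."""
    def __init__(self, kind='deprecated'):
        self.kind = kind
        if kind != 'deprecated':
            # The legacy call path still works, but users are nudged
            # toward the dedicated BorderlineSMOTE / SVMSMOTE classes.
            warnings.warn("'kind' is deprecated; use BorderlineSMOTE or "
                          "SVMSMOTE instead", DeprecationWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    SMOTE(kind='borderline1')  # legacy usage triggers the warning
    SMOTE()                    # the new default path stays silent
```

Using a sentinel default (``'deprecated'``) rather than ``None`` lets the class distinguish "user passed nothing" from "user explicitly passed a value".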

examples/over-sampling/plot_comparison_over_sampling.py

Lines changed: 11 additions & 12 deletions
@@ -20,7 +20,9 @@
 from sklearn.svm import LinearSVC

 from imblearn.pipeline import make_pipeline
-from imblearn.over_sampling import ADASYN, SMOTE, RandomOverSampler
+from imblearn.over_sampling import ADASYN
+from imblearn.over_sampling import SMOTE, BorderlineSMOTE, SVMSMOTE
+from imblearn.over_sampling import RandomOverSampler
 from imblearn.base import SamplerMixin
 from imblearn.utils import hash_X_y

@@ -220,21 +222,18 @@ def fit_sample(self, X, y):
                            class_sep=0.8)

 ax_arr = ((ax1, ax2), (ax3, ax4), (ax5, ax6), (ax7, ax8))
-string_add = ['regular', 'borderline-1', 'borderline-2', 'SVM']
-for str_add, ax, sampler in zip(string_add,
-                                ax_arr,
-                                (SMOTE(random_state=0),
-                                 SMOTE(random_state=0, kind='borderline1'),
-                                 SMOTE(random_state=0, kind='borderline2'),
-                                 SMOTE(random_state=0, kind='svm'))):
+for ax, sampler in zip(ax_arr,
+                       (SMOTE(random_state=0),
+                        BorderlineSMOTE(random_state=0, kind='borderline-1'),
+                        BorderlineSMOTE(random_state=0, kind='borderline-2'),
+                        SVMSMOTE(random_state=0))):
     clf = make_pipeline(sampler, LinearSVC())
     clf.fit(X, y)
     plot_decision_function(X, y, clf, ax[0])
-    ax[0].set_title('Decision function for {} {}'.format(
-        str_add, sampler.__class__.__name__))
+    ax[0].set_title('Decision function for {}'.format(
+        sampler.__class__.__name__))
     plot_resampling(X, y, sampler, ax[1])
-    ax[1].set_title('Resampling using {} {}'.format(
-        str_add, sampler.__class__.__name__))
+    ax[1].set_title('Resampling using {}'.format(sampler.__class__.__name__))
 fig.tight_layout()

 plt.show()
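The refactor in this hunk drops the parallel ``string_add`` list: once each variant is its own class, the plot title can be derived from the sampler's class name alone, so there is no second list to keep in sync. A tiny sketch of that pattern using stub classes (not the real imblearn samplers):

```python
# Stub classes standing in for the imblearn samplers.
class SMOTE: pass
class BorderlineSMOTE: pass
class SVMSMOTE: pass

samplers = (SMOTE(), BorderlineSMOTE(), BorderlineSMOTE(), SVMSMOTE())
# One title per sampler, derived from the class name itself.
titles = ['Decision function for {}'.format(s.__class__.__name__)
          for s in samplers]
```

This is why the diff can delete ``string_add`` without losing any information from the figure titles.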

examples/over-sampling/plot_smote.py

Lines changed: 7 additions & 4 deletions
@@ -17,6 +17,8 @@
 from sklearn.decomposition import PCA

 from imblearn.over_sampling import SMOTE
+from imblearn.over_sampling import BorderlineSMOTE
+from imblearn.over_sampling import SVMSMOTE

 print(__doc__)

@@ -49,8 +51,8 @@ def plot_resampling(ax, X, y, title):
 X_vis = pca.fit_transform(X)

 # Apply regular SMOTE
-kind = ['regular', 'borderline1', 'borderline2', 'svm']
-sm = [SMOTE(kind=k) for k in kind]
+sm = [SMOTE(), BorderlineSMOTE(kind='borderline-1'),
+      BorderlineSMOTE(kind='borderline-2'), SVMSMOTE()]
 X_resampled = []
 y_resampled = []
 X_res_vis = []

@@ -67,9 +69,10 @@ def plot_resampling(ax, X, y, title):
 ax_res = [ax3, ax4, ax5, ax6]

 c0, c1 = plot_resampling(ax1, X_vis, y, 'Original set')
-for i in range(len(kind)):
+for i, name in enumerate(['SMOTE', 'SMOTE Borderline-1',
+                          'SMOTE Borderline-2', 'SMOTE SVM']):
     plot_resampling(ax_res[i], X_res_vis[i], y_resampled[i],
-                    'SMOTE {}'.format(kind[i]))
+                    '{}'.format(name))

 ax2.legend((c0, c1), ('Class #0', 'Class #1'), loc='center',
            ncol=1, labelspacing=0.)

imblearn/combine/smote_enn.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ class SMOTEENN(SamplerMixin):
     -----
     The method is presented in [1]_.

-    Supports mutli-class resampling. Refer to SMOTE and ENN regarding the
+    Supports multi-class resampling. Refer to SMOTE and ENN regarding the
     scheme which used.

     See :ref:`sphx_glr_auto_examples_combine_plot_smote_enn.py` and

imblearn/combine/smote_tomek.py

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ class SMOTETomek(SamplerMixin):
     -----
     The methos is presented in [1]_.

-    Supports mutli-class resampling. Refer to SMOTE and TomekLinks regarding
+    Supports multi-class resampling. Refer to SMOTE and TomekLinks regarding
     the scheme which used.

     See :ref:`sphx_glr_auto_examples_combine_plot_smote_tomek.py` and

imblearn/ensemble/balance_cascade.py

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ class BalanceCascade(BaseEnsembleSampler):
     -----
     The method is described in [1]_.

-    Supports mutli-class resampling. A one-vs.-rest scheme is used as
+    Supports multi-class resampling. A one-vs.-rest scheme is used as
     originally proposed in [1]_.

     See :ref:`sphx_glr_auto_examples_ensemble_plot_balance_cascade.py`.

imblearn/ensemble/easy_ensemble.py

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ class EasyEnsemble(BaseEnsembleSampler):
     -----
     The method is described in [1]_.

-    Supports mutli-class resampling by sampling each class independently.
+    Supports multi-class resampling by sampling each class independently.

     See :ref:`sphx_glr_auto_examples_ensemble_plot_easy_ensemble.py`.

imblearn/over_sampling/__init__.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,5 +6,8 @@
66
from .adasyn import ADASYN
77
from .random_over_sampler import RandomOverSampler
88
from .smote import SMOTE
9+
from .smote import BorderlineSMOTE
10+
from .smote import SVMSMOTE
911

10-
__all__ = ['ADASYN', 'RandomOverSampler', 'SMOTE']
12+
__all__ = ['ADASYN', 'RandomOverSampler',
13+
'SMOTE', 'BorderlineSMOTE', 'SVMSMOTE']
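The ``__all__`` update above controls what ``from imblearn.over_sampling import *`` exposes. A self-contained sketch of that mechanism using a throwaway module (the module and attribute names here are illustrative, not imblearn's):

```python
import sys
import types

# Build a stand-in module mirroring the __all__ shape in the diff above.
mod = types.ModuleType("oversampling_demo")
mod.SMOTE = type("SMOTE", (), {})
mod.BorderlineSMOTE = type("BorderlineSMOTE", (), {})
mod.SVMSMOTE = type("SVMSMOTE", (), {})
mod._helper = object()  # present in the module but deliberately not exported
mod.__all__ = ['SMOTE', 'BorderlineSMOTE', 'SVMSMOTE']
sys.modules["oversampling_demo"] = mod

# Star-import binds exactly the names listed in __all__.
ns = {}
exec("from oversampling_demo import *", ns)
exported = sorted(k for k in ns if not k.startswith('__'))
```

Without an ``__all__`` entry, ``BorderlineSMOTE`` and ``SVMSMOTE`` would be importable by name but invisible to star-imports, which is why the diff extends the list alongside the new imports.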

imblearn/over_sampling/adasyn.py

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ class ADASYN(BaseOverSampler):
     -----
     The implementation is based on [1]_.

-    Supports mutli-class resampling. A one-vs.-rest scheme is used.
+    Supports multi-class resampling. A one-vs.-rest scheme is used.

     See
     :ref:`sphx_glr_auto_examples_applications_plot_over_sampling_benchmark_lfw.py`,

imblearn/over_sampling/random_over_sampler.py

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ class RandomOverSampler(BaseOverSampler):

     Notes
     -----
-    Supports mutli-class resampling by sampling each class independently.
+    Supports multi-class resampling by sampling each class independently.

     See
     :ref:`sphx_glr_auto_examples_over-sampling_plot_comparison_over_sampling.py`,
