[MRG] Reorganise under-sampling methods #277

Merged
merged 8 commits into from
Apr 29, 2017
45 changes: 22 additions & 23 deletions doc/api.rst
@@ -13,14 +13,34 @@ Under-sampling methods
:no-members:
:no-inherited-members:

Classes
-------
.. currentmodule:: imblearn

Prototype generation
--------------------

.. automodule:: imblearn.under_sampling.prototype_generation
:no-members:
:no-inherited-members:

.. currentmodule:: imblearn

.. autosummary::
:toctree: generated/

under_sampling.ClusterCentroids

Prototype selection
-------------------

.. automodule:: imblearn.under_sampling.prototype_selection
:no-members:
:no-inherited-members:

.. currentmodule:: imblearn

.. autosummary::
:toctree: generated/

under_sampling.CondensedNearestNeighbour
under_sampling.EditedNearestNeighbours
under_sampling.RepeatedEditedNearestNeighbours
@@ -32,7 +52,6 @@ Classes
under_sampling.RandomUnderSampler
under_sampling.TomekLinks


.. _over_sampling_ref:

Over-sampling methods
@@ -42,8 +61,6 @@ Over-sampling methods
:no-members:
:no-inherited-members:

Classes
-------
.. currentmodule:: imblearn

.. autosummary::
@@ -63,8 +80,6 @@ Combination of over- and under-sampling methods
:no-members:
:no-inherited-members:

Classes
-------
.. currentmodule:: imblearn

.. autosummary::
@@ -83,8 +98,6 @@ Ensemble methods
:no-members:
:no-inherited-members:

Classes
-------
.. currentmodule:: imblearn

.. autosummary::
@@ -105,18 +118,10 @@ Pipeline

.. currentmodule:: imblearn

Classes
-------
.. autosummary::
:toctree: generated/

pipeline.Pipeline

Functions
---------
.. autosummary::
:toctree: generated/

pipeline.make_pipeline

.. _metrics_ref:
@@ -130,8 +135,6 @@ Metrics

.. currentmodule:: imblearn

Functions
---------
.. autosummary::
:toctree: generated/

@@ -152,8 +155,6 @@ Datasets

.. currentmodule:: imblearn

Functions
---------
.. autosummary::
:toctree: generated/

@@ -169,8 +170,6 @@ Utilities

.. currentmodule:: imblearn

Functions
---------
.. autosummary::
:toctree: generated/

3 changes: 3 additions & 0 deletions doc/whats_new.rst
@@ -42,6 +42,9 @@ API changes summary
errors. By `Guillaume Lemaitre`_.
- creation of a module `utils.validation` to centralise checks for
recurring patterns. By `Guillaume Lemaitre`_.
- move the under-sampling methods into the `prototype_selection` and
`prototype_generation` submodules to make a clearer distinction. By
`Guillaume Lemaitre`_.


.. _changes_0_2:
23 changes: 12 additions & 11 deletions imblearn/under_sampling/__init__.py
@@ -3,17 +3,18 @@
a dataset.
"""

-from .random_under_sampler import RandomUnderSampler
-from .tomek_links import TomekLinks
-from .cluster_centroids import ClusterCentroids
-from .nearmiss import NearMiss
-from .condensed_nearest_neighbour import CondensedNearestNeighbour
-from .one_sided_selection import OneSidedSelection
-from .neighbourhood_cleaning_rule import NeighbourhoodCleaningRule
-from .edited_nearest_neighbours import EditedNearestNeighbours
-from .edited_nearest_neighbours import RepeatedEditedNearestNeighbours
-from .edited_nearest_neighbours import AllKNN
-from .instance_hardness_threshold import InstanceHardnessThreshold
+from .prototype_generation import ClusterCentroids
+
+from .prototype_selection import RandomUnderSampler
+from .prototype_selection import TomekLinks
+from .prototype_selection import NearMiss
+from .prototype_selection import CondensedNearestNeighbour
+from .prototype_selection import OneSidedSelection
+from .prototype_selection import NeighbourhoodCleaningRule
+from .prototype_selection import EditedNearestNeighbours
+from .prototype_selection import RepeatedEditedNearestNeighbours
+from .prototype_selection import AllKNN
+from .prototype_selection import InstanceHardnessThreshold

__all__ = [
'RandomUnderSampler', 'TomekLinks', 'ClusterCentroids', 'NearMiss',
10 changes: 10 additions & 0 deletions imblearn/under_sampling/prototype_generation/__init__.py
@@ -0,0 +1,10 @@
"""
The :mod:`imblearn.under_sampling.prototype_generation` submodule contains
methods that generate new samples in order to balance the dataset.
"""

from .cluster_centroids import ClusterCentroids

__all__ = [
'ClusterCentroids'
]
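
The re-export pattern used here can be tried without imbalanced-learn installed. The sketch below is only an illustration, not the project's code: it writes a throwaway package into a temporary directory that mirrors the new layout, then checks that the class reachable from `under_sampling` is the same object as the one defined in the `prototype_generation` submodule. The package name `toy_imblearn`, the `BaseSampler` class and the `build_and_import` helper are hypothetical names introduced for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical toy package; it only mirrors the directory layout of this PR.
FILES = {
    "toy_imblearn/__init__.py": "",
    "toy_imblearn/base.py": "class BaseSampler:\n    pass\n",
    "toy_imblearn/under_sampling/__init__.py": (
        "from .prototype_generation import ClusterCentroids\n"
        "__all__ = ['ClusterCentroids']\n"
    ),
    "toy_imblearn/under_sampling/prototype_generation/__init__.py": (
        "from .cluster_centroids import ClusterCentroids\n"
        "__all__ = ['ClusterCentroids']\n"
    ),
    "toy_imblearn/under_sampling/prototype_generation/cluster_centroids.py": (
        "from ...base import BaseSampler\n"
        "\n"
        "class ClusterCentroids(BaseSampler):\n"
        "    pass\n"
    ),
}


def build_and_import(root):
    """Write the toy package under `root` and import its two public levels."""
    for rel_path, source in FILES.items():
        path = Path(root) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        top = importlib.import_module("toy_imblearn.under_sampling")
        sub = importlib.import_module(
            "toy_imblearn.under_sampling.prototype_generation"
        )
    finally:
        sys.path.remove(str(root))
    return top, sub


with tempfile.TemporaryDirectory() as tmp:
    top_level, submodule = build_and_import(tmp)

# Both import paths lead to one and the same class object.
same_class = top_level.ClusterCentroids is submodule.ClusterCentroids
print(same_class)  # True
```

Because the parent `__init__.py` re-exports the class, code that imported `ClusterCentroids` from `under_sampling` before the move should keep working after it.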
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
import numpy as np
from sklearn.cluster import KMeans

-from ..base import BaseMulticlassSampler
+from ...base import BaseMulticlassSampler


class ClusterCentroids(BaseMulticlassSampler):
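
The extra dot in the relative import follows from the file now living one package level deeper. As a rough check, the standard-library function `importlib.util.resolve_name` shows what each spelling resolves to. The package strings below are inferred from the paths in this PR, and the variable names are illustrative only.

```python
from importlib.util import resolve_name

# __package__ of the module before and after the move (inferred from the PR paths).
OLD_PACKAGE = "imblearn.under_sampling"
NEW_PACKAGE = "imblearn.under_sampling.prototype_generation"

# Two dots from the old location climb to the top-level package.
old_target = resolve_name("..base", OLD_PACKAGE)

# Two dots from the new location stop one level short.
stale_target = resolve_name("..base", NEW_PACKAGE)

# Three dots from the new location reach the same module as before.
new_target = resolve_name("...base", NEW_PACKAGE)

print(old_target)    # imblearn.base
print(stale_target)  # imblearn.under_sampling.base
print(new_target)    # imblearn.base
```

The same reasoning applies to the `..utils` imports in the other moved files, while a sibling import such as `.tomek_links` needs no change because both modules moved together.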
22 changes: 22 additions & 0 deletions imblearn/under_sampling/prototype_selection/__init__.py
@@ -0,0 +1,22 @@
"""
The :mod:`imblearn.under_sampling.prototype_selection` submodule contains
methods that select samples in order to balance the dataset.
"""

from .random_under_sampler import RandomUnderSampler
from .tomek_links import TomekLinks
from .nearmiss import NearMiss
from .condensed_nearest_neighbour import CondensedNearestNeighbour
from .one_sided_selection import OneSidedSelection
from .neighbourhood_cleaning_rule import NeighbourhoodCleaningRule
from .edited_nearest_neighbours import EditedNearestNeighbours
from .edited_nearest_neighbours import RepeatedEditedNearestNeighbours
from .edited_nearest_neighbours import AllKNN
from .instance_hardness_threshold import InstanceHardnessThreshold

__all__ = [
'RandomUnderSampler', 'TomekLinks', 'NearMiss',
'CondensedNearestNeighbour', 'OneSidedSelection',
'NeighbourhoodCleaningRule', 'EditedNearestNeighbours',
'RepeatedEditedNearestNeighbours', 'AllKNN', 'InstanceHardnessThreshold'
]
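
The two docstrings draw the line between selecting existing samples and generating new ones. A toy sketch of that distinction, assuming plain tuples as samples, is shown below. It is not how imbalanced-learn implements either family: a single coordinate-wise mean stands in for the KMeans centroids used by `ClusterCentroids`, and the function names `select_prototypes` and `generate_prototype` are hypothetical.

```python
import random


def select_prototypes(samples, n_keep, seed=0):
    """Prototype selection: keep a random subset of the original samples."""
    rng = random.Random(seed)
    return rng.sample(samples, n_keep)


def generate_prototype(samples):
    """Prototype generation: build one new sample, the coordinate-wise mean."""
    n_samples = len(samples)
    return tuple(sum(coords) / n_samples for coords in zip(*samples))


majority = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

selected = select_prototypes(majority, n_keep=2)
centroid = generate_prototype(majority)

# Selected samples are always rows of the original data.
print(all(sample in majority for sample in selected))  # True
# The generated sample need not exist in the original data.
print(centroid, centroid in majority)  # (1.0, 1.0) False
```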
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils import check_random_state

-from ..base import BaseMulticlassSampler
+from ...base import BaseMulticlassSampler


class CondensedNearestNeighbour(BaseMulticlassSampler):
Original file line number Diff line number Diff line change
@@ -14,8 +14,8 @@
import numpy as np
from scipy.stats import mode

-from ..base import BaseMulticlassSampler
-from ..utils import check_neighbors_object
+from ...base import BaseMulticlassSampler
+from ...utils import check_neighbors_object

SEL_KIND = ('all', 'mode')
Original file line number Diff line number Diff line change
@@ -17,7 +17,7 @@
from sklearn.ensemble import RandomForestClassifier
from sklearn.externals.six import string_types

-from ..base import BaseBinarySampler
+from ...base import BaseBinarySampler


def _get_cv_splits(X, y, cv, random_state):
Original file line number Diff line number Diff line change
@@ -11,8 +11,8 @@

import numpy as np

-from ..base import BaseMulticlassSampler
-from ..utils import check_neighbors_object
+from ...base import BaseMulticlassSampler
+from ...utils import check_neighbors_object


class NearMiss(BaseMulticlassSampler):
Original file line number Diff line number Diff line change
@@ -10,8 +10,8 @@

import numpy as np

-from ..base import BaseMulticlassSampler
-from ..utils import check_neighbors_object
+from ...base import BaseMulticlassSampler
+from ...utils import check_neighbors_object


class NeighbourhoodCleaningRule(BaseMulticlassSampler):
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.utils import check_random_state

-from ..base import BaseBinarySampler
+from ...base import BaseBinarySampler
from .tomek_links import TomekLinks
Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@
import numpy as np
from sklearn.utils import check_random_state

-from ..base import BaseMulticlassSampler
+from ...base import BaseMulticlassSampler


class RandomUnderSampler(BaseMulticlassSampler):
Empty file.
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@
import numpy as np
from sklearn.neighbors import NearestNeighbors

-from ..base import BaseBinarySampler
+from ...base import BaseBinarySampler


class TomekLinks(BaseBinarySampler):