[MRG] Refactor ratio to pick up any class #290

glemaitre · 2017-05-08T22:04:39Z

Reference Issue

Fixes #121

What does this implement/fix? Explain your changes.

Refactor the ratio parameter

TODO:

Any other comments?

pep8speaks · 2017-05-08T22:04:52Z

Hello @glemaitre! Thanks for updating the PR.

In the file doc/conf.py, following are the PEP8 issues :

Line 30:1: E722 do not use bare except'
Line 41:1: E402 module level import not at top of file
Line 309:80: E501 line too long (86 > 79 characters)
Line 335:80: E501 line too long (84 > 79 characters)

In the file imblearn/utils/tests/test_estimator_checks.py, following are the PEP8 issues :

Line 142:5: E722 do not use bare except'

Comment last updated on May 30, 2017 at 22:15 Hours UTC

chkoar · 2017-05-09T08:59:12Z

imblearn/utils/validation.py

+            raise ValueError("When 'ratio' is a string, it needs to be one of"
+                             " {}. Got '{}' instead.".format(RATIO_KIND,
+                                                             ratio))
+        if ratio == 'all' or ratio == 'auto':


IMHO we could avoid those if-elses by using a dict.
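A minimal sketch of the dict-based dispatch suggested here, assuming each supported string maps to a function that builds a {class: n_samples} dictionary. The names `_ratio_all`, `_ratio_minority`, `RATIO_FUNCTIONS` and `check_ratio_string`, as well as the exact meaning given to each kind, are illustrative assumptions and not the code of this PR.

```python
from collections import Counter


def _ratio_all(y):
    """Hypothetical helper: target every class at the majority-class size."""
    counts = Counter(y)
    n_max = max(counts.values())
    return {label: n_max for label in counts}


def _ratio_minority(y):
    """Hypothetical helper: target only the smallest class."""
    counts = Counter(y)
    n_max = max(counts.values())
    minority = min(counts, key=counts.get)
    return {minority: n_max}


# 'auto' is treated as an alias of 'all', so both keys share one function.
RATIO_FUNCTIONS = {
    'all': _ratio_all,
    'auto': _ratio_all,
    'minority': _ratio_minority,
}


def check_ratio_string(ratio, y):
    """Look the string up in the dict instead of chaining if/elif tests."""
    if ratio not in RATIO_FUNCTIONS:
        raise ValueError("When 'ratio' is a string, it needs to be one of"
                         " {}. Got '{}' instead.".format(
                             sorted(RATIO_FUNCTIONS), ratio))
    return RATIO_FUNCTIONS[ratio](y)
```

Adding a new kind then only requires a new entry in the dictionary, and the list of valid strings in the error message stays in sync with it.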

chkoar · 2017-05-14T15:38:50Z

That's a big PR, @glemaitre!

glemaitre · 2017-05-14T15:45:22Z

@chkoar Yes, indeed. I am fixing the tests right now.
Mainly, I still have to check the case where a function is given for the ratio, but we should be almost done code-wise. I still have to go through all the docstrings.

I get why we had trouble getting this out. It required a lot of work.

glemaitre · 2017-05-19T12:30:35Z

@chkoar @massich I think the PR is ready.

I removed the multiclass and binary mixins since we only have multiclass from now on.
I think that any new integration should be multiclass only.

I still have the issue with the dictionary for the cleaning methods, though.

glemaitre · 2017-05-19T12:31:33Z

It would also be good to move fast on this one, since any further development will benefit from those changes.

glemaitre · 2017-05-19T12:41:38Z

Hint: for a faster review, focus on the base classes and the mixin. All methods were adapted to use those, and there are no real changes apart from making some of them multiclass.

chkoar · 2017-05-22T00:19:56Z

imblearn/under_sampling/base.py

+        X, y = check_X_y(X, y)
+        y = check_target_type(y)
+        self.X_hash_, self.y_hash_ = hash_X_y(X, y)
+        self.ratio_ = check_ratio(self.ratio, y, 'cleaning-sampling')


clean-sampling might be more appropriate, no?

chkoar · 2017-05-22T00:33:42Z

This review will take some time!

BaseUnderSampler, BaseCleaningSampler and BaseOverSampler have the same __init__ and almost the same fit. We could use a parent class derived from SamplerMixin, let's say BaseSampler. In fit, the third argument of check_ratio would be a class variable, which is accessible from the instance. So, in the base class we would have self.ratio_ = check_ratio(self.ratio, y, self.clean_kind). Thus, each child class only has to define the clean_kind class variable.

P.S. I proposed class variables in order to avoid overriding __init__. You can access class-level variables from self.
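A rough sketch of that proposal, under stated assumptions: `check_ratio` and `SamplerMixin` are replaced by trivial stand-ins so the example runs on its own, and the strings 'under-sampling' and 'over-sampling' are guesses made for illustration (only 'clean-sampling' is taken from the review above).

```python
def check_ratio(ratio, y, ratio_type):
    """Stand-in for the real check_ratio: only records what it was given."""
    return {'ratio': ratio, 'kind': ratio_type, 'classes': sorted(set(y))}


class SamplerMixin(object):
    """Placeholder for the existing mixin."""


class BaseSampler(SamplerMixin):
    # Class-level default; subclasses override this instead of __init__.
    clean_kind = None

    def __init__(self, ratio='auto'):
        self.ratio = ratio

    def fit(self, X, y):
        # Attribute lookup on self falls back to the class variable
        # defined by the concrete subclass.
        self.ratio_ = check_ratio(self.ratio, y, self.clean_kind)
        return self


class BaseUnderSampler(BaseSampler):
    clean_kind = 'under-sampling'  # assumed value


class BaseOverSampler(BaseSampler):
    clean_kind = 'over-sampling'  # assumed value


class BaseCleaningSampler(BaseSampler):
    clean_kind = 'clean-sampling'
```

With this layout, `BaseCleaningSampler().fit(X, y)` passes 'clean-sampling' to `check_ratio` without any of the three subclasses redefining `__init__` or `fit`.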

glemaitre · 2017-05-22T04:15:56Z

I agree with those. I would also deprecate the combine class.

glemaitre · 2017-05-24T20:48:29Z

@massich @chkoar anything else?

chkoar · 2017-05-25T06:04:39Z

@glemaitre I will scan it this weekend!

chkoar · 2017-05-29T08:10:47Z

imblearn/base.py

-        if not type_of_target(y) == 'binary':
-            warnings.simplefilter('always', UserWarning)
-            warnings.warn('The target type should be binary.')
+class BaseSampler(SamplerMixin):


Couldn't we merge BaseSampler and SamplerMixin into BaseSampler?

I am not in favor. I would like the samplers to have a SamplerMixin, in the same way as there is a ClassifierMixin or a TransformerMixin.

chkoar · 2017-05-29T08:13:45Z

imblearn/base.py

-        -------
-        self : object,
-            Return self.
+    def _validate_size_ngh_deprecation(self):


I think that we could turn all these _validate_* methods into module-level functions that take the estimator as an argument, to keep the class clean. That way, it will be easier to read for any newcomer.

Yep, I agree with that.

chkoar · 2017-05-29T08:22:15Z

imblearn/utils/validation.py

+    y_hash: str
+        Hash identifier of the ``y`` matrix.
+    """
+    return joblib.hash(X), joblib.hash(y)


Is this fast for big arrays? Is it worth it compared with the naive check of the array shape?

It needs to be profiled, but I think that this is too slow. We could randomly pick some samples, which should be better than checking the shape. But I am still unsure about that part.
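The sub-sampling idea could be sketched as follows. This is an assumption-laden illustration: `_hash` is a standard-library stand-in for `joblib.hash` (MD5 of the pickled object), and `hash_X_y_subsampled`, `n_samples` and `seed` are hypothetical names that do not exist in the PR.

```python
import hashlib
import pickle
import random


def _hash(obj):
    """Stdlib stand-in for joblib.hash: MD5 digest of the pickled object."""
    return hashlib.md5(pickle.dumps(obj, protocol=2)).hexdigest()


def hash_X_y_subsampled(X, y, n_samples=1000, seed=0):
    """Hash at most ``n_samples`` rows of X and y, picked with a fixed seed."""
    n_rows = len(X)
    if n_rows > n_samples:
        # A fixed seed makes fit and sample look at the same rows.
        rng = random.Random(seed)
        idx = sorted(rng.sample(range(n_rows), n_samples))
    else:
        idx = range(n_rows)
    X_sub = [X[i] for i in idx]
    y_sub = [y[i] for i in idx]
    # Including the length also covers the naive shape check.
    return _hash((n_rows, X_sub)), _hash((n_rows, y_sub))
```

The trade-off is that a change in a row outside the sample goes undetected, so this sits between the shape check (cheap, weak) and the full hash (costly, strict). Whether it is fast enough would still need profiling.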

glemaitre · 2017-05-30T23:05:59Z

@massich @chkoar Does anybody know how to optimize this function:
https://github.com/scikit-learn-contrib/imbalanced-learn/pull/290/files#diff-d928243d0cfeca2e8bd7d6084e5018c6R244

without cythonizing at first?

massich · 2017-06-08T18:00:24Z

LGTM

chkoar · 2017-06-08T18:10:09Z

@massich @chkoar Does anybody know how to optimize this function:
https://github.com/scikit-learn-contrib/imbalanced-learn/pull/290/files#diff-d928243d0cfeca2e8bd7d6084e5018c6R244
without cythonizing at first?

What was the progress on that?

chkoar · 2017-06-08T18:10:18Z

In general, this PR should have been split. Since it hasn't been, and the tests are passing, I propose to merge the changes. @glemaitre you put a lot of effort into this. We can improve it or fix any bug from this PR in the future. Thanks!

* EHN enable multiclass ratio handling
* FIX simplify call to dictionary
* FIX RUS done
* FIX Refactor ADASYN
* FIX partial
* FIX refactor SMOTE
* FIX refactor SMOTE
* DOC add proper docstring
* PEP8
* FIX ClusterCentroids
* FIX refactor IHT
* FIX Nearmiss refactoring
* FIX tomek links refactor
* FIX refactor OSS
* FIX NCR refactoring
* FIX refactor combined methods with Pipeline
* FIX combine method targetting all classes when cleaning
* FIX balance cascade refactoring
* EHN add the possibility to add a dict for ratio
* TST add test for check_ratio
* TST add test for float
* FIX/TST adapt common test
* TST fix IHT tests
* TST fix NCR
* FIX combine test
* TST fix balance
* FIX doctest
* FIX doctest
* FIX solve the pickle issue
* FIX remove comments
* TST add test for NCR
* TST add knn balance cascade
* EHN add callable option for the ratio
* DOC make doc cleaner
* FIX/DOC remove useless comments and clean doc
* DEP deprecation of ratio as float
* EHN add base class for cleaning methods
* TST add common test for multi class
* MAINT downgrade sphinx for the moment
* TST/EHN add test for the ratio and specific ratio for cleaning sampling
* EHN remove redundant code
* FIX warning
* Remove useless base class
* MAINT add christos back to some file
* EHN rename test and add a comment
* DOC add hash_X_y in the API
* [MRG] Incorporate chkoar remarks (#6)
  * change cleaning-sampler to clean-sampler
  * Refactor the over_sampling
  * [WIP] adapt ensamble class
* [MRG] Remove the init in base class (#7)
  * change cleaning-sampler to clean-sampler
  * Refactor the over_sampling
  * [WIP] adapt ensamble class
  * iterate
  * fix PEP8
* EHN doc
* FIX add extension for sphinx
* EHN make deprecatin great again
* EHN Improve SMOTE and ADASYN

EHN enable multiclass ratio handling

4f87df5

glemaitre added 2 commits May 9, 2017 00:13

FIX simplify call to dictionary

4e528ec

FIX RUS done

4996bbb

chkoar reviewed May 9, 2017

glemaitre added 20 commits May 9, 2017 16:58

FIX Refactor ADASYN

3f5dffa

FIX partial

87a630d

FIX refactor SMOTE

9825317

FIX refactor SMOTE

b7021fc

DOC add proper docstring

8a010fb

PEP8

f573af3

FIX ClusterCentroids

a85dcff

FIX refactor IHT

cabf202

FIX Nearmiss refactoring

d2539b1

FIX tomek links refactor

0ee50c1

FIX refactor OSS

118af0e

FIX NCR refactoring

96b102e

FIX refactor combined methods with Pipeline

8ecfd88

FIX combine method targetting all classes when cleaning

d4b9c3e

FIX balance cascade refactoring

f5303ca

EHN add the possibility to add a dict for ratio

0e93429

TST add test for check_ratio

38fe8ca

TST add test for float

7f076cf

FIX/TST adapt common test

d89c12d

TST fix IHT tests

039420b

glemaitre added 3 commits May 14, 2017 18:02

TST fix NCR

a31c0e1

FIX combine test

02be5f5

TST fix balance

6fba010

glemaitre mentioned this pull request May 18, 2017

Cannot suppress warnings #291

Closed

Remove useless base class

15c158c

MAINT add christos back to some file

834de2f

glemaitre added 2 commits May 19, 2017 15:05

EHN rename test and add a comment

db22871

DOC add hash_X_y in the API

38708e6

glemaitre changed the title ~~[MRG] Refactor ratio to pick up any class~~ [WIP] Refactor ratio to pick up any class May 21, 2017

chkoar reviewed May 22, 2017

massich added 2 commits May 22, 2017 14:45

[MRG] Incorporate chkoar remarks (#6)

8e59009

* change cleaning-sampler to clean-sampler * Refactor the over_sampling * [WIP] adapt ensamble class

[MRG] Remove the init in base class (#7)

18f726c

* change cleaning-sampler to clean-sampler * Refactor the over_sampling * [WIP] adapt ensamble class * iterate * fix PEP8

glemaitre added 3 commits May 26, 2017 13:25

EHN doc

19f423a

Merge branch 'is/121' of github.com:glemaitre/imbalanced-learn into i…

9f3cfbd

…s/121

FIX add extension for sphinx

e4892ac

chkoar reviewed May 29, 2017

glemaitre added 2 commits May 30, 2017 17:56

EHN make deprecatin great again

7aef770

EHN Improve SMOTE and ADASYN

c5ab8b9

glemaitre changed the title ~~[WIP] Refactor ratio to pick up any class~~ [MRG] Refactor ratio to pick up any class May 30, 2017

glemaitre merged commit c2c6565 into scikit-learn-contrib:master Jun 10, 2017
