
[MRG+1] Pipeline Refactor - Reduce Code Footprint #654


Merged (5 commits) Dec 5, 2019

Conversation

@MattEding (Contributor) commented Dec 2, 2019

Changes/Fixes

Removes much code duplication in imblearn.pipeline.Pipeline, since much of
it can be inherited from sklearn.pipeline.Pipeline. This reduces the need to
adapt to potential future changes in sklearn.pipeline.Pipeline's
implementation, increases code maintainability, and reduces the code footprint.
I also believe it fixes some misleading docstrings--see below.

Notes

The only difference from sklearn's implementation for each of these
methods is the if hasattr(transform, "fit_resample"): pass branch and the
docstring starting with "Apply transformers/samplers, and ..." instead
of "Apply transforms, and ...". I don't think this docstring difference
should exist, since none of these methods actually do
resampling; rather, they explicitly skip over it. Also, not all of these
methods used the new wording. Skipping over samplers is simple to do via
inheritance by redefining
def _iter(self, with_final=True, filter_passthrough=True, filter_resample=True).
The default value allows reuse of the sklearn.Pipeline inherited definitions,
while requiring a single change to _fit to call it with filter_resample=False.

Sklearn

@if_delegate_has_method(delegate="_final_estimator")
def METHOD(self, X, **kwds):
    """Apply transforms, and ..."""

    Xt = X
    for _, _, transform in self._iter(with_final=False):
        Xt = transform.transform(Xt)
    return self.steps[-1][-1].METHOD(Xt, **kwds)

Imblearn

@if_delegate_has_method(delegate="_final_estimator")
def METHOD(self, X, **kwds):
    """Apply transformers/samplers, and ..."""

    Xt = X
    for _, _, transform in self._iter(with_final=False):
        if hasattr(transform, "fit_resample"):
            pass
        else:
            Xt = transform.transform(Xt)
    return self.steps[-1][-1].METHOD(Xt, **kwds)
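The _iter override described in the notes above can be sketched as follows. To keep the example self-contained, BasePipeline is a stand-in for the relevant part of sklearn.pipeline.Pipeline._iter, and the Sampler/Scaler step classes are hypothetical placeholders, not the actual imblearn implementation:

```python
class BasePipeline:
    """Stand-in for sklearn.pipeline.Pipeline's _iter behavior."""

    def __init__(self, steps):
        self.steps = steps

    def _iter(self, with_final=True, filter_passthrough=True):
        # Yield (index, name, transformer), optionally excluding the
        # final estimator and any 'passthrough' steps.
        stop = len(self.steps) if with_final else len(self.steps) - 1
        for idx, (name, trans) in enumerate(self.steps[:stop]):
            if filter_passthrough and trans in (None, "passthrough"):
                continue
            yield idx, name, trans


class Pipeline(BasePipeline):
    def _iter(self, with_final=True, filter_passthrough=True,
              filter_resample=True):
        # Reuse the parent iterator; by default, skip samplers (steps
        # exposing fit_resample) so the inherited predict/transform/score
        # definitions work unchanged. _fit would pass
        # filter_resample=False to see the samplers during fitting.
        it = super()._iter(with_final, filter_passthrough)
        if filter_resample:
            it = (t for t in it if not hasattr(t[2], "fit_resample"))
        return it


class Sampler:  # hypothetical sampler step
    def fit_resample(self, X, y):
        return X, y


class Scaler:  # hypothetical transformer step
    def transform(self, X):
        return X


pipe = Pipeline([("sampler", Sampler()), ("scaler", Scaler()),
                 ("clf", object())])
# Samplers are filtered out by default, so inherited methods only see
# the transformers.
names = [name for _, name, _ in pipe._iter(with_final=False)]
```

With the default filter_resample=True, only "scaler" is yielded here; calling pipe._iter(with_final=False, filter_resample=False) yields both "sampler" and "scaler", which is the behavior _fit needs.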

Has "/samplers" in docstring

  • predict
  • predict_proba
  • decision_function
  • predict_log_proba
  • transform
  • score

Omits "/samplers" in docstring

  • score_samples
  • inverse_transform

Duplicate definitions

  • score_samples was redefined twice
  • _fit_transform_one was identical to sklearn's

@codecov
codecov bot commented Dec 2, 2019

Codecov Report

Merging #654 into master will decrease coverage by 2.06%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master    #654      +/-   ##
=========================================
- Coverage   98.46%   96.4%   -2.07%     
=========================================
  Files          82      82              
  Lines        4886    4807      -79     
=========================================
- Hits         4811    4634     -177     
- Misses         75     173      +98
Impacted Files Coverage Δ
imblearn/pipeline.py 93.93% <100%> (+1.8%) ⬆️
imblearn/keras/tests/test_generator.py 8.62% <0%> (-91.38%) ⬇️
imblearn/tensorflow/_generator.py 30% <0%> (-70%) ⬇️
imblearn/keras/_generator.py 50.74% <0%> (-47.77%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3839df1...a9d7909

@lgtm-com
lgtm-com bot commented Dec 2, 2019

This pull request fixes 1 alert when merging d4e7aea into 3839df1 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@@ -167,6 +167,15 @@ def _validate_steps(self):
% (estimator, type(estimator))
)

def _iter(
Member:
At first I thought that we might need tests for this, but since the existing pipeline tests are passing it might just be a sanity check. So it's ok.

@chkoar chkoar changed the title [MRG] Pipeline Refactor - Reduce Code Footprint [MRG+1] Pipeline Refactor - Reduce Code Footprint Dec 2, 2019
@chkoar
Member

chkoar commented Dec 2, 2019

Thanks @MattEding. At first pass it seems ok. Since we delete so much code and do not break any tests, I do not think there is a reason not to merge it.

@pep8speaks
pep8speaks commented Dec 5, 2019

Hello @MattEding! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-05 13:07:54 UTC

@glemaitre glemaitre self-assigned this Dec 5, 2019
@glemaitre

This is nice. I had not noticed the introduction of _iter in scikit-learn's Pipeline. This makes things much more maintainable now.

@glemaitre glemaitre merged commit bea1915 into scikit-learn-contrib:master Dec 5, 2019
@lgtm-com
lgtm-com bot commented Dec 5, 2019

This pull request fixes 3 alerts when merging a9d7909 into 3839df1 - view on LGTM.com

fixed alerts:

  • 2 for Wrong name for an argument in a call
  • 1 for Unused import

@MattEding MattEding deleted the pipeline branch December 5, 2019 15:53
Labels: none yet
Projects: none yet
4 participants