
[MRG+1] Pipeline Refactor - Reduce Code Footprint #654


Merged (5 commits) Dec 5, 2019

Conversation

@MattEding (Contributor) commented Dec 2, 2019

Changes/Fixes

Removes much code duplication in imblearn.pipeline.Pipeline, since much of
it can be inherited from sklearn.pipeline.Pipeline. This reduces the need to
adapt to potential future changes in sklearn.pipeline.Pipeline's
implementation, increases code maintainability, and reduces the code footprint.
I also believe it fixes some misleading docstrings--see below.

Notes

The only difference from sklearn's implementation for each of these
methods is the if hasattr(transform, "fit_resample"): pass branch and the
docstring starting with "Apply transformers/samplers, and ..." instead
of "Apply transforms, and ...". I don't think this docstring difference
should exist, since none of these methods actually do
resampling; rather, they explicitly skip over it. Also, not all of these
methods used the new wording. Skipping over samplers is simple to do via
inheritance by redefining
def _iter(self, with_final=True, filter_passthrough=True, filter_resample=True).
The default value allows reuse of the sklearn.Pipeline inherited definitions,
while requiring a single change to _fit to call it with filter_resample=False.

Sklearn

@if_delegate_has_method(delegate="_final_estimator")
def METHOD(self, X, **kwds):
    """Apply transforms, and ..."""

    Xt = X
    for _, _, transform in self._iter(with_final=False):
        Xt = transform.transform(Xt)
    return self.steps[-1][-1].METHOD(Xt, **kwds)

Imblearn

@if_delegate_has_method(delegate="_final_estimator")
def METHOD(self, X, **kwds):
    """Apply transformers/samplers, and ..."""

    Xt = X
    for _, _, transform in self._iter(with_final=False):
        if hasattr(transform, "fit_resample"):
            pass
        else:
            Xt = transform.transform(Xt)
    return self.steps[-1][-1].METHOD(Xt, **kwds)
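The _iter override described in the notes above can be sketched as follows. To keep the example self-contained, BasePipeline is a stand-in for the relevant part of sklearn.pipeline.Pipeline._iter, and the Sampler/Scaler step classes are hypothetical placeholders, not the actual imblearn implementation:

```python
class BasePipeline:
    """Stand-in for sklearn.pipeline.Pipeline's _iter behavior."""

    def __init__(self, steps):
        self.steps = steps

    def _iter(self, with_final=True, filter_passthrough=True):
        # Yield (index, name, transformer), optionally excluding the
        # final estimator and any 'passthrough' steps.
        stop = len(self.steps) if with_final else len(self.steps) - 1
        for idx, (name, trans) in enumerate(self.steps[:stop]):
            if filter_passthrough and trans in (None, "passthrough"):
                continue
            yield idx, name, trans


class Pipeline(BasePipeline):
    def _iter(self, with_final=True, filter_passthrough=True,
              filter_resample=True):
        # Reuse the parent iterator; by default, skip samplers (steps
        # exposing fit_resample) so the inherited predict/transform/score
        # definitions work unchanged. _fit would pass
        # filter_resample=False to see the samplers during fitting.
        it = super()._iter(with_final, filter_passthrough)
        if filter_resample:
            it = (t for t in it if not hasattr(t[2], "fit_resample"))
        return it


class Sampler:  # hypothetical sampler step
    def fit_resample(self, X, y):
        return X, y


class Scaler:  # hypothetical transformer step
    def transform(self, X):
        return X


pipe = Pipeline([("sampler", Sampler()), ("scaler", Scaler()),
                 ("clf", object())])
# Samplers are filtered out by default, so inherited methods only see
# the transformers.
names = [name for _, name, _ in pipe._iter(with_final=False)]
```

With the default filter_resample=True, only "scaler" is yielded here; calling pipe._iter(with_final=False, filter_resample=False) yields both "sampler" and "scaler", which is the behavior _fit needs.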

Has "/samplers" in docstring

  • predict
  • predict_proba
  • decision_function
  • predict_log_proba
  • transform
  • score

Omits "/samplers" in docstring

  • score_samples
  • inverse_transform

Duplicate definitions

  • score_samples was redefined twice
  • _fit_transform_one was identical to sklearn's

@codecov
codecov bot commented Dec 2, 2019

Codecov Report

Merging #654 into master will decrease coverage by 2.06%.
The diff coverage is 100%.


@@            Coverage Diff            @@
##           master    #654      +/-   ##
=========================================
- Coverage   98.46%   96.4%   -2.07%     
=========================================
  Files          82      82              
  Lines        4886    4807      -79     
=========================================
- Hits         4811    4634     -177     
- Misses         75     173      +98
Impacted Files Coverage Δ
imblearn/pipeline.py 93.93% <100%> (+1.8%) ⬆️
imblearn/keras/tests/test_generator.py 8.62% <0%> (-91.38%) ⬇️
imblearn/tensorflow/_generator.py 30% <0%> (-70%) ⬇️
imblearn/keras/_generator.py 50.74% <0%> (-47.77%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3839df1...a9d7909

@lgtm-com
lgtm-com bot commented Dec 2, 2019

This pull request fixes 1 alert when merging d4e7aea into 3839df1 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@@ -167,6 +167,15 @@ def _validate_steps(self):
% (estimator, type(estimator))
)

def _iter(
Member:
At first I thought that we might need tests for this, but since the existing pipeline tests are passing it might just be a sanity check. So it's ok.

@chkoar chkoar changed the title [MRG] Pipeline Refactor - Reduce Code Footprint [MRG+1] Pipeline Refactor - Reduce Code Footprint Dec 2, 2019
@chkoar
Member

chkoar commented Dec 2, 2019

Thanks @MattEding. At first pass it seems ok. Since we delete so much code and do not break any tests, I do not think there is a reason not to merge it.

@pep8speaks
pep8speaks commented Dec 5, 2019

Hello @MattEding! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-05 13:07:54 UTC

@glemaitre glemaitre self-assigned this Dec 5, 2019
@glemaitre

This is nice. I had not noticed the introduction of _iter in scikit-learn's Pipeline. This makes things much more maintainable now.

@glemaitre glemaitre merged commit bea1915 into scikit-learn-contrib:master Dec 5, 2019
@lgtm-com
lgtm-com bot commented Dec 5, 2019

This pull request fixes 3 alerts when merging a9d7909 into 3839df1 - view on LGTM.com

fixed alerts:

  • 2 for Wrong name for an argument in a call
  • 1 for Unused import

@MattEding MattEding deleted the pipeline branch December 5, 2019 15:53
Labels: none yet
Projects: none yet
4 participants