[MRG] ENH: K-Means SMOTE implementation #435
Merged: glemaitre merged 42 commits into scikit-learn-contrib:master from StephanHeijl:kmeans-smote on Jun 12, 2019
Commits
dba11ac Initial K-Means SMOTE commit. (StephanHeijl)
d54ffc2 PEP8, PyFlakes fixes, corrected paper reference. (StephanHeijl)
4a9b990 Added examples. (StephanHeijl)
5dd0526 Added error when clustering fails to find a cluster with sufficient s… (StephanHeijl)
642e62e Added test for wrong hyperparameters (StephanHeijl)
fd663f1 Save an indexing operation if cluster_class_mean is insufficient. (StephanHeijl)
0ef982b Simplified vstack function call. (StephanHeijl)
4c37593 Resolved stacking error (StephanHeijl)
efb6a75 Added extra arguments for kmeans sampling, addressed suggestions by F… (StephanHeijl)
131e3b3 Resolved errors and warnings (StephanHeijl)
7de5951 Resolve PEP8 style issues (StephanHeijl)
7029266 Added special k-means cases and tests. (StephanHeijl)
7696e44 solve conflicts (glemaitre)
f99433b Merge remote-tracking branch 'origin/master' into StephanHeijl-kmeans… (glemaitre)
d65cea3 Merge remote-tracking branch 'origin/master' into pr/StephanHeijl/435 (glemaitre)
c5ab59c Removed KMeans specific code (StephanHeijl)
25e8ef7 Merge branch 'master' into kmeans-smote (StephanHeijl)
950df34 Restored KMeansSMOTE (StephanHeijl)
2851b7e Resolved KMeansSmote errors (StephanHeijl)
25cd90b Resolved python2.7 errors (StephanHeijl)
750decc improved code coverage (StephanHeijl)
b2b766d Resolved test error resulting from coverage improvement (StephanHeijl)
0358d0f Made custom kmeans test deterministic (StephanHeijl)
01f31d0 Removed superfluous check (StephanHeijl)
7aa6f86 Change test to use custom KMeans instance (MiniBatchKmeans was default) (StephanHeijl)
1f34912 Resolved PEP8 issues (StephanHeijl)
05d3f40 Merge branch 'kmeans-smote' of github.com:StephanHeijl/imbalanced-lea… (StephanHeijl)
6129fbf Fixed using the wrong variable name (StephanHeijl)
9537ec9 Fixed error in _make_samples call, resolves mediocre sample selection. (StephanHeijl)
b6fbca4 Updated KMeansSMOTE tests (StephanHeijl)
ca9b541 Clarified RuntimeError with solution to problem (StephanHeijl)
1b4dfd2 Adjusted documentation according to @chkoar's review. (StephanHeijl)
367f3a0 Slightly adjusted test to 'fail' for regular SMOTE. (StephanHeijl)
7d79475 Merge branch 'master' into kmeans-smote (StephanHeijl)
4a414c3 Fix expected print output (StephanHeijl)
d9fa137 Added ratio back to pass check_samplers_ratio_fit_resample test (StephanHeijl)
cf1b1fe Added KMeansSMOTE to DONT_SUPPORT_RATIO and removed space from print (StephanHeijl)
9842573 Merge remote-tracking branch 'origin/master' into kmeans-smote (glemaitre)
0c4dd16 cleaning (glemaitre)
f4ec980 DOC: add an entry in documentation (glemaitre)
c3a1502 DOC: add entry in API documentation (glemaitre)
032842e DOC: add whats new entry (glemaitre)
At the end of the docstring, we could add the paper reference and, after that, an interactive example. Check here for instance.
I added an interactive example that specifically demonstrates the utility of the KMeansSMOTE class: three blobs, with the positive class in the middle blob, the negative class in the two outer blobs, and a single negative sample in the middle blob. The example shows that, after resampling, no new samples are added to the middle blob. Inspired by the following toy problem:
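
Editorial illustration (separate from the toy problem referenced above, and not the docstring example added in this PR): the scenario in this comment can be sketched with the standard library alone. This is a simplified stand-in, not the implementation of `imblearn.over_sampling.KMeansSMOTE`. It assumes the cluster assignment is already known (the blob index is used instead of fitting k-means), uses uniform noise instead of Gaussian blobs so the bounds are exact, and interpolates between random pairs of minority samples rather than nearest neighbours. The names `make_toy_data`, `oversample_filtered`, `count_in_middle`, and the `threshold` parameter are hypothetical.

```python
import random


def make_toy_data(rng):
    """Build three well-separated blobs centred at x = -10, 0 and 10.

    Uniform noise in [-2, 2] is an assumption of this sketch; it keeps the
    blob boundaries exact so the checks below are deterministic.
    """
    points, labels, blob_ids = [], [], []
    # (centre x, number of samples, class label): positives in the middle.
    layout = [(-10.0, 100, 0), (0.0, 800, 1), (10.0, 100, 0)]
    for blob, (cx, n, label) in enumerate(layout):
        for _ in range(n):
            points.append((cx + rng.uniform(-2, 2), rng.uniform(-2, 2)))
            labels.append(label)
            blob_ids.append(blob)
    # A single negative sample placed inside the middle (positive) blob.
    points.append((0.0, 0.0))
    labels.append(0)
    blob_ids.append(1)
    return points, labels, blob_ids


def oversample_filtered(points, labels, cluster_ids, rng, minority=0, threshold=0.5):
    """Oversample the minority class only inside minority-dominated clusters."""
    clusters = {}
    for i, c in enumerate(cluster_ids):
        clusters.setdefault(c, []).append(i)

    # Keep clusters whose minority share exceeds the (hypothetical) threshold.
    eligible = []
    for members in clusters.values():
        minority_idx = [i for i in members if labels[i] == minority]
        if len(minority_idx) >= 2 and len(minority_idx) / len(members) > threshold:
            eligible.append(minority_idx)
    if not eligible:
        raise ValueError("no cluster has a sufficient share of minority samples")

    n_minority = sum(1 for label in labels if label == minority)
    n_needed = (len(labels) - n_minority) - n_minority  # balance the two classes

    new_points = []
    for k in range(n_needed):
        idx = eligible[k % len(eligible)]  # spread new samples over clusters
        a, b = rng.sample(idx, 2)
        t = rng.random()
        # Linear interpolation between two minority samples of one cluster.
        new_points.append(
            tuple(pa + t * (pb - pa) for pa, pb in zip(points[a], points[b]))
        )
    return points + new_points, labels + [minority] * len(new_points)


def count_in_middle(points):
    """Count samples whose x coordinate falls in the middle blob's range."""
    return sum(1 for p in points if -5 < p[0] < 5)


rng = random.Random(0)
X, y, blobs = make_toy_data(rng)
X_res, y_res = oversample_filtered(X, y, blobs, rng)
print("Samples in the middle blob before:", count_in_middle(X))
print("Samples in the middle blob after:", count_in_middle(X_res))
print("Negative samples before/after:", y.count(0), y_res.count(0))
```

Because every synthetic point lies on a segment between two negative samples from the same outer blob, the lone negative in the middle blob never seeds new samples there. Plain interpolation across the whole minority class could instead place points between that sample and the outer blobs.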
