[MRG] EHN handling sparse matrices whenever possible #316
Merged
34 commits
a68e8eb
EHN POC sparse handling for RandomUnderSampler
glemaitre 0062d6d
EHN support sparse ENN
glemaitre 6197d80
iter
glemaitre f669843
EHN sparse indexing IHT
glemaitre 4adc6db
EHN sparse support nearmiss
glemaitre 9c93dab
Merge branch 'master' into is/158
glemaitre bba7835
EHN support sparse matrices for NCR
glemaitre 9cd917b
EHN support sparse Tomek and OSS
glemaitre c3ba307
EHN support sparsity for CNN
glemaitre d195868
EHN support sparse for SMOTE
glemaitre bcf44ab
EHN support sparse adasyn
glemaitre c405aa9
EHN support sparsity for sombine methods
glemaitre 79637d7
EHN support sparsity BC
glemaitre c199af9
DOC update docstring
glemaitre 425928f
DOC fix example topic classification
glemaitre 4ba8c4e
FIX fix test and class clustercentroids
glemaitre 8298fdc
TST add common test
glemaitre e4c6ebb
TST add ensemble
glemaitre 1226a91
TST use allclose
glemaitre 68b16b5
TST install conda with ubuntu container
glemaitre 35c638b
TST increase tolerance
glemaitre 004f920
TST increase tolerance
glemaitre d3ceb5a
TST test all versions NearMiss and SMOTE
glemaitre d9c4e55
TST set the algorithm of KMeans
glemaitre b469747
DOC add entry in user guide
glemaitre c05d0ba
DOC add entry sparse for CC
glemaitre 1625879
DOC whatsnew entry
glemaitre 709dec3
DOC fix api
glemaitre 14b686f
Merge branch 'master' into is/158
glemaitre 3e0cdc9
TST adapt pytest
glemaitre 0595dab
DOC update user guide
glemaitre a540dc1
Merge remote-tracking branch 'origin/master' into is/158
glemaitre 2d0e730
address comments
glemaitre f8ebd0e
TST remove the last assert_regex
glemaitre
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
.. _introduction: | ||
|
||
============ | ||
Introduction | ||
============ | ||
|
||
.. _api_imblearn: | ||
|
||
API of imbalanced-learn samplers | ||
-------------------------------- | ||
|
||
The available samplers follow the scikit-learn API, using the base estimator and adding a sampling functionality through the ``sample`` method: | ||
|
||
:Estimator: | ||
|
||
The base object, which implements a ``fit`` method to learn from data:: | ||
|
||
estimator = obj.fit(data, targets) | ||
|
||
:Sampler: | ||
|
||
To resample a data set, each sampler implements:: | ||
|
||
data_resampled, targets_resampled = obj.sample(data, targets) | ||
|
||
Fitting and sampling can also be done in one step:: | ||
|
||
data_resampled, targets_resampled = obj.fit_sample(data, targets) | ||
|
||
Imbalanced-learn samplers accept the same inputs as scikit-learn: | ||
|
||
* ``data``: array-like (2-D list, pandas.DataFrame, numpy.array) or sparse | ||
matrices; | ||
* ``targets``: array-like (1-D list, pandas.Series, numpy.array). | ||
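
The ``fit`` / ``sample`` / ``fit_sample`` contract described above can be sketched with a toy class. This is a minimal illustration under stated assumptions: ``ToyUnderSampler`` is a hypothetical name, it is not part of imbalanced-learn, and it only handles plain Python lists. It randomly keeps as many samples per class as the smallest class contains.

```python
import random
from collections import Counter


class ToyUnderSampler:
    """Hypothetical sampler for illustration only (not an imbalanced-learn class)."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit(self, data, targets):
        # Learn from the data: here, only the size of the smallest class.
        self.n_per_class_ = min(Counter(targets).values())
        return self

    def sample(self, data, targets):
        # Keep ``n_per_class_`` randomly chosen samples from every class.
        rng = random.Random(self.random_state)
        kept = []
        for label in sorted(set(targets)):
            indices = [i for i, t in enumerate(targets) if t == label]
            kept.extend(rng.sample(indices, self.n_per_class_))
        kept.sort()  # preserve the original sample order
        return [data[i] for i in kept], [targets[i] for i in kept]

    def fit_sample(self, data, targets):
        # Fitting and sampling in one step.
        return self.fit(data, targets).sample(data, targets)


data = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1]]
targets = [0, 0, 0, 0, 1, 1]
data_resampled, targets_resampled = ToyUnderSampler().fit_sample(data, targets)
```

After resampling, both classes contain the same number of samples, and every returned row comes from the original ``data``.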
|
||
.. topic:: Sparse input | ||
|
||
For sparse input the data is **converted to the Compressed Sparse Rows | ||
representation** (see ``scipy.sparse.csr_matrix``) before being fed to the | ||
sampler. To avoid unnecessary memory copies, it is recommended to choose the | ||
CSR representation upstream. | ||
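
As a sketch of what "choosing the CSR representation upstream" could look like, the snippet below converts a matrix to CSR once with SciPy before any resampling takes place. The toy array and the variable names are assumptions made for illustration; only ``scipy.sparse`` calls are used, and no imbalanced-learn sampler is invoked.

```python
import numpy as np
from scipy import sparse

# Toy data (assumed for illustration): 4 samples, 3 features, mostly zeros.
dense = np.array([[0, 0, 1],
                  [2, 0, 0],
                  [0, 0, 0],
                  [0, 3, 0]])

# Suppose the data arrives in another sparse format, e.g. COO.
X_coo = sparse.coo_matrix(dense)

# Convert to CSR once, upstream, so that later steps need no further conversion.
X_csr = X_coo.tocsr()

# Selecting samples means selecting rows, which CSR supports directly.
selected = X_csr[[0, 3]]
```

Row selection is the operation a sampler relies on most, which is a plausible reason for standardizing on a row-oriented format.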
|
||
.. _problem_statement: | ||
|
||
Problem statement regarding imbalanced data sets | ||
------------------------------------------------ | ||
|
||
The learning phase and the subsequent prediction of machine learning algorithms | ||
can be affected by imbalanced data sets. The balancing issue corresponds to | ||
the difference in the number of samples between the different classes. We | ||
illustrate the effect of training a linear SVM classifier with different | ||
levels of class balancing. | ||
|
||
.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png | ||
:target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html | ||
:scale: 60 | ||
:align: center | ||
|
||
As expected, the decision function of the linear SVM is highly impacted. With a | ||
greater imbalance ratio, the decision function favors the class with the larger | ||
number of samples, usually referred to as the majority class. |
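
To make the notion of class imbalance concrete, the following sketch counts the samples per class and derives an imbalance ratio. The target vector is made up for illustration, and defining the ratio as majority count divided by minority count is an assumption, since the text above does not fix a formula.

```python
from collections import Counter

# Made-up targets: 90 samples of class 0 and 10 samples of class 1.
targets = [0] * 90 + [1] * 10

counts = Counter(targets)
# most_common() sorts classes from most to least frequent.
majority_class, n_majority = counts.most_common(1)[0]
minority_class, n_minority = counts.most_common()[-1]

# Assumed definition: majority count divided by minority count.
imbalance_ratio = n_majority / n_minority
```

Here class 0 is the majority class, and a larger ratio corresponds to a more strongly imbalanced data set.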