-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[MRG] EHN add collections of imbalanced datasets #249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] EHN add collections of imbalanced datasets #249
Conversation
Hello @glemaitre! Thanks for updating the PR.
Comment last updated on April 03, 2017 at 13:01 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #249 +/- ##
=========================================
- Coverage 98.27% 98.18% -0.1%
=========================================
Files 58 60 +2
Lines 3427 3530 +103
=========================================
+ Hits 3368 3466 +98
- Misses 59 64 +5
Continue to review full report at Codecov.
|
16caacf
to
ac13c51
Compare
Still have an issue with the docstring regarding the table. It is not so nice. I have to check what can be done. |
@chkoar This one can be merged also. The PEP8 has been corrected and the table looks nice. |
doc/api.rst
Outdated
@@ -158,7 +158,7 @@ Functions | |||
:toctree: generated/ | |||
|
|||
datasets.make_imbalance | |||
|
|||
datasets.fetch_zenodo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just fetch_datasets
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted something similar to fetch_mldata
doc/whats_new.rst
Outdated
@@ -14,6 +14,8 @@ New features | |||
|
|||
- Turn off steps in :class:`pipeline.Pipeline` using the `None` | |||
object. By `Christos Aridas`_. | |||
- Add a fetching method `datasets.fetch_zenodo` in order to get some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say function instead of method
imblearn/datasets/__init__.py
Outdated
|
||
__all__ = ['make_imbalance'] | ||
__all__ = ['make_imbalance', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort them if you like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm not sure since that they are not in the same file.
imblearn/datasets/zenodo.py
Outdated
download_if_missing=True, | ||
random_state=None, | ||
shuffle=False): | ||
"""Load the Higgs dataset, downloading it if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L112 could be removed
|
||
Returns | ||
------- | ||
datasets : OrderedDict of Bunch object, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dictionary of Bunch objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to specified that this is an ordered dictionary. I change the code in accordance
imblearn/datasets/zenodo.py
Outdated
@@ -0,0 +1,279 @@ | |||
"""Collection of imbalanced datasets. | |||
|
|||
This collection of datasets have been proposed in [1]_. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is: "This collection of datasets has been proposed..." or "These datasets have been proposed..."
@chkoar done |
79e2216
to
5f79302
Compare
imblearn/datasets/zenodo.py
Outdated
---------- | ||
data_home : string, optional (default=None) | ||
Specify another download and cache folder for the datasets. By default | ||
all scikit learn data is stored in '~/scikit_learn_data' subfolders. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scikit-learn :)
My main concern here is the name of the function. I would call it like |
Ok so I went for |
Will we change the filenames? |
Not for the moment. It has no influence on the API or the import. So if we come with a better name we can change it afterwards. |
Reference Issue
Fixes #247
What does this implement/fix? Explain your changes.
Make a fetcher for an imbalanced datasets collections available on Zenodo maintained by ourself.
Any other comments?
TODO: