[MRG] EHN add collections of imbalanced datasets #249

glemaitre · 2017-03-18T00:42:04Z

Reference Issue

Fixes #247

What does this implement/fix? Explain your changes.

Add a fetcher for a collection of imbalanced datasets hosted on Zenodo and maintained by ourselves.

Any other comments?

TODO:

Write the fetcher;
Unit tests;
Write documentation

pep8speaks · 2017-03-18T00:42:15Z

Hello @glemaitre! Thanks for updating the PR.

In the file imblearn/datasets/zenodo.py, following are the PEP8 issues :

Line 108:18: E128 continuation line under-indented for visual indent
Line 109:18: E128 continuation line under-indented for visual indent
Line 110:18: E128 continuation line under-indented for visual indent
Line 111:18: E128 continuation line under-indented for visual indent

Comment last updated on April 03, 2017 at 13:01 Hours UTC
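
For reference, E128 is reported when a continuation line is indented less than the opening bracket it visually continues. A minimal sketch of the fix follows. The function name `fetch_example` and its body are hypothetical; only the keyword parameters are taken from the diff and docstring quoted in this PR.

```python
# Hypothetical function for illustration; only the keyword parameters
# come from this PR's diff, the name and body are assumptions.

# E128 would be reported for a layout like this one, because the
# continuation lines start to the left of the opening parenthesis:
#
# def fetch_example(data_home=None,
#           download_if_missing=True,
#           random_state=None,
#           shuffle=False):

# Compliant layout: each continuation line is aligned with the first
# argument after the opening parenthesis.
def fetch_example(data_home=None,
                  download_if_missing=True,
                  random_state=None,
                  shuffle=False):
    """Return the received arguments so the layout can be exercised."""
    return {
        "data_home": data_home,
        "download_if_missing": download_if_missing,
        "random_state": random_state,
        "shuffle": shuffle,
    }
```

A hanging indent (nothing after the opening parenthesis, arguments indented on the following lines) would also satisfy the checker.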

codecov · 2017-03-18T00:50:07Z

Codecov Report

Merging #249 into master will decrease coverage by 0.09%.
The diff coverage is 95.19%.

@@            Coverage Diff            @@
##           master     #249     +/-   ##
=========================================
- Coverage   98.27%   98.18%   -0.1%     
=========================================
  Files          58       60      +2     
  Lines        3427     3530    +103     
=========================================
+ Hits         3368     3466     +98     
- Misses         59       64      +5

Impacted Files	Coverage Δ
imblearn/datasets/__init__.py	`100% <100%> (ø)`	⬆️
imblearn/datasets/tests/test_zenodo.py	`89.18% <89.18%> (ø)`
imblearn/datasets/zenodo.py	`98.46% <98.46%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5ca3037...28e7830.

glemaitre · 2017-03-19T22:06:58Z

There is still an issue with the table in the docstring: it does not render nicely. I have to check what can be done.

glemaitre · 2017-03-25T19:51:25Z

@chkoar This one can be merged as well. The PEP8 issues have been fixed and the table now looks nice.

chkoar · 2017-03-26T19:00:22Z

doc/api.rst

@@ -158,7 +158,7 @@ Functions
   :toctree: generated/

   datasets.make_imbalance
-
+   datasets.fetch_zenodo


Why not just fetch_datasets?

I wanted something similar to fetch_mldata

chkoar · 2017-03-26T19:01:41Z

doc/whats_new.rst

@@ -14,6 +14,8 @@ New features

 - Turn off steps in :class:`pipeline.Pipeline` using the `None`
  object. By `Christos Aridas`_.
+- Add a fetching method `datasets.fetch_zenodo` in order to get some


I would say function instead of method

chkoar · 2017-03-26T19:02:57Z

imblearn/datasets/__init__.py


-__all__ = ['make_imbalance']
+__all__ = ['make_imbalance',


Sort them if you like

Hmm, not sure, since they are not in the same file.

chkoar · 2017-03-26T19:04:10Z

imblearn/datasets/zenodo.py

+                 download_if_missing=True,
+                 random_state=None,
+                 shuffle=False):
+    """Load the Higgs dataset, downloading it if necessary.


L112 could be removed
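
The `random_state` and `shuffle` parameters in the signature above suggest an optional, seeded shuffle of the samples. The sketch below shows one way such a shuffle can be made reproducible. It is an assumption about the intent, not the code from this PR: the helper name `shuffle_in_unison` is hypothetical and it uses the standard library `random` module rather than whatever the fetcher uses internally.

```python
import random


def shuffle_in_unison(data, target, random_state=None):
    """Shuffle samples and labels with one shared permutation.

    Hypothetical helper: `random_state` is used as the seed of a private
    generator, so the same seed always gives the same ordering.
    """
    if len(data) != len(target):
        raise ValueError("data and target must have the same length")
    rng = random.Random(random_state)
    order = list(range(len(data)))
    rng.shuffle(order)
    # Apply the same permutation to both sequences so pairs stay aligned.
    return [data[i] for i in order], [target[i] for i in order]


# Toy values chosen so that each sample equals its label.
X = [[0], [1], [2], [3]]
y = [0, 1, 2, 3]
X_a, y_a = shuffle_in_unison(X, y, random_state=0)
X_b, y_b = shuffle_in_unison(X, y, random_state=0)
```

Using a private generator instead of the module-level functions keeps the caller's global random state untouched.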

chkoar · 2017-03-26T19:06:49Z

imblearn/datasets/zenodo.py

+
+    Returns
+    -------
+    datasets : OrderedDict of Bunch object,


Dictionary of Bunch objects.

I would like to specify that this is an ordered dictionary. I changed the code accordingly.
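
To illustrate the return type discussed here, the sketch below builds an ordered dictionary of Bunch-like objects. It is only an illustration under stated assumptions: the `Bunch` class is a minimal stand-in (a dict with attribute access), `build_collection` is a hypothetical helper, and the dataset names and values are toy placeholders rather than entries of the real collection.

```python
from collections import OrderedDict


class Bunch(dict):
    """Minimal stand-in for a Bunch: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self[key] = value


def build_collection(entries):
    """Build an OrderedDict of Bunch objects, preserving insertion order.

    `entries` is an iterable of (name, data, target) tuples.
    """
    datasets = OrderedDict()
    for name, data, target in entries:
        datasets[name] = Bunch(data=data, target=target)
    return datasets


# Placeholder names and toy values, not the real collection.
datasets = build_collection([
    ("toy_a", [[0.0], [1.0], [2.0]], [-1, -1, 1]),
    ("toy_b", [[3.0], [4.0]], [-1, 1]),
])
first_name = next(iter(datasets))
```

Stating `OrderedDict` explicitly matters on interpreters where a plain `dict` does not guarantee insertion order (the language only guarantees it from Python 3.7 onwards).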

chkoar · 2017-03-30T09:24:12Z

imblearn/datasets/zenodo.py

@@ -0,0 +1,279 @@
+"""Collection of imbalanced datasets.
+
+This collection of datasets have been proposed in [1]_. The


I think that is: "This collection of datasets has been proposed..." or "These datasets have been proposed..."

glemaitre · 2017-03-30T18:18:35Z

@chkoar done

chkoar · 2017-03-31T18:01:37Z

imblearn/datasets/zenodo.py

+    ----------
+    data_home : string, optional (default=None)
+        Specify another download and cache folder for the datasets. By default
+        all scikit learn data is stored in '~/scikit_learn_data' subfolders.


scikit-learn :)
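
The `data_home` and `download_if_missing` parameters describe a download-and-cache pattern. A minimal sketch of that pattern is given below, assuming the default folder named in the docstring. The helpers `resolve_data_home` and `fetch_cached`, and the `download` callable, are hypothetical and are not the functions added by this PR.

```python
import os
import tempfile


def resolve_data_home(data_home=None):
    """Return the cache folder, creating it if needed.

    Assumption: fall back to '~/scikit_learn_data', the default named
    in the docstring.
    """
    if data_home is None:
        data_home = os.path.join("~", "scikit_learn_data")
    data_home = os.path.expanduser(data_home)
    os.makedirs(data_home, exist_ok=True)
    return data_home


def fetch_cached(filename, download, data_home=None,
                 download_if_missing=True):
    """Return the path of a cached file, downloading it only if missing.

    `download` is a hypothetical callable that writes the file to the
    path it receives.
    """
    path = os.path.join(resolve_data_home(data_home), filename)
    if not os.path.exists(path):
        if not download_if_missing:
            raise IOError("Data not found and `download_if_missing` is False")
        download(path)
    return path


# Demonstration with a temporary folder and a fake downloader, so no
# network access and no write to the home directory is needed.
calls = []


def fake_download(path):
    calls.append(path)
    with open(path, "w") as handle:
        handle.write("placeholder")


cache_dir = tempfile.mkdtemp()
first = fetch_cached("toy.txt", fake_download, data_home=cache_dir)
second = fetch_cached("toy.txt", fake_download, data_home=cache_dir)
```

The second call finds the file already in the cache folder, so the downloader runs only once.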

chkoar · 2017-04-03T00:00:57Z

My main concern here is the name of the function. I would call it fetch_datasets, fetch_data, or something like that, since we chose to host them on Zenodo. Apart from that, it could easily be merged! :P

glemaitre · 2017-04-03T13:03:24Z

OK, so I went for fetch_datasets.

chkoar · 2017-04-03T13:30:28Z

Will we change the filenames?

glemaitre · 2017-04-06T09:25:06Z

Will we change the filenames?

Not for the moment. It has no influence on the API or the import, so if we come up with a better name we can change it afterwards.

Guillaume Lemaitre added 2 commits March 19, 2017 17:50

EHN add collections of imbalanced datasets

f8a2309

TST add unit tests

ac13c51

glemaitre force-pushed the fetcher_dataset branch from 16caacf to ac13c51 Compare March 19, 2017 16:51

Guillaume Lemaitre added 4 commits March 19, 2017 17:59

FIX change type of error raised

0381b7d

FIX python 3 compatibility

b29fc34

TST add test for error

8c27e75

DOC add documentation

ddbed2d

glemaitre changed the title ~~[WIP] EHN add collections of imbalanced datasets~~ [MRG] EHN add collections of imbalanced datasets Mar 19, 2017

DOC/FIX solve the link to the doc

2d5a7d1

glemaitre changed the title ~~[MRG] EHN add collections of imbalanced datasets~~ [WIP] EHN add collections of imbalanced datasets Mar 19, 2017

Guillaume Lemaitre added 5 commits March 25, 2017 19:40

DOC change table style

47f6b9b

DOC add the information about the dataset in the docstring

11ea904

Try rst style in numpy docstring

2fd4295

DOC/FIX make nice rst table

6b95144

PEP8

0bd1522

glemaitre changed the title ~~[WIP] EHN add collections of imbalanced datasets~~ [MRG] EHN add collections of imbalanced datasets Mar 25, 2017

chkoar reviewed Mar 26, 2017


Guillaume Lemaitre added 2 commits March 27, 2017 11:12

FIX comments christos

a3f7f8d

FIX doc christos comments

ea96823

chkoar reviewed Mar 30, 2017


DOC Fix christos comments

ad23b4d

FIX improve readibility

5f79302

glemaitre force-pushed the fetcher_dataset branch from 79e2216 to 5f79302 Compare March 30, 2017 19:27

chkoar reviewed Mar 31, 2017


DOC fix christos comments

82bc13a

ENH addressed chistos comments

28e7830

chkoar merged commit 3c54ea6 into scikit-learn-contrib:master Apr 6, 2017

glemaitre added a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017

EHN: Add a collection of imbalanced datasets (scikit-learn-contrib#249)

8f3d74f

glemaitre added a commit to glemaitre/imbalanced-learn that referenced this pull request Jun 15, 2017

EHN: Add a collection of imbalanced datasets (scikit-learn-contrib#249)

9093cd1
