
Commit 7c48491

ENH: mini-batch balancing in keras and tensorflow (#409)
This PR intends to provide some utilities for keras:

- [x] support for one-vs-all encoded targets (#410)
- [x] balanced batch generator

TODO:

- [x] Add common test to check multiclass == multilabel-indicator (#410)
- [x] Manage the specificity of the EasyEnsemble and BalanceCascade (overwrite `sample`)
- [x] Add user guide documentation
- [x] Add an example for simple use
- [x] Add an example for deep training
- [x] Add substitution
- [x] What's new
- [x] Optional dependencies
1 parent eafae67 commit 7c48491
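
The headline addition is a balanced mini-batch generator for Keras and TensorFlow models. Below is a minimal sketch of the intended Keras usage, based on the documentation added in doc/miscellaneous.rst by this commit; the dataset and model are illustrative assumptions, not part of the diff:

    # Illustrative sketch of the new imblearn.keras.BalancedBatchGenerator.
    # The dataset and model are assumptions for demonstration; the API calls
    # follow the user guide added in this commit.
    import keras
    from sklearn.datasets import make_classification
    from imblearn.keras import BalancedBatchGenerator
    from imblearn.under_sampling import RandomUnderSampler

    # A small imbalanced 3-class problem (hypothetical data).
    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=4,
                               weights=[0.1, 0.2, 0.7], random_state=42)
    y = keras.utils.to_categorical(y, 3)

    # Simple softmax classifier, as in the added user guide.
    model = keras.Sequential()
    model.add(keras.layers.Dense(y.shape[1], input_dim=X.shape[1],
                                 activation='softmax'))
    model.compile(optimizer='sgd', loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Every mini-batch drawn from the generator is balanced by random under-sampling.
    training_generator = BalancedBatchGenerator(
        X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42)
    model.fit_generator(generator=training_generator, epochs=10, verbose=0)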

File tree

20 files changed (+1038, -14 lines)


.travis.yml

Lines changed: 4 additions & 4 deletions
@@ -38,11 +38,11 @@ matrix:
           NUMPY_VERSION="1.13.1" SCIPY_VERSION="0.19.1" SKLEARN_VERSION="0.19.0"
     - env: DISTRIB="conda" PYTHON_VERSION="3.6"
           NUMPY_VERSION="1.13.1" SCIPY_VERSION="0.19.1" SKLEARN_VERSION="0.19.0"
-    - env: DISTRIB="conda" PYTHON_VERSION="3.6"
-          NUMPY_VERSION="1.13.1" SCIPY_VERSION="0.19.1" SKLEARN_VERSION="master"
+    - env: DISTRIB="conda" PYTHON_VERSION="3.7"
+          NUMPY_VERSION="*" SCIPY_VERSION="*" SKLEARN_VERSION="master"
   allow_failures:
-    - env: DISTRIB="conda" PYTHON_VERSION="3.6"
-          NUMPY_VERSION="1.13.1" SCIPY_VERSION="0.19.1" SKLEARN_VERSION="master"
+    - env: DISTRIB="conda" PYTHON_VERSION="3.7"
+          NUMPY_VERSION="*" SCIPY_VERSION="*" SKLEARN_VERSION="master"

 install: source build_tools/travis/install.sh
 script: bash build_tools/travis/test_script.sh

appveyor.yml

Lines changed: 13 additions & 1 deletion
@@ -10,34 +10,46 @@ environment:
   - PYTHON: "C:\\Miniconda-x64"
     PYTHON_VERSION: "2.7.x"
     PYTHON_ARCH: "64"
+    OPTIONAL_DEP: "pandas"

   - PYTHON: "C:\\Miniconda"
     PYTHON_VERSION: "2.7.x"
     PYTHON_ARCH: "32"
+    OPTIONAL_DEP: "pandas"

   - PYTHON: "C:\\Miniconda35-x64"
     PYTHON_VERSION: "3.5.x"
     PYTHON_ARCH: "64"
+    OPTIONAL_DEP: "pandas keras tensorflow"

   - PYTHON: "C:\\Miniconda36-x64"
     PYTHON_VERSION: "3.6.x"
     PYTHON_ARCH: "64"
+    OPTIONAL_DEP: "pandas keras tensorflow"

   - PYTHON: "C:\\Miniconda36"
     PYTHON_VERSION: "3.6.x"
     PYTHON_ARCH: "32"
+    OPTIONAL_DEP: "pandas"

 install:
   # Prepend miniconda installed Python to the PATH of this build
   # Add Library/bin directory to fix issue
   # https://github.com/conda/conda/issues/1753
   - "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PYTHON%\\Library\\bin;%PATH%"
-  - conda install pip scipy numpy scikit-learn=0.19 pandas -y -q
+  - conda install pip scipy numpy scikit-learn=0.19 -y -q
+  - "conda install %OPTIONAL_DEP% -y -q"
   - conda install pytest pytest-cov -y -q
+  - pip install codecov
   - conda install nose -y -q  # FIXME: remove this line when using sklearn > 0.19
   - pip install .

 test_script:
   - mkdir for_test
   - cd for_test
   - pytest --pyargs imblearn --cov-report term-missing --cov=imblearn
+
+after_test:
+  - cp .coverage %APPVEYOR_BUILD_FOLDER%
+  - cd %APPVEYOR_BUILD_FOLDER%
+  - codecov

build_tools/circle/build_doc.sh

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ conda create -n $CONDA_ENV_NAME --yes --quiet python=3
 source activate $CONDA_ENV_NAME

 conda install --yes pip numpy scipy scikit-learn pillow matplotlib sphinx \
-    sphinx_rtd_theme numpydoc
+    sphinx_rtd_theme numpydoc pandas keras
 pip install -U git+https://github.com/sphinx-gallery/sphinx-gallery.git

 # Build and install imbalanced-learn in dev mode

build_tools/travis/install.sh

Lines changed: 13 additions & 4 deletions
@@ -38,7 +38,15 @@ if [[ "$DISTRIB" == "conda" ]]; then
     # provided versions
     conda create -n testenv --yes python=$PYTHON_VERSION pip
     source activate testenv
-    conda install --yes numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION pandas
+    conda install --yes numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION
+
+    if [[ $PYTHON_VERSION == "3.6" ]]; then
+        conda install --yes pandas
+        conda install --yes -c conda-forge keras
+        KERAS_BACKEND=tensorflow
+        python -c "import keras.backend"
+        sed -i -e 's/"backend":[[:space:]]*"[^"]*/"backend":\ "'$KERAS_BACKEND'/g' ~/.keras/keras.json;
+    fi

     if [[ "$SKLEARN_VERSION" == "master" ]]; then
         conda install --yes cython
@@ -59,16 +67,17 @@ elif [[ "$DISTRIB" == "ubuntu" ]]; then
     # Create a new virtualenv using system site packages for python, numpy
     virtualenv --system-site-packages testvenv
     source testvenv/bin/activate
-    pip install scikit-learn pandas nose nose-timer pytest pytest-cov codecov \
-        sphinx numpydoc
+    pip install scikit-learn
+    pip install pandas keras tensorflow
+    pip install nose nose-timer pytest pytest-cov codecov sphinx numpydoc

 fi

 python --version
 python -c "import numpy; print('numpy %s' % numpy.__version__)"
 python -c "import scipy; print('scipy %s' % scipy.__version__)"

-python setup.py develop
+pip install -e .
 ccache --show-stats
 # Useful for debugging how ccache is used
 # cat $CCACHE_LOGFILE
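
The `sed` line above switches the Keras backend by rewriting the `backend` entry of `~/.keras/keras.json`. A minimal sanity check of that substitution (a sketch, assuming Keras has been imported once so the configuration file exists) could be:

    # Check that ~/.keras/keras.json now points Keras at the TensorFlow backend.
    import json
    import os

    import keras  # first import also creates ~/.keras/keras.json if it is missing

    config_path = os.path.expanduser(os.path.join('~', '.keras', 'keras.json'))
    with open(config_path) as fh:
        config = json.load(fh)

    print(config['backend'])        # expected: "tensorflow"
    print(keras.backend.backend())  # should agree with the configuration file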

conftest.py

Lines changed: 19 additions & 0 deletions
@@ -7,8 +7,27 @@

 # Set numpy array str/repr to legacy behaviour on numpy > 1.13 to make
 # the doctests pass
+import os
+import pytest
 import numpy as np
+
 try:
     np.set_printoptions(legacy='1.13')
 except TypeError:
     pass
+
+
+def pytest_runtest_setup(item):
+    fname = item.fspath.strpath
+    if (fname.endswith(os.path.join('keras', '_generator.py')) or
+            fname.endswith('miscellaneous.rst')):
+        try:
+            import keras
+        except ImportError:
+            pytest.skip('The keras package is not installed.')
+    elif (fname.endswith(os.path.join('tensorflow', '_generator.py')) or
+            fname.endswith('miscellaneous.rst')):
+        try:
+            import tensorflow
+        except ImportError:
+            pytest.skip('The tensorflow package is not installed.')

doc/api.rst

Lines changed: 40 additions & 0 deletions
@@ -111,6 +111,46 @@ Prototype selection
    ensemble.BalancedBaggingClassifier
    ensemble.EasyEnsemble

+.. _keras_ref:
+
+:mod:`imblearn.keras`: Batch generator for Keras
+================================================
+
+.. automodule:: imblearn.keras
+    :no-members:
+    :no-inherited-members:
+
+.. currentmodule:: imblearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: class.rst
+
+   keras.BalancedBatchGenerator
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   keras.balanced_batch_generator
+
+.. _tensorflow_ref:
+
+:mod:`imblearn.tensorflow`: Batch generator for TensorFlow
+==========================================================
+
+.. automodule:: imblearn.tensorflow
+    :no-members:
+    :no-inherited-members:
+
+.. currentmodule:: imblearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   tensorflow.balanced_batch_generator
+
 .. _misc_ref:

 Miscellaneous

doc/miscellaneous.rst

Lines changed: 111 additions & 0 deletions
@@ -38,3 +38,114 @@ We illustrate the use of such sampler to implement an outlier rejection
 estimator which can be easily used within a
 :class:`imblearn.pipeline.Pipeline`:
 :ref:`sphx_glr_auto_examples_plot_outlier_rejections.py`
+
+.. _generators:
+
+Custom generators
+-----------------
+
+Imbalanced-learn provides specific generators for TensorFlow and Keras which
+will generate balanced mini-batches.
+
+.. _tensorflow_generator:
+
+TensorFlow generator
+~~~~~~~~~~~~~~~~~~~~
+
+The :func:`imblearn.tensorflow.balanced_batch_generator` function allows
+generating balanced mini-batches using an imbalanced-learn sampler which
+returns indices::
+
+    >>> X = X.astype(np.float32)
+    >>> from imblearn.under_sampling import RandomUnderSampler
+    >>> from imblearn.tensorflow import balanced_batch_generator
+    >>> training_generator, steps_per_epoch = balanced_batch_generator(
+    ...     X, y, sample_weight=None, sampler=RandomUnderSampler(),
+    ...     batch_size=10, random_state=42)
+
+The ``generator`` and ``steps_per_epoch`` are used during the training of the
+TensorFlow model. We will illustrate how to use this generator. First, we can
+define a logistic regression model which will be optimized by gradient
+descent::
+
+    >>> learning_rate, epochs = 0.01, 10
+    >>> input_size, output_size = X.shape[1], 3
+    >>> import tensorflow as tf
+    >>> def init_weights(shape):
+    ...     return tf.Variable(tf.random_normal(shape, stddev=0.01))
+    >>> def accuracy(y_true, y_pred):
+    ...     return np.mean(np.argmax(y_pred, axis=1) == y_true)
+    >>> # input and output
+    >>> data = tf.placeholder("float32", shape=[None, input_size])
+    >>> targets = tf.placeholder("int32", shape=[None])
+    >>> # build the model and weights
+    >>> W = init_weights([input_size, output_size])
+    >>> b = init_weights([output_size])
+    >>> out_act = tf.nn.sigmoid(tf.matmul(data, W) + b)
+    >>> # build the loss, predict, and train operator
+    >>> cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
+    ...     logits=out_act, labels=targets)
+    >>> loss = tf.reduce_sum(cross_entropy)
+    >>> optimizer = tf.train.GradientDescentOptimizer(learning_rate)
+    >>> train_op = optimizer.minimize(loss)
+    >>> predict = tf.nn.softmax(out_act)
+    >>> # Initialization of all variables in the graph
+    >>> init = tf.global_variables_initializer()
+
+Once initialized, the model is trained by iterating on balanced mini-batches of
+data and minimizing the loss previously defined::
+
+    >>> with tf.Session() as sess:
+    ...     print('Starting training')
+    ...     sess.run(init)
+    ...     for e in range(epochs):
+    ...         for i in range(steps_per_epoch):
+    ...             X_batch, y_batch = next(training_generator)
+    ...             sess.run([train_op, loss], feed_dict={data: X_batch, targets: y_batch})
+    ...         # For each epoch, run accuracy on train and test
+    ...         feed_dict = dict()
+    ...         predicts_train = sess.run(predict, feed_dict={data: X})
+    ...         print("epoch: {} train accuracy: {:.3f}"
+    ...               .format(e, accuracy(y, predicts_train)))
+    ... # doctest: +ELLIPSIS
+    Starting training
+    [...
+
+.. _keras_generator:
+
+Keras generator
+~~~~~~~~~~~~~~~
+
+Keras provides a higher-level API in which a model can be defined and trained
+by calling the ``fit_generator`` method. To illustrate, we will define a
+logistic regression model::
+
+    >>> import keras
+    >>> y = keras.utils.to_categorical(y, 3)
+    >>> model = keras.Sequential()
+    >>> model.add(keras.layers.Dense(y.shape[1], input_dim=X.shape[1],
+    ...                              activation='softmax'))
+    >>> model.compile(optimizer='sgd', loss='categorical_crossentropy',
+    ...               metrics=['accuracy'])
+
+:func:`imblearn.keras.balanced_batch_generator` creates a balanced mini-batch
+generator together with the number of mini-batches which will be generated per
+epoch::
+
+    >>> from imblearn.keras import balanced_batch_generator
+    >>> training_generator, steps_per_epoch = balanced_batch_generator(
+    ...     X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42)
+
+Then, ``fit_generator`` can be called, passing the generator and the number of
+steps per epoch::
+
+    >>> callback_history = model.fit_generator(generator=training_generator,
+    ...                                        steps_per_epoch=steps_per_epoch,
+    ...                                        epochs=10, verbose=0)
+
+The second possibility is to use
+:class:`imblearn.keras.BalancedBatchGenerator`. Only an instance of this class
+needs to be passed to ``fit_generator``::
+
+    >>> from imblearn.keras import BalancedBatchGenerator
+    >>> training_generator = BalancedBatchGenerator(
+    ...     X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42)
+    >>> callback_history = model.fit_generator(generator=training_generator,
+    ...                                        epochs=10, verbose=0)
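
Note that the doctests above rely on ``X`` and ``y`` defined earlier in the user guide. A stand-in that makes the snippets self-contained (purely an assumption for illustration, not part of this diff) could be:

    # Hypothetical stand-in for the X, y used by the doctests above: a small
    # imbalanced 3-class dataset, cast to float32 as the TensorFlow snippet expects.
    import numpy as np
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_classes=3, n_informative=4,
                               weights=[0.1, 0.2, 0.7], random_state=42)
    X = X.astype(np.float32)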

doc/whats_new/v0.0.4.rst

Lines changed: 6 additions & 0 deletions
@@ -18,6 +18,12 @@ API
 - Enable to use a ``list`` for the cleaning methods to specify the class to
   sample. :issue:`411` by :user:`Guillaume Lemaitre <glemaitre>`.

+New features
+............
+
+- Add ``keras`` and ``tensorflow`` modules to create balanced mini-batch
+  generators. :issue:`409` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 Enhancement
 ...........
