Down-sampling method is choosing duplicate rows of the majority class when not needed #287

#### Duplicated Majority Class rows in RandomUnderSampler.fit_sample
I am using the under-sampling function as in the example below:
```
from imblearn.under_sampling import RandomUnderSampler
rus = RandomUnderSampler(ratio=0.3, random_state=0)
x_rus, y_rus = rus.fit_sample(x_train, y_train)
```
I found that majority class rows were being duplicated even though there was plenty of data to choose from. Only 10% of my data belongs to the minority class, so with ratio=0.3 there are far more majority class rows available than are needed. Why would RandomUnderSampler duplicate rows in the majority class? I only found this issue because I attached a row_id to each row before passing the data into down-sampling. When I examined my classifier training results and sorted the rows by row_id, I saw the duplicate rows.
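The row_id check described above can be sketched with plain NumPy. This is only an illustration of how the duplicates were spotted, not part of imbalanced-learn: the helper name `find_duplicate_row_ids` and its `id_column` parameter are made up for this example, and the rows are copied from the Actual Results section of this report.

```python
import numpy as np

def find_duplicate_row_ids(x_resampled, id_column=0):
    """Return the row_id values that appear more than once (hypothetical helper)."""
    ids, counts = np.unique(x_resampled[:, id_column], return_counts=True)
    return ids[counts > 1]

# Resampled rows as reported under "Actual Results"; column 0 is the row_id.
x_rus = np.array([[0, 100], [1, 99], [14, 96], [17, 93],
                  [2, 98], [5, 95], [5, 95], [9, 91]])

duplicates = find_duplicate_row_ids(x_rus)
print(duplicates)
```

An empty result would mean every row was selected at most once.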

#### Steps/Code to Reproduce
```
import numpy as np
from imblearn.under_sampling import RandomUnderSampler

# Column a is the row_id (0..19); column b is an arbitrary feature.
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
b = np.array([100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 100, 99, 98, 97, 96, 95, 94, 93, 92, 91])
a = a.reshape(20, 1)
b = b.reshape(20, 1)
x_ds = np.concatenate((a, b), axis=1)

# 2 minority rows (label 1) and 18 majority rows (label 0).
y_ds = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
y_ds = y_ds.reshape(20, 1)

rus = RandomUnderSampler(ratio=0.2, random_state=0)
x_rus, y_rus = rus.fit_sample(x_ds, y_ds)
print(x_rus)
print(y_rus)
```

#### Expected Results
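The first column is the row_id, so rows 6 and 7 of the output under Actual Results should not both have row_id = 5; each majority class row should be selected at most once. Using the same random_state=0, I can reproduce this error every time.
Here are the contents of the x_ds array being down-sampled: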
array([[  0, 100],
       [  1,  99],
       [  2,  98],
       [  3,  97],
       [  4,  96],
       [  5,  95],
       [  6,  94],
       [  7,  93],
       [  8,  92],
       [  9,  91],
       [ 10, 100],
       [ 11,  99],
       [ 12,  98],
       [ 13,  97],
       [ 14,  96],
       [ 15,  95],
       [ 16,  94],
       [ 17,  93],
       [ 18,  92],
       [ 19,  91]])

#### Actual Results
[[  0 100]
 [  1  99]
 [ 14  96]
 [ 17  93]
 [  2  98]
 [  5  95]
 [  5  95]
 [  9  91]]
[1 1 0 0 0 0 0 0]
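
For comparison, here is a minimal sketch of the behaviour I expected: drawing majority class rows without replacement, so that no row_id can appear twice. It uses only NumPy and does not show how RandomUnderSampler is implemented. The function name `undersample_without_replacement` and its parameters are hypothetical, and `n_majority=6` is chosen only to match the number of majority rows in the output above.

```python
import numpy as np

def undersample_without_replacement(x, y, majority_label, n_majority, seed=0):
    """Keep all non-majority rows plus n_majority distinct majority rows."""
    rng = np.random.RandomState(seed)
    y = np.ravel(y)  # accept both (n,) and (n, 1) label arrays
    majority_idx = np.flatnonzero(y == majority_label)
    other_idx = np.flatnonzero(y != majority_label)
    # replace=False guarantees each majority row is picked at most once.
    chosen = rng.choice(majority_idx, size=n_majority, replace=False)
    keep = np.concatenate((other_idx, chosen))
    return x[keep], y[keep]

# Same data as in the reproduction: column 0 is the row_id.
x_ds = np.column_stack((np.arange(20), np.tile(np.arange(100, 90, -1), 2)))
y_ds = np.array([1, 1] + [0] * 18)

x_out, y_out = undersample_without_replacement(
    x_ds, y_ds, majority_label=0, n_majority=6)
print(x_out)
print(y_out)
```

With this approach every row_id in the first column of `x_out` is unique, which is what I expected from the under-sampler when enough majority rows are available.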

#### Versions
Darwin-15.6.0-x86_64-i386-64bit
('Python', '2.7.12 |Anaconda custom (x86_64)| (default, Jul  2 2016, 17:43:17) \n[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)]')
('NumPy', '1.11.3')
('SciPy', '0.18.1')
('Scikit-Learn', '0.18.1')
('Imbalanced-Learn', '0.2.1')
