qcut: Using cut with IntervalIndex provided by qcut producing wrong NaN values #17284

Closed
@prcastro

xref #17282

Code Sample

>>> x.isnull().sum()
0

>>> x.value_counts()
0.000000     693
12.561725      1
13.568112      1
12.521249      1
13.007628      1
6.993961       1
14.815512      1
6.017280       1
12.944714      1
Name: 0, dtype: int64

>>> categorized = pd.qcut(x, 10, duplicates='drop')
>>> categorized.isnull().sum()
0

>>> categorized.cat.categories  # Notice how all values of x are contained in the only interval
IntervalIndex([(-0.001, 14.816]]
              closed='right',
              dtype='interval[float64]')

>>> res = pd.cut(x, categorized.cat.categories)
>>> res.isnull().sum()
701

Copy-pastable

import pandas as pd

x = pd.read_csv('x.csv', header=None).iloc[:, 0]  # x.csv is provided in a comment below
categorized = pd.qcut(x, 10, duplicates='drop')
res = pd.cut(x, categorized.cat.categories)
res.isnull().sum()

Problem description

When I use qcut to get the IntervalIndex corresponding to the quantiles of a float64 series, and then pass that IntervalIndex as the bins of cut on the same series, the result is wrong. cut returns a series in which every value is NaN (701 of 701), even though the original series contains no NaN and all of its values lie inside the single interval of the IntervalIndex.
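For readers without x.csv, the following is a minimal sketch of the same steps on a synthetic series. It is an assumption, not the original data: the series is rebuilt from the value_counts() output above (693 zeros plus the eight listed values), and the order of the values is made up. The name n_missing is introduced here for illustration only.

```python
import pandas as pd

# Synthetic stand-in for x.csv, rebuilt from the value_counts() output above.
# Assumption: the order of the values does not matter for this report.
nonzero = [12.561725, 13.568112, 12.521249, 13.007628,
           6.993961, 14.815512, 6.017280, 12.944714]
x = pd.Series([0.0] * 693 + nonzero)

# Nine of the ten quantile edges collapse onto 0.0, so duplicates='drop'
# leaves a single interval that covers the whole series.
categorized = pd.qcut(x, 10, duplicates='drop')
categories = categorized.cat.categories

# Reuse the IntervalIndex from qcut as the bins of cut.
res = pd.cut(x, categories)
n_missing = int(res.isnull().sum())

print(categories)
print("NaN after qcut:", int(categorized.isnull().sum()))
print("NaN after cut: ", n_missing)
```

On the pandas version listed in this report (0.20.1) the last count is the 701 shown in the code sample; a version where the bug is fixed should report no NaN there.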

Expected Output

Given the categories produced by qcut as its bins, cut should assign every value to the same interval that qcut did, so the two results should be identical. They are not.
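The expectation can be written as a small check. This is a sketch only: rebin_matches is a hypothetical helper name, not part of pandas, and the sample values are invented so that no value sits close to a rounded interval edge.

```python
import pandas as pd


def rebin_matches(values: pd.Series, q: int) -> bool:
    """Return True if cut() with qcut()'s categories reproduces qcut()'s bins."""
    by_quantile = pd.qcut(values, q, duplicates='drop')
    by_interval = pd.cut(values, by_quantile.cat.categories)
    # Category codes are -1 for NaN, so values that cut() fails to place
    # in any interval show up as a mismatch.
    return bool((by_quantile.cat.codes == by_interval.cat.codes).all())


values = pd.Series([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5])
print(rebin_matches(values, 4))
```

Comparing codes rather than the categorical values themselves keeps the check independent of how NaN compares with NaN.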

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.26.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
