reorder_categories() on categorical index of interval values produces bizarre results #23452

Closed
@tmelconian

Problem description

Categorical indexes whose values are intervals produce bizarre results when reordered with .reorder_categories: every value becomes NaN. This is important because such indexes and columns are produced by pd.cut, so they occur in analysis even if the user never creates them directly.
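To illustrate why interval categories show up without being created explicitly, here is a minimal sketch of what pd.cut returns. The sample values and bin edges are made up for illustration and are not taken from the report.

```python
import pandas as pd

# Illustrative data only: three values binned into three unit-width intervals.
values = pd.Series([0.5, 1.5, 2.5])
binned = pd.cut(values, bins=[0, 1, 2, 3])

# The result is a categorical Series whose categories are intervals,
# even though no Interval object was constructed by hand.
print(binned.dtype)
print(type(binned.cat.categories).__name__)
print(list(binned.cat.categories))
```

Any later call to .cat.reorder_categories on such a column therefore operates on interval categories.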

Minimal Example

With strings, this works as expected:

import pandas as pd

d = pd.DataFrame(index=pd.CategoricalIndex(['a', 'b', 'c'], ordered=True), data={'x': [1, 2, 3]})
d.index = d.index.reorder_categories(d.index.categories[::-1])
print(d.sort_index())
   x
c  3
b  2
a  1

With intervals, it does not:

e = pd.DataFrame(index=pd.CategoricalIndex([pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(2, 3)], ordered=True), data={'x': [1, 2, 3]})
e.index = e.index.reorder_categories(e.index.categories[::-1])
print(e.sort_index())
     x
NaN  1
NaN  2
NaN  3

Expected Output

        x
(2, 3]  3
(1, 2]  2
(0, 1]  1
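The comparison above can be turned into a quick check to run against a given pandas installation. This is only a sketch: the helper name reorder_keeps_values is hypothetical, and the result for the interval case depends on the pandas version, so no output is shown for it.

```python
import pandas as pd

def reorder_keeps_values(index: pd.CategoricalIndex) -> bool:
    """Hypothetical helper: True if reversing the category order leaves every value intact."""
    reordered = index.reorder_categories(index.categories[::-1])
    # Reordering categories should change only the ordering, never the values.
    # A value that turned into NaN makes this element-wise comparison fail.
    return list(reordered) == list(index)

strings = pd.CategoricalIndex(['a', 'b', 'c'], ordered=True)
intervals = pd.CategoricalIndex(
    [pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(2, 3)], ordered=True
)

print("strings preserved:  ", reorder_keeps_values(strings))
print("intervals preserved:", reorder_keeps_values(intervals))
```

On an affected version the second line should print False, matching the NaN index shown in the minimal example.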

More Realistic Example

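import pandas as pd
import seaborn as sns

tips = sns.load_dataset('tips')
tips['total_bin'] = pd.cut(tips['total_bill'], 10)
# Assign the result back; the inplace=True form is not available in recent pandas releases.
tips['total_bin'] = tips['total_bin'].cat.reorder_categories(tips['total_bin'].cat.categories[::-1])
tips.head()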
   total_bill   tip     sex smoker  day    time  size total_bin
0       16.99  1.01  Female     No  Sun  Dinner     2       NaN
1       10.34  1.66    Male     No  Sun  Dinner     3       NaN
2       21.01  3.50    Male     No  Sun  Dinner     3       NaN
3       23.68  3.31    Male     No  Sun  Dinner     2       NaN
4       24.59  3.61  Female     No  Sun  Dinner     4       NaN

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
