crosstab shows margins in wrong order if index/columns are ordered categories #25278

Closed
@oriolmirosa

Description

Code Sample

Run a crosstab on an ordered categorical variable whose category order differs from the alphabetical default:

import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'First': ['B', 'B', 'C', 'A', 'B', 'C'],
                   'Second': ['C', 'B', 'B', 'B', 'C', 'A']})

df.First = df.First.astype(CategoricalDtype(ordered=True))

# Default order is alphabetic
df.First.cat.categories

>> Index(['A', 'B', 'C'], dtype='object')

# Change the order
df.First = df.First.cat.reorder_categories(['C', 'A', 'B'])
df.First.cat.categories

>> Index(['C', 'A', 'B'], dtype='object')

# Do a simple crosstab with margins
pd.crosstab(df.First, df.Second, margins=True)

>> Second   A   B   C   All
>> First 
>> C        1   1   0   1
>> A        0   1   0   3
>> B        0   1   2   2
>> All      1   3   2   6

Problem description

The margins show the wrong values. The All column lists the row totals in the default alphabetical category order (A, B, C gives 1, 3, 2) instead of following the reordered index (C, A, B), where the correct totals would be 2, 1, 3. This might be related to #20496, but the output here looks like a bug.
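To make the mismatch easy to check, here is a minimal sketch that compares the row totals from a plain crosstab against the `All` column of the `margins=True` result, label by label. The variable names (`plain`, `expected_totals`, `reported_totals`, `mismatched`) are only illustrative and not part of the original report.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"First": ["B", "B", "C", "A", "B", "C"],
                   "Second": ["C", "B", "B", "B", "C", "A"]})

# Same setup as in the report: ordered categorical, then a non-alphabetical order.
df["First"] = df["First"].astype(CategoricalDtype(ordered=True))
df["First"] = df["First"].cat.reorder_categories(["C", "A", "B"])

# Row totals computed without margins: sum each row of the plain crosstab.
plain = pd.crosstab(df["First"], df["Second"])
expected_totals = {label: int(total) for label, total in plain.sum(axis=1).items()}

# Row totals as reported by the "All" column, looked up by row label.
with_margins = pd.crosstab(df["First"], df["Second"], margins=True)
reported_totals = {label: int(with_margins.loc[label, "All"])
                   for label in expected_totals}

# Labels whose reported margin disagrees with the independently computed total.
mismatched = [label for label in expected_totals
              if reported_totals[label] != expected_totals[label]]

print("expected:", expected_totals)
print("reported:", reported_totals)
print("mismatched labels:", mismatched)
```

The independently computed totals are C=2, A=1, B=3. On an affected version such as the 0.24.1 shown below, the output above suggests `mismatched` would be non-empty. An empty list means the margins line up with the reordered index.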

Output of pd.show_versions()

commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.29.4
numpy: 1.16.1
scipy: 1.2.0
pyarrow: 0.12.0
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: 0.9.0
pandas_datareader: None
gcsfs: None

Labels

Needs Tests (unit test(s) needed to prevent regressions), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), good first issue
