Description
Code Sample
Do a crosstab with an ordered categorical variable whose category order differs from the alphabetical default:
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'First': ['B', 'B', 'C', 'A', 'B', 'C'],
                   'Second': ['C', 'B', 'B', 'B', 'C', 'A']})
df.First = df.First.astype(CategoricalDtype(ordered=True))
# Default order is alphabetic
df.First.cat.categories
>> Index(['A', 'B', 'C'], dtype='object')
# Change the order
df.First = df.First.cat.reorder_categories(['C', 'A', 'B'])
df.First.cat.categories
>> Index(['C', 'A', 'B'], dtype='object')
# Do a simple crosstab with margins
pd.crosstab(df.First, df.Second, margins=True)
>> Second  A  B  C  All
>> First
>> C       1  1  0    1
>> A       0  1  0    3
>> B       0  1  2    2
>> All     1  3  2    6
Problem description
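The row margins (the All column) show the wrong values. They are the totals that would be expected under the default alphabetical ordering of the categories (A=1, B=3, C=2), assigned by position to the reordered rows. Counting the cells in each row gives the correct totals: C=2, A=1, B=3. This might be related to #20496, but the output here looks like a bug.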
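For comparison, the expected margins can be computed by hand from a crosstab built without margins=True. This is only a minimal sketch of a possible cross-check, not a fix for the underlying issue; the names counts, row_totals and col_totals are introduced here for illustration, and the categories are passed directly to CategoricalDtype instead of calling reorder_categories.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({'First': ['B', 'B', 'C', 'A', 'B', 'C'],
                   'Second': ['C', 'B', 'B', 'B', 'C', 'A']})

# Same non-alphabetical category order as in the report: C, A, B
df['First'] = df['First'].astype(
    CategoricalDtype(categories=['C', 'A', 'B'], ordered=True))

# Build the table without margins, then sum along each axis.
# The sums are keyed by label, so they do not depend on row position.
counts = pd.crosstab(df['First'], df['Second'])
row_totals = counts.sum(axis=1)   # totals per value of First
col_totals = counts.sum(axis=0)   # totals per value of Second

print(row_totals)
print(col_totals)
```

The row totals obtained this way are what the All column should contain for each label, regardless of the category order.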
Output of pd.show_versions()
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-1032-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.29.4
numpy: 1.16.1
scipy: 1.2.0
pyarrow: 0.12.0
xarray: None
IPython: 7.2.0
sphinx: 1.8.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.2.1
pandas_gbq: 0.9.0
pandas_datareader: None
gcsfs: None