Description
Code Sample
import numpy as np
import pandas as pd
idx1 = pd.Categorical(['A1', 'A2', 'A2'], categories=['A1', 'A2'])
idx2 = pd.Categorical(['A2', 'A1', 'A1'], categories=['A1', 'A2'])
tmp1 = pd.DataFrame(np.random.randn(6).reshape(3, 2), index=idx1)
tmp2 = pd.DataFrame(np.random.randn(6).reshape(3, 2), index=idx2)
tmp1 * tmp2
The result:
In [1]: tmp1
Out[1]:
0 1
A1 -0.880693 0.794644
A2 -0.671629 1.145027
A2 0.305152 0.379116
In [2]: tmp2
Out[2]:
0 1
A2 0.434122 -0.966894
A1 -0.162195 0.499271
A1 0.840569 0.043832
In [3]: tmp1 * tmp2
Out[3]:
0 1
0 0.142844 0.396742
0 -0.740283 0.034831
1 -0.291569 -1.107120
1 0.132473 -0.366565
In [4]: pd.__version__
Out[4]: '0.24.2'
Problem description
When multiplying two DataFrames whose CategoricalIndex objects share the same categories, the result's CategoricalIndex is converted to an integer index (the category codes). The problem may come from DataFrame alignment, since `align` already produces the integer index:
In [7]: tmp1_align, tmp2_align = tmp1.align(tmp2)
In [8]: tmp1_align
Out[8]:
0 1
0 -0.880693 0.794644
0 -0.880693 0.794644
1 -0.671629 1.145027
1 0.305152 0.379116
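To check whether a particular pandas version shows this conversion, a small sketch can report the index class and dtype at each step. It reuses the layout of the code sample but with fixed values instead of `np.random.randn`, so runs are comparable; `index_kind` is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd


def index_kind(df):
    """Return (index class name, index dtype) for a DataFrame. Hypothetical helper."""
    return type(df.index).__name__, str(df.index.dtype)


# Same index layout as the code sample, with fixed values for reproducibility
idx1 = pd.Categorical(['A1', 'A2', 'A2'], categories=['A1', 'A2'])
idx2 = pd.Categorical(['A2', 'A1', 'A1'], categories=['A1', 'A2'])
tmp1 = pd.DataFrame(np.arange(1, 7, dtype=float).reshape(3, 2), index=idx1)
tmp2 = pd.DataFrame(np.arange(10, 70, 10, dtype=float).reshape(3, 2), index=idx2)

tmp1_align, tmp2_align = tmp1.align(tmp2)
product = tmp1 * tmp2

# Collect the index type of the inputs, the aligned frames, and the product
report = {
    'tmp1': index_kind(tmp1),
    'tmp2': index_kind(tmp2),
    'tmp1_align': index_kind(tmp1_align),
    'tmp2_align': index_kind(tmp2_align),
    'tmp1 * tmp2': index_kind(product),
}
for name, (cls, dtype) in report.items():
    print(f"{name:12} {cls:18} {dtype}")
```

If the aligned frames or the product report a non-categorical index class, that version reproduces the behavior described here.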
I think simple binary operations should preserve the index type when both categorical indexes have the same categories, but they don't. I have also found that sometimes the operation keeps the category labels but converts the index to a plain object (string) index, which uses a lot of memory. I am confused about this and haven't found any explanation yet.
Expected Output
The result should keep the categorical index, like this:
0 1
A1 0.142844 0.396742
A1 -0.740283 0.034831
A2 -0.291569 -1.107120
A2 0.132473 -0.366565
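Until alignment keeps the categorical dtype, one possible workaround is sketched below, assuming both indexes are CategoricalIndex objects with identical categories. `mul_preserving_categorical_index` is a hypothetical helper, not a pandas API: it does the arithmetic on plain object labels and then rebuilds the CategoricalIndex. Because it temporarily materializes an object index, it does not avoid the memory cost mentioned above and is not a fix for the underlying behavior.

```python
import numpy as np
import pandas as pd


def mul_preserving_categorical_index(left, right):
    """Multiply two DataFrames and return a result indexed by a CategoricalIndex.

    Hypothetical helper, not part of pandas. Assumes both indexes are
    CategoricalIndex objects with identical categories.
    """
    categories = left.index.categories
    if not right.index.categories.equals(categories):
        raise ValueError("both indexes must have the same categories")

    # Align and multiply on plain object labels instead of the categorical index.
    # This temporarily stores the labels as Python objects.
    left_plain = left.copy()
    right_plain = right.copy()
    left_plain.index = left.index.astype(object)
    right_plain.index = right.index.astype(object)
    result = left_plain * right_plain

    # Rebuild the CategoricalIndex using the shared categories
    result.index = pd.CategoricalIndex(result.index, categories=categories)
    return result


idx1 = pd.Categorical(['A1', 'A2', 'A2'], categories=['A1', 'A2'])
idx2 = pd.Categorical(['A2', 'A1', 'A1'], categories=['A1', 'A2'])
tmp1 = pd.DataFrame(np.random.randn(6).reshape(3, 2), index=idx1)
tmp2 = pd.DataFrame(np.random.randn(6).reshape(3, 2), index=idx2)

print(mul_preserving_categorical_index(tmp1, tmp2))
```

Each label in `tmp1` is paired with every matching label in `tmp2`, so the two `A1` rows and two `A2` rows in the expected output above come from the duplicate labels on each side.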
Output of pd.show_versions()
pandas: 0.24.2
pytest: 5.0.1
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.12
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: 2.1.2
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.2
numexpr: 2.6.9
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.4
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.5
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None