Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas. (Tested version 1.2.0)
- (optional) I have confirmed this bug exists on the master branch of pandas.
This fails:
pd.DataFrame(columns=pd.CategoricalIndex([]), index=['K']).reindex(columns=pd.CategoricalIndex(['A', 'A']))
But these succeed:
pd.DataFrame(columns=pd.Index([]), index=['K']).reindex(columns=pd.CategoricalIndex(['A', 'A']))
pd.DataFrame(columns=pd.CategoricalIndex([]), index=['K']).reindex(columns=pd.CategoricalIndex(['A', 'B']))
pd.DataFrame(columns=pd.CategoricalIndex([]), index=['K']).reindex(columns=pd.CategoricalIndex([]))
The error is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\util\_decorators.py", line 312, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\frame.py", line 4173, in reindex
    return super().reindex(**kwargs)
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\generic.py", line 4806, in reindex
    return self._reindex_axes(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\frame.py", line 4013, in _reindex_axes
    frame = frame._reindex_columns(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\frame.py", line 4055, in _reindex_columns
    new_columns, indexer = self.columns.reindex(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\indexes\category.py", line 448, in reindex
    new_target, indexer, _ = result._reindex_non_unique(np.array(target))
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\indexes\base.py", line 3589, in _reindex_non_unique
    new_indexer = np.arange(len(self.take(indexer)))
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\indexes\base.py", line 751, in take
    taken = algos.take(
  File "C:\Users\mboling\Anaconda3\envs\pandastest\lib\site-packages\pandas\core\algorithms.py", line 1657, in take
    result = arr.take(indices, axis=axis)
IndexError: cannot do a non-empty take from an empty axes.
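The last frame of the traceback is a plain `arr.take(...)` call, and the message matches what NumPy raises when any indices are requested from a zero-length axis. Below is a minimal sketch of that condition in isolation. It assumes the array being taken from is empty because the source CategoricalIndex is empty, and that the indexer is non-empty because the target has two labels. Neither assumption is confirmed by the traceback itself, and the variable names are illustrative only.

```python
import numpy as np

# Assumption: stands in for the values backing the empty source index.
empty = np.array([], dtype="int64")

# Assumption: stands in for a non-empty indexer (one entry per target label).
indexer = np.array([-1, -1])

try:
    empty.take(indexer)
    take_error = None
except IndexError as exc:
    # NumPy refuses any non-empty take from a zero-length axis.
    take_error = str(exc)

print(take_error)
```

If this is what happens inside `_reindex_non_unique`, the call fails regardless of which positions the indexer holds, because no position is valid in an empty array.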
Problem description
It is unexpected that CategoricalIndex behaves differently than Index in this regard. A similar problem was already reported and solved in #16770, but a bug appears to remain in the edge case where the source CategoricalIndex is empty and the target index contains duplicates.
Expected Output
The failing code should return a dataframe with two columns and one row.
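To make the comparison easy to rerun on other pandas versions, the four calls from the description can be collected into one script. This is a sketch only: `reindex_outcome` is a hypothetical helper, not part of pandas, and it records either the resulting shape or the name of the exception raised, so the script runs whether or not the bug is present.

```python
import pandas as pd


def reindex_outcome(source_columns, target_columns):
    """Return the reindexed frame's shape, or the exception class name if reindex raises."""
    frame = pd.DataFrame(columns=source_columns, index=["K"])
    try:
        return frame.reindex(columns=target_columns).shape
    except Exception as exc:  # broad on purpose: this only reports what happened
        return type(exc).__name__


outcomes = {
    # Reported as failing on pandas 1.2.0.
    "CategoricalIndex([]) -> ['A', 'A']": reindex_outcome(
        pd.CategoricalIndex([]), pd.CategoricalIndex(["A", "A"])
    ),
    # Reported as succeeding on pandas 1.2.0.
    "Index([]) -> ['A', 'A']": reindex_outcome(
        pd.Index([]), pd.CategoricalIndex(["A", "A"])
    ),
    "CategoricalIndex([]) -> ['A', 'B']": reindex_outcome(
        pd.CategoricalIndex([]), pd.CategoricalIndex(["A", "B"])
    ),
    "CategoricalIndex([]) -> []": reindex_outcome(
        pd.CategoricalIndex([]), pd.CategoricalIndex([])
    ),
}

for label, outcome in outcomes.items():
    print(f"{label}: {outcome}")
```

With the expected behavior, the first entry would report a shape of one row and two columns, the same as the plain Index case.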
Output of pd.show_versions() (conda package list for the test environment provided instead)
blas 1.0 mkl
bottleneck 1.3.2 py39h7cc1a96_1
ca-certificates 2020.12.8 haa95532_0
certifi 2020.12.5 py39haa95532_0
et_xmlfile 1.0.1 py_1001
icc_rt 2019.0.0 h0cc432a_1
intel-openmp 2020.3 h57928b3_311 conda-forge
jdcal 1.4.1 py_0
libblas 3.9.0 5_mkl conda-forge
libcblas 3.9.0 5_mkl conda-forge
liblapack 3.9.0 5_mkl conda-forge
mkl 2020.4 hb70f87d_311 conda-forge
mkl-service 2.3.0 py39h196d8e1_0
numpy 1.19.4 py39h6635163_1 conda-forge
openpyxl 3.0.5 py_0
openssl 1.1.1i h2bbff1b_0
pandas 1.2.0 py39h2e25243_0 conda-forge
pip 20.3.3 pyhd8ed1ab_0 conda-forge
python 3.9.1 h7840368_2_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.9 1_cp39 conda-forge
pytz 2020.5 pyhd8ed1ab_0 conda-forge
scipy 1.5.2 py39h14eb087_0
setuptools 49.6.0 py39h467e6f4_2 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
sqlite 3.34.0 h8ffe710_0 conda-forge
tzdata 2020f he74cb21_0 conda-forge
vc 14.2 hb210afc_2 conda-forge
vs2015_runtime 14.28.29325 h5e1d092_0 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
wincertstore 0.2 py39hde42818_1005 conda-forge
xlrd 2.0.1 pyhd3eb1b0_0