Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code sample
import pandas as pd
import pandas.testing as pdt
# Note that df.columns contains both str and int
df = pd.DataFrame([[0, 1, 2]], columns=["foo", "bar", 42])
pdt.assert_frame_equal(df, df, check_like=True)
Problem description
This code raises:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-05cc1ba40d40> in <module>
----> 1 pdt.assert_frame_equal(df, df, check_like=True)
[... skipping hidden 2 frame]
~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in sort_values(self, return_indexer, ascending, na_position, key)
4664 # ignore na_position for MultiIndex
4665 if not isinstance(self, ABCMultiIndex):
-> 4666 _as = nargsort(
4667 items=idx, ascending=ascending, na_position=na_position, key=key
4668 )
~/.local/lib/python3.8/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position, key, mask)
365
366 if is_extension_array_dtype(items):
--> 367 return items.argsort(ascending=ascending, kind=kind, na_position=na_position)
368 else:
369 items = np.asanyarray(items)
~/.local/lib/python3.8/site-packages/pandas/core/arrays/base.py in argsort(self, ascending, kind, na_position, *args, **kwargs)
584
585 values = self._values_for_argsort()
--> 586 return nargsort(
587 values,
588 kind=kind,
~/.local/lib/python3.8/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position, key, mask)
377 non_nans = non_nans[::-1]
378 non_nan_idx = non_nan_idx[::-1]
--> 379 indexer = non_nan_idx[non_nans.argsort(kind=kind)]
380 if not ascending:
381 indexer = indexer[::-1]
TypeError: '<' not supported between instances of 'int' and 'str'
The cause is PR #37479, which added the following to assert_index_equal():
# If order doesn't matter then sort the index entries
if not check_order:
left = left.sort_values()
right = right.sort_values()
This code is triggered by assert_frame_equal(…, check_like=True). .sort_values() does not work when an index contains non-comparable types, like str and int.
Detected via iiasa/ixmp#390.
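The sorting failure can be isolated from the testing helpers entirely. The following is a minimal sketch; can_sort is a hypothetical helper introduced only for illustration and is not part of pandas. It simply reports whether Index.sort_values() raises TypeError for a given index.

```python
import pandas as pd


def can_sort(index: pd.Index) -> bool:
    """Hypothetical helper: True if index.sort_values() succeeds,
    False if it raises TypeError (labels that cannot be ordered)."""
    try:
        index.sort_values()
    except TypeError:
        return False
    return True


# All labels are str, so they can be ordered against each other
uniform = pd.Index(["foo", "bar", "baz"])
# Same shape as df.columns in the code sample: str and int labels mixed
mixed = pd.Index(["foo", "bar", 42])

print("uniform sortable:", can_sort(uniform))
print("mixed sortable:", can_sort(mixed))
```

On the pandas 1.2.0 install described in this report, the mixed index is the case that should be reported as not sortable, which matches the traceback above.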
Expected output
In pandas < 1.2.0, the last line of the code sample completed without raising an exception.
The description of the check_like argument is:
pandas/pandas/_testing/asserters.py, lines 1127 to 1130 in 25110a9
…i.e. this does not indicate that the columns index may only contain comparable types, so the function should not raise an exception.
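Until this is resolved, one possible workaround is to align the two frames by label instead of sorting them. The sketch below is only an illustration under stated assumptions: assert_frame_equal_unordered is a hypothetical helper, not a pandas function, and it assumes the row and column labels are unique and hashable. It compares label sets by hashing, reorders one frame with DataFrame.reindex_like(), and then runs the ordinary order-sensitive assert_frame_equal().

```python
import pandas as pd
import pandas.testing as pdt


def assert_frame_equal_unordered(left, right, **kwargs):
    """Hypothetical helper: compare two frames while ignoring row and
    column order, without ever sorting the labels.

    Assumes labels on both axes are unique and hashable.
    """
    # Compare label sets by hashing; no '<' comparison between labels
    for axis in ("index", "columns"):
        left_labels = getattr(left, axis)
        right_labels = getattr(right, axis)
        if set(left_labels) != set(right_labels):
            raise AssertionError(
                f"{axis} labels differ: {list(left_labels)} != {list(right_labels)}"
            )
    # Reorder `right` to the label order of `left`, then compare in order
    pdt.assert_frame_equal(left, right.reindex_like(left), **kwargs)


# Same frame as in the code sample: columns mix str and int
df = pd.DataFrame([[0, 1, 2]], columns=["foo", "bar", 42])
# Same data with the columns in a different order
shuffled = df[[42, "foo", "bar"]]

assert_frame_equal_unordered(df, shuffled)
```

Because the alignment relies on label lookup rather than ordering, it does not require the labels to be mutually comparable.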
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-36-generic
Version : #40-Ubuntu SMP Tue Jan 5 21:54:35 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8
pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 50.3.2
Cython : 0.29.21
pytest : 6.1.2
hypothesis : None
sphinx : 3.3.0
blosc : 1.8.1
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.6.1
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2