
BUG: assert_frame_equal() with check_like=True errors with non-comparable types #39168

Closed
@khaeru

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code sample

import pandas as pd
import pandas.testing as pdt

# Note that df.columns contains both str and int
df = pd.DataFrame([[0, 1, 2]], columns=["foo", "bar", 42])

pdt.assert_frame_equal(df, df, check_like=True)

Problem description

This code raises:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-05cc1ba40d40> in <module>
----> 1 pdt.assert_frame_equal(df, df, check_like=True)

    [... skipping hidden 2 frame]

~/.local/lib/python3.8/site-packages/pandas/core/indexes/base.py in sort_values(self, return_indexer, ascending, na_position, key)
   4664         # ignore na_position for MultiIndex
   4665         if not isinstance(self, ABCMultiIndex):
-> 4666             _as = nargsort(
   4667                 items=idx, ascending=ascending, na_position=na_position, key=key
   4668             )

~/.local/lib/python3.8/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position, key, mask)
    365
    366     if is_extension_array_dtype(items):
--> 367         return items.argsort(ascending=ascending, kind=kind, na_position=na_position)
    368     else:
    369         items = np.asanyarray(items)

~/.local/lib/python3.8/site-packages/pandas/core/arrays/base.py in argsort(self, ascending, kind, na_position, *args, **kwargs)
    584
    585         values = self._values_for_argsort()
--> 586         return nargsort(
    587             values,
    588             kind=kind,

~/.local/lib/python3.8/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position, key, mask)
    377         non_nans = non_nans[::-1]
    378         non_nan_idx = non_nan_idx[::-1]
--> 379     indexer = non_nan_idx[non_nans.argsort(kind=kind)]
    380     if not ascending:
    381         indexer = indexer[::-1]

TypeError: '<' not supported between instances of 'int' and 'str'

The cause is PR #37479, which added the following to assert_index_equal():

    # If order doesn't matter then sort the index entries
    if not check_order:
        left = left.sort_values()
        right = right.sort_values()

This code is triggered by assert_frame_equal(…, check_like=True). .sort_values() does not work when an index contains non-comparable types, like str and int.
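The failure can be isolated from assert_frame_equal() entirely. The following is a minimal sketch, not code from pandas itself; the helper name sort_raises_type_error is made up for illustration. It checks whether Index.sort_values() raises TypeError for a given set of labels:

```python
import pandas as pd


def sort_raises_type_error(labels):
    """Hypothetical helper: True if sorting an Index of `labels` raises TypeError."""
    try:
        pd.Index(labels).sort_values()
    except TypeError:
        return True
    return False


# Same labels as df.columns in the code sample: str and int mixed
mixed = sort_raises_type_error(["foo", "bar", 42])

# A homogeneous index of str sorts without trouble
only_str = sort_raises_type_error(["foo", "bar", "baz"])

print(mixed, only_str)
```

Only the mixed-type labels should trip the comparison, which matches the '<' not supported error at the bottom of the traceback.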

Detected via iiasa/ixmp#390.

Expected output

In pandas < 1.2.0, the last line of the code sample ran without raising an exception.

The description of the check_like argument is:

check_like : bool, default False
If True, ignore the order of index & columns.
Note: index labels must match their respective rows
(same as in columns) - same labels must be with the same data.

…i.e. this does not indicate that the columns index may only contain comparable types, so the function should not raise an exception.
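Until this is fixed, one possible workaround is to align one frame to the other by label instead of sorting. This is only a sketch: the helper name assert_frame_equal_unordered is hypothetical, it assumes index and column labels are unique, and it is not necessarily how pandas should fix the regression.

```python
import pandas as pd
import pandas.testing as pdt


def assert_frame_equal_unordered(left, right, **kwargs):
    """Hypothetical helper: compare frames while ignoring row/column order.

    Assumes unique index and column labels. Labels are matched by lookup,
    so they never need to be comparable with '<'.
    """
    # Both frames must carry the same labels, in any order
    assert set(left.index) == set(right.index), "index labels differ"
    assert set(left.columns) == set(right.columns), "column labels differ"

    # Reorder `right` to follow the label order of `left`, then compare strictly
    aligned = right.reindex(index=left.index, columns=left.columns)
    pdt.assert_frame_equal(left, aligned, **kwargs)


df = pd.DataFrame([[0, 1, 2]], columns=["foo", "bar", 42])
shuffled = df[[42, "foo", "bar"]]  # same data, different column order

assert_frame_equal_unordered(df, shuffled)  # passes without sorting the labels
```

Because reindex() looks labels up rather than ordering them, the str/int mix in df.columns is never compared.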

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-36-generic
Version : #40-Ubuntu SMP Tue Jan 5 21:54:35 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 50.3.2
Cython : 0.29.21
pytest : 6.1.2
hypothesis : None
sphinx : 3.3.0
blosc : 1.8.1
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.6.1
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

Metadata

Labels

Regression (functionality that used to work in a prior pandas version), Testing (pandas testing functions or related to the test suite)
