BUG: Pandas 1.1.5 location-based indexing error with quantized pivot table #38367

Closed
@tgaddair

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

input_df = pd.DataFrame(**{
    'index': [0, 1], 
    'columns': ['loss', 'category_64973.fc_size', 'category_64973.num_fc_layers', 'training.learning_rate'], 
    'data': [[1.0549572706222534, 240, 2, 0.0014908184659929895], [1.225046157836914, 160, 2, 0.0013734204727201226]]
})

input_df['training.learning_rate'] = pd.qcut(
    input_df['training.learning_rate'],
    q=10,
    precision=3,
    duplicates='drop',
)

data = input_df.pivot_table(
    index='category_64973.fc_size',
    columns='training.learning_rate',
    values='loss',
    aggfunc='mean'
)

# Seaborn code starts here
mask = np.zeros(data.shape, bool)
mask = pd.DataFrame(mask,
                    index=data.index,
                    columns=data.columns,
                    dtype=bool)

mask | pd.isnull(data)

Problem description

An error occurs when attempting to plot a quantized pivot table using Seaborn with the latest version of Pandas (1.1.5).

The code above is a self-contained example of what Seaborn does when heatmap() is called on the input pivot table (data). See this usage in the Ludwig framework: https://github.com/uber/ludwig/blob/master/ludwig/utils/visualization_utils.py#L1392. Before pandas 1.1.5, this code worked and generated plots in Ludwig.
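
For context, here is a minimal sketch of what pd.qcut returns for the two learning rates in the sample, since that binned column supplies the column labels of the pivot table. It only shows the kind of labels involved and is not a diagnosis of the failure. The names `rates` and `binned` are illustrative and do not appear in the original code.

```python
import pandas as pd

# The two learning rates from the code sample above.
rates = pd.Series([0.0014908184659929895, 0.0013734204727201226])

# Same qcut arguments as in the report.
binned = pd.qcut(rates, q=10, precision=3, duplicates='drop')

# The result is a categorical Series whose categories are intervals.
print(binned.dtype)
print(type(binned.cat.categories).__name__)

# Only two of the bins are actually occupied by the two input values.
print(binned.nunique())
```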

The stack trace is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    701             try:
--> 702                 self._validate_key(k, i)
    703             except ValueError as err:

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
   1368         else:
-> 1369             raise ValueError(f"Can only index by location with a [{self._valid_types}]")
   1370 

ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-1-e654830c5b85> in <module>
     32                     dtype=bool)
     33 
---> 34 mask | pd.isnull(data)

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/ops/__init__.py in f(self, other, axis, level, fill_value)
    638             self, other, op, axis, default_axis, fill_value, level
    639         ):
--> 640             return _frame_arith_method_with_reindex(self, other, op)
    641 
    642         if isinstance(other, ABCSeries) and fill_value is not None:

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _frame_arith_method_with_reindex(left, right, op)
    572     )
    573 
--> 574     new_left = left.iloc[:, lcols]
    575     new_right = right.iloc[:, rcols]
    576     result = op(new_left, new_right)

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    871                     # AttributeError for IntervalTree get_value
    872                     pass
--> 873             return self._getitem_tuple(key)
    874         else:
    875             # we by definition only have the 0th axis

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1441     def _getitem_tuple(self, tup: Tuple):
   1442 
-> 1443         self._has_valid_tuple(tup)
   1444         try:
   1445             return self._getitem_lowerdim(tup)

~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    705                     "Location based indexing can only have "
    706                     f"[{self._valid_types}] types"
--> 707                 ) from err
    708 
    709     def _is_nested_tuple_indexer(self, tup: Tuple) -> bool:

ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

Note that the final mask | pd.isnull(data) operation succeeds with pandas 1.1.4 when all other dependencies are left unchanged.

Expected Output

The mask | pd.isnull(data) call should succeed.
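
One possible stopgap, not confirmed by the pandas or Seaborn maintainers, is to combine the two boolean frames as NumPy arrays and rebuild the DataFrame afterwards. This avoids the DataFrame `|` operator that appears in the traceback. It assumes both frames share the same index and columns, which holds here because mask is built from data. The name `combined` is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the pivot table from the code sample above.
input_df = pd.DataFrame(**{
    'index': [0, 1],
    'columns': ['loss', 'category_64973.fc_size', 'category_64973.num_fc_layers', 'training.learning_rate'],
    'data': [[1.0549572706222534, 240, 2, 0.0014908184659929895], [1.225046157836914, 160, 2, 0.0013734204727201226]]
})
input_df['training.learning_rate'] = pd.qcut(
    input_df['training.learning_rate'], q=10, precision=3, duplicates='drop',
)
data = input_df.pivot_table(
    index='category_64973.fc_size',
    columns='training.learning_rate',
    values='loss',
    aggfunc='mean'
)

mask = pd.DataFrame(np.zeros(data.shape, bool),
                    index=data.index,
                    columns=data.columns,
                    dtype=bool)

# Assumed workaround: OR the underlying arrays element-wise, then
# reattach the original labels instead of using DataFrame.__or__.
combined = pd.DataFrame(
    mask.to_numpy() | pd.isnull(data).to_numpy(),
    index=data.index,
    columns=data.columns,
)
print(combined)
```

Because the labels are copied from data rather than aligned, this sketch is only appropriate when the two frames are known to line up position by position.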

Output of pd.show_versions()

INSTALLED VERSIONS

commit : b5958ee
python : 3.7.8.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : 0.29.21
pytest : 6.1.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.2
bottleneck : None
fsspec : 0.8.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.52.0

Labels

Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Regression (functionality that used to work in a prior pandas version)