Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import numpy as np
import pandas as pd

input_df = pd.DataFrame(**{
    'index': [0, 1],
    'columns': ['loss', 'category_64973.fc_size', 'category_64973.num_fc_layers', 'training.learning_rate'],
    'data': [[1.0549572706222534, 240, 2, 0.0014908184659929895], [1.225046157836914, 160, 2, 0.0013734204727201226]]
})

input_df['training.learning_rate'] = pd.qcut(
    input_df['training.learning_rate'],
    q=10,
    precision=3,
    duplicates='drop',
)

data = input_df.pivot_table(
    index='category_64973.fc_size',
    columns='training.learning_rate',
    values='loss',
    aggfunc='mean'
)

# Seaborn code starts here
mask = np.zeros(data.shape, bool)
mask = pd.DataFrame(mask,
                    index=data.index,
                    columns=data.columns,
                    dtype=bool)

mask | pd.isnull(data)
Problem description
An error occurs when attempting to plot a quantized pivot table using Seaborn with the latest version of pandas (1.1.5).
The code above is a self-contained example showing what Seaborn does when heatmap() is called on the input pivot table (data). See this usage in the Ludwig framework: https://github.com/uber/ludwig/blob/master/ludwig/utils/visualization_utils.py#L1392. Prior to v1.1.5, this code worked fine and was used to generate plots in Ludwig.
The stack trace is as follows:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
701 try:
--> 702 self._validate_key(k, i)
703 except ValueError as err:
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
1368 else:
-> 1369 raise ValueError(f"Can only index by location with a [{self._valid_types}]")
1370
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-1-e654830c5b85> in <module>
32 dtype=bool)
33
---> 34 mask | pd.isnull(data)
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/ops/__init__.py in f(self, other, axis, level, fill_value)
638 self, other, op, axis, default_axis, fill_value, level
639 ):
--> 640 return _frame_arith_method_with_reindex(self, other, op)
641
642 if isinstance(other, ABCSeries) and fill_value is not None:
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/ops/__init__.py in _frame_arith_method_with_reindex(left, right, op)
572 )
573
--> 574 new_left = left.iloc[:, lcols]
575 new_right = right.iloc[:, rcols]
576 result = op(new_left, new_right)
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
871 # AttributeError for IntervalTree get_value
872 pass
--> 873 return self._getitem_tuple(key)
874 else:
875 # we by definition only have the 0th axis
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
1441 def _getitem_tuple(self, tup: Tuple):
1442
-> 1443 self._has_valid_tuple(tup)
1444 try:
1445 return self._getitem_lowerdim(tup)
~/repos/ludwig/env/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
705 "Location based indexing can only have "
706 f"[{self._valid_types}] types"
--> 707 ) from err
708
709 def _is_nested_tuple_indexer(self, tup: Tuple) -> bool:
ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
Note that this last mask | pd.isnull(data) operation succeeds with pandas 1.1.4, with all other dependencies left unchanged.
Expected Output
The mask | pd.isnull(data) call should succeed.
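In the meantime, a possible workaround is to combine the two frames on their underlying NumPy arrays, which skips the label-alignment step where the traceback fails (_frame_arith_method_with_reindex). This is only a sketch, not what Seaborn does internally. The helper name combine_mask is hypothetical, and it assumes the mask and the data share identical index and column labels, which holds in the example above because the mask is built from data.index and data.columns.

```python
import numpy as np
import pandas as pd


def combine_mask(mask: pd.DataFrame, data: pd.DataFrame) -> pd.DataFrame:
    """Element-wise `mask | isnull(data)` without pandas label alignment.

    Hypothetical helper (not part of pandas or Seaborn). Assumes both
    frames carry identical index and column labels.
    """
    if not (mask.index.equals(data.index) and mask.columns.equals(data.columns)):
        raise ValueError("mask and data must have identical index and columns")

    # Work on plain boolean arrays so no reindexing of the columns is attempted.
    combined = mask.to_numpy(dtype=bool) | data.isna().to_numpy()

    # Re-attach the original labels to the result.
    return pd.DataFrame(combined, index=data.index, columns=data.columns)
```

With the frames from the code sample, combine_mask(mask, data) would stand in for mask | pd.isnull(data). The label check is there so the positional combination cannot silently pair up mismatched rows or columns.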
Output of pd.show_versions()
INSTALLED VERSIONS
commit : b5958ee
python : 3.7.8.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0
Cython : 0.29.21
pytest : 6.1.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.2
bottleneck : None
fsspec : 0.8.4
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.52.0