
Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
>>> import datetime as dt
>>> import pandas as pd
>>> df = pd.DataFrame([[dt.date(2021,1,1), 1, 'a'], [dt.date(2021,1,2), 2, 'b']])
>>> g = df.groupby([0, 1])
>>> g.groups.keys()
... dict_keys([(Timestamp('2021-01-01 00:00:00'), 1), (Timestamp('2021-01-02 00:00:00'), 2)])
Problem description
When multiple columns are passed to groupby, columns holding datetime.date values are cast to the Timestamp type in the keys of g.groups.
Because of this unexpected cast, the following code raises a KeyError.
>>> list(g.get_group(key) for key in g.groups.keys())
... KeyError: (Timestamp('2021-01-01 00:00:00'), 1)
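As a possible way around the failing lookup (a sketch only, not a fix for the underlying cast), one can iterate over the GroupBy object directly. Iteration yields each key together with its sub-frame, so no key taken from groups is ever passed back into get_group. The names frames and row_counts are illustrative, and the type of the yielded keys may differ between pandas versions.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame([[dt.date(2021, 1, 1), 1, "a"], [dt.date(2021, 1, 2), 2, "b"]])

# Iterating yields (key, sub-frame) pairs, so no get_group lookup is needed.
frames = [sub for _key, sub in df.groupby([0, 1])]

# Each (date, int) combination occurs once, so every sub-frame has one row.
row_counts = [len(sub) for sub in frames]
print(row_counts)
```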
The cast happens only when multiple grouping columns are specified. With a single column, the datetime.date values are preserved:
>>> g = df.groupby([0])
>>> g.groups.keys()
... dict_keys([datetime.date(2021, 1, 1), datetime.date(2021, 1, 2)])
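To compare the two cases on a given installation, a small diagnostic along these lines may help. It reports the type of the date component of the first group key for single-column and multi-column grouping. The helper first_key_type is hypothetical and the result is deliberately not hard-coded, since it depends on the pandas version in use.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame([[dt.date(2021, 1, 1), 1, "a"], [dt.date(2021, 1, 2), 2, "b"]])


def first_key_type(by):
    """Return the type name of the date component of the first group key."""
    key = next(iter(df.groupby(by).groups))
    # Multi-column keys are tuples; the date column is the first element.
    date_part = key[0] if isinstance(key, tuple) else key
    return type(date_part).__name__


# Differing type names here would indicate the cast described in this report.
key_types = {"single": first_key_type([0]), "multi": first_key_type([0, 1])}
print(key_types)
```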
Expected Output
>>> g = df.groupby([0, 1])
>>> g.groups.keys()
... dict_keys([(datetime.date(2021, 1, 1), 1), (datetime.date(2021, 1, 2), 2)])
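Until the keys come back in this form, a caller could convert them by hand. The sketch below uses a hypothetical helper, normalize_key, that turns any Timestamp component of a key tuple back into a datetime.date. It assumes the column originally held only dates, because any time-of-day information is discarded.

```python
import datetime as dt

import pandas as pd


def normalize_key(key):
    """Convert Timestamp components of a group key back to datetime.date."""
    # Assumption: the original values were plain dates, so dropping the time is safe.
    return tuple(k.date() if isinstance(k, pd.Timestamp) else k for k in key)


cast_key = (pd.Timestamp("2021-01-01"), 1)
restored = normalize_key(cast_key)
print(restored)
```

Whether get_group accepts the restored key has not been checked here and may depend on the pandas version.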
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.8.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : ja_JP.UTF-8
pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.1.1
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.2
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.52.0