BUG: Unexpected typecast on datetime.date in groupby.key() #38878

Closed
@ghost

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import datetime as dt
>>> import pandas as pd
>>> df = pd.DataFrame([[dt.date(2021, 1, 1), 1, 'a'], [dt.date(2021, 1, 2), 2, 'b']])
>>> g = df.groupby([0, 1])
>>> g.groups.keys()
dict_keys([(Timestamp('2021-01-01 00:00:00'), 1), (Timestamp('2021-01-02 00:00:00'), 2)])

Problem description

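When multiple columns are passed to groupby, the values of a datetime.date column are cast to Timestamp in the group keys.

Because of this unexpected cast, the keys returned by g.groups.keys() cannot be passed back to get_group, and the following code raises a KeyError.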

>>> list(g.get_group(key) for key in g.groups.keys())
KeyError: (Timestamp('2021-01-01 00:00:00'), 1)
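
To illustrate why a changed key type is enough to break the lookup, here is a small standard-library analogy. It is not pandas internals: the `groups` dict is a hypothetical stand-in for a mapping keyed by the original datetime.date values, and dt.datetime stands in for Timestamp.

```python
import datetime as dt

# Hypothetical stand-in for a group mapping keyed by the original date values.
groups = {
    (dt.date(2021, 1, 1), 1): [0],
    (dt.date(2021, 1, 2), 2): [1],
}

# The same logical key, once as a date and once converted to a datetime.
date_key = (dt.date(2021, 1, 1), 1)
cast_key = (dt.datetime(2021, 1, 1), 1)

# A date never compares equal to a datetime, so only the date key is found.
found_with_date = date_key in groups   # True
found_with_cast = cast_key in groups   # False
```

A key that has been converted away from datetime.date no longer matches any entry, which is consistent with the KeyError above.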

The cast happens only when multiple grouping columns are specified. With a single column, the keys stay as datetime.date:

>>> g = df.groupby([0])
>>> g.groups.keys()
dict_keys([datetime.date(2021, 1, 1), datetime.date(2021, 1, 2)])

Expected Output

>>> g = df.groupby([0, 1])
>>> g.groups.keys()
dict_keys([(datetime.date(2021, 1, 1), 1), (datetime.date(2021, 1, 2), 2)])
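
Until the keys round-trip, one possible workaround (a sketch, not a fix for the cast itself) is to iterate over the GroupBy object directly instead of feeding g.groups.keys() back into get_group. Iteration yields (key, sub-frame) pairs, so no key lookup is involved. The name `frames` is only illustrative.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame([[dt.date(2021, 1, 1), 1, 'a'], [dt.date(2021, 1, 2), 2, 'b']])
g = df.groupby([0, 1])

# Iterating the GroupBy object returns each group's sub-frame directly,
# so the keys never have to be passed back to get_group.
frames = [frame for _, frame in g]
```

This sidesteps the KeyError but does not change the type of the keys reported by g.groups.keys().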

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.8.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : ja_JP.UTF-8
pandas : 1.2.0
numpy : 1.19.4
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.1.1
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.2
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.52.0
