Quantile function fails when performing groupby on Time Zone Aware Timestamps #33168

Closed
@jlandercy

Description

Code Sample, a copy-pastable example if possible

Maybe not a high-priority bug, but I have the feeling it can easily be fixed. I just do not understand the code well enough to know how it should be fixed. Please find below the MCVE to reproduce it:

import numpy as np
import pandas as pd

# Sample Dataset:
n = 200
c = np.random.choice([0,1,2], size=(n,))
d = np.random.randn(n)
t = pd.date_range(start='2020-04-19 00:00:00', freq='1T', periods=n, tz='UTC')
df = pd.DataFrame([r for r in zip(c, t, d)], columns=['category', 'timestamp', 'value'])
df['rtime'] = df['timestamp'].dt.floor('1H')

# Failing operation:
df.groupby('rtime').quantile([0.1, 0.5, 0.9])

Problem description

The traceback is rather terse, and I do not have enough experience with the pandas source code to follow every detail of this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-aff2c9a206a6> in <module>
----> 1 df.groupby('rtime').quantile([0.1,0.2])

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in quantile(self, q, interpolation)
   1926                     interpolation=interpolation,
   1927                 )
-> 1928                 for qi in q
   1929             ]
   1930             result = concat(results, axis=0, keys=q)

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in <listcomp>(.0)
   1926                     interpolation=interpolation,
   1927                 )
-> 1928                 for qi in q
   1929             ]
   1930             result = concat(results, axis=0, keys=q)

/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/groupby.py in _get_cythonized_result(self, how, cython_dtype, aggregate, needs_values, needs_mask, needs_ngroups, result_is_index, pre_processing, post_processing, **kwargs)
   2289                 func = partial(func, ngroups)
   2290 
-> 2291             func(**kwargs)  # Call func to modify indexer values in place
   2292 
   2293             if result_is_index:

pandas/_libs/groupby.pyx in pandas._libs.groupby.__pyx_fused_cpdef()

TypeError: No matching signature found

I have found similar issues on GitHub with the same exception, but I think the exception is too generic to conclude that they share the same cause. Additionally, I may have found a simple corner case involving TZ-aware timestamps.

I had a hard time reproducing the error when building the MCVE. I finally found out that it is related to the presence of an extra column holding time-zone-aware timestamps.
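To check whether a DataFrame contains such a column, something like the following sketch may help. The helper name tz_aware_columns and the example frame are hypothetical and only serve as illustration; the check relies on pd.DatetimeTZDtype, the dtype pandas uses for TZ-aware datetime columns.

```python
import pandas as pd


def tz_aware_columns(frame):
    """Return the names of columns whose dtype is a TZ-aware datetime.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    return [
        name
        for name, dtype in frame.dtypes.items()
        if isinstance(dtype, pd.DatetimeTZDtype)
    ]


# Small example frame: one TZ-aware column, one TZ-naive column, one float column.
example = pd.DataFrame(
    {
        "aware": pd.date_range("2020-04-19", periods=3, tz="UTC"),
        "naive": pd.date_range("2020-04-19", periods=3),
        "value": [1.0, 2.0, 3.0],
    }
)
aware_names = tz_aware_columns(example)
print(aware_names)
```

On the DataFrame from the MCVE, this should report both timestamp and rtime; only timestamp matters here, since rtime is the grouping key.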

Maybe the fix is just a matter of updating the function signature to accept TZ-aware timestamps.

The problem can be circumvented in any of the following ways:

df.groupby('rtime')['value'].quantile([0.1,0.2])

Or:

df['timestamp'] = df['timestamp'].dt.tz_convert(None)
df.groupby('rtime').quantile([0.1,0.2])

Or:

df.pop('timestamp')
df.groupby('rtime').quantile([0.1,0.2])

All of this strongly suggests that it is the presence of the extra TZ-aware column timestamp that makes quantile fail.
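A more generic variant of the workarounds above, given only as a sketch: restrict the groupby to numeric columns before calling quantile, so that no datetime column reaches the aggregation. The seeded generator and the "min" / "60min" frequency aliases are my own choices so that the example is reproducible; they are not taken from the MCVE.

```python
import numpy as np
import pandas as pd

# Same shape of data as the MCVE, but seeded so the example is reproducible.
n = 200
rng = np.random.RandomState(0)
df = pd.DataFrame(
    {
        "category": rng.choice([0, 1, 2], size=n),
        "timestamp": pd.date_range("2020-04-19", periods=n, freq="min", tz="UTC"),
        "value": rng.randn(n),
    }
)
df["rtime"] = df["timestamp"].dt.floor("60min")

# Keep only numeric columns; this excludes the TZ-aware 'timestamp' column
# (and the grouping key 'rtime', which groupby handles separately).
numeric_cols = df.select_dtypes(include="number").columns.tolist()

result = df.groupby("rtime")[numeric_cols].quantile([0.1, 0.5, 0.9])
print(result)
```

This keeps the original DataFrame untouched, unlike the tz_convert and pop variants, at the cost of not computing quantiles for the timestamp column at all.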

Expected Output

The expected behaviour is that groupby operations on a DataFrame holding time-zone-aware timestamps work the same way as they do with TZ-naive timestamps.

Note: Thank you for building such a great tool, pandas is a first class middleware. Your efforts are strongly appreciated. Let me know how I can help, I would be happy to understand how this can be corrected.

Output of pd.show_versions()

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-91-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 46.1.3
Cython : 0.29.14
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 0.999999999
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : 0.8.3
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : None
