groupby().agg() fails with errors on a DataFrame that includes a Timestamp column with a time zone #23683

Closed
@propella

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({
    'tag': [1,1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})

Problem description

The code above raises the following chain of exceptions.

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2670, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2689, in _aggregate_series_fast
    dummy)
  File "pandas/_libs/reduction.pyx", line 334, in pandas._libs.reduction.SeriesGrouper.__init__
  File "pandas/_libs/reduction.pyx", line 347, in pandas._libs.reduction.SeriesGrouper._check_dummy
ValueError: Dummy array must be same dtype

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3495, in aggregate
    return self._python_agg_general(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1068, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2672, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2706, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4656, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4087, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 490, in _aggregate
    result = _agg(arg, _agg_1dim)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 441, in _agg
    result[fname] = func(fname, agg_how)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 424, in _agg_1dim
    return colg.aggregate(how, _level=(_level or 0) + 1)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3497, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3627, in _aggregate_named
    raise Exception('Must produce aggregated value')
Exception: Must produce aggregated value

However, the same code works if I remove tz from the timestamps, as shown below, so I believe this is a bug.

df = pd.DataFrame({
    'tag': [1,1],
    'date': [
        pd.Timestamp('2018-01-01'),
        pd.Timestamp('2018-01-02')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})
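One difference between the two examples is the dtype of the 'date' column: a tz-aware column uses the pandas extension dtype DatetimeTZDtype, while a tz-naive column uses a plain NumPy datetime64 dtype. Whether this is what triggers "Dummy array must be same dtype" is an assumption, not something confirmed in this report. The sketch below only inspects the dtypes and the return type of the lambda; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same timestamps as in the report, with and without a time zone.
aware = pd.Series([pd.Timestamp('2018-01-01', tz='UTC'),
                   pd.Timestamp('2018-01-02', tz='UTC')])
naive = pd.Series([pd.Timestamp('2018-01-01'),
                   pd.Timestamp('2018-01-02')])

# tz-aware data is stored with a pandas extension dtype,
# tz-naive data with a NumPy dtype.
aware_is_extension = isinstance(aware.dtype, pd.DatetimeTZDtype)
naive_is_numpy = isinstance(naive.dtype, np.dtype)
print(aware.dtype, naive.dtype)

# The lambda used in the report returns a one-row Series, not a scalar.
head_result = aware.head(1)
print(type(head_result), len(head_result))
```

Note that e.head(1) does not reduce the group to a scalar, which matches the "Function does not reduce" message in the second traceback.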

Expected Output

                         date
tag
1   2018-01-01 00:00:00+00:00
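A possible workaround, sketched under the assumption that only the first value per group is needed: return a scalar from the aggregation function (e.iloc[0] instead of e.head(1)), or use the built-in first() reduction. This has not been verified against pandas 0.23.4, the version in this report; the names scalar_result and first_result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'tag': [1, 1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')]
})

# e.iloc[0] returns a scalar Timestamp, so the function reduces each group.
scalar_result = df.groupby('tag').agg({'date': lambda e: e.iloc[0]})

# Built-in reduction: first non-null value of each group.
first_result = df.groupby('tag')['date'].first()

print(scalar_result)
print(first_result)
```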

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
