
OverflowError in resample+aggregate for tz-aware index and list-like aggregation #22660

Closed
@frexvahi

Description


Code Sample, a copy-pastable example if possible

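import numpy as np
import pandas as pd

# pd.date_range is equivalent to pandas 0.23's DatetimeIndex(start=..., freq=..., periods=..., tz=...)
df = pd.DataFrame(np.random.rand(200, 1),
                  index=pd.date_range(start='2017-01-01', freq='15min', periods=200, tz='Europe/Berlin'),
                  columns=['t2p'])
df.resample('1d').aggregate(['mean'])  # raises OverflowError on pandas 0.23.4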

...

OverflowError (full traceback below):
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()

KeyError: 't2p'

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
1612 try:
-> 1613 return Index.get_loc(self, key, method, tolerance)
1614 except (KeyError, ValueError, TypeError):

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.DatetimeEngine._date_check_type()

KeyError: 't2p'

During handling of the above exception, another exception occurred:

OverflowError Traceback (most recent call last)
in ()
2 index=pd.DatetimeIndex(start='2017-01-01', freq='1h', periods=100, tz='Europe/Berlin'),
3 columns=['t2p'])
----> 4 df.resample('1d').aggregate(['mean'])
5
6

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/resample.py in aggregate(self, arg, *args, **kwargs)
238
239 self._set_binner()
--> 240 result, how = self._aggregate(arg, *args, **kwargs)
241 if result is None:
242 result = self._groupby_and_aggregate(arg,

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
549 return self._aggregate_multiple_funcs(arg,
550 _level=_level,
--> 551 _axis=_axis), None
552 else:
553 result = None

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
594 try:
595 colg = self._gotitem(col, ndim=1,
--> 596 subset=obj.iloc[:, index])
597 results.append(colg.aggregate(arg))
598 keys.append(col)

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/resample.py in _gotitem(self, key, ndim, subset)
298 # try the key selection
299 try:
--> 300 return grouped[key]
301 except KeyError:
302 return grouped

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/base.py in __getitem__(self, key)
264
265 else:
--> 266 if key not in self.obj:
267 raise KeyError("Column not found: {key}".format(key=key))
268 return self._gotitem(key, ndim=1)

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/generic.py in __contains__(self, key)
1520 def __contains__(self, key):
1521 """True if the key is in the info axis"""
-> 1522 return key in self._info_axis
1523
1524 @property

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/datetimelike.py in __contains__(self, key)
379 def __contains__(self, key):
380 try:
--> 381 res = self.get_loc(key)
382 return (is_scalar(res) or isinstance(res, slice) or
383 (is_list_like(res) and len(res)))

~/.conda/envs/everything/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
1619
1620 try:
-> 1621 stamp = Timestamp(key, tz=self.tz)
1622 return Index.get_loc(self, stamp, method, tolerance)
1623 except KeyError:

pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_str_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion._localize_pydatetime()

~/.conda/envs/everything/lib/python3.6/site-packages/pytz/tzinfo.py in localize(self, dt, is_dst)
321 possible_loc_dt = set()
322 for delta in [timedelta(days=-1), timedelta(days=1)]:
--> 323 loc_dt = dt + delta
324 idx = max(0, bisect_right(
325 self._utc_transition_times, loc_dt) - 1)

OverflowError: date value out of range
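Reading the traceback from the bottom up: `_gotitem` looks the column label up with `grouped[key]`, the `key in self.obj` check ends up in the datetime-like `__contains__` (so the label 't2p' is tested against the DatetimeIndex, not the columns), and `DatetimeIndex.get_loc` then tries `Timestamp(key, tz=self.tz)`. The last frame is pytz's `localize`, which adds `timedelta(days=-1)` and `timedelta(days=1)` to the naive datetime. The stdlib-only sketch below reproduces just that final arithmetic. It assumes (not confirmed in this report) that a label such as 't2p' is parsed as a bare time of day, 2 p.m., on a default date of 0001-01-01; `probe_neighbours` is a hypothetical stand-in for the loop in pytz, not pytz's API.

```python
from datetime import datetime, timedelta


def probe_neighbours(dt):
    """Hypothetical stand-in for the first step of pytz's localize() seen in
    the traceback: shift a naive datetime one day back and one day forward.

    Raises OverflowError if a shift leaves datetime's range (years 1 to 9999).
    """
    return [dt + delta for delta in (timedelta(days=-1), timedelta(days=1))]


# Assumption: the column label was parsed as 14:00 on the default date 0001-01-01.
parsed_label = datetime(1, 1, 1, 14, 0)

try:
    probe_neighbours(parsed_label)
except OverflowError as exc:
    print("OverflowError:", exc)

# One day later there is room for the -1 day shift, so nothing overflows.
print(probe_neighbours(datetime(1, 1, 2)))
```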

Problem description

Here are some changes I have tried in order to work out which situations trigger the bug:

  • No error for tz-naive or UTC indexes; error for 'Europe/Berlin' and 'America/New_York'
  • No error for column names 't2', 't2x', 't2q', 'T_2M'; error for 't2p', 't2m', 't2m1', 'T2M'
  • The frequency of the DatetimeIndex and the resample period do not seem to matter
  • No error for .resample().mean() etc.; the error only happens when using .resample().aggregate() (here with the list ['mean'])
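The grid of experiments above can be scripted so each combination is checked the same way. This is a minimal sketch, not part of the original report: `try_aggregate` and `trigger_matrix` are hypothetical helper names, the index is built with `pd.date_range` rather than `DatetimeIndex(start=...)`, and on a pandas release that includes a fix one would expect every cell to read "ok".

```python
import itertools

import numpy as np
import pandas as pd


def try_aggregate(tz, column, method="aggregate"):
    """Rebuild the issue's frame with the given tz and column label.

    Returns "ok" if resampling succeeds, otherwise the exception class name.
    """
    index = pd.date_range("2017-01-01", freq="15min", periods=200, tz=tz)
    df = pd.DataFrame(np.random.rand(200, 1), index=index, columns=[column])
    resampler = df.resample("1D")
    try:
        if method == "aggregate":
            resampler.aggregate(["mean"])  # list-like aggregation, the failing case
        else:
            resampler.mean()  # direct method, reported as not failing
    except Exception as exc:  # record the failure and keep going through the grid
        return type(exc).__name__
    return "ok"


def trigger_matrix(tzs, columns, methods=("aggregate", "mean")):
    """Map each (tz, column, method) combination to its outcome."""
    return {
        (tz, column, method): try_aggregate(tz, column, method)
        for tz, column, method in itertools.product(tzs, columns, methods)
    }


if __name__ == "__main__":
    tzs = [None, "UTC", "Europe/Berlin", "America/New_York"]
    columns = ["t2", "t2x", "t2q", "T_2M", "t2p", "t2m", "t2m1", "T2M"]
    for combo, outcome in trigger_matrix(tzs, columns).items():
        print(combo, outcome)
```

Catching every exception is a deliberate choice here: the point is to fill in the whole table in one run rather than stop at the first failing combination.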

Expected Output

                                 t2p
                               mean
2017-01-01 00:00:00+01:00  0.397067
2017-01-02 00:00:00+01:00  0.519352
2017-01-03 00:00:00+01:00  0.534746
2017-01-04 00:00:00+01:00  0.587625
2017-01-05 00:00:00+01:00  0.497514

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-33-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: 3.7.4
pip: 10.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: 0.10.8
IPython: 6.5.0
sphinx: 1.7.8
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.4
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
