BUG: resample with tz-aware: Values falls after last bin #15549

Closed
@ahcub

Description

Code Sample, a copy-pastable example if possible

import pandas as pd

index = pd.DatetimeIndex([1450137600000000000, 1474059600000000000], tz='UTC').tz_convert('America/Chicago')

print(index)

df = pd.DataFrame([1, 2], index=index)

print(df.resample('12h', closed='right', label='right').last().ffill())

Problem description

Resampling does not handle a non-UTC index properly when the index spans a daylight saving time change; it fails with the "Values falls after last bin" error.

The problem occurs in the file https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/resample.py
function: _get_time_bins
code: binner = labels = DatetimeIndex(freq=self.freq,...
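To illustrate the daylight saving change involved, here is a minimal sketch that checks the UTC offset of each timestamp in the index from the code sample. It assumes the two integer nanosecond values correspond to 2015-12-15 00:00 UTC and 2016-09-16 21:00 UTC, written here as strings for readability; the variable names are only for illustration.

```python
import pandas as pd

# The same two instants as in the code sample, written as UTC strings
# (assumption: the nanosecond integers map to these wall-clock times).
index = pd.to_datetime(
    ["2015-12-15 00:00", "2016-09-16 21:00"], utc=True
).tz_convert("America/Chicago")

# UTC offset of each timestamp in hours; the first falls in standard time,
# the second in daylight saving time, so the two offsets differ.
offset_hours = [ts.utcoffset().total_seconds() / 3600 for ts in index]
print(offset_hours)
```

Because the two ends of the range carry different offsets, the local wall-clock span is not a whole multiple of the UTC span, which is the situation the bin construction appears to trip over.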

This problem can be solved by converting the tz of ax to UTC before resampling and applying the original tz after the DatetimeIndex is created.

So the code would look like this:

    tz = ax.tz
    ax = ax.tz_convert('UTC')
    if len(ax) == 0:
        binner = labels = DatetimeIndex(
            data=[], freq=self.freq, name=ax.name)
        return binner, [], labels

    first, last = ax.min(), ax.max()
    first, last = _get_range_edges(first, last, self.freq,
                                   closed=self.closed,
                                   base=self.base)
    # GH #12037
    # use first/last directly instead of call replace() on them
    # because replace() will swallow the nanosecond part
    # thus last bin maybe slightly before the end if the end contains
    # nanosecond part and lead to `Values falls after last bin` error
    binner = labels = DatetimeIndex(freq=self.freq,
                                    start=first,
                                    end=last,
                                    name=ax.name).tz_convert(tz)

This causes the bins to always be aligned to UTC times rather than to the original tz, but I think that is adequate behaviour as well.
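The same idea can be tried from user code without patching pandas. The sketch below is only an illustration under stated assumptions: `resample_last_via_utc` is a hypothetical helper name, not a pandas API, and the column name `value` is made up for the example. It converts the frame to UTC, resamples there, and converts the result back to the original tz.

```python
import pandas as pd


def resample_last_via_utc(df, freq):
    """Hypothetical helper: resample in UTC, then restore the original tz."""
    original_tz = df.index.tz
    in_utc = df.tz_convert("UTC")  # no DST transitions in UTC
    result = in_utc.resample(freq, closed="right", label="right").last().ffill()
    return result.tz_convert(original_tz)


# Same two instants as in the code sample, written as UTC strings.
index = pd.to_datetime(
    ["2015-12-15 00:00", "2016-09-16 21:00"], utc=True
).tz_convert("America/Chicago")
df = pd.DataFrame({"value": [1, 2]}, index=index)

out = resample_last_via_utc(df, "12h")

# Bin labels are aligned to UTC, so their local hour shifts by one
# across the DST change instead of staying fixed.
label_hours = sorted(set(out.index.hour))
print(out.head())
print(label_hours)
```

The printed label hours show the trade-off described above: the bin edges stay fixed in UTC, so in local time they land on different hours before and after the DST change.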

Expected Output

I expect the resampling to succeed regardless of the time range selected.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.0.2
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: 0.7.9.None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
