BUG: Passing ambiguous ndarray[datetime64[ns]] to DatetimeIndex constructor can cause ValueError with wrong offset #5152

Closed
@jtratner

If you give infer_freq 5 consecutive weekdays, it comes back with 'D' as its inferred frequency. But if your actual frequency is BDay, then when DatetimeIndex checks that the frequency matches, 'B' != 'D' and the check fails (note that verify_integrity=False skips this check). This points to a more general issue with infer_freq in ambiguous cases. I think it makes the most sense to move these sorts of checks to a method on the offset that takes a frequency and an Index or ndarray and determines whether they are compatible.

This matters because you can hit edge cases when you pass both freq and datetime64[ns] values to the DatetimeIndex constructor and, more generally, because comparing freqstr is probably not the best way to check whether a frequency matches.
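To make the ambiguity concrete, here is a minimal stdlib-only sketch. It is not pandas' actual inference code: `next_business_day` and `candidate_freqs` are hypothetical helpers written for illustration, and only the 'D' and 'B' cases are considered. It lists which of the two frequencies a sorted run of dates is consistent with.

```python
from datetime import date, timedelta


def next_business_day(d):
    """Return the first Monday-to-Friday date strictly after d."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def candidate_freqs(dates):
    """Hypothetical helper: which of 'D' and 'B' fit these sorted dates?"""
    pairs = list(zip(dates, dates[1:]))
    fits = []
    # 'D' fits when every step is exactly one calendar day.
    if all(b - a == timedelta(days=1) for a, b in pairs):
        fits.append('D')
    # 'B' fits when every date is a weekday and every step lands on
    # the next business day.
    if all(d.weekday() < 5 for d in dates) and all(
        next_business_day(a) == b for a, b in pairs
    ):
        fits.append('B')
    return fits


mon_to_wed = [date(2013, 10, 7), date(2013, 10, 8), date(2013, 10, 9)]
fri_to_mon = [date(2013, 10, 11), date(2013, 10, 14)]
print(candidate_freqs(mon_to_wed))  # ['D', 'B']
print(candidate_freqs(fri_to_mon))  # ['B']
```

A run that stays inside one work week fits both frequencies, so any inference that returns a single string has to pick one, and a strict freqstr comparison then rejects the other.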

Default implementation could be:

def is_compatible(self, freqstr, arr=None):
    # Default: only an exact freqstr match is accepted.
    return freqstr == self.freqstr

and then BDay could do something like this (and this is totally pseudocode):

def is_compatible(self, freqstr, arr=None):
    if freqstr == self.freqstr:
        return True
    if arr is not None and len(arr) <= 5:
        if freqstr == 'D':  # or other compatibles that ensure it's consecutive
            # is_weekday is a placeholder helper, not an existing function
            return all(is_weekday(date) for date in arr)
    return False

This gets more complicated with multiplied offsets, but I think it's worth considering.
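As a rough, runnable version of the idea, the sketch below uses plain `datetime.date` values and stand-in classes. `Offset`, `Day`, and `BusinessDay` here are hypothetical illustrations and not the pandas classes, and `is_compatible` is the proposed method, not an existing API. Multiplied offsets are not handled.

```python
from datetime import date


class Offset:
    """Hypothetical stand-in for an offset; only freqstr matters here."""
    freqstr = None

    def is_compatible(self, freqstr, arr=None):
        # Default: only an exact freqstr match counts.
        return freqstr == self.freqstr


class Day(Offset):
    freqstr = 'D'


class BusinessDay(Offset):
    freqstr = 'B'

    def is_compatible(self, freqstr, arr=None):
        if freqstr == self.freqstr:
            return True
        # Dates inferred as 'D' are consecutive calendar days, so they
        # also form a valid business-day run if none falls on a weekend.
        if freqstr == 'D' and arr is not None:
            return all(d.weekday() < 5 for d in arr)
        return False


weekdays = [date(2013, 10, 7), date(2013, 10, 8), date(2013, 10, 9)]
weekend = [date(2013, 10, 12), date(2013, 10, 13)]
print(BusinessDay().is_compatible('D', weekdays))  # True
print(BusinessDay().is_compatible('D', weekend))   # False
print(Day().is_compatible('B', weekdays))          # False
```

The sketch drops the explicit length check: consecutive calendar days that are all weekdays can never number more than five, so the weekday test alone bounds the run.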

from datetime import datetime
import pandas as pd
dates = [datetime(2013, 10, 7), datetime(2013, 10, 8), datetime(2013, 10, 9)]
ind = pd.DatetimeIndex(dates, freq=pd.tseries.frequencies.BDay())  # works
ind2 = pd.DatetimeIndex(ind.values, freq=pd.tseries.frequencies.BDay(),
                       verify_integrity=False)  # works: check skipped

ind3 = pd.DatetimeIndex(ind.values, freq=pd.tseries.frequencies.BDay())  # raises

produces this Traceback:

Traceback (most recent call last):
  File "test2.py", line 8, in <module>
    ind3 = pd.DatetimeIndex(ind.values, freq=pd.tseries.frequencies.BDay())
  File "../pandas/tseries/index.py", line 280, in __new__
    raise ValueError('Dates do not conform to passed '
ValueError: Dates do not conform to passed frequency

cc @cancan101 - this is what we need to deal with when adding your offsets. I believe every other offset can be returned from infer_freq; yours can't be, so the inferred frequency would always differ from them and they could never pass the integrity check. So we'd need to change infer_freq and/or define some kind of is_compatible method that intelligently covers all the ways in which the actual frequency can differ from the inferred freqstr.
