Description
If you give infer_freq 5 consecutive weekdays, it comes back with 'D' as its inferred frequency. But if your actual frequency is BDay, then when DatetimeIndex checks that the frequency matches, 'B' != 'D' and the check fails (note that verify_integrity=False skips this check). This points to a more general issue with infer_freq and ambiguous cases. I think it makes the most sense to move these sorts of checks to a method on the offset that takes a frequency and an Index or ndarray and determines whether they are compatible.
This matters because you can hit edge cases when you pass both freq and datetime64[ns] values to the DatetimeIndex constructor and, more generally, because comparing freqstr is probably not the best way to check whether a frequency matches.
Default implementation could be:
def is_compatible(self, freqstr, arr=None):
    return freqstr == self.freqstr
and then BDay could do something like this (and this is totally pseudocode):
def is_compatible(self, freqstr, arr=None):
    if freqstr == self.freqstr:
        return True
    if arr is not None and len(arr) <= 5:
        if freqstr == 'D':  # or other compatibles that ensure it's consecutive
            # is_weekday is a placeholder helper, not an existing function
            return all(is_weekday(date) for date in arr)
    return False
This gets more complicated with multiplied offsets, but I think it's worth considering.
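To make the pseudocode above concrete, here is a minimal standalone sketch that uses only the standard library. The class names Offset and BusinessDay and the weekday test are hypothetical stand-ins for illustration, not the actual pandas offset classes, and the `len(arr) <= 5` cutoff is carried over from the pseudocode as-is.

```python
from datetime import date


class Offset:
    """Hypothetical base offset with just enough for this sketch."""

    freqstr = "D"

    def is_compatible(self, freqstr, arr=None):
        # Default: frequencies match only if their string codes are equal.
        return freqstr == self.freqstr


class BusinessDay(Offset):
    """Hypothetical business-day offset (not pandas' BDay)."""

    freqstr = "B"

    def is_compatible(self, freqstr, arr=None):
        if freqstr == self.freqstr:
            return True
        # An inferred 'D' is also acceptable when every date is a weekday:
        # a short run of consecutive weekdays is both daily and business-daily.
        if freqstr == "D" and arr is not None and len(arr) <= 5:
            return all(d.weekday() < 5 for d in arr)
        return False


weekdays = [date(2013, 10, 7), date(2013, 10, 8), date(2013, 10, 9)]
with_weekend = [date(2013, 10, 11), date(2013, 10, 12), date(2013, 10, 13)]

bday = BusinessDay()
print(bday.is_compatible("D", weekdays))      # True: Mon, Tue, Wed
print(bday.is_compatible("D", with_weekend))  # False: includes Sat and Sun
print(Offset().is_compatible("B", weekdays))  # False: default compares freqstr only
```

The point of putting the check on the offset is that each offset decides for itself which inferred frequencies it can accept, instead of the index constructor comparing strings.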
For example, this script:
from datetime import datetime
import pandas as pd
dates = [datetime(2013, 10, 7), datetime(2013, 10, 8), datetime(2013, 10, 9)]
ind = pd.DatetimeIndex(dates, freq=pd.tseries.frequencies.BDay())
ind2 = pd.DatetimeIndex(ind.values, freq=pd.tseries.frequencies.BDay(),
                        verify_integrity=False)
ind3 = pd.DatetimeIndex(ind.values, freq=pd.tseries.frequencies.BDay())
produces this Traceback:
Traceback (most recent call last):
  File "test2.py", line 8, in <module>
    ind3 = pd.DatetimeIndex(ind.values, freq=pd.tseries.frequencies.BDay())
  File "../pandas/tseries/index.py", line 280, in __new__
    raise ValueError('Dates do not conform to passed '
ValueError: Dates do not conform to passed frequency
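The mismatch behind the traceback above can be looked at directly by comparing what infer_freq reports with the freqstr of a business-day offset. This is a small sketch assuming a pandas version that exposes pd.infer_freq and pd.offsets.BDay at the top level; the variable names are illustrative only.

```python
import pandas as pd

# Five consecutive weekdays: Monday 2013-10-07 through Friday 2013-10-11.
idx = pd.DatetimeIndex(
    ["2013-10-07", "2013-10-08", "2013-10-09", "2013-10-10", "2013-10-11"]
)

inferred = pd.infer_freq(idx)          # frequency string guessed from the data
expected = pd.offsets.BDay().freqstr   # frequency string of the business-day offset

print(inferred, expected)
# A plain string comparison treats the two as different, even though the
# dates are valid for both a daily and a business-day frequency.
print(inferred == expected)
```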
cc @cancan101 - this is what we need to deal with in adding your offsets. I believe that every other offset can be returned from infer_freq, so these offsets would be different and therefore could never pass integrity checks. So we'd need to change infer_freq and/or define some kind of is_compatible method that intelligently covers all the ways in which the inferred frequency could differ from the offset's own freqstr.