Commit a1cf3f6

jbrockmendel authored and jreback committed
BUG: Assorted DatetimeIndex bugfixes (#24157)
1 parent 5cedb39 commit a1cf3f6

10 files changed: +109 -29 lines changed

doc/source/whatsnew/v0.24.0.rst

Lines changed: 5 additions & 1 deletion
@@ -380,6 +380,7 @@ Backwards incompatible API changes
 - ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
 - :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
 - The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`)
+- :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`)

 Percentage change on groupby changes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1133,7 +1134,6 @@ Deprecations
 - In :meth:`Series.where` with Categorical data, providing an ``other`` that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the ``other`` to the categories first (:issue:`24077`).
 - :meth:`Series.clip_lower`, :meth:`Series.clip_upper`, :meth:`DataFrame.clip_lower` and :meth:`DataFrame.clip_upper` are deprecated and will be removed in a future version. Use ``Series.clip(lower=threshold)``, ``Series.clip(upper=threshold)`` and the equivalent ``DataFrame`` methods (:issue:`24203`)

-
 .. _whatsnew_0240.deprecations.datetimelike_int_ops:

 Integer Addition/Subtraction with Datetime-like Classes Is Deprecated
@@ -1310,6 +1310,9 @@ Datetimelike
 - Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`)
 - Bug in :class:`Categorical.__setitem__` not allowing setting with another ``Categorical`` when both are undordered and have the same categories, but in a different order (:issue:`24142`)
 - Bug in :func:`date_range` where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (:issue:`24110`)
+- Bug in :class:`DatetimeIndex` where constructing a :class:`DatetimeIndex` from a :class:`Categorical` or :class:`CategoricalIndex` would incorrectly drop timezone information (:issue:`18664`)
+- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would incorrectly lose the index's ``freq`` attribute (:issue:`21282`)
+- Clarified error message produced when passing an incorrect ``freq`` argument to :class:`DatetimeIndex` with ``NaT`` as the first entry in the passed data (:issue:`11587`)

 Timedelta
 ^^^^^^^^^
@@ -1422,6 +1425,7 @@ Indexing
 - Bug in :func:`Index.union` and :func:`Index.intersection` where name of the ``Index`` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
 - Bug in :class:`Index` slicing with boolean :class:`Index` may raise ``TypeError`` (:issue:`22533`)
 - Bug in ``PeriodArray.__setitem__`` when accepting slice and list-like value (:issue:`23978`)
+- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would lose their ``freq`` attribute (:issue:`21282`)

 Missing
 ^^^^^^^

pandas/core/arrays/datetimelike.py

Lines changed: 20 additions & 3 deletions
@@ -354,6 +354,10 @@ def __getitem__(self, key):
                     freq = key.step * self.freq
                 else:
                     freq = self.freq
+            elif key is Ellipsis:
+                # GH#21282 indexing with Ellipsis is similar to a full slice,
+                # should preserve `freq` attribute
+                freq = self.freq

         attribs['freq'] = freq

@@ -550,9 +554,22 @@ def _validate_frequency(cls, index, freq, **kwargs):
         if index.size == 0 or inferred == freq.freqstr:
             return None

-        on_freq = cls._generate_range(start=index[0], end=None,
-                                      periods=len(index), freq=freq, **kwargs)
-        if not np.array_equal(index.asi8, on_freq.asi8):
+        try:
+            on_freq = cls._generate_range(start=index[0], end=None,
+                                          periods=len(index), freq=freq,
+                                          **kwargs)
+            if not np.array_equal(index.asi8, on_freq.asi8):
+                raise ValueError
+        except ValueError as e:
+            if "non-fixed" in str(e):
+                # non-fixed frequencies are not meaningful for timedelta64;
+                # we retain that error message
+                raise e
+            # GH#11587 the main way this is reached is if the `np.array_equal`
+            # check above is False. This can also be reached if index[0]
+            # is `NaT`, in which case the call to `cls._generate_range` will
+            # raise a ValueError, which we re-raise with a more targeted
+            # message.
             raise ValueError('Inferred frequency {infer} from passed values '
                              'does not conform to passed frequency {passed}'
                              .format(infer=inferred, passed=freq.freqstr))
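
The control flow of the new `_validate_frequency` hunk can be illustrated with a small standalone sketch. This is not the pandas implementation: `generate_range` and `validate_frequency` below are hypothetical stand-ins, `None` plays the role of `NaT`, and the error text is simplified. It only shows how a failed comparison and an error raised by the range generator are funnelled into one targeted `ValueError`, while a message containing "non-fixed" is passed through unchanged.

```python
from datetime import datetime, timedelta


def generate_range(start, periods, step):
    """Toy range generator (hypothetical); rejects a missing start."""
    if start is None:  # None stands in for NaT in this sketch
        raise ValueError("Neither `start` nor `end` can be NaT")
    return [start + i * step for i in range(periods)]


def validate_frequency(values, step):
    """Raise one targeted ValueError if `values` do not follow `step`."""
    if not values:
        return None
    try:
        expected = generate_range(values[0], len(values), step)
        if values != expected:
            # Bare error; replaced by the targeted message below.
            raise ValueError
    except ValueError as err:
        if "non-fixed" in str(err):
            # Keep the more specific upstream message untouched.
            raise
        # Reached for a failed comparison and for a generator error alike.
        raise ValueError(
            "Passed values do not conform to passed frequency {}".format(step)
        )
    return None
```

Under these assumptions, `validate_frequency([None, datetime(2011, 1, 1)], timedelta(days=1))` raises the "do not conform" error rather than the generator's own message, which mirrors what the new test `test_freq_validation_with_nat` checks for the real classes.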

pandas/core/arrays/datetimes.py

Lines changed: 12 additions & 3 deletions
@@ -14,9 +14,9 @@
 from pandas.util._decorators import Appender

 from pandas.core.dtypes.common import (
-    _INT64_DTYPE, _NS_DTYPE, is_datetime64_dtype, is_datetime64tz_dtype,
-    is_extension_type, is_float_dtype, is_int64_dtype, is_object_dtype,
-    is_period_dtype, is_string_dtype, is_timedelta64_dtype)
+    _INT64_DTYPE, _NS_DTYPE, is_categorical_dtype, is_datetime64_dtype,
+    is_datetime64tz_dtype, is_extension_type, is_float_dtype, is_int64_dtype,
+    is_object_dtype, is_period_dtype, is_string_dtype, is_timedelta64_dtype)
 from pandas.core.dtypes.dtypes import DatetimeTZDtype
 from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
 from pandas.core.dtypes.missing import isna
@@ -277,6 +277,8 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
             if closed is not None:
                 raise ValueError("Closed has to be None if not both of start"
                                  "and end are defined")
+        if start is NaT or end is NaT:
+            raise ValueError("Neither `start` nor `end` can be NaT")

         left_closed, right_closed = dtl.validate_endpoints(closed)

@@ -1666,6 +1668,13 @@ def maybe_convert_dtype(data, copy):
         raise TypeError("Passing PeriodDtype data is invalid. "
                         "Use `data.to_timestamp()` instead")

+    elif is_categorical_dtype(data):
+        # GH#18664 preserve tz in going DTI->Categorical->DTI
+        # TODO: cases where we need to do another pass through this func,
+        # e.g. the categories are timedelta64s
+        data = data.categories.take(data.codes, fill_value=NaT)
+        copy = False
+
     elif is_extension_type(data) and not is_datetime64tz_dtype(data):
         # Includes categorical
         # TODO: We have no tests for these
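
The categorical branch rebuilds the values from the categories and the integer codes, so whatever the categories carry (including a timezone) survives the round trip. A minimal standalone sketch of that idea follows. It is an illustration only: `take_with_fill` is a hypothetical helper, plain `datetime` objects with a fixed UTC offset stand in for tz-aware timestamps, `None` stands in for `NaT`, and a code of `-1` is assumed to mark a missing value.

```python
from datetime import datetime, timedelta, timezone


def take_with_fill(categories, codes, fill_value=None):
    """Rebuild values from categories and codes; code -1 means missing."""
    return [categories[c] if c >= 0 else fill_value for c in codes]


# Fixed-offset zone used as a simple stand-in for a named timezone.
eastern = timezone(timedelta(hours=-5))

# Unique, sorted category values, each carrying its timezone.
categories = [
    datetime(1999, 4, 6, 15, 14, 13, tzinfo=eastern),
    datetime(2015, 1, 1, tzinfo=eastern),
]

# One code per original element; -1 marks the missing first entry.
codes = [-1, 1, 0, 1]

values = take_with_fill(categories, codes)
```

Because the elements are taken from `categories` directly rather than converted through a timezone-naive representation, every non-missing entry in `values` keeps its `tzinfo`.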

pandas/tests/indexes/datetimes/test_construction.py

Lines changed: 31 additions & 1 deletion
@@ -14,12 +14,42 @@
 from pandas import (
     DatetimeIndex, Index, Timestamp, date_range, datetime, offsets,
     to_datetime)
-from pandas.core.arrays import period_array
+from pandas.core.arrays import (
+    DatetimeArrayMixin as DatetimeArray, period_array)
 import pandas.util.testing as tm


 class TestDatetimeIndex(object):

+    @pytest.mark.parametrize('dt_cls', [DatetimeIndex, DatetimeArray])
+    def test_freq_validation_with_nat(self, dt_cls):
+        # GH#11587 make sure we get a useful error message when generate_range
+        # raises
+        msg = ("Inferred frequency None from passed values does not conform "
+               "to passed frequency D")
+        with pytest.raises(ValueError, match=msg):
+            dt_cls([pd.NaT, pd.Timestamp('2011-01-01')], freq='D')
+        with pytest.raises(ValueError, match=msg):
+            dt_cls([pd.NaT, pd.Timestamp('2011-01-01').value],
+                   freq='D')
+
+    def test_categorical_preserves_tz(self):
+        # GH#18664 retain tz when going DTI-->Categorical-->DTI
+        # TODO: parametrize over DatetimeIndex/DatetimeArray
+        # once CategoricalIndex(DTA) works
+
+        dti = pd.DatetimeIndex(
+            [pd.NaT, '2015-01-01', '1999-04-06 15:14:13', '2015-01-01'],
+            tz='US/Eastern')
+
+        ci = pd.CategoricalIndex(dti)
+        carr = pd.Categorical(dti)
+        cser = pd.Series(ci)
+
+        for obj in [ci, carr, cser]:
+            result = pd.DatetimeIndex(obj)
+            tm.assert_index_equal(result, dti)
+
     def test_dti_with_period_data_raises(self):
         # GH#23675
         data = pd.PeriodIndex(['2016Q1', '2016Q2'], freq='Q')

pandas/tests/indexes/datetimes/test_date_range.py

Lines changed: 10 additions & 2 deletions
@@ -80,6 +80,14 @@ def test_date_range_timestamp_equiv_preserve_frequency(self):


 class TestDateRanges(TestData):
+    def test_date_range_nat(self):
+        # GH#11587
+        msg = "Neither `start` nor `end` can be NaT"
+        with pytest.raises(ValueError, match=msg):
+            date_range(start='2016-01-01', end=pd.NaT, freq='D')
+        with pytest.raises(ValueError, match=msg):
+            date_range(start=pd.NaT, end='2016-01-01', freq='D')
+
     def test_date_range_out_of_bounds(self):
         # GH#14187
         with pytest.raises(OutOfBoundsDatetime):
@@ -533,12 +541,12 @@ class TestGenRangeGeneration(object):

     def test_generate(self):
         rng1 = list(generate_range(START, END, offset=BDay()))
-        rng2 = list(generate_range(START, END, time_rule='B'))
+        rng2 = list(generate_range(START, END, offset='B'))
         assert rng1 == rng2

     def test_generate_cday(self):
         rng1 = list(generate_range(START, END, offset=CDay()))
-        rng2 = list(generate_range(START, END, time_rule='C'))
+        rng2 = list(generate_range(START, END, offset='C'))
         assert rng1 == rng2

     def test_1(self):

pandas/tests/indexes/datetimes/test_indexing.py

Lines changed: 9 additions & 0 deletions
@@ -16,6 +16,15 @@


 class TestGetItem(object):
+    def test_ellipsis(self):
+        # GH#21282
+        idx = pd.date_range('2011-01-01', '2011-01-31', freq='D',
+                            tz='Asia/Tokyo', name='idx')
+
+        result = idx[...]
+        assert result.equals(idx)
+        assert result is not idx
+
     def test_getitem(self):
         idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx')
         idx2 = pd.date_range('2011-01-01', '2011-01-31', freq='D',

pandas/tests/indexes/period/test_indexing.py

Lines changed: 8 additions & 0 deletions
@@ -13,6 +13,14 @@


 class TestGetItem(object):
+    def test_ellipsis(self):
+        # GH#21282
+        idx = period_range('2011-01-01', '2011-01-31', freq='D',
+                           name='idx')
+
+        result = idx[...]
+        assert result.equals(idx)
+        assert result is not idx

     def test_getitem(self):
         idx1 = pd.period_range('2011-01-01', '2011-01-31', freq='D',

pandas/tests/indexes/timedeltas/test_indexing.py

Lines changed: 8 additions & 0 deletions
@@ -9,6 +9,14 @@


 class TestGetItem(object):
+    def test_ellipsis(self):
+        # GH#21282
+        idx = timedelta_range('1 day', '31 day', freq='D', name='idx')
+
+        result = idx[...]
+        assert result.equals(idx)
+        assert result is not idx
+
     def test_getitem(self):
         idx1 = timedelta_range('1 day', '31 day', freq='D', name='idx')
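
The three `test_ellipsis` tests assert that `idx[...]` returns an equal but distinct object. The underlying `__getitem__` rule (an `Ellipsis` key is treated like a full slice, so `freq` is kept) can be sketched with a toy container. `FreqIndex` below is a hypothetical class written for this illustration, with an integer `freq`; it is not the pandas mixin.

```python
class FreqIndex:
    """Toy index (hypothetical) that tracks a regular step in `freq`."""

    def __init__(self, values, freq=None):
        self.values = list(values)
        self.freq = freq

    def __getitem__(self, key):
        if isinstance(key, int):
            # Scalar lookup returns the element itself.
            return self.values[key]

        freq = None
        if isinstance(key, slice):
            if self.freq is not None and key.step is not None:
                # A stepped slice scales the frequency.
                freq = key.step * self.freq
            else:
                freq = self.freq
            values = self.values[key]
        elif key is Ellipsis:
            # Same as a full slice: copy the values, keep the frequency.
            freq = self.freq
            values = self.values[:]
        else:
            # Arbitrary positions: regular spacing is no longer guaranteed.
            values = [self.values[i] for i in key]
        return FreqIndex(values, freq)
```

With `idx = FreqIndex(range(5), freq=1)`, `idx[...]` yields a new object holding the same values and the same `freq`, whereas `idx[[0, 3]]` drops `freq` because the selected positions need not be evenly spaced.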

pandas/tests/tseries/offsets/test_offsets.py

Lines changed: 3 additions & 9 deletions
@@ -53,17 +53,11 @@ def test_to_m8():
     valb = datetime(2007, 10, 1)
     valu = _to_m8(valb)
     assert isinstance(valu, np.datetime64)
-    # assert valu == np.datetime64(datetime(2007,10,1))

-    # def test_datetime64_box():
-    # valu = np.datetime64(datetime(2007,10,1))
-    # valb = _dt_box(valu)
-    # assert type(valb) == datetime
-    # assert valb == datetime(2007,10,1)

-    #####
-    # DateOffset Tests
-    #####
+#####
+# DateOffset Tests
+#####


 class Base(object):

pandas/tseries/offsets.py

Lines changed: 3 additions & 10 deletions
@@ -2457,8 +2457,7 @@ class Nano(Tick):
 # ---------------------------------------------------------------------


-def generate_range(start=None, end=None, periods=None,
-                   offset=BDay(), time_rule=None):
+def generate_range(start=None, end=None, periods=None, offset=BDay()):
     """
     Generates a sequence of dates corresponding to the specified time
     offset. Similar to dateutil.rrule except uses pandas DateOffset
@@ -2470,26 +2469,20 @@ def generate_range(start=None, end=None, periods=None,
     end : datetime (default None)
     periods : int, (default None)
     offset : DateOffset, (default BDay())
-    time_rule : (legacy) name of DateOffset object to be used, optional
-        Corresponds with names expected by tseries.frequencies.get_offset

     Notes
     -----
     * This method is faster for generating weekdays than dateutil.rrule
     * At least two of (start, end, periods) must be specified.
     * If both start and end are specified, the returned dates will
       satisfy start <= date <= end.
-    * If both time_rule and offset are specified, time_rule supersedes offset.

     Returns
     -------
     dates : generator object
-
     """
-    if time_rule is not None:
-        from pandas.tseries.frequencies import get_offset
-
-        offset = get_offset(time_rule)
+    from pandas.tseries.frequencies import to_offset
+    offset = to_offset(offset)

     start = to_datetime(start)
     end = to_datetime(end)
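
The signature change folds the legacy `time_rule` string into the single `offset` parameter, which is normalized once on entry. A rough standalone sketch of that design is shown here. It does not use pandas: `ALIASES`, `to_step` and this `generate_range` are hypothetical names for illustration, `timedelta` stands in for a `DateOffset`, and the alias table is invented for the example.

```python
from datetime import datetime, timedelta

# Hypothetical alias table for this sketch only; pandas' real aliases differ.
ALIASES = {"D": timedelta(days=1), "H": timedelta(hours=1)}


def to_step(offset):
    """Normalize an alias string or a timedelta into a timedelta."""
    if isinstance(offset, timedelta):
        return offset
    try:
        return ALIASES[offset]
    except (KeyError, TypeError):
        raise ValueError("Unknown offset: {!r}".format(offset))


def generate_range(start, end, offset=timedelta(days=1)):
    """Yield datetimes from start to end inclusive.

    Assumes the normalized step is positive; a zero or negative step
    would never reach `end`.
    """
    step = to_step(offset)  # one parameter, normalized once
    current = start
    while current <= end:
        yield current
        current += step
```

Callers can then pass either form through the same parameter, for example `generate_range(start, end, offset="D")` or `generate_range(start, end, offset=timedelta(days=1))`, which is the equivalence the updated `test_generate` and `test_generate_cday` tests check for the real function.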
