BUG: inconsistent state of DatetimeIndex._data #20810

Closed
@jorisvandenbossche

Depending on how a DatetimeIndex is constructed, its underlying ._data attribute is either a DatetimeIndex or a datetime64 ndarray:

In [1]: idx1 = pd.DatetimeIndex(start="2012-01-01", periods=3, freq='D') # date_range kind of construction

In [2]: idx1._data
Out[2]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq=None)

In [3]: idx2 = pd.DatetimeIndex(idx1)

In [4]: idx2._data
Out[4]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

I think this should always be a NumPy array? (The current behavior evidently doesn't break anything, but I don't see any reason for ._data to sometimes be a DatetimeIndex.)

This came out of fixing warnings in #20721, and is due to how _generate_regular_range and _simple_new on DatetimeIndex are implemented. From the code of _simple_new, I suspect that it assumes the input is always an ndarray and never a DatetimeIndex, but several places (such as _generate_regular_range) pass an already constructed DatetimeIndex to _simple_new.
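
To illustrate the kind of normalization that would keep ._data consistent, here is a minimal sketch using a toy class. ToyIndex and its _simple_new are hypothetical stand-ins written for this example, not the pandas implementation; the only idea shown is unwrapping an already constructed index before storing its values.

```python
import numpy as np


class ToyIndex:
    """Toy stand-in for an index class (hypothetical, not pandas code)."""

    @classmethod
    def _simple_new(cls, values):
        # Unwrap an already constructed ToyIndex so that _data never
        # ends up holding another index object.
        if isinstance(values, cls):
            values = values._data
        obj = object.__new__(cls)
        # Store the values as a plain datetime64[ns] ndarray.
        obj._data = np.asarray(values, dtype="datetime64[ns]")
        return obj


raw = np.array(["2012-01-01", "2012-01-02", "2012-01-03"], dtype="datetime64[ns]")
from_array = ToyIndex._simple_new(raw)         # built from an ndarray
from_index = ToyIndex._simple_new(from_array)  # built from an existing index

print(type(from_array._data).__name__)  # ndarray
print(type(from_index._data).__name__)  # ndarray
```

With this check in place, both construction paths end up with an ndarray in _data. Whether the unwrapping belongs in _simple_new itself or in its callers is a design choice this sketch does not settle.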

Labels: Bug; Compat (pandas objects compatibility with NumPy or Python functions); Datetime (Datetime data dtype)
