Inconsistent behavior of DatetimeIndex Partial String Indexing on Series and DataFrames #14826

Closed
@ischurov

This bug report is related to this SO question and the discussion there.

Summary

I believe that the current DatetimeIndex Partial String Indexing behavior is either inconsistent or underdocumented: the result depends nontrivially on whether we are working with a Series or a DataFrame, and on whether the DatetimeIndex is periodic (regularly spaced) or not.

Series vs. DataFrame

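import pandas as pd

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-08 09:00:00',
                                                '2016-12-09 09:00:00']))
# indexing the Series with a full timestamp string returns a scalar
print(type(series["2016-12-07 09:00:00"]))
# <class 'numpy.int64'>
df = pd.DataFrame(series)
# the same key on the DataFrame raises
df["2016-12-07 09:00:00"]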

KeyError: '2016-12-07 09:00:00'

Here we see that the behaviour depends on what we are indexing: a Series returns a scalar, while a DataFrame raises an exception. This exception is consistent with the notice in the documentation:

Warning The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one)

Why do we not get the same exception for a Series object?
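
To make the comparison above easy to rerun on other pandas versions, here is a minimal sketch. `probe` is a hypothetical helper written for this report, not a pandas API, and the outcome it prints may well differ from the version listed under `pd.show_versions()`.

```python
import pandas as pd

index = pd.DatetimeIndex(["2016-12-07 09:00:00",
                          "2016-12-08 09:00:00",
                          "2016-12-09 09:00:00"])
series = pd.Series([1, 2, 3], index)
df = pd.DataFrame(series)


def probe(obj, key):
    """Hypothetical helper: name the type of obj[key], or the exception raised."""
    try:
        return type(obj[key]).__name__
    except Exception as exc:  # e.g. KeyError
        return "raises " + type(exc).__name__


key = "2016-12-07 09:00:00"
# Same key, same data, two containers: record what each one does.
results = {"Series": probe(series, key), "DataFrame": probe(df, key)}
for name, outcome in results.items():
    print(name, "->", outcome)
```

Swapping in the non-periodic index from the next section gives the second half of the comparison.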

Periodic vs. Non-periodic

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-08 09:00:00',
                                                '2016-12-09 09:00:01']))
# now it is not periodic due to 1 second in the last timestamp

print(type(series["2016-12-07 09:00:00"]))
# <class 'pandas.core.series.Series'>

In contrast with the previous example, we get an instance of Series here, so the same timestamp string is treated as a slice rather than as a single label. Why does this depend on the periodicity of the index?

df = pd.DataFrame(series)
print(type(df["2016-12-07 09:00:00"]))
# <class 'pandas.core.frame.DataFrame'>

No exception here, in contrast with the periodic case.

Is this the intended behavior? If so, I believe it should be clearly documented and the rationale provided.
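
One observable difference between the two indexes is the value of `DatetimeIndex.resolution`. Whether this is what drives the behavior above is an assumption here, not something this report establishes. The sketch below prints it for both indexes and also shows two spellings that state the intent explicitly: a `pd.Timestamp` key for an exact label and a date-only string for a range. Variable names are illustrative.

```python
import pandas as pd

regular = pd.DatetimeIndex(["2016-12-07 09:00:00",
                            "2016-12-08 09:00:00",
                            "2016-12-09 09:00:00"])
irregular = pd.DatetimeIndex(["2016-12-07 09:00:00",
                              "2016-12-08 09:00:00",
                              "2016-12-09 09:00:01"])

# Finest time component that actually occurs in each index.
print(regular.resolution)
print(irregular.resolution)

series = pd.Series([1, 2, 3], irregular)

# A Timestamp key asks for one exact label.
exact = series.loc[pd.Timestamp("2016-12-07 09:00:00")]

# A date-only string is coarser than the index, so it selects a range of rows.
whole_day = series.loc["2016-12-07"]

print(type(exact).__name__, type(whole_day).__name__)
```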

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.1.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0+157.g2466ecb
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Metadata

Labels: Bug; Datetime (Datetime data dtype); Indexing (Related to indexing on series/frames, not to indexes themselves)
