Description
This bug report is related to this Stack Overflow question and the discussion there.
Summary
I believe the current partial string indexing behavior of DatetimeIndex is either inconsistent or underdocumented. The result depends in a nontrivial way on two things: whether we are indexing a Series or a DataFrame, and whether the DatetimeIndex is periodic.
Series vs. DataFrame
```python
import pandas as pd

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-08 09:00:00',
                                                '2016-12-09 09:00:00']))
print(type(series["2016-12-07 09:00:00"]))
# <class 'numpy.int64'>

df = pd.DataFrame(series)
df["2016-12-07 09:00:00"]
# KeyError: '2016-12-07 09:00:00'
```
Here the behavior depends on what we are indexing: the Series returns a scalar, while the DataFrame raises an exception. The exception is consistent with this warning in the documentation:
> Warning: The following selection will raise a KeyError; otherwise this selection methodology would be inconsistent with other selection methods in pandas (as this is not a slice, nor does it resolve to one).
Why don't we get the same exception for the Series object?
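Part of the asymmetry appears to come from what `[]` means on each object. On a DataFrame, a single string key is primarily a column lookup and only selects rows when the string resolves to a slice. On a Series, it falls back to a label lookup. The following minimal sketch (assuming a current pandas release) sidesteps the string interpretation entirely: passing a `pd.Timestamp` to `.loc` asks for an exact label match on both objects. The column name `value` is made up for illustration; the report's `pd.DataFrame(series)` produces a column named `0`.

```python
import pandas as pd

# Same index as in the report; every timestamp falls exactly on 09:00:00.
idx = pd.DatetimeIndex(['2016-12-07 09:00:00',
                        '2016-12-08 09:00:00',
                        '2016-12-09 09:00:00'])
series = pd.Series([1, 2, 3], index=idx)
# 'value' is an illustrative column name (pd.DataFrame(series) would name it 0).
df = pd.DataFrame({'value': [1, 2, 3]}, index=idx)

ts = pd.Timestamp('2016-12-07 09:00:00')

# Exact label lookup on a unique index: a scalar for a Series ...
scalar = series.loc[ts]
# ... and the matching row, returned as a Series indexed by column name, for a DataFrame.
row = df.loc[ts]

print(scalar, type(row).__name__, row['value'])
```

With `.loc`, the Series and the DataFrame behave analogously: one element versus one row, with no `KeyError` in either case.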
Periodic vs. Non-periodic
```python
series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-08 09:00:00',
                                                '2016-12-09 09:00:01']))
# the extra second in the last timestamp makes the index non-periodic
print(type(series["2016-12-07 09:00:00"]))
# <class 'pandas.core.series.Series'>
```
In contrast with the previous example, we get a Series here, so the same timestamp string is treated as a slice rather than as an exact index label. Why does the result depend on the periodicity of the index in this way?
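One way to make "periodic" concrete is to look at the finest time component that is non-zero anywhere in the index. That component is the hour in the first example and the second in this one, while the lookup string `'2016-12-07 09:00:00'` is written down to the second. The sketch below computes it with a hypothetical helper, `finest_component`. It is not pandas' internal code, and it rests on an assumption: that this kind of index "resolution" is what drives the slice-versus-exact-match decision.

```python
import pandas as pd

def finest_component(index: pd.DatetimeIndex) -> str:
    """Hypothetical helper: name of the finest time component that is
    non-zero in at least one timestamp of `index` ('day' if none is)."""
    # Check components from finest to coarsest; return the first one in use.
    for name in ('nanosecond', 'microsecond', 'second', 'minute', 'hour'):
        if (getattr(index, name) != 0).any():
            return name
    return 'day'

periodic = pd.DatetimeIndex(['2016-12-07 09:00:00',
                             '2016-12-08 09:00:00',
                             '2016-12-09 09:00:00'])
non_periodic = pd.DatetimeIndex(['2016-12-07 09:00:00',
                                 '2016-12-08 09:00:00',
                                 '2016-12-09 09:00:01'])

print(finest_component(periodic), finest_component(non_periodic))
```

Under that assumption, a second-precision string is "as precise as" the second-resolution index but "more precise than" the hour-resolution one. That would explain why one extra second flips the result.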
```python
df = pd.DataFrame(series)
print(type(df["2016-12-07 09:00:00"]))
# <class 'pandas.core.frame.DataFrame'>
```
No exception is raised here, in contrast with the periodic case.
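When the result must not depend on the index, one option is to state the intent explicitly. Use a `pd.Timestamp` label for an exact match, and a label slice (`ts:ts`) or a coarser string such as a date for a range. The sketch below assumes a current pandas release and again uses the made-up column name `value`. It checks that these selections return the same kinds of objects for both the periodic and the non-periodic index.

```python
import pandas as pd

ts = pd.Timestamp('2016-12-07 09:00:00')
results = {}

# The only difference between the two cases is the last timestamp.
for label, last in (('periodic', '2016-12-09 09:00:00'),
                    ('non-periodic', '2016-12-09 09:00:01')):
    idx = pd.DatetimeIndex(['2016-12-07 09:00:00', '2016-12-08 09:00:00', last])
    series = pd.Series([1, 2, 3], index=idx)
    df = pd.DataFrame({'value': [1, 2, 3]}, index=idx)  # illustrative column name

    results[label] = (
        type(series.loc[ts:ts]).__name__,         # label slice: a Series, even for one row
        type(df.loc[ts:ts]).__name__,             # label slice: a DataFrame, even for one row
        type(series.loc['2016-12-07']).__name__,  # date string, coarser than the index: a slice
        series.loc[ts],                           # exact Timestamp label: a scalar
    )

print(results)
```

This does not answer whether the string-based behavior is intended. It only shows selections whose result type is the same for both indexes.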
Is this the intended behavior? If so, I believe it should be clearly documented, with the rationale explained.
Output of pd.show_versions()
pandas: 0.19.0+157.g2466ecb
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None