Description
I'm a huge fan of Pandas. Thanks for all the hard work!
I believe I have stumbled across a small bug in Pandas 0.17.1 which was not present in 0.16.2. Indexing into a Series of timezone-aware datetime64s using __getitem__ fails, but indexing succeeds if the datetime64s are timezone-naive. Here is a minimal code example and the exception produced by Pandas 0.17.1:
In [37]: dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
In [46]: dates_with_tz
Out[46]:
DatetimeIndex(['2011-01-01 00:00:00-05:00', '2011-01-02 00:00:00-05:00',
'2011-01-03 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
In [38]: s_with_tz = pd.Series(dates_with_tz, index=['a', 'b', 'c'])
In [39]: s_with_tz
Out[39]:
a 2011-01-01 00:00:00-05:00
b 2011-01-02 00:00:00-05:00
c 2011-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]
In [40]: s_with_tz['a']
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-40-81d0bf655282> in <module>()
----> 1 s_with_tz['a']
/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in __getitem__(self, key)
555 def __getitem__(self, key):
556 try:
--> 557 result = self.index.get_value(self, key)
558
559 if not np.isscalar(result):
/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in get_value(self, series, key)
1778 s = getattr(series,'_values',None)
1779 if isinstance(s, Index) and lib.isscalar(key):
-> 1780 return s[key]
1781
1782 s = _values_from_object(series)
/usr/local/lib/python2.7/dist-packages/pandas/tseries/base.pyc in __getitem__(self, key)
98 getitem = self._data.__getitem__
99 if np.isscalar(key):
--> 100 val = getitem(key)
101 return self._box_func(val)
102 else:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
If the dates are timezone-aware then we can access them using loc but, as far as I'm aware, we should be able to use __getitem__ in this situation too:
In [41]: s_with_tz.loc['a']
Out[41]: Timestamp('2011-01-01 00:00:00-0500', tz='US/Eastern')
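Until this is fixed, code that has to run on 0.17.1 could route label lookups through loc, since that is the accessor shown working above. This is only a rough sketch of that stopgap: the helper name get_by_label is hypothetical and not part of Pandas.

```python
import pandas as pd


def get_by_label(series, label):
    """Hypothetical helper: look up one label via .loc instead of series[label]."""
    return series.loc[label]


dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
s_with_tz = pd.Series(dates_with_tz, index=["a", "b", "c"])

# Same lookup as In [41] above, just wrapped in the helper.
first = get_by_label(s_with_tz, "a")
print(first)
```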
However, if the dates are timezone-naive then indexing using __getitem__ works as expected:
In [32]: dates_naive = pd.date_range("2011-01-01", periods=3)
In [33]: dates_naive
Out[33]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq='D')
In [34]: s = pd.Series(dates_naive, index=['a', 'b', 'c'])
In [35]: s
Out[35]:
a 2011-01-01
b 2011-01-02
c 2011-01-03
dtype: datetime64[ns]
In [36]: s['a']
Out[36]: Timestamp('2011-01-01 00:00:00')
So indexing into a Series using __getitem__ works if the data is a list of timezone-naive datetime64s, but indexing fails if the datetime64s are timezone-aware.
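In case it helps with triage, the comparison above can be condensed into one standalone script that runs the same lookup on both Series and reports which one fails. It is a sketch rather than a proposed test: try_getitem is a hypothetical helper, and it only catches IndexError because that is the exception in the traceback above.

```python
import pandas as pd


def try_getitem(series, label):
    """Hypothetical helper: return (ok, value_or_exception) for series[label]."""
    try:
        return True, series[label]
    except IndexError as exc:  # the exception type shown in the traceback above
        return False, exc


index = ["a", "b", "c"]
s_naive = pd.Series(pd.date_range("2011-01-01", periods=3), index=index)
s_with_tz = pd.Series(
    pd.date_range("2011-01-01", periods=3, tz="US/Eastern"), index=index
)

# Run the same label lookup against both Series and record the outcome.
results = {
    name: try_getitem(series, "a")
    for name, series in [("naive", s_naive), ("tz-aware", s_with_tz)]
}

for name, (ok, value) in results.items():
    print(pd.__version__, name, "OK" if ok else "FAILED", repr(value))
```

The script prints the installed Pandas version next to each result, so the output can be compared directly between 0.16.2 and 0.17.1.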
In [47]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
pandas: 0.17.1
nose: 1.3.7
pip: 1.5.6
setuptools: 15.2
Cython: 0.23.1
numpy: 1.10.1
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)
Jinja2: None