Description
When a DataFrame column is constructed from timezone-aware datetime objects, df.values returns a pandas.DatetimeIndex instead of a 2D numpy array. This is a problem because a DatetimeIndex does not support every operation that a numpy array does.
Code Sample, a copy-pastable example if possible
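import datetime
import dateutil.tz  # import the tz submodule explicitly; a bare "import dateutil" does not guarantee it is loaded
import pandas
import numpy as np
df = pandas.DataFrame()
# Single-row column built from a timezone-aware (UTC) datetime
df['Time'] = [datetime.datetime(2015, 1, 1, tzinfo=dateutil.tz.tzutc())]
df.dropna(axis=0)  # raises ValueError: 'axis' entry is out of bounds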
Also, print df.values shows DatetimeIndex(['2015-01-01'], dtype='datetime64[ns, UTC]', freq=None).
Expected Output
The df.dropna call should be a no-op.
Compare this to the case where the column is constructed using df['Time'] = [datetime.datetime(2015,1,1)]. In that case, df.dropna works as expected, and df.values is array([['2014-12-31T16:00:00.000000000-0800']], dtype='datetime64[ns]').
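To check which behaviour a given pandas installation shows, the two cases can be compared side by side. This is only a rough sketch: describe_values is a hypothetical helper introduced here for illustration, not part of pandas, and the result for the timezone-aware frame is expected to differ between pandas versions, so nothing about it is asserted.

```python
import datetime

import dateutil.tz
import pandas as pd


def describe_values(frame):
    """Hypothetical helper: report the type name and ndim of frame.values."""
    values = frame.values
    return {"type": type(values).__name__, "ndim": getattr(values, "ndim", None)}


# Column built from timezone-aware datetimes, as in the report.
aware = pd.DataFrame()
aware["Time"] = [datetime.datetime(2015, 1, 1, tzinfo=dateutil.tz.tzutc())]

# Column built from naive datetimes, the case that works as expected.
naive = pd.DataFrame()
naive["Time"] = [datetime.datetime(2015, 1, 1)]

aware_info = describe_values(aware)
naive_info = describe_values(naive)
print("aware:", aware_info)
print("naive:", naive_info)
```

If the "aware" line reports a type other than ndarray, or an ndim of 1, the installation is showing the behaviour described above.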
output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: None
pip: 8.0.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.1
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None