Description
Calling pd.to_datetime with the unit='s' kwarg appears to be roughly 500x slower for float64 input than for int64 input (6.88 s versus 12.4 ms for one million rows in the timings below). There does not appear to be a difference in performance between the two types if the timestamps are first converted to nanoseconds and no unit is specified.
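A minimal sketch of the nanosecond conversion described above, assuming the float values are seconds since the Unix epoch and still fit in int64 once scaled. The sample values and variable names are illustrative and not part of the original report.

```python
import pandas as pd

# Illustrative float64 timestamps in seconds since the epoch (assumed sample data).
seconds_float = pd.Series(
    [1521685107.0, 1521685107.5, 1521080307.0], dtype="float64"
)

# Scale to nanoseconds and cast to int64. With no `unit` argument,
# pd.to_datetime interprets integer input as nanoseconds since the epoch.
nanos_int = (seconds_float * 1e9).astype("int64")
converted = pd.to_datetime(nanos_int)

# The direct float64 path from the report, for comparing results.
reference = pd.to_datetime(seconds_float, unit="s")
matches = (converted == reference).all()
```

Note that float64 values near 1.5e18 are spaced 256 apart, so scaling arbitrary fractional seconds this way can be off by up to a few hundred nanoseconds.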
import numpy as np
import pandas as pd
timestamp_seconds_int = pd.Series(np.random.randint(1521685107 - 604800, 1521685107, 1000000, dtype='int64'))
timestamp_seconds_float = timestamp_seconds_int.astype('float64')
%%timeit -r 3
pd.to_datetime(timestamp_seconds_int, unit='s')
Output: 12.4 ms ± 1.66 ms per loop (mean ± std. dev. of 3 runs, 100 loops each)
%%timeit -r 3
pd.to_datetime(timestamp_seconds_float, unit='s')
Output: 6.88 s ± 138 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
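The %%timeit cells above require IPython. For running the comparison as a plain script, a rough equivalent using the standard timeit module might look like the sketch below. The function name `time_to_datetime`, the fixed seed, and the smaller default row count are assumptions made for illustration; the timings it produces will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_to_datetime(n_rows=10_000, repeats=3):
    """Return the best-of-`repeats` wall time (seconds) for int64 and float64 input."""
    rng = np.random.RandomState(0)  # fixed seed so both runs see the same data
    seconds_int = pd.Series(
        rng.randint(1521685107 - 604800, 1521685107, n_rows, dtype="int64")
    )
    seconds_float = seconds_int.astype("float64")

    timings = {}
    for label, series in [("int64", seconds_int), ("float64", seconds_float)]:
        # Bind the series as a default argument so each timer uses its own input.
        timer = timeit.Timer(lambda s=series: pd.to_datetime(s, unit="s"))
        timings[label] = min(timer.repeat(repeat=repeats, number=1))
    return timings


timings = time_to_datetime()
print(timings)
```

Passing n_rows=1000000 matches the size used in the report.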
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+658.g17c1fadb0
pytest: 3.0.6
pip: 9.0.3
setuptools: 38.5.2
Cython: 0.28.1
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None