BUG: df.apply handles np.timedelta64 as timestamp, should be timedelta #7778

Closed
@stharrold

Description

I think there may be a bug with the row-wise handling of numpy.timedelta64 data types when using DataFrame.apply. As a check, the problem does not appear when using DataFrame.applymap. The problem may be related to #4532, but I'm unsure. I've included an example below.

This is only a minor problem for my use-case, which is cross-checking timestamps from a counter/timer card. I can easily work around the issue with DataFrame.itertuples etc.
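A minimal sketch of the `DataFrame.itertuples` workaround mentioned above, assuming a current pandas install. The values are taken from the example further down; the trailing `Z` is dropped so the timestamps stay timezone-naive, and the variable names are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the values in this report.
start = pd.Timestamp("2014-05-31T01:23:19.9600345")
elapsed_us = [30053400, 40053249, 50053098]
df = pd.DataFrame({"datetimes": start + pd.to_timedelta(elapsed_us, unit="us")})
df["differential_timedeltas"] = df["datetimes"] - df["datetimes"].shift()

# itertuples walks the columns side by side, so the timedelta values are not
# routed through a single mixed-dtype row object as with apply(axis=1).
deltas = [row.differential_timedeltas for row in df.itertuples(index=False)]

for delta in deltas:
    print(type(delta).__name__, delta)
```

The first entry is a missing value because `shift()` has no previous row to subtract from.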

Thank you for your time and for making such a useful package!

Example

Version

Import and check versions.

$ date
Thu Jul 17 16:28:38 CDT 2014
$ conda update pandas
Fetching package metadata: ..
# All requested packages already installed.
# packages in environment at /Users/harrold/anaconda:
#
pandas                    0.14.1               np18py27_0  
$ ipython
Python 2.7.8 |Anaconda 2.0.1 (x86_64)| (default, Jul  2 2014, 15:36:00) 
Type "copyright", "credits" or "license" for more information.

IPython 2.1.0 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from __future__ import print_function

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: pd.util.print_versions.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Darwin
OS-release: 11.4.2
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: 0.999
httplib2: 0.8
apiclient: 1.2
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

Create test data

Using a subset of the original raw data as an example.

In [5]: datetime_start = np.datetime64(u'2014-05-31T01:23:19.9600345Z')

In [6]: timedeltas_elapsed = [30053400, 40053249, 50053098]

Compute datetimes from elapsed timedeltas, then create differential timedeltas from datetimes. All elements are either type numpy.datetime64 or numpy.timedelta64.

In [7]: df = pd.DataFrame(dict(datetimes = timedeltas_elapsed))

In [8]: df = df.applymap(lambda elt: np.timedelta64(elt, 'us'))

In [9]: df = df.applymap(lambda elt: np.datetime64(datetime_start + elt))

In [10]: df['differential_timedeltas'] = df['datetimes'] - df['datetimes'].shift()

In [11]: print(df)
                      datetimes  differential_timedeltas
0 2014-05-31 01:23:50.013434500                      NaT
1 2014-05-31 01:24:00.013283500          00:00:09.999849
2 2014-05-31 01:24:10.013132500          00:00:09.999849

Expected behavior

With element-wise handling using DataFrame.applymap, all elements are correctly identified as datetimes (timestamps) or timedeltas.
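As a complementary column-level check, the dtypes can be inspected directly instead of going element by element. This is a sketch assuming a current pandas install; the frame is rebuilt from the values in this report, with the trailing `Z` dropped so the timestamps stay timezone-naive.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_timedelta64_dtype

# Rebuild the example frame from the values in this report.
start = pd.Timestamp("2014-05-31T01:23:19.9600345")
elapsed_us = [30053400, 40053249, 50053098]
df = pd.DataFrame({"datetimes": start + pd.to_timedelta(elapsed_us, unit="us")})
df["differential_timedeltas"] = df["datetimes"] - df["datetimes"].shift()

# One dtype per column: a datetime column and a timedelta column.
print(df.dtypes)
datetimes_ok = is_datetime64_any_dtype(df["datetimes"])
deltas_ok = is_timedelta64_dtype(df["differential_timedeltas"])
print(datetimes_ok, deltas_ok)
```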

In [12]: print(df.applymap(lambda elt: type(elt)))
                          datetimes     differential_timedeltas
0  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>
1  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>
2  <class 'pandas.tslib.Timestamp'>  <type 'numpy.timedelta64'>

Bug

With row-wise handling using DataFrame.apply, all elements are type pandas.tslib.Timestamp. I expected 'differential_timedeltas' to be type numpy.timedelta64 or another type of timedelta, not a type of datetime (timestamp).

In [13]: # For 'datetimes':

In [14]: print(df.apply(lambda row: type(row['datetimes']), axis=1))
0    <class 'pandas.tslib.Timestamp'>
1    <class 'pandas.tslib.Timestamp'>
2    <class 'pandas.tslib.Timestamp'>
dtype: object

In [15]: # For 'differential_timedeltas':

In [16]: print(df.apply(lambda row: type(row['differential_timedeltas']), axis=1))
0      <class 'pandas.tslib.NaTType'>
1    <class 'pandas.tslib.Timestamp'>
2    <class 'pandas.tslib.Timestamp'>
dtype: object
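To re-check this on whatever pandas version is installed, something like the following could be used. It is a sketch, not a definitive test: `row_type_names` is a hypothetical helper, the frame is rebuilt from the values above (timezone-naive), and no claim is made here about which release changed the behavior. On the affected version reported above (0.14.1) the non-null rows would show up as `Timestamp`; where row-wise handling is correct they should show a timedelta type instead.

```python
import pandas as pd


def row_type_names(frame, column):
    """Return the type name that apply(axis=1) sees for `column` in each row."""
    return frame.apply(lambda row: type(row[column]).__name__, axis=1).tolist()


# Rebuild the example frame from the values in this report.
start = pd.Timestamp("2014-05-31T01:23:19.9600345")
elapsed_us = [30053400, 40053249, 50053098]
df = pd.DataFrame({"datetimes": start + pd.to_timedelta(elapsed_us, unit="us")})
df["differential_timedeltas"] = df["datetimes"] - df["datetimes"].shift()

names = row_type_names(df, "differential_timedeltas")
print(pd.__version__, names)

# True if the timedelta column is still being handed to the function as timestamps.
bug_present = "Timestamp" in names
print("bug present:", bug_present)
```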

Labels: Bug, Datetime, Dtype Conversions, Timedelta