tricky timestamp conversion #25571

Closed
@randomgambit

Hello there, it's me, the bug hunter, again :)

I have this massive 200-million-row dataset, and I encountered some very annoying behavior. I wonder if this is a bug.

I load my csv using

mylog = pd.read_csv('/mydata.csv',
                    names = ['mydatetime',  'var2', 'var3', 'var4'],
                    dtype = {'mydatetime' : str},
                    skiprows = 1)

and the datetime column really looks like regular timestamps (tz-aware)

mylog.mydatetime.head()
Out[22]: 
0    2019-03-03T20:58:38.000-0500
1    2019-03-03T20:58:38.000-0500
2    2019-03-03T20:58:38.000-0500
3    2019-03-03T20:58:38.000-0500
4    2019-03-03T20:58:38.000-0500
Name: mydatetime, dtype: object

Now, I take extra care in converting these strings into proper timestamps:

mylog['mydatetime'] = pd.to_datetime(mylog['mydatetime'] ,errors = 'coerce', format = '%Y-%m-%dT%H:%M:%S.%f%z', infer_datetime_format = True, cache = True)

That takes a looong time to process, but seems OK. The output is

mylog.mydatetime.head()
Out[23]: 
0    2019-03-03 20:58:38-05:00
1    2019-03-03 20:58:38-05:00
2    2019-03-03 20:58:38-05:00
3    2019-03-03 20:58:38-05:00
4    2019-03-03 20:58:38-05:00
Name: mydatetime, dtype: object
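Note that the dtype above is still object rather than a datetime64 dtype. One possible cause, and this is only an assumption since the data cannot be shared, is that the column mixes more than one UTC offset (for example -0500 and -0400 around a DST change). A minimal sketch on made-up sample strings, showing that passing utc=True to pd.to_datetime produces a single tz-aware datetime dtype:

```python
import pandas as pd

# Hypothetical sample (not the real data): two different UTC offsets,
# as a daylight-saving change might produce.
raw = pd.Series([
    "2019-03-03T20:58:38.000-0500",
    "2019-03-10T20:58:38.000-0400",
])

# utc=True converts every value to UTC, so the whole column can share
# one tz-aware datetime dtype instead of falling back to object.
parsed = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S.%f%z", utc=True)

print(parsed.dtype)
print(parsed.iloc[0])
```

The values are now expressed in UTC, so the first timestamp lands on 2019-03-04 in UTC even though it was 2019-03-03 local time. That matters if a calendar day is extracted afterwards.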

What is puzzling is that so far I thought I had full control of my dtypes. However, running this simple conversion raises an error:

mylog['myday'] = pd.to_datetime(mylog['mydatetime'].dt.date, errors = 'coerce')

  File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

The only way I was able to get past this error was by running

mylog['myday'] = pd.to_datetime(mylog['mydatetime'].apply(lambda x: x.date()))
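A possible alternative to the row-by-row apply, sketched on made-up sample strings: parse with utc=True, convert back to the local zone, and truncate to midnight with the vectorized .dt accessor. The zone name "America/New_York" is an assumption based on the -0500 offset; it does not come from the data.

```python
import pandas as pd

# Hypothetical sample mirroring the strings shown above.
raw = pd.Series(["2019-03-03T20:58:38.000-0500"] * 3)

# Parse to a proper tz-aware datetime column (values stored as UTC).
ts = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S.%f%z", utc=True)

# Assumed local zone; replace with the zone the log was written in.
local = ts.dt.tz_convert("America/New_York")

# Drop the tz (keeping local wall time) and truncate to midnight,
# giving a tz-naive datetime column holding the local calendar day.
myday = local.dt.tz_localize(None).dt.normalize()

print(myday.head())
```

Converting to the local zone before truncating keeps the calendar day the same as in the original strings. Truncating the UTC values directly would shift late-evening rows to the next day.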

Is this a bug? Before upgrading to pandas 0.24.1 I was not getting the tz error above. What do you think? I can't share the data, but I am happy to try some things to help you out!

Thanks!

Labels: Datetime (datetime data dtype), Needs Info (clarification about behavior needed to assess issue)