Timestamp constructor parses ISO 8601 incorrectly near DST boundaries #8225

Closed
@ischwabacher


Original post, edited to correct the ISO 8601 formats. The bug described in the original title does not exist. There is, however, some unexpected behavior:

It looks like short-form ISO 8601 parsing uses the right sign after all:

In [1]: import pandas as pd

In [2]: t = pd.Timestamp('2013-11-03 1:30:00', tz='America/Havana')

In [3]: t.strftime('%Y-%m-%d %H:%M:%S %Z %z')
Out[3]: '2013-11-03 01:30:00 CST -0500'

In [4]: t.strftime('%Y%m%dT%H%M%S%z')   # ISO 8601 short format
Out[4]: '20131103T013000-0500'

In [5]: pd.Timestamp(t.strftime('%Y%m%dT%H%M%S%z'))
Out[5]: Timestamp('2013-11-03 01:30:00-0500', tz='tzoffset(None, -18000)')

In [6]: t == _
Out[6]: True   # Not a bug after all!

In [7]: t.strftime('%Y-%m-%dT%H:%M:%S%z')   # ISO 8601 long format
Out[7]: '2013-11-03T01:30:00-0500'

In [8]: pd.Timestamp(t.strftime('%Y-%m-%dT%H:%M:%S%z'))
Out[8]: Timestamp('2013-11-03 01:30:00-0500', tz='pytz.FixedOffset(-300)')   # why pytz?

In [9]: t == _
Out[9]: True

In [10]: t.strftime('%Y%m%dT%H%M%SZ%z')   # Not a real ISO 8601 format (note the 'Z')
Out[10]: '20131103T013000Z-0500'

In [11]: pd.Timestamp(t.strftime('%Y%m%dT%H%M%SZ%z'))
Out[11]: Timestamp('2013-11-03 01:30:00+0500', tz='tzoffset(None, 18000)')
# This is the behavior that inspired the bug report.  I don't know if this is a valid parse
# or not, but it sure is unexpected.

It would be awfully embarrassing if I filed a bug report because I couldn't read ISO 8601...
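
For reference, the sign convention can be cross-checked without pandas. The sketch below is not part of the original session: it uses only the standard library's `datetime.strptime`, and the helper name `parses` is made up for illustration. It suggests that both ISO 8601 spellings of `-0500` mean five hours behind UTC, and that the `Z-0500` string is rejected by `strptime` with this format rather than silently reinterpreted.

```python
from datetime import datetime, timedelta, timezone

SHORT_FMT = '%Y%m%dT%H%M%S%z'        # ISO 8601 short (basic) format
LONG_FMT = '%Y-%m-%dT%H:%M:%S%z'     # ISO 8601 long (extended) format

short = datetime.strptime('20131103T013000-0500', SHORT_FMT)
long_ = datetime.strptime('2013-11-03T01:30:00-0500', LONG_FMT)

# Both spellings describe the same instant.
print(short == long_)

# A negative offset means local time is behind UTC.
print(short.utcoffset() == timedelta(hours=-5))

# Converting to UTC therefore moves the clock forward by five hours.
as_utc = short.astimezone(timezone.utc)
print(as_utc.isoformat())


def parses(text, fmt):
    """Hypothetical helper: report whether strptime accepts text under fmt."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


# 'Z' already designates UTC, so a trailing numeric offset is left over.
print(parses('20131103T013000Z-0500', SHORT_FMT))
```

Whether a lenient parser should accept `Z-0500` at all is a separate question. This only shows what a strict one does with it.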

However, in the process of investigating this issue, I encountered the following, which is definitely a bug:

In [12]: ts = ['2013-11-%s3 %s1:30:00' % (x, y) for x in ['', '0'] for y in ['', '0']]

In [13]: ts
Out[13]: 
['2013-11-3 1:30:00',
 '2013-11-3 01:30:00',
 '2013-11-03 1:30:00',
 '2013-11-03 01:30:00']   # only this one is ISO 8601

In [14]: tzs = ['America/%s' % s for s in ['Chicago', 'New_York', 'Havana']]

In [15]: [[pd.Timestamp(t, tz=tz) for t in ts] for tz in tzs]
Out[15]: 
[[Timestamp('2013-11-03 01:30:00-0600', tz='America/Chicago'),
  Timestamp('2013-11-03 01:30:00-0600', tz='America/Chicago'),
  Timestamp('2013-11-03 01:30:00-0600', tz='America/Chicago'),
  Timestamp('2013-11-03 01:30:00-0500', tz='America/Chicago')],   # DST
 [Timestamp('2013-11-03 01:30:00-0500', tz='America/New_York'),
  Timestamp('2013-11-03 01:30:00-0500', tz='America/New_York'),
  Timestamp('2013-11-03 01:30:00-0500', tz='America/New_York'),
  Timestamp('2013-11-03 01:30:00-0400', tz='America/New_York')],   # DST
 [Timestamp('2013-11-03 01:30:00-0500', tz='America/Havana'),
  Timestamp('2013-11-03 01:30:00-0500', tz='America/Havana'),
  Timestamp('2013-11-03 01:30:00-0500', tz='America/Havana'),
  Timestamp('2013-11-03 00:30:00-0500', tz='America/Havana')]]   # Just plain wrong

I'm not sure what's going on here.
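
As a point of comparison, here is a minimal sketch of what the tz database itself says about this wall-clock time. It is not from the original session: it assumes Python 3.9+ with `zoneinfo` and system time zone data available, and `offsets_at` is a hypothetical helper name. It asks for the UTC offset of the first (`fold=0`) and second (`fold=1`) occurrence of a local time. If the two differ, the time is ambiguous.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def offsets_at(wall, tz_name):
    """Return the UTC offsets of the first and second occurrence of a naive
    wall-clock time in the given zone (equal when the time is unambiguous)."""
    tz = ZoneInfo(tz_name)
    first = wall.replace(tzinfo=tz, fold=0).utcoffset()
    second = wall.replace(tzinfo=tz, fold=1).utcoffset()
    return first, second


wall = datetime(2013, 11, 3, 1, 30)
for name in ['America/Chicago', 'America/New_York', 'America/Havana']:
    first, second = offsets_at(wall, name)
    status = 'ambiguous' if first != second else 'unambiguous'
    print(name, first, second, status)
```

Under these assumptions, 01:30 occurs twice in Chicago and New York, so either offset in the output above is a defensible answer for those zones. Havana ended DST an hour earlier that night, so 01:30 occurs only once there, at -0500, and no choice of offset for 01:30 yields 00:30.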
