
API/PERF: when to check for mismatched tzs/awareness in array_to_datetime #55779

Open
@jbrockmendel

Description

```python
>>> import pandas as pd
>>> ts = pd.Timestamp("2016-01-01", tz="UTC")
>>> ts2 = ts.tz_convert("US/Pacific")

>>> pd.to_datetime([ts, ts2])
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True, at position 1

>>> pd.to_datetime([ts.isoformat(), ts2.isoformat()], format="mixed")
<stdin>:1: FutureWarning: In a future version of pandas, parsing datetimes with mixed time zones will raise an error unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour and silence this warning. To create a `Series` with mixed offsets and `object` dtype, please use `apply` and `datetime.datetime.strptime`
Index([2016-01-01 00:00:00+00:00, 2015-12-31 16:00:00-08:00], dtype='object')
```

If we pass mixed-tz datetime objects, we do a tz-match check at each step of the loop inside `array_to_datetime`/`array_strptime` (specifically in `state.process_datetime`). If we pass mixed-tz strings, the analogous check happens once, outside the loop. (Per #55693, we currently don't have mixed-type checks.)
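To make the two placements concrete, here is a minimal pure-Python sketch; it is not pandas' Cython code. The names `convert_in_loop`, `convert_after_loop` and `_to_utc_naive` are hypothetical, and a fixed UTC-8 offset stands in for "US/Pacific". The in-loop variant compares each element's tzinfo as it goes, which is what lets it report a position; the after-loop variant only records tzinfos and checks them once at the end.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PST = timezone(timedelta(hours=-8))  # fixed-offset stand-in for "US/Pacific"


def _to_utc_naive(val):
    # Hypothetical helper: store aware values as naive UTC wall times.
    if val.tzinfo is None:
        return val
    return val.astimezone(UTC).replace(tzinfo=None)


def convert_in_loop(values):
    """Hypothetical: compare each element's tz against the first one as we go."""
    result = []
    expected_tz = values[0].tzinfo if values else None
    for i, val in enumerate(values):
        if val.tzinfo != expected_tz:
            # The check runs per element, so the offending position is known.
            raise ValueError(f"Mismatched timezones, at position {i}")
        result.append(_to_utc_naive(val))
    return result, expected_tz


def convert_after_loop(values):
    """Hypothetical: collect every tz during the loop, check once at the end."""
    result = []
    seen_tzs = set()
    for val in values:
        seen_tzs.add(val.tzinfo)
        result.append(_to_utc_naive(val))
    if len(seen_tzs) > 1:
        # A single check after the loop; no position is available here.
        raise ValueError("Mismatched timezones")
    return result, next(iter(seen_tzs), None)
```

Both variants reject `[ts, ts2]`, but only the in-loop one can say where the mismatch occurred, while the after-loop one avoids a comparison per element.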

Eventually these checks should be shared, which means we need to choose between the in-loop and after-loop versions. There are three user-facing differences:

  1. The in-loop version adds f"at position {i}" to the exception message.
  2. The in-loop version can be handled differently depending on `errors="coerce"`/`errors="ignore"`.
  3. The in-loop version runs once per element, so it presumably incurs a performance penalty.
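A hypothetical sketch of difference 2, assuming the in-loop check treats a mismatch like any other bad element: under `errors="coerce"` the mismatched entry simply becomes missing (`None` stands in for `NaT`), whereas a check that only runs after the loop has no single offending element to coerce and can only raise. The function names and messages are illustrative, not pandas' actual API.

```python
from datetime import datetime, timedelta, timezone


def convert_in_loop(values, errors="raise"):
    """Hypothetical in-loop converter: a tz mismatch is treated as a bad item."""
    result = []
    expected_tz = values[0].tzinfo if values else None
    for i, val in enumerate(values):
        if val.tzinfo != expected_tz:
            if errors == "coerce":
                # Handled per element: this entry becomes missing (NaT stand-in).
                result.append(None)
                continue
            raise ValueError(f"Mismatched timezones, at position {i}")
        result.append(val)
    return result


def convert_after_loop(values, errors="raise"):
    """Hypothetical after-loop converter.

    `errors` is accepted for symmetry, but in this sketch it never applies
    to the combination check, since no single element is "invalid".
    """
    result = list(values)
    if len({val.tzinfo for val in values}) > 1:
        raise ValueError("Mismatched timezones")
    return result
```

With `errors="coerce"`, the in-loop version silently turns `[ts, ts2]` into `[ts, None]`, while the after-loop version still raises.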

The `errors="coerce"`/`errors="ignore"` behavior is the API part of the issue (though see #54467 for deprecating `ignore`). I think it is very likely that the original intent of `coerce` was to handle invalid individual items, not invalid combinations of items, so I would be OK with the API change that would come with moving this check outside the loop.
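Under the proposed split, `errors="coerce"` would keep handling individually invalid items, while the tz-combination check would run after the loop regardless of `errors`. A minimal sketch under those assumptions: `parse_then_check` is a hypothetical name, `datetime.fromisoformat` stands in for pandas' parser, and offsets are compared via `utcoffset()` (a naive value yields `None`, so an awareness mismatch is caught too).

```python
from datetime import datetime


def parse_then_check(strings, errors="raise"):
    """Hypothetical: `errors` applies per item; the tz check runs once afterwards."""
    parsed = []
    for i, s in enumerate(strings):
        try:
            parsed.append(datetime.fromisoformat(s))
        except ValueError:
            if errors == "coerce":
                parsed.append(None)  # invalid individual item -> missing
                continue
            raise ValueError(f"Unable to parse {s!r}, at position {i}") from None
    # Combination check outside the loop, independent of `errors`.
    offsets = {dt.utcoffset() for dt in parsed if dt is not None}
    if len(offsets) > 1:
        raise ValueError("Mismatched timezones")
    return parsed
```

Here `["2016-01-01T00:00:00+00:00", "garbage"]` with `errors="coerce"` yields a value and a `None`, whereas two valid strings with different offsets raise even under `errors="coerce"`.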

Metadata


Labels: API Design; Constructors (Series/DataFrame/Index/pd.array Constructors); Datetime (Datetime data dtype); Performance (Memory or execution speed performance)
