Commit 9d1d1b1

BUG: to_datetime re-parsing Arrow-backed objects (#53301)
* BUG: to_datetime re-parsing Arrow-backed objects
* Address code comments
* address code review
* fix test
* fix test
* update
* xfail on windows
* skip on windows instead
* fix mypy
* fix
* remove accidental
1 parent e0c3a98 commit 9d1d1b1

3 files changed: +51 -2 lines changed

doc/source/whatsnew/v2.1.0.rst

Lines changed: 1 addition & 1 deletion
@@ -385,14 +385,14 @@ Datetimelike
 - :meth:`DatetimeIndex.map` with ``na_action="ignore"`` now works as expected. (:issue:`51644`)
 - Bug in :class:`DateOffset` which had inconsistent behavior when multiplying a :class:`DateOffset` object by a constant (:issue:`47953`)
 - Bug in :func:`date_range` when ``freq`` was a :class:`DateOffset` with ``nanoseconds`` (:issue:`46877`)
+- Bug in :func:`to_datetime` converting :class:`Series` or :class:`DataFrame` containing :class:`arrays.ArrowExtensionArray` of ``pyarrow`` timestamps to numpy datetimes (:issue:`52545`)
 - Bug in :meth:`DataFrame.to_sql` raising ``ValueError`` for pyarrow-backed date like dtypes (:issue:`53854`)
 - Bug in :meth:`Timestamp.date`, :meth:`Timestamp.isocalendar`, :meth:`Timestamp.timetuple`, and :meth:`Timestamp.toordinal` were returning incorrect results for inputs outside those supported by the Python standard library's datetime module (:issue:`53668`)
 - Bug in :meth:`Timestamp.round` with values close to the implementation bounds returning incorrect results instead of raising ``OutOfBoundsDatetime`` (:issue:`51494`)
 - Bug in :meth:`arrays.DatetimeArray.map` and :meth:`DatetimeIndex.map`, where the supplied callable operated array-wise instead of element-wise (:issue:`51977`)
 - Bug in constructing a :class:`Series` or :class:`DataFrame` from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (:issue:`52212`)
 - Bug in parsing datetime strings with weekday but no day e.g. "2023 Sept Thu" incorrectly raising ``AttributeError`` instead of ``ValueError`` (:issue:`52659`)
 
-
 Timedelta
 ^^^^^^^^^
 - :meth:`TimedeltaIndex.map` with ``na_action="ignore"`` now works as expected (:issue:`51644`)

pandas/core/tools/datetimes.py

Lines changed: 24 additions & 1 deletion
@@ -54,7 +54,10 @@
     is_list_like,
     is_numeric_dtype,
 )
-from pandas.core.dtypes.dtypes import DatetimeTZDtype
+from pandas.core.dtypes.dtypes import (
+    ArrowDtype,
+    DatetimeTZDtype,
+)
 from pandas.core.dtypes.generic import (
     ABCDataFrame,
     ABCSeries,
@@ -68,6 +71,7 @@
 )
 from pandas.core import algorithms
 from pandas.core.algorithms import unique
+from pandas.core.arrays import ArrowExtensionArray
 from pandas.core.arrays.base import ExtensionArray
 from pandas.core.arrays.datetimes import (
     maybe_convert_dtype,
@@ -402,6 +406,25 @@ def _convert_listlike_datetimes(
             arg = arg.tz_convert(None).tz_localize("utc")
         return arg
 
+    elif isinstance(arg_dtype, ArrowDtype) and arg_dtype.type is Timestamp:
+        # TODO: Combine with above if DTI/DTA supports Arrow timestamps
+        if utc:
+            # pyarrow uses UTC, not lowercase utc
+            if isinstance(arg, Index):
+                arg_array = cast(ArrowExtensionArray, arg.array)
+                if arg_dtype.pyarrow_dtype.tz is not None:
+                    arg_array = arg_array._dt_tz_convert("UTC")
+                else:
+                    arg_array = arg_array._dt_tz_localize("UTC")
+                arg = Index(arg_array)
+            else:
+                # ArrowExtensionArray
+                if arg_dtype.pyarrow_dtype.tz is not None:
+                    arg = arg._dt_tz_convert("UTC")
+                else:
+                    arg = arg._dt_tz_localize("UTC")
+        return arg
+
     elif lib.is_np_dtype(arg_dtype, "M"):
         if not is_supported_unit(get_unit_from_dtype(arg_dtype)):
             # We go to closest supported reso, i.e. "s"
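
Reviewer note: the new branch treats tz-aware and tz-naive Arrow timestamps differently when ``utc=True``. Aware values are converted to UTC and keep the same instant, while naive values are localized, so their wall time is taken to already be UTC. Below is a minimal standard-library sketch of that distinction for a single timestamp. It is only an analogy, not the pandas implementation; ``to_utc`` and the fixed-offset ``central`` zone are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime) -> datetime:
    """Mirror the convert-vs-localize branch for one timestamp (illustrative only)."""
    if ts.tzinfo is not None:
        # Aware input: keep the instant, re-express it in UTC (like a tz convert).
        return ts.astimezone(timezone.utc)
    # Naive input: keep the wall time and label it as UTC (like a tz localize).
    return ts.replace(tzinfo=timezone.utc)


# Hypothetical fixed UTC-06:00 offset, used so the sketch needs no tz database.
central = timezone(timedelta(hours=-6))

aware = datetime(1965, 4, 3, 12, 0, tzinfo=central)
naive = datetime(1965, 4, 3, 12, 0)

print(to_utc(aware))  # 1965-04-03 18:00:00+00:00
print(to_utc(naive))  # 1965-04-03 12:00:00+00:00
```

The aware input shifts by six hours because the instant is preserved, while the naive input keeps its clock reading and only gains a UTC label.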

pandas/tests/tools/test_to_datetime.py

Lines changed: 26 additions & 0 deletions
@@ -933,6 +933,32 @@ def test_to_datetime_dtarr(self, tz):
         result = to_datetime(arr)
         assert result is arr
 
+    # Doesn't work on Windows since tzpath not set correctly
+    @td.skip_if_windows
+    @pytest.mark.parametrize("arg_class", [Series, Index])
+    @pytest.mark.parametrize("utc", [True, False])
+    @pytest.mark.parametrize("tz", [None, "US/Central"])
+    def test_to_datetime_arrow(self, tz, utc, arg_class):
+        pa = pytest.importorskip("pyarrow")
+
+        dti = date_range("1965-04-03", periods=19, freq="2W", tz=tz)
+        dti = arg_class(dti)
+
+        dti_arrow = dti.astype(pd.ArrowDtype(pa.timestamp(unit="ns", tz=tz)))
+
+        result = to_datetime(dti_arrow, utc=utc)
+        expected = to_datetime(dti, utc=utc).astype(
+            pd.ArrowDtype(pa.timestamp(unit="ns", tz=tz if not utc else "UTC"))
+        )
+        if not utc and arg_class is not Series:
+            # Doesn't hold for utc=True, since that will astype
+            # to_datetime also returns a new object for series
+            assert result is dti_arrow
+        if arg_class is Series:
+            tm.assert_series_equal(result, expected)
+        else:
+            tm.assert_index_equal(result, expected)
+
     def test_to_datetime_pydatetime(self):
         actual = to_datetime(datetime(2008, 1, 15))
         assert actual == datetime(2008, 1, 15)
