BUG: df.astype converts to datetime64[ns] inconsistently with respect to dayfirst #60964

Open
@cgflex

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"some_dates": ["1/1/2025","12/1/2025","13/1/2025","1/12/2025","11/12/2025","13/12/2025",]})
df["converted_dates"] = df.astype({"some_dates": "datetime64[ns]"})
print(df)

# output:
#    some_dates converted_dates
# 0    1/1/2025      2025-01-01
# 1   12/1/2025      2025-12-01  <- parsed month-first (1 Dec 2025)
# 2   13/1/2025      2025-01-13  <- parsed day-first (13 Jan 2025): day and month are swapped relative to row 1
# 3   1/12/2025      2025-01-12
# 4  11/12/2025      2025-11-12
# 5  13/12/2025      2025-12-13

Issue Description

When converting date strings using astype, each value that is a valid month-first date (e.g. 12/1/2025, read as 1 Dec 2025) is interpreted month-first. If a value is not a valid month-first date (e.g. 13/1/2025) but is a valid day-first date, that row alone is interpreted day-first (13 Jan 2025).
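To make the per-row fallback concrete, here is a small standard-library sketch that reproduces the same results as the output above. It is only a model of the observed behaviour, not how pandas implements parsing; the helper name `parse_with_fallback` is made up for illustration.

```python
from datetime import datetime


def parse_with_fallback(text):
    """Hypothetical helper: try month-first, then fall back to day-first, per value."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This value does not fit the current format; try the next one.
            continue
    raise ValueError(f"could not parse {text!r} as a date")


samples = ["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"]

# Each value is parsed independently, so the column ends up with mixed interpretations.
parsed = [parse_with_fallback(s).date().isoformat() for s in samples]

for raw, iso in zip(samples, parsed):
    print(f"{raw:>10} -> {iso}")
```

Because the decision is made per value, "12/1/2025" and "13/1/2025" end up in different months even though they sit next to each other in the same column.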

@MarcoGorelli commented in #53127 (comment) that disallowing the conversion of string dates with astype('datetime64[ns]') might be a good idea, and after a morning spent debugging this I'm inclined to agree!

Expected Behavior

In general, I would expect a column of data to have a consistent interpretation. It should be an error or at least a warning for different rows to be interpreted differently without an explicit user request.
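For comparison, a minimal sketch of how a consistent interpretation can be requested today, assuming pd.to_datetime with an explicit format is an acceptable substitute for astype. The variable names are illustrative only.

```python
import pandas as pd

raw = pd.Series(
    ["1/1/2025", "12/1/2025", "13/1/2025", "1/12/2025", "11/12/2025", "13/12/2025"]
)

# Explicit day-first format: every row is interpreted the same way.
day_first = pd.to_datetime(raw, format="%d/%m/%Y")
print(day_first)

# Explicit month-first format: a row that cannot be month-first ("13/1/2025")
# raises an error instead of being silently reinterpreted.
try:
    pd.to_datetime(raw, format="%m/%d/%Y")
    month_first_error = None
except ValueError as exc:
    month_first_error = exc

print(month_first_error)
```

With an explicit format, the mixed column either parses uniformly or fails loudly, which is the behaviour I would expect from astype as well.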

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.11.9
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United Kingdom.1252

pandas : 2.2.3
numpy : 2.1.3
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.12.3
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.36
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

Metadata

Labels

  • Bug
  • Deprecate (Functionality to remove in pandas)
  • datetime.date (stdlib datetime.date support)
