Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
dates = pd.date_range(start=pd.to_datetime("2018-01-01"), end=pd.to_datetime("2018-01-31"))
period_index = dates.to_period(freq="W")
# Direct conversion returns the period start: 2018-01-01
period_label_1 = period_index.astype("datetime64[ns]")[0]
# Conversion via daily frequency returns the period end: 2018-01-07
period_label_2 = period_index.asfreq('D').astype("datetime64[ns]")[0]
# I expect these two values to be equal, but they are not.
assert period_label_1 == period_label_2
Issue Description
Let's imagine you have a column with dates:
dates = pd.date_range(start=pd.to_datetime("2018-01-01"), end=pd.to_datetime("2018-01-31"))
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
....
'2018-01-29', '2018-01-30', '2018-01-31'],
dtype='datetime64[ns]', freq='D')
We want to convert them into weekly periods, replacing each date with its period label. We will use the default weekly frequency W (that is, W-SUN), so each date is replaced with the period ending on the Sunday on or after it:
dates | Expected Week Period |
---|---|
2018-01-01 | 2018-01-07 |
2018-01-02 | 2018-01-07 |
2018-01-03 | 2018-01-07 |
... | ... |
2018-01-29 | 2018-02-04 |
2018-01-30 | 2018-02-04 |
2018-01-31 | 2018-02-04 |
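For reference, the expected labels in the table can be reproduced with plain date arithmetic, without going through Period at all. This is only a sketch of what I mean by "expected": it assumes weeks end on Sunday, and `expected_labels` is a name I made up for illustration.

```python
import pandas as pd

dates = pd.date_range(start="2018-01-01", end="2018-01-31")

# weekday is 0 for Monday and 6 for Sunday, so (6 - weekday) is the
# number of days from each date to the Sunday that ends its week.
days_to_sunday = pd.to_timedelta(6 - dates.weekday, unit="D")
expected_labels = dates + days_to_sunday

print(expected_labels[0])   # label for 2018-01-01
print(expected_labels[-1])  # label for 2018-01-31
```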
Let's create a PeriodIndex where each date value is replaced with a Period object:
period_index = dates.to_period(freq="W")
PeriodIndex(['2018-01-01/2018-01-07', '2018-01-01/2018-01-07',
...
'2018-01-29/2018-02-04'],
dtype='period[W-SUN]')
Next, we would like a single day to represent each period instead of the PeriodIndex entries. The simplest approach is to convert the column type to datetime64[ns]:
period_index.astype("datetime64[ns]")[0]
However, an unexpected day appears. Instead of the period end, 2018-01-07, pandas returns the period start:
Timestamp('2018-01-01 00:00:00')
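The result above looks the same as what `PeriodIndex.to_timestamp` returns with `how="start"`, though I have not checked whether `astype` delegates to it internally, so treat that as an assumption. A minimal sketch comparing both anchors explicitly is below; note that `how="end"` gives the last nanosecond of the period, so I call `normalize()` to get midnight of the last day.

```python
import pandas as pd

dates = pd.date_range(start="2018-01-01", end="2018-01-31")
period_index = dates.to_period(freq="W")

# Anchor each weekly period at its first day (Monday for W-SUN).
start_labels = period_index.to_timestamp(how="start")

# Anchor each weekly period at its last instant, then drop the
# time-of-day component so the label is midnight on Sunday.
end_labels = period_index.to_timestamp(how="end").normalize()

print(start_labels[0])
print(end_labels[0])
```

Being explicit about `how` makes the choice of anchor visible at the call site, which the `astype` call does not.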
Expected Behavior
There is a workaround that produces the expected value. First, convert the Period values back to daily frequency, then convert to datetime64[ns]:
period_index.asfreq('D').astype("datetime64[ns]")[0]
Timestamp('2018-01-07 00:00:00')
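As far as I can tell, the workaround works because `asfreq` picks the end of the period by default (its `how` parameter defaults to `"E"`), so the weekly period collapses to its last day before the timestamp conversion happens. A small sketch that spells out both choices, using `to_timestamp()` for the final step:

```python
import pandas as pd

dates = pd.date_range(start="2018-01-01", end="2018-01-31")
period_index = dates.to_period(freq="W")

# Collapse each weekly period to its last day (Sunday), then convert.
via_end = period_index.asfreq("D", how="end").to_timestamp()

# Collapse each weekly period to its first day (Monday), then convert.
via_start = period_index.asfreq("D", how="start").to_timestamp()

print(via_end[0])
print(via_start[0])
```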
This behavior should be documented, and/or a warning should be raised, because direct conversion of a PeriodIndex to "datetime64[ns]" returns the period start rather than the period end that users may expect.
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.11.7
python-bits : 64
OS : Linux
OS-release : 5.15.167.4-microsoft-standard-WSL2
Version : #1 SMP Tue Nov 5 00:21:55 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 1.26.3
pytz : 2025.1
dateutil : 2.8.2
pip : 23.2.1
Cython : None
sphinx : None
IPython : 8.20.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.12.2
html5lib : None
hypothesis : None
gcsfs : 2023.12.2post1
jinja2 : 3.1.3
lxml.etree : None
matplotlib : 3.8.2
numba : 0.60.0
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 14.0.2
pyreadstat : None
pytest : 7.4.4
python-calamine : None
pyxlsb : None
s3fs : 2023.12.2
scipy : 1.12.0
sqlalchemy : 2.0.29
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None