Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
starttime = pd.Series(["2145-11-02 06:00:00"]).astype("datetime64[ns]")
endtime = pd.Series(["2145-11-02 07:06:00"]).astype("datetime64[ns]")
diff = endtime - starttime
assert diff.values.item() == 3960000000000
a = (endtime - starttime).dt.total_seconds().values
b = (endtime - starttime).values.astype("int64") / 1_000_000_000
c = (endtime - starttime).values / np.timedelta64(1, "s")
assert b == c, f"{c-b}" # ✔
assert a == c, f"{a-c}" # ✘ AssertionError: [4.54747351e-13]
Issue Description
I noticed this while trying to reproduce a preprocessing pipeline for a dataset. (Don't mind the odd dates; they come from de-identified data.)
It seems that dt.total_seconds yields a value that is slightly too large, probably due to a rounding issue.
In this example, the underlying int64 nanosecond values are:
starttime = 5_548_888_800_000_000_000
endtime = 5_548_892_760_000_000_000
diff = 3_960_000_000_000
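The integer values above can be cross-checked with the standard library alone. This is a small sketch, not part of the original report, assuming the naive timestamps are interpreted as UTC (which is how datetime64[ns] stores them); `to_epoch_ns` is a hypothetical helper name used only for illustration:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def to_epoch_ns(ts: datetime) -> int:
    # Hypothetical helper: timedelta // timedelta is exact integer division,
    # so no floating-point rounding is involved anywhere.
    return (ts - EPOCH) // timedelta(microseconds=1) * 1_000

start_ns = to_epoch_ns(datetime(2145, 11, 2, 6, 0, 0))
end_ns = to_epoch_ns(datetime(2145, 11, 2, 7, 6, 0))
diff_ns = end_ns - start_ns

print(start_ns, end_ns, diff_ns)
# The difference is a whole number of seconds, so ns / 1e9 has an exact answer.
print(diff_ns % 1_000_000_000 == 0)  # True
```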
Since 1_000_000_000 divides diff evenly, the result should be precisely 3960 seconds, which is exactly representable as a float. However, dt.total_seconds seems to accidentally round up:
np.set_printoptions(100)
print(np.frexp(a)) # 0.9667968750000001
print(np.frexp(b)) # 0.966796875
print(np.frexp(c)) # 0.966796875
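The mantissas printed above can be decoded with Python's math module. In this stdlib-only sketch (independent of pandas), 3960 = 0.966796875 × 2**12, and the spacing between adjacent doubles at that magnitude is 2**-41 ≈ 4.547e-13. That spacing matches the a - c difference reported by the failing assertion, which suggests dt.total_seconds is off by exactly one unit in the last place (ULP):

```python
import math

exact = 3960.0
mantissa, exponent = math.frexp(exact)
print(mantissa, exponent)  # 0.966796875 12

# Spacing between 3960.0 and the next larger double.
ulp = math.ulp(exact)
print(ulp == 2.0 ** -41, ulp)

# The next representable double above 3960.0 carries the mantissa printed for `a`.
next_up = math.nextafter(exact, math.inf)
print(math.frexp(next_up)[0])
print(next_up - exact == ulp)  # True: exactly one ULP apart
```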
However, curiously:
endtime = pd.Timestamp("2145-11-02 07:06:00")
starttime = pd.Timestamp("2145-11-02 06:00:00")
np.frexp( (endtime - starttime).total_seconds() ) # 0.966796875
So the issue might be specific to the .dt accessor?
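One plausible explanation, offered as an assumption rather than something confirmed from the pandas 1.4.4 source: the vectorized .dt path may compute seconds as nanoseconds * 1e-9, while the scalar Timedelta path divides by 1e9. Those are not equivalent in floating point, because 1e-9 has no exact double representation (the stored value is slightly too large) whereas 1e9 is exact. This plain-Python sketch shows that the multiplication alone reproduces the one-ULP error for this diff:

```python
from fractions import Fraction

diff_ns = 3_960_000_000_000  # endtime - starttime, in nanoseconds

# 1e-9 is not exactly representable; the stored double is slightly above 10**-9.
print(Fraction(1e-9) > Fraction(1, 1_000_000_000))  # True
# 1e9 is an integer below 2**53, so it is stored exactly.
print(Fraction(1e9) == 1_000_000_000)  # True

# Hypothesized vectorized path (assumption): scale by the inexact 1e-9.
via_multiply = diff_ns * 1e-9
# Scalar-style path: divide by the exact 1e9, so the quotient 3960 is exact.
via_divide = diff_ns / 1_000_000_000

print(repr(via_multiply), repr(via_divide))
print(via_multiply - via_divide == 2.0 ** -41)  # True: off by one ULP
```

The relative error in the stored 1e-9 is about 6.2e-17; multiplied by 3960 it slightly exceeds half an ULP at 3960, so the product rounds up to the next double.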
Expected Behavior
dt.total_seconds should agree with the NumPy result, i.e. return exactly 3960.0 here.
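Until the .dt result matches, one possible workaround is to take the raw int64 nanosecond counts and divide by 1_000_000_000, mirroring the b computation in the reproducible example. This is a sketch that assumes the Series contains no NaT: casting NaT to int64 yields the minimum int64 rather than a missing value.

```python
import pandas as pd

starttime = pd.Series(["2145-11-02 06:00:00"]).astype("datetime64[ns]")
endtime = pd.Series(["2145-11-02 07:06:00"]).astype("datetime64[ns]")
diff = endtime - starttime

# View the timedelta64[ns] values as int64 nanoseconds, then divide by
# 1_000_000_000, which is exactly representable as a double.
# Assumes no NaT values are present (see note above).
seconds = pd.Series(diff.to_numpy().astype("int64") / 1_000_000_000, index=diff.index)
print(seconds)
```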
Installed Versions
INSTALLED VERSIONS
commit : ca60aab
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-40-generic
Version : #45~20.04.1-Ubuntu SMP Mon Apr 4 09:38:31 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.4
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.2.2
Cython : 0.29.32
pytest : 7.1.3
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli :
fastparquet : 0.8.3
fsspec : 2022.7.1
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.3
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : 1.4.40
tables : 3.7.0
tabulate : 0.8.10
xarray : 2022.6.0
xlrd : None
xlwt : None
zstandard : None