Closed
Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this issue exists on the latest version of pandas.
- I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
import pandas as pd

rg = pd.date_range("2020-01-01", periods=100_000, freq="s")
ts_ns = pd.Timestamp("1996-01-01 00:00:00.00000000000")  # nanosecond resolution
ts_s = pd.Timestamp("1996-01-01")  # second resolution
This gives the following timings:
%timeit rg < ts_s
2.27 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rg < ts_ns
108 µs ± 572 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
I suspect many users will define timestamps without nanosecond precision and therefore end up with mismatched resolutions, which causes a roughly 20x slowdown here (2.27 ms vs. 108 µs). Can we fix this somehow for 2.0?
Time is almost exclusively spent in
{pandas._libs.tslibs.np_datetime.compare_mismatched_resolutions}
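In the meantime, a possible user-side workaround is to cast the scalar to the index's resolution before comparing, so that both operands share a unit. This is only a rough sketch, assuming the `unit` attribute and `Timestamp.as_unit` behave as in pandas 2.0; `ts_matched` is a name made up for illustration, and I have not timed this variant.

```python
import pandas as pd

rg = pd.date_range("2020-01-01", periods=100_000, freq="s")
ts_s = pd.Timestamp("1996-01-01")

# Inspect the resolutions of both operands; a mismatch here is what
# sends the comparison down the slow path described above.
print("index unit:", rg.unit, "| scalar unit:", ts_s.unit)

# Workaround sketch: cast the scalar to the index's unit so both
# operands share one resolution before comparing.
ts_matched = ts_s.as_unit(rg.unit)
result = rg < ts_matched

# The matched-unit comparison should agree with the original one.
print((result == (rg < ts_s)).all())
```

This only sidesteps the problem for callers who know about it; the comparison itself would still need fixing so that users do not have to match units by hand.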
cc @jbrockmendel @MarcoGorelli
Installed Versions
main
Prior Performance
No response