Serious performance regression in DataFrame construction with monthly DatetimeIndex #6479

Closed
@qwhelan

Hi,

After upgrading from v0.12.0 to v0.13.1, I noticed about a 100% slowdown on a pandas-heavy project. I've just started looking, but I've come up with a test case that shows a time-complexity change from O(1) to O(n) (~240x slowdown for my inputs).

Here's the comparison for v0.12.0 (y-axis is milliseconds):

[Image: perf_12]

And the comparison for v0.13.1:

[Image: perf_131]

The test code (I'll convert this to vbench later):

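import numpy as np
from pandas import DataFrame, DatetimeIndex

rows = 1000
columns = 10
# Random data on a monthly DatetimeIndex (constructor form as used with v0.12/v0.13)
data = DataFrame(np.random.random((rows, columns)), index=DatetimeIndex(start='1/1/1900', periods=rows, freq='M'))

# Collect each column as a Series; all of them share the same index
d = {}

for col in data:
    d[col] = data[col]

# Run in IPython: time building a DataFrame from the dict of Series
%timeit DataFrame(d)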

Daily indices don't appear to be affected, though I suspect other frequencies are impacted. I'm seeing similar regressions in v0.13.0.
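
For anyone who wants to check the scaling outside IPython, here is a rough sketch of the same measurement using the standard-library `timeit` module. It rests on a few assumptions that are not in the report above: that the "n" in the O(n) claim is the number of rows, that `pd.date_range` with offset objects (`pd.offsets.MonthEnd()`, `pd.offsets.Day()`) is an acceptable stand-in for the `DatetimeIndex(start=...)` call, and that the row counts and repeat settings are reasonable. The helper names `build_series_dict` and `time_construction` are made up for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def build_series_dict(rows, freq, columns=10):
    """Return a dict of Series sharing one DatetimeIndex (hypothetical helper)."""
    index = pd.date_range("1900-01-01", periods=rows, freq=freq)
    data = pd.DataFrame(np.random.random((rows, columns)), index=index)
    return {col: data[col] for col in data}


def time_construction(rows, freq, repeat=5, number=20):
    """Best per-call time, in milliseconds, of DataFrame(d) (hypothetical helper)."""
    d = build_series_dict(rows, freq)
    timings = timeit.repeat(lambda: pd.DataFrame(d), repeat=repeat, number=number)
    return min(timings) / number * 1e3


if __name__ == "__main__":
    # Row counts are an assumption; 2000 monthly periods from 1900 stay within
    # the range that nanosecond timestamps can represent.
    frequencies = {"monthly": pd.offsets.MonthEnd(), "daily": pd.offsets.Day()}
    for label, freq in frequencies.items():
        for rows in (250, 500, 1000, 2000):
            ms = time_construction(rows, freq)
            print(f"{label:8s} rows={rows:5d} {ms:8.3f} ms")
```

If the monthly timings grow with the row count while the daily ones do not, that would be consistent with the regression described here; the absolute numbers will depend on the machine and the pandas version installed.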

Labels: Frequency (DateOffsets), Performance (Memory or execution speed performance)
