
DataFrame.__setitem__ performance regression with object dtype #19299

Closed
@TomAugspurger

This regression appeared between 0.20.3 and 0.21.1; I haven't narrowed it down further yet. Note that:

  • It's specific to setitem, getitem is fine
  • It's specific to object dtype

In [1]: import pandas as pd
   ...: pd.__version__
   ...:
Out[1]: '0.20.3'

In [2]: df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)

In [3]: %timeit df.loc[0, 1] = 1.0
132 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [1]: import pandas as pd
   ...: pd.__version__
   ...:
Out[1]: '0.21.1'

In [2]: df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)

In [3]: %timeit df.loc[0, 1] = 1.0
1.64 ms ± 70.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
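
For anyone who wants to check the two observations above (setitem only, object dtype only) outside IPython, here is a rough standalone sketch. The helper names `make_frame` and `time_loc` are made up for illustration, and the float64 frame is just an assumed comparison case, not something measured in this report. Run it under each pandas version and compare the printed numbers.

```python
import timeit

import pandas as pd


def make_frame(dtype):
    # Same shape as the frame in the report: 1000 rows x 100 columns.
    return pd.DataFrame(index=range(1000), columns=range(100), dtype=dtype)


def time_loc(dtype, number=200, repeat=5):
    """Return (setitem, getitem) seconds per call for a single .loc cell."""
    df = make_frame(dtype)

    def set_cell():
        df.loc[0, 1] = 1.0

    def get_cell():
        df.loc[0, 1]

    # Take the best of several repeats to reduce noise from other processes.
    set_s = min(timeit.repeat(set_cell, number=number, repeat=repeat)) / number
    get_s = min(timeit.repeat(get_cell, number=number, repeat=repeat)) / number
    return set_s, get_s


# float64 is an assumed baseline; the report only states that object dtype is affected.
results = {
    name: time_loc(dtype)
    for name, dtype in [("object", object), ("float64", "float64")]
}

print("pandas", pd.__version__)
for name, (set_s, get_s) in results.items():
    print(f"{name:8s} setitem {set_s * 1e6:9.1f} us   getitem {get_s * 1e6:9.1f} us")
```

If the regression is as described, only the object-dtype setitem figure should change noticeably between 0.20.3 and 0.21.1.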

The regression is still present on master. I haven't tried profiling yet.
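
Since no profile has been taken yet, one possible starting point is the standard-library `cProfile`. This is only a sketch: the helper `assign_many` and the loop count of 500 are arbitrary choices for illustration, and the output will differ by pandas version, so no expected hot spot is claimed here.

```python
import cProfile
import io
import pstats

import pandas as pd

# Same frame as in the report.
df = pd.DataFrame(index=range(1000), columns=range(100), dtype=object)


def assign_many(n=500):
    # Repeat the slow assignment so its internals dominate the profile.
    for _ in range(n):
        df.loc[0, 1] = 1.0


profiler = cProfile.Profile()
profiler.enable()
assign_many()
profiler.disable()

# Collect the 15 entries with the highest cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(15)
report = buf.getvalue()
print(report)
```

Comparing this report between 0.20.3 and 0.21.1 should show which internal calls account for the extra time.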

Labels

  • Benchmark: Performance (ASV) benchmarks
  • Indexing: Related to indexing on series/frames, not to indexes themselves
  • good first issue
