REF/PERF: Move MultiIndex._tuples to MultiIndex._cache #35641

Merged

@topper-123 topper-123 commented Aug 9, 2020

Currently, the expensive-to-compute MultiIndex.values attribute is cached in MultiIndex._tuples. IMO it would be more idiomatic to store it in MultiIndex._cache, which is what this PR does.

This has the added benefit that the cached ._values is carried over to new copies of the MultiIndex, which also gives a performance boost in cases where copying is needed:

>>> import pandas as pd
>>> n = 100_000
>>> df = pd.DataFrame({'a': ['a', 'b'] * int(n / 2), 'b': range(n), 'c': range(20, n + 20)})
>>> mi = pd.MultiIndex.from_frame(df)
>>> mi.values # also caches mi._values in mi._cache
array([('a', 0, 20), ('b', 1, 21), ('a', 2, 22), ...,
       ('b', 99997, 100017), ('a', 99998, 100018), ('b', 99999, 100019)],
      dtype=object)
>>> %timeit mi._shallow_copy().values
22.8 ms ± 3.43 ms per loop  # master
34.6 µs ± 997 ns per loop  # this PR
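
The idea behind the speedup can be sketched without pandas. The class below is a hypothetical stand-in (LazyTupleIndex, compute_count and the "values" cache key are all invented for illustration and are not the pandas implementation); it only assumes, as described above, that expensive values live in a per-object _cache dict and that a shallow copy carries that dict over.

```python
class LazyTupleIndex:
    """Hypothetical stand-in for an index whose tuple values are costly to build."""

    def __init__(self, *levels):
        self._levels = levels
        self._cache = {}        # per-object store for expensive derived values
        self.compute_count = 0  # counts how often the expensive path runs

    @property
    def values(self):
        # Build the tuples only on first access, then reuse the cached result.
        if "values" not in self._cache:
            self.compute_count += 1
            self._cache["values"] = list(zip(*self._levels))
        return self._cache["values"]

    def _shallow_copy(self):
        new = type(self)(*self._levels)
        # Carry already-computed entries over so the copy does not recompute them.
        new._cache = dict(self._cache)
        return new


idx = LazyTupleIndex(["a", "b"], [0, 1], [20, 21])
idx.values                  # first access fills idx._cache
copy = idx._shallow_copy()
copy_values = copy.values   # served from the copied cache
```

Carrying the cache over is only safe here because the copy wraps exactly the same underlying data; a copy with different data would need a fresh, empty cache.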

@topper-123 topper-123 changed the title REF/PERF: Move multi index tuples to cache REF/PERF: Move MultiIndex._tuples to MultiIndex._cache Aug 9, 2020
@topper-123 topper-123 force-pushed the move_multi_index_tuples_to_cache branch from 8cc8489 to 79e28b8 Compare August 9, 2020 12:52
@jreback jreback added the Clean, MultiIndex, and Performance (Memory or execution speed performance) labels Aug 10, 2020
@jreback jreback added this to the 1.2 milestone Aug 10, 2020
@jreback

jreback commented Aug 10, 2020

lgtm. Can you rebase, as I merged your other MI PR? Ping on green.

@topper-123 topper-123 force-pushed the move_multi_index_tuples_to_cache branch from 79e28b8 to ce1e3f4 Compare August 10, 2020 18:17
errors : str, default 'strict'
Specifies how encoding and decoding errors are to be handled.
See the errors argument for :func:`open` for a full list
of options.

@topper-123 topper-123 Aug 10, 2020

The CI complained that this docstring element is in a different order than in the signature, so I moved it to come after mode, matching the signature, which must be correct.

I've got no idea why this is failing here and not in other PRs...
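
For illustration, a check of this kind can be sketched with the standard library. This is not the actual CI validation; documented_params, order_matches_signature and the example function are hypothetical names, and the sketch assumes numpydoc-style "name : type" parameter lines.

```python
import inspect
import re


def documented_params(func):
    """Return names listed as 'name : type' lines in a numpydoc-style docstring."""
    doc = inspect.getdoc(func) or ""
    return re.findall(r"^(\w+) : ", doc, flags=re.MULTILINE)


def order_matches_signature(func):
    """Check that documented parameters appear in signature order."""
    sig_names = list(inspect.signature(func).parameters)
    doc_names = documented_params(func)
    # Keep only the documented names, arranged in signature order.
    expected = [name for name in sig_names if name in doc_names]
    return doc_names == expected


def example(path, mode="r", errors="strict"):
    """
    Hypothetical function used only for this illustration.

    Parameters
    ----------
    path : str
        File to open.
    errors : str, default 'strict'
        How encoding errors are handled.
    mode : str, default 'r'
        File mode.
    """


in_order = order_matches_signature(example)  # errors is documented before mode
```

Here in_order is False because the docstring lists errors before mode while the signature has mode first; swapping the two docstring entries makes the check pass.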

@topper-123

Rebased.

@jreback jreback merged commit 32abe63 into pandas-dev:master Aug 10, 2020
@jreback

jreback commented Aug 10, 2020

thanks @topper-123

@topper-123 topper-123 deleted the move_multi_index_tuples_to_cache branch August 10, 2020 23:00
2 participants