-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
API: avoid silent consolidation #49456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -6812,8 +6807,6 @@ def fillna( | |||
inplace = validate_bool_kwarg(inplace, "inplace") | |||
value, method = validate_fillna_kwargs(value, method) | |||
|
|||
self._consolidate_inplace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it was more performant to consolidate a fragmented frame before fillna or replace?
Nonetheless, I agree with the notion of "avoiding side effects" when calling a method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any performance differences in the methods where consolidation no longer happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the main one is going to be in DataFrame.values where subsequent calls could be faster in the status quo if it can be fully consolidated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fairly good. Could you post some before/after timings of a fragmented frame calling reindex
, fillna
, and replace
?
We take a pretty big hit in
|
@jreback thoughts on the above performance impact of not silently consolidating? |
@jbrockmendel can u run the above again with a much larger number of rows to see how this amortizes |
With a million rows this is faster across the board:
|
+1 then i suppose should run a full asv just to see where it is slower (likely a concat or similar) |
I don't for the life of me understand how many of these benchmarks are affected...
|
Guessing (hoping) those are mostly noisy benchmarks. Maybe spot-check a few of the bad ones (like 1 benchmark per general "benchmark topic") with |
Doing these on the first few shows perf unchanged vs main. Maybe I hadn't rebased when running asv before? Anyway, re-running some of the asvs now. |
re-ran everything that previously showed as over 1.40 and they all came back as unchanged, except for two groupby.Nth series benchmarks that came back as faster and pretty much have to be noise |
lgtm would add a whatsnew note |
Thanks @jbrockmendel |
* API: avoid silent consolidation * update test * fix ArrayManager xfail * whatsnew
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.