-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: TimeGrouper outputs different result by column order #6908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this close #6764 ? |
expected = DataFrame({ | ||
'Buyer': 'Carl Joe Mark Carl Joe'.split(), | ||
'Quantity': [6,8,3,4,10], | ||
'Date' : [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are all of the tests changed? (I know they are just moved) but very hard to tell of actual changes
best just to leave original tests unless small change and just add new ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine what u did just FYI for future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intension was to perform exactly the same tests for both original and sorted data. But it is possible to prepare duplicate tests by copy & paste. Which is prefferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no this is fine
prefer this to copy/paste
It seems do. I've commented on #6764, and will add a test for this. |
just run a quick perf check - anything shows up post and investigate almost certainly fine - good to check anyhow |
Can u add in a release note I think their was a releated issue u can simply add this onto (for pd.Grouper) (or after) |
Thanks, I've added release note and test for #6764. My initial implementation had unnecessary sort and it makes time series performance worse. Now it is fixed and following is an updates vbench result. No test looks consntantly get worse.
|
BUG: TimeGrouper outputs different result by column order
closes #6764
TimeGrouper
may output incorrect results depending on the target column order. The problem seems to be caused by 2 parts.TimeGrouper._get_time_bins
and related methods expects sorted values input.BinGrouper.get_iterator
expects sorted data input.