groupby.__iter__() fix types #148
Merged
5 commits
1ab02f5 groupby.__iter__() fix types (Dr-Irv)
62fd358 WIP: try splitting by label and otherwise (Dr-Irv)
167ba50 fix tests to avoid cast (Dr-Irv)
383e1bf differentiate by list or scalar (Dr-Irv)
981bf9a make new classes private. Change tests to test iterator and next (Dr-Irv)
No strong opinion: I see the point from a user perspective, but I think deviating too much from the pandas implementation will make it challenging to use stubtest later (and maybe also to maintain the stubs).
We should probably make the classes that exist only in the stubs (and not in pandas) private.
I think I have developed a slightly stronger opinion. I would prefer not to introduce too many non-pandas classes: I believe that this will keep pandas-stubs more maintainable in the long run.
Personally, I prefer the previous state of the PR where you returned `tuple[Hashable, NDFrameT]` (or making `SeriesGroupBy` generic, which might be more invasive).
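For comparison, here is a minimal sketch of the shape suggested in this comment. A single `__iter__()` lives in a base class that is generic over `NDFrameT` and yields `tuple[Hashable, NDFrameT]`. All classes are simplified, hypothetical stand-ins, not the actual pandas or pandas-stubs definitions, so the snippet runs without pandas. It illustrates the typing trade-off only.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterator
from typing import Generic, TypeVar


# Hypothetical, minimal stand-ins for pandas' NDFrame/Series/DataFrame so the
# sketch runs without pandas; they carry no data of their own.
class NDFrame:
    pass


class Series(NDFrame):
    pass


class DataFrame(NDFrame):
    pass


NDFrameT = TypeVar("NDFrameT", bound=NDFrame)


class BaseGroupBy(Generic[NDFrameT]):
    """One __iter__ in the base class, generic over the grouped object type."""

    def __init__(self, groups: dict[Hashable, NDFrameT]) -> None:
        self._groups = groups

    def __iter__(self) -> Iterator[tuple[Hashable, NDFrameT]]:
        # The key is only known to be Hashable: a type checker cannot tell
        # whether groupby() received a single label (scalar keys) or a list
        # of labels (tuple keys).
        return iter(self._groups.items())


class SeriesGroupBy(BaseGroupBy[Series]):
    pass


class DataFrameGroupBy(BaseGroupBy[DataFrame]):
    pass


gb = DataFrameGroupBy({"x": DataFrame(), ("a", 1): DataFrame()})
for key, frame in gb:
    # Statically: key is Hashable, frame is DataFrame.
    print(key, type(frame).__name__)
```

The trade-off is that every key is typed as `Hashable`. A caller who grouped by a single label therefore gets no more specific key type.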
This is a tough call. There are lots of things in pandas that were designed without static typing in mind. In this case, we want to differentiate when someone writes `df.groupby(["a", "b"])` versus `df.groupby("a")` so that `__iter__()` returns a different result. The implementation has `__iter__()` in a base class, so from a static typing perspective there isn't an easy way to differentiate between the different kinds of arguments of `groupby()` once you get down to the base class.

My philosophy on the stubs has been to make them useful for end users with the most common ways of using pandas. To do that, we have to deviate from the implementation. A good example of this is `Series.dt`, where the accessors are all dynamically hooked in, but we have to create static declarations for each of the accessors. I think this `groupby()` example is similar.

I tried some experiments making `DataFrameGroupBy` generic, but the types that are then returned by `__iter__()` become too wide. Now we just make the return types either `Iterator[Tuple[Tuple, DataFrame]]` or `Iterator[Tuple[Scalar, DataFrame]]`, which covers the majority of use cases.

My suggestion is that you accept this PR as it is: it addresses the reported issue, and we could open a new issue to see if we can figure out a way to make a generic implementation work.
Did that in the next commit.
While I prefer pandas and pandas-stubs to be aligned (I would like pandas and pandas-stubs to converge in the future and differ only in challenging cases such as `__getitem__`, `Timestamp`, ..., which might be a long list), I'm fine with introducing classes/variables that are not in pandas as long as they are private (or use some other prefix).