Skip to content

GroupBy[Series].count() return type should be Series[int] #966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 1, 2024
Merged

GroupBy[Series].count() return type should be Series[int] #966

merged 4 commits into from
Aug 1, 2024

Conversation

chrisyeh96
Copy link
Contributor

Comment on lines +179 to +182
@overload
def count(self: GroupBy[Series]) -> Series[int]: ...
@overload
def count(self: GroupBy[DataFrame]) -> DataFrame: ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing def count() from here, and having declarations in DataFrameGroupBy and SeriesGroupBy in core/groupby/generic.pyi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a new contributor to this repo and eager to learn more about these choices. Is there a specific reason for moving the declarations to core/groupby/generic.pyi?

In the actual pandas repo, the count() method is implemented in core/groupby/groupby.py, not core/groupby/generic.py.

Thanks in advance for taking the time to explain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a new contributor to this repo and eager to learn more about these choices. Is there a specific reason for moving the declarations to core/groupby/generic.pyi?

In the actual pandas repo, the count() method is implemented in core/groupby/groupby.py, not core/groupby/generic.py.

Thanks in advance for taking the time to explain.

It probably is due to that we had issues in the past with doing overloads where self was declared as Series vs. DataFrame, so putting the stubs in the lower classes was a way to get around that problem.

The type checkers have improved, so it's possible your approach could be used going forward.

Having said that, I see that for any() and all() in the file you changed, that was the pattern used, so maybe your change was fine the way it was!

As an example, there are a number of methods shared between Series and DataFrame that are implemented in NDFrame, but instead of putting the stub in NDFrame, we put them in Series and DataFrame to clarify that the results are different in the two subclasses.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 31, 2024

We just uncovered an issue with our CI that was fixed earlier today. Can you merge with current main and push? Then I will start the CI runs.

@chrisyeh96
Copy link
Contributor Author

chrisyeh96 commented Aug 1, 2024

I have merged pandas-dev/main into my fork

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Aug 1, 2024

Ugh, so now CI is failing because when you first tested, it used pyright 1.1.373 and that got locked for further tests.

Can you include this in your PR? In pyproject.toml, change:

pyright = ">=1.1.369"

to

pyright = ">= 1.1.374"

@chrisyeh96
Copy link
Contributor Author

I have updated the pyright dependency version in pyproject.toml as you suggested.

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @chrisyeh96

@Dr-Irv Dr-Irv merged commit 7e6aee4 into pandas-dev:main Aug 1, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GroupBy[Series].count() should have return type Series[int]
2 participants