Skip to content

GroupBy[Series].count() return type should be Series[int] #966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 1, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion pandas-stubs/core/groupby/groupby.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,10 @@ class GroupBy(BaseGroupBy[NDFrameT]):
@overload
def all(self: GroupBy[DataFrame], skipna: bool = ...) -> DataFrame: ...
@final
def count(self) -> NDFrameT: ...
@overload
def count(self: GroupBy[Series]) -> Series[int]: ...
@overload
def count(self: GroupBy[DataFrame]) -> DataFrame: ...
Comment on lines +179 to +182
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer removing def count() from here, and having declarations in DataFrameGroupBy and SeriesGroupBy in core/groupby/generic.pyi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a new contributor to this repo and eager to learn more about these choices. Is there a specific reason for moving the declarations to core/groupby/generic.pyi?

In the actual pandas repo, the count() method is implemented in core/groupby/groupby.py, not core/groupby/generic.py.

Thanks in advance for taking the time to explain.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a new contributor to this repo and eager to learn more about these choices. Is there a specific reason for moving the declarations to core/groupby/generic.pyi?

In the actual pandas repo, the count() method is implemented in core/groupby/groupby.py, not core/groupby/generic.py.

Thanks in advance for taking the time to explain.

It probably is due to that we had issues in the past with doing overloads where self was declared as Series vs. DataFrame, so putting the stubs in the lower classes was a way to get around that problem.

The type checkers have improved, so it's possible your approach could be used going forward.

Having said that, I see that for any() and all() in the file you changed, that was the pattern used, so maybe your change was fine the way it was!

As an example, there are a number of methods shared between Series and DataFrame that are implemented in NDFrame, but instead of putting the stub in NDFrame, we put them in Series and DataFrame to clarify that the results are different in the two subclasses.

@final
def mean(
self,
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ mypy = "1.10.1"
pandas = "2.2.2"
pyarrow = ">=10.0.1"
pytest = ">=7.1.2"
pyright = ">=1.1.369"
pyright = ">= 1.1.374"
poethepoet = ">=0.16.5"
loguru = ">=0.6.0"
typing-extensions = ">=4.4.0"
Expand Down
1 change: 1 addition & 0 deletions tests/test_frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -1025,6 +1025,7 @@ def test_types_groupby_methods() -> None:
check(assert_type(df.groupby("col1").sum(), pd.DataFrame), pd.DataFrame)
check(assert_type(df.groupby("col1").prod(), pd.DataFrame), pd.DataFrame)
check(assert_type(df.groupby("col1").sample(), pd.DataFrame), pd.DataFrame)
check(assert_type(df.groupby("col1").count(), pd.DataFrame), pd.DataFrame)
check(
assert_type(df.groupby("col1").value_counts(normalize=False), "pd.Series[int]"),
pd.Series,
Expand Down
7 changes: 7 additions & 0 deletions tests/test_series.py
Original file line number Diff line number Diff line change
Expand Up @@ -676,6 +676,13 @@ def test_types_groupby_methods() -> None:
check(assert_type(s.groupby(level=0).idxmax(), pd.Series), pd.Series)
check(assert_type(s.groupby(level=0).idxmin(), pd.Series), pd.Series)

s2 = pd.Series(["w", "x", "y", "z"], index=[3, 4, 3, 4], dtype=str)
check(
assert_type(s2.groupby(level=0).count(), "pd.Series[int]"),
pd.Series,
np.integer,
)


def test_groupby_result() -> None:
# GH 142
Expand Down
Loading