Description
Consider a DataFrame or Series with a MultiIndex whose levels are named M1, M2, M3.
When grouping by several levels, as in
grp = df.groupby(level=["M1", "M3"])
the type checker reports an error like
Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
Type "list[str]" cannot be assigned to type "Level | None"
"list[str]" is incompatible with protocol "Hashable"
This is due to the annotation of groupby as
by: IntervalIndex[IntervalT],
axis: AxisIndex = ...,
level: Level | None = ...,
The type "Level" is an alias for "Hashable" and allows only one level. I think it should be Hashable | Sequence[Hashable] or, as pandas calls it, IndexLabel (plus the | None).
I've identified a few places (see attachment). However, I am not sure the patch is correct or complete. It is interesting to see that stack is annotated with level: Level | list[Level] = ... while unstack has level: Level = ..., while IMHO both should be IndexLabel.
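To make the proposed widening concrete, here is a minimal sketch using only the standard library. The alias definition mirrors Hashable | Sequence[Hashable] as suggested above; normalize_levels is a hypothetical helper invented for illustration and is not part of pandas or the stubs.

```python
from collections.abc import Hashable, Sequence
from typing import Optional, Union

# Stand-in for the alias the stubs use today: a single label only.
Level = Hashable
# Proposed wider alias: one label or a sequence of labels.
IndexLabel = Union[Hashable, Sequence[Hashable]]


def normalize_levels(level: Optional[IndexLabel] = None) -> list:
    """Hypothetical helper: turn a level argument into a list of labels."""
    if level is None:
        return []
    # A str is both Hashable and a Sequence; treat it as a single label.
    if isinstance(level, (str, bytes)) or not isinstance(level, Sequence):
        return [level]
    # Any other sequence (list, tuple) is read as several labels.
    return list(level)


print(normalize_levels(["M1", "M3"]))
print(normalize_levels("M1"))
```

With an annotation of this shape, a call such as level=["M1", "M3"] should no longer be flagged, since list[str] matches the Sequence[Hashable] arm even though a list is not itself Hashable.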
Also, I am not sure the groupby annotations are completely correct, because I think you cannot pass both by and level at the same time, as in df.groupby(by=not_None, level=other_not_None), yet the annotations seem to allow it.
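If by and level really are mutually exclusive (an assumption that is not verified here), overloads could express that. The following is only a sketch on a toy function; groupby_sketch and its return value are hypothetical and do not reflect the actual stub signatures.

```python
from collections.abc import Hashable, Sequence
from typing import Optional, Union, overload

IndexLabel = Union[Hashable, Sequence[Hashable]]


@overload
def groupby_sketch(*, by: IndexLabel, level: None = None) -> str: ...
@overload
def groupby_sketch(*, by: None = None, level: IndexLabel) -> str: ...
def groupby_sketch(
    *,
    by: Optional[IndexLabel] = None,
    level: Optional[IndexLabel] = None,
) -> str:
    # Runtime guard for the assumed contract: exactly one argument is given.
    if (by is None) == (level is None):
        raise TypeError("pass exactly one of 'by' or 'level'")
    return "by" if by is not None else "level"


print(groupby_sketch(level=["i1", "i2"]))
print(groupby_sketch(by="a"))
```

A type checker should reject groupby_sketch(by="a", level="i1") because neither overload accepts two non-None arguments. Note that None is itself Hashable, so the overloads alone do not rule out passing None for both; that case is only caught by the runtime check.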
Minimal example code:
import pandas as pd
df = pd.DataFrame(
data={"a": [0]},
index=pd.MultiIndex.from_product(
[pd.Index(["i1a"], name="i1"), pd.Index(["i2a"], name="i2")]
),
)
grp = df.groupby(level=["i1", "i2"])
pyright outputs:
pyright pd-stub-test.py
pd-stub-test.py
pd-stub-test.py:9:24 - error: Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
Type "list[str]" cannot be assigned to type "Level | None"
"list[str]" is incompatible with protocol "Hashable"
"__hash__" is an incompatible type
Type "None" cannot be assigned to type "(self: list[str]) -> int"
"list[str]" is incompatible with "int"
Type cannot be assigned to type "None" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations
0001-MultiIndex-in-groupby.patch.gz
(Warning: I haven't tested/verified the patch!)