fix args to DataFrame.set_index #147

Merged
merged 1 commit into from
Jul 17, 2022
16 changes: 12 additions & 4 deletions pandas-stubs/core/frame.pyi
@@ -577,7 +577,9 @@ class DataFrame(NDFrame, OpsMixin):
     @overload
     def set_index(
         self,
-        keys: Union[Label, Sequence],
+        keys: Union[
+            Label, Series, Index, np.ndarray, Iterator[Hashable], List[Hashable]
Member

I think Iterator[Hashable] includes List[Hashable], but I didn't test that.

Collaborator Author

> I think Iterator[Hashable] includes List[Hashable], but I didn't test that.

No, it doesn't. Here's an example where pyright shows the incompatibility:

from typing import Iterator


def fun(a: Iterator[int]):
    # next() needs __next__, which a list does not define
    while (k := next(a, None)) is not None:
        print(k)


li = [1, 2, 3]
fun(li)

On the last line, pyright reports:

iterlist.py:11:5 - error: Argument of type "list[int]" cannot be assigned to parameter "a" of type "Iterator[int]" in function "fun"
    "list[int]" is incompatible with protocol "Iterator[int]"
      "__next__" is not present (reportGeneralTypeIssues)
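
The same distinction can be checked at runtime. This is a minimal sketch using only the standard library; the helper name `total` is made up for illustration. It shows that a list is an `Iterable` but not an `Iterator`, and that wrapping it with `iter()` produces an object that does satisfy `Iterator`.

```python
from typing import Iterable, Iterator

li = [1, 2, 3]

# A list defines __iter__ but not __next__: Iterable, yet not an Iterator.
list_is_iterable = isinstance(li, Iterable)
list_is_iterator = isinstance(li, Iterator)

# iter() returns a list_iterator, which does define __next__.
wrapped_is_iterator = isinstance(iter(li), Iterator)


def total(values: Iterable[int]) -> int:
    """Hypothetical helper: an Iterable parameter accepts lists and iterators."""
    return sum(values)


print(list_is_iterable, list_is_iterator, wrapped_is_iterator)
print(total(li), total(iter(li)))
```

This is why a union that lists `Iterator[Hashable]` alone would reject a plain list under a static checker, while `Iterable[Hashable]` would accept both.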

Member

Iterable[int] wouldn't require __next__ but I'm not sure whether set_index works with any iterable.

Member

> Iterable[int] wouldn't require __next__ but I'm not sure whether set_index works with any iterable.

pandas has no type annotations for keys. Let's keep it as-is.

Contributor

@bashtage bashtage Jul 17, 2022

If you believe the docs, then the types that could be used are:

Label
Series
Index
np.ndarray
Iterator
List[Union[Label, Series, Index, np.ndarray, Iterator]]

Clearly it also supports List everywhere Series is used.
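
As a rough runtime illustration of the key types listed above, the sketch below passes each one to `set_index`. It assumes pandas and NumPy are installed. The frame, the `labels` values, and the variable names are made up for the example and are not taken from the PR's test suite.

```python
import numpy as np
import pandas as pd

# Illustrative data only; column names mirror the ones used in the tests.
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
labels = ["w", "x", "y", "z"]

by_label = df.set_index("col1")              # Label: an existing column name
by_index = df.set_index(pd.Index(labels))    # Index
by_series = df.set_index(pd.Series(labels))  # Series
by_array = df.set_index(np.array(labels))    # np.ndarray
by_iterator = df.set_index(iter(labels))     # Iterator, consumed into a list

# A top-level list is read as a list of keys, so labels and arrays can be mixed.
by_mixed = df.set_index(["col1", pd.Index(labels)])

print(by_mixed.index.nlevels)
```

Because a top-level list is interpreted as a list of keys, a bare list of values such as `labels` would be looked up as column names rather than used as index values.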

+        ],
         drop: _bool = ...,
         append: _bool = ...,
         verify_integrity: _bool = ...,
@@ -587,7 +589,9 @@ class DataFrame(NDFrame, OpsMixin):
     @overload
     def set_index(
         self,
-        keys: Union[Label, Sequence],
+        keys: Union[
+            Label, Series, Index, np.ndarray, Iterator[Hashable], List[Hashable]
+        ],
         drop: _bool = ...,
         append: _bool = ...,
         verify_integrity: _bool = ...,
@@ -597,7 +601,9 @@ class DataFrame(NDFrame, OpsMixin):
     @overload
     def set_index(
         self,
-        keys: Union[Label, Sequence],
+        keys: Union[
+            Label, Series, Index, np.ndarray, Iterator[Hashable], List[Hashable]
+        ],
         drop: _bool = ...,
         append: _bool = ...,
         *,
@@ -606,7 +612,9 @@ class DataFrame(NDFrame, OpsMixin):
     @overload
     def set_index(
         self,
-        keys: Union[Label, Sequence],
+        keys: Union[
+            Label, Series, Index, np.ndarray, Iterator[Hashable], List[Hashable]
+        ],
         drop: _bool = ...,
         append: _bool = ...,
         inplace: Optional[_bool] = ...,
2 changes: 2 additions & 0 deletions tests/test_frame.py
@@ -268,6 +268,8 @@ def test_types_set_index() -> None:
     res4: pd.DataFrame = df.set_index("col1", verify_integrity=True)
     res5: pd.DataFrame = df.set_index(["col1", "col2"])
     res6: None = df.set_index("col1", inplace=True)
+    # GH 140
+    res7: pd.DataFrame = df.set_index(pd.Index(["w", "x", "y", "z"]))


 def test_types_query() -> None: