Skip to content

Fix issue with Series.loc[x] where x is a tuple specifiying a specific index to get from a MultiIndex #350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 2, 2022

Conversation

Dr-Irv
Copy link
Collaborator

@Dr-Irv Dr-Irv commented Oct 1, 2022

Following code failed:

s = pd.Series(
        [1, 2, 3, 4], index=pd.MultiIndex.from_product([[1, 2], ["a", "b"]]), dtype=int
    )
assert_type(s.loc[1, "a"], int)

This PR fixes that. Issue is that when a Series is backed by a MultiIndex, you should be able to get specific values by specifying a complete tuple.

Added test in test_series.py:test_multiindex_loc

@Dr-Irv Dr-Irv requested a review from twoertwein October 1, 2022 00:23
def __getitem__( # type: ignore[misc]
self,
idx: Scalar | tuple[Scalar, ...],
) -> S1: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we want to distinguish having a tuple of just scalars, versus tuples that include slices or Index

Can you elaborate on which two cases this distunishes?

It might be nice to add inline comments which types/overloads are for index/multiindex.

I found two ways to use a tuple for a normal index, but I don't think this is an expected/typical usage:

> type(pd.Series([1]).__getitem__((0, None)))
numpy.ndarray
> pd.Series([1], index=[("a", "b")]).__getitem__(("a", "b"))
1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a MultiIndex, then you can specify a tuple of index values to get a specific element. You can use a tuple that contains a slice to get more than one value:

>>> s = pd.Series([1,2,3,4], index=pd.MultiIndex.from_product([[1,2], ["a","b"]], names=["nn","ab"]))
>>> s
nn  ab
1   a     1
    b     2
2   a     3
    b     4
dtype: int64
>>> s.loc[1,"b"]
2
>>> s.loc[pd.IndexSlice[1,:]]
ab
a    1
b    2
dtype: int64

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next commit adds comments, and uses _IndexSliceTuple. Note also that s.loc[1,:] works in example above.

@twoertwein twoertwein merged commit bd2d98f into pandas-dev:main Oct 2, 2022
@twoertwein
Copy link
Member

Thank you @Dr-Irv !

Do you think it would be helpful to make Series (and DataFrame) generic in regards to their index to help with such overloads?

@Dr-Irv
Copy link
Collaborator Author

Dr-Irv commented Oct 2, 2022

Do you think it would be helpful to make Series (and DataFrame) generic in regards to their index to help with such overloads?

We could certainly try. We'd have to track the Index versus MultiIndex in __new__() and also modify stubs for things like set_index() and reset_index(). Might be tricky with MultiIndex a subclass of Index, and all the other Index subclasses.

To some extent, part of the issue here with mypy is that if you have 2 overloads that overlap, but the first overload is a subset of the second overload, you know the first one will match, and the second one is for the cases that don't match the first one. pyright handles this just fine.

@twoertwein
Copy link
Member

Might be tricky with MultiIndex a subclass of Index

I forgot that for a moment - yes that would create many overlapping overloads (I believe also in pyright). As long as all type checkers pick the first matching overload, it can be fine to use overlapping overloads deliberately.

@Dr-Irv Dr-Irv deleted the serieslocmulti branch December 28, 2022 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants