API: Please make ".loc" return type depend on index, not on specific labels #9519

Open
@toobaz

I already mentioned this in #9466 but I think it deserves its own bug report:

In [1]: import pandas as pd

In [2]: s = pd.Series([1, 2, 3], index=(1,1,2))

In [3]: s
Out[3]: 
1    1
1    2
2    3
dtype: int64

In [4]: s.loc[1]
Out[4]: 
1    1
1    2
dtype: int64

In [5]: type(s.loc[1])
Out[5]: pandas.core.series.Series

In [6]: s.loc[2]
Out[6]: 3

In [7]: type(s.loc[2])
Out[7]: numpy.int64

Quoting #5678: "You are selecting out of a duplicated index Series. You could argue that you should get back another Series".

I really think life would be easier if s.loc[2] returned a Series of length one (and DataFrames and Panels behaved similarly). One is assumed to know whether an index as a whole is unique (and can check it in O(1)), but not necessarily whether a given label is unique.
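To make the distinction concrete, here is a minimal sketch of what can be done today, assuming a reasonably recent pandas: `Index.is_unique` answers the whole-index question, and passing a list-like indexer to `.loc` keeps the result a Series regardless of how often the label occurs. The variable names `only_two` and `both_ones` are just for illustration, and this is a workaround rather than the behavior change requested here.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[1, 1, 2])

# Uniqueness of the index as a whole is exposed as a property.
index_is_unique = s.index.is_unique

# A list-like indexer preserves dimensionality, so the return type
# no longer depends on whether the label happens to be duplicated.
only_two = s.loc[[2]]   # Series of length one
both_ones = s.loc[[1]]  # Series of length two

print(index_is_unique)
print(type(only_two), len(only_two))
print(type(both_ones), len(both_ones))
```

With the list form, code that follows can treat both results uniformly as Series, which is the property the scalar form `s.loc[2]` currently lacks.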

With higher-dimensional structures it is even messier: if e.g. .loc[lab_a, lab_b, lab_c] yields a lower-dimensional structure that is still a pandas structure, you have to find out which dimensions have been lost and which kept (i.e. which of the labels were duplicates).
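The same effect can be sketched in two dimensions with a DataFrame standing in for the three-label case above. The frame `df`, its column `x`, and the labels `"a"` and `"b"` are made up for illustration; the point is only that the dimensionality of the result depends on whether each label is duplicated.

```python
import pandas as pd

# Row label "a" is duplicated, row label "b" is not.
df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "a", "b"])

dup_row = df.loc["a"]        # duplicated label: still a DataFrame
uniq_row = df.loc["b"]       # unique label: row dimension dropped, a Series
dup_cell = df.loc["a", "x"]  # duplicated row label: a Series of two values
uniq_cell = df.loc["b", "x"] # both labels unique: a scalar

for name, obj in [("dup_row", dup_row), ("uniq_row", uniq_row),
                  ("dup_cell", dup_cell), ("uniq_cell", uniq_cell)]:
    print(name, type(obj).__name__)
```

Note that `uniq_row` and `dup_cell` are both Series, yet they arise from dropping different dimensions, which is exactly the ambiguity described above.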

I don't think I have the skills to propose a PR, but I would volunteer to fix the broken tests.

Labels: Enhancement; Indexing (related to indexing on series/frames, not to indexes themselves)
