Non-duplicate labels in MultiIndex columns with duplicates not indexed properly when selected #4146

Closed
@hayd

This appears to be a regression since 0.11 in handling duplicates in MultiIndex columns:

In [11]: df
Out[11]:
  h1 main  h3 sub  h5
0  a    A   1  A1   1
1  b    B   2  B1   2
2  c    B   3  A1   3
3  d    A   4  B2   4
4  e    A   5  B2   5
5  f    B   6  A2   6

In [12]: df2 = df.set_index(['main', 'sub']).T.sort_index(1)

In [13]: df2
Out[13]:
main  A        B
sub  A1 B2 B2 A1 A2 B1
h1    a  d  e  c  f  b
h3    1  4  5  3  6  2
h5    1  4  5  3  6  2
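
The session above does not show how `df` was built. A minimal sketch for reproducing it, assuming the column values can be read straight off the printed frame, and using the keyword form `sort_index(axis=1)` (an assumption for newer pandas releases, where the positional `sort_index(1)` used above may not be accepted):

```python
import pandas as pd

# Values copied from the printed frame in the report; the construction
# itself is an assumption, since the original session does not show it.
df = pd.DataFrame({
    "h1": list("abcdef"),
    "main": ["A", "B", "B", "A", "A", "B"],
    "h3": [1, 2, 3, 4, 5, 6],
    "sub": ["A1", "B1", "A1", "B2", "B2", "A2"],
    "h5": [1, 2, 3, 4, 5, 6],
})

# Move main/sub into the index, transpose so they become column levels,
# then sort the columns lexicographically.
df2 = df.set_index(["main", "sub"]).T.sort_index(axis=1)

# The (A, B2) label appears twice, so the column index is not unique.
print(df2.columns.is_unique)
print(df2)
```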

If we select the levels successively, we get an unexpected result for the non-duplicated label:

In [14]: df2['A']
Out[14]:
sub A1 B2 B2
h1   a  d  e
h3   1  4  5
h5   1  4  5

In [15]: df2['A']['B2']
Out[15]:
sub B2 B2
h1   d  e
h3   4  5
h5   4  5

In [16]: df2['A']['A1']  # this worked in 0.11
Out[16]:
   0
0  a
1  1
2  1

In [21]: df2['A']['A1']  # pandas 0.11
Out[21]:
h1    a
h3    1
h5    1
Name: A1, dtype: object
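
The expected 0.11 behaviour can be written down as a small check. This is only a sketch: it assumes a pandas version in which the regression is not present, and it rebuilds `df` from the printed values because the original construction is not shown.

```python
import pandas as pd

# Rebuilt from the printed frame in the report (an assumption).
df = pd.DataFrame({
    "h1": list("abcdef"),
    "main": ["A", "B", "B", "A", "A", "B"],
    "h3": [1, 2, 3, 4, 5, 6],
    "sub": ["A1", "B1", "A1", "B2", "B2", "A2"],
    "h5": [1, 2, 3, 4, 5, 6],
})
df2 = df.set_index(["main", "sub"]).T.sort_index(axis=1)

# 'A1' occurs once under 'A', so the 0.11 result is a Series named 'A1'.
non_dup = df2["A"]["A1"]

# 'B2' occurs twice under 'A', so both matching columns come back as a frame.
dup = df2["A"]["B2"]

print(type(non_dup).__name__, type(dup).__name__)
```

If the first selection prints as a frame with a `0` column header and a `0, 1, 2` index, as in `Out[16]`, the installed version shows the regression described here.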

FWIW I never liked how this can return a different type...
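
One way to sidestep the varying return type is to select with a boolean mask over the column labels, which yields a DataFrame whether the label matches one column or several. The helper `select_as_frame` below is hypothetical (not a pandas API) and is a workaround sketch, not the fix applied for this issue.

```python
import pandas as pd

# Rebuilt from the printed frame in the report (an assumption).
df = pd.DataFrame({
    "h1": list("abcdef"),
    "main": ["A", "B", "B", "A", "A", "B"],
    "h3": [1, 2, 3, 4, 5, 6],
    "sub": ["A1", "B1", "A1", "B2", "B2", "A2"],
    "h5": [1, 2, 3, 4, 5, 6],
})
df2 = df.set_index(["main", "sub"]).T.sort_index(axis=1)


def select_as_frame(frame, label):
    """Hypothetical helper: return every column whose label equals `label`,
    always as a DataFrame, regardless of how many columns match."""
    mask = frame.columns == label  # boolean array, one entry per column
    return frame.loc[:, mask]


unique_pick = select_as_frame(df2["A"], "A1")  # one matching column
dup_pick = select_as_frame(df2["A"], "B2")     # two matching columns

print(type(unique_pick).__name__, unique_pick.shape)
print(type(dup_pick).__name__, dup_pick.shape)
```

The trade-off is that a single match comes back as a one-column frame, so callers who want a Series need an explicit `.iloc[:, 0]`.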

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
