BUG: surprising and possibly erroneous behavior of GroupBy.apply with an indexed series (index winds up duplicated) #35670

Closed
@aecay

Description

  • I have checked that this issue has not already been reported.
    • I can't find anything in the bug tracker that matches the symptoms I'm reporting here, although it's a bit difficult to search (I'm not totally sure how to describe it)
  • I have confirmed this bug exists on the latest version of pandas.

Code Sample, a copy-pastable example

import pandas as pd

# Two groups ("foo", "bar") with five rows each, indexed by (label, x).
data = [{"label": label, "x": x, "y": x + 1} for label in ("foo", "bar") for x in range(5)]
df = pd.DataFrame(data)
df = df.set_index(["label", "x"])
series = df["y"]
# Drop the first two rows of each group.
series2 = series.groupby(["label"]).apply(lambda s: s[2:])
print(series2.index)

# Output:

MultiIndex([('bar', 'bar', 2),
            ('bar', 'bar', 3),
            ('bar', 'bar', 4),
            ('foo', 'foo', 2),
            ('foo', 'foo', 3),
            ('foo', 'foo', 4)],
           names=['label', 'label', 'x'])

Problem description

The "label" level appears twice in the index of the result, both in each index tuple and in names.

Expected Output

I expect the index after the apply to have the same levels as before, i.e. to contain "label" only once.
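
For comparison, here is a minimal sketch of one way to get an index with that shape. It is not part of the original report and makes two assumptions: that passing group_keys=False to groupby is acceptable for the use case, and that slicing each group positionally with .iloc matches the intent of s[2:]. The name trimmed is only illustrative.

```python
import pandas as pd

# Same data as in the report: two groups, five rows each, indexed by (label, x).
data = [{"label": label, "x": x, "y": x + 1} for label in ("foo", "bar") for x in range(5)]
series = pd.DataFrame(data).set_index(["label", "x"])["y"]

# group_keys=False asks pandas not to prepend the group key to the result index,
# so each group's own (label, x) index is kept as-is.
trimmed = series.groupby(level="label", group_keys=False).apply(lambda s: s.iloc[2:])

print(trimmed.index.names)
```

With this call the "label" level should appear only once, since no extra group-key level is added on top of the index each group already carries.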

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.8.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.1
Cython : 0.29.15
pytest : 5.4.0
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
numba : None

Labels: Apply, Bug, Groupby, Needs Triage