Description
- I have checked that this issue has not already been reported.
- I couldn't find anything in the bug tracker matching these symptoms, although it was hard to search for because I'm not sure how best to describe the problem.
- I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
import pandas as pd
data = [{"label": l, "x" : x, "y": x + 1} for l in ("foo", "bar") for x in range(5)]
df = pd.DataFrame(data)
df = df.set_index(["label", "x"])
series = df["y"]
series2 = series.groupby(["label"]).apply(lambda s: s[2:])
print(series2.index)
# Output:
# MultiIndex([('bar', 'bar', 2),
#             ('bar', 'bar', 3),
#             ('bar', 'bar', 4),
#             ('foo', 'foo', 2),
#             ('foo', 'foo', 3),
#             ('foo', 'foo', 4)],
#            names=['label', 'label', 'x'])
Problem description
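The "label" level appears twice in the index of the result: the group key is prepended as a new outer level, even though "label" is already a level of the original index.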
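To show where the extra level comes from, the sketch below rebuilds the same index by hand. It assumes that when the applied function returns something indexed differently from its input (here, a slice of each group), `groupby().apply()` combines the per-group results roughly like `pd.concat(..., keys=<group keys>)`. That is an assumption about pandas' internals, not a statement of its documented implementation. It also uses `s.iloc[2:]` instead of `s[2:]` to make the positional slice explicit.

```python
import pandas as pd

data = [{"label": l, "x": x, "y": x + 1} for l in ("foo", "bar") for x in range(5)]
series = pd.DataFrame(data).set_index(["label", "x"])["y"]

# Split on the "label" level and slice each group, like the lambda in the report.
grouped = series.groupby(level="label")
keys = [key for key, _ in grouped]              # sorted group keys: 'bar', 'foo'
pieces = [group.iloc[2:] for _, group in grouped]

# Concatenate with the group keys as a new outer level named "label".
# Each piece already carries a "label" level, so the name ends up twice.
manual = pd.concat(pieces, keys=keys, names=["label"])

print(list(manual.index.names))  # ['label', 'label', 'x']
```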
Expected Output
I expect the index after the apply to be the same as before, i.e. to contain the "label" level only once (names=['label', 'x']).
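One way to get the expected index with the existing API is sketched below. It relies on the `group_keys` argument of `groupby`, which controls whether `apply` adds the group keys to the result's index. Two alternatives are also shown: dropping the extra outer level afterwards with `droplevel(0)`, and filtering with `cumcount()`, which avoids `apply` entirely. These are workarounds under the stated assumptions, not a fix for the underlying behaviour.

```python
import pandas as pd

data = [{"label": l, "x": x, "y": x + 1} for l in ("foo", "bar") for x in range(5)]
series = pd.DataFrame(data).set_index(["label", "x"])["y"]

# Option 1: tell apply() not to prepend the group key.
result = series.groupby(level="label", group_keys=False).apply(lambda s: s.iloc[2:])

# Option 2: keep the default behaviour and drop the duplicated outer level.
alt = series.groupby(level="label").apply(lambda s: s.iloc[2:]).droplevel(0)

# Option 3: skip apply(); cumcount() numbers rows within each group (0, 1, 2, ...),
# so keeping positions >= 2 drops the first two rows of every group.
# Unlike options 1 and 2, this keeps the original row order (foo before bar).
kept = series[series.groupby(level="label").cumcount() >= 2]

print(list(result.index.names))  # ['label', 'x']
```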
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.8.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.1
Cython : 0.29.15
pytest : 5.4.0
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
numba : None