Using groupby with nlargest drops the groupby column if there is only one element per group #29129

Closed
@Khris777

Apologies if this has already been fixed in 0.25.2; that release is not in the repository yet.

Code Sample

import pandas as pd

# Case 1: every group contains exactly one row
df = pd.DataFrame({'C1':['abc','def'],'C2':[2,1]})
print(df.groupby("C1").C2.nlargest(1).reset_index())

# Case 2: every group contains two rows
df = pd.DataFrame({'C1':['abc','abc','def','def'],'C2':[2,3,2,1]})
print(df.groupby("C1").C2.nlargest(1).reset_index())

Problem description

Result of the first part: the C1 column is gone and there is an index column instead:

   index  C2
0      0   2
1      1   1

Result of the second part: the C1 column is present, along with a level_1 column holding the original row index:

    C1  level_1  C2
0  abc        1   3
1  def        2   2

Expected Output

I expect the first output to have the same format as the second, that is, the groupby should not drop its group-by column when every group has only one element.
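A possible workaround, given only as a minimal sketch: sort by the value column and take the first n rows of each group with `groupby(...).head(n)`, which returns rows of the original frame, so the group column should stay in place regardless of group size. The helper name `top_n_per_group` and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def top_n_per_group(df, group_col, value_col, n=1):
    """Hypothetical helper (not a pandas API): n largest rows per group."""
    # Sort so that the largest values come first; mergesort keeps ties stable.
    ordered = df.sort_values(value_col, ascending=False, kind="mergesort")
    # head(n) selects rows of the original frame, so group_col stays a column.
    top = ordered.groupby(group_col, sort=False).head(n)
    # Order the result by group, then by value, and discard the old row index.
    top = top.sort_values([group_col, value_col], ascending=[True, False])
    return top.reset_index(drop=True)


single = pd.DataFrame({'C1': ['abc', 'def'], 'C2': [2, 1]})
multi = pd.DataFrame({'C1': ['abc', 'abc', 'def', 'def'], 'C2': [2, 3, 2, 1]})

print(top_n_per_group(single, "C1", "C2"))
print(top_n_per_group(multi, "C1", "C2"))
```

Unlike the `nlargest` call in the code sample, this sketch discards the original row index, so the result has no `level_1` column. Use `reset_index()` without `drop=True` if that index is needed.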

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.79-1-MANJARO
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : de_DE.utf8
LOCALE : de_DE.UTF-8

pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.6.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
