Inconsistent handling of index after groupby operation #15272

Closed
@pirsquared

Description

from this SO question

snippet 1


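import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

# returns results identical to df.A (the group key is not added to the index)
print(df.groupby(df.A // 2).A.nsmallest(2))

# returns results out of order (the group key is added as an outer index level)
print(df.groupby(df.A // 2).A.nlargest(2))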

0    0
1    1
2    2
3    3
Name: A, dtype: int64
A   
0  1    1
   0    0
1  3    3
   2    2
Name: A, dtype: int64

snippet 2


import pandas as pd

df = pd.DataFrame(dict(A=[0, 1, 2, 3]))

print(df.groupby(df.A // 2).A.apply(pd.Series.sample, n=2))

Problem description

When a groupby operation returns the same rows, in the same order, as were in each group in the first place, the index is left identical to that of the object being grouped, and the group key is not prepended. This doesn't sound so horrible until you realize that it is inconsistent with very comparable operations. Snippet 1 shows this: nsmallest(2) keeps the original flat index, while nlargest(2) returns a two-level index with the group key as the outer level. Snippet 2 puts a finer point on it: the same code produces randomly different results from run to run.
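For anyone who needs a stable index shape in the meantime, here is a minimal sketch of a workaround. It assumes that building the result by hand with `pd.concat` is acceptable; `per_group` is a hypothetical helper name I made up, not part of pandas. It sidesteps `groupby(...).apply` entirely, so the group key is always prepended regardless of whether the function kept the rows in their original order.

```python
import pandas as pd


def per_group(series, by, func):
    """Apply func to each group and always prepend the group key.

    Hypothetical helper for illustration; not part of pandas.
    """
    pieces = {key: func(group) for key, group in series.groupby(by)}
    # Concatenating a dict uses its keys as the outer index level, whether or
    # not func preserved the order of the rows in each group.
    return pd.concat(pieces, names=[by.name, None])


df = pd.DataFrame(dict(A=[0, 1, 2, 3]))
keys = df.A // 2

smallest = per_group(df.A, keys, lambda s: s.nsmallest(2))
largest = per_group(df.A, keys, lambda s: s.nlargest(2))
sampled = per_group(df.A, keys, lambda s: s.sample(n=2))

print(smallest)
print(largest)
print(sampled.index.nlevels)
```

With this, all three results carry a two-level index (group key, original label), which is the shape asked for under Expected Output. It is slower than the built-in groupby path and is only meant to illustrate the consistent behaviour.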

Expected Output

A   
0  0    0
   1    1
1  2    2
   3    3
Name: A, dtype: int64
A   
0  1    1
   0    0
1  3    3
   2    2
Name: A, dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: 0.2.1
