BUG: groupby.min has a side effect on groupby.apply #34656

Closed
@gshimansky

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame(
    {
        "col1": [0, 1, 2, 3],
        "col4": [17, 13, 16, 15],
        "col5": [-4, -5, -6, -7],
    }
)
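by = ["col4", "col5"]
apply_function = min

gb = df.groupby(by, as_index=True)

# First apply on the freshly created groupby object
df1 = gb.apply(apply_function)
print(df1)

# Built-in groupby reduction, called between the two apply calls
df2 = gb.min()
print(df2)

# Second apply with the same function; its output differs from df1
df3 = gb.apply(apply_function)
print(df3)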

Problem description

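In the code above, the two calls to gb.apply(apply_function) produce different output: df1 and df3 differ. The only operation between them is the call to gb.min(), so groupby.min appears to have a side effect on the groupby object that makes the output of the second apply different and incorrect.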

Expected Output

Both calls to gb.apply(apply_function) should produce the same output, regardless of whether gb.min() is called between them.
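
The expectation can also be checked programmatically instead of by comparing printed output. This is a minimal sketch that reuses the frame from the code sample; the helper name `apply_is_stable` is hypothetical and not part of pandas. It only reports whether the two apply results match, since the outcome depends on the installed pandas version.

```python
import pandas as pd


def apply_is_stable(df, by, func):
    """Return True if gb.apply(func) gives the same result before and after gb.min().

    Hypothetical helper for illustration only; not part of pandas.
    """
    gb = df.groupby(by, as_index=True)
    before = gb.apply(func)
    gb.min()  # the call reported to affect the following apply
    after = gb.apply(func)
    # .equals exists on both DataFrame and Series, whichever apply returns
    return bool(before.equals(after))


df = pd.DataFrame(
    {
        "col1": [0, 1, 2, 3],
        "col4": [17, 13, 16, 15],
        "col5": [-4, -5, -6, -7],
    }
)

stable = apply_is_stable(df, ["col4", "col5"], min)
print("apply output unchanged by gb.min():", stable)
```

On a version affected by this report the check should print False; newer pandas versions may also emit deprecation warnings for passing the builtin min to apply.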

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-26-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.4
numpy : 1.18.4
pytz : 2019.2
dateutil : 2.7.3
pip : 20.1.1
setuptools : 47.1.0
Cython : 0.29.17
pytest : 5.4.2
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.1
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : 0.13.2
pyarrow : 0.16.0
pytables : None
pytest : 5.4.2
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.46.0
