BUG: Unexpected side effects within agg function #44813

Open
@matteosantama

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

index = pd.period_range(start="2021-01-01", end="2021-01-03", freq="T")
n = index.shape[0]

df = pd.DataFrame({"A": range(n), "B": range(n)}, index=index)

>>> df.head()
#                   A  B
# 2021-01-01 00:00  0  0
# 2021-01-01 00:01  1  1
# 2021-01-01 00:02  2  2
# 2021-01-01 00:03  3  3
# 2021-01-01 00:04  4  4


def f(grp: pd.DataFrame) -> int:
    return 2

>>> df.resample("D").agg(lambda x: f(x))
#             A  B
# 2021-01-01  2  2
# 2021-01-02  2  2
# 2021-01-03  2  2


def g(grp: pd.DataFrame) -> int:
    # the addition of this line alters the result
    x = grp["A"].sum()
    return 2

>>> df.resample("D").agg(lambda x: g(x))
# 2021-01-01    2
# 2021-01-02    2
# 2021-01-03    2
# Freq: D, dtype: int64

Issue Description

The code should be fairly self-explanatory: I've defined two functions, f and g. Both take a DataFrame and return the number 2, but g first sums the DataFrame's A column and discards the result. This inexplicably alters the output of df.resample("D").agg. Without the sum, we get back a DataFrame with both A and B columns. With the sum, we get back a Series.
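One way to narrow this down is to record what agg actually hands to the callable. The following is a minimal diagnostic sketch, not a statement about pandas internals: `probe` and `recorder` are hypothetical helper names, and the frame uses a DatetimeIndex built with `pd.date_range` rather than the `period_range` from the report, purely as a simplifying assumption. No expected output is shown because it may differ between pandas versions.

```python
import pandas as pd


def probe(df, rule="D"):
    """Run resample(rule).agg with a callable that logs the type of each input.

    Hypothetical helper for investigation only.
    """
    seen = []

    def recorder(grp):
        # Note whether agg passed a Series (one column) or a DataFrame (whole group).
        seen.append(type(grp).__name__)
        return 2

    result = df.resample(rule).agg(recorder)
    return result, seen


# Assumption: a DatetimeIndex stands in for the report's PeriodIndex.
index = pd.date_range(start="2021-01-01", periods=3 * 24 * 60, freq="min")
df = pd.DataFrame({"A": range(len(index)), "B": range(len(index))}, index=index)

result, seen = probe(df)
print(type(result).__name__, sorted(set(seen)))
```

If the callable turns out to receive individual columns as Series, then grp["A"] inside g would not be a column lookup on those calls, which might be what pushes agg onto a different code path. That is a hypothesis to check, not a confirmed cause.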

Expected Behavior

The output of df.resample("D").agg(lambda x: f(x)) and df.resample("D").agg(lambda x: g(x)) should be exactly the same, since both functions return the same thing.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Darwin
OS-release : 21.1.0
Version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:01 PDT 2021; root:xnu-8019.41.5~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.3.4
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 57.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.5.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Metadata

Assignees

No one assigned

    Labels

    Apply, Bug, Resample
