Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd

index = pd.period_range(start="2021-01-01", end="2021-01-03", freq="T")
n = index.shape[0]
df = pd.DataFrame({"A": range(n), "B": range(n)}, index=index)

print(df.head())
#                   A  B
# 2021-01-01 00:00  0  0
# 2021-01-01 00:01  1  1
# 2021-01-01 00:02  2  2
# 2021-01-01 00:03  3  3
# 2021-01-01 00:04  4  4


def f(grp: pd.DataFrame) -> int:
    return 2


print(df.resample("D").agg(lambda x: f(x)))
# Returns a DataFrame with both columns:
#             A  B
# 2021-01-01  2  2
# 2021-01-02  2  2
# 2021-01-03  2  2


def g(grp: pd.DataFrame) -> int:
    # the addition of this line alters the result
    x = grp["A"].sum()
    return 2


print(df.resample("D").agg(lambda x: g(x)))
# Returns a Series instead:
# 2021-01-01    2
# 2021-01-02    2
# 2021-01-03    2
# Freq: D, dtype: int64
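One way to see why the two calls diverge is to record what kind of object each callable actually receives. The sketch below is a hypothetical instrumentation of f and g (the seen_f and seen_g lists are illustrative names, not part of the original report), and it assumes a minute-frequency DatetimeIndex built with pd.date_range in place of the original period_range, to sidestep deprecation warnings for the "T" alias and for PeriodIndex resampling in newer pandas. If the behaviour matches the report, f should only ever be handed one column at a time as a Series, while g, whose grp["A"] lookup fails on a Series, should also end up being called with whole daily DataFrames.

```python
import pandas as pd

# Assumption: a DatetimeIndex with freq="min" stands in for the original
# period_range(..., freq="T") to avoid deprecation warnings in newer pandas.
index = pd.date_range(start="2021-01-01", end="2021-01-03", freq="min")
n = len(index)
df = pd.DataFrame({"A": range(n), "B": range(n)}, index=index)

seen_f = []  # hypothetical: type names of the objects passed to f
seen_g = []  # hypothetical: type names of the objects passed to g


def f(grp):
    seen_f.append(type(grp).__name__)
    return 2


def g(grp):
    seen_g.append(type(grp).__name__)
    # Raises KeyError when grp is a single column (a Series), not a DataFrame
    x = grp["A"].sum()
    return 2


result_f = df.resample("D").agg(lambda x: f(x))
result_g = df.resample("D").agg(lambda x: g(x))

# Result type next to the set of argument types each function received
print(type(result_f).__name__, sorted(set(seen_f)))
print(type(result_g).__name__, sorted(set(seen_g)))
```

The exact argument types recorded may differ between pandas versions, so this is best treated as a diagnostic rather than a description of documented behaviour.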
Issue Description
The code should be fairly self-explanatory, but the idea is that I've defined two functions, f and g. Both take in a DataFrame and return the number 2, but g first sums the DataFrame's A column and discards the result. This extra step inexplicably changes the output of df.resample("D").agg: when we don't sum the A column, we get back a full DataFrame with both A and B columns; when we do sum the A column, we get back a Series.
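If the goal is simply to call a function once per day on that day's full DataFrame, one possible alternative is to group explicitly with pd.Grouper(freq="D") and use groupby(...).apply, which hands each group to the function as a DataFrame. This is a sketch of a workaround under stated assumptions, not a claim about how resample().agg is meant to behave; it again assumes a DatetimeIndex with freq="min" instead of the original period range.

```python
import pandas as pd

# Assumption: a DatetimeIndex with freq="min" in place of the original period range.
index = pd.date_range(start="2021-01-01", end="2021-01-03", freq="min")
n = len(index)
df = pd.DataFrame({"A": range(n), "B": range(n)}, index=index)


def f(grp: pd.DataFrame) -> int:
    return 2


def g(grp: pd.DataFrame) -> int:
    # Safe here: apply always passes a whole daily DataFrame
    x = grp["A"].sum()
    return 2


# Group by calendar day on the index; apply calls the function once per group
daily = df.groupby(pd.Grouper(freq="D"))
result_f = daily.apply(f)
result_g = daily.apply(g)

print(result_f)
print(result_g)
```

With this approach f and g produce identical Series (one value per day), which is the kind of consistency the report expects from the two resample().agg calls.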
Expected Behavior
The output of df.resample("D").agg(lambda x: f(x)) and df.resample("D").agg(lambda x: g(x)) should be exactly the same, since both functions return the same thing,