Description
I find it difficult to predict what DataFrame.groupby().apply()
will return. The result depends on the type of the returned object and on its index. For example:
import pandas as pd
df = pd.DataFrame({"a":[1, 2, 1, 2], "b":[1, 2, 3, 4], "c":[5, 6, 7, 8]})
When the function returns a DataFrame
that has the same index object as its argument, there are no group keys in the result:
print(df.groupby("a").apply(lambda x: x))
The output is:
a b c
0 1 1 5
1 2 2 6
2 1 3 7
3 2 4 8
If the index is not the same object, there are group keys, even when the index values are the same:
print(df.groupby("a").apply(lambda x: x[:]))
The output is:
     a  b  c
a
1 0  1  1  5
  2  1  3  7
2 1  2  2  6
  3  2  4  8
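The "same index object" condition can be inspected directly with `is`, outside of groupby. This is only a minimal sketch: `shares_index` is a hypothetical helper name, and I am assuming that apply() makes a comparable identity check on each group, which I have not confirmed from the pandas source.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})


def shares_index(func, frame):
    """Return True if func(frame) carries the very same index object as frame."""
    return func(frame).index is frame.index


# Returning the argument unchanged keeps the same index object.
same_identity = shares_index(lambda x: x, df)

# Slicing with [:] produces a new index object...
same_slice = shares_index(lambda x: x[:], df)

# ...even though its values are equal to the original ones.
equal_values = df[:].index.equals(df.index)

print(same_identity, same_slice, equal_values)
```

So two return values can look identical when printed and still differ in whether they share the index object.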
If the function returns Series objects whose index values differ between groups, the index of the result is a MultiIndex:
print(df.groupby("a").apply(lambda x: x.b + x.c))
The output is:
a
1  0     6
   2    10
2  1     8
   3    12
dtype: int64
If all the Series objects have the same index values, each Series becomes one row of the result:
print(df.groupby("a").apply(lambda x: (x.b + x.c).reset_index(drop=True)))
The output is:
    0   1
a
1   6  10
2   8  12
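To see why the two Series cases differ, it helps to look at the index labels that each group's Series carries before and after reset_index(drop=True). A small sketch that iterates over the groups by hand instead of going through apply(); the dictionary names are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

# Index labels of each per-group Series, keyed by the group key.
original_labels = {}
reset_labels = {}

for key, group in df.groupby("a"):
    s = group.b + group.c
    # Labels inherited from the original frame: they differ per group.
    original_labels[key] = s.index.tolist()
    # After reset_index(drop=True) every group is labelled 0, 1, ...
    reset_labels[key] = s.reset_index(drop=True).index.tolist()

print(original_labels)
print(reset_labels)
```

With the original labels the groups do not agree on an index, which matches the MultiIndex case; after the reset every group has the same labels, which matches the case where each Series becomes a row.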
Here are more examples.
Because the index is the same object:
print(df.groupby("a").apply(lambda x: (x.b + x.c).to_frame()))
there are no group keys in the output:
0
0 6
1 8
2 10
3 12
If we slice the return value with [:], its index is no longer the same object:
print(df.groupby("a").apply(lambda x: (x.b + x.c).to_frame()[:]))
and the output contains group keys:
      0
a
1 0   6
  2  10
2 1   8
  3  12
print(df.groupby("a").apply(lambda x: x[["b", "c"]]))
There are no group keys because the index object is the same (using x[:]
instead would add the group keys):
b c
0 1 5
1 2 6
2 3 7
3 4 8
This behavior does not appear to be documented.
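For comparison, here is one way to get a result whose shape does not depend on index identity: build it by hand with pd.concat, which always prepends the dictionary keys as an outer index level. This is only a sketch of a possible workaround, not a description of what apply() does internally; `apply_with_keys` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})


def apply_with_keys(frame, by, func):
    """Apply func to each group and always prepend the group key as an index level."""
    pieces = {key: func(group) for key, group in frame.groupby(by)}
    # Concatenating a dict uses its keys as the outer level of the result index.
    return pd.concat(pieces, names=[by])


result = apply_with_keys(df, "a", lambda x: x.b + x.c)
print(result)

# The identity function also gets the group keys, regardless of index identity.
identity_result = apply_with_keys(df, "a", lambda x: x)
print(identity_result.index.nlevels)
```

The point of the sketch is only that the group keys appear in both cases, so the output shape can be predicted from the code alone.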