It's difficult to predict what DataFrame.groupby().apply() will return: #9867

Closed
@ruoyu0088


I found it difficult to predict what DataFrame.groupby().apply() will return. The result depends on the type of the returned object and on the index of the returned object. For example:

import pandas as pd
df = pd.DataFrame({"a":[1, 2, 1, 2], "b":[1, 2, 3, 4], "c":[5, 6, 7, 8]})

When the return object is a DataFrame and has the same index object as the argument, there are no group keys in the result:

print(df.groupby("a").apply(lambda x: x))

the output is:

   a  b  c
0  1  1  5
1  2  2  6
2  1  3  7
3  2  4  8

If the index is not the same object, there are group keys, even when the index values are the same:

print(df.groupby("a").apply(lambda x: x[:]))

the output is:

     a  b  c
a           
1 0  1  1  5
  2  1  3  7
2 1  2  2  6
  3  2  4  8
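
The difference between "the same index object" and "the same index values" can be checked without groupby. A minimal sketch, with illustrative variable names; it only shows that slicing an index with [:] yields a new object holding equal labels, and does not verify how apply() uses that distinction internally:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

idx = df.index
sliced = idx[:]  # full slice: a new Index object with the same labels

same_object = sliced is idx        # identity: the very same Python object?
same_values = sliced.equals(idx)   # equality: the same labels in the same order?

print(same_object, same_values)
```

The two checks disagree here: the sliced index is equal to the original but is not the same object.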

If the function returns Series objects and their indexes do not have the same values, the index of the result is a MultiIndex:

print(df.groupby("a").apply(lambda x: x.b + x.c))

the output:

a   
1  0     6
   2    10
2  1     8
   3    12
dtype: int64

If all the Series objects have the same index values, each Series becomes a row of the result:

print(df.groupby("a").apply(lambda x: (x.b + x.c).reset_index(drop=True)))

the output:

   0   1
a       
1  6  10
2  8  12
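
To see which of the two Series cases applies, the index values of each group's Series can be inspected by iterating over the groups. This is a sketch for inspection only, not a description of how apply() combines the results; the dictionary names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 2], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

original_index = {}  # group key -> index labels of x.b + x.c
reset_index = {}     # group key -> index labels after reset_index(drop=True)
for key, group in df.groupby("a"):
    s = group.b + group.c
    original_index[int(key)] = s.index.tolist()
    reset_index[int(key)] = s.reset_index(drop=True).index.tolist()

print(original_index)
print(reset_index)
```

The original indexes differ between groups ([0, 2] and [1, 3]), while after reset_index(drop=True) both groups have the index [0, 1].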

Here are more examples:

Because the index is the same object:

print(df.groupby("a").apply(lambda x: (x.b + x.c).to_frame()))

there are no group keys in the output:

    0
0   6
1   8
2  10
3  12

If we copy the return value with [:], the index is not the same object:

print(df.groupby("a").apply(lambda x: (x.b + x.c).to_frame()[:]))

the output contains group keys:

      0
a      
1 0   6
  2  10
2 1   8
  3  12

print(df.groupby("a").apply(lambda x: x[["b", "c"]]))

there are no group keys because the index object is the same (but using x[:] does produce the group keys):

   b  c
0  1  5
1  2  6
2  3  7
3  4  8

It seems that this behavior is not documented.
