groupby filter can't access all columns #6512

Closed
@hayd

Unexpectedly(?), the function passed to filter doesn't have access to the grouped-by columns.

It also seems to allow returning a boolean Series (and takes the first item as the condition).

In [11]: df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

In [12]: g = df.groupby('A')  # same with as_index=False (which *correctly* has no effect)

In [13]: g.filter(lambda x: x['A'].sum() == 2)
KeyError: u'no item named A'

In [14]: g.filter(lambda x: x['B'].sum() == 5)  # works
Out[14]:
   A  B
0  1  2
1  1  3

In [15]: g.filter(lambda x: x.sum() == 5)  # weird that this works (expected it to raise)
Out[15]:
   A  B
0  1  2
1  1  3

In [16]: g = df.groupby(df['A'])  # hack/workaround

In [16]: g.filter(lambda x: x.sum() == 5)  # seems to look at first col
Out[16]:
   A  B
2  5  6

In [17]: g.filter(lambda x: x['A'].sum() == 5)  # works
Out[17]:
   A  B
2  5  6

In [18]: g.filter(lambda x: x['B'].sum() == 5)  # works
Out[18]:
   A  B
0  1  2
1  1  3
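One possible way to sidestep both problems, sketched here as an assumption rather than a proposed fix: build the row mask with transform on the grouping column itself, and reduce any condition passed to filter to a single bool so nothing depends on how a boolean Series is interpreted. The names a_sum, by_a and by_b are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

# Per-row sum of 'A' within each 'A' group, aligned to df's index.
# This does not rely on the filter function seeing column 'A'.
a_sum = df.groupby('A')['A'].transform('sum')

# Keep the rows whose group satisfies the condition on 'A'.
by_a = df[a_sum == 2]

# Condition on a non-grouping column, explicitly reduced to one bool per group.
by_b = df.groupby('A').filter(lambda x: bool(x['B'].sum() == 5))
```

Both by_a and by_b should contain the two rows of the A == 1 group, matching Out[14] above.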

cc @danielballan
