BUG: groupby sub-selection ignored with some methods #5264

Closed
@TomAugspurger

related #6512, #6524, #6346

Update from @hayd: I think these should reference _selected_obj rather than obj.

Looking through some others, it looks like these also ignore the selection.

Aggregation functions (these already respect the selection to some degree, but they allow bad selections, i.e. column names that are not in columns; maybe a separate issue?):

  • sum/max/min/median/mean/var/std/.. (not tested)
  • agg (not tested)
    (these "work" with the described bug)

At the moment, selecting a column that is not in df doesn't raise.
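
A quick way to check the bad-selection behaviour on a given install is sketched below. The frame and the column name not_a_column are made up for illustration, and the snippet deliberately records whichever outcome occurs rather than assuming one, since the result depends on the pandas version.

```python
import pandas as pd

# Toy frame; "not_a_column" is a hypothetical name that is absent from df.
df = pd.DataFrame({"key": [0, 0, 1], "a": [1.0, 2.0, 3.0]})

try:
    # Select a column that does not exist, then aggregate.
    df.groupby("key")["not_a_column"].sum()
    outcome = "no error"
except KeyError:
    outcome = "KeyError"

# Report which behaviour this pandas version shows.
print(pd.__version__, outcome)
```

If this prints "no error", the selection was accepted even though the column is missing, which is the behaviour described here.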

What about iloc/loc/ix (currently all disabled)?

  • iloc (very similar to head/tail)
  • loc/ix (maybe push off for now, this is pretty tricky)
  • iterate over all (whitelisted) functions to check they adhere to this
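
The last item could be sketched roughly as follows. METHODS_TO_CHECK is a hand-picked, hypothetical stand-in for the whitelist (not the actual list kept inside pandas), and the check is just that no column outside the selection shows up in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the whitelist; not the real pandas internal list.
METHODS_TO_CHECK = [
    "sum", "mean", "median", "min", "max",
    "var", "std", "quantile", "head", "tail",
]

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": [1, 1, 1, 0, 0, 0],
    "a": rng.standard_normal(6),
    "b": rng.standard_normal(6),
    "c": rng.standard_normal(6),
})

selection = ["a", "b"]
grouped = df.groupby("key")[selection]

# Map method name -> columns that leaked in from outside the selection.
violations = {}
for name in METHODS_TO_CHECK:
    result = getattr(grouped, name)()
    extra = [col for col in result.columns if col not in selection]
    if extra:
        violations[name] = extra

print(violations)
```

An empty dict means every method in the list respected the selection; any entry names a method that ignored it, along with the extra columns it returned.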

The column selection on a groupby object is being ignored when .quantile() is called. So it computes the quantile on all the (numeric) columns and returns the full DataFrame.

In [92]: t = pd.DataFrame(np.random.randn(10, 4)); t[0] = np.hstack([np.ones(5), np.zeros(5)])

In [95]: t.groupby(0)[[1, 2]].quantile()  # shows other cols
Out[95]: 
   0         1         2         3
0                                 
0  0  0.127152  0.108908  0.369601
1  1 -0.321279  0.265550 -0.382398

In [96]: t[[1, 2]].groupby(t[0]).quantile()  # Should be equivalent to:
Out[96]: 
          1         2
0                    
0  0.127152  0.108908
1 -0.321279  0.265550
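
The expected equivalence can be written as a small self-contained check. This is only a sketch: it uses a seeded generator instead of np.random.randn so it is reproducible, and the names selected and expected are mine. On a version with the bug, the comparison fails because the first result carries extra columns.

```python
import numpy as np
import pandas as pd

# Same shape as the example above, but seeded for reproducibility.
rng = np.random.default_rng(0)
t = pd.DataFrame(rng.standard_normal((10, 4)))
t[0] = np.hstack([np.ones(5), np.zeros(5)])

# Sub-selection on the groupby object.
selected = t.groupby(0)[[1, 2]].quantile()

# Sub-selection on the frame first, grouping by the same key.
expected = t[[1, 2]].groupby(t[0]).quantile()

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(selected, expected)
print(list(selected.columns))
```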

Seeing all these, I'm wondering if this is a bug or just how some of the methods are implemented. The docs don't mention anything about only supporting some methods, though.

version: '0.12.0-883-g988d4be'
