Closed
Update from @hayd: I think these should reference _selected_obj rather than obj.
Looking through some others, it looks like these also ignore the selection:
- count FIX use selected_obj rather the obj throughout groupby #6570
- corr FIX use selected_obj rather the obj throughout groupby #6570
- cummax FIX use selected_obj rather the obj throughout groupby #6570
- cummin FIX use selected_obj rather the obj throughout groupby #6570
- cumsum FIX use selected_obj rather the obj throughout groupby #6570
- cumprod FIX use selected_obj rather the obj throughout groupby #6570
- describe FIX use selected_obj rather the obj throughout groupby #6570
- fillna FIX use selected_obj rather the obj throughout groupby #6570
- quantile FIX use selected_obj rather the obj throughout groupby #6570
- head API change in groupby head and tail #6533
- hist? the output is ok but the plots have all
- ohlc? possibly fixed with FIX use selected_obj rather the obj throughout groupby #6570 (resample with ohlc is tested), should this method exist? see ohlc not available for groupby/etc #6594
- plot
- rank FIX use selected_obj rather the obj throughout groupby #6570
- tail API change in groupby head and tail #6533
- filter, FIX use selected_obj rather the obj throughout groupby #6570 (tested in FIX filter selects selected columns #6593)
- resample, FIX use selected_obj rather the obj throughout groupby #6570
- nth ENH/BUG groupby nth now filters, works with DataFrames #6569
- diff/shift, FIX use selected_obj rather the obj throughout groupby #6570
- all/any FIX use selected_obj rather the obj throughout groupby #6570
- ffill, FIX use selected_obj rather the obj throughout groupby #6570
- pct_change FIX use selected_obj rather the obj throughout groupby #6570
- idxmin/idxmax, FIX use selected_obj rather the obj throughout groupby #6570
- dtypes FIX use selected_obj rather the obj throughout groupby #6570
- apply FIX use selected_obj rather the obj throughout groupby #6570 (could be tested more / different paths?)
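To make the obj versus _selected_obj distinction concrete, here is a minimal sketch. ToyGroupBy, buggy_count and fixed_count are hypothetical names invented for illustration; this is not how pandas implements its groupby classes, only a toy model of the idea that methods should operate on the selected columns rather than the full frame.

```python
import pandas as pd


class ToyGroupBy:
    """Hypothetical stand-in for a groupby object; not pandas' real class."""

    def __init__(self, obj, key, selection=None):
        self.obj = obj                # the full, original frame
        self.key = key                # name of the column to group on
        self._selection = selection   # columns picked via [...], or None

    def __getitem__(self, columns):
        # Selecting columns returns a new object that remembers the selection.
        return ToyGroupBy(self.obj, self.key, selection=list(columns))

    @property
    def _selected_obj(self):
        # Fall back to the full frame when nothing has been selected.
        if self._selection is None:
            return self.obj
        return self.obj[self._selection]

    def buggy_count(self):
        # Works on obj, so the column selection is silently ignored.
        return self.obj.groupby(self.key).count()

    def fixed_count(self):
        # Works on _selected_obj, so only the selected columns come back.
        return self._selected_obj.groupby(self.obj[self.key]).count()


df = pd.DataFrame({"k": [1, 1, 2], "a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
g = ToyGroupBy(df, "k")[["a"]]
buggy_cols = list(g.buggy_count().columns)  # includes the unselected column "b"
fixed_cols = list(g.fixed_count().columns)  # only the selected column "a"
```

In this toy model the only difference between the two methods is which frame they read from, which is the kind of change the checklist above tracks per method.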
Aggregation functions (they already mostly respect the selection, but they allow bad selections, i.e. column names not in columns; maybe a separate issue?):
- sum/max/min/median/mean/var/std/.. (not tested)
- agg (not tested)
(these "work" with the described bug)
At the moment, selecting a column that is not in the DataFrame doesn't raise:
- it should raise a KeyError, FIX raise when groupby selecting cols not in frame #6578
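A small check of the expected behaviour might look like the sketch below. It assumes a pandas version that already includes the fix (on the version this issue was reported against, the bad selection would not raise); selection_raises and the column names are hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, 1, 2], "a": [1.0, 2.0, 3.0]})


def selection_raises(frame, columns):
    """Return True if selecting `columns` on a groupby raises KeyError."""
    try:
        frame.groupby("key")[columns]
    except KeyError:
        return True
    return False


# A column that is not in the frame should be rejected at selection time.
missing_raises = selection_raises(df, ["not_a_column"])
# A column that is in the frame should be accepted.
present_raises = selection_raises(df, ["a"])
```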
What about iloc/loc/ix (currently all disabled)?
- iloc (very similar to head/tail)
- loc/ix (maybe push off for now, this is pretty tricky)
- iterate over all (whitelisted) functions to check they adhere to this
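One way to sketch the "iterate over all functions" check: for each method name, compare the result of selecting columns on the groupby with the result of selecting the columns first and then grouping. The list of methods below is an illustrative subset picked for this example, not the actual whitelist, and the frame and column names are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(10, 3), columns=["a", "b", "c"])
df["key"] = [0] * 5 + [1] * 5

# Illustrative subset only; the real whitelist is longer.
methods = ["count", "cumsum", "cummax", "cummin", "cumprod",
           "rank", "quantile", "sum", "mean"]

selected = ["a", "b"]
mismatches = []
for name in methods:
    # Selection applied on the groupby object.
    via_selection = getattr(df.groupby("key")[selected], name)()
    # Selection applied to the frame before grouping.
    via_subframe = getattr(df[selected].groupby(df["key"]), name)()
    if list(via_selection.columns) != selected:
        mismatches.append(name)
        continue
    try:
        pd.testing.assert_frame_equal(via_selection, via_subframe)
    except AssertionError:
        mismatches.append(name)

# Any name left in `mismatches` is a method that ignores the selection.
```

On a pandas version where the selection is honoured, mismatches should end up empty; on a version with the bug described here, methods such as quantile would be expected to show up in it.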
The column selection on a groupby object is being ignored when .quantile() is called. So it computes the quantile on all the (numeric) columns and returns the full DataFrame.
In [92]: t = pd.DataFrame(np.random.randn(10, 4)); t[0] = np.hstack([np.ones(5), np.zeros(5)])
In [95]: t.groupby(0)[[1, 2]].quantile() # shows other cols
Out[95]:
   0         1         2         3
0
0  0  0.127152  0.108908  0.369601
1  1 -0.321279  0.265550 -0.382398
In [96]: t[[1, 2]].groupby(t[0]).quantile() # Should be equivalent to:
Out[96]:
          1         2
0
0  0.127152  0.108908
1 -0.321279  0.265550
Seeing all these, I'm wondering if this is a bug or just how some of the methods are implemented. The docs don't mention anything about only supporting some methods, though.
version: '0.12.0-883-g988d4be'