Description
Is your feature request related to a problem?
Pandas provides DataFrameGroupBy.head() and tail(), which efficiently slice the beginning and end of each group while preserving the order and index. I would like to be able to do a general row slice with the same properties. DataFrame has head(), tail() and iloc that behave in a compatible way. There is no corresponding DataFrameGroupBy.iloc.
Describe the solution you'd like
Provide a new DataFrameGroupBy method to slice rows per group
API breaking implications
None
Describe alternatives you've considered
The following are existing ways to extract, say, the second and third entry of each group, assuming that there are a large number of rows in each group (~10000):
- grouped.apply(lambda x: x.iloc[1:3, :]) - Extremely slow. Does not preserve the order or indexing.
- grouped.take([1, 2]) - Extremely slow. Does not preserve the order or indexing.
- grouped.nth([1, 2]) - Quite fast for a small list. Does not preserve the order or indexing.
- grouped.head(3).groupby('...').tail(2) - Quite fast. Does preserve index and ordering.
- grouped._selected_obj[mask] where mask is built from grouped.cumcount() - Very fast. Does preserve index and ordering. But it relies on a private attribute of DataFrameGroupBy and takes several lines of code.
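For illustration, here is a minimal sketch of the last two alternatives on a toy frame. The column names `key` and `val` and the data are made up for the example, and the mask is applied to the original `df` rather than to `grouped._selected_obj`, which is only possible here because the ungrouped frame is still at hand.

```python
import pandas as pd

# Toy data; the column names and values are assumptions for this example.
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b", "a"], "val": range(7)},
    index=[10, 11, 12, 13, 14, 15, 16],
)
grouped = df.groupby("key")

# Chain head() and tail(): keep the first 3 rows of each group,
# then the last 2 of those, i.e. positions 1 and 2.
via_head_tail = grouped.head(3).groupby("key").tail(2)

# Build a boolean mask from cumcount(), which numbers the rows
# of each group 0, 1, 2, ... in their original order.
pos = grouped.cumcount()
mask = (pos >= 1) & (pos < 3)
via_mask = df[mask]

print(via_mask)
```

Both results keep the original index labels and row order. Note that the head/tail chain only matches a true `1:3` slice when every group has at least three rows, which holds under the large-group assumption above.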
Additional context
There are three options:
- Add an option to an existing method to force it to preserve index and order. But take() is very slow and nth() is quite slow. Neither accepts a slice argument, so a range list has to be provided.
- Easiest: Add a new method taking a slice as an argument and implementing it as in the fifth alternative above (the mask built from cumcount()).
- Most logical and complete: Add a new iloc attribute analogous to DataFrame.iloc
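As a rough sketch of what the slice-taking option could look like using only public API: the function name `group_row_slice` and its signature are hypothetical, not an existing or proposed pandas method, and only unit-step slices are handled.

```python
import pandas as pd


def group_row_slice(df, by, rows):
    """Hypothetical helper: apply the slice `rows` to each group of `df`,
    keeping the original index and row order."""
    if rows.step not in (None, 1):
        raise NotImplementedError("only unit steps are handled in this sketch")

    grouped = df.groupby(by)
    # Position of each row counted from the start of its group: 0, 1, 2, ...
    from_start = grouped.cumcount()
    # Distance from the end of its group: the last row is 1, matching index -1.
    from_end = grouped.cumcount(ascending=False) + 1

    mask = pd.Series(True, index=df.index)
    if rows.start is not None:
        if rows.start >= 0:
            mask = mask & (from_start >= rows.start)
        else:
            mask = mask & (from_end <= -rows.start)
    if rows.stop is not None:
        if rows.stop >= 0:
            mask = mask & (from_start < rows.stop)
        else:
            mask = mask & (from_end > -rows.stop)
    return df[mask]


# Toy data; the column names and values are assumptions for this example.
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b", "a"], "val": range(7)},
    index=[10, 11, 12, 13, 14, 15, 16],
)

second_and_third = group_row_slice(df, "key", slice(1, 3))
last_two = group_row_slice(df, "key", slice(-2, None))
print(second_and_third)
```

With this behaviour, `slice(None, n)` and `slice(-n, None)` reproduce `head(n)` and `tail(n)` for positive `n`, which is the compatibility the request asks for.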