From bcdbf7323f66243cae655dd6995dd4fd14eac8b4 Mon Sep 17 00:00:00 2001 From: Richard Shadrach Date: Fri, 24 Feb 2023 18:10:48 -0500 Subject: [PATCH 1/5] DOC: Overhaul groupby.rst in the User Guide --- doc/source/user_guide/groupby.rst | 572 ++++++++++++++++++------------ 1 file changed, 345 insertions(+), 227 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 2fdd36d861e15..d7e37a30e1cc8 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -36,9 +36,22 @@ following: * Discard data that belongs to groups with only a few members. * Filter out data based on the group sum or mean. -* Some combination of the above: GroupBy will examine the results of the apply - step and try to return a sensibly combined result if it doesn't fit into - either of the above two categories. +Many of these operations are defined on GroupBy objects. These operations are similar +to the :ref:`aggregating API `, :ref:`window API `, +and :ref:`resample API `. + +It is possible that a given operation does not fall into one of these categories or +is some combination of them. In such a case, it may be possible to compute the +operation using GroupBy's ``apply`` method. This method will examine the results of the +apply step and try to return a sensibly combined result if it doesn't fit into either +of the above two categories. + +.. note:: + + An operation that is split into multiple steps using built-in GroupBy operations + will be more efficient than using the ``apply`` method with a user-defined Python + function. + Since the set of object instance methods on pandas data structures are generally rich and expressive, we often simply want to invoke, say, a DataFrame function @@ -68,7 +81,7 @@ object (more on what the GroupBy object is later), you may do the following: .. 
ipython:: python - df = pd.DataFrame( + speeds = pd.DataFrame( [ ("bird", "Falconiformes", 389.0), ("bird", "Psittaciformes", 24.0), @@ -79,12 +92,12 @@ object (more on what the GroupBy object is later), you may do the following: index=["falcon", "parrot", "lion", "monkey", "leopard"], columns=("class", "order", "max_speed"), ) - df + speeds # default is axis=0 - grouped = df.groupby("class") - grouped = df.groupby("order", axis="columns") - grouped = df.groupby(["class", "order"]) + grouped = speeds.groupby("class") + grouped = speeds.groupby("order", axis="columns") + grouped = speeds.groupby(["class", "order"]) The mapping can be specified many different ways: @@ -465,41 +478,71 @@ Or for an object grouped on multiple columns: Aggregation ----------- -Once the GroupBy object has been created, several methods are available to -perform a computation on the grouped data. These operations are similar to the -:ref:`aggregating API `, :ref:`window API `, -and :ref:`resample API `. - -An obvious one is aggregation via the -:meth:`~pandas.core.groupby.DataFrameGroupBy.aggregate` or equivalently -:meth:`~pandas.core.groupby.DataFrameGroupBy.agg` method: +An aggregation is a GroupBy operation that reduces the dimension of the grouping +object. The result of an aggregation is, or is at least treated as, +a scalar value for each column in a group. For example, producing the sum of each +column in each group of values. .. ipython:: python - grouped = df.groupby("A") - grouped[["C", "D"]].aggregate(np.sum) - - grouped = df.groupby(["A", "B"]) - grouped.aggregate(np.sum) + animals = pd.DataFrame( + { + "kind": ["cat", "dog", "cat", "dog"], + "height": [9.1, 6.0, 9.5, 34.0], + "weight": [7.9, 7.5, 9.9, 198.0], + } + ) + animals + animals.groupby("kind").sum() -As you can see, the result of the aggregation will have the group names as the -new index along the grouped axis.
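To make the reduction concrete, a minimal sketch (the frame below mirrors the ``animals`` example from this guide; it is illustrative, not part of the patch):

```python
import pandas as pd

# A small frame mirroring the "animals" example used in this guide
animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

# An aggregation collapses each group to one row; the group labels
# become the index of the result by default.
agg = animals.groupby("kind").sum()
print(agg.index.tolist())  # the group names: ['cat', 'dog']

# With as_index=False the keys stay as an ordinary column instead.
flat = animals.groupby("kind", as_index=False).sum()
print("kind" in flat.columns)  # True
```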
In the case of multiple keys, the result is a -:ref:`MultiIndex ` by default, though this can be -changed by using the ``as_index`` option: +In the result, the keys of the groups appear in the index by default. They can be +instead included in the columns by passing ``as_index=False``. .. ipython:: python - grouped = df.groupby(["A", "B"], as_index=False) - grouped.aggregate(np.sum) + animals.groupby("kind", as_index=False).sum() - df.groupby("A", as_index=False)[["C", "D"]].sum() +.. _groupby.aggregate.builtin: -Note that you could use the ``reset_index`` DataFrame function to achieve the -same result as the column names are stored in the resulting ``MultiIndex``: +Built-in aggregation methods +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. ipython:: python +Many common aggregations are built-in to GroupBy objects as methods. Of the methods +listed below, those with a ``*`` do _not_ have a Cython-optimized implementation. + +.. csv-table:: + :header: "Method", "Description" + :widths: 20, 80 + :delim: ; - df.groupby(["A", "B"]).sum().reset_index() + :meth:`~.DataFrameGroupBy.any`;Compute whether any of the values in the groups are truthy + :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy + :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups + :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups + :meth:`~.DataFrameGroupBy.first` *;Compute the first occurring value in each group + :meth:`~.DataFrameGroupBy.idxmax` *;Compute the index of the maximum value in each group + :meth:`~.DataFrameGroupBy.idxmin` *;Compute the index of the minimum value in each group + :meth:`~.DataFrameGroupBy.last` *;Compute the last occurring value in each group + :meth:`~.DataFrameGroupBy.max` *;Compute the maximum value in each group + :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group + :meth:`~.DataFrameGroupBy.median`;Compute the median of each group + :meth:`~.DataFrameGroupBy.min` *;Compute the 
minimum value in each group + :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group + :meth:`~.DataFrameGroupBy.prod` *;Compute the product of the values in each group + :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group + :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group + :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group + :meth:`~.DataFrameGroupBy.skew` *;Compute the skew of the values in each group + :meth:`~.DataFrameGroupBy.std`;Compute the standard deviation of the values in each group + :meth:`~.DataFrameGroupBy.sum`;Compute the sum of the values in each group + :meth:`~.DataFrameGroupBy.var`;Compute the variance of the values in each group + +Some examples: + +.. ipython:: python + + df.groupby("A")[["C", "D"]].max() + df.groupby(["A", "B"]).mean() Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the ``size`` method. It returns a Series whose @@ -507,6 +550,7 @@ index are the group names and whose values are the sizes of each group. .. ipython:: python + grouped = df.groupby(["A", "B"]) grouped.size() .. ipython:: python @@ -531,34 +575,76 @@ Another aggregation example is to compute the number of unique values of each gr Passing ``as_index=False`` **will** return the groups that you are aggregating over, if they are named *columns*. -Aggregating functions are the ones that reduce the dimension of the returned objects. -Some common aggregating functions are tabulated below: -.. csv-table:: - :header: "Function", "Description" - :widths: 20, 80 - :delim: ; +.. _groupby.aggregate.agg: + +The :meth:`~.DataFrameGroupBy.aggregate` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The :meth:`~.DataFrameGroupBy.aggregate` method can accept many different types of +inputs. 
This section details using string aliases for various GroupBy methods; other +inputs are detailed in the sections below. + +.. ipython:: python + + grouped = df.groupby("A") + grouped[["C", "D"]].aggregate("sum") + + grouped = df.groupby(["A", "B"]) + grouped.agg("sum") + +As you can see, the result of the aggregation will have the group names as the +new index along the grouped axis. In the case of multiple keys, the result is a +:ref:`MultiIndex ` by default. As mentioned above, this can be +changed by using the ``as_index`` option: + +.. ipython:: python + + grouped = df.groupby(["A", "B"], as_index=False) + grouped.aggregate("sum") + + df.groupby("A", as_index=False)[["C", "D"]].sum() + +Note that you could use the ``reset_index`` DataFrame function to achieve the +same result as the column names are stored in the resulting ``MultiIndex``: - :meth:`~pd.core.groupby.DataFrameGroupBy.mean`;Compute mean of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.sum`;Compute sum of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.size`;Compute group sizes - :meth:`~pd.core.groupby.DataFrameGroupBy.count`;Compute count of group - :meth:`~pd.core.groupby.DataFrameGroupBy.std`;Standard deviation of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.var`;Compute variance of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.sem`;Standard error of the mean of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.describe`;Generates descriptive statistics - :meth:`~pd.core.groupby.DataFrameGroupBy.first`;Compute first of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.last`;Compute last of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.nth`;Take nth value, or a subset if n is a list - :meth:`~pd.core.groupby.DataFrameGroupBy.min`;Compute min of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.max`;Compute max of group values +.. ipython:: python + df.groupby(["A", "B"]).agg("sum").reset_index() The aggregating functions above will exclude NA values. 
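A short sketch of the NA handling described here (illustrative data, not from the document): built-in aggregations skip missing values within each group, while ``size`` still counts every row.

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, np.nan, 3.0]})

# Built-in aggregations exclude NA values within each group ...
sums = df_na.groupby("key")["value"].sum()
counts = df_na.groupby("key")["value"].count()

# ... so group "a" sums to 1.0 and counts one non-NA value,
# while size() still reports both rows of group "a".
print(sums["a"], counts["a"])           # 1.0 1
print(df_na.groupby("key").size()["a"])  # 2
```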
Any function which reduces a :class:`Series` to a scalar value is an aggregation function and will work, -a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. Note that -:meth:`~pd.core.groupby.DataFrameGroupBy.nth` can act as a reducer *or* a -filter, see :ref:`here `. +a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. + +.. _groupby.aggregate.udf: + +Aggregation with User-Defined Functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Users can also provide their own User-Defined Functions (UDFs) for custom aggregations. + +.. warning:: + + When aggregating with a UDF, the UDF should not mutate the + provided ``Series``, see :ref:`gotchas.udf-mutation` for more information. + +.. note:: + + Aggregating with a UDF is often less performant than using + the pandas built-in methods on GroupBy. Consider breaking up a complex operation + into a chain of operations that utilize the built-in methods. + +.. ipython:: python + + animals + animals.groupby("kind")[["height"]].agg(lambda x: set(x)) + +The resulting dtype will reflect that of the aggregating function. If the results from different groups have +different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction. + +.. ipython:: python + + animals.groupby("kind")[["height"]].agg(lambda x: x.astype(int).sum()) .. _groupby.aggregate.multifunc: @@ -571,14 +657,14 @@ aggregation with, outputting a DataFrame: .. ipython:: python grouped = df.groupby("A") - grouped["C"].agg([np.sum, np.mean, np.std]) + grouped["C"].agg(["sum", "mean", "std"]) On a grouped ``DataFrame``, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index: .. ipython:: python - grouped[["C", "D"]].agg([np.sum, np.mean, np.std]) + grouped[["C", "D"]].agg(["sum", "mean", "std"]) The resulting aggregations are named for the functions themselves. 
If you @@ -588,7 +674,7 @@ need to rename, then you can add in a chained operation for a ``Series`` like th ( grouped["C"] - .agg([np.sum, np.mean, np.std]) + .agg(["sum", "mean", "std"]) .rename(columns={"sum": "foo", "mean": "bar", "std": "baz"}) ) @@ -597,24 +683,23 @@ For a grouped ``DataFrame``, you can rename in a similar manner: .. ipython:: python ( - grouped[["C", "D"]].agg([np.sum, np.mean, np.std]).rename( + grouped[["C", "D"]].agg(["sum", "mean", "std"]).rename( columns={"sum": "foo", "mean": "bar", "std": "baz"} ) ) .. note:: - In general, the output column names should be unique. You can't apply - the same function (or two functions with the same name) to the same + In general, the output column names should be unique, but pandas will allow + you to apply the same function (or two functions with the same name) to the same column. .. ipython:: python - :okexcept: grouped["C"].agg(["sum", "sum"]) - pandas *does* allow you to provide multiple lambdas. In this case, pandas + pandas also allows you to provide multiple lambdas. In this case, pandas will mangle the name of the (nameless) lambda functions, appending ``_`` to each subsequent lambda. @@ -623,14 +708,13 @@ For a grouped ``DataFrame``, you can rename in a similar manner: grouped["C"].agg([lambda x: x.max() - x.min(), lambda x: x.median() - x.mean()]) - .. _groupby.aggregate.named: Named aggregation ~~~~~~~~~~~~~~~~~ To support column-specific aggregation *with control over the output column names*, pandas -accepts the special syntax in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg`, known as "named aggregation", where +accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg`, known as "named aggregation", where - The keywords are the *output* column names - The values are tuples whose first element is the column to select @@ -641,19 +725,12 @@ accepts the special syntax in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGrou ..
ipython:: python - animals = pd.DataFrame( - { - "kind": ["cat", "dog", "cat", "dog"], - "height": [9.1, 6.0, 9.5, 34.0], - "weight": [7.9, 7.5, 9.9, 198.0], - } - ) animals animals.groupby("kind").agg( min_height=pd.NamedAgg(column="height", aggfunc="min"), max_height=pd.NamedAgg(column="height", aggfunc="max"), - average_weight=pd.NamedAgg(column="weight", aggfunc=np.mean), + average_weight=pd.NamedAgg(column="weight", aggfunc="mean"), ) @@ -664,7 +741,7 @@ accepts the special syntax in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGrou animals.groupby("kind").agg( min_height=("height", "min"), max_height=("height", "max"), - average_weight=("weight", np.mean), + average_weight=("weight", "mean"), ) @@ -675,21 +752,15 @@ and unpack the keyword arguments animals.groupby("kind").agg( **{ - "total weight": pd.NamedAgg(column="weight", aggfunc=sum) + "total weight": pd.NamedAgg(column="weight", aggfunc="sum") } ) -Additional keyword arguments are not passed through to the aggregation functions. Only pairs +When using named aggregation, additional keyword arguments are not passed through +to the aggregation functions; only pairs of ``(column, aggfunc)`` should be passed as ``**kwargs``. If your aggregation functions requires additional arguments, partially apply them with :meth:`functools.partial`. -.. note:: - - For Python 3.5 and earlier, the order of ``**kwargs`` in a functions was not - preserved. This means that the output column ordering would not be - consistent. To ensure consistent ordering, the keys (and so output columns) - will always be sorted for Python 3.5. - Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions. @@ -708,59 +779,97 @@ columns of a DataFrame: .. ipython:: python - grouped.agg({"C": np.sum, "D": lambda x: np.std(x, ddof=1)}) + grouped.agg({"C": "sum", "D": lambda x: np.std(x, ddof=1)}) The function names can also be strings. 
In order for a string to be valid it -must be either implemented on GroupBy or available via :ref:`dispatching -`: +must be implemented on GroupBy: .. ipython:: python grouped.agg({"C": "sum", "D": "std"}) -.. _groupby.aggregate.cython: +.. _groupby.transform: -Cython-optimized aggregation functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Transformation +-------------- -Some common aggregations, currently only ``sum``, ``mean``, ``std``, and ``sem``, have -optimized Cython implementations: +A transformation is a GroupBy operation whose result is indexed the same +as the one being grouped. Common examples include ``cumsum`` and ``diff``. .. ipython:: python - df.groupby("A")[["C", "D"]].sum() - df.groupby(["A", "B"]).mean() + speeds + grouped = speeds.groupby("class")["max_speed"] + grouped.cumsum() + grouped.diff() -Of course ``sum`` and ``mean`` are implemented on pandas objects, so the above -code would work even without the special versions via dispatching (see below). +Unlike aggregations, the groupings that are used to split +the original object are not included in the result. -.. _groupby.aggregate.udfs: +.. note:: -Aggregations with User-Defined Functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Since transformations do not include the groupings that are used to split the result, + the arguments ``as_index`` and ``sort`` in :meth:`DataFrame.groupby` and + :meth:`Series.groupby` have no effect. -Users can also provide their own functions for custom aggregations. When aggregating -with a User-Defined Function (UDF), the UDF should not mutate the provided ``Series``, see -:ref:`gotchas.udf-mutation` for more information. +A common use of a transformation is to add the result back into the original DataFrame. ..
ipython:: python - animals.groupby("kind")[["height"]].agg(lambda x: set(x)) + result = speeds.copy() + result["cumsum"] = grouped.cumsum() + result["diff"] = grouped.diff() + result -The resulting dtype will reflect that of the aggregating function. If the results from different groups have -different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction. +Built-in transformation methods +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. ipython:: python +The following methods on GroupBy act as transformations. Of these methods, only +``fillna`` does not have a Cython-optimized implementation. - animals.groupby("kind")[["height"]].agg(lambda x: x.astype(int).sum()) +.. csv-table:: + :header: "Method", "Description" + :widths: 20, 80 + :delim: ; -.. _groupby.transform: + :meth:`~.DataFrameGroupBy.bfill`;Back fill NA values within each group + :meth:`~.DataFrameGroupBy.cumcount`;Compute the cumulative count within each group + :meth:`~.DataFrameGroupBy.cummax`;Compute the cumulative max within each group + :meth:`~.DataFrameGroupBy.cummin`;Compute the cumulative min within each group + :meth:`~.DataFrameGroupBy.cumprod`;Compute the cumulative product within each group + :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group + :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group + :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group + :meth:`~.DataFrameGroupBy.fillna`;Fill NA values within each group + :meth:`~.DataFrameGroupBy.pct_change`;Compute the percent change between adjacent values within each group + :meth:`~.DataFrameGroupBy.rank`;Compute the rank of each value within each group + :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group -Transformation --------------- +In addition, passing any built-in aggregation method as a string to +:meth:`~.DataFrameGroupBy.transform` (see below) will broadcast the result across the group, 
+producing a transformed result. If the aggregation method is Cython-optimized, this +will be performant as well. -The ``transform`` method returns an object that is indexed the same -as the one being grouped. The transform function must: +.. _groupby.transformation.transform: + +The :meth:`~.DataFrameGroupBy.transform` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Similar to the :ref:`aggregation method `, the +:meth:`~.DataFrameGroupBy.transform` method can accept string aliases to the built-in +transform methods in the previous section. It can *also* accept string aliases to the +built-in aggregation methods. When an aggregation method is provided, the result will +be broadcast across the group. + +.. ipython:: python + + speeds + grouped = speeds.groupby("class")[["max_speed"]] + grouped.transform("cumsum") + grouped.transform("sum") + +In addition to string aliases, the :meth:`~.DataFrameGroupBy.transform` method can +also accept User-Defined Functions (UDFs). The UDF must: * Return a result that is either the same size as the group chunk or broadcastable to the size of the group chunk (e.g., a scalar, @@ -769,18 +878,29 @@ as the one being grouped. The transform function must: the first group chunk using chunk.apply. * Not perform in-place operations on the group chunk. Group chunks should be treated as immutable, and changes to a group chunk may produce unexpected - results. -* (Optionally) operates on the entire group chunk. If this is supported, a - fast path is used starting from the *second* chunk. + results. See :ref:`gotchas.udf-mutation` for more information. +* (Optionally) operates on all columns of the entire group chunk at once. If this is + supported, a fast path is used starting from the *second* chunk. + +.. note:: + + Transforming by supplying ``transform`` with a UDF is + often less performant than using the built-in methods on GroupBy.
+ Consider breaking up a complex operation into a chain of operations that utilize + the built-in methods. + + All of the examples in this section can be made more performant by calling + built-in methods instead of using ``transform``. + See :ref:`below for examples `. .. versionchanged:: 2.0.0 When using ``.transform`` on a grouped DataFrame and the transformation function returns a DataFrame, pandas now aligns the result's index - with the input's index. You can call ``.to_numpy()`` on the - result of the transformation function to avoid alignment. + with the input's index. You can call ``.to_numpy()`` within the transformation + function to avoid alignment. -Similar to :ref:`groupby.aggregate.udfs`, the resulting dtype will reflect that of the +Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the transformation function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction. @@ -831,15 +951,6 @@ match the shape of the input array. ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min()) -Alternatively, the built-in methods could be used to produce the same outputs. - -.. ipython:: python - - max_ts = ts.groupby(lambda x: x.year).transform("max") - min_ts = ts.groupby(lambda x: x.year).transform("min") - - max_ts - min_ts - Another common data transform is to replace missing data with the group mean. .. ipython:: python @@ -880,18 +991,27 @@ and that the transformed data contains no NAs. grouped_trans.count() # counts after transformation grouped_trans.size() # Verify non-NA count equals group size -.. note:: +.. _groupby_efficient_transforms: - Some functions will automatically transform the input when applied to a - GroupBy object, but returning an object of the same shape as the original. - Passing ``as_index=False`` will not affect these transformation methods. 
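For instance, the group-standardization pattern can be written either way; in a sketch with toy data (the series below is illustrative, not part of the patch), the UDF and the chain of built-in transforms agree:

```python
import numpy as np
import pandas as pd

# A toy series spanning two years, so the lambda groups by year
ts = pd.Series(
    np.arange(6, dtype="float64"),
    index=pd.date_range("2000-12-29", periods=6),
)
grouped = ts.groupby(lambda x: x.year)

# Standardize within each year with a UDF ...
slow = grouped.transform(lambda x: (x - x.mean()) / x.std())

# ... or with a chain of Cython-optimized built-in transforms
fast = (ts - grouped.transform("mean")) / grouped.transform("std")

print(np.allclose(slow, fast))  # True: the two approaches agree
```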
+As mentioned in the note above, each of the examples in this section can be computed - For example: ``fillna, ffill, bfill, shift.``. +more efficiently using built-in methods. .. ipython:: python - .. ipython:: python + # ts.groupby(lambda x: x.year).transform( + # lambda x: (x - x.mean()) / x.std() + # ) + grouped = ts.groupby(lambda x: x.year) + result = (ts - grouped.transform("mean")) / grouped.transform("std") - grouped.ffill() + # ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min()) + grouped = ts.groupby(lambda x: x.year) + result = grouped.transform("max") - grouped.transform("min") + # grouped = data_df.groupby(key) + # grouped.transform(lambda x: x.fillna(x.mean())) + grouped = data_df.groupby(key) + result = data_df.fillna(grouped.transform("mean")) .. _groupby.transform.window_resample: @@ -943,127 +1063,134 @@ missing values with the ``ffill()`` method. Filtration ---------- -The ``filter`` method returns a subset of the original object. Suppose we -want to take only elements that belong to groups with a group sum greater -than 2. +A filtration is a GroupBy operation that subsets the original grouping object. It +may either filter out entire groups, part of groups, or both. Filtrations return +a filtered version of the calling object, including the grouping columns when provided. +In the following example, ``class`` is included in the result. .. ipython:: python - sf = pd.Series([1, 1, 2, 3, 3, 3]) - sf.groupby(sf).filter(lambda x: x.sum() > 2) -The argument of ``filter`` must be a function that, applied to the group as a -whole, returns ``True`` or ``False``. + speeds + speeds.groupby("class").nth(1) -Another useful operation is filtering out elements that belong to groups -with only a couple members. +.. note:: -.. ipython:: python + Unlike aggregations, filtrations do not add the group keys to the index of the + result. Because of this, passing ``as_index=False`` will not affect these + methods.
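A minimal sketch of the filtration behavior described above (hypothetical data mirroring the ``speeds`` frame; assumes pandas 2.x, where ``nth`` acts as a filtration):

```python
import pandas as pd

speeds = pd.DataFrame(
    {
        "class": ["bird", "bird", "mammal", "mammal", "mammal"],
        "max_speed": [389.0, 24.0, 80.2, 21.4, 58.0],
    },
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
)

# nth(1) keeps the second row of each group; the original index is
# preserved and the grouping column stays in the result.
second = speeds.groupby("class").nth(1)
print(second)
```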
- dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")}) - dff.groupby("B").filter(lambda x: len(x) > 2) - -Alternatively, instead of dropping the offending groups, we can return a -like-indexed objects where the groups that do not pass the filter are filled -with NaNs. +Filtrations will respect subsetting the columns of the GroupBy object. .. ipython:: python - dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False) + speeds.groupby("class")[["order", "max_speed"]].nth(1) -For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion. +Built-in filtrations +~~~~~~~~~~~~~~~~~~~~ -.. ipython:: python +The following methods on GroupBy act as filtrations. All these methods have a +Cython-optimized implementation. - dff["C"] = np.arange(8) - dff.groupby("B").filter(lambda x: len(x["C"]) > 2) +.. csv-table:: + :header: "Method", "Description" + :widths: 20, 80 + :delim: ; -.. note:: + :meth:`~.DataFrameGroupBy.head`;Select the top row(s) of each group + :meth:`~.DataFrameGroupBy.nth`;Select the nth row(s) of each group + :meth:`~.DataFrameGroupBy.tail`;Select the bottom row(s) of each group - Some functions when applied to a groupby object will act as a **filter** on the input, returning - a reduced shape of the original (and potentially eliminating groups), but with the index unchanged. - Passing ``as_index=False`` will not affect these transformation methods. +Users can also use transformations along with Boolean indexing to construct complex +filtrations within groups. For example, suppose we are given groups of products and +their volumes, and we wish to subset the data to only the largest products capturing no +more than 90% of the total volume within each group. - For example: ``head, tail``. +.. ipython:: python - .. 
ipython:: python + product_volumes = pd.DataFrame( + { + "group": list("xxxxyyy"), + "product": list("abcdefg"), + "volume": [10, 30, 20, 15, 40, 10, 20], + } + ) + product_volumes - dff.groupby("B").head(2) + # Sort by volume to select the largest products first + product_volumes = product_volumes.sort_values("volume", ascending=False) + grouped = product_volumes.groupby("group")["volume"] + cumpct = grouped.cumsum() / grouped.transform("sum") + cumpct + significant_products = product_volumes[cumpct <= 0.9] + significant_products.sort_values(["group", "product"]) +The :class:`~DataFrameGroupBy.filter` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. _groupby.dispatch: +.. note:: -Dispatching to instance methods -------------------------------- + Filtering by supplying ``filter`` with a User-Defined Function (UDF) is + often less performant than using the built-in methods on GroupBy. + Consider breaking up a complex operation into a chain of operations that utilize + the built-in methods. + +The ``filter`` method takes a User-Defined Function (UDF) that, when applied to +an entire group, returns either ``True`` or ``False``. The result of the ``filter`` +method is then the subset of groups for which the UDF returned ``True``. -When doing an aggregation or transformation, you might just want to call an -instance method on each data group. This is pretty easy to do by passing lambda -functions: +Suppose we want to take only elements that belong to groups with a group sum greater +than 2. .. ipython:: python - :okwarning: - grouped = df.groupby("A")[["C", "D"]] - grouped.agg(lambda x: x.std()) + sf = pd.Series([1, 1, 2, 3, 3, 3]) + sf.groupby(sf).filter(lambda x: x.sum() > 2) -But, it's rather verbose and can be untidy if you need to pass additional -arguments. 
Using a bit of metaprogramming cleverness, GroupBy now has the -ability to "dispatch" method calls to the groups: +Another useful operation is filtering out elements that belong to groups +with only a couple members. .. ipython:: python - :okwarning: - grouped.std() + dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")}) + dff.groupby("B").filter(lambda x: len(x) > 2) -What is actually happening here is that a function wrapper is being -generated. When invoked, it takes any passed arguments and invokes the function -with any arguments on each group (in the above example, the ``std`` -function). The results are then combined together much in the style of ``agg`` -and ``transform`` (it actually uses ``apply`` to infer the gluing, documented -next). This enables some operations to be carried out rather succinctly: +Alternatively, instead of dropping the offending groups, we can return a +like-indexed objects where the groups that do not pass the filter are filled +with NaNs. .. ipython:: python - tsdf = pd.DataFrame( - np.random.randn(1000, 3), - index=pd.date_range("1/1/2000", periods=1000), - columns=["A", "B", "C"], - ) - tsdf.iloc[::2] = np.nan - grouped = tsdf.groupby(lambda x: x.year) - grouped.fillna(method="pad") - -In this example, we chopped the collection of time series into yearly chunks -then independently called :ref:`fillna ` on the -groups. + dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False) -The ``nlargest`` and ``nsmallest`` methods work on ``Series`` style groupbys: +For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion. .. ipython:: python - s = pd.Series([9, 8, 7, 5, 19, 1, 4.2, 3.3]) - g = pd.Series(list("abababab")) - gb = s.groupby(g) - gb.nlargest(3) - gb.nsmallest(3) + dff["C"] = np.arange(8) + dff.groupby("B").filter(lambda x: len(x["C"]) > 2) .. 
_groupby.apply: Flexible ``apply`` ------------------ -Some operations on the grouped data might not fit into either the aggregate or -transform categories. Or, you may simply want GroupBy to infer how to combine -the results. For these, use the ``apply`` function, which can be substituted -for both ``aggregate`` and ``transform`` in many standard use cases. However, -``apply`` can handle some exceptional use cases. +Some operations on the grouped data might not fit into the aggregation, +transformation, or filtration categories. For these, you can use the ``apply`` +function. + +.. warning:: + + ``apply`` has to try to infer from the result whether it should act as a reducer, + transformer, *or* filter, depending on exactly what is passed to it. Thus the + grouped column(s) may be included in the output as well as set the indices. While + it tries to intelligently guess how to behave, it can sometimes guess wrong. .. note:: - ``apply`` can act as a reducer, transformer, *or* filter function, depending - on exactly what is passed to it. It can depend on the passed function and - exactly what you are grouping. Thus the grouped column(s) may be included in - the output as well as set the indices. + All of these examples can be more reliably, and more efficiently, computed using + other pandas functionality. In fact, pandas maintainers are interested if you + have an operation that you must use ``apply`` for. If you believe you do, please + `raise an issue on GitHub `_ .. ipython:: python @@ -1098,10 +1225,14 @@ that is itself a series, and possibly upcast the result to a DataFrame: s s.apply(f) +Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the +apply function. If the results from different groups have different dtypes, then +a common dtype will be determined in the same way as ``DataFrame`` construction. + Control grouped column(s) placement with ``group_keys`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. note:: +.. 
versionchanged:: 1.5.0 If ``group_keys=True`` is specified when calling :meth:`~DataFrame.groupby`, functions passed to ``apply`` that return like-indexed outputs will have the @@ -1111,8 +1242,6 @@ Control grouped column(s) placement with ``group_keys`` not be added for like-indexed outputs. In the future this behavior will change to always respect ``group_keys``, which defaults to ``True``. - .. versionchanged:: 1.5.0 - To control whether the grouped column(s) are included in the indices, you can use the argument ``group_keys``. Compare @@ -1126,10 +1255,6 @@ with df.groupby("A", group_keys=False).apply(lambda x: x) -Similar to :ref:`groupby.aggregate.udfs`, the resulting dtype will reflect that of the -apply function. If the results from different groups have different dtypes, then -a common dtype will be determined in the same way as ``DataFrame`` construction. - Numba Accelerated Routines -------------------------- @@ -1153,8 +1278,8 @@ will be passed into ``values``, and the group index will be passed into ``index` Other useful features --------------------- -Automatic exclusion of "nuisance" columns -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Exclusion of "nuisance" columns +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Again consider the example DataFrame we've been looking at: @@ -1164,8 +1289,8 @@ Again consider the example DataFrame we've been looking at: Suppose we wish to compute the standard deviation grouped by the ``A`` column. There is a slight problem, namely that we don't care about the data in -column ``B``. We refer to this as a "nuisance" column. You can avoid nuisance -columns by specifying ``numeric_only=True``: +column ``B`` because it is not numeric. We refer to these non-numeric columns as +"nuisance" columns. You can avoid nuisance columns by specifying ``numeric_only=True``: .. ipython:: python @@ -1178,20 +1303,13 @@ is only interesting over one column (here ``colname``), it may be filtered .. 
note:: Any object column, also if it contains numerical values such as ``Decimal`` - objects, is considered as a "nuisance" columns. They are excluded from + objects, is considered as a "nuisance" column. They are excluded from aggregate functions automatically in groupby. If you do wish to include decimal or object columns in an aggregation with other non-nuisance data types, you must do so explicitly. -.. warning:: - The automatic dropping of nuisance columns has been deprecated and will be removed - in a future version of pandas. If columns are included that cannot be operated - on, pandas will instead raise an error. In order to avoid this, either select - the columns you wish to operate on or specify ``numeric_only=True``. - .. ipython:: python - :okwarning: from decimal import Decimal From fc158ee9de76c372ca3e42c71be8c403e00984a4 Mon Sep 17 00:00:00 2001 From: Richard Shadrach Date: Mon, 27 Feb 2023 18:59:49 -0500 Subject: [PATCH 2/5] Improvements --- doc/source/user_guide/groupby.rst | 56 ++++++++++++++++------------ doc/source/user_guide/timeseries.rst | 2 +- doc/source/whatsnew/v0.7.0.rst | 2 +- 3 files changed, 34 insertions(+), 26 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index bc9802c0fa154..d35f3092ba1e5 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -508,7 +508,7 @@ Built-in aggregation methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Many common aggregations are built-in to GroupBy objects as methods. Of the methods -listed below, those with a ``*`` do _not_ have a Cython-optimized implementation. +listed below, those with a ``*`` do *not* have a Cython-optimized implementation. .. csv-table:: :header: "Method", "Description" @@ -553,11 +553,17 @@ index are the group names and whose values are the sizes of each group. 
grouped = df.groupby(["A", "B"]) grouped.size() +While the :meth:`~.DataFrameGroupBy.describe` method is not itself a reducer, it +can be used to conveniently produce a collection of summary statistics about each of +the groups. + .. ipython:: python grouped.describe() -Another aggregation example is to compute the number of unique values of each group. This is similar to the ``value_counts`` function, except that it only counts unique values. +Another aggregation example is to compute the number of unique values of each group. +This is similar to the ``value_counts`` function, except that it only counts the +number of unique values. .. ipython:: python @@ -568,12 +574,12 @@ Another aggregation example is to compute the number of unique values of each gr .. note:: - Aggregation functions **will not** return the groups that you are aggregating over + Aggregation functions **will not** operate on the groups that you are aggregating over if they are named *columns*, when ``as_index=True``, the default. The grouped columns will be the **indices** of the returned object. Passing ``as_index=False`` **will** return the groups that you are aggregating over, if they are - named *columns*. + named **indices** or *columns*. .. _groupby.aggregate.agg: @@ -581,9 +587,14 @@ Another aggregation example is to compute the number of unique values of each gr The :meth:`~.DataFrameGroupBy.aggregate` method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The :meth:`~.DataFrameGroupBy.aggregate` method can accept many different types of -inputs. This section details using string aliases for various GroupBy methods; other -inputs are detailed in the sections below. +.. note:: + The :meth:`~.DataFrameGroupBy.aggregate` method can accept many different types of + inputs. This section details using string aliases for various GroupBy methods; other + inputs are detailed in the sections below. 
+ +Any reduction method that pandas implements can be passed as a string to +:meth:`~.DataFrameGroupBy.aggregate`. Users are encouraged to use the shorthand, +``agg``. It will operate as if the corresponding method was called. .. ipython:: python @@ -593,7 +604,7 @@ inputs are detailed in the sections below. grouped = df.groupby(["A", "B"]) grouped.agg("sum") -As you can see, the result of the aggregation will have the group names as the +The result of the aggregation will have the group names as the new index along the grouped axis. In the case of multiple keys, the result is a :ref:`MultiIndex ` by default. As mentioned above, this can be changed by using the ``as_index`` option: @@ -601,9 +612,9 @@ changed by using the ``as_index`` option: .. ipython:: python grouped = df.groupby(["A", "B"], as_index=False) - grouped.aggregate("sum") + grouped.agg("sum") - df.groupby("A", as_index=False)[["C", "D"]].sum() + df.groupby("A", as_index=False)[["C", "D"]].agg("sum") Note that you could use the ``reset_index`` DataFrame function to achieve the same result as the column names are stored in the resulting ``MultiIndex``: @@ -612,10 +623,6 @@ same result as the column names are stored in the resulting ``MultiIndex``: df.groupby(["A", "B"]).agg("sum").reset_index() -The aggregating functions above will exclude NA values. Any function which -reduces a :class:`Series` to a scalar value is an aggregation function and will work, -a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. - .. _groupby.aggregate.udf: Aggregation with User-Defined Functions @@ -719,7 +726,7 @@ accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGr - The keywords are the *output* column names - The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. 
pandas - provides the ``pandas.NamedAgg`` namedtuple with the fields ``['column', 'aggfunc']`` + provides the :class:`NamedAgg` namedtuple with the fields ``['column', 'aggfunc']`` to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias. @@ -734,7 +741,7 @@ accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGr ) -``pandas.NamedAgg`` is just a ``namedtuple``. Plain tuples are allowed as well. +:class:`NamedAgg` is just a ``namedtuple``. Plain tuples are allowed as well. .. ipython:: python @@ -794,7 +801,8 @@ Transformation -------------- A transformation is a GroupBy operation whose result is indexed the same -as the one being grouped. Common examples include ``cumsum`` and ``diff``. +as the one being grouped. Common examples include :meth:`~.DataFrameGroupBy.cumsum` and +:meth:`~.DataFrameGroupBy.diff`. .. ipython:: python @@ -846,9 +854,9 @@ The following methods on GroupBy act as transformations. Of these methods, only :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group In addition, passing any built-in aggregation method as a string to -:meth:`~.DataFrameGroupBy.transform` (see below) will broadcast the result across the group, -producing a transformed result. If the aggregation method is Cython-optimized, this -will be performant as well. +:meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result +across the group, producing a transformed result. If the aggregation method is +Cython-optimized, this will be performant as well. .. _groupby.transformation.transform: @@ -856,10 +864,10 @@ The :meth:`~.DataFrameGroupBy.transform` method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Similar to the :ref:`aggregation method `, the -:meth:`~.DataFrameGroupBy.transform` can accept string aliases to the built-in -transform methods in the previous section. It can *also* accept string aliases to the -built-in aggregation methods. 
When an aggregation method is provided, the result will -be broadcast across the group. +:meth:`~.DataFrameGroupBy.transform` method can accept string aliases to the built-in +transformation methods in the previous section. It can *also* accept string aliases to +the built-in aggregation methods. When an aggregation method is provided, the result +will be broadcast across the group. .. ipython:: python diff --git a/doc/source/user_guide/timeseries.rst b/doc/source/user_guide/timeseries.rst index a675e30823c89..4cd98c89e7180 100644 --- a/doc/source/user_guide/timeseries.rst +++ b/doc/source/user_guide/timeseries.rst @@ -1618,7 +1618,7 @@ The ``resample`` function is very flexible and allows you to specify many different parameters to control the frequency conversion and resampling operation. -Any function available via :ref:`dispatching ` is available as +Any built-in method available via :ref:`GroupBy ` is available as a method of the returned object, including ``sum``, ``mean``, ``std``, ``sem``, ``max``, ``min``, ``median``, ``first``, ``last``, ``ohlc``: diff --git a/doc/source/whatsnew/v0.7.0.rst b/doc/source/whatsnew/v0.7.0.rst index 1ee6a9899a655..2336ccaeac820 100644 --- a/doc/source/whatsnew/v0.7.0.rst +++ b/doc/source/whatsnew/v0.7.0.rst @@ -346,7 +346,7 @@ Other API changes Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ -- :ref:`Cythonized GroupBy aggregations ` no longer +- :ref:`Cythonized GroupBy aggregations ` no longer presort the data, thus achieving a significant speedup (:issue:`93`). GroupBy aggregations with Python functions significantly sped up by clever manipulation of the ndarray data type in Cython (:issue:`496`). 
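The broadcasting behavior that patch 2 documents for :meth:`~.DataFrameGroupBy.transform` can be sketched in a few lines. This is a minimal standalone illustration (the frame and column names here are made up for the example, not taken from the patch): passing the string alias of an aggregation method to ``transform`` computes one value per group and then broadcasts it back to the original shape.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a", "b"], "B": [1.0, 2.0, 3.0, 4.0]})

# Passing an aggregation alias ("mean") to transform() reduces each
# group to a single value and then broadcasts that value back across
# the group's rows, so the result is aligned with the original frame.
out = df.groupby("A")["B"].transform("mean")
print(out.tolist())  # [2.0, 3.0, 2.0, 3.0]
```

Group ``"a"`` (rows 0 and 2) has mean 2.0 and group ``"b"`` (rows 1 and 3) has mean 3.0, and each row receives its own group's value, which is what distinguishes this from calling ``agg("mean")``.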
From 436339740e6f0fad9e8ff2ba9162950d554b92d0 Mon Sep 17 00:00:00 2001 From: Dea Leon Date: Fri, 3 Mar 2023 18:34:24 +0100 Subject: [PATCH 3/5] DOC Checking groupby guide --- doc/source/user_guide/groupby.rst | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index d35f3092ba1e5..886581f9f45ea 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -479,9 +479,9 @@ Aggregation ----------- An aggregation is a GroupBy operation that reduces the dimension of the grouping -object. The result of an aggregation is, or at least treated as, -a scalar value for each column in a group. For example, producing a sum of each -column in group of values. +object. The result of an aggregation is, or at least is treated as, +a scalar value for each column in a group. For example, producing the sum of each +column in a group of values. .. ipython:: python @@ -633,7 +633,7 @@ Users can also provide their own User-Defined Functions (UDFs) for custom aggreg .. warning:: When aggregating with a UDF, the UDF should not mutate the - provided ``Series``, see :ref:`gotchas.udf-mutation` for more information. + provided ``Series``. See :ref:`gotchas.udf-mutation` for more information. .. note:: @@ -674,7 +674,7 @@ column, which produces an aggregated result with a hierarchical index: grouped[["C", "D"]].agg(["sum", "mean", "std"]) -The resulting aggregations are named for the functions themselves. If you +The resulting aggregations are named after the functions themselves. If you need to rename, then you can add in a chained operation for a ``Series`` like this: .. 
ipython:: python @@ -752,7 +752,7 @@ accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGr ) -If your desired output column names are not valid Python keywords, construct a dictionary +If column names you want are not valid Python keywords, construct a dictionary and unpack the keyword arguments .. ipython:: python @@ -766,7 +766,7 @@ and unpack the keyword arguments When using named aggregation, additional keyword arguments are not passed through to the aggregation functions; only pairs of ``(column, aggfunc)`` should be passed as ``**kwargs``. If your aggregation functions -requires additional arguments, partially apply them with :meth:`functools.partial`. +require additional arguments, apply them partially with :meth:`functools.partial`. Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions. @@ -789,7 +789,7 @@ columns of a DataFrame: grouped.agg({"C": "sum", "D": lambda x: np.std(x, ddof=1)}) The function names can also be strings. In order for a string to be valid it -must be either implemented on GroupBy: +must be implemented on GroupBy: .. ipython:: python @@ -912,7 +912,7 @@ Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that o transformation function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction. -Suppose we wished to standardize the data within each group: +Suppose we wish to standardize the data within each group: .. ipython:: python @@ -985,7 +985,7 @@ Another common data transform is to replace missing data with the group mean. transformed = grouped.transform(lambda x: x.fillna(x.mean())) -We can verify that the group means have not changed in the transformed data +We can verify that the group means have not changed in the transformed data, and that the transformed data contains no NAs. .. 
ipython:: python @@ -1030,7 +1030,7 @@ It is possible to use ``resample()``, ``expanding()`` and ``rolling()`` as methods on groupbys. The example below will apply the ``rolling()`` method on the samples of -the column B based on the groups of column A. +the column B, based on the groups of column A. .. ipython:: python @@ -1050,7 +1050,7 @@ group. Suppose you want to use the ``resample()`` method to get a daily -frequency in each group of your dataframe and wish to complete the +frequency in each group of your dataframe, and wish to complete the missing values with the ``ffill()`` method. .. ipython:: python From 7d28a97cca22593337d3868143478d272077f35d Mon Sep 17 00:00:00 2001 From: Richard Shadrach Date: Sat, 4 Mar 2023 08:06:36 -0500 Subject: [PATCH 4/5] Fix as_index language --- doc/source/user_guide/groupby.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 886581f9f45ea..a2b072045bb96 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -574,8 +574,8 @@ number of unique values. .. note:: - Aggregation functions **will not** operate on the groups that you are aggregating over - if they are named *columns*, when ``as_index=True``, the default. The grouped columns will + Aggregation functions **will not** return the groups that you are aggregating over + as named *columns*, when ``as_index=True``, the default. The grouped columns will be the **indices** of the returned object. Passing ``as_index=False`` **will** return the groups that you are aggregating over, if they are @@ -752,7 +752,7 @@ accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGr ) -If column names you want are not valid Python keywords, construct a dictionary +If the column names you want are not valid Python keywords, construct a dictionary and unpack the keyword arguments .. 
ipython:: python From ec0d5f85ff037a1f2048cb5f07873e553c7ebdd7 Mon Sep 17 00:00:00 2001 From: Richard Shadrach Date: Sun, 5 Mar 2023 15:48:20 -0500 Subject: [PATCH 5/5] Improvements --- doc/source/user_guide/groupby.rst | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 0fa605278c938..31c4bd1d7c87c 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -517,16 +517,16 @@ listed below, those with a ``*`` do *not* have a Cython-optimized implementation :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups - :meth:`~.DataFrameGroupBy.first` *;Compute the first occurring value in each group + :meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group :meth:`~.DataFrameGroupBy.idxmax` *;Compute the index of the maximum value in each group :meth:`~.DataFrameGroupBy.idxmin` *;Compute the index of the minimum value in each group - :meth:`~.DataFrameGroupBy.last` *;Compute the last occurring value in each group - :meth:`~.DataFrameGroupBy.max` *;Compute the maximum value in each group + :meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group + :meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group :meth:`~.DataFrameGroupBy.median`;Compute the median of each group - :meth:`~.DataFrameGroupBy.min` *;Compute the minimum value in each group + :meth:`~.DataFrameGroupBy.min`;Compute the minimum value in each group :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group - :meth:`~.DataFrameGroupBy.prod` *;Compute the product of the values in each group + :meth:`~.DataFrameGroupBy.prod`;Compute 
the product of the values in each group
     :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group
     :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group
     :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group
@@ -614,8 +614,9 @@ changed by using the ``as_index`` option:
 
     df.groupby("A", as_index=False)[["C", "D"]].agg("sum")
 
-Note that you could use the ``reset_index`` DataFrame function to achieve the
-same result as the column names are stored in the resulting ``MultiIndex``:
+Note that you could use the :meth:`DataFrame.reset_index` method to achieve the
+same result, as the column names are stored in the resulting ``MultiIndex``, although
+this will make an extra copy.
 
 .. ipython:: python
 
@@ -1000,7 +1001,8 @@ and that the transformed data contains no NAs.
 .. _groupby_efficient_transforms:
 
 As mentioned in the note above, each of the examples in this section can be computed
-more efficiently using built-in methods.
+more efficiently using built-in methods. In the code below, the inefficient way
+using a UDF is commented out and the faster alternative follows.
 
 .. ipython:: python
 
@@ -1082,8 +1084,8 @@ In the following example, ``class`` is included in the result.
 
 .. note::
 
     Unlike aggregations, filtrations do not add the group keys to the index of the
-    result. Because of this, passing ``as_index=False`` will not affect these
-    transformation methods.
+    result. Because of this, passing ``as_index=False`` or ``sort=True`` will not
+    affect these methods.
 
 Filtrations will respect subsetting the columns of the GroupBy object.
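The note in the final hunk, that filtrations keep the original index and are therefore unaffected by ``as_index``, can be sketched with a small standalone example (the frame and the ``> 6`` threshold are illustrative assumptions, not drawn from the patch):

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["foo", "bar", "foo", "bar", "foo"], "B": [1, 2, 3, 4, 5]}
)

# filter() keeps every row of each group whose reduction satisfies the
# predicate; here group "foo" sums to 9 (kept) and "bar" to 6 (dropped).
# The result preserves the original row index rather than using the
# group keys, which is why as_index=False has no effect on filtrations.
filtered = df.groupby("A").filter(lambda g: g["B"].sum() > 6)
print(filtered.index.tolist())  # [0, 2, 4]
```

Because the surviving rows retain labels 0, 2, and 4 from the original frame, the group key ``"foo"`` stays an ordinary column in the output.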