DOC: Restructure and expand UDF page

datapythonista · datapythonista · commit 2e1b427b958a · 2025-05-21T13:31:31.000+02:00
diff --git a/doc/source/user_guide/user_defined_functions.rst b/doc/source/user_guide/user_defined_functions.rst
@@ -96,15 +96,15 @@ User-Defined Functions can be applied across various pandas methods:
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
 | :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`agg`                | Series/DataFrame       | Scalar or Series         | Aggregate and summarizes values, e.g., sum or custom reducer                                                                                 |
+| :meth:`pipe`               | Series or DataFrame    | Series or DataFrame      | Chain functions together to apply to Series or DataFrame                                                                                     |
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`transform` (axis=0) | Column (Series)        | Column(Series)           | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
+| :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
+| :meth:`agg`                | Series or DataFrame    | Scalar or Series         | Aggregates and summarizes values, e.g., sum or custom reducer                                                                                |
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
+| :meth:`transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
-| :meth:`pipe`               | Series/DataFrame       | Series/DataFrame         | Chain functions together to apply to Series or Dataframe                                                                                     |
+| :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
 +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
 
 When applying UDFs in pandas, it is essential to select the appropriate method based
@@ -118,53 +118,108 @@ decisions, ensuring more efficient and maintainable code.
     and :ref:`ewm()<window>` for details.
 
 
-:meth:`DataFrame.apply`
-~~~~~~~~~~~~~~~~~~~~~~~
+:meth:`Series.map` and :meth:`DataFrame.map`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The :meth:`apply` method allows you to apply UDFs along either rows or columns. While flexible,
-it is slower than vectorized operations and should be used only when you need operations
-that cannot be achieved with built-in pandas functions.
+The :meth:`map` method is used specifically to apply element-wise UDFs. This means the function
+will be called for each element in the ``Series`` or ``DataFrame``, with the individual value or
+the cell as the function argument.
 
-When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
-but consider optimizing performance with vectorized operations wherever possible.
+.. ipython:: python
 
-:meth:`DataFrame.agg`
-~~~~~~~~~~~~~~~~~~~~~
+    temperature_celsius = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
 
-If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
-specifically designed for aggregation operations.
+    def to_fahrenheit(value):
+        return value * (9 / 5) + 32
 
-When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
-a scalar value on each input.
+    temperature_celsius.map(to_fahrenheit)
 
-:meth:`DataFrame.transform`
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In this example, the function ``to_fahrenheit`` will be called 6 times, once for each value
+in the ``DataFrame``. The result of each call is placed in the corresponding cell
+of the resulting ``DataFrame``.
 
-The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
-It is generally faster than apply because it can take advantage of pandas' internal optimizations.
+In general, ``map`` will be slow, as it does not make use of vectorization. Instead, it requires
+a Python function call for each value, which slows things down significantly when
+working with medium or large data.
 
-When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
+When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
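+
+As a point of comparison, the conversion above can also be written as a single vectorized
+expression over the whole ``DataFrame``. The following is a minimal sketch rather than a
+benchmark, and it assumes a pandas version that provides :meth:`DataFrame.map`. The variable
+names ``mapped`` and ``vectorized`` are only illustrative.
+
+.. code-block:: python
+
+    import pandas as pd
+
+    temperature_celsius = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def to_fahrenheit(value):
+        return value * (9 / 5) + 32
+
+    # Element-wise UDF: one Python function call per cell
+    mapped = temperature_celsius.map(to_fahrenheit)
+
+    # Vectorized arithmetic: one expression over the whole DataFrame,
+    # with no Python-level function call per value
+    vectorized = temperature_celsius * (9 / 5) + 32
+
+Both results are expected to hold the same values, so the vectorized form is usually
+preferable when the operation can be expressed that way.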
 
-.. code-block:: python
+:meth:`Series.apply` and :meth:`DataFrame.apply`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-    from sklearn.linear_model import LinearRegression
+The :meth:`apply` method allows you to apply UDFs to a whole column or row. This is different
+from :meth:`map` in that the function will be called for each column (or row), not for each individual value.
 
-    df = pd.DataFrame({
-        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
-        'x': [1, 2, 3, 1, 2, 3],
-        'y': [2, 4, 6, 1, 2, 1.5]
-    }).set_index("x")
+.. ipython:: python
 
-    # Function to fit a model to each group
-    def fit_model(group):
-        x = group.index.to_frame()
-        y = group
-        model = LinearRegression()
-        model.fit(x, y)
-        pred = model.predict(x)
-        return pred
+    temperature_celsius = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
 
-    result = df.groupby('group').transform(fit_model)
+    def to_fahrenheit(column):
+        return column * (9 / 5) + 32
+
+    temperature_celsius.apply(to_fahrenheit)
+
+In the example, ``to_fahrenheit`` will be called only twice, as opposed to the 6 times with :meth:`map`.
+This will be faster than using :meth:`map`, since the operations for each column are vectorized, and the
+overhead of iterating over data in Python and calling Python functions is significantly reduced.
+
+In some cases, the function needs access to all the data in a column to compute the result. In those
+cases :meth:`apply` is needed, since with :meth:`map` the function can only access one element at a time.
+
+.. ipython:: python
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def normalize(column):
+        return column / column.mean()
+
+    temperature.apply(normalize)
+
+In the example, the ``normalize`` function needs to compute the mean of the whole column in order
+to divide each element by it. So, we cannot call the function for each element; instead, the
+function needs to receive the whole column.
+
+:meth:`apply` can also execute the function by row, by specifying ``axis=1``.
+
+.. ipython:: python
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def hotter(row):
+        return row["Los Angeles"] - row["NYC"]
+
+    temperature.apply(hotter, axis=1)
+
+In the example, the function ``hotter`` will be called 3 times, once for each row. Each
+call receives the whole row as the argument, allowing computations that require more than
+one value in the row.
+
+``apply`` is also available as :meth:`SeriesGroupBy.apply`, :meth:`DataFrameGroupBy.apply`,
+:meth:`Rolling.apply`, :meth:`Expanding.apply` and :meth:`Resampler.apply`. You can read more
+about ``apply`` in groupby operations in :ref:`groupby.apply`.
+
+When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
+but consider optimizing performance with vectorized operations wherever possible.
+
+:meth:`DataFrame.pipe`
+~~~~~~~~~~~~~~~~~~~~~~
+
+The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
+It is a helpful tool for organizing complex data processing workflows.
+
+When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
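+
+A minimal sketch of such a pipeline is shown below. The helper functions ``to_fahrenheit``
+and ``add_difference`` are hypothetical and exist only for illustration; each one receives
+a ``DataFrame`` and returns a new one, which is what allows them to be chained.
+
+.. code-block:: python
+
+    import pandas as pd
+
+    temperature_celsius = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def to_fahrenheit(df):
+        # Receives the whole DataFrame, not individual columns or values
+        return df * (9 / 5) + 32
+
+    def add_difference(df, left, right):
+        # Extra arguments given to ``pipe`` are forwarded to the function
+        return df.assign(difference=df[left] - df[right])
+
+    result = (
+        temperature_celsius
+        .pipe(to_fahrenheit)
+        .pipe(add_difference, left="Los Angeles", right="NYC")
+    )
+
+Written without :meth:`pipe`, the same computation would be the nested call
+``add_difference(to_fahrenheit(temperature_celsius), ...)``, which reads inside out.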
 
 :meth:`DataFrame.filter`
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -199,20 +254,43 @@ When to use: Use :meth:`filter` when you want to use a UDF to create a subset of
 Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
 for example, by using list comprehensions.
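+
+In group by operations, on the other hand, ``filter`` does accept a UDF, as noted in the
+table above. The following is a small sketch under that assumption; the data and the
+function name ``has_high_mean`` are made up for illustration.
+
+.. code-block:: python
+
+    import pandas as pd
+
+    df = pd.DataFrame({
+        "group": ["A", "A", "A", "B", "B", "B"],
+        "y": [2, 4, 6, 1, 2, 1.5],
+    })
+
+    def has_high_mean(group):
+        # Called once per group; must return a single boolean
+        return group["y"].mean() > 2
+
+    # Rows of groups for which the function returns False are dropped
+    result = df.groupby("group").filter(has_high_mean)
+
+Here the mean of ``y`` is 4 for group ``A`` and 1.5 for group ``B``, so only the rows
+of group ``A`` are kept.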
 
-:meth:`DataFrame.map`
+:meth:`DataFrame.agg`
 ~~~~~~~~~~~~~~~~~~~~~
 
-The :meth:`map` method is used specifically to apply element-wise UDFs.
+If you need to aggregate data, :meth:`agg` is a better choice than :meth:`apply` because it is
+specifically designed for aggregation operations.
 
-When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
+When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
+a scalar value on each input.
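+
+As an illustration, a custom reducer can be passed to :meth:`agg` in the same way as a
+built-in one. This is a minimal sketch; the function name ``value_range`` is hypothetical.
+
+.. code-block:: python
+
+    import pandas as pd
+
+    temperature = pd.DataFrame({
+        "NYC": [14, 21, 23],
+        "Los Angeles": [22, 28, 31],
+    })
+
+    def value_range(column):
+        # Reduces a whole column to a single scalar
+        return column.max() - column.min()
+
+    # One scalar per column, returned as a Series indexed by column name
+    result = temperature.agg(value_range)
+
+The function is called once per column, and the result has one entry for each column
+of the original ``DataFrame``.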
 
-:meth:`DataFrame.pipe`
-~~~~~~~~~~~~~~~~~~~~~~
+:meth:`DataFrame.transform`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
-It is a helpful tool for organizing complex data processing workflows.
+The :meth:`transform` method is ideal for performing transformations that preserve the shape of the original DataFrame.
+It raises an exception if the function changes the shape of the data, and it can be faster than :meth:`apply` when pandas is able to use internal optimizations.
 
-When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
+When to use: Use :meth:`transform` when you need to perform transformations that retain the original structure of the DataFrame.
+
+.. code-block:: python
+
+    from sklearn.linear_model import LinearRegression
+
+    df = pd.DataFrame({
+        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
+        'x': [1, 2, 3, 1, 2, 3],
+        'y': [2, 4, 6, 1, 2, 1.5]
+    }).set_index("x")
+
+    # Function to fit a model to each group
+    def fit_model(group):
+        x = group.index.to_frame()
+        y = group
+        model = LinearRegression()
+        model.fit(x, y)
+        pred = model.predict(x)
+        return pred
+
+    result = df.groupby('group').transform(fit_model)
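+
+A smaller sketch of the same idea, without the scikit-learn dependency, subtracts the group
+mean from every value. The function name ``demean`` is hypothetical, and the data mirrors the
+``y`` column of the example above.
+
+.. code-block:: python
+
+    import pandas as pd
+
+    df = pd.DataFrame({
+        "group": ["A", "A", "A", "B", "B", "B"],
+        "y": [2, 4, 6, 1, 2, 1.5],
+    })
+
+    def demean(values):
+        # Receives all values of one group and returns a result of the same length
+        return values - values.mean()
+
+    # The result keeps the index and length of the original column
+    result = df.groupby("group")["y"].transform(demean)
+
+Because the function returns one value per input row, the result can be assigned back
+to the original ``DataFrame`` as a new column.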
 
 
 Performance