@@ -96,15 +96,15 @@ User-Defined Functions can be applied across various pandas methods:
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
- | :meth:`agg`                | Series/DataFrame       | Scalar or Series         | Aggregate and summarizes values, e.g., sum or custom reducer                                                                                 |
+ | :meth:`pipe`               | Series or DataFrame    | Series or DataFrame      | Chain functions together to apply to Series or Dataframe                                                                                     |
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
- | :meth:`transform` (axis=0) | Column (Series)        | Column(Series)           | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
+ | :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
- | :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
+ | :meth:`agg`                | Series or DataFrame    | Scalar or Series         | Aggregate and summarizes values, e.g., sum or custom reducer                                                                                 |
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
- | :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
+ | :meth:`transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
- | :meth:`pipe`               | Series/DataFrame       | Series/DataFrame         | Chain functions together to apply to Series or Dataframe                                                                                     |
+ | :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+

When applying UDFs in pandas, it is essential to select the appropriate method based
@@ -118,53 +118,108 @@ decisions, ensuring more efficient and maintainable code.
and :ref:`ewm()<window>` for details.


- :meth:`DataFrame.apply`
- ~~~~~~~~~~~~~~~~~~~~~~~
+ :meth:`Series.map` and :meth:`DataFrame.map`
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The :meth:`apply` method allows you to apply UDFs along either rows or columns. While flexible,
- it is slower than vectorized operations and should be used only when you need operations
- that cannot be achieved with built-in pandas functions.
+ The :meth:`map` method is used specifically to apply element-wise UDFs. This means the function
+ will be called for each element in the ``Series`` or ``DataFrame``, with the individual value or
+ the cell as the function argument.

- When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
- but consider optimizing performance with vectorized operations wherever possible.
+ .. ipython:: python

- :meth:`DataFrame.agg`
- ~~~~~~~~~~~~~~~~~~~~~
+     temperature_celsius = pd.DataFrame({
+         "NYC": [14, 21, 23],
+         "Los Angeles": [22, 28, 31],
+     })

- If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
- specifically designed for aggregation operations.
+     def to_fahrenheit(value):
+         return value * (9 / 5) + 32

- When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
- a scalar value on each input.
+     temperature_celsius.map(to_fahrenheit)

- :meth:`DataFrame.transform`
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ In this example, the function ``to_fahrenheit`` will be called 6 times, once for each value
+ in the ``DataFrame``. And the result of each call will be returned in the corresponding cell
+ of the resulting ``DataFrame``.

- The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
- It is generally faster than apply because it can take advantage of pandas' internal optimizations.
+ In general, ``map`` will be slow, as it will not make use of vectorization. Instead, a Python
+ function call for each value will be required, which will slow down things significantly if
+ working with medium or large data.

- When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
+ When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.

- .. code-block:: python
+ :meth:`Series.apply` and :meth:`DataFrame.apply`
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-     from sklearn.linear_model import LinearRegression
+ The :meth:`apply` method allows you to apply UDFs for a whole column or row. This is different
+ from :meth:`map` in that the function will be called for each column (or row), not for each individual value.

-     df = pd.DataFrame({
-         'group': ['A', 'A', 'A', 'B', 'B', 'B'],
-         'x': [1, 2, 3, 1, 2, 3],
-         'y': [2, 4, 6, 1, 2, 1.5]
-     }).set_index("x")
+ .. ipython:: python

-     # Function to fit a model to each group
-     def fit_model(group):
-         x = group.index.to_frame()
-         y = group
-         model = LinearRegression()
-         model.fit(x, y)
-         pred = model.predict(x)
-         return pred
+     temperature_celsius = pd.DataFrame({
+         "NYC": [14, 21, 23],
+         "Los Angeles": [22, 28, 31],
+     })

-     result = df.groupby('group').transform(fit_model)
+     def to_fahrenheit(column):
+         return column * (9 / 5) + 32
+
+     temperature_celsius.apply(to_fahrenheit)
+
+ In the example, ``to_fahrenheit`` will be called only twice, as opposed to the 6 times with :meth:`map`.
+ This will be faster than using :meth:`map`, since the operations for each column are vectorized, and the
+ overhead of iterating over data in Python and calling Python functions is significantly reduced.
+
+ In some cases, the function may require all the data to be able to compute the result. So :meth:`apply`
+ is needed, since with :meth:`map` the function can only access one element at a time.
+
+ .. ipython:: python
+
+     temperature = pd.DataFrame({
+         "NYC": [14, 21, 23],
+         "Los Angeles": [22, 28, 31],
+     })
+
+     def normalize(column):
+         return column / column.mean()
+
+     temperature.apply(normalize)
+
+ In the example, the ``normalize`` function needs to compute the mean of the whole column in order
+ to divide each element by it. So, we cannot call the function for each element, but we need the
+ function to receive the whole column.
+
+ :meth:`apply` can also execute a function by row, by specifying ``axis=1``.
+
+ .. ipython:: python
+
+     temperature = pd.DataFrame({
+         "NYC": [14, 21, 23],
+         "Los Angeles": [22, 28, 31],
+     })
+
+     def hotter(row):
+         return row["Los Angeles"] - row["NYC"]
+
+     temperature.apply(hotter, axis=1)
+
+ In the example, the function ``hotter`` will be called 3 times, once for each row. And each
+ call will receive the whole row as the argument, allowing computations that require more than
+ one value in the row.
+
+ ``apply`` is also available for :meth:`SeriesGroupBy.apply`, :meth:`DataFrameGroupBy.apply`,
+ :meth:`Rolling.apply`, :meth:`Expanding.apply` and :meth:`Resampler.apply`. You can read more
+ about ``apply`` in groupby operations :ref:`groupby.apply`.
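As a rough sketch of the groupby variant: the function passed to ``SeriesGroupBy.apply`` receives all the values of one group at a time. The ``readings`` data and the ``temperature_range`` helper are made up for illustration and are not part of the pandas documentation.

```python
import pandas as pd

# Hypothetical long-format data, one row per city and reading
readings = pd.DataFrame({
    "city": ["NYC", "NYC", "NYC", "Los Angeles", "Los Angeles", "Los Angeles"],
    "celsius": [14, 21, 23, 22, 28, 31],
})

def temperature_range(values):
    # `values` is a Series holding every reading of a single city
    return values.max() - values.min()

# Select one column first, so the UDF is applied through SeriesGroupBy.apply
ranges = readings.groupby("city")["celsius"].apply(temperature_range)
print(ranges)
```

Here ``temperature_range`` runs per group rather than per value, and the result has one entry per city.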
+
+ When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
+ but consider optimizing performance with vectorized operations wherever possible.
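Before reaching for ``apply``, it can be worth checking whether plain arithmetic already covers the case. The sketch below assumes the same ``temperature`` data as the examples above. It shows vectorized expressions that should give the same results as the ``to_fahrenheit``, ``normalize`` and ``hotter`` UDFs; the variable names are illustrative only.

```python
import pandas as pd

temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

# Counterpart of temperature.apply(to_fahrenheit): arithmetic on the whole frame
fahrenheit = temperature * (9 / 5) + 32

# Counterpart of temperature.apply(normalize): mean() returns one value per
# column, and the division aligns those values with the matching columns
normalized = temperature / temperature.mean()

# Counterpart of temperature.apply(hotter, axis=1): column-wise subtraction
difference = temperature["Los Angeles"] - temperature["NYC"]
```

None of these expressions calls a Python function per column or per row, which is why they tend to scale better than the UDF versions.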
+
+ :meth:`DataFrame.pipe`
+ ~~~~~~~~~~~~~~~~~~~~~~
+
+ The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
+ It is a helpful tool for organizing complex data processing workflows.
+
+ When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
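A minimal sketch of such a pipeline, reusing the temperature data from the earlier examples. The two step functions, ``to_fahrenheit`` (here taking a whole ``DataFrame``) and ``add_difference``, are hypothetical helpers written for this illustration.

```python
import pandas as pd

temperature_celsius = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

def to_fahrenheit(df):
    # Receives the whole DataFrame and returns a converted copy
    return df * (9 / 5) + 32

def add_difference(df, left, right):
    # Hypothetical helper: adds a column holding `left` minus `right`
    return df.assign(difference=df[left] - df[right])

# Each pipe() call passes the current DataFrame as the first argument;
# extra keyword arguments are forwarded to the function
result = (
    temperature_celsius
    .pipe(to_fahrenheit)
    .pipe(add_difference, left="Los Angeles", right="NYC")
)
print(result)
```

The same result could be written as ``add_difference(to_fahrenheit(temperature_celsius), ...)``, but the chained form reads in the order the steps run.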

:meth:`DataFrame.filter`
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -199,20 +254,43 @@ When to use: Use :meth:`filter` when you want to use a UDF to create a subset of
Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
for example, by using list comprehensions.

- :meth:`DataFrame.map`
+ :meth:`DataFrame.agg`
~~~~~~~~~~~~~~~~~~~~~

- The :meth:`map` method is used specifically to apply element-wise UDFs.
+ If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
+ specifically designed for aggregation operations.

- When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
+ When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
+ a scalar value on each input.
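For instance, a custom reducer might look like the sketch below. The ``value_range`` function is a made-up example, and the data mirrors the temperature examples used elsewhere on this page.

```python
import pandas as pd

temperature = pd.DataFrame({
    "NYC": [14, 21, 23],
    "Los Angeles": [22, 28, 31],
})

def value_range(column):
    # Custom reducer: collapses a whole column into a single scalar
    return column.max() - column.min()

# One scalar per column, returned as a Series indexed by column name
summary = temperature.agg(value_range)
print(summary)
```

Because the function returns a single value for each column, the result is a ``Series`` with one entry per column rather than a ``DataFrame`` of the original shape.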

- :meth:`DataFrame.pipe`
- ~~~~~~~~~~~~~~~~~~~~~~
+ :meth:`DataFrame.transform`
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
- It is a helpful tool for organizing complex data processing workflows.
+ The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
+ It is generally faster than apply because it can take advantage of pandas' internal optimizations.

- When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
+ When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
+
+ .. code-block:: python
+
+     from sklearn.linear_model import LinearRegression
+
+     df = pd.DataFrame({
+         'group': ['A', 'A', 'A', 'B', 'B', 'B'],
+         'x': [1, 2, 3, 1, 2, 3],
+         'y': [2, 4, 6, 1, 2, 1.5]
+     }).set_index("x")
+
+     # Function to fit a model to each group
+     def fit_model(group):
+         x = group.index.to_frame()
+         y = group
+         model = LinearRegression()
+         model.fit(x, y)
+         pred = model.predict(x)
+         return pred
+
+     result = df.groupby('group').transform(fit_model)


Performance