@@ -1151,34 +1151,91 @@ parameter that is by default ``False`` and copies the underlying data. Pass
1151
1151
The Panel class has a related :meth:`~Panel.rename_axis` method which can rename
1152
1152
any of its three axes.
1153
1153
1154
+ .. _basics.iteration:
1155
+
1154
1156
Iteration
1155
1157
---------
1156
1158
1157
- Because Series is array-like, basic iteration produces the values. Other data
1158
- structures follow the dict-like convention of iterating over the "keys" of the
1159
- objects. In short:
1159
+ The behavior of basic iteration over pandas objects depends on the type.
1160
+ When iterating over a Series, it is regarded as array-like, and basic iteration
1161
+ produces the values. Other data structures, like DataFrame and Panel,
1162
+ follow the dict-like convention of iterating over the "keys" of the
1163
+ objects.
1164
+
1165
+ In short, basic iteration (``for i in object``) produces:
1160
1166
1161
- * **Series**: values
1162
- * **DataFrame**: column labels
1163
- * **Panel**: item labels
1167
+ * **Series**: values
1168
+ * **DataFrame**: column labels
1169
+ * **Panel**: item labels
1164
1170
1165
- Thus, for example:
1171
+ Thus, for example, iterating over a DataFrame gives you the column names:
1166
1172
1167
1173
.. ipython::
1168
1174
1169
- In [0]: for col in df:
1170
- ...: print(col)
1171
- ...:
1175
+ In [0]: df = pd.DataFrame({'col1' : np.random.randn(3), 'col2' : np.random.randn(3)},
1176
+ ...: index=['a', 'b', 'c'])
1177
+
1178
+ In [0]: for col in df:
1179
+ ...: print(col)
1180
+ ...:
1181
+
1182
+ Pandas objects also have the dict-like :meth:`~DataFrame.iteritems` method to
1183
+ iterate over the (key, value) pairs.
1184
+
1185
+ To iterate over the rows of a DataFrame, you can use the following methods:
1186
+
1187
+ * :meth:`~DataFrame.iterrows`: Iterate over the rows of a DataFrame as (index, Series) pairs.
1188
+ This converts the rows to Series objects, which can change the dtypes and has some
1189
+ performance implications.
1190
+ * :meth:`~DataFrame.itertuples`: Iterate over the rows of a DataFrame as tuples of the values.
1191
+ This is a lot faster than :meth:`~DataFrame.iterrows`, and is in most cases the preferable
1192
+ way to iterate over the values of a DataFrame.
1193
+
1194
+ .. warning::
1195
+
1196
+ Iterating through pandas objects is generally **slow**. In many cases,
1197
+ iterating manually over the rows is not needed and can be avoided with
1198
+ one of the following approaches:
1199
+
1200
+ * Look for a *vectorized* solution: many operations can be performed using
1201
+ built-in methods, NumPy functions, or (boolean) indexing.
1202
+
1203
+ * When you have a function that cannot work on the full DataFrame/Series
1204
+ at once, it is better to use :meth:`~DataFrame.apply` instead of iterating
1205
+ over the values. See the docs on :ref:`function application <basics.apply>`.
1206
+
1207
+ * If you need to do iterative manipulations on the values but performance is
1208
+ important, consider writing the inner loop using e.g. Cython or Numba.
1209
+ See the :ref:`enhancing performance <enhancingperf>` section for some
1210
+ examples of this approach.
1211
+
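The three approaches listed above can be compared on a toy problem. The following is a minimal sketch rather than part of the proposed docs change; the frame and its column names ``x`` and ``y`` are made up for illustration. It computes a row-wise product with an explicit loop, with a vectorized expression, and with ``apply``.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0]})

# Explicit Python loop over the rows. Each tuple is (index, x, y),
# so positions 1 and 2 hold the two column values.
looped = [row[1] * row[2] for row in df.itertuples()]

# Vectorized: a single operation on whole columns, no Python-level loop.
vectorized = df['x'] * df['y']

# apply with axis=1: calls the function once per row, for logic that
# cannot be expressed on the full columns at once.
applied = df.apply(lambda r: r['x'] * r['y'], axis=1)

print(looped)
print(vectorized.tolist())
print(applied.tolist())
```

All three produce the same values; the vectorized form is usually the one to try first on larger frames.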
1212
+ .. warning::
1213
+
1214
+ You should **never modify** something you are iterating over.
1215
+ This is not guaranteed to work in all cases. Depending on the
1216
+ data types, the iterator returns a copy and not a view, and writing
1217
+ to it will have no effect!
1218
+
1219
+ For example, in the following case setting the value has no effect:
1220
+
1221
+ .. ipython:: python
1222
+
1223
+ df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
1224
+
1225
+ for index, row in df.iterrows():
1226
+     row['a'] = 10
1227
+
1228
+ df
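As a rough sketch of what to do instead (an illustration, not part of the proposed docs text), the snippet below repeats the loop from the warning on a frame with mixed dtypes and then assigns through the DataFrame itself. The name ``updated`` is introduced here only for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})

# With mixed dtypes, each row yielded by iterrows() is a separate Series,
# so writing to it leaves the DataFrame untouched.
for index, row in df.iterrows():
    row['a'] = 10

unchanged = df['a'].tolist()

# Assigning through the DataFrame does change the data.
updated = df.copy()
updated['a'] = 10

print(unchanged)
print(updated['a'].tolist())
```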
1172
1229
1173
1230
iteritems
1174
1231
~~~~~~~~~
1175
1232
1176
1233
Consistent with the dict-like interface, :meth:`~DataFrame.iteritems` iterates
1177
1234
through key-value pairs:
1178
1235
1179
- * **Series**: (index, scalar value) pairs
1180
- * **DataFrame**: (column, Series) pairs
1181
- * **Panel**: (item, DataFrame) pairs
1236
+ * **Series**: (index, scalar value) pairs
1237
+ * **DataFrame**: (column, Series) pairs
1238
+ * **Panel**: (item, DataFrame) pairs
1182
1239
1183
1240
For example:
1184
1241
@@ -1189,22 +1246,46 @@ For example:
1189
1246
...: print(frame)
1190
1247
...:
1191
1248
1192
-
1193
1249
.. _basics.iterrows:
1194
1250
1195
1251
iterrows
1196
1252
~~~~~~~~
1197
1253
1198
- New in v0.7 is the ability to iterate efficiently through rows of a
1199
- DataFrame with :meth:`~DataFrame.iterrows`. It returns an iterator yielding each
1254
+ :meth:`~DataFrame.iterrows` allows you to iterate through the rows of a
1255
+ DataFrame as Series objects. It returns an iterator yielding each
1200
1256
index value along with a Series containing the data in each row:
1201
1257
1202
1258
.. ipython::
1203
1259
1204
- In [0]: for row_index, row in df2.iterrows():
1260
+ In [0]: for row_index, row in df.iterrows():
1205
1261
...: print('%s\n%s' % (row_index, row))
1206
1262
...:
1207
1263
1264
+ .. note::
1265
+
1266
+ Because :meth:`~DataFrame.iterrows` returns a Series for each row,
1267
+ it does **not** preserve dtypes across the rows (dtypes are
1268
+ preserved across columns for DataFrames). For example,
1269
+
1270
+ .. ipython:: python
1271
+
1272
+ df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
1273
+ df_orig.dtypes
1274
+ row = next(df_orig.iterrows())[1]
1275
+ row
1276
+
1277
+ All values in ``row``, returned as a Series, are now upcast
1278
+ to floats, including the original integer value in column ``int``:
1279
+
1280
+ .. ipython:: python
1281
+
1282
+ row['int'].dtype
1283
+ df_orig['int'].dtype
1284
+
1285
+ To preserve dtypes while iterating over the rows, it is better
1286
+ to use :meth:`~DataFrame.itertuples`, which returns tuples of the values
1287
+ and which is generally much faster than ``iterrows``.
1288
+
1208
1289
For instance, a contrived way to transpose the DataFrame would be:
1209
1290
1210
1291
.. ipython:: python
@@ -1216,36 +1297,29 @@ For instance, a contrived way to transpose the DataFrame would be:
1216
1297
df2_t = pd.DataFrame(dict((idx, values) for idx, values in df2.iterrows()))
1217
1298
print(df2_t)
1218
1299
1219
- .. note::
1220
-
1221
- ``iterrows`` does **not** preserve dtypes across the rows (dtypes are
1222
- preserved across columns for DataFrames). For example,
1223
-
1224
- .. ipython:: python
1225
-
1226
- df_iter = pd.DataFrame([[1, 1.0]], columns=['x', 'y'])
1227
- row = next(df_iter.iterrows())[1]
1228
- print(row['x'].dtype)
1229
- print(df_iter['x'].dtype)
1230
-
1231
1300
itertuples
1232
1301
~~~~~~~~~~
1233
1302
1234
- The :meth:`~DataFrame.itertuples` method will return an iterator yielding a tuple for each row in the
1235
- DataFrame. The first element of the tuple will be the row's corresponding index
1236
- value, while the remaining values are the row values proper.
1303
+ The :meth:`~DataFrame.itertuples` method will return an iterator
1304
+ yielding a tuple for each row in the DataFrame. The first element
1305
+ of the tuple will be the row's corresponding index value,
1306
+ while the remaining values are the row values.
1237
1307
1238
1308
For instance,
1239
1309
1240
1310
.. ipython:: python
1241
1311
1242
- for r in df2.itertuples():
1243
-     print(r)
1312
+ for row in df.itertuples():
1313
+     print(row)
1314
+
1315
+ This method does not convert the row to a Series object but just returns the
1316
+ values inside a tuple. Therefore, :meth:`~DataFrame.itertuples` preserves the
1317
+ data type of the values and is generally faster than :meth:`~DataFrame.iterrows`.
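The dtype difference described above can be checked directly. This is a small sketch for illustration, reusing the ``df_orig`` frame from the ``iterrows`` note; the variable names ``row`` and ``values`` are chosen here and are not part of the pandas API.

```python
import numbers

import pandas as pd

df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# iterrows() packs the whole row into one Series, so the integer
# is upcast to a float to share a dtype with the other value.
_, row = next(df_orig.iterrows())
iterrows_is_float = isinstance(row['int'], float)

# itertuples() yields the values column by column; index=False leaves
# out the leading index entry, so position 0 is the 'int' column.
values = next(df_orig.itertuples(index=False))
itertuples_is_integer = isinstance(values[0], numbers.Integral)

print(iterrows_is_float, itertuples_is_integer)
```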
1244
1318
1245
1319
.. _basics.dt_accessors:
1246
1320
1247
1321
.dt accessor
1248
- ~~~~~~~~~~~~
1322
+ ------------
1249
1323
1250
1324
``Series`` has an accessor to succinctly return datetime-like properties for the
1251
1325
*values* of the Series, if it is a datetime/period-like Series.