@@ -1295,57 +1295,41 @@ too many fields will raise an error by default:
You can elect to skip bad lines:

- .. code-block:: ipython
-
-     In [29]: pd.read_csv(StringIO(data), on_bad_lines="warn")
-     Skipping line 3: expected 3 fields, saw 4
+ .. ipython:: python

-     Out[29]:
-        a  b   c
-     0  1  2   3
-     1  8  9  10
+     pd.read_csv(StringIO(data), on_bad_lines="warn")

Or pass a callable function to handle the bad line if ``engine="python"``.
The bad line will be a list of strings that was split by the ``sep``:

- .. code-block:: ipython
+ .. versionadded:: 1.4.0
+
+ .. ipython:: python
+
+     external_list = []

-     In [30]: pd.read_csv(StringIO(data), on_bad_lines=lambda x: x[-3:], engine="python")
-     Out[30]:
-        a  b   c
-     0  1  2   3
-     1  5  6   7
-     2  8  9  10
+     def func(line):
+         external_list.append(line)
+         return line[-3:]

- .. versionadded:: 1.4.0
+     pd.read_csv(StringIO(data), on_bad_lines=func, engine="python")

+     external_list

You can also use the ``usecols`` parameter to eliminate extraneous column
data that appear in some lines but not others:

- .. code-block:: ipython
-
-     In [31]: pd.read_csv(StringIO(data), usecols=[0, 1, 2])
+ .. ipython:: python

-     Out[31]:
-        a  b   c
-     0  1  2   3
-     1  4  5   6
-     2  8  9  10
+     pd.read_csv(StringIO(data), usecols=[0, 1, 2])

In case you want to keep all data including the lines with too many fields, you can
specify a sufficient number of ``names``. This ensures that lines with not enough
fields are filled with ``NaN``.

- .. code-block:: ipython
-
-     In [32]: pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd'])
+ .. ipython:: python

-     Out[32]:
-        a  b   c    d
-     0  1  2   3  NaN
-     1  4  5   6    7
-     2  8  9  10  NaN
+     pd.read_csv(StringIO(data), names=['a', 'b', 'c', 'd'])

.. _io.dialect: