From 6bad2ff4ca1276099dcb727fe119e6f8c09e8d5b Mon Sep 17 00:00:00 2001
From: Kostya Farber
Date: Sat, 17 Dec 2022 13:37:13 +0000
Subject: [PATCH 1/2] DOC: add to io documentation of on_bad_lines

---
 doc/source/user_guide/io.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 3dcc52fb63eb7..9ba537074436f 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -1254,6 +1254,8 @@ The bad line will be a list of strings that was split by the ``sep``:
 
 .. versionadded:: 1.4.0
 
+Note that the callable function will handle only a line with too many fields.
+Bad lines caused by other errors will be silently skipped.
 
 You can also use the ``usecols`` parameter to eliminate extraneous column data that
 appear in some lines but not others:

From 2262bab51511ac1920729424d3d6dbbaf80bf339 Mon Sep 17 00:00:00 2001
From: Kostya Farber
Date: Fri, 30 Dec 2022 08:44:11 +0000
Subject: [PATCH 2/2] DOC: add example of silently skipped line

---
 doc/source/user_guide/io.rst | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 9ba537074436f..c213c3dc69bf1 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -1257,6 +1257,19 @@ The bad line will be a list of strings that was split by the ``sep``:
 Note that the callable function will handle only a line with too many fields.
 Bad lines caused by other errors will be silently skipped.
 
+For example:
+
+.. code-block:: ipython
+
+    def bad_lines_func(line):
+        print(line)
+
+    data = 'name,type\nname a,a is of type a\nname b,"b\" is of type b"'
+    data
+    pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")
+
+The line was not processed in this case, as a "bad line" here is caused by an escape character.
+
 You can also use the ``usecols`` parameter to eliminate extraneous column data that
 appear in some lines but not others:
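
The rule documented by this patch, that the ``on_bad_lines`` callable only sees lines with too many fields, can be illustrated with a small sketch. It is not part of the patch. The sample data, the ``seen`` list and the truncating behaviour of ``bad_lines_func`` are hypothetical choices made for illustration, and the sketch assumes pandas 1.4.0 or later with ``engine="python"``.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: the second data row has one field too many,
# the third data row has one field too few.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9\n10,11,12"

seen = []  # every line that pandas hands to the callable


def bad_lines_func(line):
    """Record the bad line and keep only its first three fields."""
    seen.append(line)
    return line[:3]


# The data string is wrapped in StringIO so that read_csv treats it as
# file contents rather than as a path.
df = pd.read_csv(StringIO(data), on_bad_lines=bad_lines_func, engine="python")

# Only the over-long row reached the callable; the short row was padded
# with a missing value instead of being reported as a bad line.
print(seen)
print(df)
```

Returning ``None`` from the callable instead of a list would drop the over-long row rather than repair it.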