ENH: Allow callable for on_bad_lines in read_csv when engine="python" #45146
Changes from 7 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
from textwrap import fill | ||
from typing import ( | ||
Any, | ||
Callable, | ||
NamedTuple, | ||
) | ||
import warnings | ||
|
@@ -354,7 +355,7 @@ | |
.. deprecated:: 1.3.0 | ||
The ``on_bad_lines`` parameter should be used instead to specify behavior upon | ||
encountering a bad line instead. | ||
on_bad_lines : {{'error', 'warn', 'skip'}}, default 'error' | ||
on_bad_lines : str or callable, default 'error' | ||
|
||
Specifies what to do upon encountering a bad line (a line with too many fields). | ||
Allowed values are : | ||
|
||
|
@@ -364,6 +365,12 @@ | |
|
||
.. versionadded:: 1.3.0 | ||
|
||
- callable, function with signature ``(bad_line: list[str]) -> list[str]`` | ||
that will process a single bad line. ``bad_line`` is a list of strings | ||
am I right in thinking the output
Added a test to check this behavior.
Technically it can return a list of Hashables; this should not be an issue. We should document that the fallback behavior is a warning. |
||
split by the ``sep``. Only supported when ``engine="python"`` | ||
|
||
.. versionadded:: 1.4.0 | ||
|
||
delim_whitespace : bool, default False | ||
Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be | ||
used as the sep. Equivalent to setting ``sep='\\s+'``. If this option | ||
|
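The docstring change above describes the callable form of ``on_bad_lines``. A minimal usage sketch follows, assuming pandas 1.4.0 or later is installed. The handler name `keep_first_two_fields` and the sample data are hypothetical and not part of the PR.

```python
from io import StringIO

import pandas as pd


def keep_first_two_fields(bad_line: list[str]) -> list[str]:
    # Hypothetical handler: receives the bad line already split by `sep`
    # and returns only as many fields as the header defines (two here).
    return bad_line[:2]


# The third line has three fields for a two-column header, so it is "bad".
data = "a,b\n1,2\n3,4,5\n6,7\n"

df = pd.read_csv(
    StringIO(data),
    engine="python",  # the diff only accepts a callable with this engine
    on_bad_lines=keep_first_two_fields,
)
print(df)
```

With this handler the overlong row is trimmed rather than raising the default error. Per the validation added in this diff, passing the same callable with a different engine raises a ``ValueError``.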
@@ -1367,7 +1374,7 @@ def _refine_defaults_read( | |
sep: str | object, | ||
error_bad_lines: bool | None, | ||
warn_bad_lines: bool | None, | ||
on_bad_lines: str | None, | ||
on_bad_lines: str | Callable | None, | ||
names: ArrayLike | None | object, | ||
prefix: str | None | object, | ||
defaults: dict[str, Any], | ||
|
@@ -1399,7 +1406,7 @@ def _refine_defaults_read( | |
Whether to error on a bad line or not. | ||
warn_bad_lines : str or None | ||
Whether to warn on a bad line or not. | ||
on_bad_lines : str or None | ||
on_bad_lines : str, callable or None | ||
An option for handling bad lines or a sentinel value(None). | ||
names : array-like, optional | ||
List of column names to use. If the file contains a header row, | ||
|
@@ -1503,6 +1510,12 @@ def _refine_defaults_read( | |
kwds["on_bad_lines"] = ParserBase.BadLineHandleMethod.WARN | ||
elif on_bad_lines == "skip": | ||
kwds["on_bad_lines"] = ParserBase.BadLineHandleMethod.SKIP | ||
elif callable(on_bad_lines): | ||
if engine != "python": | ||
raise ValueError( | ||
"on_bad_line can only be a callable function if engine='python'" | ||
) | ||
kwds["on_bad_lines"] = on_bad_lines | ||
else: | ||
raise ValueError(f"Argument {on_bad_lines} is invalid for on_bad_lines") | ||
else: | ||
|
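The branch added to `_refine_defaults_read` can be sketched in isolation as below. This is a standalone illustration, not the pandas implementation: `BadLineHandleMethod` here is a local stand-in for `ParserBase.BadLineHandleMethod`, `resolve_on_bad_lines` is a hypothetical helper name, and the `"error"` and `"warn"` comparisons are assumed to mirror the `"skip"` branch visible in the hunk.

```python
from enum import Enum
from typing import Callable, Union


class BadLineHandleMethod(Enum):
    # Local stand-in for ParserBase.BadLineHandleMethod (values illustrative).
    ERROR = 0
    WARN = 1
    SKIP = 2


def resolve_on_bad_lines(
    on_bad_lines: Union[str, Callable], engine: str
) -> Union[BadLineHandleMethod, Callable]:
    """Map the user-facing argument to the value stored in the parser kwargs."""
    if on_bad_lines == "error":
        return BadLineHandleMethod.ERROR
    if on_bad_lines == "warn":
        return BadLineHandleMethod.WARN
    if on_bad_lines == "skip":
        return BadLineHandleMethod.SKIP
    if callable(on_bad_lines):
        # A callable is passed through unchanged, but only for the python engine.
        if engine != "python":
            raise ValueError(
                "on_bad_line can only be a callable function if engine='python'"
            )
        return on_bad_lines
    raise ValueError(f"Argument {on_bad_lines} is invalid for on_bad_lines")


def handler(bad_line):
    # Hypothetical handler used only to exercise the callable branch.
    return bad_line


print(resolve_on_bad_lines("skip", "c"))
print(resolve_on_bad_lines(handler, "python") is handler)
```

Checking the string options before `callable()` keeps the existing behaviour untouched, and the final `else` still rejects any other value.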