Description
This ticket is an outgrowth of a discussion in pull request #22587
By my rough count, the read_csv
method has nearly 50 keyword arguments.
Of those, 32 arguments are made up or two or more words. Twenty of those multi-word arguments use an underscore to mark the space between words, like skip_blank_lines
and parse_dates
. Twelve do not, like chunksize
and lineterminator
.
It is my opinion this is a small flaw in pandas' API, and that the library would benefit by standardizing how spaces are handled. It would make pandas more legible and consistent, and therefore easier for users of all experience levels.
I have taught pandas to dozens of newbies across the country and I can testify from experience that small variations in the naming style of commonly used methods introduces unnecessary frustration, and
can even reduce user confidence in the quality of the overall product.
As a frequent user of pandas, I can also attest that the inconsistencies require me, someone who uses the library daily, to routinely consult the documentation to ensure I use the proper kwarg naming style.
I am sympathetic to the desire to maintain backwards compatibility, which I believe could be managed with deprecation warnings that, if included, could be temporary, and ultimately removed in a future version, much in the way sort_values
was introduced.
Since the underscore method of handling word breaks is more common and more legible, I propose it be adopted. All existing multi-word arguments without an underscore would need to be modified. You can find an experimental patch of the skiprows
kwargs, and considerable support from other users for pursuing this type of change, in #22587.
If that pull request is ultimately merged, and the maintainers agree with the larger goal I've tried to articulate here, I would be pleased to lead an effort to expand whatever design pattern is agreed upon to other keyword arguments across the library.