DOC: discrepancies between the read_csv docstring and the docstring guide or type hints

### Pandas version checks

- [X] I have checked that the issue still exists on the latest versions of the docs on `main` [here](https://pandas.pydata.org/docs/dev/)


### Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

### Documentation problem

I have been reviewing the docstring for `read_csv` to add some clarifying edits in a PR and encountered some questions while reviewing the parameter descriptions. I found there were a few cases where the parameter types summary did not appear to align with the docstring specifications, as well as a few cases where the description of the types seems to differ from the type hints in the function signature. 

I have made an attempt at describing those discrepancies here and have added a few questions in as well just to get clarification on how things are done. Once I get things figured out here, I'll plan to make any appropriate corrections to the docstring in a PR.

# Parameters potentially needing edits

## sep:

* **type hint**:   `str | None | lib.NoDefault = lib.no_default`
* **description**: *str, default ','*

Why is the default not just `','` rather than `lib.no_default`?
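
One possible reason, sketched here under assumptions rather than taken from the pandas source: a sentinel default lets a function tell "argument omitted" apart from any real value, including `','` and `None`. That matters when two parameters are aliases and passing both should be an error. The names `_NoDefault`, `_NO_DEFAULT` and `resolve_separator` below are hypothetical and only illustrate the pattern.

```python
import enum


class _NoDefault(enum.Enum):
    """Hypothetical sentinel type, standing in for something like lib.NoDefault."""

    no_default = "NO_DEFAULT"


_NO_DEFAULT = _NoDefault.no_default


def resolve_separator(sep=_NO_DEFAULT, delimiter=None):
    """Illustrative only: pick the effective separator from two alias parameters."""
    sep_was_passed = sep is not _NO_DEFAULT
    if delimiter is not None and sep_was_passed:
        # With a plain ',' default we could not tell whether the caller
        # explicitly passed sep=',' alongside delimiter.
        raise ValueError("Specify only one of 'sep' or 'delimiter'")
    if delimiter is not None:
        return delimiter
    # Fall back to ',' only when sep was genuinely omitted.
    return sep if sep_was_passed else ","
```

With this shape, `resolve_separator()` falls back to `','`, while `resolve_separator(sep=',', delimiter=';')` raises, even though the explicit `sep` equals the documented default.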


## delimiter:

* **type hint**:   `str | None | lib.NoDefault = None`
* **description**: *str, default None*

It appears that the description should be *str, optional* rather than *str, default None*.

Also, I noticed for most parameters if the type hint includes `lib.NoDefault` then the default value is `lib.no_default` but this one has `None` instead. What is the reason this one is different? (And I have more questions about the `no_default` flag later on.)


## header:

* **type hint**:   `int | Sequence[int] | None | Literal["infer"] = "infer"`
* **description**: *int, list of int, None, default 'infer'*

I would expect *'infer'* to occur in the list of options before being highlighted as the default, like so:
*int, list of int, 'infer' or None, default 'infer'*


## names:

* **type hint**:   `Sequence[Hashable] | None | lib.NoDefault = lib.no_default`
* **description**: *array-like, optional*

Is there a reason it is described as *array-like*? It seems to only need to be a sequence of unique names which would be more like an ordered set, or for simplicity maybe just *list-like*? Or should it really be *list of Hashable* to be specific? Or even better *Sequence of Hashable*?


## index_col:

* **type hint**:   `IndexLabel | Literal[False] | None = None`
* **description**: *int, str, sequence of int / str, or False, optional, default None*

It appears the *default None* is redundant since *optional* is already specified, so I assume this can be removed. Is that correct? Something like:
*int, str, list of int, list of str or False, optional*

Also, I notice that *list* is used in the descriptions rather than *sequence* in most cases, although *sequence* might be more accurate and descriptive. I'm wondering if there's a reason it is used less often. This seems to be the only parameter that employs *sequence* in its description.


## usecols:

* **type hint**:   `list[HashableT] | Callable[[Hashable], bool] | None = None`
* **description**: *list-like or callable, optional*

Since the order of this parameter does not affect the result, and presumably only unique values are allowed, I am wondering if it would be clearer to describe it as *set-like*? *List-like* still seems appropriate but just wondering if it could be more specific to highlight the idea that order will not be preserved.

## dtype:

* **type hint**:   `DtypeArg | None = None`
* **description**: *Type name or dict of column -> type, optional*

From the docstring guide, it looks like the description of a dict should refer to its data types, not necessarily their meaning. And it says to use *dict of {key : value}* so would something like this be more appropriate?
*type, Hashable or dict of {Hashable : type or str}, optional*


## engine:

* **type hint**:   `CSVEngine | None = None`
* **description**: *{'c', 'python', 'pyarrow'}, optional*

This description does not appear to specify the behavior when `None` is passed. Does it default to the 'c' engine? It seems like older versions used to say this, but I don't see it explicitly described in the current version. And if so, would it then make sense to define the type hint as follows?
`CSVEngine = 'c'`


## converters:

* **type hint**:   `Mapping[Hashable, Callable] | None = None`
* **description**: *dict, optional*

The docstring guide suggests this description should be:
*dict of {Hashable : Callable}*



## skiprows:

* **type hint**:   `list[int] | int | Callable[[Hashable], bool] | None = None`
* **description**: *list-like, int or callable, optional*

This could be made more specific by changing to:
*list of int, int or Callable, optional*


## na_values:

* **type hint**:   `Sequence[str] | Mapping[str, Sequence[str]] | None = None`
* **description**: *scalar, str, list-like, or dict, optional*

The mention of *scalar* as a valid type is not clear to me; what would that be referring to? Also, it says that *str* is a valid input; however, the type hint does not include that as an option. What should be changed to resolve this?


## parse_dates:

* **type hint**:   `bool | Sequence[Hashable] | None = None`
* **description**: *bool or list of int or names or list of lists or dict, default False*

This also appears to not align with the type hint which says you can only assign either a bool or a list. Also the default seems to actually be `None`, yet the description says it's `False`. Looking for clarification here.


## infer_datetime_format:

* **type hint**:   `bool | lib.NoDefault = lib.no_default`
* **description**: *bool, default False*

It appears that the default is technically `lib.no_default` but is described as `False`. What is the reason for doing this rather than assigning its default as `False`, such as how `keep_date_col` is defined?
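
A common reason for this pattern in general, offered here as a guess and not confirmed against the pandas source, is deprecation: a sentinel default makes it possible to warn only when the caller actually passes the argument. The function `parse_with_legacy_flag` and the sentinel `_UNSET` are hypothetical.

```python
import warnings

_UNSET = object()  # hypothetical sentinel, playing the role of lib.no_default


def parse_with_legacy_flag(text, legacy_flag=_UNSET):
    """Illustrative only: warn when a deprecated keyword is passed explicitly."""
    if legacy_flag is not _UNSET:
        # Fires even for legacy_flag=False, which a plain False default
        # could not distinguish from "argument omitted".
        warnings.warn(
            "'legacy_flag' is deprecated and has no effect",
            FutureWarning,
            stacklevel=2,
        )
    return text.strip()
```

If that is the motivation, the docstring could reasonably keep describing the effective default (`False`) while the signature carries the sentinel.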


## date_parser:

* **type hint**:   `Callable | lib.NoDefault = lib.no_default`
* **description**: *function, optional*

Could be updated to use a specific type, i.e.:
*Callable, optional*

Also, what is the reason for using `lib.no_default` instead of just `None`?


## date_format:

* **type hint**:   `str | None = None`
* **description**: *str or dict of column -> format, default None*

The type hint does not appear to match the description here as there is no mention of `dict` being a valid argument. Is the description out of date or vice versa?

If the description is accurate, then, similar to `dtype`, this could be updated as:
*str or dict of {Hashable : str}, optional*


## compression:

* **type hint**:   `CompressionOptions = "infer"`
* **description**: *str or dict, default 'infer'*

Could the `dict` in the description be updated as `dict of {k : v}`? If so, what would the valid types be for k and v, as I was not totally clear about it from the description.


## thousands:

* **type hint**:   `str | None = None`
* **description**: *str, optional*

I assume this should be a *str (length 1)* like some other parameters. Should that be updated?


## decimal:

* **type hint**:   `str = "."`
* **description**: *str, default '.'*

I assume this should be a *str (length 1)* like some other parameters. Should that be updated?


## quoting:

* **type hint**:   `int = csv.QUOTE_MINIMAL`
* **description**: *int or csv.QUOTE_\* instance, default 0*

It would make sense to add some context to the default here, maybe something like:
*int or csv.QUOTE_\* instance, default 0 meaning csv.QUOTE_MINIMAL*
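
For context, the integer values come from Python's standard `csv` module, so the mapping between the numbers and the constant names can be checked directly. The dictionary name `quote_constants` is only for illustration.

```python
import csv

# Map each csv.QUOTE_* constant listed here to its integer value.
quote_constants = {
    name: getattr(csv, name)
    for name in ("QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE")
}

for name, value in quote_constants.items():
    print(f"csv.{name} = {value}")
```

Since `csv.QUOTE_MINIMAL` equals `0`, "default 0" and "default csv.QUOTE_MINIMAL" describe the same value; the question is only which spelling is clearer to readers.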



## comment:

* **type hint**:   `str | None = None`
* **description**: *str, optional*

The description says this should be a single character so it should be changed to:
*str (length 1), optional*

## encoding:

* **type hint**:   `str | None = None`
* **description**: *str, optional, default "utf-8"*


The default is technically `None` but the description says *optional, default 'utf-8'*. Is there a reason the default is not just 'utf-8' instead of None?


## encoding_errors:

* **type hint**:   `str | None = "strict"`
* **description**: *str, optional, default "strict"*

I am wondering why this is both optional and has a default. What is the behavior when `None` is passed?
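
As a point of comparison, Python's own text layer treats `errors=None` the same as `errors="strict"`. Whether `read_csv` simply forwards the value is an assumption I have not verified; the sketch below only shows the standard-library behavior, and `decode_with` is a hypothetical helper.

```python
import io


def decode_with(raw: bytes, errors):
    """Decode bytes as UTF-8 through io.TextIOWrapper with the given error handler."""
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", errors=errors)
    return wrapper.read()


bad = b"ab\xffcd"  # 0xff is never valid in UTF-8

# Both None and "strict" raise on the invalid byte.
for handler in (None, "strict"):
    try:
        decode_with(bad, handler)
    except UnicodeDecodeError:
        print(f"errors={handler!r}: UnicodeDecodeError")

print(repr(decode_with(bad, "replace")))  # invalid byte becomes U+FFFD
print(repr(decode_with(bad, "ignore")))   # invalid byte is dropped
```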



## on_bad_lines:

* **type hint**:   `str = "error"`
* **description**: *{'error', 'warn', 'skip'} or callable, default 'error'*


The type hint does not mention Callable as an option as described in the docstring. Does this need to be updated?


## float_precision:

* **type hint**:   `Literal["high", "legacy"] | None = None`
* **description**: *str, optional*

It looks like the description should not be the generic `str`, but rather:
*{'high', 'legacy'}, optional*

## dtype_backend:

* **type hint**:   `DtypeBackend | lib.NoDefault = lib.no_default`
* **description**: *{"numpy_nullable", "pyarrow"}, defaults to NumPy backed DataFrames*

Given that this defaults to NumPy backed DataFrames, why is `lib.no_default` used rather than just `'numpy_nullable'`?


# General Questions:

Here is a summary of the most common questions that came up while reviewing the parameters:

* Why are some 'optional' parameters not assigned `None` and instead use `lib.no_default` (e.g., `delimiter`, `index_col`, `date_format`)?
* Why are some values with defined default behavior not assigned a literal that would trigger that behaviour but instead use `None` or `lib.no_default` as a flag (e.g., `encoding`, `dtype_backend`)?
* What is the appropriate way to describe a 'list-like' object? Is there a best practice for using 'list of <type>' vs. 'list-like' vs. 'array-like' vs. 'sequence'?
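
To make the last question concrete, here is how a few common containers fare against the standard `collections.abc` definitions. This says nothing about how pandas itself defines 'list-like' or 'array-like'; it only shows what 'sequence' means in the standard library. The names `candidates` and `classification` are illustrative.

```python
from collections.abc import Iterable, Sequence

candidates = {
    "list": [0, 1],
    "tuple": (0, 1),
    "set": {0, 1},
    "range": range(2),
    "str": "ab",
    "dict keys": {"a": 1}.keys(),
}

# Record, for each object, whether it is a Sequence and whether it is Iterable.
classification = {
    label: (isinstance(obj, Sequence), isinstance(obj, Iterable))
    for label, obj in candidates.items()
}

for label, (is_sequence, is_iterable) in classification.items():
    print(f"{label:10} sequence={is_sequence!s:5} iterable={is_iterable}")
```

Note that a `set` and `dict.keys()` are iterable but not sequences, while a plain `str` is a sequence, which is one reason the choice of term in a parameter description is not purely cosmetic.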


### Suggested fix for documentation

My suggested fix would be to create a PR where I update each parameter description as described in the Documentation Problem section such that the descriptions align with both the docstring guide and the function signature's type hints in every case.
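
To keep such a PR systematic, the signature side of the comparison could be extracted programmatically. The sketch below uses only `inspect` on a hypothetical stand-in function, `fake_reader`, so that it runs without pandas. Pointing the hypothetical helper `describe_parameters` at `pandas.read_csv` should work along the same lines, though I have not checked how decorated or overloaded signatures behave.

```python
import inspect


def fake_reader(path, sep: "str | None" = ",", header: "int | None" = None, *, engine=None):
    """Hypothetical stand-in for a reader function with annotated defaults."""


def describe_parameters(func):
    """Return {name: (annotation, default)} for every parameter of func."""
    empty = inspect.Parameter.empty
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = None if param.annotation is empty else param.annotation
        default = "<required>" if param.default is empty else param.default
        described[name] = (annotation, default)
    return described


for name, (annotation, default) in describe_parameters(fake_reader).items():
    print(f"{name}: type hint={annotation!r}, default={default!r}")
```

The resulting table could then be compared by hand, parameter by parameter, against the type summaries in the docstring.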
