API: Avoid hidden numeric heuristics #53781

There are several places where pandas relies on hidden heuristics or thresholds that dictate behavior in ways that are neither obvious nor configurable by the user. IIRC, there have been bugs in `rolling` and `to_datetime` that only surfaced when the data contained a particular value or reached a certain size, which made them hard to diagnose.

Ideally we should:

1. Not change behavior based on introspecting characteristics of the data, or
2. At least expose an option that lets the user control the heuristic
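To make the concern concrete, here is a minimal, hypothetical sketch. The names `_MIN_ELEMENTS` and `total` and the threshold value are invented for illustration and do not come from pandas. With a hidden size threshold, the same kind of input takes a different code path depending on its length, and floating-point results can differ. Exposing the choice as keyword arguments keeps the default heuristic while letting the user override it.

```python
import math

# Hypothetical hidden threshold; not an actual pandas constant.
_MIN_ELEMENTS = 10_000


def total(values, method="auto", min_elements=_MIN_ELEMENTS):
    """Sum floats, choosing the summation algorithm explicitly or by size.

    method="auto" keeps a size-based heuristic, but both its threshold
    (min_elements) and the algorithm itself are visible to the caller.
    """
    if method == "auto":
        method = "fsum" if len(values) >= min_elements else "sum"
    if method == "fsum":
        # Exactly rounded summation
        return math.fsum(values)
    if method == "sum":
        # Plain left-to-right summation, which accumulates rounding error
        return sum(values)
    raise ValueError(f"unknown method: {method!r}")


data = [0.1] * 10
print(total(data))                  # small input: "auto" picks plain sum
print(total(data, min_elements=5))  # lowering the threshold switches to fsum
print(total(data, method="fsum"))   # or pick the algorithm directly
```

With only the hidden constant, a user seeing different results for a 10-element and a 10,000-element input would have no obvious lever to explain or control the difference.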


CSV reading tokenizer chunksize
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/_libs/parsers.pyx#L119

CSV line buffer size
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/_libs/parsers.pyx#L587

Number of elements at which numexpr is used automatically
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/core/computation/expressions.py#L42
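For numexpr, users already have a coarse switch: the `compute.use_numexpr` option turns numexpr acceleration off entirely, but the element-count threshold that decides when numexpr kicks in is still not configurable. A short sketch of the existing switch (the DataFrame is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5, dtype="float64"), "b": np.ones(5)})

# Default: pandas decides internally whether this goes through numexpr.
default_result = df["a"] + df["b"]

# Opt out of numexpr for this block only; the arithmetic result should not change.
with pd.option_context("compute.use_numexpr", False):
    plain_result = df["a"] + df["b"]

print(default_result.equals(plain_result))
```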

Chunk size used when iterating over a `TimedeltaArray` (TDA)
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/core/arrays/timedeltas.py#L387

PyTables-related thresholds (query expression code and the I/O module)
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/core/computation/pytables.py#L101
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/io/pytables.py#L1887

Number of elements above which `to_datetime` automatically uses caching
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/core/tools/datetimes.py#L124
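`to_datetime` does expose a `cache` keyword, but it only grants permission: with the default `cache=True`, pandas still decides internally, based on the input's size and how many of its values are unique, whether to build the cache. Only `cache=False` is a hard override. A small sketch (the date strings are made up) showing that both paths should give the same result:

```python
import pandas as pd

# Heavily repeated strings: the case where caching is meant to pay off.
dates = ["2023-06-21", "2023-06-22"] * 100

cached = pd.to_datetime(dates)                 # cache=True (default): pandas *may* cache
uncached = pd.to_datetime(dates, cache=False)  # caching is never used

print(cached.equals(uncached))
```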

Chunk size to use when writing csv
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/io/formats/csvs.py#L166
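Here the user can already bypass the heuristic: `DataFrame.to_csv` accepts a `chunksize` argument, and the internally computed default only applies when it is left as `None`. A minimal sketch (the frame is invented) checking that an explicit chunk size does not change the output:

```python
import io

import pandas as pd

df = pd.DataFrame({"x": range(10), "y": list("abcdefghij")})

default_buf = io.StringIO()
df.to_csv(default_buf)  # chunk size chosen internally

explicit_buf = io.StringIO()
df.to_csv(explicit_buf, chunksize=3)  # rows written three at a time

print(default_buf.getvalue() == explicit_buf.getvalue())
```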

Number of compiled regexes cached during time parsing (`strptime`)
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/_libs/tslibs/strptime.pyx#L576
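For the regex cache, a bounded cache whose size is a parameter rather than a module constant would address this item. The sketch below is hypothetical: `make_format_compiler`, its default size, and the three-directive translation table are invented for illustration and are far simpler than the real `strptime` format handling. It uses `functools.lru_cache` to bound how many compiled patterns are kept.

```python
import functools
import re

# Toy translation for three strptime directives; the real parser supports many more.
_DIRECTIVES = {
    "%Y": r"(?P<Y>\d{4})",
    "%m": r"(?P<m>\d{1,2})",
    "%d": r"(?P<d>\d{1,2})",
}


def _format_to_regex(fmt):
    """Translate a format string into a compiled regex, escaping literal text."""
    # The capturing group keeps the directives in the output of re.split.
    parts = re.split(r"(%[Ymd])", fmt)
    pattern = "".join(_DIRECTIVES.get(part, re.escape(part)) for part in parts)
    return re.compile(pattern + r"\Z")


def make_format_compiler(maxsize=5):
    """Hypothetical factory: the cache size is an argument, not a hidden constant."""
    return functools.lru_cache(maxsize=maxsize)(_format_to_regex)


compile_format = make_format_compiler(maxsize=2)
match = compile_format("%Y-%m-%d").match("2023-06-21")
print(match.group("Y"), match.group("m"), match.group("d"))

# A third distinct format evicts the least recently used entry.
compile_format("%d/%m/%Y")
compile_format("%Y%m%d")
print(compile_format.cache_info())
```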

Rank tolerance
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/_libs/algos.pyx#L61

Algorithm selection in `isin`
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/core/algorithms.py#L521
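For `isin`, the choice between algorithms could be made explicit instead of being derived silently from the input sizes. A hypothetical pure-Python sketch: the `method` and `cutoff` parameters, the cutoff value, and the two strategies are invented for illustration and are not what pandas does internally.

```python
from bisect import bisect_left


def isin(values, candidates, method="auto", cutoff=1_000):
    """Hypothetical membership test whose algorithm choice is user-visible."""
    if method == "auto":
        # The size heuristic still exists, but its threshold is a parameter.
        method = "sort" if len(candidates) <= cutoff else "hash"
    if method == "hash":
        lookup = set(candidates)
        return [v in lookup for v in values]
    if method == "sort":
        ordered = sorted(candidates)

        def found(v):
            # Binary search for v in the sorted candidates
            i = bisect_left(ordered, v)
            return i < len(ordered) and ordered[i] == v

        return [found(v) for v in values]
    raise ValueError(f"unknown method: {method!r}")


values = [1, 2, 3, 4, 5]
print(isin(values, [2, 4]))                 # "auto": few candidates, sort-based path
print(isin(values, [2, 4], method="hash"))  # force the hash-based path
```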

Value formatting
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/io/formats/format.py#L1562

Size threshold for populating the index engine's hash table
https://github.com/pandas-dev/pandas/blob/bb0403b25b1935a608b324a93a483bd22e6c43d3/pandas/_libs/index.pyx#L99
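For the index engine, a threshold could decide whether label lookups go through a hash table or, for large monotonic indexes, a binary search. The class below is a hypothetical sketch of making that cutoff a constructor argument; `LabelLookup`, `size_cutoff`, and its default are invented and do not mirror pandas' index engine. It assumes unique, mutually comparable labels.

```python
from bisect import bisect_left


class LabelLookup:
    """Hypothetical label -> position lookup with a user-visible size cutoff."""

    def __init__(self, labels, size_cutoff=1_000_000):
        self.labels = list(labels)
        self.size_cutoff = size_cutoff
        self._table = None
        self._monotonic = all(a <= b for a, b in zip(self.labels, self.labels[1:]))

    def uses_hash_table(self):
        # Large monotonic indexes skip the hash table and use binary search.
        return not (self._monotonic and len(self.labels) > self.size_cutoff)

    def get_loc(self, key):
        if self.uses_hash_table():
            if self._table is None:
                # Built lazily on first lookup; assumes labels are unique.
                self._table = {label: i for i, label in enumerate(self.labels)}
            return self._table[key]
        i = bisect_left(self.labels, key)
        if i < len(self.labels) and self.labels[i] == key:
            return i
        raise KeyError(key)


sorted_lookup = LabelLookup([10, 20, 30], size_cutoff=2)
print(sorted_lookup.uses_hash_table(), sorted_lookup.get_loc(20))
unsorted_lookup = LabelLookup([30, 10, 20], size_cutoff=2)
print(unsorted_lookup.uses_hash_table(), unsorted_lookup.get_loc(10))
```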
