Description
There are several places where pandas uses hidden heuristics or thresholds to dictate behavior that is neither obvious nor configurable to the user. IIRC, there have been bugs in rolling and to_datetime where the buggy behavior only appeared when the data had a particular value or was a certain size, which can be hard to diagnose.
Ideally we should:
- Not change behavior due to some data characteristic introspection
- At least expose an option so the user can control the heuristic
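
To make the second point concrete, here is a minimal sketch in plain Python (not pandas code) of a size-based heuristic that the caller can override. The names `should_cache`, `convert`, `cache`, `threshold` and `DEFAULT_CACHE_THRESHOLD` are all hypothetical, and the value 50 is only a placeholder, not a claim about any pandas internal.

```python
from typing import Callable, Hashable, Iterable, List, Optional, TypeVar

T = TypeVar("T")

# Hypothetical default standing in for a hard-coded internal threshold.
DEFAULT_CACHE_THRESHOLD = 50


def should_cache(
    n_values: int,
    cache: Optional[bool] = None,
    threshold: int = DEFAULT_CACHE_THRESHOLD,
) -> bool:
    """Decide whether to take the cached code path.

    An explicit cache=True/False always wins. Only cache=None falls back
    to the size heuristic, and the threshold is visible to the caller.
    """
    if cache is not None:
        return cache
    return n_values >= threshold


def convert(
    values: Iterable[Hashable],
    parse: Callable[[Hashable], T],
    cache: Optional[bool] = None,
    threshold: int = DEFAULT_CACHE_THRESHOLD,
) -> List[T]:
    """Apply `parse` to every value, optionally parsing each distinct value once."""
    values = list(values)
    if should_cache(len(values), cache, threshold):
        # Cached path: parse each distinct value once, then map results back.
        lookup = {v: parse(v) for v in set(values)}
        return [lookup[v] for v in values]
    # Uncached path: parse every element independently.
    return [parse(v) for v in values]
```

With this shape, `convert(data, parse)` keeps the automatic behavior, while `convert(data, parse, cache=False)` or `convert(data, parse, threshold=10_000)` lets a user rule the heuristic in or out when diagnosing size-dependent behavior.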
CSV reading tokenizer chunksize
pandas/pandas/_libs/parsers.pyx
Line 119 in bb0403b
CSV line buffer size
pandas/pandas/_libs/parsers.pyx
Line 587 in bb0403b
Number of elements at which numexpr is used automatically
TimedeltaArray (TDA) iteration chunk size
pandas/pandas/core/arrays/timedeltas.py
Line 387 in bb0403b
Something pytables related
pandas/pandas/core/computation/pytables.py
Line 101 in bb0403b
Line 1887 in bb0403b
Number of elements at which to_datetime automatically uses caching
pandas/pandas/core/tools/datetimes.py
Line 124 in bb0403b
Chunk size to use when writing csv
pandas/pandas/io/formats/csvs.py
Line 166 in bb0403b
Number of regexes to cache when parsing times
pandas/pandas/_libs/tslibs/strptime.pyx
Line 576 in bb0403b
Rank tolerance
Line 61 in bb0403b
isin algorithm selection
pandas/pandas/core/algorithms.py
Line 521 in bb0403b
Value formatting
pandas/pandas/io/formats/format.py
Line 1562 in bb0403b
Number of elements to populate hash table
Line 99 in bb0403b