Closed
Description
It is very common to read random rows in a large csv file, typically for testing with a small dataset, or fit the limit of memory. The parameter nrows
is used for read the first n lines, but I didn't find any feature to read random lines. Such parameter might be named keeprows
(opposite to skiprows
), which supports:
- int, e.g.
keeprows=100
means keep 100 random lines (uniformly) - float in (0, 1), e.g.
keeprows=0.05
means keep 5% of total lines - list of int(or iterable), e.g.
keeprows=[1, 3, 8]
mean to keep line 1, 3, and 8