Support reading random rows in read_csv #14285

Closed
@seven7e

It is very common to need a random subset of rows from a large CSV file, typically to test with a small dataset or to stay within memory limits. The nrows parameter reads only the first n lines, but I couldn't find any feature for reading random lines. Such a parameter might be named keeprows (the opposite of skiprows) and could support:

  • int, e.g. keeprows=100 means keep 100 random lines (sampled uniformly)
  • float in (0, 1), e.g. keeprows=0.05 means keep 5% of all lines
  • list of int (or any iterable), e.g. keeprows=[1, 3, 8] means keep lines 1, 3, and 8
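
Until something like this exists, the three cases above can probably be approximated with the existing skiprows parameter, which accepts a list of 0-indexed line numbers to skip. The following is only a rough sketch under some assumptions: rows_to_skip is a hypothetical helper (not part of pandas), the file has a single header row on line 0, the number of data rows is already known (for example from one counting pass over the file), and the list form of keeprows refers to 1-based data rows.

```python
import random


def rows_to_skip(total_rows, keeprows, seed=None):
    """Hypothetical helper: translate a keeprows-style argument into skiprows.

    Assumes line 0 of the file is the header, so data rows occupy
    file lines 1..total_rows. Returns the sorted line numbers to skip.
    """
    data_lines = range(1, total_rows + 1)
    rng = random.Random(seed)

    if isinstance(keeprows, float):
        # Fraction of rows to keep, e.g. 0.05 keeps about 5% of the data rows.
        if not 0 < keeprows < 1:
            raise ValueError("a float keeprows must lie in (0, 1)")
        keep = set(rng.sample(data_lines, round(total_rows * keeprows)))
    elif isinstance(keeprows, int):
        # Absolute number of rows to keep, sampled uniformly without replacement.
        keep = set(rng.sample(data_lines, min(keeprows, total_rows)))
    else:
        # Explicit row numbers (assumed 1-based, matching the header-on-line-0 layout).
        keep = set(keeprows)

    return [line for line in data_lines if line not in keep]
```

The result can then be passed along as pd.read_csv(path, skiprows=rows_to_skip(total_rows, 100)). The obvious drawback is that total_rows has to be known up front, which costs an extra pass over the file, and that is part of why a built-in option would be nicer.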

Labels: Duplicate Report, IO CSV
