read_csv(engine='c') can insert spurious rows full of NaNs #10022

Closed
@jblackburne

Description

I have a well-formed CSV file with about 70k lines that read_csv parses into a DataFrame with about 170k rows; the extra rows are entirely NaN. It only happens with engine='c'.

This is with git master.

I won't bother to upload the CSV file in question because I think I've tracked down the problem. It seems that tokenize_delimited() runs on one chunk of data at a time, and the problem occurs when a chunk happens to start with '\n'. I'll send a pull request.
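
To illustrate the kind of failure described here, below is a toy sketch in pure Python. It is not the pandas C tokenizer and does not claim to reproduce its actual bug: `tokenize_chunks`, `split_chunks`, and the `carry_state` flag are hypothetical names invented for this example. It only shows how a tokenizer that is fed one chunk at a time can emit a spurious empty row when a chunk begins with '\n', if the row in progress is not carried across the chunk boundary.

```python
def split_chunks(text, size):
    """Cut text into fixed-size chunks, ignoring line boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def tokenize_chunks(chunks, carry_state=True):
    """Toy comma/newline tokenizer fed one chunk at a time.

    Illustration only; this is NOT pandas' tokenize_delimited().
    With carry_state=False it simulates an assumed bug: the row in
    progress is flushed at every chunk boundary.
    """
    rows = []
    fields = []        # completed fields of the current row
    field_chars = []   # characters of the field being built
    in_row = False     # True once the current row has seen a character

    for chunk in chunks:
        if not carry_state and in_row:
            # Hypothetical bug: close the unfinished row at the boundary.
            fields.append("".join(field_chars))
            rows.append(fields)
            fields, field_chars, in_row = [], [], False
        for ch in chunk:
            if ch == "\n":
                # End of row. If the row was already flushed above, this
                # leading newline produces an empty, spurious row.
                fields.append("".join(field_chars))
                rows.append(fields)
                fields, field_chars, in_row = [], [], False
            elif ch == ",":
                fields.append("".join(field_chars))
                field_chars = []
                in_row = True
            else:
                field_chars.append(ch)
                in_row = True

    if in_row:
        # Final row without a trailing newline.
        fields.append("".join(field_chars))
        rows.append(fields)
    return rows


data = "a,b\n1,2\n3,4\n"
chunks = split_chunks(data, 7)   # second chunk starts with "\n"

print(tokenize_chunks(chunks))                      # 3 rows
print(tokenize_chunks(chunks, carry_state=False))   # 4 rows, one of them empty
```

In this toy, the extra row is `['']`, which in a real parser would surface as a row of missing values. A quick sanity check on real data is to compare the number of lines in the file with the number of rows in the resulting DataFrame.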

Labels: IO CSV (read_csv, to_csv)