read_csv's na_values dict format cannot parse float type #12224

Closed
@cboettig

Description

Minor issue regarding read_csv's na_values argument in dict format. The list format works fine when the NA value is given as a float (which is often the intuitive choice), e.g.:

import pandas as pd

co2 = pd.read_csv(
    "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
    comment="#", delim_whitespace=True,
    names=["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
    na_values=[-99.99, -1])

However, the dict format is more appropriate for this classic data set, since different columns use different NA values. Unfortunately, this fails with an error about float type:

co2 = pd.read_csv(
    "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
    comment="#", delim_whitespace=True,
    names=["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
    na_values={"decimal_date": -99.99, "days": -1})

Instead, the NA value must be given as a string, which feels all kinds of wrong here:

co2 = pd.read_csv(
    "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
    comment="#", delim_whitespace=True,
    names=["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
    na_values={"decimal_date": "-99.99", "days": "-1"})
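
For anyone who wants to try the string workaround without the FTP download, here is a minimal sketch on an in-memory sample. The sample rows and the reduced column set are made up for illustration and are not taken from the real file. It passes the per-column NA markers as strings, so it should not depend on whether a given pandas version accepts floats in the dict form. It also uses sep=r"\s+" in place of delim_whitespace, which is an equivalent way to split on whitespace.

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited sample in the spirit of the CO2 file;
# the numbers are invented for illustration.
SAMPLE = """\
# year month average days
1974 5 333.16 -1
1974 6 -99.99 26
1974 7 331.11 24
"""

df = pd.read_csv(
    io.StringIO(SAMPLE),
    comment="#",        # skip the commented header line
    sep=r"\s+",         # split on runs of whitespace
    names=["year", "month", "average", "days"],
    # Per-column NA markers, given as strings (the workaround from this issue).
    na_values={"average": ["-99.99"], "days": ["-1"]},
)

print(df)
print(df.isna().sum())
```

Each marker is only treated as missing in the column it is listed under, which is the reason for preferring the dict form here.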

Thanks for all the pandas awesomeness,

Metadata

Labels: Bug, IO CSV, Missing-data
