Minor issue regarding read_csv's na_values argument in dict format. I note that the list format works fine when the NA value is given as a float-type (which is often the intuitive choice), e.g.:
co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
comment = "#", delim_whitespace = True,
names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
na_values =[-99.99, -1])
However, the dict format is more appropriate for this classic data set, since different columns use different NA values. Unfortunately, this fails with an error about float type:
co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
comment = "#", delim_whitespace = True,
names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
na_values = {"decimal_date" : -99.99, "days" : -1})
Instead, the NA values have to be given as strings, which feels all kinds of wrong here:
co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
comment = "#", delim_whitespace = True,
names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
na_values = {"decimal_date" : "-99.99", "days" : "-1"})
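As a stopgap, the string workaround can be kept out of sight by converting the markers just before the call. This is only a minimal sketch: `stringify_na_values` is a hypothetical helper (not part of pandas), the column names are shortened for illustration, and the sample rows are made up in the same whitespace-delimited style rather than taken from the NOAA file, so the snippet runs without network access.

```python
import io

import pandas as pd

# Made-up sample rows (not real NOAA values), whitespace-delimited with a comment line.
SAMPLE = """\
# year month average days
1958 3 315.71 -1
1958 4 -99.99 25
1958 5 317.50 -1
"""


def stringify_na_values(na_values):
    """Hypothetical helper: turn each column's NA marker(s) into a list of strings."""
    out = {}
    for column, markers in na_values.items():
        if isinstance(markers, (list, tuple, set)):
            out[column] = [str(m) for m in markers]
        else:
            out[column] = [str(markers)]
    return out


# Write the markers as numbers, then convert them for read_csv.
na_values = stringify_na_values({"average": -99.99, "days": -1})

co2 = pd.read_csv(
    io.StringIO(SAMPLE),
    comment="#",
    sep=r"\s+",
    names=["year", "month", "average", "days"],
    na_values=na_values,
)

print(co2)
```

One caveat with this approach: `str()` of a float may not match how the marker is written in the file (trailing zeros, for example), so it is worth checking the result with `co2.isna().sum()`.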
Thanks for all the pandas awesomeness,