Description
Using 0.14.0. pandas.io.parsers.read_csv is supposed to ignore blank-looking values if na_filter=False, but it does not do this for index_col columns.
foo.csv:
fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
The default behavior gives a dataframe with a NaN in place of the empty value from this last row:
df = pd.io.parsers.read_csv("foo.csv")
This gives the same dataframe with a blank string instead of a NaN. So far so good:
df = pd.io.parsers.read_csv("foo.csv", na_filter=False)
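For anyone who wants to check the two calls above without creating foo.csv, here is a minimal sketch. It assumes the file contents are inlined through io.StringIO and uses the top-level pd.read_csv alias; the names CSV_TEXT, df_default and df_nofilter are only illustrative.

```python
import io

import pandas as pd

# Same contents as foo.csv, inlined so the sketch is self-contained.
CSV_TEXT = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""

# Default behavior: the empty "size" field in the last row is read as NaN.
df_default = pd.read_csv(io.StringIO(CSV_TEXT))

# With na_filter=False the same field is kept as an empty string.
df_nofilter = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

print(repr(df_default.loc[3, "size"]))
print(repr(df_nofilter.loc[3, "size"]))
```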
My expectation was that this next version would give a dataframe with no NaN values in the index, but it does not:
df = pd.io.parsers.read_csv("foo.csv", index_col=['fruit','size'], na_filter=False)
print df
=>
               sugar
fruit  size
apples medium      2
pear   medium      3
grape  small       4
durian NaN         1
Because it unexpectedly includes NaNs, I've been fighting with issue 4862 in unstack for hours :-(.
In order to get the desired behavior, a DF with no NaNs in the index, I have to read the data without a multi-index, then set_index afterwards:
df = pd.io.parsers.read_csv("foo.csv", na_filter=False)
df = df.set_index(['fruit','size'])
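The workaround can be checked end to end with a short sketch like the one below. It again assumes the CSV is inlined via io.StringIO rather than read from foo.csv, and flat, indexed and nan_counts are hypothetical names chosen for illustration.

```python
import io

import pandas as pd

# Same contents as foo.csv, inlined so the sketch is self-contained.
CSV_TEXT = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""

# Read without index_col so that na_filter=False applies to every column.
flat = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

# set_index returns a new DataFrame, so the result has to be assigned.
indexed = flat.set_index(["fruit", "size"])

# Count missing values in each level of the resulting MultiIndex.
nan_counts = {
    name: int(indexed.index.get_level_values(name).isna().sum())
    for name in indexed.index.names
}
print(nan_counts)
```

Both counts should be zero here, since the empty size for durian survives as an empty string and is then moved into the index unchanged.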
As a temporary fix, perhaps the documentation ought to clarify the behavior of na_filter with respect to index_col.