read_csv() ignores na_filter=False for index columns #7518

Closed
@dlenski

Description

Using pandas 0.14.0. pandas.io.parsers.read_csv is supposed to leave blank-looking values as they are (rather than converting them to NaN) if na_filter=False, but it does not do this for index_col columns.

foo.csv:

fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1

The default behavior gives a dataframe with a NaN in place of the empty value from this last row:

import pandas as pd
df = pd.io.parsers.read_csv("foo.csv")

Passing na_filter=False gives the same dataframe with an empty string instead of a NaN. So far so good:

df = pd.io.parsers.read_csv("foo.csv", na_filter=False)

My expectation was that this next version would give a dataframe with no NaN values in the index, but it does not:

df = pd.io.parsers.read_csv("foo.csv", index_col=['fruit','size'], na_filter=False)
print(df)
=>                sugar
   fruit  size         
   apples medium      2
   pear   medium      3
   grape  small       4
   durian NaN         1
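
A quick way to check which index levels contain NaN is sketched below. The helper name nan_index_levels is hypothetical (not part of pandas), and the CSV is read from an in-memory string rather than foo.csv so the snippet is self-contained. Whether the second call reports 'size' depends on the pandas version in use, so no output is claimed for it.

```python
import io

import pandas as pd

# Same contents as foo.csv above, kept in memory for a self-contained example.
CSV = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""


def nan_index_levels(df):
    """Return the names of the index levels that contain at least one NaN.

    Hypothetical helper for illustration; works for plain and multi-level indexes.
    """
    index = df.index
    return [
        name
        for position, name in enumerate(index.names)
        if index.get_level_values(position).isna().any()
    ]


# Default parsing (na_filter=True): the empty 'size' field becomes NaN.
default_df = pd.read_csv(io.StringIO(CSV), index_col=["fruit", "size"])
print(nan_index_levels(default_df))

# With na_filter=False the expectation is an empty list; on a version with the
# behavior reported in this issue, 'size' would still be listed.
filtered_df = pd.read_csv(
    io.StringIO(CSV), index_col=["fruit", "size"], na_filter=False
)
print(nan_index_levels(filtered_df))
```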

Because it unexpectedly includes NaNs, I've been fighting with issue 4862 in unstack for hours :-(.

In order to get the desired behavior, a DF with no NaNs in the index, I have to read the data without a multi-index, then set_index afterwards:

df = pd.io.parsers.read_csv("foo.csv", na_filter=False)
df = df.set_index(['fruit','size'])  # set_index returns a new dataframe
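
The workaround can be wrapped up as a small function. This is only a sketch: read_with_blank_index is a hypothetical name, and the data is read from an in-memory string instead of foo.csv so that it runs on its own.

```python
import io

import pandas as pd

# Same contents as foo.csv above, kept in memory for a self-contained example.
CSV = """fruit,size,sugar
apples,medium,2
pear,medium,3
grape,small,4
durian,,1
"""


def read_with_blank_index(source, index_cols):
    """Read a CSV without NaN detection, then build the index afterwards.

    Hypothetical helper: reading first and calling set_index second keeps
    empty fields as empty strings in the index, independent of how
    read_csv treats index_col columns.
    """
    df = pd.read_csv(source, na_filter=False)
    return df.set_index(index_cols)


df = read_with_blank_index(io.StringIO(CSV), ["fruit", "size"])

# The last index entry carries an empty string for 'size', not NaN.
print(df.index.tolist())
```

Since the index is built from already-parsed columns, this does not depend on how na_filter interacts with index_col.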

As a temporary fix, perhaps the documentation ought to clarify the behavior of na_filter with respect to index_col.
