
Inconsistent Handling of na_values and converters in read_csv #13302

Open
@gfyoung


On master (commit 40b4bb4):

```python
>>> from io import StringIO
>>> from pandas import read_csv
>>> data = """A
... 1
... CAT
... 3"""
>>> f = lambda x: x
>>> read_csv(StringIO(data), na_values='CAT', converters={'A': f}, engine='c')
     A
0    1
1  CAT
2    3
>>> read_csv(StringIO(data), na_values='CAT', converters={'A': f}, engine='python')
     A
0    1
1  NaN
2    3
```

I expect both engines to give the same output. I believe the Python output is the more correct one, because it respects `na_values`, whereas the C engine ignores them for a column that has a converter. I thought the simple fix would be to remove the `continue` statement here, but that causes test failures, so a more involved refactoring is probably needed to align the order in which converters, NaN value conversion, and dtype conversion are applied.
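To make the ordering question concrete, here is a minimal pure-Python sketch (no pandas involved) of a single column passing through a converter with and without a subsequent `na_values` check. `convert_column` and its `na_after_converter` flag are hypothetical names for illustration only, not pandas internals, and the NaN check is assumed to be a plain string-membership test on the converted value.

```python
import math
from typing import Callable, Iterable, List, Set


def convert_column(
    raw: Iterable[str],
    converter: Callable[[str], object],
    na_values: Set[str],
    na_after_converter: bool,
) -> List[object]:
    """Hypothetical model of one column's conversion step.

    na_after_converter=True mirrors the Python-engine output in the report
    (converter first, then na_values matching on the result).
    na_after_converter=False mirrors the C-engine output, where the
    converter branch skips NaN handling entirely.
    """
    out: List[object] = []
    for token in raw:
        value = converter(token)
        # Assumption: NaN detection is a simple membership test on string results.
        if na_after_converter and isinstance(value, str) and value in na_values:
            value = math.nan
        out.append(value)
    return out


def identity(x: str) -> str:
    # Same behaviour as `f = lambda x: x` in the report.
    return x


column = ["1", "CAT", "3"]
c_like = convert_column(column, identity, {"CAT"}, na_after_converter=False)
py_like = convert_column(column, identity, {"CAT"}, na_after_converter=True)
print(c_like)   # ['1', 'CAT', '3']
print(py_like)  # ['1', nan, '3']
```

The only difference between the two results is whether the `na_values` check runs after the converter, which is the ordering the refactoring would need to settle on for both engines.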

IMO this should be added to #12686, as this is a difference in behaviour between the two engines.

xref #5232
