Inconsistent Handling of na_values and converters in read_csv #13302

On `master` (commit <a href="https://github.com/pydata/pandas/commit/40b4bb4bb2a7018ed08025c5c93cd1080a0b5f7f">40b4bb4</a>):

``` python
>>> from io import StringIO
>>> from pandas import read_csv
>>> data = """A
1
CAT
3"""
>>> f = lambda x: x
>>> read_csv(StringIO(data), na_values='CAT', converters={'A': f}, engine='c')
     A
0    1
1  CAT
2    3
>>> read_csv(StringIO(data), na_values='CAT', converters={'A': f}, engine='python')
     A
0    1
1  NaN
2    3
```

I expect both engines to give the same output. I believe the Python engine's output is more correct because, unlike the C engine, it respects `na_values`. I thought the simple fix would be to remove the `continue` statement <a href="https://github.com/pydata/pandas/blob/master/pandas/parser.pyx#L1008">here</a>, but that causes test failures, so a more involved refactoring is probably needed to align the order of converter application, `NaN` value conversion, and `dtype` conversion.
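Until the engines agree, one possible workaround is to fold the missing-value check into the converter itself, so the result does not depend on the order in which each engine applies `na_values` and `converters`. The sketch below is only an illustration, not the proposed fix: the names `NA_TOKENS`, `to_float_or_nan`, and `read_with` are made up for this example, and it assumes the column is meant to be numeric.

``` python
from io import StringIO

import numpy as np
import pandas as pd

DATA = "A\n1\nCAT\n3"

# Hypothetical set of strings to treat as missing (stands in for na_values).
NA_TOKENS = {"CAT"}


def to_float_or_nan(raw):
    """Converter that does its own NA handling before converting.

    Both engines pass the raw cell text to the converter, so checking
    for NA tokens here avoids relying on na_values being applied.
    """
    if raw in NA_TOKENS:
        return np.nan
    return float(raw)


def read_with(engine):
    # na_values is deliberately omitted: the converter handles missing data.
    return pd.read_csv(
        StringIO(DATA),
        converters={"A": to_float_or_nan},
        engine=engine,
    )


c_frame = read_with("c")
py_frame = read_with("python")

# Compare where each engine reports missing values.
print(c_frame["A"].isna().tolist())
print(py_frame["A"].isna().tolist())
```

With this approach the converter is the single place where missing values are decided, at the cost of duplicating what `na_values` is supposed to do.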

IMO this should be added to #12686, as this is a difference in behaviour between the two engines.

xref #5232

