BUG: Read_csv won't warn/skip bad lines when nrows is being used #50409

Closed
@d2thebee

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

data = pd.read_csv(
    'file.csv',
    sep='¬',
    encoding='ISO-8859-1',
    engine='python',
    on_bad_lines='warn',
    nrows=2845060,
)

Issue Description

Passing both on_bad_lines and nrows to read_csv appears to bypass the on_bad_lines handling. When I run the example above without the nrows argument, I get the following warning:

Skipping line 2845058: '¬' expected after '"'

With the nrows argument included, the call raises the following error instead (the same happens with on_bad_lines='skip' in place of 'warn'):

Error: '¬' expected after '"'
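The wording of this error matches what Python's standard csv module raises in strict mode when a character other than the delimiter follows a closing quote. The sketch below reproduces that message with the csv module alone. The sample text and the helper name parse_strict are made up for illustration, and the link to pandas' python engine is an assumption rather than something confirmed in this report.

```python
import csv
import io

SEP = "¬"

# Hypothetical sample: the third line has a stray character after the
# closing quote, which strict parsing rejects.
SAMPLE = f'id{SEP}value\n1{SEP}"fine"\n2{SEP}"broken"x\n'


def parse_strict(text):
    """Parse text strictly; return the rows read so far and any error message."""
    reader = csv.reader(io.StringIO(text), delimiter=SEP, strict=True)
    rows = []
    try:
        for row in reader:
            rows.append(row)
    except csv.Error as exc:
        return rows, str(exc)
    return rows, None


rows, error = parse_strict(SAMPLE)
print(rows)   # rows parsed before the malformed line
print(error)  # message raised for the malformed line
```

A line shaped like the third one in SAMPLE is therefore a plausible candidate for what sits at line 2845058 of the original file.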

Expected Behavior

The on_bad_lines logic should still apply when nrows is used, so I should see the 'Skipping line' warning rather than an error.
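Since file.csv is not attached, a self-contained comparison may help narrow this down. The following is a minimal sketch that assumes the bad line has a stray character after a closing quote. The in-memory data and the helper name attempt are hypothetical, and the outcome may differ between pandas versions, so no expected output is shown.

```python
import io

import pandas as pd

SEP = "¬"

# Hypothetical data: header, one good row, one malformed row, one good row.
DATA = (
    f"id{SEP}value\n"
    f'1{SEP}"fine"\n'
    f'2{SEP}"broken"x\n'
    f'3{SEP}"fine"\n'
)


def attempt(**kwargs):
    """Run read_csv on DATA and report whether it succeeded or raised."""
    try:
        df = pd.read_csv(
            io.StringIO(DATA),
            sep=SEP,
            engine="python",
            on_bad_lines="warn",
            **kwargs,
        )
    except Exception as exc:  # report any parser error instead of stopping
        return "error", f"{type(exc).__name__}: {exc}"
    return "ok", f"{len(df)} rows"


without_nrows = attempt()
with_nrows = attempt(nrows=3)
print("without nrows:", without_nrows)
print("with nrows:   ", with_nrows)
```

If the two calls disagree (one returns "ok" and the other "error"), that would reproduce the behavior described above without needing the original file.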

Installed Versions

INSTALLED VERSIONS

commit : 8dab54d
python : 3.10.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252

pandas : 1.5.2
numpy : 1.23.2
pytz : 2022.1
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
...
xlrd : 2.0.1
xlwt : None
zstandard : None
tzdata : None

Metadata

    Labels

    Bug; IO CSV (read_csv, to_csv); Needs Info (Clarification about behavior needed to assess issue)
