read_fwf() doesn't work properly when both skiprows and iterator options are used. #10261

Closed
@arenius

Description

When read_fwf is used with both the iterator = True and skiprows = [list] arguments, it doesn't skip all of the rows in the skiprows list. Things work properly when either of those arguments is used in isolation.

Here is a simple bit of code to reproduce:

import pandas as pd

#Create a fixed width file to test with.
df = pd.DataFrame({'a': range(10)})
with open('testfwf.txt', 'w') as f:
    f.write(df.to_string(index = False, header = False))

rows_to_skip = [0,1,2,6,9]

df_iter = pd.read_fwf('testfwf.txt', colspecs = [(0,2)], names = ['a'], iterator = True,
                      chunksize = 2, skiprows = rows_to_skip)

print('The fixed width file in chunks with rows [0,1,2,6,9] skipped: ')
for df in df_iter:
    print(df)

print('Notice how row 6 of the fixed width file has not been skipped even though it should')
print('have been.')

It seems that only the leading run of rows in the skiprows list is skipped. For example, the leading rows 0, 1 and 2 are skipped. But once a row that isn't in the list is reached, skipping stops for all remaining rows until the end, when row 9 IS skipped.
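For comparison, here is a small sketch of what the chunks should contain if skiprows were applied to every listed row. It is plain Python and does not call pandas at all; the helper name expected_chunks is made up for illustration, and it assumes skiprows holds 0-based line numbers of the file, as in the example above.

```python
def expected_chunks(n_rows, skiprows, chunksize):
    """Return the row numbers each chunk should hold once skiprows is applied.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    skip = set(skiprows)
    # Keep every row whose 0-based line number is not in the skip list.
    kept = [i for i in range(n_rows) if i not in skip]
    # Group the surviving rows into consecutive chunks of at most `chunksize`.
    return [kept[i:i + chunksize] for i in range(0, len(kept), chunksize)]


# Same parameters as the reproduction: 10 rows, skip [0, 1, 2, 6, 9], chunks of 2.
for chunk in expected_chunks(10, [0, 1, 2, 6, 9], 2):
    print(chunk)
```

Because the test file stores the values 0 through 9 on rows 0 through 9, these row numbers are also the values of column a that each chunk would be expected to show, so row 6 should not appear in any chunk.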
