Description
When read_fwf is called with both iterator = True and skiprows = [list], it does not skip all of the rows listed in skiprows. Each argument works correctly when it is used on its own.
Here is a simple bit of code to reproduce:
import pandas as pd
# Create a fixed-width file to test with.
df = pd.DataFrame({'a': range(10)})
with open('testfwf.txt', 'w') as f:
    f.write(df.to_string(index = False, header = False))
rows_to_skip = [0,1,2,6,9]
df_iter = pd.read_fwf('testfwf.txt', colspecs = [(0,2)], names = ['a'], iterator = True,
                      chunksize = 2, skiprows = rows_to_skip)
print('The fixed width file in chunks with rows [0,1,2,6,9] skipped: ')
for df in df_iter:
    print(df)
print('Notice how row 6 of the fixed width file has not been skipped even though it should')
print('have been.')
It seems that rows are skipped only until the first row that is not in the skip list. In the example above, the leading rows 0, 1 and 2 are skipped, but once a non-skipped row is reached, skipping stops for the rest of the file (so row 6 is kept) until the very end, where row 9 IS skipped.
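Until this is fixed, one possible workaround is to drop the unwanted lines before pandas sees them and then call read_fwf without skiprows. The sketch below is only an illustration: drop_rows is a hypothetical helper name, not part of pandas, and it assumes the file is small enough to hold in memory.

```python
import io


def drop_rows(lines, rows_to_skip):
    """Return a text buffer with the given 0-based line numbers removed.

    `lines` is any iterable of text lines, e.g. an open file object.
    Hypothetical helper for illustration; not a pandas function.
    """
    skip = set(rows_to_skip)
    kept = [line for i, line in enumerate(lines) if i not in skip]
    # read_fwf accepts file-like objects, so an in-memory buffer is enough.
    return io.StringIO(''.join(kept))
```

With the test file above, this would be used as `with open('testfwf.txt') as f: buf = drop_rows(f, rows_to_skip)`, followed by `pd.read_fwf(buf, colspecs = [(0,2)], names = ['a'], iterator = True, chunksize = 2)`. Because the filtered text is built in memory, this gives up some of the benefit of chunked reading for very large files.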