Closed
Description
Code Sample, a copy-pastable example if possible
import pandas

def foo(table, exceptions):
    """
    Modifies the columns of the dataframe in place to be categories, largely to save space.

    :type table: pandas.DataFrame
    :param exceptions: set of column names not to modify.
    :rtype: pandas.DataFrame
    """
    for c in table:
        if c in exceptions:
            continue
        x = table[c]
        if str(x.dtype) != 'category':
            x.fillna('null', inplace=True)
            table[c] = x.astype('category', copy=False)
    return table

dataframe = pandas.read_csv(inputFolder + dataFile, chunksize=1000000,
                            na_values='null', usecols=fieldsToKeep,
                            low_memory=False, header=0, sep='\t')
tables = map(lambda table: TimeMe(foo)(table, categoryExceptions), dataframe)
Problem description
I have a 34 GB TSV file and I've been reading it using pandas' read_csv function with chunksize specified as 1000000. The command above works fine with an 8 GB file, but pandas crashes for my 34 GB file, subsequently crashing my IPython notebook.
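For reference, here is a minimal, self-contained sketch of the same chunked read-and-categorize pattern, using a small in-memory TSV as a stand-in for the real file (the toy data and column names are illustrative, not from the report above):

```python
import io
import pandas as pd

# Toy TSV standing in for the large file.
tsv = "id\tcolor\n1\tred\n2\tblue\n3\tred\n"

# chunksize makes read_csv return an iterator of DataFrames, so only one
# chunk is materialized at a time.
chunks = pd.read_csv(io.StringIO(tsv), sep='\t', chunksize=2)

converted = []
for chunk in chunks:
    for col in chunk.columns:
        # Convert object (string) columns to 'category' to save memory.
        if chunk[col].dtype == object:
            chunk[col] = chunk[col].fillna('null').astype('category')
    converted.append(chunk)

result = pd.concat(converted)
```

Note that holding every converted chunk in a list (as in `tables` above) still accumulates the whole dataset in memory; if each chunk can be processed and discarded instead, peak usage stays bounded by the chunk size. Alternatively, `read_csv` accepts a `dtype` mapping such as `dtype={'color': 'category'}`, which converts columns during parsing.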