Description
I am using Anaconda and my pandas version is 0.23.1. When dealing with a single large file, setting chunksize or iterator=True works fine and memory usage stays low. The problem arises when I try to deal with 5000+ files (file names are in filelist):
trajectory = [pd.read_csv(f, delim_whitespace=True, header=None, chunksize=10000) for f in filelist]
Memory usage rises very quickly and soon exceeds 20 GB. However, trajectory = [open(f, 'r') ...]
combined with reading 10000 lines from each file works fine.
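For reference, a minimal sketch of that plain-file workaround (itertools.islice is my addition for pulling 10000 lines per file lazily; filelist and the block size come from the description above):

from itertools import islice

trajectory = [open(f, 'r') for f in filelist]
# Pull the next 10000 lines from every file; memory stays bounded because
# each file is consumed lazily, line by line.
chunks = [list(islice(fh, 10000)) for fh in trajectory]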
I also tried the low_memory=True
option, but it does not help. Both the engine='python'
and memory_map=<some file>
options solve the memory problem, but when I use the data with
X = np.asarray([f.get_chunk().values for f in trajectory])
FX = np.fft.fft(X, axis=0)
the multi-threading of MKL FFT no longer works.
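For context, a self-contained sketch of the intended pipeline, assuming all files are whitespace-delimited with the same number of columns so the chunks stack into one array (the imports are my addition):

import numpy as np
import pandas as pd

trajectory = [pd.read_csv(f, delim_whitespace=True, header=None, chunksize=10000)
              for f in filelist]
# One 10000-row chunk per file, stacked into shape (n_files, 10000, n_cols)
X = np.asarray([reader.get_chunk().values for reader in trajectory])
# FFT along axis 0; this is the step where the MKL-backed FFT is expected to
# run multi-threaded
FX = np.fft.fft(X, axis=0)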