Cannot pickle parser functions? #22265

Closed
@smcveigh-phunware

Code Sample

import pandas as pd
import concurrent.futures as futs
from pathlib import Path

paths = Path('data').rglob('*.csv')

with futs.ProcessPoolExecutor() as e:
    futures = {e.submit(pd.read_csv, p, index_col='id'): p for p in paths}
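# Wait on the dict of futures built above (its keys are the Future objects)
done, _ = futs.wait(futures)
for d in done:
    # result() re-raises any exception from the worker process
    r = d.result()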

Problem description

With pandas < 0.23, dispatching pd.read_csv to concurrent.futures.ProcessPoolExecutor was a very efficient way to read CSVs: it could cut the time to load a large number of CSV files from tens of minutes to minutes.

It appears this is no longer possible since the parser functions are no longer picklable. Is there a reason for this change?

Expected Output

No error. Instead, the code raises:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object '_make_parser_function.<locals>.parser_f'
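
For anyone hitting the same error, here is a minimal stdlib-only sketch of what the traceback suggests is going on: pickle serializes functions by their importable qualified name, so a function defined inside another function (like `_make_parser_function.<locals>.parser_f`) cannot be pickled, while a module-level wrapper can. The names `make_parser`, `parse_csv_text`, and `can_pickle` are hypothetical stand-ins, not pandas code.

```python
import pickle


def make_parser(sep):
    # Hypothetical stand-in for a factory that builds a parser as a
    # nested function, similar in shape to the name in the traceback.
    def parser_f(text):
        return [line.split(sep) for line in text.splitlines()]
    return parser_f


def parse_csv_text(text):
    # Module-level wrapper: pickle can refer to it by name, so it is a
    # candidate for ProcessPoolExecutor.submit (assumption: the module
    # is importable in the worker process).
    return make_parser(",")(text)


def can_pickle(obj):
    # Report whether pickle accepts the object instead of raising.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


local_parser = make_parser(",")

if __name__ == "__main__":
    print("nested function picklable:", can_pickle(local_parser))
    print("module-level wrapper picklable:", can_pickle(parse_csv_text))
```

Run as a script, the nested function should fail to pickle while the module-level wrapper should succeed. Applied to the code sample above, the same idea would be a small top-level function that calls pd.read_csv(path, index_col='id') and is passed to e.submit in place of pd.read_csv itself. I have not checked how this affects performance.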

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.7.0
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
