ENH: Add argument "multiprocessing" to pd.read_csv() method #37955

Closed
@nf78

Description

Is your feature request related to a problem?

pd.read_csv() does not currently take advantage of the multiprocessing module, which makes reading multiple datasets inefficient, especially when several cores are available.

Other libraries such as modin and dask already implement this, but I think pandas should support it natively when requested.

Describe the solution you'd like

It should work with the multiprocessing module out of the box as an initial enhancement, and could later support other backends such as joblib.

A list of filenames should be passed, for example:

pd.read_csv(list_of_filenames, multiprocessing=True)

pd.read_csv(glob.glob('table_*.csv'), multiprocessing=True)
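For comparison, the behaviour requested above can be approximated today with a small wrapper around the standard multiprocessing module. This is only a minimal sketch and not the proposed implementation: read_csv_parallel, paths, and processes are hypothetical names chosen for illustration, and pandas has no such function or argument.

```python
import multiprocessing
from functools import partial

import pandas as pd


def read_csv_parallel(paths, processes=None, **read_csv_kwargs):
    """Hypothetical helper (not a pandas API): read several CSV files
    in worker processes and concatenate the results.

    Extra keyword arguments are forwarded unchanged to pd.read_csv.
    """
    paths = list(paths)
    if not paths:
        raise ValueError("no files to read")

    # pd.read_csv is a module-level function, so this partial can be
    # pickled and sent to the worker processes.
    reader = partial(pd.read_csv, **read_csv_kwargs)

    if processes == 1 or len(paths) == 1:
        # Nothing to parallelise: avoid the cost of starting a pool.
        frames = [reader(path) for path in paths]
    else:
        # Pool.map returns results in the order of `paths`, even if the
        # files finish parsing in a different order.
        with multiprocessing.Pool(processes=processes) as pool:
            frames = pool.map(reader, paths)

    # Each frame keeps its own index, so labels repeat across files.
    return pd.concat(frames)
```

A call such as read_csv_parallel(glob.glob('table_*.csv')) would then stand in for the proposed multiprocessing=True. In a script, that call should sit under an `if __name__ == "__main__":` guard, because the spawn start method re-imports the main module in every worker. Each parsed DataFrame is also pickled back to the parent process, so the speed-up depends on how parsing time compares with that transfer cost.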

API breaking implications

This should not change established behavior, since the "multiprocessing" argument would default to None.

Memory consumption should be the same; the memory is just consumed much faster.

With this option, the rows from each file keep the same indices they had in that file, but the files are likely to be combined in a different order; the user can call reset_index() afterwards if needed.
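The index behaviour described here can be illustrated with existing pandas calls. The sketch below uses two made-up in-memory CSV sources instead of real files; it shows the repeated index labels after concatenation, the reset_index() fix, and, as an alternative not mentioned in the request, an extra index level that records which source each row came from.

```python
import io

import pandas as pd

# Two small in-memory CSV "files" (made-up data for illustration).
csv_a = io.StringIO("x,y\n1,a\n2,b\n")
csv_b = io.StringIO("x,y\n3,c\n4,d\n")

frames = [pd.read_csv(csv_a), pd.read_csv(csv_b)]

combined = pd.concat(frames)
# Each file contributes its own 0..n-1 labels, so the index repeats.
repeated_index = combined.index.tolist()

# Option 1: discard the per-file labels and renumber from zero.
renumbered = combined.reset_index(drop=True)

# Option 2: keep the per-file labels and add an outer level naming the
# source, so rows stay traceable whatever order the files were combined in.
labelled = pd.concat(frames, keys=["a.csv", "b.csv"], names=["source", "row"])
```

Passing ignore_index=True to pd.concat has the same effect as option 1 in a single step.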

Additional context

I have also considered additional backend options for future enhancements of this implementation, such as joblib, ray, and dask.
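One possible way to keep the backend swappable is to accept any object that follows the standard library's concurrent.futures.Executor interface. This sketch is an assumption about how such a design could look, not the proof-of-concept mentioned in this issue: read_csv_with_executor is a hypothetical name, and joblib, ray, and dask are not used here because their APIs differ from Executor.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial

import pandas as pd


def read_csv_with_executor(paths, executor: Executor, **read_csv_kwargs):
    """Hypothetical helper (not a pandas API): read CSV files using any
    concurrent.futures-style executor as the parallel backend."""
    paths = list(paths)
    if not paths:
        raise ValueError("no files to read")

    reader = partial(pd.read_csv, **read_csv_kwargs)

    # Executor.map yields results in the order of `paths`.
    frames = list(executor.map(reader, paths))

    # Renumber rows so the combined frame has a unique 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


def read_csv_threaded(paths, max_workers=4, **read_csv_kwargs):
    """Convenience wrapper that uses a thread pool as the backend."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return read_csv_with_executor(paths, executor, **read_csv_kwargs)
```

A ProcessPoolExecutor can be passed in the same way to use separate processes, and any third-party object that implements Executor.map with the same semantics would work without changes to the helper.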

Note: I already have a proof-of-concept for this solution, so I can develop it a bit further, commit it, and open a pull request.

Metadata

    Labels

    Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
