Description
Conceptually, create a pipeline processor that performs out-of-core computation.
This is easily parallelizable (across cores or across machines); in theory cython / ipython / joblib / hadoop could operate with this
requirements
the data set must support chunking, and the function must operate only on a single chunk at a time
Useful for data sets with a large number of rows, or for a problem that you want to parallelize.
input
a chunking iterator that reads from disk (it could take a chunksize parameter,
or take a handle and simply call its iterator; a sketch follows the list below)
- read_csv
- HDFStore
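A minimal sketch of both readers, assuming hypothetical paths `data.csv` / `store.h5` and an arbitrary chunksize:

```python
import pandas as pd

# CSV: passing chunksize makes read_csv return an iterator of
# DataFrames instead of one big frame ("data.csv" is hypothetical)
csv_chunks = pd.read_csv("data.csv", chunksize=50000)

# HDFStore: select on a table-format store yields chunks the same way
# ("store.h5" and the key "df" are hypothetical)
store = pd.HDFStore("store.h5")
hdf_chunks = store.select("df", chunksize=50000)
```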
function
takes an iterated chunk and an axis, and returns another pandas object
(could be a reduction, a transformation, whatever)
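A minimal sketch of such a function; the body is a stand-in, since any per-chunk transformation or reduction fits:

```python
def func(chunk, axis=0):
    # must touch only the chunk it is handed; returns another pandas
    # object (here a trivial transformation, but a reduction such as
    # chunk.sum(axis=axis) fits the same signature)
    return chunk.fillna(0)
```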
output
an output mechanism that takes the function's result; it must support appending (a sketch follows the list below)
- to_csv (with appending)
- HDFStore (table)
- another pipeline
- in memory
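A sketch of two appending sinks plus a driver loop, reusing the `func` sketched above and assuming hypothetical output paths (`to_hdf` in table format needs PyTables):

```python
import os
import pandas as pd

def append_csv(result, path="out.csv"):
    # write the header only for the first chunk, then keep appending
    result.to_csv(path, mode="a", header=not os.path.exists(path))

def append_hdf(result, path="out.h5"):
    # table format supports append=True
    result.to_hdf(path, key="result", format="table", append=True)

# drive the pipeline: read -> apply -> append, one chunk at a time
for chunk in pd.read_csv("data.csv", chunksize=50000):
    append_csv(func(chunk))
```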
Map-reduce is an example of this type of pipelining.
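For instance, a map-reduce-style reduction over chunks could map each chunk to a partial per-key sum and then reduce the partials (assuming a hypothetical `data.csv` with `key` and `value` columns):

```python
import pandas as pd

# map: partial per-key sums per chunk; reduce: combine the partials
partials = [chunk.groupby("key")["value"].sum()
            for chunk in pd.read_csv("data.csv", chunksize=50000)]
result = pd.concat(partials).groupby(level=0).sum()
```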
Interesting Library
https://github.com/enthought/distarray