implement Feather-based format for arrays

### Old HDF format
```python
>>> %timeit Session('demo.h5')
2.09 s
```

### Faster/current HDF format
```python
>>> %timeit Session('demo_fast.h5')
1.25 s
```

### Pure Pandas

This gives an approximate lower bound on the load time we could achieve via #724. Pandas may do a bit more work than strictly necessary, but I doubt we would get below 500 ms.

```python
>>> import pandas as pd
>>> sto = pd.HDFStore('demo_fast.h5')
>>> %timeit {k: sto[k] for k in sto.keys()}
781 ms
```

### My working proof of concept for a format based on Feather files & PyArrow

This is about 8x as fast as the current best format (152 ms vs 1.25 s) and at least 3x as fast as what I **think** we could achieve using raw PyTables, at least as of now (*).
```python
>>> %timeit Session('demo4.laf')
152 ms
>>> Session('demo4.laf').equals(Session('demo_fast.h5'))
True
```

(*) There is an in-progress project to use a new HDF mechanism in PyTables to provide (much) faster I/O, but it is still a WIP and there is no guarantee it will be completed & integrated "soon" (the project is supposed to end by the end of the year).

https://groups.google.com/g/pytables-dev/c/8Y95Us1bJNo

