DOC: more prominent HDFStore store docs about storer/table formats #4206

Merged 1 commit on Jul 11, 2013
39 changes: 29 additions & 10 deletions doc/source/io.rst
@@ -1651,11 +1651,6 @@ Closing a Store, Context Manager
import os
os.remove('store.h5')


-These stores are **not** appendable once written (though you can simply
-remove them and rewrite). Nor are they **queryable**; they must be
-retrieved in their entirety.

Read/Write API
~~~~~~~~~~~~~~

@@ -1674,10 +1669,33 @@ similar to how ``read_csv`` and ``to_csv`` work. (new in 0.11.0)

os.remove('store_tl.h5')

+.. _io.hdf5-storer:
+
+Storer Format
+~~~~~~~~~~~~~
+
+The examples above show storing using ``put``, which writes the data to HDF5 via ``PyTables`` in a fixed array format, called
+the ``storer`` format. These types of stores are **not** appendable once written (though you can simply
+remove them and rewrite). Nor are they **queryable**; they must be
+retrieved in their entirety. They offer very fast writing and slightly faster reading than ``table`` stores.
+
+.. warning::
+
+   A ``storer`` format will raise a ``TypeError`` if you try to retrieve using a ``where``.
+
+   .. code-block:: python
+
+       DataFrame(randn(10,2)).to_hdf('test_storer.h5','df')
+
+       pd.read_hdf('test_storer.h5','df',where='index>5')
+       TypeError: cannot pass a where specification when reading from a non-table
+                  this store must be selected in its entirety


.. _io.hdf5-table:

-Storing in Table format
-~~~~~~~~~~~~~~~~~~~~~~~
+Table Format
+~~~~~~~~~~~~

``HDFStore`` supports another ``PyTables`` format on disk, the ``table``
format. Conceptually a ``table`` is shaped very much like a DataFrame,
@@ -1708,6 +1726,10 @@ supported.
# the type of stored data
store.root.df._v_attrs.pandas_type

+.. note::
+
+   You can also create a ``table`` by passing ``table=True`` to a ``put`` operation.
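
The contrast between the two formats documented in this diff can be sketched as follows. This is a minimal illustration, not part of the PR: it assumes a recent pandas with PyTables installed, where the on-disk format is selected with the ``format=`` keyword (``'fixed'`` for the storer format, ``'table'`` for the table format) rather than the ``table=True`` flag mentioned above. The file name and keys are made up for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small frame with a default integer index 0..9.
df = pd.DataFrame(np.random.randn(10, 2), columns=["a", "b"])

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this sketch.
    path = os.path.join(tmp, "formats_demo.h5")

    # Assumption: modern pandas keyword API; 'fixed' corresponds to the storer format.
    df.to_hdf(path, key="fixed_df", format="fixed")
    df.to_hdf(path, key="table_df", format="table")

    # A fixed-format store must be read in its entirety, so a ``where`` is rejected.
    try:
        pd.read_hdf(path, key="fixed_df", where="index > 5")
        fixed_query_error = None
    except TypeError as exc:
        fixed_query_error = exc

    # A table-format store is queryable, so the same ``where`` selects rows.
    subset = pd.read_hdf(path, key="table_df", where="index > 5")

print(type(fixed_query_error).__name__)
print(subset.shape)
```

The fixed-format read fails before any data is returned, while the table-format read returns only the rows whose index is greater than 5.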

.. _io.hdf5-keys:

Hierarchical Keys
@@ -2121,9 +2143,6 @@ Notes & Caveats
in a string, or a ``NaT`` in a datetime-like column counts as having
a value), then those rows **WILL BE DROPPED IMPLICITLY**. This limitation
*may* be addressed in the future.
-- You can not append/select/delete to a non-table (table creation is
-determined on the first append, or by passing ``table=True`` in a
-put operation)
- ``HDFStore`` is **not-threadsafe for writing**. The underlying
``PyTables`` only supports concurrent reads (via threading or
processes). If you need reading and writing *at the same time*, you
2 changes: 1 addition & 1 deletion doc/source/release.rst
@@ -113,7 +113,7 @@ pandas 0.12
- When removing an object, ``remove(key)`` raises
``KeyError`` if the key is not a valid store object.
- raise a ``TypeError`` on passing ``where`` or ``columns``
-to select with a Storer; these are invalid parameters at this time
+to select with a Storer; these are invalid parameters at this time (:issue:`4189`)
- can now specify an ``encoding`` option to ``append/put``
to enable alternate encodings (:issue:`3750`)
- enable support for ``iterator/chunksize`` with ``read_hdf``
6 changes: 4 additions & 2 deletions pandas/io/pytables.py
@@ -1746,9 +1746,11 @@ def f(values, freq=None, tz=None):

     def validate_read(self, kwargs):
         if kwargs.get('columns') is not None:
-            raise TypeError("cannot pass a column specification when reading a Storer")
+            raise TypeError("cannot pass a column specification when reading a non-table "
+                            "this store must be selected in its entirety")
         if kwargs.get('where') is not None:
-            raise TypeError("cannot pass a where specification when reading a Storer")
+            raise TypeError("cannot pass a where specification when reading from a non-table "
+                            "this store must be selected in its entirety")

     @property
     def is_exists(self):
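
The guard changed in the `pandas/io/pytables.py` hunk can be exercised in isolation with a small sketch. The class name `FixedStorerSketch` is hypothetical and is not the pandas class; only the two checks mirror the diff. The messages here add a semicolon between the two concatenated string parts for readability, which is an editorial assumption rather than what the PR emits.

```python
class FixedStorerSketch:
    """Hypothetical stand-in for a non-table (storer) object; not the pandas class."""

    def validate_read(self, kwargs):
        # A non-table store can only be read whole, so reject selection arguments.
        if kwargs.get("columns") is not None:
            raise TypeError(
                "cannot pass a column specification when reading a non-table; "
                "this store must be selected in its entirety"
            )
        if kwargs.get("where") is not None:
            raise TypeError(
                "cannot pass a where specification when reading from a non-table; "
                "this store must be selected in its entirety"
            )


storer = FixedStorerSketch()

# No selection arguments: the check passes silently.
storer.validate_read({})

# Passing ``where`` triggers the TypeError introduced for storers.
try:
    storer.validate_read({"where": "index > 5"})
except TypeError as exc:
    where_message = str(exc)
    print(where_message)
```

Raising `TypeError` (rather than silently ignoring the argument) makes it clear to the caller that the argument itself is invalid for this kind of store.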