Commit 9b0aac0

ENH/BUG/DOC: added support for data column queries (can construct searches on the actual columns of the data)
added nan_rep for supporting string columns with nan's in them
performance enhancements on string columns
more tests & docs for data columns
1 parent 0e3f856 commit 9b0aac0

6 files changed, +407 -107 lines changed

doc/source/conf.py

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #sys.path.append(os.path.abspath('.'))
+sys.path.insert(0, '/home/jreback/pandas')
 sys.path.insert(0, os.path.abspath('../sphinxext'))
 
 sys.path.extend([

doc/source/io.rst

Lines changed: 25 additions & 2 deletions
@@ -1095,7 +1095,7 @@ Storing Mixed Types in a Table
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Storing mixed-dtype data is supported. Strings are store as a fixed-width using the maximum size of the appended column. Subsequent appends will truncate strings at this length.
-Passing ``min_itemsize = { `values` : size }`` as a parameter to append will set a larger minimum for the string columns. Storing ``floats, strings, ints, bools`` are currently supported.
+Passing ``min_itemsize = { `values` : size }`` as a parameter to append will set a larger minimum for the string columns. Storing ``floats, strings, ints, bools`` are currently supported. For string columns, passing ``nan_rep = 'my_nan_rep'`` to append will change the default nan representation on disk (which converts to/from `np.nan`), this defaults to `nan`.
 
 .. ipython:: python
 
@@ -1115,7 +1115,6 @@ Passing ``min_itemsize = { `values` : size }`` as a parameter to append will set
 
 Querying a Table
 ~~~~~~~~~~~~~~~~
-
 ``select`` and ``delete`` operations have an optional criteria that can be specified to select/delete only
 a subset of the data. This allows one to have a very large on-disk table and retrieve only a portion of the data.
 
@@ -1160,6 +1159,30 @@ You can create an index for a table with ``create_table_index`` after data is al
    i.optlevel, i.kind
 
 
+Query via Data Columns
+~~~~~~~~~~~~~~~~~~~~~~
+You can designate (and index) certain columns that you want to be able to perform queries (other than the `indexable` columns, which you can always query). For instance say you want to perform this this common operation, on-disk, and return just the frame that matches this query.
+
+.. ipython:: python
+
+   df['string'] = 'foo'
+   df.ix[4:6,'string'] = np.nan
+   df.ix[7:9,'string'] = 'bar'
+   df
+
+   # on-disk operations
+   store.append('df_dc', df, columns = ['B','string'])
+   store.select('df_dc',[ Term('B>0') ])
+
+   # getting creative
+   store.select('df_dc',[ Term('B>0'), Term('string=foo') ])
+
+   # index the data_column
+   store.create_table_index('df_dc', columns = ['B'])
+   store.root.df_dc.table
+
+There is some performance degredation by making lots of columns into `data columns`, so it is up to the user to designate these.
+
 Delete from a Table
 ~~~~~~~~~~~~~~~~~~~
 You can delete from a table selectively by specifying a ``where``. In deleting rows, it is important to understand the ``PyTables`` deletes rows by erasing the rows, then **moving** the following data. Thus deleting can potentially be a very expensive operation depending on the orientation of your data. This is especially true in higher dimensional objects (``Panel`` and ``Panel4D``). To get optimal deletion speed, it pays to have the dimension you are deleting be the first of the ``indexables``.
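
Editorial note on the io.rst change: the fixed-width string storage and the ``nan_rep`` round trip described in the "Storing Mixed Types in a Table" hunk can be illustrated with a small pure-Python sketch. This is not the pandas or PyTables implementation; the helper names ``encode_strings`` and ``decode_strings`` are hypothetical, and the only behaviour assumed from the documentation text is that strings are truncated at a fixed item size and that NaN is written to disk as the ``nan_rep`` string (default ``nan``) and converted back on read.

```python
import math

# Assumed default on-disk marker for NaN, taken from the doc text ("defaults to `nan`").
NAN_REP = "nan"


def encode_strings(values, itemsize, nan_rep=NAN_REP):
    """Hypothetical helper: turn values into fixed-width strings for storage."""
    encoded = []
    for value in values:
        # NaN cannot be stored in a string column, so write the marker instead.
        if isinstance(value, float) and math.isnan(value):
            value = nan_rep
        # Fixed-width storage: anything longer than itemsize is truncated.
        encoded.append(value[:itemsize])
    return encoded


def decode_strings(stored, nan_rep=NAN_REP):
    """Hypothetical helper: map the marker back to NaN when reading."""
    return [float("nan") if item == nan_rep else item for item in stored]


stored = encode_strings(["foo", float("nan"), "barbaz"], itemsize=3)
decoded = decode_strings(stored)
```

In this sketch a genuine string equal to the marker is indistinguishable from NaN after the round trip, and a marker longer than ``itemsize`` is truncated and no longer recognised on read. Both are reasons one might pick a custom ``nan_rep`` together with a suitable ``min_itemsize``.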
