pandas-dev
diff --git a/‎doc/source/conf.py
Lines changed: 1 addition & 0 deletions b/‎doc/source/conf.py
Lines changed: 1 addition & 0 deletions
diff --git a/‎doc/source/io.rst
Lines changed: 25 additions & 2 deletions b/‎doc/source/io.rst
Lines changed: 25 additions & 2 deletions
@@ -16,6 +16,7 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #sys.path.append(os.path.abspath('.'))
+sys.path.insert(0, '/home/jreback/pandas')
 sys.path.insert(0, os.path.abspath('../sphinxext'))
 
 sys.path.extend([
 
@@ -1095,7 +1095,7 @@ Storing Mixed Types in a Table
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Storing mixed-dtype data is supported. Strings are store as a fixed-width using the maximum size of the appended column. Subsequent appends will truncate strings at this length.
-Passing ``min_itemsize = { `values` : size }`` as a parameter to append will set a larger minimum for the string columns. Storing ``floats, strings, ints, bools`` are currently supported.
+Passing ``min_itemsize = { `values` : size }`` as a parameter to append will set a larger minimum for the string columns. Storing ``floats, strings, ints, bools`` are currently supported. For string columns, passing ``nan_rep = 'my_nan_rep'`` to append will change the default nan representation on disk (which converts to/from `np.nan`), this defaults to `nan`.
 
 .. ipython:: python
 
@@ -1115,7 +1115,6 @@ Passing ``min_itemsize = { `values` : size }`` as a parameter to append will set
 
 Querying a Table
 ~~~~~~~~~~~~~~~~
-
 ``select`` and ``delete`` operations have an optional criteria that can be specified to select/delete only
 a subset of the data. This allows one to have a very large on-disk table and retrieve only a portion of the data.
 
@@ -1160,6 +1159,30 @@ You can create an index for a table with ``create_table_index`` after data is al
    i.optlevel, i.kind
 
 
+Query via Data Columns
+~~~~~~~~~~~~~~~~~~~~~~
+You can designate (and index) certain columns that you want to be able to perform queries (other than the `indexable` columns, which you can always query). For instance say you want to perform this this common operation, on-disk, and return just the frame that matches this query.
+
+.. ipython:: python
+
+   df['string'] = 'foo'
+   df.ix[4:6,'string'] = np.nan
+   df.ix[7:9,'string'] = 'bar'
+   df
+
+   # on-disk operations
+   store.append('df_dc', df, columns = ['B','string'])
+   store.select('df_dc',[ Term('B>0') ])
+
+   # getting creative
+   store.select('df_dc',[ Term('B>0'), Term('string=foo') ])
+
+   # index the data_column
+   store.create_table_index('df_dc', columns = ['B'])
+   store.root.df_dc.table
+
+There is some performance degredation by making lots of columns into `data columns`, so it is up to the user to designate these.
+
 Delete from a Table
 ~~~~~~~~~~~~~~~~~~~
 You can delete from a table selectively by specifying a ``where``. In deleting rows, it is important to understand the ``PyTables`` deletes rows by erasing the rows, then **moving** the following data. Thus deleting can potentially be a very expensive operation depending on the orientation of your data. This is especially true in higher dimensional objects (``Panel`` and ``Panel4D``). To get optimal deletion speed, it pays to have the dimension you are deleting be the first of the ``indexables``.