Closed
Description
open (not in any particular order)
- add support for other dtypes in table columns (datetime,date,unicode)
- Implement variable length strings in a parallel VLArray (and synchronize): Support a VLStringCol PyTables/PyTables#198
- revisit Term syntax - can we do better / more readability?
3a. implement `or` in Terms (maybe use pyparsing like syntax)
- implement WORMTable
- one big area is to test whether data columns really are slower; if they are not, it may make sense to make data_columns=True the default (but not necessarily index them). see https://groups.google.com/forum/m/?fromgroups#!topic/pydata/cmw1F3OFJSc - see the end of this post for some perf tests, so this is prob not a good idea after all
- add `export` function, to export to different PyTables formats (an easy to read table for R (partially done), and output a GenericTable)
- provide better access to columns that are data_columns (as we can directly select them) - see `read_column`, expand this to the entire table (if possible), allows one to avoid selecting all columns in a table (and then reindexing), this works if `columns` argument is provided to select or inferred from the where.
- add out-of-core computation support (see my comment about 1/2 down in pandas converts int32 to int64 #622), this is partially supported now that we have an iterator (ENH: support iteration on returned results in select and select_as_multiple in HDFStore #3078)
- add a method to create a table structure (`create_table`)?, w/o actually appending, so don't have to add parms in each call to append.
- Support a better mechanism for table splitting `Splitter`? that a user can specify how to split (rather than a dict); then store this object, so can automatically recreate the resulting table (enable for both Storer and Table objects)
- Optimize table appending, I think we can do better! (GH PERF: HDFStore table writing performance improvements #3537) makes some improvements
- allow `itemsize='truncate'` to allow subsequent appends to proceed with string truncation (on specific columns)
- allow where in `select_column`, return a properly indexed Series, add option to include the index (`use_index=True`?)
- Better deal with a very long list as input to a `Term`, by running multiple `or` sub-queries
- Add support for column oriented tables, dep is `carray`, http://carray.pytables.org/docs/manual/
done
- DONE (GH Pytables support for hierarchical keys #2401): access store paths via path notation / dot notation (GH BUG: issue in HDFStore with too many selectors in a where #2755)
- DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): add to docs (GH Different HDFStores in multiple threads crashes Python #2397) - issues about reading/writing concurrently in threads/processes http://sourceforge.net/mailarchive/message.php?msg_id=30190886
- DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): support panelnd (GH Panelnd #2242)
- DONE (GH ENH: added support for data column queries #2561): Should DataFrames be automagically indexed on 'index' (prob yes), but then should have a flag in append/put, and enable passing of the indexing options
- DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): Check if create_table_index changes the current index if different options are passed
- DONE (GH ENH: added support for data column queries #2561): for writing add chunk keyword to select to provide generator like behavior - each call to return the next chunk of data
- DONE (GH ENH: added support for data column queries #2561): support multi indexes on tables
5a. DONE real dtype integration is coming on PR ENH/BUG/DOC: allow propogation and coexistance of numeric dtypes #2708 (eg even though 0.10.1 will actually read/write float32 columns u can't really do much with them w/o having them upcasted) - in any event I think HDFStore will accommodate this already. but more testing needed
- DONE iterator support in `select`, http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python (GH ENH: support iteration on returned results in select and select_as_multiple in HDFStore #3078)
- DONE (GH ENH: HDFStore enhancements #3531) support timezones in datelike columns (index should be ok already) (scott?), (GH PyTables dates don't work when you switch to a different time zone #2852)
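
The iterator items above are what make chunk-at-a-time (out-of-core style) computation possible. A minimal sketch of the pattern in plain Python, assuming only that results arrive as successive chunks: `iter_chunks` is a hypothetical stand-in for a chunked select (it is not HDFStore's API), and `chunked_mean` is an illustrative reducer that never holds more than one chunk in memory.

```python
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable[float], chunksize: int) -> Iterator[List[float]]:
    """Yield lists of at most `chunksize` rows, in order (hypothetical helper)."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    chunk: List[float] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def chunked_mean(chunks: Iterable[List[float]]) -> float:
    """Compute a mean by accumulating a running total and count per chunk."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


values = range(1, 1001)  # stands in for rows read from a large table
mean = chunked_mean(iter_chunks(values, chunksize=100))
```

The same accumulate-per-chunk structure carries over to sums, counts, min/max and similar reductions; operations that need all rows at once (e.g. a global sort) do not fit this pattern directly.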