ENH: PyTables Enhancements for future #2391

Closed
@jreback


open (not in any particular order)

  1. add support for other dtypes in table columns (datetime,date,unicode)
  2. Implement variable-length strings in a parallel VLArray (and keep it synchronized with the table); see "Support a VLStringCol", PyTables/PyTables#198
  3. revisit Term syntax: can it be made more readable?
    3a. implement "or" in Terms (maybe use a pyparsing-like syntax)
  4. implement WORMTable
  5. one big area is to test whether data columns really are slower; if they are not, it may make sense to make data_columns=True the default (without necessarily indexing them). See https://groups.google.com/forum/m/?fromgroups#!topic/pydata/cmw1F3OFJSc; the perf tests at the end of that post suggest this is probably not a good idea after all
  6. add an export function to write other PyTables formats (an easy-to-read table for R (partially done), and a GenericTable)
  7. provide better access to columns that are data_columns (as we can select them directly): see read_column, and expand it to the entire table if possible. This avoids selecting all columns in a table and then reindexing; it works if the columns argument is provided to select or can be inferred from the where.
  8. add out-of-core computation support (see my comment about halfway down in #622, "pandas converts int32 to int64"); this is partially supported now that we have an iterator (#3078, "ENH: support iteration on returned results in select and select_as_multiple in HDFStore")
  9. add a method to create a table structure (create_table?) without actually appending, so that params don't have to be passed in each call to append
  10. Support a better mechanism for table splitting (a Splitter?) that lets a user specify how to split (rather than passing a dict); then store this object so the resulting table can be recreated automatically (enable for both Storer and Table objects)
  11. Optimize table appending; I think we can do better! (#3537, "PERF: HDFStore table writing performance improvements", makes some improvements)
  12. allow itemsize='truncate' so that subsequent appends can proceed with string truncation (on specific columns)
  13. allow where in select_column, return a properly indexed Series, add option to include the index (use_index=True?)
  14. Deal better with a very long list as input to a Term, by running multiple queries or sub-queries
  15. Add support for column-oriented tables; the dependency is carray, http://carray.pytables.org/docs/manual/
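
To illustrate item 14, here is a minimal sketch of splitting a very long list of values into several smaller sub-queries and concatenating the results. It is not HDFStore code: `batches`, `select_in_batches`, `fake_table` and `fake_query` are all hypothetical names, and `run_query` stands in for whatever call would actually execute one Term against a table.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def batches(values: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `values`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def select_in_batches(
    run_query: Callable[[Sequence], Iterable],
    values: Sequence,
    size: int,
) -> List:
    """Run one sub-query per batch and concatenate the rows in order.

    `run_query` is a placeholder (an assumption of this sketch) for the
    call that would select rows matching one short list of values.
    """
    rows: List = []
    for batch in batches(values, size):
        rows.extend(run_query(batch))
    return rows


# Stand-in for a real table: a dict keyed by row id (hypothetical data).
fake_table = {i: f"row-{i}" for i in range(10)}


def fake_query(batch: Sequence) -> List[str]:
    """Return the rows whose key appears in `batch`, skipping unknown keys."""
    return [fake_table[key] for key in batch if key in fake_table]


result = select_in_batches(fake_query, [1, 3, 5, 7, 42], size=2)
print(result)
```

The batch size would need tuning against whatever limit the underlying query engine imposes; the sketch only shows the splitting and recombining step.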

done

  1. DONE (GH Pytables support for hierarchical keys #2401): access store paths via path notation / dot notation (GH BUG: issue in HDFStore with too many selectors in a where #2755)
  2. DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): add to docs (GH Different HDFStores in multiple threads crashes Python #2397) - issues about reading/writing concurrently in threads/processes
    http://sourceforge.net/mailarchive/message.php?msg_id=30190886
  3. DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): support panelnd (GH Panelnd #2242)
  4. DONE (GH ENH: added support for data column queries #2561): Should DataFrames be automagically indexed on 'index' (prob yes), but then should have a flag in append/put, and enable passing of the indexing options
  5. DONE (GH ENH: ndim tables in HDFStore (allow indexables to be passed in) #2497): Check if create_table_index changes the current index if different options are passed
  6. DONE (GH ENH: added support for data column queries #2561): for writing add chunk keyword to select to provide generator-like behavior, with each call returning the next chunk of data
  7. DONE (GH ENH: added support for data column queries #2561): support multi indexes on tables
    5a. DONE real dtype integration is coming in PR #2708, "ENH/BUG/DOC: allow propogation and coexistance of numeric dtypes" (e.g. even though 0.10.1 will actually read/write float32 columns, you can't really do much with them without having them upcasted). In any event I think HDFStore will accommodate this already, but more testing is needed
  8. DONE iterator support in select, http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python (GH ENH: support iteration on returned results in select and select_as_multiple in HDFStore #3078)
  9. DONE (GH ENH: HDFStore enhancements #3531) support timezones in datelike columns (index should be ok already) (scott?), (GH PyTables dates don't work when you switch to a different time zone #2852)
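
The iterator support marked done above (#3078) is what makes the out-of-core item in the open list partially workable: an aggregate can be built up one chunk at a time without holding the whole table in memory. A minimal sketch of that pattern follows. `running_mean` and `fake_chunks` are hypothetical names, and plain lists stand in for the chunks a store would return.

```python
from typing import Iterable, Iterator, List, Sequence


def running_mean(chunks: Iterable[Sequence[float]]) -> float:
    """Compute a mean over all chunks while holding only one in memory."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data in any chunk")
    return total / count


def fake_chunks() -> Iterator[List[float]]:
    """Stand-in for a chunked select: yields small lists of values."""
    yield [1.0, 2.0, 3.0]
    yield [4.0, 5.0]
    yield [6.0]


mean = running_mean(fake_chunks())
print(mean)
```

With a real store, the chunks would presumably come from something like iterating over `store.select(key, chunksize=...)` and pulling one column out of each returned frame; the exact keyword and behavior depend on the pandas version, so treat that as an assumption.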

Labels: Enhancement; IO Data (IO issues that don't fit into a more specific label); IO HDF5 (read_hdf, HDFStore)
