implement Dataset API #1017

Open
@gdementen


Rethink the whole way we interact with data: Session, CheckedSession, FileHandler, LazySession (#727), open_excel, ... See also the refactoring in #761 and #614.

Dataset API:

  • __init__(connect_string, max_memory=None, **kwargs) -- connect_string is a file path or connection string; kwargs are passed to the underlying Dataset implementation (compression option, Excel option, ...). If max_memory is not None, the Dataset will transparently flush some of its content (probably based on LRU) to "disk" when more memory is needed.
  • open(**kwargs) -- open/connect to the underlying storage. Kwargs here override those passed in __init__. Normally called via __enter__.
  • __enter__ and __exit__ (to be usable as a context manager)
  • read(key=None) -- read a single key, multiple keys (when key is a list), or everything (if key is None) and return the values. I am unsure whether this explicit method makes sense. Maybe __getitem__, with an optional load(), is enough.
  • load(key=None) -- load a single key, multiple keys (when key is a list), or everything (if key is None) and return nothing.
  • open_key(key=None) -- in the future, return a lazy object which loads data only when it is actually accessed. It could potentially load only part of that key (array/...). This needs further thought.
  • __getattr__ -> forwards to __getitem__
  • __getitem__(key) -> equivalent to load(key) if the key is not loaded yet, then returns the array (or use open_key(key) instead???)
  • __setitem__(key, value) -> add a new value or change an existing one.
  • close() -- close file/connection to underlying storage. Normally called via __exit__
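To make the proposal above easier to discuss, here is a minimal sketch of how the listed methods could fit together. It is only an illustration under assumptions, not an actual implementation: FAKE_STORAGE, the "mem://demo" connect string, the _keys helper and the options attribute are all hypothetical, storage is a plain dict instead of a file or database, max_memory is accepted but not enforced, and open_key() is left out since it needs further thought.

```python
# Hypothetical stand-in for files/databases: connect_string -> {key: value}
FAKE_STORAGE = {"mem://demo": {"pop": [1, 2, 3], "births": [4, 5]}}


class Dataset:
    def __init__(self, connect_string, max_memory=None, **kwargs):
        self.connect_string = connect_string
        self.max_memory = max_memory   # accepted but NOT enforced in this sketch
        self._init_kwargs = kwargs
        self.options = dict(kwargs)
        self._storage = None           # None while the dataset is closed
        self._cache = {}               # keys already loaded in memory

    def open(self, **kwargs):
        # kwargs given here override those passed to __init__
        self.options = {**self._init_kwargs, **kwargs}
        self._storage = FAKE_STORAGE.setdefault(self.connect_string, {})
        return self

    def close(self):
        self._storage = None

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def _keys(self, key):
        # normalize None / single key / list of keys to a list of keys
        if key is None:
            return list(self._storage)
        return key if isinstance(key, list) else [key]

    def load(self, key=None):
        # bring the requested keys in memory; returns nothing
        for k in self._keys(key):
            if k not in self._cache:
                self._cache[k] = self._storage[k]

    def read(self, key=None):
        # same as load() but returns the value(s)
        self.load(key)
        if key is None or isinstance(key, list):
            return {k: self._cache[k] for k in self._keys(key)}
        return self._cache[key]

    def __getitem__(self, key):
        return self.read(key)

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except (KeyError, TypeError):
            raise AttributeError(name) from None

    def __setitem__(self, key, value):
        # write-through: requires the dataset to be open
        self._cache[key] = value
        self._storage[key] = value


with Dataset("mem://demo", compression="gzip") as ds:
    pop = ds["pop"]        # loaded on first access
    births = ds.births     # attribute access forwards to __getitem__
    ds["deaths"] = [0]     # add a new value
```

In this sketch __getitem__ simply delegates to read(), which is one way to resolve the open question of whether an explicit read() is needed at all.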

Misc thoughts:

  • I think excel.Workbook should be a subclass of Dataset
  • We could/should also implement a generic "read" top-level function which would open a dataset, read the array and close it, to replace/complement the read_* functions.
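The generic top-level "read" function from the last point could look roughly like the sketch below. DemoDataset is a hypothetical, dict-backed stand-in for the proposed Dataset class, and the dataset_cls parameter is an assumption added only to keep the example self-contained.

```python
class DemoDataset:
    """Minimal stand-in for the proposed Dataset (hypothetical, dict-backed)."""
    _STORAGE = {"mem://demo": {"pop": [1, 2, 3]}}

    def __init__(self, connect_string, **kwargs):
        self.connect_string = connect_string
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False

    def read(self, key=None):
        data = self._STORAGE[self.connect_string]
        return dict(data) if key is None else data[key]


def read(connect_string, key=None, dataset_cls=DemoDataset, **kwargs):
    """Open a dataset, read `key` (or everything if key is None), then close it."""
    with dataset_cls(connect_string, **kwargs) as ds:
        return ds.read(key)


pop = read("mem://demo", "pop")
```

Because the function relies only on the context manager protocol and read(), any Dataset subclass (for example an Excel-backed one) could be passed in without changing it.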
