Rethink the whole way we interact with data: Session, CheckedSession, FileHandler, LazySession (#727), open_excel, ... See also the refactoring in #761 and #614.
Dataset API:
- `__init__(connect_string, max_memory=None, **kwargs)` -- filepath or connection string; kwargs are passed to the underlying Dataset implementation (compression option, Excel option, ...). If max_memory is not None, the Dataset will transparently flush some of its content (probably based on LRU) to "disk" when more memory is needed.
- `open(**kwargs)` -- open/connect to the underlying storage. Kwargs here override those passed in `__init__`. Normally called via `__enter__`.
- `__enter__` and `__exit__` (to be usable as a context manager)
- `read(key=None)` -- read a single key, multiple keys (when key is a list), or everything (if key is None) and return the values. Unsure this explicit method makes sense. Maybe `__getitem__`, with an optional `load()`, is enough.
- `load(key=None)` -- load a single key, multiple keys (when key is a list), or everything (if key is None) and return nothing.
- `open_key(key=None)` -- in the future, for returning a lazy object which will load data when actually accessed. Can potentially load only part of that key (array/...). This needs further thought.
- `__getattr__` -> forwards to `__getitem__`
- `__getitem__(key)` -> equivalent to `load(key)` if not loaded yet, then returns the array (or use open_key(key) instead???)
- `__setitem__(key, value)` -> add a new value or change an existing one.
- `close()` -- close file/connection to underlying storage. Normally called via `__exit__`.
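
To make the proposal above a bit more concrete, here is a rough sketch of how the API could fit together. It is only an illustration, not a proposed implementation: `InMemoryDataset`, its `_STORAGE` dict (standing in for real files/connections) and the `"demo.h5"` name are all made up for the example, `max_memory` is accepted but ignored, and `open_key` is left out since it still needs more thought.

```python
class InMemoryDataset:
    """Hypothetical Dataset; a class-level dict stands in for real storage."""

    # fake "disk": maps connect_string -> {key: value}
    _STORAGE = {}

    def __init__(self, connect_string, max_memory=None, **kwargs):
        self.connect_string = connect_string
        self.max_memory = max_memory  # accepted but ignored in this sketch
        self._init_kwargs = kwargs
        self._options = None
        self._backend = None  # None means "not open"
        self._loaded = {}     # values already loaded in memory

    def open(self, **kwargs):
        # kwargs given here override those passed to __init__
        self._options = {**self._init_kwargs, **kwargs}
        self._backend = self._STORAGE.setdefault(self.connect_string, {})
        return self

    def close(self):
        self._backend = None

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions

    def _keys(self, key):
        # normalize key=None / single key / list of keys to a list
        if key is None:
            return list(self._backend)
        if isinstance(key, list):
            return key
        return [key]

    def load(self, key=None):
        """Load one key, several keys or everything; returns nothing."""
        if self._backend is None:
            raise RuntimeError("dataset is not open")
        for k in self._keys(key):
            if k not in self._loaded:
                self._loaded[k] = self._backend[k]

    def read(self, key=None):
        """Like load(), but returns the value(s)."""
        self.load(key)
        if key is None or isinstance(key, list):
            return {k: self._loaded[k] for k in self._keys(key)}
        return self._loaded[key]

    def __getitem__(self, key):
        # loads on first access, then returns the in-memory value
        return self.read(key)

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setitem__(self, key, value):
        self._loaded[key] = value
        self._backend[key] = value


# usage: write in one session, read back in another
with InMemoryDataset("demo.h5", compression="gzip") as ds:
    ds["population"] = [1, 2, 3]

with InMemoryDataset("demo.h5") as ds:
    print(ds.population)
```

In this sketch `__getitem__` simply delegates to `read`, which hints that a separate `read` method may indeed be redundant for the single-key case.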
Misc thoughts:
- I think excel.Workbook should be a subclass of Dataset
- We could/should also implement a generic "read" top-level function which would open a dataset, read the array and close it, to replace/complement the read_* functions.
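
For the generic top-level "read" function, something along these lines might be enough. Again this is only a sketch under assumptions: `StubDataset` is a made-up minimal stand-in for a real Dataset (with hard-coded content under a made-up `"demo.csv"` name), and the `dataset_cls` argument is a hypothetical way to pick the implementation; a real version would presumably infer it from the connect string.

```python
class StubDataset:
    """Hypothetical minimal Dataset: just enough of the API for read() below."""

    # fake storage with hard-coded content, for illustration only
    _STORAGE = {"demo.csv": {"a": [1, 2], "b": [3, 4]}}

    def __init__(self, connect_string, **kwargs):
        self.connect_string = connect_string
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False
        return False

    def read(self, key=None):
        data = self._STORAGE[self.connect_string]
        if key is None:
            return dict(data)
        if isinstance(key, list):
            return {k: data[k] for k in key}
        return data[key]


def read(connect_string, key=None, dataset_cls=StubDataset, **kwargs):
    """Open a dataset, read key(s) from it and close it again."""
    with dataset_cls(connect_string, **kwargs) as dataset:
        return dataset.read(key)


print(read("demo.csv", "a"))
```

The context manager guarantees the dataset is closed even if reading fails, which is the main thing the existing read_* functions would gain from being built on top of this.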