
column-specific min_itemsize doesn't work with append_to_multiple  #11238

Closed
@BrenBarn


HDFStore.append_to_multiple passes its entire min_itemsize argument on to every sub-append. Because each sub-table holds only a subset of the columns, the call fails as soon as a sub-append receives a min_itemsize key for a column that its table doesn't contain.

Simple example:

>>> store.append_to_multiple({
...     'index': ["IX"],
...     'nums': ["Num", "BigNum", "RandNum"],
...     "strs": ["Str", "LongStr"]
... }, d.iloc[[0]], 'index', min_itemsize={"Str": 10, "LongStr": 100})
Traceback (most recent call last):
  File "<pyshell#52>", line 5, in <module>
    }, d.iloc[[0]], 'index', min_itemsize={"Str": 10, "LongStr": 100})
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\io\pytables.py", line 1002, in append_to_multiple
    self.append(k, val, data_columns=dc, **kwargs)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\io\pytables.py", line 920, in append
    **kwargs)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\io\pytables.py", line 1265, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\io\pytables.py", line 3773, in write
    **kwargs)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\io\pytables.py", line 3460, in create_axes
    self.validate_min_itemsize(min_itemsize)
  File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\io\pytables.py", line 3101, in validate_min_itemsize
    "data_column" % k)
ValueError: min_itemsize has the key [LongStr] which is not an axis or data_column

This apparently means that you can't use min_itemsize without manually creating and appending to each separate table beforehand, which is the kind of thing append_to_multiple is supposed to shield you from.

I think append_to_multiple should special-case min_itemsize and split it into a separate dict for each sub-append, containing only the keys for that table's columns. I don't know if there are other kwargs that need to be "allocated" separately to sub-appends, but if there are, it might be good to split them too.
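
To make the proposed split concrete, here is a minimal sketch in plain Python. The helper name `split_min_itemsize` is hypothetical and is not part of pandas. It assumes every table in the mapping lists its columns explicitly, and that a non-dict `min_itemsize` (None or a single int) can be handed to every sub-append unchanged.

```python
def split_min_itemsize(tables, min_itemsize):
    """Return one min_itemsize value per table name.

    `tables` maps table name -> list of column names, in the same shape as
    the first argument in the example above. Hypothetical helper, shown only
    to illustrate the idea.
    """
    # Assumption: None or a plain int is not column-specific, so it is
    # passed through to every table as-is.
    if min_itemsize is None or isinstance(min_itemsize, int):
        return {name: min_itemsize for name in tables}

    # Keep only the keys that name a column stored in that table.
    return {
        name: {col: size for col, size in min_itemsize.items() if col in columns}
        for name, columns in tables.items()
    }


tables = {
    "index": ["IX"],
    "nums": ["Num", "BigNum", "RandNum"],
    "strs": ["Str", "LongStr"],
}
per_table = split_min_itemsize(tables, {"Str": 10, "LongStr": 100})
```

With the dict from the example, only the "strs" entry keeps the two string sizes; "index" and "nums" get empty dicts, so their sub-appends would never see a key for a column they don't hold. Whether each sub-append then accepts its filtered dict still depends on how that table's columns are stored, which this sketch does not address.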
