
Another HDFStore error #2784

Closed

@jostheim

Description

After a long run of extracting features for some random forest work, I ran into this error when serializing the features:

Traceback (most recent call last):
  File "XXXXX.py", line 1043, in <module>
    write_dataframe("features", all_df, store)
  File "XXXXX.py", line 55, in write_dataframe
    store[name] = df
  File "/Library/Python/2.7/site-packages/pandas/io/pytables.py", line 218, in __setitem__
    self.put(key, value)
  File "/Library/Python/2.7/site-packages/pandas/io/pytables.py", line 458, in put
    self._write_to_group(key, value, table=table, append=append, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/io/pytables.py", line 788, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/Library/Python/2.7/site-packages/pandas/io/pytables.py", line 1837, in write
    self.write_array('block%d_values' % i, blk.values)
  File "/Library/Python/2.7/site-packages/pandas/io/pytables.py", line 1639, in write_array
    self.handle.createArray(self.group, key, value)
  File "/Library/Python/2.7/site-packages/tables-2.4.0-py2.7-macosx-10.8-intel.egg/tables/file.py", line 780, in createArray
    object=object, title=title, byteorder=byteorder)
  File "/Library/Python/2.7/site-packages/tables-2.4.0-py2.7-macosx-10.8-intel.egg/tables/array.py", line 167, in __init__
    byteorder, _log)
  File "/Library/Python/2.7/site-packages/tables-2.4.0-py2.7-macosx-10.8-intel.egg/tables/leaf.py", line 263, in __init__
    super(Leaf, self).__init__(parentNode, name, _log)
  File "/Library/Python/2.7/site-packages/tables-2.4.0-py2.7-macosx-10.8-intel.egg/tables/node.py", line 250, in __init__
    self._v_objectID = self._g_create()
  File "/Library/Python/2.7/site-packages/tables-2.4.0-py2.7-macosx-10.8-intel.egg/tables/array.py", line 200, in _g_create
    nparr, self._v_new_title, self.atom)
  File "hdf5Extension.pyx", line 884, in tables.hdf5Extension.Array._createArray (tables/hdf5Extension.c:8498)
tables.exceptions.HDF5ExtError: Problems creating the Array.

The error message is pretty uninformative. I know the DataFrame I was writing was big: more than 17,000 columns by more than 20,000 rows, with lots of np.nan values scattered through the columns.
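
For reference, this is roughly the pattern that fails (a scaled-down sketch; my write_dataframe helper just assigns into the store, as the traceback shows, and the exact data generation here is only illustrative):

```python
import numpy as np
import pandas as pd

# Sketch of the failing write: the real frame was >20000 rows x >17000
# float columns with lots of np.nan. Shrink these sizes to experiment.
n_rows, n_cols = 20000, 17000
data = np.random.randn(n_rows, n_cols)
data[np.random.rand(n_rows, n_cols) < 0.3] = np.nan  # sprinkle in NaNs
all_df = pd.DataFrame(data, columns=['feat_%d' % i for i in range(n_cols)])

store = pd.HDFStore('features.h5')
store['features'] = all_df  # same path as store[name] = df in write_dataframe
store.close()
```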

Since I seem to be one of the few people serializing data sets this large, and I have a 64 GB RAM machine sitting next to me, are there test cases I could run, or write, that would help? I'm thinking of setting up large mixed-dtype DataFrames, etc.
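
Something along these lines is the kind of test case I have in mind (just a sketch; make_mixed_frame is a made-up helper, and the sizes and dtype mix are arbitrary):

```python
import numpy as np
import pandas as pd

def make_mixed_frame(n_rows, n_float_cols, n_str_cols):
    # Hypothetical helper: build a large mixed-dtype frame with missing values.
    floats = pd.DataFrame(np.random.randn(n_rows, n_float_cols),
                          columns=['f%d' % i for i in range(n_float_cols)])
    floats[floats > 2.0] = np.nan  # inject some NaNs
    strings = pd.DataFrame({'s%d' % i: ['val_%d' % (j % 100) for j in range(n_rows)]
                            for i in range(n_str_cols)})
    return pd.concat([floats, strings], axis=1)

df = make_mixed_frame(n_rows=20000, n_float_cols=17000, n_str_cols=5)

store = pd.HDFStore('stress.h5')
store.put('mixed_fixed', df)     # the fixed-format path that failed above
store.append('mixed_table', df)  # table format, for comparison
store.close()
```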

Labels: Bug, IO Data
