Description
Hi,
With some of the hdf5 files I have, pandas.HDFStore.groups() returns an empty list (as does .keys(), which iterates over the groups). However, the data are accessible via .get() or .get_node().
This is related to #21543 and #21372 where the .groups() logic was changed, in particular using self._handle.walk_groups() instead of self._handle.walk_nodes(), now to be found here:
Line 1212 in ea2e26a
Current Output
>>> hdf.groups()
[]
>>> hdf.keys()
[]
Expected Output
List of groups and keys as visible with e.g. h5dump.
Note: Changing the aforementioned line back to use .walk_nodes() fixes the issue and lists the groups and keys properly:
>>> hdf.groups()
[/Data/Table Layout (Table(69462,), zlib(4)) ''
description := {
...
/Data/Array Layout/2D Parameters/Data Parameters (Table(15,)) ''
description := {
"mnemonic": StringCol(itemsize=8, shape=(), dflt=b'', pos=0),
"description": StringCol(itemsize=48, shape=(), dflt=b'', pos=1),
"isError": Int64Col(shape=(), dflt=0, pos=2),
"units": StringCol(itemsize=7, shape=(), dflt=b'', pos=3),
"category": StringCol(itemsize=31, shape=(), dflt=b'', pos=4)}
byteorder := 'little'
chunkshape := (642,)]]
>>> hdf.keys()
['/Data/Table Layout',
'/Metadata/Data Parameters',
'/Metadata/Experiment Notes',
'/Metadata/Experiment Parameters',
'/Metadata/Independent Spatial Parameters',
'/Metadata/_record_layout',
'/Data/Array Layout/Layout Description',
'/Data/Array Layout/1D Parameters/Data Parameters',
'/Data/Array Layout/2D Parameters/Data Parameters']
Fix
One solution would be (I guess) to revert #21543, another to fix at least .keys() to use ._handle.walk_nodes() instead of .groups() in
Line 562 in ea2e26a
Could also be that it is a bug in pytables.
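As a stop-gap until .keys() is fixed, the table paths could be collected directly from the underlying PyTables file handle. The following is only a minimal sketch, not the pandas implementation: list_table_keys is a hypothetical helper name, and it assumes the handle offers walk_nodes(where, classname=...) and that each node has a _v_pathname attribute, as PyTables File and Node objects do.

```python
def list_table_keys(handle, where="/"):
    """Collect the path of every Table node found below `where`.

    `handle` is assumed to behave like a PyTables File object, i.e. it
    provides walk_nodes(where, classname=...). This helper is hypothetical
    and not part of pandas or PyTables.
    """
    # walk_nodes visits all nodes (groups and leaves); restricting the walk
    # to classname="Table" keeps only the table leaves.
    return [node._v_pathname for node in handle.walk_nodes(where, classname="Table")]
```

With an open store this might be called as list_table_keys(hdf._handle). Note that _handle is a private attribute of HDFStore, so this relies on pandas internals and may break between versions.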
Problem background
I was trying to figure out why some hdf5 files open fine with pandas but fail with dask.
The reason is that dask allows wildcards and iterates over the keys to find valid ones. If .keys() is empty, reading the files with dask fails.
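To illustrate why an empty .keys() breaks wildcard lookups, here is a rough sketch of glob-style key matching. It is not dask's actual code: resolve_keys is a hypothetical name and the error message is made up for illustration. It only shows that when the list of available keys is empty, no pattern can match anything.

```python
from fnmatch import fnmatchcase


def resolve_keys(available_keys, pattern):
    """Return the keys matching a glob-style pattern such as '/Data/*'.

    Hypothetical helper for illustration only; raises ValueError when
    nothing matches, which is what happens for every pattern if
    `available_keys` is empty.
    """
    matches = [key for key in available_keys if fnmatchcase(key, pattern)]
    if not matches:
        raise ValueError(f"No key matches pattern {pattern!r}")
    return matches
```

With the keys listed by the patched .keys(), a pattern like '/Metadata/*' would select the metadata tables; with the empty list returned by the unpatched version, the same call raises.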
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.27.2.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.1.post20191125
Cython : None
pytest : 5.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.2
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : None
tables : 3.6.1
xarray : 0.14.1
xlrd : None
xlwt : None
xlsxwriter : None