HDFStore __unicode__ very slow for many keys #16503

Closed
@Kiv

Description

Code Sample, a copy-pastable example if possible

import pandas as pd
store = pd.HDFStore('test.h5', 'w')
for i in range(5000):
    store.put('table_{}'.format(i), pd.DataFrame([i]))

%time str(store)
CPU times: user 26.1 s, sys: 156 ms, total: 26.2 s
Wall time: 26.2 s

Problem description

The __unicode__ method of HDFStore iterates over all the keys in the file to build the string representation. For files with many keys this becomes extremely slow (about 26 s for the 5000 keys in the sample above) and dumps an excessive amount of output to the console.

Worse, this completely bogs down PyCharm's debugger because it calls str(store) for every store that's in scope, on every step.

Expected Output

__unicode__ should be a fast operation - just showing the file path would be sufficient. Detailed info on all the keys could be moved to a separate method, if it is needed at all.
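
To make the proposal concrete, here is a minimal sketch of the suggested split between a cheap string representation and an explicit, more expensive listing. PathOnlyStore, its info() method and the key_scans counter are hypothetical names used only for illustration; this is not pandas' HDFStore and does not touch HDF5 at all.

```python
class PathOnlyStore:
    """Hypothetical stand-in for a keyed store (NOT pandas' HDFStore)."""

    def __init__(self, path):
        self.path = path
        self._items = {}    # key -> stored object
        self.key_scans = 0  # how many times all keys were walked

    def put(self, key, value):
        # Normalise keys to a leading slash, similar to HDF5 node paths.
        self._items["/" + key.lstrip("/")] = value

    def keys(self):
        # Stands in for the expensive walk over every node in the file.
        self.key_scans += 1
        return sorted(self._items)

    def __repr__(self):
        # Constant-time: reports only the type name and the file path.
        # str() falls back to this because __str__ is not defined.
        return "<{} path={!r}>".format(type(self).__name__, self.path)

    def info(self):
        # Opt-in detailed listing; this is the only place keys are scanned.
        lines = [repr(self)]
        for key in self.keys():
            lines.append("{}  {}".format(key, type(self._items[key]).__name__))
        return "\n".join(lines)


store = PathOnlyStore("test.h5")
for i in range(5000):
    store.put("table_{}".format(i), [i])

print(str(store))       # cheap, no key scan
print(store.key_scans)  # still 0 after str()
```

With this layout a debugger that calls str(store) on every step never triggers the key scan; the full listing is produced only when info() is called explicitly.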

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: 0.19.0
statsmodels: 0.8.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.9
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
boto: None
pandas_datareader: None

Labels

IO Data: IO issues that don't fit into a more specific label
IO HDF5: read_hdf, HDFStore
Performance: Memory or execution speed performance
