BUG/ER: HDFStore write with empty frame reports an error (rather than succeeding) #4273

Closed
@jreback

Description

Writing an empty frame with invalid dtypes to an HDFStore raises; maybe it should just proceed (or is it the dtypes call that is actually wrong here? see #4272)

came up in this question: http://stackoverflow.com/questions/17691912/problems-with-merging-on-disk-tables-with-millions-of-rows/17698740#17698740

In [26]: df = DataFrame(randn(10,2),columns=list('AB'))

In [28]: df['C'] = 'foo'

In [33]: df.to_hdf('test.h5','df',mode='w',table=True)

In [35]: pd.read_hdf('test.h5','df')
Out[35]: 
          A         B    C
0 -1.123712 -1.146515  foo
1  0.921705  1.800419  foo
2 -0.769236 -0.553307  foo
3 -0.747601 -1.783439  foo
4 -1.110340  1.601026  foo
5  0.743869 -2.135140  foo
6  1.033699  2.028479  foo
7 -0.755478 -1.060223  foo
8  0.079326 -2.671624  foo
9 -2.262756  0.406850  foo

In [36]: pd.read_hdf('test.h5','df').dtypes
Out[36]: 
A    float64
B    float64
C     object
dtype: object

In [37]: df[df.C=='bar']
Out[37]: 
Empty DataFrame
Columns: [A, B, C]
Index: []

In [38]: df[df.C=='bar'].dtypes
Out[38]: 
A   NaN
B   NaN
C   NaN
dtype: float64

In [39]: df[df.C=='bar'].to_hdf('test.h5','df',append=True)
TypeError: Cannot serialize the column [C] because
its data contents are [empty] object dtype
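
Until the behaviour is settled, one possible caller-side workaround is to skip the append when the filtered frame has no rows. This is a minimal sketch, not a fix inside pandas: `append_if_nonempty` is a hypothetical helper name, the keyword form of `to_hdf` (`key=`, `format="table"`) is assumed in place of the `table=True` spelling used in the session above, and the non-empty path assumes PyTables is installed.

```python
import pandas as pd


def append_if_nonempty(frame, path, key):
    """Append `frame` to the table at `key` in `path`, skipping empty frames.

    Hypothetical helper (not part of pandas). Returns True if rows were
    written and False if the frame was empty and the store was left untouched.
    """
    if frame.empty:
        # An empty selection has nothing to append, and its column dtypes
        # may not be serializable, so do not call to_hdf at all.
        return False
    frame.to_hdf(path, key=key, mode="a", format="table", append=True)
    return True


# Same shape of problem as in the report: a filter that matches no rows.
df = pd.DataFrame({"A": [0.1, 0.2], "B": [1.5, -0.3], "C": ["foo", "foo"]})
empty = df[df["C"] == "bar"]
wrote = append_if_nonempty(empty, "test.h5", "df")
```

The guard only sidesteps the error; it does not address whether the empty frame's dtypes are reported correctly (see #4272).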

Labels: Bug; Error Reporting (incorrect or improved errors from pandas); IO HDF5 (read_hdf, HDFStore)