API: make selecting coordinate / masking a bit easier in HDFStore #4467

Closed
@jreback


From the PyTables mailing list:

Select the rows where month == 5 from the index
(this could perhaps be done internally).

The big issue is that Coordinates is sort of 'private' here;
where should accept a boolean array or coordinates directly.

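# create a frame
# (assumes: import pandas as pd; from pandas import DataFrame, date_range; from numpy.random import randn)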
In [45]: df = DataFrame(randn(1000,2),index=date_range('20000101',periods=1000))

In [53]: df
Out[53]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00
Freq: D
Data columns (total 2 columns):
0    1000  non-null values
1    1000  non-null values
dtypes: float64(2)

# store it as a table
In [46]: store = pd.HDFStore('test.h5',mode='w')

In [47]: store.append('df',df)

# select out the index (a datetimeindex in this case)
In [48]: c = store.select_column('df','index')

# get the coordinates (row positions) of the matching index values
In [49]: coords = c[pd.DatetimeIndex(c).month==5]

# select those rows
In [51]: from pandas.io.pytables import Coordinates

In [50]: store.select('df',where=Coordinates(coords.index,None,None))
Out[50]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 93 entries, 2000-05-01 00:00:00 to 2002-05-31 00:00:00
Data columns (total 2 columns):
0    93  non-null values
1    93  non-null values
dtypes: float64(2)
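
The mask-to-coordinates step from In [49] can be tried without a store at all. The following is a minimal sketch under stated assumptions: a plain Series stands in for the result of store.select_column('df','index'), and mask_to_coordinates is a hypothetical helper name used only for illustration, not part of pandas or PyTables.

```python
import numpy as np
import pandas as pd


def mask_to_coordinates(mask):
    """Return the integer row positions where a boolean mask is True.

    Hypothetical helper for illustration; not a pandas API.
    """
    mask = np.asarray(mask, dtype=bool)
    return np.flatnonzero(mask)


# Stand-in for store.select_column('df', 'index'): a Series of timestamps
# whose own index is the row position in the stored table (assumption).
c = pd.Series(pd.date_range("20000101", periods=1000))

# Boolean mask over the rows, then the matching row positions.
mask = pd.DatetimeIndex(c).month == 5
coords = mask_to_coordinates(mask)

print(len(coords))  # number of rows that fall in May
print(coords[:3])   # first few row positions
```

For this date range the sketch finds 93 rows, matching the select output above. Later pandas documentation describes passing such an array of row positions directly as where to HDFStore.select, which would make the explicit Coordinates import unnecessary; whether that works depends on the installed version.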
