
Read from HDF with empty where throws an error #26746


Merged: 7 commits, Jun 11, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -649,6 +649,7 @@ I/O
 - Bug in :func:`read_json` where date strings with ``Z`` were not converted to a UTC timezone (:issue:`26168`)
 - Added ``cache_dates=True`` parameter to :meth:`read_csv`, which allows to cache unique dates when they are parsed (:issue:`25990`)
 - :meth:`DataFrame.to_excel` now raises a ``ValueError`` when the caller's dimensions exceed the limitations of Excel (:issue:`26051`)
+- Bug while selecting from HDF store with `where`='' specified (:issue:`26610`). Now the whole DataFrame will be returned from store.

Contributor:
use :class:`HDFStore`, and put where='' in double backticks. Don't put anything after the issue number (e.g. remove "Now the whole DataFrame ..."); you can put that before it if you want.

Contributor Author:
Please check; I'm not sure I'm using :class correctly.


 Plotting
 ^^^^^^^^
4 changes: 2 additions & 2 deletions pandas/io/pytables.py
@@ -95,10 +95,10 @@ def _ensure_term(where, scope_level):
                 wlist.append(w)
             else:
                 wlist.append(Term(w, scope_level=level))
-        where = wlist
+        where = wlist if wlist else None

Contributor:
this is not needed

Contributor Author:
This is connected to the next request. Please check it.

     elif maybe_expression(where):
         where = Term(where, scope_level=level)
-    return where
+    return where if where != "" else None

Contributor:
you can just do `where if where else None`, as `bool('')` and `bool([])` evaluate to False.

Contributor Author:
I did that at first, but when the input `where` is a valid expression string, `where` comes out of the `elif` as an Index, so this comparison raises "The truth value of a Int64Index is ambiguous".

Contributor:
then just do

return where if len(where) else None



 class PossibleDataLossError(Exception):
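The thread on the `return` line hinges on Python truth testing: `bool('')`, `bool(())` and `bool([])` are all False, but array-likes such as a pandas `Index` raise `ValueError` when truth-tested, while `len()` still works on them. A minimal stdlib-only sketch of that difference; `FakeIndex`, `truthy_check`, `len_check` and `raises_value_error` are hypothetical names for illustration, and `FakeIndex` only mimics the `Index` behavior rather than using pandas.

```python
class FakeIndex:
    """Hypothetical stand-in for pandas.Index (not the real class).

    Like pandas.Index and NumPy arrays, it supports len() but refuses
    plain truth testing.
    """

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __bool__(self):
        # Python calls __bool__ (not __len__) for `if obj:` when it is
        # defined, so truth testing always raises, whatever the length.
        raise ValueError("The truth value of a FakeIndex is ambiguous.")


def truthy_check(where):
    # First suggestion in the thread: rely on bool(where).
    return where if where else None


def len_check(where):
    # Second suggestion in the thread: rely on len(where).
    # len(None) raises TypeError, so None would need its own guard.
    return where if len(where) else None


def raises_value_error(func, arg):
    """Return True if func(arg) raises ValueError."""
    try:
        func(arg)
    except ValueError:
        return True
    return False


idx = FakeIndex([0, 1])
print(truthy_check(""), truthy_check([]))     # None None
print(raises_value_error(truthy_check, idx))  # True: bool() is ambiguous
print(len_check(idx) is idx)                  # True: len() works
print(len_check(FakeIndex([])))               # None
```

`len()` answers "is this empty?" uniformly for strings, lists, tuples and array-likes, which is why it is the safer test here.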
28 changes: 28 additions & 0 deletions pandas/tests/io/test_pytables.py
@@ -4731,6 +4731,34 @@ def test_read_py2_hdf_file_in_py3(self, datapath):
             result = store['p']
             assert_frame_equal(result, expected)

+    @pytest.mark.parametrize("call", [

Contributor:
It is not necessary to parametrize over the method calls; read_hdf is enough.

Contributor Author:
Simplified.

+        "read_hdf", "select", "select_as_coordinates", "select_as_multiple"
+    ])
+    @pytest.mark.parametrize("where", ["", (), (None, ), [], [None]])
+    def test_select_empty_where(self, call, where):
+        # GH26610
+
+        # Using keyword `where` as '' or (), or [None], etc
+        # while reading from HDF store raises
+        # "SyntaxError: only a single expression is allowed"
+
+        df = pd.DataFrame([[1, 2], [1, 2]], columns=list("ab"))
+        with ensure_clean_path("empty_where.h5") as path:
+            with pd.HDFStore(path) as store:
+                key1 = "df1"
+                key2 = "df2"
+                store.put("df1", df[["a"]], "t")

Contributor:
use

result = read_hdf(....)

assert_frame_equal(result, df)

Contributor Author:
Done.

+                store.put("df2", df[["b"]], "t")
+                if call == "read_hdf":
+                    pd.read_hdf(store, key1, where=where)
+                elif call == "select":
+                    store.select(key1, where=where)
+                elif call == "select_as_coordinates":
+                    store.select_as_coordinates(key1, where=where)
+                elif call == "select_as_multiple":
+                    store.select_as_multiple(
+                        [key1, key2], where=where, selector=key1)
+

 class TestHDFComplexValues(Base):
     # GH10447
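The parametrized `where` cases in the test (`""`, `()`, `(None, )`, `[]`, `[None]`) all mean "no filter". As a rough, stdlib-only sketch of the normalization this PR is working toward, the hypothetical `normalize_where` helper below drops `None` entries from list/tuple input (loosely mirroring the list branch of `_ensure_term`, but without building `Term` objects), then applies a `len()` check guarded against `None`, since `len(None)` raises `TypeError`. Every empty case collapses to `None`, i.e. no filtering. This is an illustration under those assumptions, not the pandas implementation.

```python
def normalize_where(where):
    """Hypothetical helper: map an "empty" where to None.

    Loosely mirrors the list/tuple branch of pandas' _ensure_term,
    but keeps expression strings as plain strings instead of Terms.
    """
    if isinstance(where, (list, tuple)):
        # Drop None entries, as _ensure_term filters them out of lists.
        where = [w for w in where if w is not None]
    # Check `is None` first because len(None) raises TypeError;
    # use len() rather than bool() so array-likes would not raise.
    return where if where is None or len(where) else None


# The same inputs the parametrized test uses.
EMPTY_WHERE_CASES = ["", (), (None,), [], [None]]

for case in EMPTY_WHERE_CASES:
    assert normalize_where(case) is None, repr(case)

# Non-empty filters pass through (lists lose their None entries).
print(normalize_where("a > 1"))
print(normalize_where(["a > 1", None]))
```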