ENH: HDFStore.flush() to optionally perform fsync (GH5364) #5369

Merged: 2 commits, Oct 29, 2013
3 changes: 3 additions & 0 deletions doc/source/io.rst
@@ -2745,6 +2745,9 @@ Notes & Caveats
need to serialize these operations in a single thread in a single
process. You will corrupt your data otherwise. See the issue
(:`2397`) for more information.
- If you use locks to manage write access between multiple processes, you
may want to use :py:func:`~os.fsync` before releasing write locks. For
convenience you can use ``store.flush(fsync=True)`` to do this for you.
- ``PyTables`` only supports fixed-width string columns in
``tables``. The sizes of a string based indexing column
(e.g. *columns* or *minor_axis*) are determined as the maximum size
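The note added to io.rst above can be sketched with the standard library alone. This is a minimal illustration, not pandas code: `lock_file` and `locked_append` are hypothetical helpers, and the lock is a simple exclusively created lock file rather than any mechanism pandas provides. With an `HDFStore`, the `fh.flush()` / `os.fsync()` pair would correspond to `store.flush(fsync=True)`.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def lock_file(lock_path, poll_interval=0.05, timeout=5.0):
    """Hypothetical advisory lock: hold an exclusively created lock file."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail while another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire %s" % lock_path)
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def locked_append(data_path, payload):
    """Append bytes under the lock and sync before the lock is released."""
    with lock_file(data_path + ".lock"):
        with open(data_path, "ab") as fh:
            fh.write(payload)
            fh.flush()             # Python-level buffers -> operating system
            os.fsync(fh.fileno())  # ask the OS to commit the file to storage
    # The lock file is removed only after os.fsync() has returned.


# Usage: two appends in sequence, each synced before its lock is released.
demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
locked_append(demo_path, b"first")
locked_append(demo_path, b"second")
```

Syncing before the release matters because the next process to take the lock may otherwise open the file while the previous writer's data is still sitting in a cache.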
2 changes: 2 additions & 0 deletions doc/source/release.rst
@@ -275,6 +275,8 @@ API Changes
- store `datetime.date` objects as ordinals rather then timetuples to avoid
timezone issues (:issue:`2852`), thanks @tavistmorph and @numpand
- ``numexpr`` 2.2.2 fixes incompatiblity in PyTables 2.4 (:issue:`4908`)
- ``flush`` now accepts an ``fsync`` parameter, which defaults to ``False``
(:issue:`5364`)
- ``JSON``

- added ``date_unit`` parameter to specify resolution of timestamps.
20 changes: 17 additions & 3 deletions pandas/io/pytables.py
@@ -10,6 +10,7 @@
import copy
import itertools
import warnings
import os

import numpy as np
from pandas import (Series, TimeSeries, DataFrame, Panel, Panel4D, Index,
@@ -525,12 +526,26 @@ def is_open(self):
return False
return bool(self._handle.isopen)

def flush(self):
def flush(self, fsync=False):
"""
Force all buffered modifications to be written to disk
Force all buffered modifications to be written to disk.

Parameters
----------
fsync : bool (default False)
call ``os.fsync()`` on the file handle to force writing to disk.

Notes
-----
Without ``fsync=True``, flushing may not guarantee that the OS writes
to disk. With fsync, the operation will block until the OS claims the
file has been written; however, other caching layers may still
interfere.
"""
if self._handle is not None:
self._handle.flush()
if fsync:
os.fsync(self._handle.fileno())

def get(self, key):
"""
@@ -4072,5 +4087,4 @@ def timeit(key, df, fn=None, remove=True, **kwargs):
store.close()

if remove:
import os
os.remove(fn)
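To see the control flow of the patched `flush` in isolation, here is a small stand-in that wraps a plain Python file object. `FlushDemo` is a hypothetical class written for illustration and is not part of pandas or PyTables. It assumes the `if fsync:` branch is nested under the `self._handle is not None` check, since the indentation is not visible in the diff as shown here.

```python
import os
import tempfile


class FlushDemo:
    """Stand-in for a store wrapping a file handle (illustration only)."""

    def __init__(self, path):
        self._handle = open(path, "wb")

    def write(self, data):
        self._handle.write(data)

    def flush(self, fsync=False):
        # Same order as the patched method: flush library buffers first,
        # then optionally ask the OS to commit the file to storage.
        if self._handle is not None:
            self._handle.flush()
            if fsync:
                os.fsync(self._handle.fileno())

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
store = FlushDemo(path)
store.write(b"x" * 10)
store.flush(fsync=True)            # blocks until the OS reports completion
size_after_sync = os.path.getsize(path)
store.close()
store.flush(fsync=True)            # no-op: the handle is None after close()
```

The default of `fsync=False` keeps the existing behaviour and cost of `flush()`; callers opt in to the slower, blocking sync only when they need it.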
1 change: 1 addition & 0 deletions pandas/io/tests/test_pytables.py
@@ -465,6 +465,7 @@ def test_flush(self):
with ensure_clean(self.path) as store:
store['a'] = tm.makeTimeSeries()
store.flush()
store.flush(fsync=True)

def test_get(self):
