Commit caee121

Merge pull request #5369 from benalexau/master
ENH: HDFStore.flush() to optionally perform fsync (GH5364)
2 parents 4edd862 + 8b771a8 commit caee121

4 files changed: +23 -3 lines changed

doc/source/io.rst

Lines changed: 3 additions & 0 deletions
@@ -2745,6 +2745,9 @@ Notes & Caveats
      need to serialize these operations in a single thread in a single
      process. You will corrupt your data otherwise. See the issue
      (:`2397`) for more information.
+   - If you use locks to manage write access between multiple processes, you
+     may want to use :py:func:`~os.fsync` before releasing write locks. For
+     convenience you can use ``store.flush(fsync=True)`` to do this for you.
    - ``PyTables`` only supports fixed-width string columns in
      ``tables``. The sizes of a string based indexing column
      (e.g. *columns* or *minor_axis*) are determined as the maximum size
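
The new caveat recommends syncing to disk before a write lock is released. A minimal sketch of that ordering follows, using only the standard library and a plain file rather than an ``HDFStore``. The ``exclusive_lock`` and ``locked_append`` helpers and the ``.lock`` file convention are hypothetical and not part of pandas; with a real store, the ``flush``/``os.fsync`` pair would presumably be replaced by ``store.flush(fsync=True)``.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hypothetical advisory lock: held for as long as the lock file exists."""
    # O_EXCL makes creation fail if another process already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def locked_append(data_path, payload):
    """Append payload and ask the OS to persist it before releasing the lock."""
    with exclusive_lock(data_path + ".lock"):
        with open(data_path, "ab") as fh:
            fh.write(payload)
            fh.flush()             # push Python's buffer to the OS
            os.fsync(fh.fileno())  # ask the OS to write it to the device


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.bin")
locked_append(data_path, b"first")
locked_append(data_path, b"second")
with open(data_path, "rb") as fh:
    contents = fh.read()
```

The point of the ordering is that the next process to take the lock should not read data that is still sitting in an OS write cache. As the docstring in this commit notes, other caching layers may still interfere.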

doc/source/release.rst

Lines changed: 2 additions & 0 deletions
@@ -275,6 +275,8 @@ API Changes
     - store `datetime.date` objects as ordinals rather then timetuples to avoid
       timezone issues (:issue:`2852`), thanks @tavistmorph and @numpand
     - ``numexpr`` 2.2.2 fixes incompatiblity in PyTables 2.4 (:issue:`4908`)
+    - ``flush`` now accepts an ``fsync`` parameter, which defaults to ``False``
+      (:issue:`5364`)
   - ``JSON``
 
     - added ``date_unit`` parameter to specify resolution of timestamps.

pandas/io/pytables.py

Lines changed: 17 additions & 3 deletions
@@ -10,6 +10,7 @@
 import copy
 import itertools
 import warnings
+import os
 
 import numpy as np
 from pandas import (Series, TimeSeries, DataFrame, Panel, Panel4D, Index,
@@ -525,12 +526,26 @@ def is_open(self):
             return False
         return bool(self._handle.isopen)
 
-    def flush(self):
+    def flush(self, fsync=False):
         """
-        Force all buffered modifications to be written to disk
+        Force all buffered modifications to be written to disk.
+
+        Parameters
+        ----------
+        fsync : bool (default False)
+          call ``os.fsync()`` on the file handle to force writing to disk.
+
+        Notes
+        -----
+        Without ``fsync=True``, flushing may not guarantee that the OS writes
+        to disk. With fsync, the operation will block until the OS claims the
+        file has been written; however, other caching layers may still
+        interfere.
         """
         if self._handle is not None:
             self._handle.flush()
+            if fsync:
+                os.fsync(self._handle.fileno())
 
     def get(self, key):
         """
@@ -4072,5 +4087,4 @@ def timeit(key, df, fn=None, remove=True, **kwargs):
     store.close()
 
     if remove:
-        import os
         os.remove(fn)

pandas/io/tests/test_pytables.py

Lines changed: 1 addition & 0 deletions
@@ -465,6 +465,7 @@ def test_flush(self):
         with ensure_clean(self.path) as store:
             store['a'] = tm.makeTimeSeries()
             store.flush()
+            store.flush(fsync=True)
 
     def test_get(self):
 
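
The added test line only confirms that ``flush(fsync=True)`` does not raise. A stricter check could assert that ``os.fsync`` is actually invoked, and only when requested. The sketch below does this with ``unittest.mock`` against a hypothetical ``flush_handle`` helper that mirrors the flush logic in this commit. It runs on a plain temporary file, not on an ``HDFStore``, so it illustrates the idea and is not a drop-in pandas test.

```python
import os
import tempfile
from unittest import mock


def flush_handle(handle, fsync=False):
    """Hypothetical helper mirroring the flush logic shown in the diff."""
    if handle is not None:
        handle.flush()
        if fsync:
            os.fsync(handle.fileno())


with tempfile.TemporaryFile() as fh:
    fh.write(b"payload")
    expected_fd = fh.fileno()
    # Replace os.fsync with a mock so calls can be counted and inspected.
    with mock.patch("os.fsync") as fake_fsync:
        flush_handle(fh)                  # default: fsync is not requested
        calls_without = fake_fsync.call_count
        flush_handle(fh, fsync=True)      # opt-in: fsync is requested once
        calls_with = fake_fsync.call_count
        fsync_arg = fake_fsync.call_args[0][0]

# A missing handle is a no-op, matching the ``is not None`` guard.
flush_handle(None, fsync=True)
```

Patching ``os.fsync`` keeps the check fast and independent of the filesystem, at the cost of not exercising the real system call.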
