Commit f806463

gh-134004: Added the reorganize() methods to dbm.sqlite3, dbm.dumb and shelve (GH-134028)
They are similar to the method of the same name in dbm.gnu.
1 parent b595237 commit f806463

9 files changed: +172 -6 lines changed

Doc/library/dbm.rst

Lines changed: 39 additions & 1 deletion
@@ -15,10 +15,16 @@
1515
* :mod:`dbm.ndbm`
1616

1717
If none of these modules are installed, the
18-
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
18+
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
1919
is a `third party interface <https://www.jcea.es/programacion/pybsddb.htm>`_ to
2020
the Oracle Berkeley DB.
2121

22+
.. note::
23+
None of the underlying modules will automatically shrink the disk space used by
24+
the database file. However, :mod:`dbm.sqlite3`, :mod:`dbm.gnu` and :mod:`dbm.dumb`
25+
provide a :meth:`!reorganize` method that can be used for this purpose.
26+
27+
2228
.. exception:: error
2329

2430
A tuple containing the exceptions that can be raised by each of the supported
@@ -186,6 +192,17 @@ or any other SQLite browser, including the SQLite CLI.
186192
The Unix file access mode of the file (default: octal ``0o666``),
187193
used only when the database has to be created.
188194

195+
.. method:: sqlite3.reorganize()
196+
197+
If you have carried out a lot of deletions and would like to shrink the space
198+
used on disk, this method will reorganize the database; otherwise, deleted file
199+
space will be kept and reused as new (key, value) pairs are added.
200+
201+
.. note::
202+
While reorganizing, as much as two times the size of the original database is required
203+
in free disk space. However, be aware that this factor changes for each :mod:`dbm` submodule.
204+
205+
.. versionadded:: next
189206

190207
:mod:`dbm.gnu` --- GNU database manager
191208
---------------------------------------
@@ -284,6 +301,10 @@ functionality like crash tolerance.
284301
reorganization; otherwise, deleted file space will be kept and reused as new
285302
(key, value) pairs are added.
286303

304+
.. note::
305+
While reorganizing, up to the size of the original database is required
306+
in free disk space. However, be aware that this factor changes for each :mod:`dbm` submodule.
307+
287308
.. method:: gdbm.sync()
288309

289310
When the database has been opened in fast mode, this method forces any
@@ -438,6 +459,11 @@ The :mod:`!dbm.dumb` module defines the following:
438459
with a sufficiently large/complex entry due to stack depth limitations in
439460
Python's AST compiler.
440461

462+
.. warning::
463+
:mod:`dbm.dumb` does not support concurrent read/write access. (Multiple
464+
simultaneous read accesses are safe.) When a program has the database open
465+
for writing, no other program should have it open for reading or writing.
466+
441467
.. versionchanged:: 3.5
442468
:func:`~dbm.dumb.open` always creates a new database when *flag* is ``'n'``.
443469

@@ -460,3 +486,15 @@ The :mod:`!dbm.dumb` module defines the following:
460486
.. method:: dumbdbm.close()
461487

462488
Close the database.
489+
490+
.. method:: dumbdbm.reorganize()
491+
492+
If you have carried out a lot of deletions and would like to shrink the space
493+
used on disk, this method will reorganize the database; otherwise, deleted file
494+
space will not be reused.
495+
496+
.. note::
497+
While reorganizing, no additional free disk space is required. However, be aware
498+
that this factor changes for each :mod:`dbm` submodule.
499+
500+
.. versionadded:: next
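
Illustrative sketch (not part of this commit): the documentation above says that space freed by deletions is only reclaimed once ``reorganize()`` is called. The snippet below exercises that on a throwaway :mod:`dbm.dumb` database. The helper name ``dumb_sizes``, the record count and the record size are made up for the example, and the ``hasattr`` guard is an assumption about how callers might stay compatible with Python versions that predate this change (there the two sizes are simply equal).

```python
import dbm.dumb
import os
import tempfile


def dumb_sizes() -> tuple[int, int]:
    """Return the size of the .dat file before and after deleting entries."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "demo")

        # Fill a fresh database with 100 records of 1000 bytes each.
        with dbm.dumb.open(base, "n") as db:
            for i in range(100):
                db[str(i)] = b"x" * 1000
        size_before = os.path.getsize(base + ".dat")

        # Delete half of the records, then compact if the method exists.
        with dbm.dumb.open(base, "c") as db:
            for i in range(50):
                del db[str(i)]
            if hasattr(db, "reorganize"):
                db.reorganize()
        size_after = os.path.getsize(base + ".dat")

    return size_before, size_after
```

Both sizes are measured after the database has been closed, which mirrors the approach taken by the tests in this commit.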

Doc/library/shelve.rst

Lines changed: 14 additions & 2 deletions
@@ -75,8 +75,15 @@ Two additional methods are supported:
7575

7676
Write back all entries in the cache if the shelf was opened with *writeback*
7777
set to :const:`True`. Also empty the cache and synchronize the persistent
78-
dictionary on disk, if feasible. This is called automatically when the shelf
79-
is closed with :meth:`close`.
78+
dictionary on disk, if feasible. This is called automatically when
79+
:meth:`reorganize` is called or the shelf is closed with :meth:`close`.
80+
81+
.. method:: Shelf.reorganize()
82+
83+
Calls :meth:`sync` and attempts to shrink space used on disk by removing empty
84+
space resulting from deletions.
85+
86+
.. versionadded:: next
8087

8188
.. method:: Shelf.close()
8289

@@ -116,6 +123,11 @@ Restrictions
116123
* On macOS :mod:`dbm.ndbm` can silently corrupt the database file on updates,
117124
which can cause hard crashes when trying to read from the database.
118125

126+
* :meth:`Shelf.reorganize` may not be available for all database packages and
127+
may temporarily increase resource usage (especially disk space) when called.
128+
Additionally, it will never run automatically and instead needs to be called
129+
explicitly.
130+
119131

120132
.. class:: Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')
121133
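
Illustrative sketch (not part of this commit): the restriction above notes that ``Shelf.reorganize`` may not be available everywhere and never runs automatically. One way a caller might cope with that is a small guard. The names ``compact_shelf`` and ``demo`` are hypothetical, and falling back to ``sync()`` on older Python versions is an assumption of this sketch, not something the commit prescribes.

```python
import os
import shelve
import tempfile


def compact_shelf(shelf) -> bool:
    """Call shelf.reorganize() if it exists; return True when it was called."""
    reorganize = getattr(shelf, "reorganize", None)
    if reorganize is None:
        # Older Python: at least flush pending writes to disk.
        shelf.sync()
        return False
    reorganize()
    return True


def demo() -> list:
    """Create a throwaway shelf, delete an entry, compact, list the keys."""
    with tempfile.TemporaryDirectory() as tmp:
        with shelve.open(os.path.join(tmp, "demo")) as shelf:
            shelf["keep"] = [1, 2, 3]
            shelf["drop"] = "x" * 10_000
            del shelf["drop"]
            compact_shelf(shelf)
            return sorted(shelf.keys())
```

A return value of ``True`` only means that the shelf exposes the method. Whether any disk space is actually reclaimed still depends on the underlying database module.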

Doc/whatsnew/3.15.rst

Lines changed: 17 additions & 0 deletions
@@ -89,13 +89,30 @@ New modules
8989
Improved modules
9090
================
9191

92+
dbm
93+
---
94+
95+
* Added new :meth:`!reorganize` methods to :mod:`dbm.dumb` and :mod:`dbm.sqlite3`
96+
which allow recovering unused free space previously occupied by deleted entries.
97+
(Contributed by Andrea Oliveri in :gh:`134004`.)
98+
99+
92100
difflib
93101
-------
94102

95103
* Improved the styling of HTML diff pages generated by the :class:`difflib.HtmlDiff`
96104
class, and migrated the output to the HTML5 standard.
97105
(Contributed by Jiahao Li in :gh:`134580`.)
98106

107+
108+
shelve
109+
------
110+
111+
* Added new :meth:`!reorganize` method to :mod:`shelve` used to recover unused free
112+
space previously occupied by deleted entries.
113+
(Contributed by Andrea Oliveri in :gh:`134004`.)
114+
115+
99116
ssl
100117
---
101118

Lib/dbm/dumb.py

Lines changed: 29 additions & 3 deletions
@@ -9,16 +9,14 @@
 - seems to contain a bug when updating...

 - reclaim free space (currently, space once occupied by deleted or expanded
-items is never reused)
+items is not reused except if .reorganize() is called)

 - support concurrent access (currently, if two processes take turns making
 updates, they can mess up the index)

 - support efficient access to large databases (currently, the whole index
 is read when the database is opened, and some updates rewrite the whole index)

-- support opening for read-only (flag = 'm')
-
 """

 import ast as _ast
@@ -289,6 +287,34 @@ def __enter__(self):
     def __exit__(self, *args):
         self.close()

+    def reorganize(self):
+        if self._readonly:
+            raise error('The database is opened for reading only')
+        self._verify_open()
+        # Ensure all changes are committed before reorganizing.
+        self._commit()
+        # Open file in r+ to allow changing in-place.
+        with _io.open(self._datfile, 'rb+') as f:
+            reorganize_pos = 0
+
+            # Iterate over existing keys, sorted by starting byte.
+            for key in sorted(self._index, key = lambda k: self._index[k][0]):
+                pos, siz = self._index[key]
+                f.seek(pos)
+                val = f.read(siz)
+
+                f.seek(reorganize_pos)
+                f.write(val)
+                self._index[key] = (reorganize_pos, siz)
+
+                blocks_occupied = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
+                reorganize_pos += blocks_occupied * _BLOCKSIZE
+
+            f.truncate(reorganize_pos)
+            # Commit changes to index, which were not in-place.
+            self._commit()
+
+

 def open(file, flag='c', mode=0o666):
     """Open the database file, filename, and return corresponding object.

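Illustrative sketch (not part of this commit): the in-place compaction performed by ``reorganize()`` above can be tried out on an in-memory buffer. The function name ``compact``, the ``BLOCKSIZE`` value of 512 and the use of ``io.BytesIO`` instead of a real ``.dat`` file are assumptions of this sketch. It also assumes that all offsets in ``index`` are block-aligned and that records do not overlap, so a record is never overwritten before it has been read.

```python
import io

BLOCKSIZE = 512  # assumed block size for this sketch


def compact(data: bytes, index: dict) -> tuple[bytes, dict]:
    """Slide live records toward the start of the buffer, block-aligned.

    `index` maps each key to an (offset, size) pair into `data`.
    """
    f = io.BytesIO(data)
    new_index = {}
    write_pos = 0
    # Visit records in order of their current offset.
    for key in sorted(index, key=lambda k: index[k][0]):
        pos, size = index[key]
        f.seek(pos)
        value = f.read(size)
        f.seek(write_pos)
        f.write(value)
        new_index[key] = (write_pos, size)
        # Round the record size up to a whole number of blocks.
        write_pos += (size + BLOCKSIZE - 1) // BLOCKSIZE * BLOCKSIZE
    # Drop everything after the last live block.
    f.truncate(write_pos)
    return f.getvalue(), new_index
```

For example, with three one-block records where the middle one has been deleted from the index, the third record moves into the second block and the buffer shrinks from three blocks to two. Stale bytes of the deleted record may remain inside the reused block, since only ``size`` bytes are rewritten.
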
Lib/dbm/sqlite3.py

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,7 @@
 STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
 DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
 ITER_KEYS = "SELECT key FROM Dict"
+REORGANIZE = "VACUUM"


 class error(OSError):
@@ -122,6 +123,9 @@ def __enter__(self):
     def __exit__(self, *args):
         self.close()

+    def reorganize(self):
+        self._execute(REORGANIZE)
+

 def open(filename, /, flag="r", mode=0o666):
     """Open a dbm.sqlite3 database and return the dbm object.

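Illustrative sketch (not part of this commit): the new method simply issues SQLite's ``VACUUM`` statement. Its effect on file size can be observed with the standard :mod:`sqlite3` module alone. The function name ``vacuum_demo``, the table definition and the row sizes below are made up for the example, and the sketch assumes the SQLite default of auto-vacuum being disabled.

```python
import os
import sqlite3
import tempfile


def vacuum_demo() -> tuple[int, int]:
    """Return the database file size before and after running VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        con = sqlite3.connect(path)
        try:
            con.execute("CREATE TABLE Dict (key BLOB, value BLOB)")
            con.executemany(
                "INSERT INTO Dict VALUES (?, ?)",
                [(str(i).encode(), b"x" * 10_000) for i in range(200)],
            )
            con.commit()

            # Deleting rows frees pages inside the file but keeps its size.
            con.execute("DELETE FROM Dict")
            con.commit()
            size_before = os.path.getsize(path)

            # VACUUM rebuilds the file without the free pages. It must run
            # outside a transaction, hence the commit above.
            con.execute("VACUUM")
            size_after = os.path.getsize(path)
        finally:
            con.close()
    return size_before, size_after
```

Because ``VACUUM`` rebuilds the database into a temporary copy, it needs extra free disk space while it runs, which matches the note added to the documentation in this commit.
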
Lib/shelve.py

Lines changed: 5 additions & 0 deletions
@@ -171,6 +171,11 @@ def sync(self):
         if hasattr(self.dict, 'sync'):
             self.dict.sync()

+    def reorganize(self):
+        self.sync()
+        if hasattr(self.dict, 'reorganize'):
+            self.dict.reorganize()
+

 class BsdDbShelf(Shelf):
     """Shelf implementation using the "BSD" db interface.

Lib/test/test_dbm.py

Lines changed: 61 additions & 0 deletions
@@ -135,6 +135,67 @@ def test_anydbm_access(self):
         assert(f[key] == b"Python:")
         f.close()

+    def test_anydbm_readonly_reorganize(self):
+        self.init_db()
+        with dbm.open(_fname, 'r') as d:
+            # Early stopping.
+            if not hasattr(d, 'reorganize'):
+                self.skipTest("method reorganize not available in this dbm submodule")
+
+            self.assertRaises(dbm.error, lambda: d.reorganize())
+
+    def test_anydbm_reorganize_not_changed_content(self):
+        self.init_db()
+        with dbm.open(_fname, 'c') as d:
+            # Early stopping.
+            if not hasattr(d, 'reorganize'):
+                self.skipTest("method reorganize not available in this dbm submodule")
+
+            keys_before = sorted(d.keys())
+            values_before = [d[k] for k in keys_before]
+            d.reorganize()
+            keys_after = sorted(d.keys())
+            values_after = [d[k] for k in keys_after]
+            self.assertEqual(keys_before, keys_after)
+            self.assertEqual(values_before, values_after)
+
+    def test_anydbm_reorganize_decreased_size(self):
+
+        def _calculate_db_size(db_path):
+            if os.path.isfile(db_path):
+                return os.path.getsize(db_path)
+            total_size = 0
+            for root, _, filenames in os.walk(db_path):
+                for filename in filenames:
+                    file_path = os.path.join(root, filename)
+                    total_size += os.path.getsize(file_path)
+            return total_size
+
+        # This test requires relatively large databases to reliably show a difference in size before and after reorganizing.
+        with dbm.open(_fname, 'n') as f:
+            # Early stopping.
+            if not hasattr(f, 'reorganize'):
+                self.skipTest("method reorganize not available in this dbm submodule")
+
+            for k in self._dict:
+                f[k.encode('ascii')] = self._dict[k] * 100000
+            db_keys = list(f.keys())
+
+        # Calculate the size of the database only after the file is closed, to ensure its contents are flushed to disk.
+        size_before = _calculate_db_size(os.path.dirname(_fname))
+
+        # Delete some elements from the start of the database.
+        keys_to_delete = db_keys[:len(db_keys) // 2]
+        with dbm.open(_fname, 'c') as f:
+            for k in keys_to_delete:
+                del f[k]
+            f.reorganize()
+
+        # Calculate the size of the database only after the file is closed, to ensure its contents are flushed to disk.
+        size_after = _calculate_db_size(os.path.dirname(_fname))
+
+        self.assertLess(size_after, size_before)
+
     def test_open_with_bytes(self):
         dbm.open(os.fsencode(_fname), "c").close()


Misc/ACKS

Lines changed: 1 addition & 0 deletions
@@ -1365,6 +1365,7 @@ Milan Oberkirch
13651365
Pascal Oberndoerfer
13661366
Géry Ogam
13671367
Seonkyo Ok
1368+
Andrea Oliveri
13681369
Jeffrey Ollie
13691370
Adam Olsen
13701371
Bryan Olson
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
:mod:`shelve` as well as underlying :mod:`!dbm.dumb` and :mod:`!dbm.sqlite3` now have :meth:`!reorganize` methods to
2+
recover unused free space previously occupied by deleted entries.
