Commit ae4f07f

vmg authored and gitster committed
pack-bitmap: implement optional name_hash cache

When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found.

In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression.

As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects.

The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them.

But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective.

This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression.

Here are perf results for p5310:

  Test                      origin/master       HEAD^                      HEAD
  -------------------------------------------------------------------------------------------------
  5310.2: repack to disk    36.81(37.82+1.43)   47.70(48.74+1.41) +29.6%   47.75(48.70+1.51) +29.7%
  5310.3: simulated clone   30.78(29.70+2.14)   1.08(0.97+0.10) -96.5%     1.07(0.94+0.12) -96.5%
  5310.4: simulated fetch   3.16(6.10+0.08)     3.54(10.65+0.06) +12.0%    1.70(3.07+0.06) -46.2%
  5310.6: partial bitmap    36.76(43.19+1.81)   6.71(11.25+0.76) -81.7%    4.08(6.26+0.46) -88.9%

You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

1 parent bbcefa1 commit ae4f07f

8 files changed, +91 -7 lines changed

Documentation/config.txt

Lines changed: 11 additions & 0 deletions
@@ -1872,6 +1872,17 @@ pack.writebitmaps::
 	space and extra time spent on the initial repack. Defaults to
 	false.
 
+pack.writeBitmapHashCache::
+	When true, git will include a "hash cache" section in the bitmap
+	index (if one is written). This cache can be used to feed git's
+	delta heuristics, potentially leading to better deltas between
+	bitmapped and non-bitmapped objects (e.g., when serving a fetch
+	between an older, bitmapped pack and objects that have been
+	pushed since the last gc). The downside is that it consumes 4
+	bytes per object of disk space, and that JGit's bitmap
+	implementation does not understand it, causing it to complain if
+	Git and JGit are used on the same repository. Defaults to false.
+
 pager.<cmd>::
 	If the value is boolean, turns on or off pagination of the
 	output of a particular Git subcommand when writing to a tty.

Documentation/technical/bitmap-format.txt

Lines changed: 33 additions & 0 deletions
@@ -21,6 +21,12 @@ GIT bitmap v1 format
 			requirement for the bitmap index format, also present in JGit,
 			that greatly reduces the complexity of the implementation.
 
+			- BITMAP_OPT_HASH_CACHE (0x4)
+			If present, the end of the bitmap file contains
+			`N` 32-bit name-hash values, one per object in the
+			pack. The format and meaning of the name-hash is
+			described below.
+
 		4-byte entry count (network byte order)
 
 			The total count of entries (bitmapped commits) in this bitmap index.
@@ -129,3 +135,30 @@ The bitstream represented by the above chunk is then:
 The next word after `L_M` (if any) must again be a RLW, for the next
 chunk. For efficient appending to the bitstream, the EWAH stores a
 pointer to the last RLW in the stream.
+
+
+== Appendix B: Optional Bitmap Sections
+
+These sections may or may not be present in the `.bitmap` file; their
+presence is indicated by the header flags section described above.
+
+Name-hash cache
+---------------
+
+If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
+a cache of 32-bit values, one per object in the pack. The value at
+position `i` is the hash of the pathname at which the `i`th object
+(counting in index order) in the pack can be found. This can be fed
+into the delta heuristics to compare objects with similar pathnames.
+
+The hash algorithm used is:
+
+    hash = 0;
+    while ((c = *name++))
+	    if (!isspace(c))
+		    hash = (hash >> 2) + (c << 24);
+
+Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
+If implementations want to choose a different hashing scheme, they are
+free to do so, but MUST allocate a new header flag (because comparing
+hashes made under two different schemes would be pointless).
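
The hash described in the format document can be written out as a self-contained C function, which may make the weighting easier to see: each step shifts the running value right by two bits and adds the new character in the top byte, so the final characters of a path (typically the basename and extension) dominate the result. This is a minimal sketch, not the code Git ships. The name `example_name_hash` is hypothetical, and the `unsigned char` cast and the non-NULL precondition are assumptions added to keep the sketch well defined.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative version of the name-hash described in the format document.
 * `example_name_hash` is a hypothetical name, not a Git function.
 */
uint32_t example_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	/* Assumption for this sketch: callers always pass a valid string. */
	assert(name != NULL);

	/* Read bytes as unsigned so isspace() and the shift are well defined. */
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;	/* whitespace does not contribute */
		/* Age the old bits by two positions; put the new byte on top. */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

Under this sketch, `example_name_hash("a")` is `0x61000000`, and `"a b"` hashes to the same value as `"ab"` because whitespace is skipped.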

builtin/pack-objects.c

Lines changed: 9 additions & 1 deletion
@@ -64,6 +64,7 @@ static off_t reuse_packfile_offset;
 
 static int use_bitmap_index = 1;
 static int write_bitmap_index;
+static uint16_t write_bitmap_options;
 
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
@@ -851,7 +852,8 @@ static void write_pack_file(void)
 				bitmap_writer_reuse_bitmaps(&to_pack);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
 				bitmap_writer_build(&to_pack);
-				bitmap_writer_finish(written_list, nr_written, tmpname);
+				bitmap_writer_finish(written_list, nr_written,
+						     tmpname, write_bitmap_options);
 				write_bitmap_index = 0;
 			}
 
@@ -2201,6 +2203,12 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		write_bitmap_index = git_config_bool(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.writebitmaphashcache")) {
+		if (git_config_bool(k, v))
+			write_bitmap_options |= BITMAP_OPT_HASH_CACHE;
+		else
+			write_bitmap_options &= ~BITMAP_OPT_HASH_CACHE;
+	}
 	if (!strcmp(k, "pack.usebitmaps")) {
 		use_bitmap_index = git_config_bool(k, v);
 		return 0;
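
The config handling in this file amounts to setting or clearing a single bit in a 16-bit options word, which the writer later ORs into the bitmap header. A minimal sketch of that bit manipulation follows. The flag values are taken from `pack-bitmap.h` in this commit; the `EXAMPLE_` prefixes and the three helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror enum pack_bitmap_opts in this commit; names are illustrative. */
enum example_bitmap_opts {
	EXAMPLE_OPT_FULL_DAG = 1,
	EXAMPLE_OPT_HASH_CACHE = 4
};

/* Return `options` with the hash-cache bit set or cleared (hypothetical helper). */
uint16_t example_apply_hash_cache_setting(uint16_t options, int enabled)
{
	if (enabled)
		options |= EXAMPLE_OPT_HASH_CACHE;
	else
		options &= (uint16_t)~EXAMPLE_OPT_HASH_CACHE;
	return options;
}

/* Combine the writer's base flags with user-selected options, as the header does. */
uint16_t example_header_options(uint16_t base_flags, uint16_t options)
{
	/* The reader in this commit rejects indexes without FULL_DAG. */
	assert(base_flags & EXAMPLE_OPT_FULL_DAG);
	return (uint16_t)(base_flags | options);
}

/* Nonzero if a header's options word advertises a hash cache. */
int example_has_hash_cache(uint16_t header_options)
{
	return (header_options & EXAMPLE_OPT_HASH_CACHE) != 0;
}
```

Keeping the user-selected options in their own word, separate from the mandatory flags, means toggling the config key twice leaves the header exactly as it started.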

pack-bitmap-write.c

Lines changed: 19 additions & 2 deletions
@@ -490,14 +490,28 @@ static void write_selected_commits_v1(struct sha1file *f,
 	}
 }
 
+static void write_hash_cache(struct sha1file *f,
+			     struct pack_idx_entry **index,
+			     uint32_t index_nr)
+{
+	uint32_t i;
+
+	for (i = 0; i < index_nr; ++i) {
+		struct object_entry *entry = (struct object_entry *)index[i];
+		uint32_t hash_value = htonl(entry->hash);
+		sha1write(f, &hash_value, sizeof(hash_value));
+	}
+}
+
 void bitmap_writer_set_checksum(unsigned char *sha1)
 {
 	hashcpy(writer.pack_checksum, sha1);
 }
 
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
-			  const char *filename)
+			  const char *filename,
+			  uint16_t options)
 {
 	static char tmp_file[PATH_MAX];
 	static uint16_t default_version = 1;
@@ -514,7 +528,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
 	memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE));
 	header.version = htons(default_version);
-	header.options = htons(flags);
+	header.options = htons(flags | options);
 	header.entry_count = htonl(writer.selected_nr);
 	memcpy(header.checksum, writer.pack_checksum, 20);
 
@@ -525,6 +539,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	dump_bitmap(f, writer.tags);
 	write_selected_commits_v1(f, index, index_nr);
 
+	if (options & BITMAP_OPT_HASH_CACHE)
+		write_hash_cache(f, index, index_nr);
+
 	sha1close(f, NULL, CSUM_FSYNC);
 
 	if (adjust_shared_perm(tmp_file))
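
`write_hash_cache()` in this file emits one 32-bit value per object in network byte order, in index order. A minimal sketch of the same on-disk layout, writing into a caller-provided buffer rather than a `sha1file`, is shown below. The function `example_serialize_hash_cache` and its buffer-based interface are assumptions for illustration. Byte order is encoded by hand so the sketch does not depend on `htonl()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write `nr` name-hash values to `out` as big-endian 32-bit words.
 * `out` must have room for 4 * nr bytes. Returns the number of bytes written.
 * Hypothetical helper; the commit streams these through sha1write() instead.
 */
size_t example_serialize_hash_cache(const uint32_t *hashes, size_t nr,
				    unsigned char *out)
{
	size_t i;

	assert(hashes != NULL && out != NULL);

	for (i = 0; i < nr; i++) {
		uint32_t h = hashes[i];

		out[4 * i + 0] = (unsigned char)(h >> 24);	/* most significant byte first */
		out[4 * i + 1] = (unsigned char)(h >> 16);
		out[4 * i + 2] = (unsigned char)(h >> 8);
		out[4 * i + 3] = (unsigned char)h;
	}
	return 4 * nr;
}
```

The return value matches the cost stated in the config documentation: 4 bytes per object.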

pack-bitmap.c

Lines changed: 11 additions & 0 deletions
@@ -66,6 +66,9 @@ static struct bitmap_index {
 	/* Number of bitmapped commits */
 	uint32_t entry_count;
 
+	/* Name-hash cache (or NULL if not present). */
+	uint32_t *hashes;
+
 	/*
 	 * Extended index.
 	 *
@@ -152,6 +155,11 @@ static int load_bitmap_header(struct bitmap_index *index)
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
 			return error("Unsupported options for bitmap index file "
 				"(Git requires BITMAP_OPT_FULL_DAG)");
+
+		if (flags & BITMAP_OPT_HASH_CACHE) {
+			unsigned char *end = index->map + index->map_size - 20;
+			index->hashes = ((uint32_t *)end) - index->pack->num_objects;
+		}
 	}
 
 	index->entry_count = ntohl(header->entry_count);
@@ -626,6 +634,9 @@ static void show_objects_for_type(
 			entry = &bitmap_git.reverse_index->revindex[pos + offset];
 			sha1 = nth_packed_object_sha1(bitmap_git.pack, entry->nr);
 
+			if (bitmap_git.hashes)
+				hash = ntohl(bitmap_git.hashes[entry->nr]);
+
 			show_reach(sha1, object_type, 0, hash, bitmap_git.pack, entry->offset);
 		}
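
On the read side, the cache is located by stepping back from the end of the mapped file: past the 20-byte trailing checksum, then past one 32-bit entry per object in the pack. The sketch below shows that pointer arithmetic on a plain byte buffer. The names `example_find_hash_cache` and `example_read_hash` are hypothetical, and the size check is an addition for the sketch (the hunk in this file does not perform one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CHECKSUM_LEN 20	/* trailing checksum, 20 bytes in this commit */

/*
 * Return a pointer to the first byte of the name-hash cache inside `map`,
 * or NULL if the file is too small to hold `num_objects` entries plus the
 * trailing checksum. Hypothetical helper for illustration.
 */
const unsigned char *example_find_hash_cache(const unsigned char *map,
					     size_t map_size,
					     uint32_t num_objects)
{
	size_t cache_len = (size_t)num_objects * 4;

	assert(map != NULL);

	if (map_size < EXAMPLE_CHECKSUM_LEN ||
	    map_size - EXAMPLE_CHECKSUM_LEN < cache_len)
		return NULL;
	return map + map_size - EXAMPLE_CHECKSUM_LEN - cache_len;
}

/* Decode the big-endian hash for the object at index position `i`. */
uint32_t example_read_hash(const unsigned char *cache, uint32_t i)
{
	const unsigned char *p = cache + (size_t)i * 4;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

Decoding byte by byte, instead of casting to `uint32_t *` and calling `ntohl()`, is a design choice of this sketch: it avoids any alignment requirement on the mapped data.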

pack-bitmap.h

Lines changed: 4 additions & 2 deletions
@@ -24,7 +24,8 @@ static const char BITMAP_IDX_SIGNATURE[] = {'B', 'I', 'T', 'M'};
 #define NEEDS_BITMAP (1u<<22)
 
 enum pack_bitmap_opts {
-	BITMAP_OPT_FULL_DAG = 1
+	BITMAP_OPT_FULL_DAG = 1,
+	BITMAP_OPT_HASH_CACHE = 4,
 };
 
 enum pack_bitmap_flags {
@@ -57,6 +58,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
 void bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
-			  const char *filename);
+			  const char *filename,
+			  uint16_t options);
 
 #endif

t/perf/p5310-pack-bitmaps.sh

Lines changed: 2 additions & 1 deletion
@@ -9,7 +9,8 @@ test_perf_large_repo
 # since we want to be able to compare bitmap-aware
 # git versus non-bitmap git
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true
+	git config pack.writebitmaps true &&
+	git config pack.writebitmaphashcache true
 '
 
 test_perf 'repack to disk' '

t/t5310-pack-bitmaps.sh

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,8 @@ test_expect_success 'setup repo with moderate-sized history' '
 	git checkout master &&
 	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
 	git tag tagged-blob $blob &&
-	git config pack.writebitmaps true
+	git config pack.writebitmaps true &&
+	git config pack.writebitmaphashcache true
 '
 
 test_expect_success 'full repack creates bitmaps' '
