Description
Current behavior 😯
As attested by the current journey test snapshots, gix free pack verify --statistics outputs compression-related sizes in units of "KB".
But it is not immediately clear what unit that actually is. Is it…
- …an SI decimal kilobyte, equal to 1000 bytes? (kB)
- …an IEC binary kibibyte, equal to 1024 bytes? (KiB)
It turns out that, in this case, 1 KB = 1 kB (1000 bytes), but that is not obvious.
Expected behavior 🤔
It is also not obvious what unit is intended. gitoxide-core uses the bytesize library to display the units (see gitoxide/gitoxide-core/src/pack/verify.rs, lines 235 to 244 in 79dabb0).
What unit is displayed, and what symbol is used to represent it, vary across major versions of bytesize. In current (i.e. recent stable) releases, the default unit is the IEC binary kibibyte, which it abbreviates KiB, while one can explicitly request the SI decimal kilobyte, which it abbreviates kB. Old versions of bytesize behave differently: they default to the SI decimal kilobyte and abbreviate it with the non-SI symbol KB. The new behavior came in with bytesize 2.0.0. However, gitoxide-core still depends on an older version (see gitoxide/gitoxide-core/Cargo.toml, line 61 in 79dabb0).
I suggest upgrading to bytesize 2.0.* and deciding whether we actually want…
- …units of 1000 bytes abbreviated kB, in which case the code above would be changed as in gitoxide/gitoxide-core/src/pack/verify.rs, lines 235 to 244 in 46df372; or
- …units of 1024 bytes abbreviated KiB, in which case no change would be needed to that source file. (Though the choice could, if desired, be made explicit by calling iec() where the SI alternative calls si().)
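The two options can be illustrated with a small std-only Rust sketch. The Units enum and format_size function are hypothetical, written only for this illustration. They are not bytesize's API and don't reproduce its exact output; they show only the difference between 1000-byte units labeled kB and 1024-byte units labeled KiB.

```rust
/// The two choices discussed above (hypothetical type, not part of bytesize).
#[derive(Clone, Copy, Debug)]
enum Units {
    /// Decimal SI units: 1 kB = 1000 bytes.
    Si,
    /// Binary IEC units: 1 KiB = 1024 bytes.
    Iec,
}

/// Formats a byte count with one decimal place in the requested unit system.
fn format_size(bytes: u64, units: Units) -> String {
    let (base, symbols) = match units {
        Units::Si => (1000.0, ["B", "kB", "MB", "GB", "TB"]),
        Units::Iec => (1024.0, ["B", "KiB", "MiB", "GiB", "TiB"]),
    };
    let mut value = bytes as f64;
    let mut idx = 0;
    // Step up one prefix at a time until the value drops below the base.
    while value >= base && idx + 1 < symbols.len() {
        value /= base;
        idx += 1;
    }
    if idx == 0 {
        // Plain bytes are shown as an exact integer.
        format!("{} {}", bytes, symbols[0])
    } else {
        format!("{:.1} {}", value, symbols[idx])
    }
}

fn main() {
    // Arbitrary example sizes, each shown under both conventions.
    for bytes in [999u64, 1000, 1024, 288_630] {
        println!(
            "{:>7} bytes: {:>10} | {:>10}",
            bytes,
            format_size(bytes, Units::Si),
            format_size(bytes, Units::Iec)
        );
    }
}
```

The same byte count can differ by a few percent between the two conventions (288,630 bytes is 288.6 kB but 281.9 KiB), which is why an unqualified "KB" leaves readers guessing.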
Git behavior
I'm not sure whether any Git behavior should be considered to correspond exactly to this, since gix doesn't have or aim for the same interface as git, and since git verify-pack does not show file sizes in its statistics:
$ git verify-pack -s tests/fixtures/packs/pack-11fdfa9e156ab73caae3b6da867192221f2089c2.idx
non delta: 18 objects
chain length = 1: 4 objects
chain length = 2: 3 objects
chain length = 3: 1 object
chain length = 4: 2 objects
chain length = 5: 1 object
chain length = 6: 1 object
But some other git commands do show sizes in "human" units. For example:
$ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 163413
packs: 2
size-pack: 79.16 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
When it is run without -H, the size-pack value is shown with no unit, but git-count-objects(1) documents it as being in units of KiB. With neither -v nor -H, one gets:
$ git count-objects
0 objects, 0 kilobytes
There, "kilobytes" is ambiguous. One might think it means decimal SI kilobytes (1000 bytes). But actually that occurrence of "kilobytes" means binary IEC kibibytes, as revealed by:
if (human_readable)
	strbuf_humanise_bytes(&buf, loose_size);
else
	strbuf_addf(&buf, "%lu kilobytes",
		    (unsigned long)(loose_size / 1024));
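To make the quoted branch concrete, here is a rough Rust transliteration. The names loose_size_text and humanise_bytes are hypothetical. humanise_bytes only approximates Git's strbuf_humanise_bytes (binary units with two decimal places, as in the "79.16 MiB" output above) and does not reproduce its exact thresholds or integer rounding. The non-human branch does mirror the quoted code: it divides by 1024 with truncation and always prints the word "kilobytes".

```rust
/// Approximation of Git's human-readable sizes: binary (IEC) units, two decimals.
/// Thresholds and rounding are simplified and may differ from Git's.
fn humanise_bytes(bytes: u64) -> String {
    const KIB: u64 = 1 << 10;
    const MIB: u64 = 1 << 20;
    const GIB: u64 = 1 << 30;
    let b = bytes as f64;
    if bytes >= GIB {
        format!("{:.2} GiB", b / GIB as f64)
    } else if bytes >= MIB {
        format!("{:.2} MiB", b / MIB as f64)
    } else if bytes >= KIB {
        format!("{:.2} KiB", b / KIB as f64)
    } else {
        format!("{} bytes", bytes)
    }
}

/// Mirrors the if/else quoted above from Git's count-objects.
fn loose_size_text(loose_size: u64, human_readable: bool) -> String {
    if human_readable {
        humanise_bytes(loose_size)
    } else {
        // Integer division by 1024 truncates: these "kilobytes" are really whole KiB.
        format!("{} kilobytes", loose_size / 1024)
    }
}

fn main() {
    for size in [0u64, 2047, 2048, 3 * 1024 * 1024] {
        println!(
            "{:>8} bytes -> {:>15} | {}",
            size,
            loose_size_text(size, false),
            loose_size_text(size, true)
        );
    }
}
```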
I don't think any of this has much bearing on what we should do, since it concerns display behavior that gix makes no effort to match. However, Git's use of binary IEC units, rather than decimal SI units, may reflect a preference for those units, or may lead users to expect gitoxide to use them as well. (My personal preference is also for binary IEC units.)
Any change here, especially if it includes upgrading bytesize, should be fairly convenient for me to include in a larger PR that I am already working on. (Although the above-linked code currently fixes this by keeping the sizes in SI decimal kilobytes and changing the unit symbol to "kB", I did that because it is closer to the current behavior, not to express a preference for that approach.)
Steps to reproduce 🕹
Check the journey test snapshot file shown above and observe that the journey tests are passing. Alternatively, run:
cargo run --bin=gix -- --no-verbose free pack verify --statistics tests/fixtures/packs/pack-11fdfa9e156ab73caae3b6da867192221f2089c2.idx
This shows the following output, in which "KB" means what would less ambiguously be written "kB":
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.20s
Running `target/debug/gix --no-verbose free pack verify --statistics tests/fixtures/packs/pack-11fdfa9e156ab73caae3b6da867192221f2089c2.idx`
objects per delta chain length
0: 18
1: 4
2: 3
3: 1
4: 2
5: 1
6: 1
->: 30
averages
delta chain length: 1;
decompressed entry [B]: 3456;
compressed entry [B]: 1725;
decompressed object size [B]: 9621;
compression
compressed entries size : 51.8 KB
decompressed entries size : 103.7 KB
total object size : 288.7 KB
pack size : 51.9 KB
num trees : 15
num blobs : 5
num commits : 10
num tags : 0
compression ratio : 2.00
delta compression ratio : 5.58
delta gain : 2.78
pack overhead : 0.235%
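As a rough check that "KB" here means 1000 bytes, the averages in this output can be multiplied back up to totals. This sketch assumes that each average is the corresponding total divided by the 30 objects counted on the "->: 30" line and rounded down to a whole number of bytes. It is plain Rust that uses no gitoxide code, and it compares both interpretations. Only the 1000-byte interpretation lands within 0.1 of every reported value.

```rust
/// Number of objects counted on the "->: 30" line of the statistics output.
const ENTRIES: f64 = 30.0;

/// (reported total, average bytes per entry, total shown by gix in "KB")
const ROWS: [(&str, f64, f64); 3] = [
    ("decompressed entries size", 3456.0, 103.7),
    ("compressed entries size", 1725.0, 51.8),
    ("total object size", 9621.0, 288.7),
];

/// Multiplies an average back up to a total and expresses it in both candidate
/// units: (units of 1000 bytes, units of 1024 bytes).
fn estimates(avg_bytes: f64) -> (f64, f64) {
    let total = avg_bytes * ENTRIES;
    (total / 1000.0, total / 1024.0)
}

/// A tolerance of 0.1 allows for the averages having been rounded down to
/// whole bytes (an assumption about how gix computes them).
fn matches(estimate: f64, reported: f64) -> bool {
    (estimate - reported).abs() < 0.1
}

fn main() {
    for &(label, avg, reported) in ROWS.iter() {
        let (si, iec) = estimates(avg);
        println!(
            "{label}: shown as {reported} KB; 1000-byte units: {si:.2} (match: {}); 1024-byte units: {iec:.2} (match: {})",
            matches(si, reported),
            matches(iec, reported),
        );
    }
}
```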
Broadening the scope: Other uses of ambiguous units
I've framed this in terms of gix free pack verify --statistics because that's what I stumbled upon first (EliahKagan#18 (comment)), because the exact way that command formats sizes is under test, and because I didn't fully think through what scope this issue should have. But it should quite possibly be construed more broadly: some other places also show ambiguous units, and also show what I believe to be decimal SI units where binary IEC units might be better.
$ gix clone git@github.com:EliahKagan/gitoxide.git
19:17:24 indexing done 161.1K objects in 2.85s (56.5K objects/s)
19:17:24 decompressing done 175.7MB in 2.85s (61.7MB/s)
19:17:25 Resolving done 161.1K objects in 0.86s (188.1K objects/s)
19:17:25 Decoding done 1.4GB in 0.86s (1.6GB/s)
19:17:25 writing index file done 4.5MB in 0.02s (290.5MB/s)
19:17:25 create index file done 161.1K objects in 3.80s (42.4K objects/s)
19:17:25 read pack done 77.6MB in 4.10s (18.9MB/s)
19:17:25 checkout done 2.4K files in 0.14s (16.9K files/s)
19:17:25 writing done 72.3MB in 0.14s (516.5MB/s)
...
In contrast, Git uses binary IEC units:
$ git clone https://github.com/GitoxideLabs/gitoxide.git
Cloning into 'gitoxide'...
remote: Enumerating objects: 163078, done.
remote: Counting objects: 100% (869/869), done.
remote: Compressing objects: 100% (363/363), done.
remote: Total 163078 (delta 630), reused 506 (delta 506), pack-reused 162209 (from 6)
Receiving objects: 100% (163078/163078), 74.22 MiB | 37.44 MiB/s, done.
Resolving deltas: 100% (107181/107181), done.
Upgrading bytesize in all gitoxide crates' Cargo.toml files does not affect that. I'm not sure whether that's because prodash depends on bytesize 1.3.3, or for some other reason.