Description
While it's generally assumed that build systems on Windows are slower than build systems on Linux, I'm seeing per-crate compile times that are up to nearly 2x slower on a Windows machine than on a Linux machine. These are personal machines I work on and they're not exactly equivalent, but I'm pretty surprised by the 2x difference I'm seeing here and wanted to open an issue to see if we can get to the bottom of what's going on.
The specifications of the machines I have are:
- Linux - Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz, 14-core/28-thread, 64GB ram
- Windows - Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz, 4-core/8-thread, 32GB ram
I don't really know a ton about Intel CPUs, so I'm not actually sure whether it's expected that the i9 is 2x faster than the i7. I wanted to write down some details though to see if others have thoughts. All Cargo commands were executed with -j4 to ensure that neither machine had an unfair parallelism advantage, and also to ideally isolate the effect of hyperthreads.
I started out by building https://github.com/cranestation/wasmtime/tree/ab3cd945bc2f4626a2fae8eabf6c7108973ce1a5, and the full -Ztimings graph I got was:
For the same project and the same compiler commit the Windows build is nearly 70% slower! I don't think that my CPUs have a 70% performance difference between them, and I don't have a perfect test environment for this, but 70% feels like a huge performance discrepancy between Linux and Windows.
Glancing at the slow-building crates (use the "min unit time" slider to see them more easily) I'm seeing that almost all crates are 2x slower on Windows than on Linux. This doesn't look like a "chalk it up to Windows being slow" issue; this is where I started thinking that a bug somewhere in rustc and/or LLVM was more likely.
Next up I wanted to try out -Z self-profile on a particular crate. One I wrote recently was the wast crate, which took 13.76s on Linux and 23.05s on Windows. I dug in a bit more by building just that crate at https://github.com/alexcrichton/wat/tree/2288911124001d30de0a68e284db9ab010495536/crates/wast.
Here, sure enough, the command cargo +nightly build --release -p wast -j4 has a huge discrepancy:
- Linux - 5.18s
- Windows - 8.58s
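To put the two pairs of timings above on the same footing, here is a small sketch that converts them into slowdown ratios and percentages. The helper names (`slowdown_ratio`, `percent_slower`) are hypothetical and exist only for this illustration; the numbers are the ones quoted in this report.

```rust
/// Ratio of `other_secs` to `baseline_secs` (1.0 means both took equally long).
fn slowdown_ratio(baseline_secs: f64, other_secs: f64) -> f64 {
    other_secs / baseline_secs
}

/// Percentage by which `other_secs` exceeds `baseline_secs`.
fn percent_slower(baseline_secs: f64, other_secs: f64) -> f64 {
    (slowdown_ratio(baseline_secs, other_secs) - 1.0) * 100.0
}

fn main() {
    // (label, Linux seconds, Windows seconds), copied from the measurements above.
    let runs = [
        ("wast as part of the wasmtime build", 13.76, 23.05),
        ("wast built on its own", 5.18, 8.58),
    ];

    for &(label, linux, windows) in runs.iter() {
        println!(
            "{}: {:.2}x ({:.1}% slower on Windows)",
            label,
            slowdown_ratio(linux, windows),
            percent_slower(linux, windows)
        );
    }
}
```

Both measurements land at roughly two thirds slower on Windows, which is consistent with the "nearly 70%" figure for the whole build.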
Next up I tried -Z self-profile and, using measureme, I ran summarize diff and got this output, notably:
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| Item | Self Time | Item count | Cache hits | Blocked time | Incremental load time |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_thin_lto_optimize | +3.86042516s | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_optimize_module_passes | +3.152410865s | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_codegen_emit_obj | +1.783877999s | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| codegen_crate | +1.021669947s | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_thin_lto_import | +245.950489ms | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| codegen_module | +220.253166ms | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_optimize_function_passes | +134.256719ms | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
| LLVM_module_codegen_make_bitcode | +111.530996ms | +0 | +0 | +0ns | +0ns |
+---------------------------------------------+---------------+------------+------------+--------------+-----------------------+
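As a rough way to read the rows above, the sketch below adds up the self-time increases and works out how much of the listed total comes from items whose names start with `LLVM_`. It only covers the rows shown here, which are an excerpt of the full diff, and it assumes the deltas are Windows minus Linux. The function names are hypothetical.

```rust
/// Self-time increases in seconds, copied from the `summarize diff` rows above.
const DELTAS: [(&str, f64); 8] = [
    ("LLVM_thin_lto_optimize", 3.86042516),
    ("LLVM_module_optimize_module_passes", 3.152410865),
    ("LLVM_module_codegen_emit_obj", 1.783877999),
    ("codegen_crate", 1.021669947),
    ("LLVM_thin_lto_import", 0.245950489),
    ("codegen_module", 0.220253166),
    ("LLVM_module_optimize_function_passes", 0.134256719),
    ("LLVM_module_codegen_make_bitcode", 0.111530996),
];

/// Sum of all listed self-time increases, in seconds.
fn total(deltas: &[(&str, f64)]) -> f64 {
    deltas.iter().map(|entry| entry.1).sum()
}

/// Sum of the increases for items whose name starts with `LLVM_`, in seconds.
fn llvm_total(deltas: &[(&str, f64)]) -> f64 {
    deltas
        .iter()
        .filter(|entry| entry.0.starts_with("LLVM_"))
        .map(|entry| entry.1)
        .sum()
}

fn main() {
    let all = total(&DELTAS);
    let llvm = llvm_total(&DELTAS);
    println!("listed increase: {:.2}s", all);
    println!("LLVM_* increase: {:.2}s ({:.0}% of listed)", llvm, llvm / all * 100.0);
}
```

The listed increases add up to more than the 3.4s wall-clock gap between the two builds. My assumption is that this is because self time is accumulated across the parallel codegen threads, so the sum is better read as extra CPU time than as extra elapsed time.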
For whatever reason, it appears that LLVM is massively slower on Windows than it is on Linux.
It was at this point that I decided to write up the issue here and get this all down in a report. I suspect that this is either a build system problem with Windows or it's a compiler problem. We're using Clang on Linux but we're not using Clang on Windows yet, so it may be time to make the transition!