Closed
For the keccak crate I use simple loop-unrolling macros:
macro_rules! unroll5 {
    ($var:ident, $body:block) => {
        { const $var: usize = 0; $body; }
        { const $var: usize = 1; $body; }
        { const $var: usize = 2; $body; }
        { const $var: usize = 3; $body; }
        { const $var: usize = 4; $body; }
    };
}
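For readers unfamiliar with this pattern, here is a minimal sketch of how such a macro might be nested. The function `column_parity` and its indexing scheme are hypothetical and written only for illustration; they are not taken from the crate's source. Each invocation pastes the body five times, so two nested invocations emit 25 copies of the innermost statement, each with its own pair of constants.

```rust
// The macro as quoted in the issue: the body is pasted five times, and each
// copy sees `$var` as a different compile-time constant.
macro_rules! unroll5 {
    ($var:ident, $body:block) => {
        { const $var: usize = 0; $body; }
        { const $var: usize = 1; $body; }
        { const $var: usize = 2; $body; }
        { const $var: usize = 3; $body; }
        { const $var: usize = 4; $body; }
    };
}

// Hypothetical example function (an assumption, not the crate's code):
// XORs together the five entries of each column of a 5x5 state.
// The lowercase constants `x` and `y` trigger a naming lint, hence the allow.
#[allow(non_upper_case_globals)]
fn column_parity(state: &[u64; 25]) -> [u64; 5] {
    let mut parity = [0u64; 5];
    // Two nested invocations expand to 5 * 5 = 25 copies of the inner statement.
    unroll5!(x, {
        unroll5!(y, {
            parity[x] ^= state[5 * y + x];
        });
    });
    parity
}
```

Calling `column_parity(&state)` gives the same result as the equivalent pair of nested `for` loops; the only difference is that the compiler receives the already-unrolled code.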
Combined, these macros generate a lot of unrolled code. On the latest nightly, the crate takes much longer to compile than before:
$ cargo clean; rustup run stable cargo build
Compiling keccak v0.1.0 (...)
Finished dev [unoptimized + debuginfo] target(s) in 3.9 secs
$ cargo clean; rustup run stable cargo check
Compiling keccak v0.1.0 (...)
Finished dev [unoptimized + debuginfo] target(s) in 2.30 secs
$ cargo clean; rustup run nightly cargo build
Compiling keccak v0.1.0 (...)
Finished dev [unoptimized + debuginfo] target(s) in 18.65 secs
$ cargo clean; rustup run nightly cargo check
Compiling keccak v0.1.0 (...)
Finished dev [unoptimized + debuginfo] target(s) in 2.32 secs
Here I've used only 10 iterations instead of 24 in the unroll24 macro; the full version takes more than several minutes to compile. Judging by the cargo check timings above, macro expansion takes approximately the same time on both toolchains, so the drastic difference comes from later stages.
EDIT: RUSTFLAGS="-Z time-passes" rustup run nightly cargo build produces the following result:
time: 0.001; rss: 48MB parsing
time: 0.000; rss: 50MB garbage collect incremental cache directory
time: 0.000; rss: 50MB recursion limit
time: 0.000; rss: 50MB crate injection
time: 0.000; rss: 50MB plugin loading
time: 0.000; rss: 50MB plugin registration
time: 0.000; rss: 50MB background load prev dep-graph
time: 0.059; rss: 71MB expansion
time: 0.000; rss: 71MB maybe building test harness
time: 0.001; rss: 71MB maybe creating a macro crate
time: 0.003; rss: 71MB creating allocators
time: 0.002; rss: 71MB AST validation
time: 0.013; rss: 74MB name resolution
time: 0.002; rss: 74MB complete gated feature checking
time: 0.000; rss: 74MB blocked while dep-graph loading finishes
time: 0.016; rss: 80MB lowering ast -> hir
time: 0.008; rss: 80MB early lint checks
time: 0.021; rss: 83MB indexing hir
time: 0.000; rss: 79MB load query result cache
time: 0.000; rss: 79MB looking for entry point
time: 0.000; rss: 79MB looking for plugin registrar
time: 0.001; rss: 79MB loop checking
time: 0.000; rss: 81MB attribute checking
time: 0.006; rss: 84MB stability checking
time: 0.012; rss: 88MB type collecting
time: 0.000; rss: 88MB outlives testing
time: 0.000; rss: 88MB impl wf inference
time: 0.000; rss: 88MB coherence checking
time: 0.000; rss: 88MB variance testing
time: 0.046; rss: 106MB wf checking
time: 0.023; rss: 109MB item-types checking
time: 1.518; rss: 121MB item-bodies checking
time: 0.031; rss: 122MB rvalue promotion
time: 0.014; rss: 122MB privacy checking
time: 0.002; rss: 122MB intrinsic checking
time: 0.006; rss: 122MB match checking
time: 0.084; rss: 118MB liveness checking
time: 0.324; rss: 134MB borrow checking
time: 0.002; rss: 135MB MIR borrow checking
time: 0.000; rss: 135MB MIR effect checking
time: 0.003; rss: 135MB death checking
time: 0.000; rss: 135MB unused lib feature checking
time: 0.039; rss: 138MB lint checking
time: 0.000; rss: 138MB dumping chalk-like clauses
time: 0.000; rss: 138MB resolving dependency formats
time: 0.037; rss: 140MB write metadata
time: 14.945; rss: 143MB translation item collection
time: 0.000; rss: 143MB codegen unit partitioning
time: 0.142; rss: 151MB translate to LLVM IR
time: 0.000; rss: 151MB assert dep graph
time: 0.000; rss: 152MB llvm function passes [2lyh15q6cjwzy18c]
time: 0.000; rss: 152MB llvm module passes [2lyh15q6cjwzy18c]
time: 0.002; rss: 159MB codegen passes [2lyh15q6cjwzy18c]
time: 0.024; rss: 160MB persist query result cache
time: 0.030; rss: 162MB llvm function passes [30rksvufw6ddw8se]
time: 0.004; rss: 163MB llvm module passes [30rksvufw6ddw8se]
time: 0.011; rss: 161MB persist dep-graph
time: 0.034; rss: 161MB serialize dep graph
time: 15.160; rss: 161MB translation
time: 0.526; rss: 117MB codegen passes [30rksvufw6ddw8se]
time: 0.645; rss: 108MB LLVM passes
time: 0.000; rss: 108MB serialize work products
time: 0.001; rss: 108MB linking
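To pick out the dominant passes from a listing like this, the lines can be parsed and sorted by time. The following is a rough sketch that assumes every line has exactly the shape shown above (`time: <seconds>; rss: <n>MB <pass name>`); the names `PassTiming`, `parse_line`, and `slowest` are made up for this example and are not part of any rustc tooling. It treats the output as a flat list, so an enclosing entry such as "translation" is ranked alongside the passes it appears to contain.

```rust
// One parsed line of the timing output (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct PassTiming {
    seconds: f64,
    rss_mb: u64,
    name: String,
}

// Parses a line such as "time: 0.324; rss: 134MB borrow checking".
// Returns None for lines that do not match the assumed shape.
fn parse_line(line: &str) -> Option<PassTiming> {
    let rest = line.trim().strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let rest = rest.trim_start().strip_prefix("rss:")?;
    let (rss, name) = rest.trim_start().split_once("MB")?;
    Some(PassTiming {
        seconds: secs.trim().parse().ok()?,
        rss_mb: rss.trim().parse().ok()?,
        name: name.trim().to_string(),
    })
}

// Returns the `n` slowest passes, slowest first; unparsable lines are skipped.
fn slowest(output: &str, n: usize) -> Vec<PassTiming> {
    let mut passes: Vec<PassTiming> = output.lines().filter_map(parse_line).collect();
    passes.sort_by(|a, b| {
        b.seconds
            .partial_cmp(&a.seconds)
            .unwrap_or(std::cmp::Ordering::Equal)
    });
    passes.truncate(n);
    passes
}
```

Applied to the listing above, `slowest(output, 3)` would surface the "translation" total, "translation item collection", and "item-bodies checking", which matches the observation that the slowdown comes after expansion.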
Labels
Area: Constant evaluation, covers all const contexts (static, const fn, ...)
Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..)
Category: An issue proposing an enhancement or a PR with one.
Issue: Problems and improvements with respect to compile times.
Medium priority
Relevant to the compiler team, which will review and decide on the PR/issue.
Performance or correctness regression from one stable version to another.