Building with LTO should skip "compilation" #43212

Closed
@glandium

See #43211 for some horrifying timings from doing LTO builds in Firefox.

Please correct me where I'm wrong, but here is my understanding of the situation wrt building with LTO (more or less confirmed by @alexcrichton and @mbrubeck on IRC):

  • Cargo starts building all dependencies
  • For each dependency, the rust compiler creates an rlib
  • The rlib contains compiled code for the crate, as well as metadata about the crate.
  • When linking the main crate with LTO, the rust compiler uses the metadata from the dependencies' rlibs, and compiles based on that and the code in the current crate. As I understand it, at this point, none of the code that was compiled and put in those rlibs is used.

In simplified and C/C++ terms, this is my understanding of what's happening:

  • Let's say we have a test.c that is built in a libtest library, and linked with foo.c into a foo binary.
  • The libtest library is generated with:
    • gcc -o test.o -c test.c -O3 (that's the compiled code part)
    • gcc -o test.lto.o -c test.c -O3 -flto (that's the metadata used for LTO)
    • gcc-ar cr libtest.a test.lto.o test.o
  • The code for the main binary is generated with:
    • gcc -o foo.lto.o -c foo.c -O3 -flto
    • (maybe rust even compiles the code here too? like gcc -o foo.o -c foo.c -O3)
    • gcc -flto -o foo foo.lto.o libtest.a

In the above, if libtest.a only contained test.lto.o, the foo binary would still build fine, because the compiled code in test.o is not used. Which means we've spent time generating that test.o for nothing.

Now, consider a crate like geckoservo: it contains only 3 KLOC, so you wouldn't expect it to take as long to build as it does (well above a minute). @mbrubeck suggested that compiling the crate inlines a bunch of stuff, which is probably what is happening. But that work seems like completely wasted time, considering it will all have to be done again when linking the entire project.

FWIW, the -Ztime-passes output for geckoservo with the latest 1.20 nightly looks like:

time: 0.011; rss: 32MB  parsing
time: 0.000; rss: 32MB  recursion limit
time: 0.000; rss: 32MB  crate injection
time: 0.000; rss: 32MB  plugin loading
time: 0.000; rss: 32MB  plugin registration
time: 0.243; rss: 134MB expansion
time: 0.000; rss: 134MB maybe building test harness
time: 0.000; rss: 134MB maybe creating a macro crate
time: 0.000; rss: 134MB checking for inline asm in case the target doesn't support it
time: 0.001; rss: 134MB early lint checks
time: 0.000; rss: 134MB AST validation
time: 0.015; rss: 137MB name resolution
time: 0.001; rss: 137MB complete gated feature checking
time: 0.005; rss: 140MB lowering ast -> hir
time: 0.001; rss: 138MB indexing hir
time: 0.000; rss: 138MB attribute checking
time: 0.000; rss: 135MB language item collection
time: 0.001; rss: 135MB lifetime resolution
time: 0.000; rss: 135MB looking for entry point
time: 0.000; rss: 135MB looking for plugin registrar
time: 0.000; rss: 135MB loop checking
time: 0.000; rss: 135MB static item recursion checking
time: 0.016; rss: 136MB compute_incremental_hashes_map
time: 0.000; rss: 136MB load_dep_graph
time: 0.000; rss: 136MB stability index
time: 0.002; rss: 136MB stability checking
time: 0.004; rss: 137MB type collecting
time: 0.000; rss: 137MB impl wf inference
time: 0.000; rss: 137MB coherence checking
time: 0.000; rss: 137MB variance testing
time: 0.009; rss: 138MB wf checking
time: 0.009; rss: 140MB item-types checking
time: 0.366; rss: 185MB item-bodies checking
time: 0.024; rss: 185MB const checking
time: 0.002; rss: 186MB privacy checking
time: 0.001; rss: 186MB intrinsic checking
time: 0.000; rss: 186MB effect checking
time: 0.005; rss: 186MB match checking
time: 0.001; rss: 186MB liveness checking
time: 0.076; rss: 193MB borrow checking
time: 0.000; rss: 193MB reachability checking
time: 0.001; rss: 193MB death checking
time: 0.000; rss: 193MB unused lib feature checking
time: 0.011; rss: 193MB lint checking
time: 0.000; rss: 193MB resolving dependency formats
  time: 0.009; rss: 194MB       write metadata
  time: 0.569; rss: 279MB       translation item collection
  time: 0.041; rss: 298MB       codegen unit partitioning
  time: 0.022; rss: 748MB       internalize symbols
time: 6.012; rss: 748MB translation
time: 0.000; rss: 748MB assert dep graph
time: 0.000; rss: 748MB serialize dep graph
  time: 4.810; rss: 712MB       llvm function passes [0]
  time: 79.068; rss: 958MB      llvm module passes [0]
  time: 21.767; rss: 929MB      codegen passes [0]
  time: 0.001; rss: 929MB       codegen passes [0]
time: 107.035; rss: 929MB       LLVM passes
time: 0.000; rss: 929MB serialize work products

i.e. most of the time is spent in the LLVM module passes and codegen passes.
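To put a number on that, here is a rough Python sketch that parses a few lines excerpted from the output above and computes how much of the time goes to LLVM. It is only an illustration: the excerpt is not the full log, the helper names are made up for this example, and it assumes that indented lines are sub-steps already counted in the following non-indented line (which is my reading of the output, not something confirmed here).

```python
import re

# Excerpt of the -Ztime-passes output quoted above (NOT the full log).
LOG = """\
time: 0.243; rss: 134MB expansion
time: 0.366; rss: 185MB item-bodies checking
  time: 0.569; rss: 279MB       translation item collection
time: 6.012; rss: 748MB translation
  time: 4.810; rss: 712MB       llvm function passes [0]
  time: 79.068; rss: 958MB      llvm module passes [0]
  time: 21.767; rss: 929MB      codegen passes [0]
time: 107.035; rss: 929MB       LLVM passes
"""

# Captures: leading indentation, seconds, RSS in MB, pass name.
LINE_RE = re.compile(r"^(\s*)time:\s*([\d.]+);\s*rss:\s*(\d+)MB\s+(.+?)\s*$")


def parse(log):
    """Return (is_nested, seconds, rss_mb, name) for each timing line."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            indent, secs, rss, name = m.groups()
            entries.append((bool(indent), float(secs), int(rss), name))
    return entries


entries = parse(LOG)

# Assumption: indented entries are already included in a top-level entry,
# so only top-level entries are summed to avoid double counting.
top_level = [e for e in entries if not e[0]]
nested = [e for e in entries if e[0]]

total = sum(e[1] for e in top_level)
slowest_top = max(top_level, key=lambda e: e[1])
slowest_nested = max(nested, key=lambda e: e[1])
llvm_share = slowest_top[1] / total

print(f"top-level total (excerpt only): {total:.3f}s")
print(f"slowest top-level pass: {slowest_top[3]} ({llvm_share:.1%} of excerpt)")
print(f"slowest sub-pass: {slowest_nested[3]} ({slowest_nested[1]:.3f}s)")
```

Feeding the complete log into LOG instead of the excerpt would give the share for the whole build of the crate; the phases left out of the excerpt are all well under a second each.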

Cc: @froydnj @rillian

Labels: A-codegen (Area: Code generation), C-enhancement (Category: An issue proposing an enhancement or a PR with one), I-compiletime (Issue: Problems and improvements with respect to compile times), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue)
