Tracking issue for MIR-only RLIBs #38913

Closed
@michaelwoerister

There has been some talk about switching RLIBs to "MIR-only", that is, making RLIBs contain only the MIR representation of a program and not the LLVM IR and machine code as they do now. This issue tries to collect the advantages, disadvantages, and other concerns such an approach would entail:

Advantages

  • Less code duplication, which has four benefits:
    • RLIBs would be smaller because they would not contain LLVM IR and machine code anymore.
    • RLIBs and leaf crates would be smaller because, at the moment, instantiations of generic functions show up multiple times in the object code and LLVM IR.
    • RLIBs and leaf crates would be smaller because the compiler would be able to instantiate monomorphic functions strictly on demand, as @japaric points out.
    • Possibly faster whole-project compiles, since generic instances are never compiled multiple times (although see "Disadvantages")
  • RLIBs would compile faster because the trans and LLVM passes would always be skipped (much like a metadata-only build, e.g. with --emit=metadata).
  • At the moment libstd is compiled with -Cdebuginfo=1, which is good in general but as a side-effect increases the size of Rust binaries, even if they are built without debuginfo (because the debuginfo from libstd gets statically linked into the binaries). This problem would not exist with MIR-only rlibs.
  • In the past we've had problems with WeakODR linkage and COMDAT sections on MinGW. WeakODR linkage is one way to deal with duplicate generic instances; avoiding those duplicates would also remove any reason to use WeakODR.
  • We would always get LTO-grade compiler optimizations since all code is available at codegen time.
  • Some targets, like NVPTX, don't seem to support regular linking (see #38787, "NVPTX: non-inlined functions can't be used cross crate"). Only generating object code in leaf crates would solve this problem.
  • There seems to be some indication that MIR-only RLIBs would help with making the Rust compiler more backend agnostic (see the WASM-related issue #38804, "Migrate wasm target to LLVM wasm backend").
  • Generating LLVM IR only in leaf crates would make it easier to add comprehensive LLVM-based instrumentation like LeakSanitizer without recompiling libstd (see #38699, "LeakSanitizer, ThreadSanitizer, AddressSanitizer and MemorySanitizer support"), as @japaric points out.
  • All Rust code (even that from libstd) can be compiled with -C target-cpu=native, potentially resulting in better code, as @japaric points out.
  • The build process of multi-crate projects would gain more parallelism: downstream crates can already compile up to the linking phase without upstream object code, so they would no longer need to wait for upstream crates' codegen, as @est31 points out.
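
To make the "generic instances" point concrete, here is a minimal sketch. The function `largest` is a made-up stand-in for a generic function exported by an upstream crate; it is not taken from any real library. Each concrete type it is called with needs its own machine-code instance, and today such an instance can end up in the object code of several crates that use it.

```rust
// Hypothetical generic function, standing in for an item from an upstream crate.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    // An empty slice has no largest element.
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

fn main() {
    // Two distinct instances are required here: largest::<u32> and largest::<f64>.
    // With MIR-only RLIBs, only the final crate would generate code for them.
    println!("{:?}", largest(&[3u32, 9, 4]));
    println!("{:?}", largest(&[1.5f64, 0.5]));
}
```

If two downstream crates both call `largest::<u32>`, the instance may currently be compiled once per crate; generating code only in the leaf crate would compile it once.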

Disadvantages

  • The leaf crates (executables, staticlibs, dylibs, cdylibs) would take more time to compile because
    1. the machine code of monomorphic functions from upstream crates would not be "cached" anymore, and
    2. since LLVM sees more code at once, some super-linear optimizations would take disproportionately more time (like when one compiles with LTO now)
  • People might rely on #[no_mangle] pub items being exported from RLIBs and link against them directly. This would not be possible anymore, as @nagisa points out.
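
For reference, the kind of item @nagisa's concern is about looks roughly like the sketch below. The name `mylib_add` is purely illustrative. Today the RLIB contains object code for it under that exact symbol name, which is what makes linking against the RLIB directly possible; a MIR-only RLIB would contain no such symbol.

```rust
// Hypothetical exported item; `mylib_add` is an illustrative name only.
// Note: edition 2024 spells this attribute `#[unsafe(no_mangle)]`.
#[no_mangle]
pub extern "C" fn mylib_add(a: i32, b: i32) -> i32 {
    // Wrapping arithmetic avoids a panic across the C ABI boundary on overflow.
    a.wrapping_add(b)
}

fn main() {
    // From Rust the item is an ordinary function; the attribute only fixes
    // the symbol name that ends up in the generated object code.
    println!("{}", mylib_add(2, 3));
}
```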

Non-Advantages

  • MIR-only RLIBs would not be platform independent. One might expect them to be, but cfg switches are resolved before MIR is built, so MIR is not platform independent either.
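
A small sketch of why cfg switches make MIR target-specific. The function `word_bytes` is hypothetical; the point is only that exactly one of these definitions survives cfg-stripping, so the MIR stored in an RLIB already reflects the target it was built for.

```rust
// Hypothetical function: which body exists at all depends on the target.
#[cfg(target_pointer_width = "64")]
pub fn word_bytes() -> usize {
    8
}

#[cfg(target_pointer_width = "32")]
pub fn word_bytes() -> usize {
    4
}

// Fallback for any other pointer width.
#[cfg(not(any(target_pointer_width = "64", target_pointer_width = "32")))]
pub fn word_bytes() -> usize {
    std::mem::size_of::<usize>()
}

fn main() {
    // Only one definition of `word_bytes` was ever lowered to MIR.
    println!("{}", word_bytes());
}
```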

Mitigation strategies for disadvantages:

  1. The problem of caching machine code would be solved in a generalized form by incremental compilation. One has to keep in mind though that incremental compilation will produce less performant code because it prevents many opportunities for inlining.
  2. We could provide an additional, more coarse-grained codegen unit partitioning scheme for incremental compilation (e.g. one CGU per crate) for better runtime performance at the cost of longer compile times.
  3. The amount of code LLVM sees at once can easily be controlled via -C codegen-units already, which provides a means of reducing super-linear optimizations.
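
The effect of item 3 can be illustrated with a toy cost model. Everything here is an assumption made for illustration: the exponent 1.5, the count of 10,000 functions, and the idea that cost depends only on unit size. None of it is a measurement of LLVM.

```rust
// Toy model: optimizing a unit of `n` functions costs n^exponent work units.
fn unit_cost(n: f64, exponent: f64) -> f64 {
    n.powf(exponent)
}

// Total cost when `total` functions are split evenly into `units` codegen units.
fn total_cost(total: f64, units: u32, exponent: f64) -> f64 {
    let per_unit = total / f64::from(units);
    f64::from(units) * unit_cost(per_unit, exponent)
}

fn main() {
    // Assumed numbers: 10,000 functions, cost growing as n^1.5.
    for &units in &[1u32, 4, 16] {
        println!("{:>2} units: {:.0}", units, total_cost(10_000.0, units, 1.5));
    }
}
```

Under any super-linear exponent the total drops as the number of units grows. The price is the one noted in item 1: fewer opportunities for inlining across unit boundaries.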

Open Questions

  • I think we support "bundling" native libraries into RLIBs. We might still need to keep supporting this, even if we don't store machine code originating from Rust?

Please help collect more data on the viability of MIR-only RLIBs.

cc @rust-lang/core @rust-lang/compiler @rust-lang/tools @rkruppe

Labels

  • A-MIR (Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html)
  • A-codegen (Area: Code generation)
  • C-tracking-issue (Category: An issue tracking the progress of sth. like the implementation of an RFC)
  • T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)