Skip to content

Make sure our handling of partially initialized values is compatible with LLVM / We cannot use LLVM poison semantics #94428

Open
@RalfJung

Description

@RalfJung

The intent is for types like MaybeUninit<u64> to support dealing with partially initialized data: e.g., if we have a (u32, u16) (and assuming for a second we could rely on its layout), it should be sound to transmute that to MaybeUninit<u64> and back even though the padding between the two tuple fields might be uninitialized. Code like #94212 relies on this.

The thing is, we are compiling MaybeUninit<u64> to i64 for LLVM -- MaybeUninit is repr(transparent). This was required to avoid codegen regressions when MaybeUninit started to be used in some hot data copying loops inside libcore. So, for this all to work out, we better be sure that i64 correctly preserves partially initialized data.

LLVM has two kinds of "uninit" data, undef and poison.

  • undef is per-bit and precisely preserved in all iN types, so we should be fine here.
  • poison, however, is per-value: when loading an i64 and any of its bytes is poison, the entire result is poison. That is exactly not what we want for MaybeUninit<u64>. However, at least in current LLVM, poison is only created in very few situations (such as "nowrap" arithmetic that overflows), and AFAIK none of them can happen in a UB-free Rust program -- so, basically "uninit" in Rust only ever corresponds to undef in LLVM, never to poison. (But I might have missed places where LLVM generates posion.)

So I think right now we are good. However, LLVM is slowly moving away from undef and towards posion, since undef is seriously ill-behaved in many ways. And if that ever means that "uninit" in Rust could correspond to LLVM poison, then we have a problem here -- we have to keep monitoring this situation, and it might be good for us to be involved in the relevant LLVM discussions here as well to make sure they are aware of this problem.

Similarly, as we evolve the MIR semantics we have to make sure that no UB-free program can generate poison after compilation to LLVM.

A very elegant solution to this issue would be for LLVM to adopt the "byte type" proposal, however, so far my impression is the LLVM community is not convinced they need such a type. With a byte type, MaybeUninit<u64> could be easily compiled to b64 in LLVM, and a byte type would preserve poison precisely, so we'd be all good.

I am mostly opening this so we have some place to track the current situation, and to make sure everyone agrees on what the main concerns are here -- and to get input from folks with more LLVM experience in case I got some of this wrong.
Cc @rust-lang/wg-unsafe-code-guidelines @rust-lang/wg-llvm

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-MIRArea: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.htmlA-codegenArea: Code generationT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.T-opsemRelevant to the opsem team

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions