Make sure our handling of partially initialized values is compatible with LLVM / We cannot use LLVM poison semantics #94428

The intent is for types like `MaybeUninit<u64>` to support dealing with partially initialized data: e.g., if we have a `(u32, u16)` (and assuming for a second we could rely on its layout), it should be sound to transmute that to `MaybeUninit<u64>` and back *even though* the padding between the two tuple fields might be uninitialized. Code like https://github.com/rust-lang/rust/pull/94212 relies on this.
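To make the round trip concrete, here is a minimal sketch. It swaps the tuple for a hypothetical `#[repr(C)]` struct named `Pair`, so that the layout (and hence the two padding bytes between the fields) is actually guaranteed; the names `Pair` and `round_trip` are illustrative and not taken from any existing code.

```rust
use std::mem::{self, MaybeUninit};

/// Hypothetical stand-in for the tuple: `repr(C)` pins the layout, so on
/// common targets `a` sits at offset 0, bytes 2..4 are padding, and `b`
/// sits at offset 4, for a total size of 8 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u16,
    b: u32,
}

/// Moves a `Pair` through `MaybeUninit<u64>` and back.
/// The padding bytes may be uninitialized the whole time; the intent
/// described above is that this is sound anyway.
fn round_trip(p: Pair) -> Pair {
    // Both types are 8 bytes, so `transmute` accepts the conversion.
    let raw: MaybeUninit<u64> = unsafe { mem::transmute(p) };
    // Going back only requires `a` and `b` to be initialized, not the padding.
    unsafe { mem::transmute(raw) }
}
```

Calling `round_trip(Pair { a: 7, b: 42 })` should return an equal `Pair`. Whether that holds after codegen depends on the `i64` that `MaybeUninit<u64>` is lowered to preserving the per-byte initialization state.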

The thing is, we are compiling `MaybeUninit<u64>` to `i64` for LLVM -- `MaybeUninit` is `repr(transparent)`. This was required to avoid codegen regressions when `MaybeUninit` started to be used in some hot data copying loops inside libcore. So, for this all to work out, we had better be sure that `i64` correctly preserves partially initialized data.

LLVM has two kinds of "uninit" data, `undef` and `poison`.

- `undef` is per-bit and precisely preserved in all `iN` types, so we should be fine here.
- `poison`, however, is per-value: when loading an `i64` and any of its bytes is `poison`, the entire result is `poison`. That is exactly *not* what we want for `MaybeUninit<u64>`. However, at least in current LLVM, `poison` is only created in very few situations (such as "nowrap" arithmetic that overflows), and AFAIK none of them can happen in a UB-free Rust program -- so, basically, "uninit" in Rust only ever corresponds to `undef` in LLVM, never to `poison`. (But I might have missed places where LLVM generates `poison`.)
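
The difference between the two behaviors can be sketched with a toy model. This is an illustration only, not LLVM's actual semantics: it tracks initialization per byte rather than per bit, and the names `AbstractByte`, `load_per_byte`, and `load_per_value` are made up for this example.

```rust
/// Toy abstract byte: `None` stands for an uninitialized byte.
type AbstractByte = Option<u8>;

/// `undef`-like load: every byte keeps its own initialization state,
/// so a partially initialized value survives the load unchanged.
fn load_per_byte(mem: [AbstractByte; 8]) -> [AbstractByte; 8] {
    mem
}

/// `poison`-like load: a single uninitialized byte taints the whole
/// value, so the initialized bytes are lost as well.
fn load_per_value(mem: [AbstractByte; 8]) -> Option<u64> {
    let mut bytes = [0u8; 8];
    for (dst, src) in bytes.iter_mut().zip(mem.iter()) {
        // `?` returns `None` for the whole load on the first uninit byte.
        *dst = (*src)?;
    }
    Some(u64::from_le_bytes(bytes))
}
```

In this model, `MaybeUninit<u64>` needs the behavior of `load_per_byte`: a value with two uninitialized padding bytes must come back with its other six bytes intact, whereas `load_per_value` would discard all of them.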

So I think *right now* we are good. However, LLVM is slowly moving away from `undef` and towards `poison`, since `undef` is seriously ill-behaved in many ways. And if that ever means that "uninit" in Rust could correspond to LLVM `poison`, then we have a problem here -- we have to keep monitoring this situation, and it might be good for us to be involved in the relevant LLVM discussions as well, to make sure they are aware of this problem.

Similarly, as we evolve the MIR semantics we have to make sure that no UB-free program can generate `poison` after compilation to LLVM.

A very elegant solution to this issue would be for LLVM to adopt the ["byte type" proposal](https://lists.llvm.org/pipermail/llvm-dev/2021-June/150883.html). However, so far my impression is that the LLVM community is not convinced they need such a type. With a byte type, `MaybeUninit<u64>` could be easily compiled to `b64` in LLVM, and a byte type *would* preserve `poison` precisely, so we'd be all good.

I am mostly opening this so we have some place to track the current situation, and to make sure everyone agrees on what the main concerns are here -- and to get input from folks with more LLVM experience in case I got some of this wrong.
Cc @rust-lang/wg-unsafe-code-guidelines @rust-lang/wg-llvm
