The intent is for types like MaybeUninit<u64> to support dealing with partially initialized data. For example, if we have a (u32, u16) (and assume for a second that we could rely on its layout), it should be sound to transmute that to MaybeUninit<u64> and back, even though the tuple's padding bytes might be uninitialized. Code like #94212 relies on this.
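A minimal sketch of that round trip, assuming a hypothetical #[repr(C)] struct Pair as a stand-in for (u32, u16), since the tuple's own layout is not guaranteed. With repr(C) the struct is 8 bytes, 2 of which are trailing padding, and that padding is exactly what MaybeUninit<u64> has to carry through without claiming it is initialized:

```rust
use std::mem::{self, MaybeUninit};

/// Hypothetical #[repr(C)] stand-in for (u32, u16), so that the layout is fixed:
/// 4 bytes (a) + 2 bytes (b) + 2 bytes of trailing padding = 8 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pair {
    a: u32,
    b: u16,
}

/// Reinterpret a Pair as MaybeUninit<u64>; the two padding bytes may be uninitialized.
fn to_words(p: Pair) -> MaybeUninit<u64> {
    // SAFETY: both types are 8 bytes, and MaybeUninit<u64> may hold any bytes,
    // including uninitialized ones.
    unsafe { mem::transmute::<Pair, MaybeUninit<u64>>(p) }
}

/// Reinterpret the bytes back as a Pair.
///
/// # Safety
/// The bytes holding `a` and `b` must be initialized (e.g. `w` came from `to_words`).
/// The padding bytes may stay uninitialized.
unsafe fn from_words(w: MaybeUninit<u64>) -> Pair {
    unsafe { mem::transmute::<MaybeUninit<u64>, Pair>(w) }
}
```

For this to be sound after compilation, whatever LLVM type MaybeUninit<u64> is lowered to must carry the uninitialized padding bytes through unchanged.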
The thing is, we are compiling MaybeUninit<u64> to i64 for LLVM, since MaybeUninit is repr(transparent). This was required to avoid codegen regressions when MaybeUninit started to be used in some hot data-copying loops inside libcore. So, for this all to work out, we had better be sure that i64 correctly preserves partially initialized data.
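For illustration, a sketch of the layout guarantee and the kind of copy loop involved. copy_words is a hypothetical helper, not the actual libcore code. The const assertions only check size and alignment (MaybeUninit<T> is documented to have the same size, alignment and ABI as T); whether the loop really ends up as i64 loads and stores in LLVM IR depends on the compiler version and is not checked here:

```rust
use std::mem::{align_of, size_of, MaybeUninit};

// MaybeUninit<T> is documented to have the same size, alignment and ABI as T,
// which is why MaybeUninit<u64> can be handed to LLVM as a plain i64.
const _: () = assert!(size_of::<MaybeUninit<u64>>() == size_of::<u64>());
const _: () = assert!(align_of::<MaybeUninit<u64>>() == align_of::<u64>());

/// Hypothetical copy loop over possibly-uninitialized words.
/// No element is ever read as a u64, so uninitialized bytes are allowed in `src`.
fn copy_words(dst: &mut [MaybeUninit<u64>], src: &[MaybeUninit<u64>]) {
    for (d, s) in dst.iter_mut().zip(src) {
        // MaybeUninit<u64> is Copy because u64 is Copy; this copies the raw bytes.
        *d = *s;
    }
}
```

If the i64 used for each copy could turn a partially uninitialized word into poison, a loop like this would silently lose the initialized bytes.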
LLVM has two kinds of "uninit" data: undef and poison.
undef is per-bit and precisely preserved in all iN types, so we should be fine here. poison, however, is per-value: when loading an i64 and any of its bytes is poison, the entire result is poison. That is exactly not what we want for MaybeUninit<u64>. However, at least in current LLVM, poison is only created in very few situations (such as "nowrap" arithmetic that overflows), and AFAIK none of them can happen in a UB-free Rust program. So, basically, "uninit" in Rust only ever corresponds to undef in LLVM, never to poison. (But I might have missed places where LLVM generates poison.)
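To make the difference concrete, here is a toy model (purely illustrative, not how LLVM actually represents values) of loading an i64 from 8 bytes of memory. MemByte, LlvmI64 and load_i64 are hypothetical names, and little-endian byte order is assumed. undef is tracked per bit via an init mask and survives the load; a single poison byte turns the whole i64 into poison:

```rust
/// Hypothetical abstract memory byte: some bits initialized (per `init_mask`,
/// the remaining bits are undef), or the whole byte is poison.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MemByte {
    Bits { bits: u8, init_mask: u8 },
    Poison,
}

/// Hypothetical model of an LLVM i64 value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LlvmI64 {
    /// Bits set in `init_mask` are initialized; the others are undef.
    Val { bits: u64, init_mask: u64 },
    Poison,
}

/// Load an i64 from 8 bytes (little-endian) under the toy semantics.
fn load_i64(mem: [MemByte; 8]) -> LlvmI64 {
    let mut bits = 0u64;
    let mut init_mask = 0u64;
    for (i, b) in mem.iter().enumerate() {
        match *b {
            // poison is per-value: one poison byte poisons the whole result
            MemByte::Poison => return LlvmI64::Poison,
            // undef is per-bit: the init mask is carried over bit for bit
            MemByte::Bits { bits: v, init_mask: m } => {
                bits |= (v as u64) << (8 * i);
                init_mask |= (m as u64) << (8 * i);
            }
        }
    }
    LlvmI64::Val { bits, init_mask }
}
```

With undef padding the two high bytes simply stay undef in the loaded value; with poison padding the six initialized bytes are lost as well.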
So I think right now we are good. However, LLVM is slowly moving away from undef and towards poison, since undef is seriously ill-behaved in many ways. If that ever means that "uninit" in Rust could correspond to LLVM poison, then we have a problem here. We have to keep monitoring this situation, and it might be good for us to be involved in the relevant LLVM discussions as well, to make sure they are aware of this problem.
Similarly, as we evolve the MIR semantics, we have to make sure that no UB-free program can generate poison after compilation to LLVM.
A very elegant solution to this issue would be for LLVM to adopt the "byte type" proposal; however, so far my impression is that the LLVM community is not convinced they need such a type. With a byte type, MaybeUninit<u64> could easily be compiled to b64 in LLVM, and a byte type would preserve poison precisely, so we'd be all good.
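The difference can be sketched with a toy model, assuming a simplified abstract byte that is either initialized or poison. AbsByte and both round-trip functions are hypothetical, and poison is tracked per byte rather than per bit for brevity. A round trip through an i64 spreads one poison byte over the whole value, while a round trip through a b64 leaves every byte as it was:

```rust
/// Hypothetical abstract byte: initialized, or poison.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbsByte {
    Init(u8),
    Poison,
}

/// Load as i64, then store back: poison is per-value, so any poison byte
/// makes the loaded i64 poison, and storing it writes 8 poison bytes.
fn roundtrip_via_i64(mem: [AbsByte; 8]) -> [AbsByte; 8] {
    if mem.contains(&AbsByte::Poison) {
        [AbsByte::Poison; 8]
    } else {
        mem
    }
}

/// Load as a (proposed) byte type b64, then store back: each byte keeps its own state.
fn roundtrip_via_b64(mem: [AbsByte; 8]) -> [AbsByte; 8] {
    mem
}
```

Under this model, a MaybeUninit<u64> lowered to b64 would keep its initialized bytes even when its padding is poison.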
I am mostly opening this so we have some place to track the current situation, and to make sure everyone agrees on what the main concerns are here -- and to get input from folks with more LLVM experience in case I got some of this wrong.
Cc @rust-lang/wg-unsafe-code-guidelines @rust-lang/wg-llvm