LangRef: storing poison in memory is UB #141339

RalfJung · 2025-05-24T09:13:24Z

Based on discussion with @nikic. Cc @nunoplopes.

I'm not happy that storing poison would be UB -- there are some potential Rust features we cannot implement due to this -- but if that's how things currently are, it's better to document them.

This also means that reg2mem is technically unsound since not all SSA values can actually be stored in memory; that pass should add freeze to avoid storing poison. Should I file an issue for that? (Unfortunately I won't be able to try and fix this myself.)

llvmbot · 2025-05-24T09:13:58Z

@llvm/pr-subscribers-llvm-ir

Author: Ralf Jung (RalfJung)

Changes

Based on discussion with @nikic. Cc @nunoplopes.

I'm not happy that storing poison would be UB -- there are some potential Rust features we cannot implement due to this -- but if that's how things currently are, it's better to document them.

This also means that reg2mem is technically unsound since not all SSA values can actually be stored in memory; that pass should add freeze to avoid storing poison. Should I file an issue for that? (Unfortunately I won't be able to try and fix this myself.)

Full diff: https://github.com/llvm/llvm-project/pull/141339.diff

1 Files Affected:

(modified) llvm/docs/LangRef.rst (+6)

diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ad0755e1531df..242471cf7f5a4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11433,6 +11433,12 @@ If ``<value>`` is of aggregate type, padding is filled with
 :ref:`undef <undefvalues>`.
 If ``<pointer>`` is not a well-defined value, the behavior is undefined.
 
+If ``<value>`` is poison, then behavior currently is effectively undefined due
+to the fact that many LLVM passes assume they can do load widening, and having
+poison in memory would "infect" the entire widened load. This can hopefully be
+alleviated in the future, but until then, frontends should refrain from
+generating code that stores poison values in memory.
+
 Example:
 """"""""

nunoplopes · 2025-05-24T11:37:28Z

I'm totally against this. Widening in IR is a problem, yes, but this is not the way to go. Load widening can be solved by using vector loads or struct loads (if the new size is not a multiple of the old one).

There are implications about making stores UB, such as making movement of stores and function calls even harder.

RalfJung · 2025-05-24T17:16:35Z

I don't like it either -- I just think if that's the reality right now, we should at least write it down. Currently there is no way for frontends to know that they are not supposed to put poison into memory; I was extremely surprised when I first heard about this.

I guess another way to frame this would be to file an issue saying that putting poison in memory can lead to surprising issues, and referencing this in the LangRef as a known issue without making it look like intended behavior? However, from what @nikic told me, this is apparently quite hard to fix.

nunoplopes · 2025-05-26T18:50:15Z

I don't think it's that hard to fix. Widening happens just in a few places. (like lowering on memory intrinsics such as memcpy/memcmp) and some codegen prepare.

The issue is that stores of poison can happen anywhere. And as I mentioned, making it UB probably makes a bunch of optimizations wrong. We would be trading of a few wrong optimizations with another few wrong optimizations 😅

RalfJung · 2025-05-26T19:05:43Z

Indeed, reg2mem is wrong under this model, as I mentioned in the description already. Which other optimizations are you thinking of?

nikic · 2025-05-26T20:05:21Z

Load widening can be solved by using vector loads or struct loads (if the new size is not a multiple of the old one).

Neither of those can be used for load widening in any practical sense. Struct loads are non-canonical and will very likely be removed entirely in the future. Vector operations come with a strong hint to actually lower them to vector instructions, which is undesirable for load widening use cases.

There are implications about making stores UB, such as making movement of stores and function calls even harder.

I don't think that this materially restricts the kind of store speculation we can perform currently. The speculated stores are spurious, i.e. they will store back a loaded loaded value, which can't be poison. (The only problematic interaction here is with metadata.)

I don't think it's that hard to fix. Widening happens just in a few places. (like lowering on memory intrinsics such as memcpy/memcmp) and some codegen prepare.

The most important case is SROA, and I don't think it's easy to fix. The byte type is part of what is needed, but it's not sufficient in itself (for good optimization quality).

Anyway, I see this as just documenting the status quo. It seems like we will be moving away from this eventually, so it probably doesn't make sense to try to make us strictly conforming to this model, but it's an important constraint to be aware of as things are right now.

I think Ralf's phrasing here makes it clear that this is not intended as the permanent state of things.

dtcxzyw · 2025-05-27T08:39:47Z

I used to hit a rare miscompilation with rustlantis + llubi, which contains a store with poison value in the optimized IR. It is a miscompilation under both the current semantics (propagating poison in loaded values) and the proposed semantics (storing poison is an imm UB). Unfortunately, I did not keep the reproducer and made a workaround in llubi by treating it as a no-op.

Given the fact that a load in real hardware always preserves well-defined bytes in memory, I think it is possible to make load widening legal.
Currently, a poison byte in memory may infect the whole loaded non-vector single value. Instead of loadV2(Ptr) = freeze(loadV1(Ptr)) discussed in
https://discourse.llvm.org/t/rfc-add-nopoison-attribute-metadata/79833/5, we can make the load instruction act as the following:

# Assume that Type is a non-vector single value type.
def loadV3(Ptr, Type):
    Size = Type.sizeInBytes()
    AllPoison = True
    Bytes = []
    for I in range(0, Size):
        Byte = loadByte(Ptr + I)
        if Byte != poison:
            AllPoison = False
            Bytes.append(Byte)
        else:
            # Freeze poison bytes instead of freezing the whole value.
            Bytes.append(freeze(Byte))
    if AllPoison:
        return poison
    return toVal(Bytes)

It still preserves poison in most cases, so that SROA does not need to insert freezes. I just have a glance at #52930 and it seems a feasible solution.

I am fine with warning frontend developers about the potential unexpected behavior of storing poison values. But making it an immediate UB does not improve the situation.

nikic · 2025-05-27T08:57:34Z

Interesting idea.

It still preserves poison in most cases, so that SROA does not need to insert freezes.

This model avoids freeze for plain store to load forwarding (as the store can only be fully poison), but isn't freeze in SROA still necessary for all the overlapping cases (which lower via integer widening)?

RalfJung · 2025-05-27T10:06:45Z

That model has the problem that when there is a load followed by a store, we cannot remove the store as that would remove the implicit partial freeze.

frederick-vs-ja · 2025-05-30T13:04:02Z

This sounds like documenting a bug. I'm not sure whether we've recorded some known bugs in the documentations.

LangRef: storing poison in memory is UB

7a5faab

llvmbot added the llvm:ir label May 24, 2025

nikic requested review from fhahn, nunoplopes, dtcxzyw and efriedma-quic May 24, 2025 09:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LangRef: storing poison in memory is UB #141339

LangRef: storing poison in memory is UB #141339

RalfJung commented May 24, 2025

Uh oh!

llvmbot commented May 24, 2025

Uh oh!

nunoplopes commented May 24, 2025

Uh oh!

RalfJung commented May 24, 2025

Uh oh!

nunoplopes commented May 26, 2025

Uh oh!

RalfJung commented May 26, 2025 •

edited

Loading

Uh oh!

nikic commented May 26, 2025

Uh oh!

dtcxzyw commented May 27, 2025

Uh oh!

nikic commented May 27, 2025

Uh oh!

RalfJung commented May 27, 2025 via email

Uh oh!

frederick-vs-ja commented May 30, 2025

Uh oh!

Uh oh!

LangRef: storing poison in memory is UB #141339

Are you sure you want to change the base?

LangRef: storing poison in memory is UB #141339

Conversation

RalfJung commented May 24, 2025

Uh oh!

llvmbot commented May 24, 2025

Uh oh!

nunoplopes commented May 24, 2025

Uh oh!

RalfJung commented May 24, 2025

Uh oh!

nunoplopes commented May 26, 2025

Uh oh!

RalfJung commented May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nikic commented May 26, 2025

Uh oh!

dtcxzyw commented May 27, 2025

Uh oh!

nikic commented May 27, 2025

Uh oh!

RalfJung commented May 27, 2025 via email

Uh oh!

frederick-vs-ja commented May 30, 2025

Uh oh!

Uh oh!

RalfJung commented May 26, 2025 •

edited

Loading