Skip to content

LangRef: storing poison in memory is UB #141339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

RalfJung
Copy link
Contributor

Based on discussion with @nikic. Cc @nunoplopes.

I'm not happy that storing poison would be UB -- there are some potential Rust features we cannot implement due to this -- but if that's how things currently are, it's better to document them.

This also means that reg2mem is technically unsound since not all SSA values can actually be stored in memory; that pass should add freeze to avoid storing poison. Should I file an issue for that? (Unfortunately I won't be able to try and fix this myself.)

@llvmbot
Copy link
Member

llvmbot commented May 24, 2025

@llvm/pr-subscribers-llvm-ir

Author: Ralf Jung (RalfJung)

Changes

Based on discussion with @nikic. Cc @nunoplopes.

I'm not happy that storing poison would be UB -- there are some potential Rust features we cannot implement due to this -- but if that's how things currently are, it's better to document them.

This also means that reg2mem is technically unsound since not all SSA values can actually be stored in memory; that pass should add freeze to avoid storing poison. Should I file an issue for that? (Unfortunately I won't be able to try and fix this myself.)


Full diff: https://github.com/llvm/llvm-project/pull/141339.diff

1 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+6)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ad0755e1531df..242471cf7f5a4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11433,6 +11433,12 @@ If ``<value>`` is of aggregate type, padding is filled with
 :ref:`undef <undefvalues>`.
 If ``<pointer>`` is not a well-defined value, the behavior is undefined.
 
+If ``<value>`` is poison, then behavior currently is effectively undefined due
+to the fact that many LLVM passes assume they can do load widening, and having
+poison in memory would "infect" the entire widened load. This can hopefully be
+alleviated in the future, but until then, frontends should refrain from
+generating code that stores poison values in memory.
+
 Example:
 """"""""
 

@nunoplopes
Copy link
Member

I'm totally against this. Widening in IR is a problem, yes, but this is not the way to go. Load widening can be solved by using vector loads or struct loads (if the new size is not a multiple of the old one).

There are implications about making stores UB, such as making movement of stores and function calls even harder.

@RalfJung
Copy link
Contributor Author

I don't like it either -- I just think if that's the reality right now, we should at least write it down. Currently there is no way for frontends to know that they are not supposed to put poison into memory; I was extremely surprised when I first heard about this.

I guess another way to frame this would be to file an issue saying that putting poison in memory can lead to surprising issues, and referencing this in the LangRef as a known issue without making it look like intended behavior? However, from what @nikic told me, this is apparently quite hard to fix.

@nunoplopes
Copy link
Member

I don't think it's that hard to fix. Widening happens just in a few places. (like lowering on memory intrinsics such as memcpy/memcmp) and some codegen prepare.

The issue is that stores of poison can happen anywhere. And as I mentioned, making it UB probably makes a bunch of optimizations wrong. We would be trading of a few wrong optimizations with another few wrong optimizations 😅

@RalfJung
Copy link
Contributor Author

RalfJung commented May 26, 2025

Indeed, reg2mem is wrong under this model, as I mentioned in the description already. Which other optimizations are you thinking of?

@nikic
Copy link
Contributor

nikic commented May 26, 2025

Load widening can be solved by using vector loads or struct loads (if the new size is not a multiple of the old one).

Neither of those can be used for load widening in any practical sense. Struct loads are non-canonical and will very likely be removed entirely in the future. Vector operations come with a strong hint to actually lower them to vector instructions, which is undesirable for load widening use cases.

There are implications about making stores UB, such as making movement of stores and function calls even harder.

I don't think that this materially restricts the kind of store speculation we can perform currently. The speculated stores are spurious, i.e. they will store back a loaded loaded value, which can't be poison. (The only problematic interaction here is with metadata.)

I don't think it's that hard to fix. Widening happens just in a few places. (like lowering on memory intrinsics such as memcpy/memcmp) and some codegen prepare.

The most important case is SROA, and I don't think it's easy to fix. The byte type is part of what is needed, but it's not sufficient in itself (for good optimization quality).


Anyway, I see this as just documenting the status quo. It seems like we will be moving away from this eventually, so it probably doesn't make sense to try to make us strictly conforming to this model, but it's an important constraint to be aware of as things are right now.

I think Ralf's phrasing here makes it clear that this is not intended as the permanent state of things.

@dtcxzyw
Copy link
Member

dtcxzyw commented May 27, 2025

I used to hit a rare miscompilation with rustlantis + llubi, which contains a store with poison value in the optimized IR. It is a miscompilation under both the current semantics (propagating poison in loaded values) and the proposed semantics (storing poison is an imm UB). Unfortunately, I did not keep the reproducer and made a workaround in llubi by treating it as a no-op.

Given the fact that a load in real hardware always preserves well-defined bytes in memory, I think it is possible to make load widening legal.
Currently, a poison byte in memory may infect the whole loaded non-vector single value. Instead of loadV2(Ptr) = freeze(loadV1(Ptr)) discussed in
https://discourse.llvm.org/t/rfc-add-nopoison-attribute-metadata/79833/5, we can make the load instruction act as the following:

# Assume that Type is a non-vector single value type.
def loadV3(Ptr, Type):
    Size = Type.sizeInBytes()
    AllPoison = True
    Bytes = []
    for I in range(0, Size):
        Byte = loadByte(Ptr + I)
        if Byte != poison:
            AllPoison = False
            Bytes.append(Byte)
        else:
            # Freeze poison bytes instead of freezing the whole value.
            Bytes.append(freeze(Byte))
    if AllPoison:
        return poison
    return toVal(Bytes)

It still preserves poison in most cases, so that SROA does not need to insert freezes. I just have a glance at #52930 and it seems a feasible solution.

I am fine with warning frontend developers about the potential unexpected behavior of storing poison values. But making it an immediate UB does not improve the situation.

@nikic
Copy link
Contributor

nikic commented May 27, 2025

Interesting idea.

It still preserves poison in most cases, so that SROA does not need to insert freezes.

This model avoids freeze for plain store to load forwarding (as the store can only be fully poison), but isn't freeze in SROA still necessary for all the overlapping cases (which lower via integer widening)?

@RalfJung
Copy link
Contributor Author

RalfJung commented May 27, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants