-
Notifications
You must be signed in to change notification settings - Fork 13.6k
LangRef: storing poison in memory is UB #141339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-ir Author: Ralf Jung (RalfJung) ChangesBased on discussion with @nikic. Cc @nunoplopes. I'm not happy that storing poison would be UB -- there are some potential Rust features we cannot implement due to this -- but if that's how things currently are, it's better to document them. This also means that reg2mem is technically unsound since not all SSA values can actually be stored in memory; that pass should add Full diff: https://github.com/llvm/llvm-project/pull/141339.diff 1 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index ad0755e1531df..242471cf7f5a4 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -11433,6 +11433,12 @@ If ``<value>`` is of aggregate type, padding is filled with
:ref:`undef <undefvalues>`.
If ``<pointer>`` is not a well-defined value, the behavior is undefined.
+If ``<value>`` is poison, then behavior currently is effectively undefined due
+to the fact that many LLVM passes assume they can do load widening, and having
+poison in memory would "infect" the entire widened load. This can hopefully be
+alleviated in the future, but until then, frontends should refrain from
+generating code that stores poison values in memory.
+
Example:
""""""""
|
I'm totally against this. Widening in IR is a problem, yes, but this is not the way to go. Load widening can be solved by using vector loads or struct loads (if the new size is not a multiple of the old one). There are implications about making stores UB, such as making movement of stores and function calls even harder. |
I don't like it either -- I just think if that's the reality right now, we should at least write it down. Currently there is no way for frontends to know that they are not supposed to put I guess another way to frame this would be to file an issue saying that putting poison in memory can lead to surprising issues, and referencing this in the LangRef as a known issue without making it look like intended behavior? However, from what @nikic told me, this is apparently quite hard to fix. |
I don't think it's that hard to fix. Widening happens just in a few places. (like lowering on memory intrinsics such as memcpy/memcmp) and some codegen prepare. The issue is that stores of poison can happen anywhere. And as I mentioned, making it UB probably makes a bunch of optimizations wrong. We would be trading of a few wrong optimizations with another few wrong optimizations 😅 |
Indeed, reg2mem is wrong under this model, as I mentioned in the description already. Which other optimizations are you thinking of? |
Neither of those can be used for load widening in any practical sense. Struct loads are non-canonical and will very likely be removed entirely in the future. Vector operations come with a strong hint to actually lower them to vector instructions, which is undesirable for load widening use cases.
I don't think that this materially restricts the kind of store speculation we can perform currently. The speculated stores are spurious, i.e. they will store back a loaded loaded value, which can't be poison. (The only problematic interaction here is with metadata.)
The most important case is SROA, and I don't think it's easy to fix. The byte type is part of what is needed, but it's not sufficient in itself (for good optimization quality). Anyway, I see this as just documenting the status quo. It seems like we will be moving away from this eventually, so it probably doesn't make sense to try to make us strictly conforming to this model, but it's an important constraint to be aware of as things are right now. I think Ralf's phrasing here makes it clear that this is not intended as the permanent state of things. |
I used to hit a rare miscompilation with rustlantis + llubi, which contains a store with poison value in the optimized IR. It is a miscompilation under both the current semantics (propagating poison in loaded values) and the proposed semantics (storing poison is an imm UB). Unfortunately, I did not keep the reproducer and made a workaround in llubi by treating it as a no-op. Given the fact that a load in real hardware always preserves well-defined bytes in memory, I think it is possible to make load widening legal.
It still preserves poison in most cases, so that SROA does not need to insert freezes. I just have a glance at #52930 and it seems a feasible solution. I am fine with warning frontend developers about the potential unexpected behavior of storing poison values. But making it an immediate UB does not improve the situation. |
Interesting idea.
This model avoids freeze for plain store to load forwarding (as the store can only be fully poison), but isn't freeze in SROA still necessary for all the overlapping cases (which lower via integer widening)? |
That model has the problem that when there is a load followed by a store, we cannot remove the store as that would remove the implicit partial freeze.
|
Based on discussion with @nikic. Cc @nunoplopes.
I'm not happy that storing poison would be UB -- there are some potential Rust features we cannot implement due to this -- but if that's how things currently are, it's better to document them.
This also means that reg2mem is technically unsound since not all SSA values can actually be stored in memory; that pass should add
freeze
to avoid storing poison. Should I file an issue for that? (Unfortunately I won't be able to try and fix this myself.)