Skip to content

Should raw pointer deref/projections have to be in-bounds? #319

Closed
@RalfJung

Description

@RalfJung

Currently, when you write addr_of!((*ptr).field), ptr has to be a sufficiently aligned and dereferencable pointer (meaning it has to point to actually allocated memory), with the size/alignment determined by the pointee type.

Formally speaking, this stems from the fact that the * operator, which turns a value (of pointer or reference type) into a place, requires that the pointer be aligned and dereferencable. This is true regardless of the context in which the * operator is used, i.e., regardless of what happens to this place after it has been constructed.

Pragmatically speaking, this reflects the fact that addr_of!((*ptr).field) is lowered to a getelementptr inbounds in LLVM. That said, the rules in the reference go further than what is required for codegen -- they require ptr to be dereferencable for the full size of the type (not just the field), and that even if we do addr_of!(*ptr).

This situation frequently causes confusion, so maybe we want to change it. At least it'd be good to have a central place for discussion and reference, so I created this issue.
Cc @eddyb @joshtriplett @thomcc @Lokathor @cuviper rust-lang/reference#1000
Instance(s) of confusion in practice: rust-lang/rust#93459 (comment)

So, what could we do? Let me collect some proposals that have come up.

Remove restriction from * entirely

The most radical option is to entirely remove the clause which imposes a rule on *, and instead just require that when a place is actually read from (via move/copy) or stored to (via =), then it has to be sufficiently aligned and dereferencable as determined by the type of the place. (We currently do not need such a rule, since as an invariant all places that are constructed are aligned and dereferencable.)

This means we have to remove inbounds from getelementptr in place projection lowering when the "source" of this place projection is a raw pointer. We can still add inbounds for references or other places (locals/statics) since we know those to be dereferencable for other reasons (except for this twist). (We can not add it in cases like &(*ptr).field since ptr might be e.g. 4 bytes before the beginning of the allocation and .field offsets the ptr by 4.)

This has the potential of making LLVM alias analysis and other optimizations less effective. In particular, getelementptr might not just go out-of-bounds, it would even be allowed to overflow, so there are some tricks LLVM might not be able to pull any more.

It is certainly my favorite option, because it maximally reduces the amount of UB -- the remaining rule (about actual loads/stores) is pretty much unavoidable and typically expected by programmers.

Still prevent overflows

We could also say that *ptr on a raw pointer is UB if adding size_of_val_raw(ptr) to ptr would overflow, but allocation bounds do not matter. This would let us emit a hypothetical getelementptr nowrap, if LLVM were to add support for such a flag in the future. I doubt this is any less surprising than our current rules, TBH, but it is less likely to be violated by the typical kind of code people write. It is also more complicated, since we still have to explicitly add the rule which says that loading from and storing to a place requires it to be dereferencable and aligned.

Check inbounds on place projection

If we want to keep codegen unchanged, we could still relax our rules: we could allow addr_of!(*ptr), and we could make it so that addr_of!((*ptr).field) only needs to be inbounds for the actual offset that is performed.

For example, on top of the "remove restrictions" proposal, we could have an extra rule for each place projection which says that this projection needs to be in-bounds.

This has the advantage of keeping our current codegen and quite obviously aligning with the intuition many people seem to have for place projections. It also has some disadvantages though:

  • addr_of!((*ptr).field) continues to be UB if ptr is entirely dangling, so e.g. people wanting to write offset_of-style code still have to navigate around a footgun.
  • Our rules become more complicated. (Basically, we would replace one bullet in the "Behavior considered undefined" list by two bullets.)

Magic?

There's also the option (which I'm aware you aren't a fan of) of allowing addr_of!((*p).field) to violate the inbounds rule but still requiring &(*ptr).field to uphold it.

#319 (comment)

New syntax

What you really just want to be able to really ergonomically talk about field offsets.

#319 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions