Further possibilities for mir::Constant cleanup

Our `mir::Constant`/`ty::Const` story has come a long way. I like that we have two distinct types that each have an `eval` function that returns an appropriate "evaluated" form (`mir::interpret::ConstValue`/`ty::Valtree`). Generally I think this is fairly clean now. :)

However, there are still some subtle invariants and sharp corners:
- Round-trips from an interpreter result (something that originally came out of `eval_to_allocation_raw`) to a `ConstValue` and back to an interpreter `OpTy` should ideally not alter the value (failing to do that leads to surprises like https://github.com/rust-lang/rust/issues/105536).
- We have to carefully think about *two* kinds of efficient representations (in `ConstValue` and in `Valtree`) and how to convert between them.
- I started looking at pattern matching and  it seems like `const_to_pat` is only ever called with a `Valtree` or `ConstValue`, and furthermore it relies on the caller "normalizing" to a valtree if possible. The logic of "try valtree first, fall back to general ConstValue" exists multiple times ([here](https://github.com/rust-lang/rust/blob/dac91a82e146d89fefe94dcbb53cd6c2662e354f/compiler/rustc_mir_build/src/thir/pattern/mod.rs#L628-L653) and [here](https://github.com/rust-lang/rust/blob/dac91a82e146d89fefe94dcbb53cd6c2662e354f/compiler/rustc_mir_build/src/thir/pattern/mod.rs#L523-L535)). `str` patterns must be turned into valtrees which is a very inefficient representation, despite them having a nice efficient representation as `ConstValue`.

So... I am wondering if we can have a world where there is only one efficient representation, and it's `Valtree`. I want to kill `ConstValue` entirely. The things that need to be represented efficiently are:
- Scalars: for primitive types, valtrees are already efficient. `ConstValue` is additionally efficient for newtype wrappers around primitive types. I don't know if these matter, but if they do we can always declare that for *any* type with Scalar ABI, the valtree is a leaf -- we can optimize away all the branches. Or we declare that for any `repr(transparent)` type, valtrees skip the branch and just represent the one non-1-ZST field (and use an empty branch for 1-ZST types). We have options, and we can hide this behind general functions that are given a type and know how to construct/destruct a valtree of that type.
- ZST: I don't know how relevant that is. But it's easy to say that the valtree for a ZST is always a branch with no children (i.e., optimize away all the nested eventually-empty branches in types like `[(), 1024]`).
- `&str` and `&[u8]`/`&[u8; N]`: currently not very efficient as valtrees. For patterns I think we already turn all strings into valtrees, but we don't currently do that for *all* literals, and that would be silly to do. So we'd definitely want a more efficient representation. And we have a lot of constraints here -- I think we want this to be both "not any more expensive to construct than the current `ConstValue`" (since otherwise we will regress MIR building), and we want it to be reasonably efficient to turn into an interpreter `OpTy` (so that uses of the constants in const-eval are efficient). Codegen also needs to be able to efficiently consume them.

### Efficient representation of str and byte literals

So for that last point about slices, currently "efficient consumption in const-eval and codegen" is achieved by sharing the codepath with handling arbitrary results of const-evaluation (both are represented as a `ConstAllocation`). const-eval needs to generate a fresh `AllocId` but at least it doesn't have to copy the data. MIR building right now isn't actually that efficient since a new `ConstAllocation` needs to be created.
 
It's hard to be efficient on all sides. I see two options here:

 - Prioritize MIR building: have a valtree variant that can hold a `[u8]`, suitably interned. (Maybe even push this all the way to `LitKind::ByteStr`/`CStr`? And maybe even `LitKind::Str`, which currently stores a `Symbol` but that's actually not that great for us since we can't get an `&'tcx [u8]` out of that.) Codegen backends should learn how to directly turn that into a suitable constant in their target representation. The const-eval interpreter needs to either make a full copy (boooh) or we alter `interpret::Allocation` to store the bytes in something like a `Cow<'tcx, [u8]>` -- that is still a bit less efficient than today's literal-to-`OpTy` path but at least it doesn't copy the string contents. (It needs to create a new `Allocation` though which we can currently avoid.)
 - Prioritize consumption: `ConstAllocation` is a type that the interpreter can handle very efficiently, and also codegen backends know very well how to turn this into their appropriate backend types. So we *could* use `ConstAllocation` as "an efficiently stored `str`/`[u8]`/`[u8; N]`". It's kind of dicey to put a `ConstAllocation` into a valtree though, since the entire point of valtrees is to *not* be able to represent arbitrary interpreter data. We'd have tight constraints: only for valtrees of type  `str`/`[u8]`/`[u8; N]`; no provenance; everything initialized (in `InitMaskBlocks::Lazy` mode); alignment 1; not mutable. I think this is coherent (equal data implies equal representation), and it's a lot less churn for the rest of the compiler than the previous alternative, but I can see how it could make someone feel uneasy.

### Valtrees as the only efficient representation

On the other hand, I think we could get a nice payoff from making valtrees the only efficient representation: `ConstValue` disappears! `mir::ConstantKind::Value` becomes just a `ConstAlloc` (we might have to add an `offset` if we still want to support projection for `try_destructure_mir_constant_for_diagnostics`). `EvalToConstValueResult` would have an `Either<ValTree, AllocId>` value, and it would pick `Left` exactly and only for scalar types, to make these easily accessible in the rest of the compiler without accessing `Allocation`s. (It might as well be `Either<Scalar, AllocId>`.) Turning that into a `mir::ConstantKind` will use `mir::ConstantKind::Ty` when encountering a `Left` -- basically, everything that today is `mir::ConstantKind::Value(_)` *except* `ConstValue::Indirect` becomes `mir::ConstantKind::Ty(ty::ConstKind::Value(_))`. We remap the "efficiently represented" `mir::ConstantKind::Value` to be valtrees instead.

We'd avoid the ambiguity of representing something as a `mir::ConstantKind::Value(ConstValue::Scalar(s))` vs `mir::ConstantKind::Ty(ty::ConstKind::Value(ValTree::Leaf(s)))`. We'd reduce the number of possible representations of a compile-time known value down to two: "efficient and abstract" vs "low-level". Generally values of any type can appear in either representation (if they can be valtree'd), except pretty much everyone will assume that scalars are represented efficiently. Pattern matching could much more directly use `Either<ValTree, AllocId>`, and its use of valtrees for `str` wouldn't be a performance hazard any more.

Maybe we'll want to give `Either<ValTree, AllocId>` a name, and maybe that name is `mir::ConstantValue` -- but that would still be quite different from the current `ConstValue` type. If the type exists it can have methods like "normalize to valtree if possible" (that's what pattern matching would want), "normalize to valtree if cheap" (i.e., only for scalars; that's what most consumers would want), "normalize to AllocId unless valtree is trivial" (that's what the interpreter / codegen would want -- a "trivial" valtree is a leaf, empty branch, or the efficient str/byte representation; crucially for "valtree if cheap"-normalized constants this is a NOP).

@rust-lang/wg-const-eval please let me know if this makes any sense. :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Further possibilities for mir::Constant cleanup #115877

Efficient representation of str and byte literals

Valtrees as the only efficient representation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Further possibilities for mir::Constant cleanup #115877

Description

Efficient representation of str and byte literals

Valtrees as the only efficient representation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions