Description
Right now, (cross-crate) "metadata" is encoded as an ad-hoc heterogeneous tree, described in `rustc_metadata::schema`, with `Lazy<T>` acting as indirection (as in "pointer to `T`", inside the "metadata" blob) and letting the user choose whether to decode the `T` value.
There is also a random-access array (called "table" in #59953), which is currently only used for `Entry`.
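To make the two building blocks concrete, here is a minimal sketch of a `Lazy<T>`-style pointer and a random-access table over a byte blob. It is an illustration under assumptions, not the actual `rustc_metadata` code: the `Decode` trait, the `at` constructor, the 4-byte slot width, and the "offset 0 means absent" convention are all hypothetical.

```rust
use std::marker::PhantomData;

/// Hypothetical decoding trait; rustc uses its own serialization machinery.
trait Decode: Sized {
    fn decode(bytes: &[u8]) -> Self;
}

impl Decode for u32 {
    fn decode(bytes: &[u8]) -> Self {
        u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
    }
}

/// A "pointer to T" inside the blob: a byte offset plus a type marker.
struct Lazy<T> {
    position: usize,
    _marker: PhantomData<T>,
}

impl<T: Decode> Lazy<T> {
    fn at(position: usize) -> Self {
        Lazy { position, _marker: PhantomData }
    }

    /// Decoding happens only when the caller asks for it.
    fn decode(&self, blob: &[u8]) -> T {
        T::decode(&blob[self.position..])
    }
}

/// A random-access "table": `len` fixed-width (4-byte) slots starting at
/// `start`, each holding the blob offset of a `T`, or 0 for "absent"
/// (an assumed convention for this sketch).
struct Table<T> {
    start: usize,
    len: usize,
    _marker: PhantomData<T>,
}

impl<T: Decode> Table<T> {
    /// Looks up slot `index` without decoding the value it points to.
    fn get(&self, blob: &[u8], index: usize) -> Option<Lazy<T>> {
        if index >= self.len {
            return None;
        }
        let slot = Lazy::<u32>::at(self.start + index * 4).decode(blob);
        if slot == 0 { None } else { Some(Lazy::at(slot as usize)) }
    }
}
```

The point of the sketch is that `Table::get` only reads one fixed-width slot, and the pointed-to value is decoded only if the caller then calls `decode` on the returned `Lazy<T>`.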
This cross-crate system predates the on-demand/incremental query system, and we have accumulated a lot of data in the `schema` which is similar (but not always identical) to certain queries, and additional code to present that information through queries.
The disadvantages I see with the current approach are:
- a lot of `schema`/`encoder`/`decoder` boilerplate for everything
  - most of which isn't documented well, perpetuating the ad-hoc-ness
- somewhat inconsistent organization
  - e.g. `predicates` in `Entry` vs `super_predicates` in `TraitData`
- decoding more than is needed, even if mostly `Lazy` pointers
  - `Entry`'s 15 fields are all decoded to read only 1, most of the time
  - arguably a significant performance issue (although we save some space)
  - #59953 ("rustc_metadata: replace Entry table with one table for each of its fields (AoS -> SoA)") is my attempt at solving this particular aspect
In #59953, the table of `Entry`s is replaced by a table for everything that used to be in an `Entry` field.
For example, the `predicates_of` query would then perform `predicates[i].decode()` instead of `entries[i].decode().predicates.decode()` (irrelevant details elided).
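The difference between the two layouts can be sketched as follows. This is a simplified model, not the code from #59953: the three-field `Entry`, the plain `u32` field values, and the names `EntryTable`, `FieldTables`, and `column` are assumptions made for illustration.

```rust
/// Hypothetical, heavily simplified entry: three fields instead of 15,
/// each a plain u32 standing in for richer (possibly lazy) data.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    visibility: u32,
    span: u32,
    predicates: u32,
}

/// Encoded size of one `Entry`: three little-endian u32 fields.
const ENTRY_SIZE: usize = 12;

fn read_u32(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// AoS layout: one table whose rows are whole encoded entries.
struct EntryTable {
    bytes: Vec<u8>,
}

impl EntryTable {
    fn encode(entries: &[Entry]) -> Self {
        let mut bytes = Vec::with_capacity(entries.len() * ENTRY_SIZE);
        for e in entries {
            for field in [e.visibility, e.span, e.predicates] {
                bytes.extend_from_slice(&field.to_le_bytes());
            }
        }
        EntryTable { bytes }
    }

    /// Decodes every field of entry `i`, even if the caller wants only one.
    fn decode(&self, i: usize) -> Entry {
        let base = i * ENTRY_SIZE;
        Entry {
            visibility: read_u32(&self.bytes, base),
            span: read_u32(&self.bytes, base + 4),
            predicates: read_u32(&self.bytes, base + 8),
        }
    }
}

/// Encodes one field of every entry as a contiguous column of bytes.
fn column(entries: &[Entry], get: impl Fn(&Entry) -> u32) -> Vec<u8> {
    entries.iter().flat_map(|e| get(e).to_le_bytes()).collect()
}

/// SoA layout: one table per former `Entry` field.
struct FieldTables {
    visibility: Vec<u8>,
    span: Vec<u8>,
    predicates: Vec<u8>,
}

impl FieldTables {
    fn encode(entries: &[Entry]) -> Self {
        FieldTables {
            visibility: column(entries, |e| e.visibility),
            span: column(entries, |e| e.span),
            predicates: column(entries, |e| e.predicates),
        }
    }

    /// Reads only the requested field; the other columns are never touched.
    fn predicates(&self, i: usize) -> u32 {
        read_u32(&self.predicates, i * 4)
    }
}
```

With the AoS layout, a `predicates_of`-style lookup goes through `EntryTable::decode(i).predicates` and pays for all fields. With the SoA layout, `FieldTables::predicates(i)` reads four bytes from a single column.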
This is effectively a trade-off:
- using more space because most of those tables aren't 100% filled
- taking less time because there are fewer unused details being decoded
- the query system dictates the granularity here, so matching it helps
- losing (some) cache locality might limit this win
However, we can go further: #59953 doesn't touch `EntryKind`, which is still a sprawling `enum` with even two levels of `Lazy` indirection in places.
Ultimately, we could have "cross-crate metadata" be one table per query in most cases. This would accentuate the trade-off from #59953 further, but it would also allow simplifying `rustc_metadata` and unifying it further with incremental save&restore.
One of the queries that would benefit most from this is `def_kind`, which could be stored as a fully-populated table of bytes, much more compact and cheaper to decode than `EntryKind` today.
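As a rough sketch of what such a byte table could look like (the variant list, discriminant values, and the `DefKindTable` type are hypothetical and far smaller than the real thing):

```rust
/// Hypothetical subset of definition kinds; the real set has more variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum DefKind {
    Mod = 0,
    Struct = 1,
    Enum = 2,
    Fn = 3,
    Trait = 4,
}

impl DefKind {
    /// Maps a stored byte back to a kind; unknown bytes are rejected.
    fn from_byte(byte: u8) -> Option<DefKind> {
        Some(match byte {
            0 => DefKind::Mod,
            1 => DefKind::Struct,
            2 => DefKind::Enum,
            3 => DefKind::Fn,
            4 => DefKind::Trait,
            _ => return None,
        })
    }
}

/// Fully-populated table: exactly one byte per definition, indexed by the
/// definition's index, with no pointers or nested decoding involved.
struct DefKindTable {
    bytes: Vec<u8>,
}

impl DefKindTable {
    fn encode(kinds: &[DefKind]) -> Self {
        DefKindTable { bytes: kinds.iter().map(|&kind| kind as u8).collect() }
    }

    /// A `def_kind`-style lookup is a single bounds-checked byte read.
    fn def_kind(&self, index: usize) -> Option<DefKind> {
        self.bytes.get(index).copied().and_then(DefKind::from_byte)
    }
}
```

Because every definition has a kind, no slot is wasted on "absent" entries, which is what makes this table denser than the partially filled per-field tables.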