Skip to content

Notes about MIR generation for match expressions #943

Open
@mark-i-m

Description

@mark-i-m

I wrote these up while working on rust-lang/rust#72680

I'm not an expert on this code (this is the fist time I'm reading it), so take this with a grain of salt.

Background

First, my assumption in the last post is incorrect. or-patterns aren't desugared into different patterns. IIUC, this is to avoid exponential blowup both in the size of the MIR and of the generated code, as well as the computational time to check compile. Here is a quick overview of the code as it stands today (to the best of my understanding).

  • All of the relevant code is in rustc_mir_build::build::matches. The comments throughout this module are pretty helpful, but kind of high-level.

  • We have a match expression, which consists of a place to match on (the "scrutinee") and collection of Arms. Each arm has a pattern and possible a guard expression, and each pattern may be composed of arbitrarily nested subpatterns. Given this, we want to generate the MIR that checks all of the relevant patterns to find the first matching arm.

  • The main entry point is match_expr https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L87 The comments on this function are helpful. They talk about the overall algorithm.

  • The heart of the algorithm is the match_candidates method, which also has great comments. https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L890 This is where the MIR is actually built for the matching itself.

  • match_candidates takes a list of Candidates and a few references to BasicBlocks. The BasicBlocks are where we branch to in different cases, and we will be generating more blocks as we match on the scrutinee. We will recursively call match_candidates as we evaluate the candidates.

  • The argument (of match_candidates) candidates: &mut [&mut Candidate] is roughly equivalent to an arm (or part of an arm). Specifically: https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L687-L712 A particular Candidate is satisfied if all of its match_pairs are satisfied. Each MatchPair in match_pairs is a single pattern match against a single place. We would extend the match_pairs of the Candidate recursively to get nested matches. So for example if we had an arm like Foo { x: Some(0), y: false }, we would first have a MatchPair matching the scrutinee place with Foo; then, we would add a couple more to match the places scrutinee.x and scrutinee.y with Some and false, respectively; then, we would add one more to match place scrutinee.x.0 to 0. So by the end of this process, all of these MatchPairs need to match for the Candidate to match.

  • Notably, throughout the algorithm, there are recursive calls to match_candidates with subsets of candidates. This simplifies the logic significantly, but it means that candidates may be non-exhaustive (even though Rust matches are exhaustive).

  • match_candidates does a bunch of stuff, but most relevant for this issue is that it calls test_candidates_with_or. In the terminology of the code, a "test" is a check of whether a place matches part of a pattern (e.g. scrutinee.x matches Some). (Note: we are not actually executing this test right now; it's still compile time! Rather we are generating the code that does the test into the user's program. For some reason, this mindset was hard for me to stay in). test_candidates_with_or generates the MIR for the tests for each candidate in candidates by recursively calling match_candidates. Specifically, we pop the first candidate from candidates and generate it's MIR; then, we recursively match_candidates on the tail.

  • Or-patterns at the top-level of a pattern are handled specially in test_candidates_with_or. If the first pattern in candidates is an or-pattern, we call test_or_pattern https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L1202 This method splits the candidate into two or more Candidates temporarily and calls match_candidates on this small list of temporary candidates. Afterwards, if the candidates are all simple, we can generate some glue MIR that OR's them all together; otherwise, we just leave them as a collection of different candidates and we just have a larger MIR.

  • We continue recursively calling the above chain of methods until all or-patterns have been removed from the top level patterns. So we are left with only non-or-patterns at the top level.

  • When we have only non-or-patterns, test_candidates_with_or calls test_candidates https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/mod.rs#L1377 test_candidates has a lot of vary useful comments explaining what's going on. Basically, we only try to process one candidate, but in processing it, we may be able to share work for some of the other candidates. Processing a candidate requires figuring out what kind of test to apply for each MatchPair in a candidate. For example, if we are matching against a variant, we do a Switch test; if we are matching against an integer, we do a SwitchInt test, etc...

  • During test_candidates, we may add more MatchPairs to a candidate to test for nested subpatterns (more in the next few bullets). At the end of test_candidates, we recursively call match_candidates on the remaining candidates, which ensures that we process all subpatterns.

  • test_candidates calls test::test to generate an actual test for a MatchPair. https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/test.rs#L27 This function assumes that it will not have to generate a test for an or-pattern (they should all have been dealt with by test_or_pattern), and this is where the ICE is (more later).

  • test_candidates then tries to share the test with all candidates it can. For example, if more than one candidate need a test for scrutinee.x matches Some, then both candidates can branch from the basic block where that test is done, effectively combining two (or more) basic blocks in an otherwise exponential set of basic blocks. To determine which candidates can share the test, test_candidates calls test::sort_candidates https://github.com/rust-lang/rust/blob/0256d065d4901b63def6c04013da4f781d0752bb/compiler/rustc_mir_build/src/build/matches/test.rs#L485 (which also has good comments).

  • sort_candidates will find all candidates that can share the test and simplify them by adding MatchPairs to the relevant Candidates for the immediate child subpatterns of the top-level pattern (if there are subpatterns). These subpatterns are later evaluated whenever match_candidates is recursively called later (and they may expand to more subpatterns, etc) until we have processed all the MatchPairs in all the candidates and generated tests for them.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-MIRArea: mid-level intermediate representation (MIR)A-matchArea: `match` expressionsC-enhancementCategory: enhancementE-hardDifficulty: might require advanced knowledgeT-compilerRelevant to compiler team

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions