@@ -17,108 +17,122 @@ So first, let's look at what the compiler does to your code. For now, we will
avoid mentioning how the compiler implements these steps except as needed;
we'll talk about that later.

- ### Invokation
-
- - The compile process begins when a user writes a Rust source program in text
- and invokes the `rustc` compiler on it. The work that the compiler needs to
- perform is defined by command-line options. For example, it is possible to
- enable nightly features (`-Z` flags), perform `check`-only builds, or emit
- LLVM-IR rather than executable machine code. The `rustc` executable call may
- be indirect through the use of `cargo`.
- - Command line argument parsing occurs in the [`rustc_driver`]. This crate
- defines the compile configuration that is requested by the user and passes it
- to the rest of the compilation process as a [`rustc_interface::Config`].
+ ### Invocation
+
+ Compilation begins when a user writes a Rust source program in text
+ and invokes the `rustc` compiler on it. The work that the compiler needs to
+ perform is defined by command-line options. For example, it is possible to
+ enable nightly features (`-Z` flags), perform `check`-only builds, or emit
+ LLVM-IR rather than executable machine code. `rustc` may also be invoked
+ indirectly, for example through `cargo`.
+
+ Command-line argument parsing occurs in [`rustc_driver`]. This crate
+ defines the compile configuration that is requested by the user and passes it
+ to the rest of the compilation process as a [`rustc_interface::Config`].
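As a rough illustration of how command-line options turn into a configuration value, here is a minimal sketch. `ToyConfig`, `Emit`, and `parse_args` are hypothetical names invented for this example; the sketch only understands `--emit=llvm-ir`, `-Z <flag>`, and an input path, and it is not how `rustc_driver` actually parses arguments or what `rustc_interface::Config` contains.

```rust
/// Output kinds this toy driver understands (hypothetical).
#[derive(Debug, PartialEq)]
enum Emit {
    /// The default: produce an executable.
    Executable,
    /// `--emit=llvm-ir`: produce LLVM IR instead.
    LlvmIr,
}

/// A hypothetical, heavily simplified stand-in for a compiler configuration.
#[derive(Debug, PartialEq)]
struct ToyConfig {
    input: Option<String>,
    emit: Emit,
    unstable_flags: Vec<String>,
}

/// Turn command-line arguments into a `ToyConfig`, or report the first bad argument.
fn parse_args(args: &[&str]) -> Result<ToyConfig, String> {
    let mut config = ToyConfig {
        input: None,
        emit: Emit::Executable,
        unstable_flags: Vec::new(),
    };
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "--emit=llvm-ir" => config.emit = Emit::LlvmIr,
            "-Z" => {
                // `-Z` consumes the next argument as the name of an unstable option.
                let flag = iter.next().ok_or("`-Z` requires an argument")?;
                config.unstable_flags.push(flag.to_string());
            }
            other if other.starts_with('-') => return Err(format!("unknown flag `{other}`")),
            // Anything that is not a flag is treated as the input file.
            path => config.input = Some(path.to_string()),
        }
    }
    Ok(config)
}
```

Returning a `Result` instead of exiting keeps the toy parser easy to test; a real driver would turn the error into a user-facing diagnostic.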

### Lexing and parsing

- - The raw Rust source text is analyzed by a low-level lexer located in
- [`rustc_lexer`]. At this stage, the source text is turned into a stream of
- atomic source code units known as _tokens_. The lexer supports the
- Unicode character encoding.
- - The token stream passes through a higher-level lexer located in
- [`rustc_parse`] to prepare for the next stage of the compile process. The
- [`StringReader`] struct is used at this stage to perform a set of validations
- and turn strings into interned symbols (_interning_ is discussed later).
- [String interning] is a way of storing only one immutable
- copy of each distinct string value.
-
- - The lexer has a small interface and doesn't depend directly on the
- diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain
- data which are emitted in `rustc_parse::lexer::mod` as real diagnostics.
- - The lexer preserves full fidelity information for both IDEs and proc macros.
- - The parser [translates the token stream from the lexer into an Abstract Syntax
- Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
- analysis. The crate entry points for the parser are the
- [`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
- methods found in [`rustc_parse::parser::Parser`]. The external module parsing
- entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
- And the macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
- - Parsing is performed with a set of `Parser` utility methods including `fn bump`,
- `fn check`, `fn eat`, `fn expect`, `fn look_ahead`.
- - Parsing is organized by the semantic construct that is being parsed. Separate
- `parse_*` methods can be found in [`rustc_parse` `parser`][rustc_parse_parser_dir]
- directory. The source file name follows the construct name. For example, the
- following files are found in the parser:
- - `expr.rs`
- - `pat.rs`
- - `ty.rs`
- - `stmt.rs`
- - This naming scheme is used across many compiler stages. You will find
- either a file or directory with the same name across the parsing, lowering,
- type checking, THIR lowering, and MIR building sources.
- - Macro expansion, AST validation, name resolution, and early linting takes place
- during this stage of the compile process.
- - The parser uses the standard `DiagnosticBuilder` API for error handling, but we
- try to recover, parsing a superset of Rust's grammar, while also emitting an error.
- - `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.
+ The raw Rust source text is analyzed by a low-level *lexer* located in
+ [`rustc_lexer`]. At this stage, the source text is turned into a stream of
+ atomic source code units known as _tokens_. The lexer supports the
+ Unicode character encoding.
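To make "a stream of tokens" concrete, here is a toy lexer sketch. It is a simplification under stated assumptions, not `rustc_lexer`'s API: `TokenKind`, `Token`, `tokenize`, and `prefix_len` are hypothetical, `char::is_alphabetic` stands in for Rust's real identifier rules, and unrecognised characters come back as an `Unknown` token rather than being reported, in the spirit of the lexer returning diagnostics as plain data.

```rust
/// Token kinds produced by this toy lexer (hypothetical).
#[derive(Debug, PartialEq)]
enum TokenKind {
    Ident,
    Int,
    Whitespace,
    Punct(char),
    /// A character the lexer does not recognise; returned as data, not as an error.
    Unknown,
}

/// A token records only its kind and its length in bytes.
#[derive(Debug, PartialEq)]
struct Token {
    kind: TokenKind,
    len: usize,
}

/// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Split `src` into tokens; every byte of the input ends up in exactly one token.
fn tokenize(mut src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    while let Some(first) = src.chars().next() {
        let (kind, len) = if first.is_whitespace() {
            (TokenKind::Whitespace, prefix_len(src, char::is_whitespace))
        } else if first.is_alphabetic() || first == '_' {
            // Unicode-aware: non-ASCII letters are accepted in identifiers.
            (TokenKind::Ident, prefix_len(src, |c| c.is_alphanumeric() || c == '_'))
        } else if first.is_ascii_digit() {
            (TokenKind::Int, prefix_len(src, |c| c.is_ascii_digit()))
        } else if "+-*/=;(){}".contains(first) {
            (TokenKind::Punct(first), first.len_utf8())
        } else {
            (TokenKind::Unknown, first.len_utf8())
        };
        tokens.push(Token { kind, len });
        src = &src[len..];
    }
    tokens
}
```

Because whitespace is kept as tokens and the lengths always add up to the input length, the original text can be reconstructed exactly, which is one simple way to preserve full-fidelity information.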
+
+ The token stream passes through a higher-level lexer located in
+ [`rustc_parse`] to prepare for the next stage of the compile process. The
+ [`StringReader`] struct is used at this stage to perform a set of validations
+ and turn strings into interned symbols (_interning_ is discussed later).
+ [String interning] is a way of storing only one immutable
+ copy of each distinct string value.
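A minimal string interner sketch, assuming a single-threaded setting: each distinct string is stored once (shared via `Rc<str>` between the lookup map and the index table), and callers get back a small `Copy` handle that is cheap to compare. `Symbol` and `Interner` here are illustrative names; this is not `rustc`'s implementation.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// A handle to an interned string: comparing two symbols is an integer comparison.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    /// String contents -> its symbol.
    map: HashMap<Rc<str>, Symbol>,
    /// Symbol index -> string contents (same allocation as the map key).
    strings: Vec<Rc<str>>,
}

impl Interner {
    /// Return the existing symbol for `s`, or store `s` once and create a new symbol.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        let stored: Rc<str> = Rc::from(s);
        self.strings.push(Rc::clone(&stored));
        self.map.insert(stored, sym);
        sym
    }

    /// Look up the string behind a symbol.
    fn get(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning `"foo"` twice yields the same `Symbol`, so later stages can compare identifiers without comparing string contents.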
+
+ The lexer has a small interface and doesn't depend directly on the
+ diagnostic infrastructure in `rustc`. Instead, it provides diagnostics as plain
+ data, which are emitted in `rustc_parse::lexer::mod` as real diagnostics.
+ The lexer preserves full-fidelity information for both IDEs and proc macros.
+
+ The *parser* [translates the token stream from the lexer into an Abstract Syntax
+ Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
+ analysis. The crate entry points for the parser are the
+ [`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
+ methods found in [`rustc_parse::parser::Parser`]. The external module parsing
+ entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod],
+ and the macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
+
+ Parsing is performed with a set of `Parser` utility methods, including `bump`,
+ `check`, `eat`, `expect`, and `look_ahead`.
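The sketch below shows recursive descent on a toy arithmetic grammar, with one `parse_*` method per grammar rule and helpers named after the `Parser` utility methods mentioned above. The grammar, the `Tok` and `Expr` types, and the helper signatures are assumptions made for illustration; `rustc_parse`'s real methods have different signatures and behaviour.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    Star,
    LParen,
    RParen,
    Eof,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

struct Parser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl Parser {
    fn new(mut tokens: Vec<Tok>) -> Self {
        tokens.push(Tok::Eof); // sentinel so there is always a current token
        Parser { tokens, pos: 0 }
    }
    /// Peek `n` tokens ahead without consuming anything.
    fn look_ahead(&self, n: usize) -> &Tok {
        self.tokens.get(self.pos + n).unwrap_or(&Tok::Eof)
    }
    /// The current token.
    fn token(&self) -> &Tok {
        self.look_ahead(0)
    }
    /// Advance to the next token (never past the end).
    fn bump(&mut self) {
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
    }
    /// Is the current token `t`? Does not consume it.
    fn check(&self, t: &Tok) -> bool {
        self.token() == t
    }
    /// Consume the current token if it is `t`.
    fn eat(&mut self, t: &Tok) -> bool {
        if self.check(t) {
            self.bump();
            true
        } else {
            false
        }
    }
    /// Consume `t`, or report what was found instead.
    fn expect(&mut self, t: &Tok) -> Result<(), String> {
        if self.eat(t) {
            Ok(())
        } else {
            Err(format!("expected {t:?}, found {:?}", self.token()))
        }
    }
    // expr := term ('+' term)*
    fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_term()?;
        while self.eat(&Tok::Plus) {
            let rhs = self.parse_term()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }
    // term := atom ('*' atom)*
    fn parse_term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_atom()?;
        while self.eat(&Tok::Star) {
            let rhs = self.parse_atom()?;
            lhs = Expr::Mul(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }
    // atom := NUM | '(' expr ')'
    fn parse_atom(&mut self) -> Result<Expr, String> {
        match self.token().clone() {
            Tok::Num(n) => {
                self.bump();
                Ok(Expr::Num(n))
            }
            Tok::LParen => {
                self.bump();
                let inner = self.parse_expr()?;
                self.expect(&Tok::RParen)?;
                Ok(inner)
            }
            other => Err(format!("expected an expression, found {other:?}")),
        }
    }
}
```

Operator precedence falls out of the call structure: `parse_expr` calls `parse_term`, so `*` binds tighter than `+`.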
+
+ Parsing is organized by the semantic construct that is being parsed. Separate
+ `parse_*` methods can be found in the [`rustc_parse` `parser`][rustc_parse_parser_dir]
+ directory. The source file name follows the construct name. For example, the
+ following files are found in the parser:
+
+ - `expr.rs`
+ - `pat.rs`
+ - `ty.rs`
+ - `stmt.rs`
+
+ This naming scheme is used across many compiler stages. You will find
+ either a file or directory with the same name across the parsing, lowering,
+ type checking, THIR lowering, and MIR building sources.
+
+ Macro expansion, AST validation, name resolution, and early linting also take place
+ during this stage.
+
+ The parser uses the standard `DiagnosticBuilder` API for error handling, but it
+ tries to recover by parsing a superset of Rust's grammar while still emitting an error.
+ The parser returns AST nodes such as `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}`.
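Error recovery can be sketched on a much smaller grammar. The hypothetical `parse_list` below accepts a superset of `num (',' num)*`: a missing comma, a stray comma, or a malformed number is recorded as a `Diag` (an illustrative stand-in, not `DiagnosticBuilder`) and parsing continues, so one run can report several errors and still return a partial result.

```rust
/// A problem found while parsing (illustrative stand-in for a real diagnostic).
#[derive(Debug, PartialEq)]
struct Diag {
    pos: usize, // index of the offending token
    msg: String,
}

/// Parse tokens such as `["1", ",", "2"]` into numbers, recovering from errors.
fn parse_list(tokens: &[&str]) -> (Vec<i64>, Vec<Diag>) {
    let mut nums = Vec::new();
    let mut diags = Vec::new();
    // True right after a number, when the grammar requires a comma next.
    let mut want_comma = false;
    for (pos, &tok) in tokens.iter().enumerate() {
        if tok == "," {
            if !want_comma {
                diags.push(Diag { pos, msg: "unexpected `,`".to_string() });
            }
            want_comma = false;
            continue;
        }
        if want_comma {
            // Recovery: act as if the missing comma were present, but report it.
            diags.push(Diag { pos, msg: "expected `,`".to_string() });
        }
        match tok.parse::<i64>() {
            Ok(n) => nums.push(n),
            // Recovery: skip the bad item instead of aborting the whole list.
            Err(_) => diags.push(Diag { pos, msg: format!("expected a number, found `{tok}`") }),
        }
        want_comma = true;
    }
    (nums, diags)
}
```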

### HIR lowering

- - We then take the AST and [convert it to High-Level Intermediate
- Representation (HIR)][hir]. This is a compiler-friendly representation of the
- AST. This involves a lot of desugaring of things like loops and `async fn`.
- - We use the HIR to do [type inference] (the process of automatic
- detection of the type of an expression), [trait solving] (the process
- of pairing up an impl with each reference to a trait), and [type
- checking] (the process of converting the types found in the HIR
- (`hir::Ty`), which represent the syntactic things that the user wrote,
- into the internal representation used by the compiler (`Ty<'tcx>`),
- and using that information to verify the type safety, correctness and
- coherence of the types used in the program).
+ We next take the AST and convert it to [High-Level Intermediate
+ Representation (HIR)][hir], a more compiler-friendly representation of the
+ AST. This process is called "lowering". It involves a lot of desugaring of things
+ like loops and `async fn`.
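As an example of desugaring, a `for` loop can be rewritten by hand into roughly the shape that lowering produces: obtain an iterator with `IntoIterator::into_iter`, then `loop` over `Iterator::next` until it returns `None`. This is an approximation written as ordinary Rust (the function names are hypothetical); the real lowering also deals with details such as loop labels and spans.

```rust
/// Sum a slice with an ordinary `for` loop.
fn sum_for(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += x;
    }
    total
}

/// The same loop written approximately as it looks after desugaring.
fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match Iterator::next(&mut iter) {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}
```

After this rewrite, later stages only need to understand `loop`, `match`, and `break`, not `for` itself.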
+
+ We then use the HIR to do [*type inference*] (the process of automatically
+ detecting the type of an expression), [*trait solving*] (the process
+ of pairing up an impl with each reference to a trait), and [*type
+ checking*]. Type checking is the process of converting the types found in the HIR
+ ([`hir::Ty`]), which represent what the user wrote,
+ into the internal representation used by the compiler ([`Ty<'tcx>`]).
+ That information is used to verify the type safety, correctness and
+ coherence of the types used in the program.
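Type inference can be illustrated with a tiny sketch based on textbook unification: a substitution map records what each inference variable has been bound to, and an occurs check rejects infinite types. `ToyTy`, `ToyInferCtxt`, `unify`, and `resolve` are hypothetical names, and `rustc`'s real inference machinery is considerably more involved.

```rust
use std::collections::HashMap;

/// Types in a hypothetical toy language.
#[derive(Clone, Debug, PartialEq)]
enum ToyTy {
    Int,
    Bool,
    /// An inference variable, written `?n`.
    Var(u32),
    Vec(Box<ToyTy>),
}

/// Records what each inference variable has been unified with so far.
#[derive(Default)]
struct ToyInferCtxt {
    subst: HashMap<u32, ToyTy>,
}

impl ToyInferCtxt {
    /// Replace bound variables with what they are bound to, recursively.
    fn resolve(&self, ty: &ToyTy) -> ToyTy {
        match ty {
            ToyTy::Var(v) => match self.subst.get(v) {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            ToyTy::Vec(elem) => ToyTy::Vec(Box::new(self.resolve(elem))),
            _ => ty.clone(),
        }
    }

    /// Does variable `var` occur inside `ty`? Prevents types like `?0 = Vec<?0>`.
    fn occurs(&self, var: u32, ty: &ToyTy) -> bool {
        match self.resolve(ty) {
            ToyTy::Var(v) => v == var,
            ToyTy::Vec(elem) => self.occurs(var, &elem),
            _ => false,
        }
    }

    /// Make `a` and `b` equal by binding inference variables, or report a mismatch.
    fn unify(&mut self, a: &ToyTy, b: &ToyTy) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            _ if a == b => Ok(()),
            (ToyTy::Var(v), other) | (other, ToyTy::Var(v)) => {
                if self.occurs(*v, other) {
                    return Err(format!("infinite type: ?{v} = {other:?}"));
                }
                self.subst.insert(*v, other.clone());
                Ok(())
            }
            (ToyTy::Vec(x), ToyTy::Vec(y)) => self.unify(x, y),
            _ => Err(format!("mismatched types: {a:?} vs {b:?}")),
        }
    }
}
```

In this model, something like `let mut v = Vec::new(); v.push(1);` would roughly correspond to giving `v` the type `Vec<?0>` and then unifying `?0` with an integer type.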

### MIR lowering

- - The HIR is then [lowered to Mid-Level Intermediate Representation (MIR)][mir].
- - Along the way, we construct the THIR, which is an even more desugared HIR.
- THIR is used for pattern and exhaustiveness checking. It is also more
- convenient to convert into MIR than HIR is.
- - The MIR is used for [borrow checking].
- - We (want to) do [many optimizations on the MIR][mir-opt] because it is still
- generic and that improves the code we generate later, improving compilation
- speed too.
- - MIR is a higher level (and generic) representation, so it is easier to do
- some optimizations at MIR level than at LLVM-IR level. For example LLVM
- doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
- opt looks for.
- - Rust code is _monomorphized_, which means making copies of all the generic
- code with the type parameters replaced by concrete types. To do
- this, we need to collect a list of what concrete types to generate code for.
- This is called _monomorphization collection_.
+ The HIR is then [lowered to Mid-level Intermediate Representation (MIR)][mir],
+ which is used for [borrow checking].
+
+ Along the way, we also construct the THIR, which is an even more desugared HIR.
+ THIR is used for pattern and exhaustiveness checking. It is also more
+ convenient to convert into MIR than HIR is.
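Exhaustiveness checking asks whether every possible value is matched by some arm. For a type with only a handful of values, a brute-force sketch can simply enumerate them. The pattern language below (`Pat`) and `missing_values` are hypothetical; `rustc`'s real check works on THIR patterns with a general algorithm that does not enumerate values.

```rust
/// A hypothetical pattern language for matching on `Option<bool>`.
#[derive(Clone, Copy, Debug)]
enum Pat {
    Wild,          // `_`
    IsNone,        // `None`
    IsSome,        // `Some(_)`
    SomeLit(bool), // `Some(true)` or `Some(false)`
}

/// Does `pat` match `value`?
fn pat_matches(pat: Pat, value: Option<bool>) -> bool {
    match (pat, value) {
        (Pat::Wild, _) => true,
        (Pat::IsNone, None) => true,
        (Pat::IsSome, Some(_)) => true,
        (Pat::SomeLit(b), Some(v)) => b == v,
        _ => false,
    }
}

/// Values of `Option<bool>` that no arm matches; empty means the match is exhaustive.
fn missing_values(arms: &[Pat]) -> Vec<Option<bool>> {
    [None, Some(false), Some(true)]
        .iter()
        .copied()
        .filter(|&value| !arms.iter().any(|&pat| pat_matches(pat, value)))
        .collect()
}
```

A non-empty result doubles as the "patterns not covered" list that an error message would show.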
+
+ We do [many optimizations on the MIR][mir-opt] because it is still
+ generic, which improves the code we generate later and also improves
+ compilation speed.
+ MIR is a higher-level (and generic) representation, so it is easier to do
+ some optimizations at the MIR level than at the LLVM-IR level. For example, LLVM
+ doesn't seem to be able to optimize the pattern that the [`simplify_try`] MIR
+ optimization looks for.
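For a sense of the kind of pattern meant here, the hypothetical `identity_match` below rebuilds every variant unchanged, so its result always equals its input. As we understand it, `simplify_try` targets MIR produced from code of roughly this shape (for example, after `?` is desugared) and reduces it to a plain move; treat this as an approximate source-level picture, not the pass's specification.

```rust
/// Rebuilds each variant of `r` unchanged, so the result always equals the input.
/// Ideally this compiles down to simply moving `r` into the return value.
fn identity_match(r: Result<u32, String>) -> Result<u32, String> {
    match r {
        Ok(v) => Ok(v),
        Err(e) => Err(e),
    }
}
```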
+
+ Rust code is _monomorphized_, which means making copies of all the generic
+ code with the type parameters replaced by concrete types. To do
+ this, we need to collect a list of what concrete types to generate code for.
+ This is called _monomorphization collection_ and it happens at the MIR level.
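Monomorphization collection can be sketched as a graph walk: start from a root function, substitute concrete types for each callee's type parameters, and record every distinct (function, type arguments) pair that is reached. The call-graph model below (`TyArg`, `Call`, `Instance`, `collect`) is hypothetical and ignores everything except calls; it is not `rustc`'s collector.

```rust
use std::collections::{BTreeSet, HashMap};

/// A type argument at a call site: either concrete, or the caller's own
/// type parameter number `n` (so `Param(0)` is the caller's first parameter).
#[derive(Clone, Debug)]
enum TyArg {
    Concrete(&'static str),
    Param(usize),
}

/// A call site inside a function body.
struct Call {
    callee: &'static str,
    args: Vec<TyArg>,
}

/// A monomorphic instance: a function name plus fully concrete type arguments.
type Instance = (&'static str, Vec<&'static str>);

/// Walk the call graph from `root` and return every instance that needs code.
/// Assumes well-formed input: every `Param(i)` is in range for its caller.
fn collect(bodies: &HashMap<&'static str, Vec<Call>>, root: &'static str) -> BTreeSet<Instance> {
    let mut seen = BTreeSet::new();
    let mut worklist: Vec<Instance> = vec![(root, Vec::new())];
    while let Some(instance) = worklist.pop() {
        if !seen.insert(instance.clone()) {
            continue; // this (function, types) pair was already collected
        }
        let (name, concrete) = instance;
        // Functions without an entry in `bodies` are treated as leaves.
        for call in bodies.get(name).into_iter().flatten() {
            // Substitute the caller's concrete types for its type parameters.
            let args: Vec<&'static str> = call
                .args
                .iter()
                .map(|arg| match arg {
                    TyArg::Concrete(ty) => *ty,
                    TyArg::Param(i) => concrete[*i],
                })
                .collect();
            worklist.push((call.callee, args));
        }
    }
    seen
}
```

Functions that are never reached from the root (such as an unused generic helper) produce no instances, so no code is generated for them.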

### Code generation

- - We then begin what is vaguely called _code generation_ or _codegen_.
- - The [code generation stage (codegen)][codegen] is when higher level
- representations of source are turned into an executable binary. `rustc`
- uses LLVM for code generation. The first step is to convert the MIR
- to LLVM Intermediate Representation (LLVM IR). This is where the MIR
- is actually monomorphized, according to the list we created in the
- previous step.
- - The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
- It then emits machine code. It is basically assembly code with additional
- low-level types and annotations added. (e.g. an ELF object or wasm).
- - The different libraries/binaries are linked together to produce the final
- binary.
+ We then begin what is vaguely called _code generation_ or _codegen_.
+ The [code generation stage][codegen] is when higher-level
+ representations of source are turned into an executable binary. `rustc`
+ uses LLVM for code generation. The first step is to convert the MIR
+ to LLVM Intermediate Representation (LLVM IR). This is where the MIR
+ is actually monomorphized, according to the list we created in the
+ previous step.
+ The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
+ It then emits machine code, which is basically assembly code with additional
+ low-level types and annotations added (e.g. an ELF object or WASM).
+ The different libraries/binaries are then linked together to produce the final
+ binary.

[String interning]: https://en.wikipedia.org/wiki/String_interning
[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
@@ -129,9 +143,9 @@ we'll talk about that later.
[`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
- [type inference]: https://rustc-dev-guide.rust-lang.org/type-inference.html
- [trait solving]: https://rustc-dev-guide.rust-lang.org/traits/resolution.html
- [type checking]: https://rustc-dev-guide.rust-lang.org/type-checking.html
+ [*type inference*]: https://rustc-dev-guide.rust-lang.org/type-inference.html
+ [*trait solving*]: https://rustc-dev-guide.rust-lang.org/traits/resolution.html
+ [*type checking*]: https://rustc-dev-guide.rust-lang.org/type-checking.html
[mir]: https://rustc-dev-guide.rust-lang.org/mir/index.html
[borrow checking]: https://rustc-dev-guide.rust-lang.org/borrow_check.html
[mir-opt]: https://rustc-dev-guide.rust-lang.org/mir/optimizations.html
@@ -143,6 +157,8 @@ we'll talk about that later.
[`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
[rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
+ [`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html
+ [`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

## How it does it