|
1 | 1 | # Lexing and Parsing
|
2 | 2 |
|
3 |
| -The very first thing the compiler does is take the program (in Unicode |
4 |
| -characters) and turn it into something the compiler can work with more |
5 |
| -conveniently than strings. This happens in two stages: Lexing and Parsing. |
| 3 | +The very first thing the compiler does is take the program (in Unicode) and |
| 4 | +transform it into a data format the compiler can work with more conveniently |
| 5 | +than strings. This happens in two stages: Lexing and Parsing. |
6 | 6 |
|
7 |
| -Lexing takes strings and turns them into streams of [tokens]. For example, |
8 |
| -`foo.bar + buz` would be turned into the tokens `foo`, `.`, |
9 |
| -`bar`, `+`, and `buz`. The lexer lives in [`rustc_lexer`][lexer]. |
| 7 | + 1. _Lexing_ takes strings and turns them into streams of [tokens]. For |
| 8 | + example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`, |
| 9 | + `+`, and `buz`. |
10 | 10 |
|
11 | 11 | [tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
|
12 | 12 | [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
|
13 | 13 |
|
14 |
| -Parsing then takes streams of tokens and turns them into a structured |
15 |
| -form which is easier for the compiler to work with, usually called an [*Abstract |
16 |
| -Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory, |
17 |
| -using a `Span` to link a particular AST node back to its source text. |
| 14 | + 2. _Parsing_ takes streams of tokens and turns them into a structured form |
| 15 | + which is easier for the compiler to work with, usually called an [*Abstract |
| 16 | + Syntax Tree* (`AST`)][ast]. |
| 17 | + |
| 18 | + |
| 19 | +An `AST` mirrors the structure of a Rust program in memory, using a `Span` to |
| 20 | +link a particular `AST` node back to its source text. The `AST` is defined in |
| 21 | +[`rustc_ast`][rustc_ast], along with some definitions for tokens and token |
| 22 | +streams, data structures/`trait`s for mutating `AST`s, and shared definitions for |
| 23 | +other `AST`-related parts of the compiler (like the lexer and |
| 24 | +`macro`-expansion). |
18 | 25 |
|
19 |
| -The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for |
20 |
| -tokens and token streams, data structures/traits for mutating ASTs, and shared |
21 |
| -definitions for other AST-related parts of the compiler (like the lexer and |
22 |
| -macro-expansion). |
| 26 | +The lexer is defined in [`rustc_lexer`][lexer]. |
23 | 27 |
|
24 | 28 | The parser is defined in [`rustc_parse`][rustc_parse], along with a
|
25 | 29 | high-level interface to the lexer and some validation routines that run after
|
26 |
| -macro expansion. In particular, the [`rustc_parse::parser`][parser] contains |
| 30 | +`macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains |
27 | 31 | the parser implementation.
|
28 | 32 |
|
29 |
| -The main entrypoint to the parser is via the various `parse_*` functions and others in the |
30 |
| -[parser crate][parser_lib]. They let you do things like turn a [`SourceFile`][sourcefile] |
| 33 | +The main entrypoint to the parser is via the various `parse_*` functions and others in |
| 34 | +[rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile] |
31 | 35 | (e.g. the source in a single file) into a token stream, create a parser from
|
32 |
| -the token stream, and then execute the parser to get a `Crate` (the root AST |
| 36 | +the token stream, and then execute the parser to get a [`Crate`] (the root `AST` |
33 | 37 | node).
|
34 | 38 |
|
35 |
| -To minimize the amount of copying that is done, |
36 |
| -both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent `ParseSess`. |
37 |
| -This contains all the information needed while parsing, |
38 |
| -as well as the [`SourceMap`] itself. |
| 39 | +To minimize the amount of copying that is done, both [`StringReader`] and |
| 40 | +[`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This |
| 41 | +contains all the information needed while parsing, as well as the [`SourceMap`] |
| 42 | +itself. |
39 | 43 |
|
40 |
| -Note that while parsing, we may encounter macro definitions or invocations. We |
41 |
| -set these aside to be expanded (see [this chapter](./macro-expansion.md)). |
42 |
| -Expansion may itself require parsing the output of the macro, which may reveal |
43 |
| -more macros to be expanded, and so on. |
| 44 | +Note that while parsing, we may encounter `macro` definitions or invocations. We |
| 45 | +set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)). |
| 46 | +Expansion itself may require parsing the output of a `macro`, which may reveal |
| 47 | +more `macro`s to be expanded, and so on. |
44 | 48 |
|
45 | 49 | ## More on Lexical Analysis
|
46 | 50 |
|
47 | 51 | Code for lexical analysis is split between two crates:
|
48 | 52 |
|
49 |
| -- `rustc_lexer` crate is responsible for breaking a `&str` into chunks |
| 53 | +- [`rustc_lexer`] crate is responsible for breaking a `&str` into chunks |
50 | 54 | constituting tokens. Although it is popular to implement lexers as generated
|
51 |
| - finite state machines, the lexer in `rustc_lexer` is hand-written. |
| 55 | + finite state machines, the lexer in [`rustc_lexer`] is hand-written. |
52 | 56 |
|
53 |
| -- [`StringReader`] integrates `rustc_lexer` with data structures specific to `rustc`. |
54 |
| - Specifically, |
55 |
| - it adds `Span` information to tokens returned by `rustc_lexer` and interns identifiers. |
| 57 | +- [`StringReader`] integrates [`rustc_lexer`] with data structures specific to |
| 58 | + `rustc`. Specifically, it adds `Span` information to tokens returned by |
| 59 | + [`rustc_lexer`] and interns identifiers. |
56 | 60 |
|
57 |
| -[rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html |
58 |
| -[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html |
59 |
| -[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree |
| 61 | +[`Crate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.Crate.html |
| 62 | +[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html |
| 63 | +[`ParseSess`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.ParseSess.html |
| 64 | +[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html |
60 | 65 | [`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
|
| 66 | +[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html |
61 | 67 | [ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
|
62 |
| -[rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html |
63 |
| -[parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html |
| 68 | +[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree |
64 | 69 | [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
|
65 |
| -[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html |
66 |
| -[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html |
67 |
| -[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html |
| 70 | +[rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html |
| 71 | +[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html |
| 72 | +[rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html |
68 | 73 | [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html
|
| 74 | +[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html |
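
As a rough illustration of the lexing stage described in this chapter, the sketch below is a tiny hand-written lexer that splits `foo.bar + buz` into the tokens `foo`, `.`, `bar`, `+`, and `buz`, and records a byte range for each one in the spirit of a `Span`. Everything in it (`TokenKind`, `Token`, `lex`, `token_texts`) is hypothetical and made up for this example. It is not the `rustc_lexer` or `StringReader` API, and it handles only identifiers, `.`, and `+`.

```rust
/// Toy token kinds. Hypothetical: these are NOT the types used by `rustc_lexer`.
#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Ident,
    Dot,
    Plus,
    Unknown,
}

/// A token plus the byte range it covers in the source text.
/// The `start..end` pair plays the role of a very simplified span.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: TokenKind,
    start: usize,
    end: usize,
}

/// Hand-written lexer: walks the input once and emits tokens, skipping whitespace.
fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();

    while let Some((start, c)) = chars.next() {
        if c.is_whitespace() {
            continue;
        }

        let kind = if c.is_alphabetic() || c == '_' {
            // Consume the rest of the identifier.
            while let Some(&(_, next)) = chars.peek() {
                if next.is_alphanumeric() || next == '_' {
                    chars.next();
                } else {
                    break;
                }
            }
            TokenKind::Ident
        } else if c == '.' {
            TokenKind::Dot
        } else if c == '+' {
            TokenKind::Plus
        } else {
            TokenKind::Unknown
        };

        // The token ends where the next unread character begins,
        // or at the end of the input if nothing is left.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        tokens.push(Token { kind, start, end });
    }

    tokens
}

/// Maps each token back to its source text using the recorded byte range.
fn token_texts(src: &str) -> Vec<&str> {
    lex(src).iter().map(|t| &src[t.start..t.end]).collect()
}
```

Calling `token_texts("foo.bar + buz")` yields the five token strings listed in the chapter. Because each token stores only a byte range, the original text can be recovered from the source without copying it, which is loosely the idea behind linking nodes back to their source text.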