diff --git a/src/doc/grammar.md b/src/doc/grammar.md index 78432b6a96593..43cab585f36a1 100644 --- a/src/doc/grammar.md +++ b/src/doc/grammar.md @@ -1,817 +1,3 @@ % Grammar -# Introduction - -This document is the primary reference for the Rust programming language grammar. It -provides only one kind of material: - - - Chapters that formally define the language grammar. - -This document does not serve as an introduction to the language. Background -familiarity with the language is assumed. A separate [guide] is available to -help acquire such background. - -This document also does not serve as a reference to the [standard] library -included in the language distribution. Those libraries are documented -separately by extracting documentation attributes from their source code. Many -of the features that one might expect to be language features are library -features in Rust, so what you're looking for may be there, not here. - -[guide]: guide.html -[standard]: std/index.html - -# Notation - -Rust's grammar is defined over Unicode codepoints, each conventionally denoted -`U+XXXX`, for 4 or more hexadecimal digits `X`. _Most_ of Rust's grammar is -confined to the ASCII range of Unicode, and is described in this document by a -dialect of Extended Backus-Naur Form (EBNF), specifically a dialect of EBNF -supported by common automated LL(k) parsing tools such as `llgen`, rather than -the dialect given in ISO 14977. The dialect can be defined self-referentially -as follows: - -```antlr -grammar : rule + ; -rule : nonterminal ':' productionrule ';' ; -productionrule : production [ '|' production ] * ; -production : term * ; -term : element repeats ; -element : LITERAL | IDENTIFIER | '[' productionrule ']' ; -repeats : [ '*' | '+' ] NUMBER ? | NUMBER ? | '?' ; -``` - -Where: - -- Whitespace in the grammar is ignored. -- Square brackets are used to group rules. -- `LITERAL` is a single printable ASCII character, or an escaped hexadecimal - ASCII code of the form `\xQQ`, in single quotes, denoting the corresponding - Unicode codepoint `U+00QQ`. -- `IDENTIFIER` is a nonempty string of ASCII letters and underscores. -- The `repeat` forms apply to the adjacent `element`, and are as follows: - - `?` means zero or one repetition - - `*` means zero or more repetitions - - `+` means one or more repetitions - - NUMBER trailing a repeat symbol gives a maximum repetition count - - NUMBER on its own gives an exact repetition count - -This EBNF dialect should hopefully be familiar to many readers. - -## Unicode productions - -A few productions in Rust's grammar permit Unicode codepoints outside the ASCII -range. We define these productions in terms of character properties specified -in the Unicode standard, rather than in terms of ASCII-range codepoints. The -section [Special Unicode Productions](#special-unicode-productions) lists these -productions. - -## String table productions - -Some rules in the grammar — notably [unary -operators](#unary-operator-expressions), [binary -operators](#binary-operator-expressions), and [keywords](#keywords) — are -given in a simplified form: as a listing of a table of unquoted, printable -whitespace-separated strings. These cases form a subset of the rules regarding -the [token](#tokens) rule, and are assumed to be the result of a -lexical-analysis phase feeding the parser, driven by a DFA, operating over the -disjunction of all such string table entries. - -When such a string enclosed in double-quotes (`"`) occurs inside the grammar, -it is an implicit reference to a single member of such a string table -production. See [tokens](#tokens) for more information. - -# Lexical structure - -## Input format - -Rust input is interpreted as a sequence of Unicode codepoints encoded in UTF-8. -Most Rust grammar rules are defined in terms of printable ASCII-range -codepoints, but a small number are defined in terms of Unicode properties or -explicit codepoint lists. [^inputformat] - -[^inputformat]: Substitute definitions for the special Unicode productions are - provided to the grammar verifier, restricted to ASCII range, when verifying the - grammar in this document. - -## Special Unicode Productions - -The following productions in the Rust grammar are defined in terms of Unicode -properties: `ident`, `non_null`, `non_eol`, `non_single_quote` and -`non_double_quote`. - -### Identifiers - -The `ident` production is any nonempty Unicode[^non_ascii_idents] string of -the following form: - -[^non_ascii_idents]: Non-ASCII characters in identifiers are currently feature - gated. This is expected to improve soon. - -- The first character has property `XID_start` -- The remaining characters have property `XID_continue` - -that does _not_ occur in the set of [keywords](#keywords). - -> **Note**: `XID_start` and `XID_continue` as character properties cover the -> character ranges used to form the more familiar C and Java language-family -> identifiers. - -### Delimiter-restricted productions - -Some productions are defined by exclusion of particular Unicode characters: - -- `non_null` is any single Unicode character aside from `U+0000` (null) -- `non_eol` is `non_null` restricted to exclude `U+000A` (`'\n'`) -- `non_single_quote` is `non_null` restricted to exclude `U+0027` (`'`) -- `non_double_quote` is `non_null` restricted to exclude `U+0022` (`"`) - -## Comments - -```antlr -comment : block_comment | line_comment ; -block_comment : "/*" block_comment_body * "*/" ; -block_comment_body : [block_comment | character] * ; -line_comment : "//" non_eol * ; -``` - -**FIXME:** add doc grammar? - -## Whitespace - -```antlr -whitespace_char : '\x20' | '\x09' | '\x0a' | '\x0d' ; -whitespace : [ whitespace_char | comment ] + ; -``` - -## Tokens - -```antlr -simple_token : keyword | unop | binop ; -token : simple_token | ident | literal | symbol | whitespace token ; -``` - -### Keywords - -

- -| | | | | | -|----------|----------|----------|----------|----------| -| _ | abstract | alignof | as | become | -| box | break | const | continue | crate | -| do | else | enum | extern | false | -| final | fn | for | if | impl | -| in | let | loop | macro | match | -| mod | move | mut | offsetof | override | -| priv | proc | pub | pure | ref | -| return | Self | self | sizeof | static | -| struct | super | trait | true | type | -| typeof | unsafe | unsized | use | virtual | -| where | while | yield | | | - - -Each of these keywords has special meaning in its grammar, and all of them are -excluded from the `ident` rule. - -Not all of these keywords are used by the language. Some of them were used -before Rust 1.0, and were left reserved once their implementations were -removed. Some of them were reserved before 1.0 to make space for possible -future features. - -### Literals - -```antlr -lit_suffix : ident; -literal : [ string_lit | char_lit | byte_string_lit | byte_lit | num_lit | bool_lit ] lit_suffix ?; -``` - -The optional `lit_suffix` production is only used for certain numeric literals, -but is reserved for future extension. That is, the above gives the lexical -grammar, but a Rust parser will reject everything but the 12 special cases -mentioned in [Number literals](reference/tokens.html#number-literals) in the -reference. - -#### Character and string literals - -```antlr -char_lit : '\x27' char_body '\x27' ; -string_lit : '"' string_body * '"' | 'r' raw_string ; - -char_body : non_single_quote - | '\x5c' [ '\x27' | common_escape | unicode_escape ] ; - -string_body : non_double_quote - | '\x5c' [ '\x22' | common_escape | unicode_escape ] ; -raw_string : '"' raw_string_body '"' | '#' raw_string '#' ; - -common_escape : '\x5c' - | 'n' | 'r' | 't' | '0' - | 'x' hex_digit 2 -unicode_escape : 'u' '{' hex_digit+ 6 '}'; - -hex_digit : 'a' | 'b' | 'c' | 'd' | 'e' | 'f' - | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' - | dec_digit ; -oct_digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ; -dec_digit : '0' | nonzero_dec ; -nonzero_dec: '1' | '2' | '3' | '4' - | '5' | '6' | '7' | '8' | '9' ; -``` - -#### Byte and byte string literals - -```antlr -byte_lit : "b\x27" byte_body '\x27' ; -byte_string_lit : "b\x22" string_body * '\x22' | "br" raw_byte_string ; - -byte_body : ascii_non_single_quote - | '\x5c' [ '\x27' | common_escape ] ; - -byte_string_body : ascii_non_double_quote - | '\x5c' [ '\x22' | common_escape ] ; -raw_byte_string : '"' raw_byte_string_body '"' | '#' raw_byte_string '#' ; - -``` - -#### Number literals - -```antlr -num_lit : nonzero_dec [ dec_digit | '_' ] * float_suffix ? - | '0' [ [ dec_digit | '_' ] * float_suffix ? - | 'b' [ '1' | '0' | '_' ] + - | 'o' [ oct_digit | '_' ] + - | 'x' [ hex_digit | '_' ] + ] ; - -float_suffix : [ exponent | '.' dec_lit exponent ? ] ? ; - -exponent : ['E' | 'e'] ['-' | '+' ] ? dec_lit ; -dec_lit : [ dec_digit | '_' ] + ; -``` - -#### Boolean literals - -```antlr -bool_lit : [ "true" | "false" ] ; -``` - -The two values of the boolean type are written `true` and `false`. - -### Symbols - -```antlr -symbol : "::" | "->" - | '#' | '[' | ']' | '(' | ')' | '{' | '}' - | ',' | ';' ; -``` - -Symbols are a general class of printable [tokens](#tokens) that play structural -roles in a variety of grammar productions. They are cataloged here for -completeness as the set of remaining miscellaneous printable tokens that do not -otherwise appear as [unary operators](#unary-operator-expressions), [binary -operators](#binary-operator-expressions), or [keywords](#keywords). - -## Paths - -```antlr -expr_path : [ "::" ] ident [ "::" expr_path_tail ] + ; -expr_path_tail : '<' type_expr [ ',' type_expr ] + '>' - | expr_path ; - -type_path : ident [ type_path_tail ] + ; -type_path_tail : '<' type_expr [ ',' type_expr ] + '>' - | "::" type_path ; -``` - -# Syntax extensions - -## Macros - -```antlr -expr_macro_rules : "macro_rules" '!' ident '(' macro_rule * ')' ';' - | "macro_rules" '!' ident '{' macro_rule * '}' ; -macro_rule : '(' matcher * ')' "=>" '(' transcriber * ')' ';' ; -matcher : '(' matcher * ')' | '[' matcher * ']' - | '{' matcher * '}' | '$' ident ':' ident - | '$' '(' matcher * ')' sep_token? [ '*' | '+' ] - | non_special_token ; -transcriber : '(' transcriber * ')' | '[' transcriber * ']' - | '{' transcriber * '}' | '$' ident - | '$' '(' transcriber * ')' sep_token? [ '*' | '+' ] - | non_special_token ; -``` - -# Crates and source files - -**FIXME:** grammar? What production covers #![crate_id = "foo"] ? - -# Items and attributes - -**FIXME:** grammar? - -## Items - -```antlr -item : vis ? mod_item | fn_item | type_item | struct_item | enum_item - | const_item | static_item | trait_item | impl_item | extern_block_item ; -``` - -### Type Parameters - -**FIXME:** grammar? - -### Modules - -```antlr -mod_item : "mod" ident ( ';' | '{' mod '}' ); -mod : [ view_item | item ] * ; -``` - -#### View items - -```antlr -view_item : extern_crate_decl | use_decl ';' ; -``` - -##### Extern crate declarations - -```antlr -extern_crate_decl : "extern" "crate" crate_name -crate_name: ident | ( ident "as" ident ) -``` - -##### Use declarations - -```antlr -use_decl : vis ? "use" [ path "as" ident - | path_glob ] ; - -path_glob : ident [ "::" [ path_glob - | '*' ] ] ? - | '{' path_item [ ',' path_item ] * '}' ; - -path_item : ident | "self" ; -``` - -### Functions - -**FIXME:** grammar? - -#### Generic functions - -**FIXME:** grammar? - -#### Unsafety - -**FIXME:** grammar? - -##### Unsafe functions - -**FIXME:** grammar? - -##### Unsafe blocks - -**FIXME:** grammar? - -#### Diverging functions - -**FIXME:** grammar? - -### Type definitions - -**FIXME:** grammar? - -### Structures - -**FIXME:** grammar? - -### Enumerations - -**FIXME:** grammar? - -### Constant items - -```antlr -const_item : "const" ident ':' type '=' expr ';' ; -``` - -### Static items - -```antlr -static_item : "static" ident ':' type '=' expr ';' ; -``` - -#### Mutable statics - -**FIXME:** grammar? - -### Traits - -**FIXME:** grammar? - -### Implementations - -**FIXME:** grammar? - -### External blocks - -```antlr -extern_block_item : "extern" '{' extern_block '}' ; -extern_block : [ foreign_fn ] * ; -``` - -## Visibility and Privacy - -```antlr -vis : "pub" ; -``` -### Re-exporting and Visibility - -See [Use declarations](#use-declarations). - -## Attributes - -```antlr -attribute : '#' '!' ? '[' meta_item ']' ; -meta_item : ident [ '=' literal - | '(' meta_seq ')' ] ? ; -meta_seq : meta_item [ ',' meta_seq ] ? ; -``` - -# Statements and expressions - -## Statements - -```antlr -stmt : decl_stmt | expr_stmt | ';' ; -``` - -### Declaration statements - -```antlr -decl_stmt : item | let_decl ; -``` - -#### Item declarations - -See [Items](#items). - -#### Variable declarations - -```antlr -let_decl : "let" pat [':' type ] ? [ init ] ? ';' ; -init : [ '=' ] expr ; -``` - -### Expression statements - -```antlr -expr_stmt : expr ';' ; -``` - -## Expressions - -```antlr -expr : literal | path | tuple_expr | unit_expr | struct_expr - | block_expr | method_call_expr | field_expr | array_expr - | idx_expr | range_expr | unop_expr | binop_expr - | paren_expr | call_expr | lambda_expr | while_expr - | loop_expr | break_expr | continue_expr | for_expr - | if_expr | match_expr | if_let_expr | while_let_expr - | return_expr ; -``` - -#### Lvalues, rvalues and temporaries - -**FIXME:** grammar? - -#### Moved and copied types - -**FIXME:** Do we want to capture this in the grammar as different productions? - -### Literal expressions - -See [Literals](#literals). - -### Path expressions - -See [Paths](#paths). - -### Tuple expressions - -```antlr -tuple_expr : '(' [ expr [ ',' expr ] * | expr ',' ] ? ')' ; -``` - -### Unit expressions - -```antlr -unit_expr : "()" ; -``` - -### Structure expressions - -```antlr -struct_expr_field_init : ident | ident ':' expr ; -struct_expr : expr_path '{' struct_expr_field_init - [ ',' struct_expr_field_init ] * - [ ".." expr ] '}' | - expr_path '(' expr - [ ',' expr ] * ')' | - expr_path ; -``` - -### Block expressions - -```antlr -block_expr : '{' [ stmt | item ] * - [ expr ] '}' ; -``` - -### Method-call expressions - -```antlr -method_call_expr : expr '.' ident paren_expr_list ; -``` - -### Field expressions - -```antlr -field_expr : expr '.' ident ; -``` - -### Array expressions - -```antlr -array_expr : '[' "mut" ? array_elems? ']' ; - -array_elems : [expr [',' expr]*] | [expr ';' expr] ; -``` - -### Index expressions - -```antlr -idx_expr : expr '[' expr ']' ; -``` - -### Range expressions - -```antlr -range_expr : expr ".." expr | - expr ".." | - ".." expr | - ".." ; -``` - -### Unary operator expressions - -```antlr -unop_expr : unop expr ; -unop : '-' | '*' | '!' ; -``` - -### Binary operator expressions - -```antlr -binop_expr : expr binop expr | type_cast_expr - | assignment_expr | compound_assignment_expr ; -binop : arith_op | bitwise_op | lazy_bool_op | comp_op -``` - -#### Arithmetic operators - -```antlr -arith_op : '+' | '-' | '*' | '/' | '%' ; -``` - -#### Bitwise operators - -```antlr -bitwise_op : '&' | '|' | '^' | "<<" | ">>" ; -``` - -#### Lazy boolean operators - -```antlr -lazy_bool_op : "&&" | "||" ; -``` - -#### Comparison operators - -```antlr -comp_op : "==" | "!=" | '<' | '>' | "<=" | ">=" ; -``` - -#### Type cast expressions - -```antlr -type_cast_expr : value "as" type ; -``` - -#### Assignment expressions - -```antlr -assignment_expr : expr '=' expr ; -``` - -#### Compound assignment expressions - -```antlr -compound_assignment_expr : expr [ arith_op | bitwise_op ] '=' expr ; -``` - -### Grouped expressions - -```antlr -paren_expr : '(' expr ')' ; -``` - -### Call expressions - -```antlr -expr_list : [ expr [ ',' expr ]* ] ? ; -paren_expr_list : '(' expr_list ')' ; -call_expr : expr paren_expr_list ; -``` - -### Lambda expressions - -```antlr -ident_list : [ ident [ ',' ident ]* ] ? ; -lambda_expr : '|' ident_list '|' expr ; -``` - -### While loops - -```antlr -while_expr : [ lifetime ':' ] ? "while" no_struct_literal_expr '{' block '}' ; -``` - -### Infinite loops - -```antlr -loop_expr : [ lifetime ':' ] ? "loop" '{' block '}'; -``` - -### Break expressions - -```antlr -break_expr : "break" [ lifetime ] ?; -``` - -### Continue expressions - -```antlr -continue_expr : "continue" [ lifetime ] ?; -``` - -### For expressions - -```antlr -for_expr : [ lifetime ':' ] ? "for" pat "in" no_struct_literal_expr '{' block '}' ; -``` - -### If expressions - -```antlr -if_expr : "if" no_struct_literal_expr '{' block '}' - else_tail ? ; - -else_tail : "else" [ if_expr | if_let_expr - | '{' block '}' ] ; -``` - -### Match expressions - -```antlr -match_expr : "match" no_struct_literal_expr '{' match_arm * '}' ; - -match_arm : attribute * match_pat "=>" [ expr "," | '{' block '}' ] ; - -match_pat : pat [ '|' pat ] * [ "if" expr ] ? ; -``` - -### If let expressions - -```antlr -if_let_expr : "if" "let" pat '=' expr '{' block '}' - else_tail ? ; -``` - -### While let loops - -```antlr -while_let_expr : [ lifetime ':' ] ? "while" "let" pat '=' expr '{' block '}' ; -``` - -### Return expressions - -```antlr -return_expr : "return" expr ? ; -``` - -# Type system - -**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already? - -## Types - -### Primitive types - -**FIXME:** grammar? - -#### Machine types - -**FIXME:** grammar? - -#### Machine-dependent integer types - -**FIXME:** grammar? - -### Textual types - -**FIXME:** grammar? - -### Tuple types - -**FIXME:** grammar? - -### Array, and Slice types - -**FIXME:** grammar? - -### Structure types - -**FIXME:** grammar? - -### Enumerated types - -**FIXME:** grammar? - -### Pointer types - -**FIXME:** grammar? - -### Function types - -**FIXME:** grammar? - -### Closure types - -```antlr -closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|' - [ ':' bound-list ] [ '->' type ] -lifetime-list := lifetime | lifetime ',' lifetime-list -arg-list := ident ':' type | ident ':' type ',' arg-list -``` - -### Never type -An empty type - -```antlr -never_type : "!" ; -``` - -### Object types - -**FIXME:** grammar? - -### Type parameters - -**FIXME:** grammar? - -### Type parameter bounds - -```antlr -bound-list := bound | bound '+' bound-list '+' ? -bound := ty_bound | lt_bound -lt_bound := lifetime -ty_bound := ty_bound_noparen | (ty_bound_noparen) -ty_bound_noparen := [?] [ for ] simple_path -``` - -### Self types - -**FIXME:** grammar? - -## Type kinds - -**FIXME:** this is probably not relevant to the grammar... - -# Memory and concurrency models - -**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already? - -## Memory model - -### Memory allocation and lifetime - -### Memory ownership - -### Variables - -### Boxes - -## Threads - -### Communication between threads - -### Thread lifecycle +Please see [the Reference](reference/index.html) for Rust's grammar.