Commit 5589ab1

Clarify immaturity of grammar, add a pile of half-baked grammar rules.
1 parent 997b29f commit 5589ab1

1 file changed: +122 -13 lines changed

doc/rust.md

Lines changed: 122 additions & 13 deletions
@@ -42,6 +42,15 @@ If you have suggestions to make, please try to focus them on *reductions* to
4242
the language: possible features that can be combined or omitted. We aim to
4343
keep the size and complexity of the language under control.
4444

45+
**Note on grammar:** The grammar for Rust given in this document is rough and
46+
very incomplete; only a modest number of sections have accompanying grammar
47+
rules. Formalizing the grammar accepted by the Rust parser is ongoing work,
48+
but future versions of this document will contain a complete
49+
grammar. Moreover, we hope that this grammar will be extracted and verified
50+
as LL(1) by an automated grammar-analysis tool, and further tested against the
51+
Rust sources. Preliminary versions of this automation exist, but are not yet
52+
complete.
53+
4554
# Notation
4655

4756
Rust's grammar is defined over Unicode codepoints, each conventionally
@@ -81,13 +90,6 @@ Where:
8190

8291
This EBNF dialect should hopefully be familiar to many readers.
8392

84-
The grammar for Rust given in this document is extracted and verified as LL(1)
85-
by an automated grammar-analysis tool, and further tested against the Rust
86-
sources. The generated parser is currently *not* the one used by the Rust
87-
compiler itself, but in the future we hope to relate the two together more
88-
precisely. As of this writing they are only related by testing against
89-
existing source code.
90-
9193
## Unicode productions
9294

9395
A small number of productions in Rust's grammar permit Unicode codepoints
@@ -917,7 +919,7 @@ In this example, `nonempty_list` is a predicate---it can be used in a
917919
typestate constraint---but the auxiliary function `pure_length` is
918920
not.
919921

920-
*ToDo:* should actually define referential transparency.
922+
*TODO:* should actually define referential transparency.
921923

922924
The effect checking rules previously enumerated are a restricted set of
923925
typechecking rules meant to approximate the universe of observably
@@ -933,7 +935,7 @@ blocks, the compiler provides no static guarantee that the code will behave as
933935
expected at runtime. Rather, the programmer has an independent obligation to
934936
verify the semantics of the predicates they write.
935937

936-
*ToDo:* last two sentences are vague.
938+
*TODO:* last two sentences are vague.
937939

938940
An example of a predicate that uses an unchecked block:
939941

@@ -1327,6 +1329,12 @@ declaring a function-local item.
13271329

13281330
#### Slot declarations
13291331

1332+
~~~~~~~~{.ebnf .gram}
1333+
let_decl : "let" pat [':' type ] ? [ init ] ? ';' ;
1334+
init : [ '=' | '<-' ] expr ;
1335+
~~~~~~~~
1336+
1337+
13301338
A _slot declaration_ has one of two forms:
13311339

13321340
* `let` `pattern` `optional-init`;
@@ -1382,6 +1390,12 @@ values.
13821390

13831391
### Record expressions
13841392

1393+
~~~~~~~~{.ebnf .gram}
1394+
rec_expr : '{' ident ':' expr
1395+
[ ',' ident ':' expr ] *
1396+
[ "with" expr ] ? '}'
1397+
~~~~~~~~
1398+
13851399
A _[record](#record-types) expression_ is one or more comma-separated
13861400
name-value pairs enclosed by braces. A fieldname can be any identifier
13871401
(including reserved words), and is separated from its value expression
@@ -1414,6 +1428,10 @@ let base = {x: 1, y: 2, z: 3};
14141428

14151429
### Field expressions
14161430

1431+
~~~~~~~~{.ebnf .gram}
1432+
field_expr : expr '.' ident
1433+
~~~~~~~~
1434+
14171435
A dot can be used to access a field in a record.
14181436

14191437
~~~~~~~~ {.field}
@@ -1439,6 +1457,10 @@ expression on the left of the dot.
14391457

14401458
### Vector expressions
14411459

1460+
~~~~~~~~{.ebnf .gram}
1461+
vec_expr : '[' "mutable" ? [ expr [ ',' expr ] * ] ? ']'
1462+
~~~~~~~~
1463+
14421464
A _[vector](#vector-types) expression_ is written by enclosing zero or
14431465
more comma-separated expressions of uniform type in square brackets.
14441466
The keyword `mutable` can be written after the opening bracket to
@@ -1453,6 +1475,11 @@ When no mutability is specified, the vector is immutable.
14531475

14541476
### Index expressions
14551477

1478+
~~~~~~~~{.ebnf .gram}
1479+
idx_expr : expr '[' expr ']'
1480+
~~~~~~~~
1481+
1482+
14561483
[Vector](#vector-types)-typed expressions can be indexed by writing a
14571484
square-bracket-enclosed expression (the index) after them. When the
14581485
vector is mutable, the resulting _lval_ can be assigned to.
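As a rough illustration of the indexing rule, here is a sketch in present-day Rust rather than the 2011-era syntax of this document (which no longer compiles). The helper name `replace_at` and the use of a mutable slice as a stand-in for a mutable vector are assumptions made for the example only.

```rust
/// Overwrites the element at `idx` and returns the previous value.
/// On a mutable vector, `v[idx]` is a place (the "lval" in the text),
/// so it may appear on the left-hand side of an assignment.
/// NOTE: `replace_at` is a hypothetical helper, not part of any Rust API.
fn replace_at(v: &mut [i32], idx: usize, value: i32) -> i32 {
    let old = v[idx]; // read through the index expression
    v[idx] = value;   // assign through the same index expression
    old
}
```

Calling `replace_at(&mut v, 0, 10)` on a vector declared with `let mut v = vec![1, 2, 3];` leaves `v` holding `10, 2, 3`. Without `mut`, the assignment inside the helper could not be reached, because an immutable vector cannot be borrowed mutably.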
@@ -1492,6 +1519,13 @@ operators, before the expression they apply to.
14921519

14931520
### Binary operator expressions
14941521

1522+
~~~~~~~~{.ebnf .gram}
1523+
binop_expr : expr binop expr ;
1524+
~~~~~~~~
1525+
1526+
Binary operator expressions are given in terms of
1527+
[operator precedence](#operator-precedence).
1528+
14951529
#### Arithmetic operators
14961530

14971531
Binary arithmetic expressions require both their operands to be of the
@@ -1672,10 +1706,15 @@ as
16721706
== !=
16731707
&&
16741708
||
1709+
= <- <->
16751710
~~~~
16761711

16771712
### Unary copy expressions
16781713

1714+
~~~~~~~~{.ebnf .gram}
1715+
copy_expr : "copy" expr ;
1716+
~~~~~~~~
1717+
16791718
A _unary copy expression_ consists of the unary `copy` operator applied to
16801719
some argument expression.
16811720

@@ -1684,8 +1723,8 @@ copies the resulting value, allocating any memory necessary to hold the new
16841723
copy.
16851724

16861725
[Shared boxes](#shared-box-types) (type `@`) are, as usual, shallow-copied, as
1687-
they may be cyclic. [Unique boxes](unique-box-types), [vectors](#vector-types)
1688-
and similar unique types are deep-copied.
1726+
they may be cyclic. [Unique boxes](#unique-box-types),
1727+
[vectors](#vector-types) and similar unique types are deep-copied.
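The shallow-versus-deep distinction can be sketched in present-day Rust, with the caveat that the mapping is only approximate: `Rc` is used here as a stand-in for shared boxes (`@`) and an owned `Vec` as a stand-in for unique types. Both choices, and the function name `copy_semantics`, are assumptions of this example rather than anything this document defines.

```rust
use std::rc::Rc;

/// Returns (shares_allocation, original_first, copied_first).
/// NOTE: `copy_semantics` is a hypothetical name used for illustration.
fn copy_semantics() -> (bool, i32, i32) {
    // Shared, reference-counted box: cloning only bumps the count,
    // so both handles point at the same allocation (shallow copy).
    let shared = Rc::new(vec![1, 2, 3]);
    let shared_copy = Rc::clone(&shared);
    let shares_allocation = Rc::ptr_eq(&shared, &shared_copy);

    // Uniquely owned vector: cloning duplicates every element (deep copy).
    let original = vec![1, 2, 3];
    let mut copied = original.clone();
    copied[0] = 99; // does not affect `original`

    (shares_allocation, original[0], copied[0])
}
```

The shared handles report the same allocation, while the write to the cloned vector leaves the original untouched.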
16891728

16901729
Since the binary [assignment operator](#assignment-operator) `=` performs a
16911730
copy implicitly, the unary copy operator is typically only used to cause an
@@ -1707,6 +1746,10 @@ assert v[0] == 1; // Original was not modified
17071746

17081747
### Unary move expressions
17091748

1749+
~~~~~~~~{.ebnf .gram}
1750+
move_expr : "move" expr ;
1751+
~~~~~~~~
1752+
17101753
This is used to indicate that the referenced _lval_ must be moved out,
17111754
rather than copied, when evaluating this expression. It will only have
17121755
an effect when the expression is _stored_ somewhere or passed to a
@@ -1796,6 +1839,11 @@ way.
17961839

17971840
### While expressions
17981841

1842+
~~~~~~~~{.ebnf .gram}
1843+
while_expr : "while" expr '{' block '}'
1844+
| "do" '{' block '}' "while" expr ;
1845+
~~~~~~~~
1846+
17991847
A `while` expression is a loop construct. A `while` loop may be either a
18001848
simple `while` or a `do`-`while` loop.
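
The difference between the two forms can be sketched in present-day Rust, assuming `do`-`while` has the usual C-family meaning (the condition is evaluated after the body). Modern Rust has no `do`-`while`, so it is emulated with `loop` and `break`. The function names are hypothetical.

```rust
/// Counts how often the body of a simple `while` loop runs.
fn while_runs(start: i32, limit: i32) -> u32 {
    let mut i = start;
    let mut runs = 0;
    while i < limit {
        runs += 1;
        i += 1;
    }
    runs
}

/// Counts how often the body of a `do`-`while` loop runs.
/// Emulated with `loop`: the body runs first, then the condition is tested.
fn do_while_runs(start: i32, limit: i32) -> u32 {
    let mut i = start;
    let mut runs = 0;
    loop {
        runs += 1;
        i += 1;
        if i >= limit {
            break; // condition `i < limit` no longer holds
        }
    }
    runs
}
```

With `start` already equal to `limit`, the simple form never enters the body, whereas the emulated `do`-`while` form executes it once.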
18011849

@@ -1813,7 +1861,7 @@ loop body. If it evaluates to `false`, control exits the loop.
18131861
An example of a simple `while` expression:
18141862

18151863
~~~~
1816-
while (i < 10) {
1864+
while i < 10 {
18171865
print("hello\n");
18181866
i = i + 1;
18191867
}
@@ -1825,17 +1873,25 @@ An example of a `do`-`while` expression:
18251873
do {
18261874
print("hello\n");
18271875
i = i + 1;
1828-
} while (i < 10);
1876+
} while i < 10;
18291877
~~~~
18301878

18311879

18321880
### Break expressions
18331881

1882+
~~~~~~~~{.ebnf .gram}
1883+
break_expr : "break" ;
1884+
~~~~~~~~
1885+
18341886
Executing a `break` expression immediately terminates the innermost loop
18351887
enclosing it. It is only permitted in the body of a loop.
18361888

18371889
### Continue expressions
18381890

1891+
~~~~~~~~{.ebnf .gram}
1892+
cont_expr : "cont" ;
1893+
~~~~~~~~
1894+
18391895
Evaluating a `cont` expression immediately terminates the current iteration of
18401896
the innermost loop enclosing it, returning control to the loop *head*. In the
18411897
case of a `while` loop, the head is the conditional expression controlling the
@@ -1847,6 +1903,10 @@ A `cont` expression is only permitted in the body of a loop.
18471903

18481904
### For expressions
18491905

1906+
~~~~~~~~{.ebnf .gram}
1907+
for_expr : "for" pat "in" expr '{' block '}' ;
1908+
~~~~~~~~
1909+
18501910
A _for loop_ is controlled by a vector or string. The for loop bounds-checks
18511911
the underlying sequence *once* when initiating the loop, then repeatedly
18521912
executes the loop body with the loop variable referencing the successive
@@ -1865,6 +1925,14 @@ for e: foo in v {
18651925

18661926
### If expressions
18671927

1928+
~~~~~~~~{.ebnf .gram}
1929+
if_expr : "if" expr '{' block '}'
1930+
[ "else" else_tail ] ? ;
1931+
1932+
else_tail : [ if_expr
1933+
| '{' block '}' ] ;
1934+
~~~~~~~~
1935+
18681936
An `if` expression is a conditional branch in program control. The form of
18691937
an `if` expression is a condition expression, followed by a consequent
18701938
block, any number of `else if` conditions and blocks, and an optional
@@ -1879,6 +1947,15 @@ then any `else` block is executed.
18791947

18801948
### Alternative expressions
18811949

1950+
~~~~~~~~{.ebnf .gram}
1951+
alt_expr : "alt" expr '{' alt_arm [ '|' alt_arm ] * '}' ;
1952+
1953+
alt_arm : alt_pat '{' block '}' ;
1954+
1955+
alt_pat : pat [ "to" pat ] ? [ "if" expr ] ? ;
1956+
~~~~~~~~
1957+
1958+
18821959
An `alt` expression branches on a *pattern*. The exact form of matching that
18831960
occurs depends on the pattern. Patterns consist of some combination of
18841961
literals, destructured tag constructors, records and tuples, variable binding
@@ -1971,13 +2048,21 @@ let message = alt maybe_digit {
19712048

19722049
### Fail expressions
19732050

2051+
~~~~~~~~{.ebnf .gram}
2052+
fail_expr : "fail" expr ? ;
2053+
~~~~~~~~
2054+
19742055
Evaluating a `fail` expression causes a task to enter the *failing* state. In
19752056
the *failing* state, a task unwinds its stack, destroying all frames and
19762057
freeing all resources until it reaches its entry frame, at which point it
19772058
halts execution in the *dead* state.
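
The unwinding behaviour described here has a loose analogue in present-day Rust, sketched below under stated assumptions: a thread stands in for a task, `panic!` stands in for `fail`, and the default unwinding panic strategy is in effect. The `Resource` type and the function `fail_and_unwind` are hypothetical names for this example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Sets a shared flag when dropped, making the unwind observable.
struct Resource(Arc<AtomicBool>);

impl Drop for Resource {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Spawns a thread that panics while owning a `Resource`.
/// Returns (thread_failed, resource_was_freed).
fn fail_and_unwind() -> (bool, bool) {
    let freed = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&freed);
    let handle = thread::spawn(move || {
        let _held = Resource(flag);
        panic!("explicit failure"); // analogous to a `fail` expression
    });
    // `join` returns `Err` when the thread ended by panicking.
    let failed = handle.join().is_err();
    (failed, freed.load(Ordering::SeqCst))
}
```

The spawned thread ends in failure, and the resource it held is released during unwinding, before `join` reports the error to the parent.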
19782059

19792060
### Note expressions
19802061

2062+
~~~~~~~~{.ebnf .gram}
2063+
note_expr : "note" expr ;
2064+
~~~~~~~~
2065+
19812066
**Note: Note expressions are not yet supported by the compiler.**
19822067

19832068
A `note` expression has no effect during normal execution. The purpose of a
@@ -2023,6 +2108,10 @@ expression.
20232108

20242109
### Return expressions
20252110

2111+
~~~~~~~~{.ebnf .gram}
2112+
ret_expr : "ret" expr ? ;
2113+
~~~~~~~~
2114+
20262115
Return expressions are denoted with the keyword `ret`. Evaluating a `ret`
20272116
expression^[A `ret` expression is analogous to a `return` expression
20282117
in the C family.] moves its argument into the output slot of the current
@@ -2042,6 +2131,10 @@ fn max(a: int, b: int) -> int {
20422131

20432132
### Log expressions
20442133

2134+
~~~~~~~~{.ebnf .gram}
2135+
log_expr : "log" '(' level ',' expr ')' ;
2136+
~~~~~~~~
2137+
20452138
Evaluating a `log` expression may, depending on runtime configuration, cause a
20462139
value to be appended to an internal diagnostic logging buffer provided by the
20472140
runtime or emitted to a system console. Log expressions are enabled or
@@ -2094,6 +2187,10 @@ when it is changed.
20942187

20952188
### Check expressions
20962189

2190+
~~~~~~~~{.ebnf .gram}
2191+
check_expr : "check" call_expr ;
2192+
~~~~~~~~
2193+
20972194
A `check` expression connects dynamic assertions made at run-time to the
20982195
static [typestate system](#typestate-system). A `check` expression takes a
20992196
constraint to check at run-time. If the constraint holds at run-time, control
@@ -2134,13 +2231,21 @@ fn test() {
21342231

21352232
**Note: Prove expressions are not yet supported by the compiler.**
21362233

2234+
~~~~~~~~{.ebnf .gram}
2235+
prove_expr : "prove" call_expr ;
2236+
~~~~~~~~
2237+
21372238
A `prove` expression has no run-time effect. Its purpose is to statically
21382239
check (and document) that its argument constraint holds at its expression
21392240
entry point. If its argument typestate does not hold, under the typestate
21402241
algorithm, the program containing it will fail to compile.
21412242

21422243
### Claim expressions
21432244

2245+
~~~~~~~~{.ebnf .gram}
2246+
claim_expr : "claim" call_expr ;
2247+
~~~~~~~~
2248+
21442249
A `claim` expression is an unsafe variant on a `check` expression that is not
21452250
actually checked at runtime. Thus, using a `claim` implies a proof obligation
21462251
to ensure---without compiler assistance---that an assertion always holds.
@@ -2183,6 +2288,10 @@ if check even(x) {
21832288

21842289
### Assert expressions
21852290

2291+
~~~~~~~~{.ebnf .gram}
2292+
assert_expr : "assert" expr ;
2293+
~~~~~~~~
2294+
21862295
An `assert` expression is similar to a `check` expression, except
21872296
the condition may be any boolean-typed expression, and the compiler makes no
21882297
use of the knowledge that the condition holds if the program continues to
