Commit 64dad2c

committed
Cleanup lifetime guide
Clean pointers guide
1 parent 4462687 commit 64dad2c

2 files changed, +92 -304 lines changed

src/doc/guide-lifetimes.md

Lines changed: 69 additions & 156 deletions
@@ -3,14 +3,12 @@
 # Introduction
 
 References are one of the more flexible and powerful tools available in
-Rust. A reference can point anywhere: into the managed or exchange
-heap, into the stack, and even into the interior of another data structure. A
-reference is as flexible as a C pointer or C++ reference. However,
-unlike C and C++ compilers, the Rust compiler includes special static checks
-that ensure that programs use references safely. Another advantage of
-references is that they are invisible to the garbage collector, so
-working with references helps reduce the overhead of automatic memory
-management.
+Rust. They can point anywhere: into the heap, stack, and even into the
+interior of another data structure. A reference is as flexible as a C pointer
+or C++ reference.
+
+Unlike C and C++ compilers, the Rust compiler includes special static
+checks that ensure that programs use references safely.
 
 Despite their complete safety, a reference's representation at runtime
 is the same as that of an ordinary pointer in a C program. They introduce zero
@@ -26,7 +24,7 @@ through several examples.
 
 References, sometimes known as *borrowed pointers*, are only valid for
 a limited duration. References never claim any kind of ownership
-over the data that they point to: instead, they are used for cases
+over the data that they point to; instead, they are used for cases
 where you would like to use data for a short time.
 
 As an example, consider a simple struct type `Point`:
@@ -36,27 +34,23 @@ struct Point {x: f64, y: f64}
 ~~~
 
 We can use this simple definition to allocate points in many different ways. For
-example, in this code, each of these three local variables contains a
-point, but allocated in a different place:
+example, in this code, each of these local variables contains a point,
+but allocated in a different place:
 
 ~~~
 # struct Point {x: f64, y: f64}
-let on_the_stack : Point = Point {x: 3.0, y: 4.0};
-let managed_box : @Point = @Point {x: 5.0, y: 1.0};
-let owned_box : Box<Point> = box Point {x: 7.0, y: 9.0};
+let on_the_stack : Point = Point {x: 3.0, y: 4.0};
+let on_the_heap : Box<Point> = box Point {x: 7.0, y: 9.0};
 ~~~
 
 Suppose we wanted to write a procedure that computed the distance between any
-two points, no matter where they were stored. For example, we might like to
-compute the distance between `on_the_stack` and `managed_box`, or between
-`managed_box` and `owned_box`. One option is to define a function that takes
-two arguments of type `Point`—that is, it takes the points by value. But if we
-define it this way, calling the function will cause the points to be
-copied. For points, this is probably not so bad, but often copies are
+two points, no matter where they were stored. One option is to define a function
+that takes two arguments of type `Point`—that is, it takes the points __by value__.
+But if we define it this way, calling the function will cause the points __to be
+copied__. For points, this is probably not so bad, but often copies are
 expensive. Worse, if the data type contains mutable fields, copying can change
-the semantics of your program in unexpected ways. So we'd like to define a
-function that takes the points by pointer. We can use references to do
-this:
+the semantics of your program in unexpected ways. So we'd like to define
+a function that takes the points by __reference__ (as __borrowed pointers__):
 
 ~~~
 # struct Point {x: f64, y: f64}
@@ -68,30 +62,27 @@ fn compute_distance(p1: &Point, p2: &Point) -> f64 {
 }
 ~~~
 
-Now we can call `compute_distance()` in various ways:
+Now we can call `compute_distance()`:
 
 ~~~
 # struct Point {x: f64, y: f64}
 # let on_the_stack : Point = Point{x: 3.0, y: 4.0};
-# let managed_box : @Point = @Point{x: 5.0, y: 1.0};
-# let owned_box : Box<Point> = box Point{x: 7.0, y: 9.0};
+# let on_the_heap : Box<Point> = box Point{x: 7.0, y: 9.0};
 # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
-compute_distance(&on_the_stack, managed_box);
-compute_distance(managed_box, owned_box);
+compute_distance(&on_the_stack, on_the_heap);
 ~~~
 
 Here, the `&` operator takes the address of the variable
 `on_the_stack`; this is because `on_the_stack` has the type `Point`
 (that is, a struct value) and we have to take its address to get a
 value. We also call this _borrowing_ the local variable
-`on_the_stack`, because we have created an alias: that is, another
+`on_the_stack`, because we have created __an alias__: that is, another
 name for the same data.
 
-In contrast, we can pass the boxes `managed_box` and `owned_box` to
-`compute_distance` directly. The compiler automatically converts a box like
-`@Point` or `~Point` to a reference like `&Point`. This is another form
-of borrowing: in this case, the caller lends the contents of the managed or
-owned box to the callee.
+In contrast, we can pass `on_the_heap` to `compute_distance` directly.
+The compiler automatically converts a box like `Box<Point>` to a reference like
+`&Point`. This is another form of borrowing: in this case, the caller lends
+the contents of the box to the callee.
 
 Whenever a caller lends data to a callee, there are some limitations on what
 the caller can do with the original. For example, if the contents of a
@@ -134,10 +125,10 @@ let on_the_stack2 : &Point = &tmp;
 
 # Taking the address of fields
 
-As in C, the `&` operator is not limited to taking the address of
+The `&` operator is not limited to taking the address of
 local variables. It can also take the address of fields or
 individual array elements. For example, consider this type definition
-for `rectangle`:
+for `Rectangle`:
 
 ~~~
 struct Point {x: f64, y: f64} // as before
@@ -153,9 +144,7 @@ Now, as before, we can define rectangles in a few different ways:
 # struct Rectangle {origin: Point, size: Size}
 let rect_stack = &Rectangle {origin: Point {x: 1.0, y: 2.0},
                              size: Size {w: 3.0, h: 4.0}};
-let rect_managed = @Rectangle {origin: Point {x: 3.0, y: 4.0},
-                               size: Size {w: 3.0, h: 4.0}};
-let rect_owned = box Rectangle {origin: Point {x: 5.0, y: 6.0},
+let rect_heap = box Rectangle {origin: Point {x: 5.0, y: 6.0},
                                size: Size {w: 3.0, h: 4.0}};
 ~~~
 
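The two borrowing patterns described here (taking the address of a field inside a stack value or a boxed value, and handing a boxed point to a function that expects `&Point`) can be sketched in current stable Rust. This is a minimal sketch under a few assumptions: `Box::new` stands in for the guide's `box` keyword, the Euclidean body of `compute_distance` is supplied here because the guide only shows a stub, and `distance_between_origins` and `distance_stack_to_heap` are hypothetical helper names. One difference from the text: current Rust does not convert a bare `Box<Point>` argument to `&Point`, so the sketch writes `&on_the_heap` and relies on deref coercion from `&Box<Point>` to `&Point`.

```rust
// Sketch in current stable Rust; `Box::new` replaces the guide's `box` keyword.
struct Point {
    x: f64,
    y: f64,
}

#[allow(dead_code)] // `w` and `h` are never read in this sketch
struct Size {
    w: f64,
    h: f64,
}

#[allow(dead_code)] // `size` is never read in this sketch
struct Rectangle {
    origin: Point,
    size: Size,
}

// Assumed body: plain Euclidean distance (the guide only shows a stub).
fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let (dx, dy) = (p1.x - p2.x, p1.y - p2.y);
    (dx * dx + dy * dy).sqrt()
}

// Hypothetical helper: borrow the `origin` field of a stack rectangle
// and of a boxed (heap) rectangle.
fn distance_between_origins() -> f64 {
    let rect_stack = Rectangle {
        origin: Point { x: 1.0, y: 2.0 },
        size: Size { w: 3.0, h: 4.0 },
    };
    let rect_heap = Box::new(Rectangle {
        origin: Point { x: 5.0, y: 6.0 },
        size: Size { w: 3.0, h: 4.0 },
    });
    // Field access auto-derefs through the Box, so `&rect_heap.origin`
    // is a `&Point` pointing into the heap allocation.
    compute_distance(&rect_stack.origin, &rect_heap.origin)
}

// Hypothetical helper: pass a stack value and a boxed value to the same
// `&Point` parameters.
fn distance_stack_to_heap() -> f64 {
    let on_the_stack = Point { x: 3.0, y: 4.0 };
    let on_the_heap: Box<Point> = Box::new(Point { x: 7.0, y: 9.0 });
    // `&on_the_heap` has type `&Box<Point>`; deref coercion turns it into `&Point`.
    compute_distance(&on_the_stack, &on_the_heap)
}
```

None of these calls copies or moves the points: `rect_stack`, `rect_heap`, `on_the_stack` and `on_the_heap` stay owned by their functions and are dropped normally when those functions return.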
@@ -167,109 +156,29 @@ operator. For example, I could write:
 # struct Size {w: f64, h: f64} // as before
 # struct Rectangle {origin: Point, size: Size}
 # let rect_stack = &Rectangle {origin: Point {x: 1.0, y: 2.0}, size: Size {w: 3.0, h: 4.0}};
-# let rect_managed = @Rectangle {origin: Point {x: 3.0, y: 4.0}, size: Size {w: 3.0, h: 4.0}};
-# let rect_owned = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}};
+# let rect_heap = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}};
 # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 }
-compute_distance(&rect_stack.origin, &rect_managed.origin);
+compute_distance(&rect_stack.origin, &rect_heap.origin);
 ~~~
 
 which would borrow the field `origin` from the rectangle on the stack
-as well as from the managed box, and then compute the distance between them.
+as well as from the owned box, and then compute the distance between them.
 
-# Borrowing managed boxes and rooting
+# Lifetimes
 
-We’ve seen a few examples so far of borrowing heap boxes, both managed
-and owned. Up till this point, we’ve glossed over issues of
-safety. As stated in the introduction, at runtime a reference
-is simply a pointer, nothing more. Therefore, avoiding C's problems
-with dangling pointers requires a compile-time safety check.
+We’ve seen a few examples of borrowing data. Up till this point, we’ve glossed
+over issues of safety. As stated in the introduction, at runtime a reference
+is simply a pointer, nothing more. Therefore, avoiding C's problems with
+dangling pointers requires a compile-time safety check.
 
-The basis for the check is the notion of _lifetimes_. A lifetime is a
+The basis for the check is the notion of __lifetimes__. A lifetime is a
 static approximation of the span of execution during which the pointer
 is valid: it always corresponds to some expression or block within the
-program. Code inside that expression can use the pointer without
-restrictions. But if the pointer escapes from that expression (for
-example, if the expression contains an assignment expression that
-assigns the pointer to a mutable field of a data structure with a
-broader scope than the pointer itself), the compiler reports an
-error. We'll be discussing lifetimes more in the examples to come, and
-a more thorough introduction is also available.
-
-When the `&` operator creates a reference, the compiler must
-ensure that the pointer remains valid for its entire
-lifetime. Sometimes this is relatively easy, such as when taking the
-address of a local variable or a field that is stored on the stack:
-
-~~~
-struct X { f: int }
-fn example1() {
-    let mut x = X { f: 3 };
-    let y = &mut x.f;  // -+ L
-    // ...             //  |
-}                      // -+
-~~~
-
-Here, the lifetime of the reference `y` is simply L, the
-remainder of the function body. The compiler need not do any other
-work to prove that code will not free `x.f`. This is true even if the
-code mutates `x`.
-
-The situation gets more complex when borrowing data inside heap boxes:
-
-~~~
-# struct X { f: int }
-fn example2() {
-    let mut x = @X { f: 3 };
-    let y = &x.f;      // -+ L
-    // ...             //  |
-}                      // -+
-~~~
-
-In this example, the value `x` is a heap box, and `y` is therefore a
-pointer into that heap box. Again the lifetime of `y` is L, the
-remainder of the function body. But there is a crucial difference:
-suppose `x` were to be reassigned during the lifetime L? If the
-compiler isn't careful, the managed box could become *unrooted*, and
-would therefore be subject to garbage collection. A heap box that is
-unrooted is one such that no pointer values in the heap point to
-it. It would violate memory safety for the box that was originally
-assigned to `x` to be garbage-collected, since a non-heap
-pointer *`y`* still points into it.
-
-> *Note:* Our current implementation implements the garbage collector
-> using reference counting and cycle detection.
-
-For this reason, whenever an `&` expression borrows the interior of a
-managed box stored in a mutable location, the compiler inserts a
-temporary that ensures that the managed box remains live for the
-entire lifetime. So, the above example would be compiled as if it were
-written
-
-~~~
-# struct X { f: int }
-fn example2() {
-    let mut x = @X {f: 3};
-    let x1 = x;
-    let y = &x1.f;     // -+ L
-    // ...             //  |
-}                      // -+
-~~~
-
-Now if `x` is reassigned, the pointer `y` will still remain valid. This
-process is called *rooting*.
-
-# Borrowing owned boxes
-
-The previous example demonstrated *rooting*, the process by which the
-compiler ensures that managed boxes remain live for the duration of a
-borrow. Unfortunately, rooting does not work for borrows of owned
-boxes, because it is not possible to have two references to an owned
-box.
-
-For owned boxes, therefore, the compiler will only allow a borrow *if
-the compiler can guarantee that the owned box will not be reassigned
-or moved for the lifetime of the pointer*. This does not necessarily
-mean that the owned box is stored in immutable memory. For example,
+program.
+
+The compiler will only allow a borrow *if it can guarantee that the data will
+not be reassigned or moved for the lifetime of the pointer*. This does not
+necessarily mean that the data is stored in immutable memory. For example,
 the following function is legal:
 
 ~~~
@@ -294,7 +203,7 @@ and `x` is declared as mutable. However, the compiler can prove that
 and in fact is mutated later in the function.
 
 It may not be clear why we are so concerned about mutating a borrowed
-variable. The reason is that the runtime system frees any owned box
+variable. The reason is that the runtime system frees any box
 _as soon as its owning reference changes or goes out of
 scope_. Therefore, a program like this is illegal (and would be
 rejected by the compiler):
@@ -337,31 +246,34 @@ Once the reassignment occurs, the memory will look like this:
 +---------+
 ~~~
 
-Here you can see that the variable `y` still points at the old box,
-which has been freed.
+Here you can see that the variable `y` still points at the old `f`
+field of `Foo`, which has been freed.
 
 In fact, the compiler can apply the same kind of reasoning to any
-memory that is _(uniquely) owned by the stack frame_. So we could
+memory that is (uniquely) owned by the stack frame. So we could
 modify the previous example to introduce additional owned pointers
 and structs, and the compiler will still be able to detect possible
-mutations:
+mutations. This time, we'll use an analogy to illustrate the concept.
 
 ~~~ {.ignore}
 fn example3() -> int {
-    struct R { g: int }
-    struct S { f: Box<R> }
+    struct House { owner: Box<Person> }
+    struct Person { age: int }
 
-    let mut x = box S {f: box R {g: 3}};
-    let y = &x.f.g;
-    x = box S {f: box R {g: 4}}; // Error reported here.
-    x.f = box R {g: 5}; // Error reported here.
-    *y
+    let mut house = box House {
+        owner: box Person {age: 30}
+    };
+
+    let owner_age = &house.owner.age;
+    house = box House {owner: box Person {age: 40}}; // Error reported here.
+    house.owner = box Person {age: 50}; // Error reported here.
+    *owner_age
 }
 ~~~
 
-In this case, two errors are reported, one when the variable `x` is
-modified and another when `x.f` is modified. Either modification would
-invalidate the pointer `y`.
+In this case, two errors are reported, one when the variable `house` is
+modified and another when `house.owner` is modified. Either modification would
+invalidate the pointer `owner_age`.
 
 # Borrowing and enums
 
@@ -412,7 +324,7 @@ circle constant][tau] and not that dreadfully outdated notion of pi).
 
 The second match is more interesting. Here we match against a
 rectangle and extract its size: but rather than copy the `size`
-struct, we use a by-reference binding to create a pointer to it. In
+struct, we use a __by-reference binding__ to create a pointer to it. In
 other words, a pattern binding like `ref size` binds the name `size`
 to a pointer of type `&Size` into the _interior of the enum_.
 
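A small sketch of the `ref` binding in current Rust. The `Shape` enum here is a hypothetical stand-in (the guide's own definition is outside this diff), assumed to store a center point and radius for a circle and an origin and a `Size` for a rectangle. Because `area` matches on `*shape`, which is borrowed, binding `size` by value would try to move a non-`Copy` `Size` out of the borrowed enum and be rejected; `ref size` instead yields a `&Size` pointing into the enum's interior.

```rust
// Hypothetical types for illustration; the guide's own `Shape` is not shown in this diff.
#[allow(dead_code)] // the center/origin coordinates are never read here
struct Point {
    x: f64,
    y: f64,
}

struct Size {
    w: f64,
    h: f64,
}

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Size),
}

fn area(shape: &Shape) -> f64 {
    match *shape {
        // `f64` is `Copy`, so binding the radius by value just copies it.
        Shape::Circle(_, radius) => 0.5 * std::f64::consts::TAU * radius * radius,
        // `ref size` binds `size: &Size`, a pointer into the enum's interior,
        // instead of moving the `Size` out of the borrowed `Shape`.
        Shape::Rectangle(_, ref size) => size.w * size.h,
    }
}
```

In current Rust, writing `match shape` (without the `*`) gives the same result through default binding modes: `size` is bound as `&Size` automatically.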
@@ -526,12 +438,12 @@ time one that does not compile:
 
 ~~~ {.ignore}
 struct Point {x: f64, y: f64}
-fn get_x_sh(p: @Point) -> &f64 {
+fn get_x_sh(p: &Point) -> &f64 {
     &p.x // Error reported here
 }
 ~~~
 
-Here, the function `get_x_sh()` takes a managed box as input and
+Here, the function `get_x_sh()` takes a reference as input and
 returns a reference. As before, the lifetime of the reference
 that will be returned is a parameter (specified by the
 caller). That means that `get_x_sh()` promises to return a reference
@@ -540,17 +452,18 @@ subtly different from the first example, which promised to return a
 pointer that was valid for as long as its pointer argument was valid.
 
 Within `get_x_sh()`, we see the expression `&p.x` which takes the
-address of a field of a managed box. The presence of this expression
-implies that the compiler must guarantee that, so long as the
-resulting pointer is valid, the managed box will not be reclaimed by
-the garbage collector. But recall that `get_x_sh()` also promised to
+address of a field of a `Point`. The presence of this expression
+implies that the compiler must guarantee that, so long as the
+resulting pointer is valid, the original `Point` won't be moved or changed.
+
+But recall that `get_x_sh()` also promised to
 return a pointer that was valid for as long as the caller wanted it to
 be. Clearly, `get_x_sh()` is not in a position to make both of these
 guarantees; in fact, it cannot guarantee that the pointer will remain
 valid at all once it returns, as the parameter `p` may or may not be
 live in the caller. Therefore, the compiler will report an error here.
 
-In general, if you borrow a managed (or owned) box to create a
+In general, if you borrow a struct or box to create a
 reference, it will only be valid within the function
 and cannot be returned. This is why the typical way to return references
 is to take references as input (the only other case in
