\begin{document}

\title{Miri: \\ \smaller{An interpreter for Rust's mid-level intermediate representation}}
- % \subtitle{test}
\author{Scott Olson\footnote{\href{mailto:scott@solson.me}{scott@solson.me}} \\
\smaller{Supervised by Christopher Dutchyn}}
- \date{April 8th, 2016}
+ \date{April 12th, 2016}
\maketitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -155,14 +154,15 @@ \subsection{Flaws}

\section{Current implementation}

- Roughly halfway through my time working on Miri, Rust compiler team member Eduard
- Burtescu\footnote{\href{https://www.rust-lang.org/team.html\#Compiler}{The Rust compiler team}} made
- a post on Rust's internal
- forums\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's
- ``Rust Abstract Machine'' forum post}} about a ``Rust Abstract Machine'' specification which could
- be used to implement more powerful compile-time function execution, similar to what is supported by
- C++14's \mintinline{cpp}{constexpr} feature. After clarifying some of the details of the abstract
- machine's data layout with Burtescu via IRC, I started implementing it in Miri.
+ Roughly halfway through my time working on Miri, Eduard
+ Burtescu\footnote{\href{https://github.com/eddyb}{Eduard Burtescu on GitHub}} from the Rust compiler
+ team\footnote{\href{https://www.rust-lang.org/team.html\#Compiler}{The Rust compiler team}} made a
+ post on Rust's internal forums about a ``Rust Abstract Machine''
+ specification\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's
+ ``Rust Abstract Machine'' forum post}} which could be used to implement more powerful compile-time
+ function execution, similar to what is supported by C++14's \mintinline{cpp}{constexpr} feature.
+ After clarifying some of the details of the abstract machine's data layout with Burtescu via IRC, I
+ started implementing it in Miri.

\subsection{Raw value representation}

@@ -224,7 +224,7 @@ \subsubsection{Undefined byte mask}

See \autoref{fig:undef} for an example undefined byte, represented by underscores. Note that there
would still be a value for the second byte in the byte array, but we don't care what it is. The
- bitmask would be $10_2$, i.e. \rust{[true, false]}.
+ bitmask would be $10_2$, i.e.\ \rust{[true, false]}.

\begin{figure}[hb]
\begin{minted}[autogobble]{rust}
@@ -237,12 +237,179 @@ \subsubsection{Undefined byte mask}
\label{fig:undef}
\end{figure}

- % TODO(tsion): Find a place for this text.
- % Making Miri work was primarily an implementation problem. Writing an interpreter which models values
- % of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some
- % unconventional techniques compared to many interpreters. Miri's execution remains safe even while
- % simulating execution of unsafe code, which allows it to detect when unsafe code does something
- % invalid.
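The undefined byte mask described above can be sketched as follows. This is only an illustration under stated assumptions: the \rust{Allocation} type and its methods are hypothetical names, not Miri's actual API, and the mask is stored as one \rust{bool} per byte rather than as a packed bitmask.

```rust
/// Hypothetical allocation: raw bytes plus one "defined" flag per byte.
/// (Illustrative only; a real implementation could pack the flags into a bitmask.)
pub struct Allocation {
    bytes: Vec<u8>,
    defined: Vec<bool>,
}

impl Allocation {
    /// A fresh allocation starts out entirely undefined.
    pub fn new(size: usize) -> Self {
        Allocation { bytes: vec![0; size], defined: vec![false; size] }
    }

    /// Writing bytes stores them and marks them as defined.
    pub fn write(&mut self, offset: usize, data: &[u8]) {
        let end = offset + data.len();
        self.bytes[offset..end].copy_from_slice(data);
        for flag in &mut self.defined[offset..end] {
            *flag = true;
        }
    }

    /// Reading succeeds only if every byte in the range is defined.
    pub fn read(&self, offset: usize, len: usize) -> Result<&[u8], &'static str> {
        let end = offset + len;
        if self.defined[offset..end].iter().all(|&d| d) {
            Ok(&self.bytes[offset..end])
        } else {
            Err("attempted to read undefined bytes")
        }
    }

    /// The definedness mask, one entry per byte.
    pub fn defined_mask(&self) -> &[bool] {
        &self.defined
    }
}
```

Writing one byte into a fresh two-byte allocation leaves the mask at \rust{[true, false]}, matching the example in the text: the second byte still holds some value, but reading it is rejected.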
+ \subsection{Computing data layout}
+
+ Currently, the Rust compiler's data layout computations used in translation from MIR to LLVM IR are
+ hidden from Miri, so I do my own basic data layout computation which doesn't generally match what
+ translation does. In the future, the Rust compiler may be modified so that Miri can use the exact
+ same data layout.
+
+ Miri's data layout calculation is a relatively simple transformation from Rust types to a basic
+ structure with constant size values for primitives and sets of fields with offsets for aggregate
+ types. These layouts are cached for performance.
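A minimal sketch of such a layout computation is given below. The \rust{Ty}, \rust{Layout}, and \rust{LayoutCache} types are hypothetical stand-ins invented for this example, and the sketch assumes fields are packed back to back with no padding or alignment, which is one reason a simple scheme like this would not match what the compiler's translation does.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified type representation (not rustc's).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Ty {
    Bool,
    Int { bytes: usize },
    Tuple(Vec<Ty>),
}

/// Primitives have a constant size; aggregates also record field offsets.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Layout {
    Primitive { size: usize },
    Aggregate { size: usize, field_offsets: Vec<usize> },
}

impl Layout {
    pub fn size(&self) -> usize {
        match *self {
            Layout::Primitive { size } | Layout::Aggregate { size, .. } => size,
        }
    }
}

/// Computes layouts on demand and caches them per type.
#[derive(Default)]
pub struct LayoutCache {
    cache: HashMap<Ty, Layout>,
}

impl LayoutCache {
    pub fn layout_of(&mut self, ty: &Ty) -> Layout {
        if let Some(layout) = self.cache.get(ty) {
            return layout.clone();
        }
        let layout = match ty {
            Ty::Bool => Layout::Primitive { size: 1 },
            Ty::Int { bytes } => Layout::Primitive { size: *bytes },
            Ty::Tuple(fields) => {
                // Assumption: fields are laid out in order with no padding.
                let mut offset = 0;
                let mut field_offsets = Vec::with_capacity(fields.len());
                for field in fields {
                    field_offsets.push(offset);
                    offset += self.layout_of(field).size();
                }
                Layout::Aggregate { size: offset, field_offsets }
            }
        };
        self.cache.insert(ty.clone(), layout.clone());
        layout
    }
}
```

Under these assumptions, a tuple of a \rust{bool}, a 4-byte integer, and a 2-byte integer gets offsets 0, 1, and 5 and a total size of 7 bytes.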
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+ \section{Deterministic execution}
+ \label{sec:deterministic}
+
+ In order to be effective as a compile-time evaluator, Miri must have \emph{deterministic execution},
+ as explained by Burtescu in the ``Rust Abstract Machine'' post. That is, given a function and
+ arguments to that function, Miri should always produce identical results. This is important for
+ coherence in the type checker when constant evaluations are involved in types, such as for sizes of
+ array types:
+
+ \begin{minted}[autogobble,mathescape]{rust}
+ const fn get_size() -> usize { /* $\ldots$ */ }
+ let array: [i32; get_size()];
+ \end{minted}
+
+ Since Miri allows execution of unsafe code\footnote{In fact, the distinction between safe and unsafe
+ doesn't exist at the MIR level.}, it is specifically designed to remain safe while interpreting
+ potentially unsafe code. When Miri encounters an unrecoverable error, it reports it via the Rust
+ compiler's usual error reporting mechanism, pointing to the part of the original code where the
+ error occurred. For example:
+
+ \begin{minted}[autogobble]{rust}
+ let b = Box::new(42);
+ let p: *const i32 = &*b;
+ drop(b);
+ unsafe { *p }
+ //       ~~ error: dangling pointer
+ //          was dereferenced
+ \end{minted}
+ \label{dangling-pointer}
+
+ There are more examples in Miri's
+ repository.\footnote{\href{https://github.com/tsion/miri/blob/master/test/errors.rs}{Miri's error
+ tests}}
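One way an interpreter can detect the dangling dereference above is to give every allocation an identifier that is never reused, so a pointer to a freed allocation no longer resolves. The sketch below assumes this scheme; \rust{Memory}, \rust{Pointer}, and \rust{EvalError} are hypothetical names for illustration and are not taken from Miri's source.

```rust
use std::collections::HashMap;

/// Hypothetical abstract pointer: an allocation identifier plus an offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pointer {
    pub alloc_id: u64,
    pub offset: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EvalError {
    DanglingPointerDeref,
    PointerOutOfBounds,
}

/// Hypothetical interpreter memory. Identifiers are never reused, so a
/// pointer into a freed allocation can always be recognized as dangling.
#[derive(Default)]
pub struct Memory {
    allocations: HashMap<u64, Vec<u8>>,
    next_id: u64,
}

impl Memory {
    pub fn allocate(&mut self, size: usize) -> Pointer {
        let id = self.next_id;
        self.next_id += 1;
        self.allocations.insert(id, vec![0; size]);
        Pointer { alloc_id: id, offset: 0 }
    }

    pub fn deallocate(&mut self, ptr: Pointer) -> Result<(), EvalError> {
        match self.allocations.remove(&ptr.alloc_id) {
            Some(_) => Ok(()),
            None => Err(EvalError::DanglingPointerDeref),
        }
    }

    pub fn write_u8(&mut self, ptr: Pointer, value: u8) -> Result<(), EvalError> {
        let bytes = self
            .allocations
            .get_mut(&ptr.alloc_id)
            .ok_or(EvalError::DanglingPointerDeref)?;
        let slot = bytes.get_mut(ptr.offset).ok_or(EvalError::PointerOutOfBounds)?;
        *slot = value;
        Ok(())
    }

    pub fn read_u8(&self, ptr: Pointer) -> Result<u8, EvalError> {
        let bytes = self
            .allocations
            .get(&ptr.alloc_id)
            .ok_or(EvalError::DanglingPointerDeref)?;
        bytes.get(ptr.offset).copied().ok_or(EvalError::PointerOutOfBounds)
    }
}
```

Because every access goes through a checked lookup, the interpreter itself never touches freed memory; the interpreted program's mistake surfaces as an ordinary \rust{Err} value that can be reported to the user.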
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+ \section{Language support}
+
+ In its current state, Miri supports a large proportion of the Rust language, with a few major
+ exceptions such as the lack of support for FFI\footnote{Foreign Function Interface, e.g.\ calling
+ functions defined in assembly, C, or C++.}, which eliminates possibilities like reading and writing
+ files, user input, graphics, and more. The following is a tour of what is currently supported.
+
+ \subsection{Primitives}
+
+ Miri supports booleans and integers of various sizes and signedness (i.e.\ \rust{i8}, \rust{i16},
+ \rust{i32}, \rust{i64}, \rust{isize}, \rust{u8}, \rust{u16}, \rust{u32}, \rust{u64}, \rust{usize}),
+ as well as unary and boolean operations over these types. The \rust{isize} and \rust{usize} types
+ are sized according to the target machine's pointer size, just like in compiled Rust. The
+ \rust{char} and float types (\rust{f32}, \rust{f64}) are not supported yet, but there are no known
+ barriers to supporting them.
+
+ When examining a boolean in an \rust{if} condition, Miri will report an error if it is not precisely
+ 0 or 1, since this is undefined behaviour in Rust. The \rust{char} type has similar restrictions to
+ check for once it is implemented.
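The boolean check can be sketched as a small validation step applied when a raw byte is interpreted as a \rust{bool}. The function name \rust{read_bool} and the error message are assumptions made for this example.

```rust
/// Interprets a raw byte as a boolean, rejecting anything other than 0 or 1.
/// (Hypothetical helper; the name and error text are illustrative.)
pub fn read_bool(byte: u8) -> Result<bool, String> {
    match byte {
        0 => Ok(false),
        1 => Ok(true),
        other => Err(format!("invalid boolean value: {}", other)),
    }
}
```

A byte such as 2 can only end up in a \rust{bool} through unsafe code, so rejecting it here turns undefined behaviour into a reportable error.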
+
+ \subsection{Pointers}
+
+ Both references and raw pointers are supported, with essentially no difference between them in Miri.
+ It is also possible to do basic pointer comparisons and arithmetic. However, a few operations are
+ considered errors and a few require special support.
+
+ First, pointers into the same allocation may be compared for ordering, but pointers into
+ different allocations are considered unordered and Miri reports an error if a program attempts to
+ order them. The reasoning is that different allocations may have different orderings in the global
+ address space at runtime, making the result non-deterministic. However, pointers into different
+ allocations \emph{may} be compared for direct equality (they are always unequal).
+
+ Second, for things like null pointer checks, abstract pointers (the kind represented using
+ relocations) may be compared against pointers cast from integers (e.g.\ \rust{0 as *const i32}).
+ To handle these cases, Miri has a concept of ``integer pointers'' which are always unequal to
+ abstract pointers. Integer pointers can be compared and operated upon freely. However, note that it
+ is impossible to go from an integer pointer to an abstract pointer backed by a relocation. It is not
+ valid to dereference an integer pointer.
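These comparison rules can be sketched as below. The \rust{Pointer} enum and the two functions are hypothetical and only model the rules stated above. Treating an ordering comparison between an abstract pointer and an integer pointer as an error is an assumption of this sketch, since the text does not specify that case.

```rust
use std::cmp::Ordering;

/// Hypothetical model of the two pointer kinds described in the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Pointer {
    /// Abstract pointer: an allocation identifier plus an offset into it.
    Abstract { alloc_id: u64, offset: usize },
    /// Pointer cast from a plain integer, e.g. `0 as *const i32`.
    Integer(usize),
}

/// Equality is always defined: pointers into different allocations are
/// unequal, and an integer pointer never equals an abstract pointer.
pub fn ptr_eq(a: Pointer, b: Pointer) -> bool {
    a == b
}

/// Ordering is defined only within one allocation or between two integer
/// pointers; every other combination is rejected (assumption of this sketch).
pub fn ptr_cmp(a: Pointer, b: Pointer) -> Result<Ordering, &'static str> {
    match (a, b) {
        (
            Pointer::Abstract { alloc_id: x, offset: i },
            Pointer::Abstract { alloc_id: y, offset: j },
        ) if x == y => Ok(i.cmp(&j)),
        (Pointer::Integer(i), Pointer::Integer(j)) => Ok(i.cmp(&j)),
        _ => Err("pointers are unordered"),
    }
}
```

A null check such as \rust{p == 0 as *const i32} then evaluates to false for any abstract pointer without ever needing a concrete address.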
+
+ \subsubsection{Slice pointers}
+
+ Rust supports pointers to ``dynamically-sized types'' such as \rust{[T]} and \rust{str} which
+ represent arrays of indeterminate size. Pointers to such types contain an address \emph{and} the
+ length of the referenced array. Miri supports these fully.
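The address-plus-length representation can be observed from ordinary Rust code. The helper names below are made up for this example, and the two-word size of a slice reference reflects how current Rust compilers represent it rather than a documented language guarantee.

```rust
use std::mem::size_of;

/// Number of machine words occupied by a `&[i32]` (two in current compilers).
pub fn slice_ref_words() -> usize {
    size_of::<&[i32]>() / size_of::<usize>()
}

/// Splits a slice reference into the two parts it carries.
pub fn slice_parts(s: &[i32]) -> (*const i32, usize) {
    (s.as_ptr(), s.len())
}
```

Taking a sub-slice such as \rust{&data[1..3]} yields a new pointer whose address is that of \rust{data[1]} and whose length is 2, while the underlying array is untouched.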
+
+ \subsubsection{Trait objects}
+
+ Rust also supports pointers to ``trait objects'' which represent some type that implements a trait,
+ with the specific type unknown at compile-time. These are implemented using virtual dispatch with a
+ vtable, similar to virtual methods in C++. Miri does not currently support this at all.
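For readers unfamiliar with trait objects, the following small example shows the kind of code that relies on them. It is written with the \rust{dyn} keyword used by current Rust, and the \rust{Shape} trait and its implementors are invented for illustration.

```rust
pub trait Shape {
    fn area(&self) -> u32;
}

pub struct Square(pub u32);
pub struct Rect(pub u32, pub u32);

impl Shape for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> u32 {
        self.0 * self.1
    }
}

/// The concrete type behind each `&dyn Shape` is not known here, so every
/// call to `area` is dispatched at runtime through the object's vtable.
pub fn total_area(shapes: &[&dyn Shape]) -> u32 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```

An interpreter that supports this needs to model the vtable half of the pointer in addition to the data address, which is the part the text says is still missing.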
+
+ \subsection{Aggregates}
+
+ Aggregates include types declared as \rust{struct} or \rust{enum} as well as tuples, arrays, and
+ closures\footnote{Closures are essentially structs with a field for each variable captured by the
+ closure.}. Miri supports all common uses of these types. The main missing piece is handling
+ \texttt{\#[repr(..)]} annotations which adjust the layout of a \rust{struct} or \rust{enum}.
+
+ \subsection{Control flow}
+
+ All of Rust's standard control flow features, including \rust{loop}, \rust{while}, \rust{for},
+ \rust{if}, \rust{if let}, \rust{while let}, \rust{match}, \rust{break}, \rust{continue}, and
+ \rust{return} are supported. In fact, supporting these was quite easy since the Rust compiler
+ reduces them all down to a comparatively small set of control-flow graph primitives in MIR.
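To illustrate why a small set of control-flow graph primitives is enough, here is a sketch of a \rust{while} loop lowered to basic blocks and run by a tiny interpreter. The \rust{Statement}, \rust{Terminator}, and \rust{BasicBlock} types are heavily simplified, hypothetical stand-ins and do not mirror MIR's real data structures.

```rust
/// Hypothetical block terminators: the only control-flow primitives needed here.
pub enum Terminator {
    Goto(usize),
    /// Branch on whether local `cond` is non-zero.
    If { cond: usize, then_block: usize, else_block: usize },
    Return,
}

pub enum Statement {
    /// locals[dest] = locals[a] + locals[b]
    Add { dest: usize, a: usize, b: usize },
    /// locals[dest] = 1 if locals[a] < locals[b], else 0
    Lt { dest: usize, a: usize, b: usize },
}

pub struct BasicBlock {
    pub statements: Vec<Statement>,
    pub terminator: Terminator,
}

/// Runs the blocks starting at block 0 until a `Return` terminator is reached.
pub fn run(blocks: &[BasicBlock], locals: &mut [i64]) {
    let mut current = 0;
    loop {
        let block = &blocks[current];
        for stmt in &block.statements {
            match *stmt {
                Statement::Add { dest, a, b } => locals[dest] = locals[a] + locals[b],
                Statement::Lt { dest, a, b } => locals[dest] = (locals[a] < locals[b]) as i64,
            }
        }
        current = match block.terminator {
            Terminator::Goto(target) => target,
            Terminator::If { cond, then_block, else_block } => {
                if locals[cond] != 0 { then_block } else { else_block }
            }
            Terminator::Return => return,
        };
    }
}

/// `while i < n { sum += i; i += 1; }` with locals
/// 0 = i, 1 = n, 2 = sum, 3 = the constant one, 4 = loop condition.
pub fn while_loop_cfg() -> Vec<BasicBlock> {
    vec![
        BasicBlock {
            statements: vec![Statement::Lt { dest: 4, a: 0, b: 1 }],
            terminator: Terminator::If { cond: 4, then_block: 1, else_block: 2 },
        },
        BasicBlock {
            statements: vec![
                Statement::Add { dest: 2, a: 2, b: 0 },
                Statement::Add { dest: 0, a: 0, b: 3 },
            ],
            terminator: Terminator::Goto(0),
        },
        BasicBlock { statements: vec![], terminator: Terminator::Return },
    ]
}
```

The interpreter loop contains no knowledge of \rust{while}, \rust{for}, or \rust{break}; each of those surface constructs would lower to some arrangement of the same three terminators.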
+
+ \subsection{Closures}
+
+ Closures are like structs containing a field for each captured variable, but closures also have an
+ associated function. Supporting closure function calls required some extra machinery to get the
+ necessary information from the compiler, but it is all supported except for one edge case on my
+ to-do list\footnote{The edge case is calling a closure that takes a reference to its captures via a
+ closure interface that passes the captures by value.}.
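The struct-plus-function view of closures can be made concrete by writing the equivalent by hand. The \rust{AddOffset} type below is an illustrative approximation of what the compiler generates for a closure capturing a single \rust{i32}; the real generated type is anonymous and is called through the \rust{Fn} traits.

```rust
/// Hand-written approximation of the closure `move |x| x + offset`.
pub struct AddOffset {
    /// One field per captured variable.
    pub offset: i32,
}

impl AddOffset {
    /// The closure's associated function; the captures arrive through `self`.
    pub fn call(&self, x: i32) -> i32 {
        x + self.offset
    }
}

/// The same computation using a real closure, for comparison.
pub fn with_real_closure(offset: i32, x: i32) -> i32 {
    let add = move |x: i32| x + offset;
    add(x)
}
```

Both forms compute the same result; the difference is only that the compiler writes the struct and the function for the closure version.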
+
+ \subsection{Intrinsics}
+
+ To support unsafe code, and in particular the unsafe code used to implement Rust's standard library,
+ it became clear that Miri would have to support calls to compiler
+ intrinsics\footnote{\href{https://doc.rust-lang.org/stable/std/intrinsics/index.html}{Rust
+ intrinsics documentation}}. Intrinsics are function calls which cause the Rust compiler to produce
+ special-purpose code instead of a regular function call. Miri simply recognizes intrinsic calls by
+ their unique ABI\footnote{Application Binary Interface, which defines calling conventions. Includes
+ ``C'', ``Rust'', and ``rust-intrinsic''.} and name and runs special-purpose code to handle them.
+
+ An example of an important intrinsic is \rust{size_of} which will cause Miri to write the size of
+ the type in question to the return value location. The Rust standard library uses intrinsics heavily
+ to implement various data structures, so this was a major step toward supporting them. So far, I've
+ been implementing intrinsics on a case-by-case basis as I write test cases which require missing
+ ones, so I haven't yet exhaustively implemented them all.
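Recognizing an intrinsic by ABI and name can be sketched as a match on both. The \rust{Abi} enum, the \rust{Call} struct, and \rust{eval_call} are hypothetical names for this example, and the type argument is reduced to a precomputed size to keep the sketch self-contained.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Abi {
    Rust,
    C,
    RustIntrinsic,
}

/// Hypothetical description of a call site (illustrative only).
pub struct Call<'a> {
    pub abi: Abi,
    pub name: &'a str,
    /// Size in bytes of the call's type argument, if it has one.
    pub type_arg_size: Option<u64>,
}

/// Returns the value to write to the call's return location.
pub fn eval_call(call: &Call<'_>) -> Result<u64, String> {
    match (call.abi, call.name) {
        // `size_of::<T>()` evaluates to the size of its type argument.
        (Abi::RustIntrinsic, "size_of") => call
            .type_arg_size
            .ok_or_else(|| "size_of requires a type argument".to_string()),
        // Intrinsics are added case by case; anything else is reported.
        (Abi::RustIntrinsic, other) => Err(format!("unimplemented intrinsic: {}", other)),
        _ => Err("regular function calls are outside this sketch".to_string()),
    }
}
```

The fallback arm mirrors the case-by-case approach described above: an intrinsic that has not been implemented yet produces an error naming it, which points directly at the next one to add.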
+
+ \subsection{Heap allocations}
+
+ The next piece of the puzzle for supporting interesting programs (and the standard library) was heap
+ allocations. There are two main interfaces for heap allocation in Rust: the built-in \rust{Box}
+ rvalue in MIR and a set of C ABI foreign functions including \rust{__rust_allocate},
+ \rust{__rust_reallocate}, and \rust{__rust_deallocate}. These correspond approximately to
+ \mintinline{c}{malloc}, \mintinline{c}{realloc}, and \mintinline{c}{free} in C.
+
+ The \rust{Box} rvalue allocates enough space for a single value of a given type. This was easy to
+ support in Miri. It simply creates a new abstract allocation in the same manner as for
+ stack-allocated values, since there's no major difference between them in Miri.
+
+ The allocator functions, which are used to implement things like Rust's standard \rust{Vec<T>} type,
+ were a bit trickier. Rust declares them as \rust{extern "C" fn} so that different allocator
+ libraries can be linked in at the user's option. Since Miri doesn't actually support FFI and we want
+ full control of allocations for safety, Miri ``cheats'' and recognizes these allocator functions in
+ essentially the same way it recognizes compiler intrinsics. Then, a call to \rust{__rust_allocate}
+ simply creates another abstract allocation with the requested size and \rust{__rust_reallocate}
+ grows one.
+
+ In the future, Miri should also track which allocations came from \rust{__rust_allocate} so it can
+ reject reallocate or deallocate calls on stack allocations.
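The name-based handling of the allocator functions might look like the sketch below. \rust{Memory} and \rust{call_foreign} are hypothetical, pointers are simplified to bare allocation identifiers, and the argument lists are only loosely modelled on the allocator functions (size and alignment values are passed but alignment is ignored), so treat them as assumptions rather than the real signatures.

```rust
use std::collections::HashMap;

pub type AllocId = u64;

/// Hypothetical interpreter memory made of abstract allocations.
#[derive(Default)]
pub struct Memory {
    allocations: HashMap<AllocId, Vec<u8>>,
    next_id: AllocId,
}

impl Memory {
    pub fn allocate(&mut self, size: usize) -> AllocId {
        let id = self.next_id;
        self.next_id += 1;
        self.allocations.insert(id, vec![0; size]);
        id
    }

    pub fn reallocate(&mut self, id: AllocId, new_size: usize) -> Result<(), String> {
        let bytes = self
            .allocations
            .get_mut(&id)
            .ok_or("reallocate called on an unknown allocation")?;
        bytes.resize(new_size, 0);
        Ok(())
    }

    pub fn deallocate(&mut self, id: AllocId) -> Result<(), String> {
        match self.allocations.remove(&id) {
            Some(_) => Ok(()),
            None => Err("deallocate called on an unknown allocation".to_string()),
        }
    }

    pub fn size_of(&self, id: AllocId) -> Option<usize> {
        self.allocations.get(&id).map(|bytes| bytes.len())
    }
}

/// Recognizes the allocator functions by symbol name instead of performing
/// a real foreign call. The argument layouts are assumptions of this sketch.
pub fn call_foreign(
    mem: &mut Memory,
    name: &str,
    args: &[u64],
) -> Result<Option<AllocId>, String> {
    match (name, args) {
        ("__rust_allocate", [size, _align]) => Ok(Some(mem.allocate(*size as usize))),
        ("__rust_reallocate", [id, _old_size, new_size, _align]) => {
            mem.reallocate(*id, *new_size as usize)?;
            Ok(Some(*id))
        }
        ("__rust_deallocate", [id, _old_size, _align]) => {
            mem.deallocate(*id)?;
            Ok(None)
        }
        _ => Err(format!("unsupported foreign function: {}", name)),
    }
}
```

Any other foreign function name falls through to the error arm, which matches the report's statement that general FFI is unsupported. Rejecting calls on stack allocations would additionally require recording where each allocation came from, which this sketch does not do.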
+
+ \subsection{Destructors}
+
+ Miri doesn't yet support calling user-defined destructors, but it has most of the machinery in place
+ to do so already and it's next on my to-do list. There \emph{is} support for dropping \rust{Box<T>}
+ types, including deallocating their associated allocations. This is enough to properly execute the
+ dangling pointer example in \autoref{sec:deterministic}.
+
+ \subsection{Standard library}
+ \blindtext
+
+ \section{Unsupported}
+ \blindtext

\begin{figure}[t]
\begin{minted}[autogobble]{rust}
@@ -280,6 +447,12 @@ \subsubsection{Undefined byte mask}

\section{Future work}

+ \subsection{Finishing the implementation}
+
+ \blindtext
+
+ \subsection{Alternative applications}
+
Other possible uses for Miri include:

\begin{itemize}
@@ -299,6 +472,17 @@ \section{Future work}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

+ \section{Final thoughts}
+
+ % TODO(tsion): Reword this.
+ Making Miri work was primarily an implementation problem. Writing an interpreter which models values
+ of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some
+ unconventional techniques compared to many interpreters. Miri's execution remains safe even while
+ simulating execution of unsafe code, which allows it to detect when unsafe code does something
+ invalid.
+
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
\section{Thanks}

Eduard Burtescu, Niko Matsakis, and Christopher Dutchyn.