@@ -344,22 +344,49 @@ \subsection{Aggregates}
closure.}. Miri supports all common usage of all of these types. The main missing piece is to handle
\texttt{\#[repr(..)]} annotations which adjust the layout of a \rust{struct} or \rust{enum}.

+ \subsection{Lvalue projections}
+
+ This category includes field accesses like \rust{foo.bar}, dereferencing, accessing data in an
+ \rust{enum} variant, and indexing arrays. Miri supports all of these, including nested projections
+ such as \rust {*foo.bar[2]}.
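To make these categories concrete, the following self-contained sketch performs each kind of projection on ordinary Rust values. The types \rust{Foo} and \rust{Shape} are hypothetical and exist only for illustration; the expression \rust{*foo.bar[2]} combines a field access, an index, and a dereference.

\begin{minted}[autogobble]{rust}
// Hypothetical types used only to illustrate lvalue projections.
struct Foo {
    bar: [Box<i32>; 3],
}

enum Shape {
    Circle { radius: u32 },
    Square(u32),
}

fn projections() -> (i32, u32, u32) {
    let foo = Foo { bar: [Box::new(1), Box::new(2), Box::new(3)] };
    // Field access (`foo.bar`), then indexing (`[2]`), then a dereference (`*`).
    let nested = *foo.bar[2];

    let shape = Shape::Circle { radius: 5 };
    // Accessing the data stored inside an enum variant.
    let radius = match shape {
        Shape::Circle { radius } => radius,
        Shape::Square(side) => side,
    };

    // Plain array indexing.
    let arr = [10u32, 20, 30];
    let second = arr[1];

    (nested, radius, second)
}
\end{minted}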
+
\subsection{Control flow}

All of Rust's standard control flow features, including \rust{loop}, \rust{while}, \rust{for},
\rust{if}, \rust{if let}, \rust{while let}, \rust{match}, \rust{break}, \rust{continue}, and
\rust{return}, are supported. In fact, supporting these was quite easy since the Rust compiler
reduces them all down to a comparatively smaller set of control-flow graph primitives in MIR.
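As an illustration of that reduction, the two functions below compute the same sum, and the second spells out approximately what a \rust{for} loop means in terms of \rust{loop}, \rust{match}, and \rust{break}. This is only a source-level analogy; MIR itself expresses control flow as basic blocks connected by terminators such as gotos and switches rather than with these keywords.

\begin{minted}[autogobble]{rust}
// Idiomatic version using a `for` loop.
fn sum_for(xs: &[u32]) -> u32 {
    let mut total = 0;
    for x in xs {
        total += x;
    }
    total
}

// Roughly what the `for` loop stands for: an explicit iterator, an
// unconditional `loop`, and a `match` that `break`s once `next` returns `None`.
fn sum_desugared(xs: &[u32]) -> u32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match Iterator::next(&mut iter) {
            Some(x) => total += x,
            None => break,
        }
    }
    total
}
\end{minted}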

- \subsection{Closures}
+ \subsection{Function calls}
+
+ As previously described, Miri supports arbitrary function calls without growing its own native
+ stack (only its virtual call stack grows). It is somewhat limited by the fact that
+ cross-crate\footnote{A crate is a single Rust library (or executable).} calls only work for
+ functions whose MIR is stored in crate metadata. This is currently true for \rust{const}, generic,
+ and \texttt{\#[inline]} functions. A branch of the compiler could be made to store MIR for all
+ functions. This limitation would not matter
+ for a compile-time evaluator based on Miri, since it would only call \rust {const fn}s.
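The idea of a virtual call stack can be illustrated with a toy interpreter for a single recursive function. The sketch below is hypothetical and far simpler than Miri: it evaluates \rust{sum(n) = n + sum(n - 1)} by pushing frames onto a heap-allocated \rust{Vec} and looping, rather than recursing in the host language, so deep recursion in the interpreted program cannot overflow the interpreter's own stack.

\begin{minted}[autogobble]{rust}
// Hypothetical frame for the interpreted function
// `sum(n) = if n == 0 { 0 } else { n + sum(n - 1) }`.
struct Frame {
    n: u64,
}

// Evaluates `sum(n)` with an explicit, heap-allocated call stack: a call
// pushes a frame and a return pops one, so the host stack never grows.
fn run_sum(n: u64) -> u64 {
    let mut stack = vec![Frame { n }];
    // Value returned by the most recently popped frame, if any.
    let mut ret: Option<u64> = None;
    while let Some(frame) = stack.last() {
        match ret.take() {
            // A callee just returned: finish this frame's `n + sum(n - 1)`.
            Some(callee_result) => {
                let value = frame.n + callee_result;
                stack.pop();
                ret = Some(value);
            }
            // Base case: return 0 to the caller.
            None if frame.n == 0 => {
                stack.pop();
                ret = Some(0);
            }
            // Recursive case: "call" `sum(n - 1)` by pushing a new frame.
            None => {
                let next = frame.n - 1;
                stack.push(Frame { n: next });
            }
        }
    }
    ret.unwrap_or(0)
}
\end{minted}

For example, \rust{run_sum(1_000_000)} finishes with no native recursion at all, whereas a naively recursive host implementation of the same function could overflow the default thread stack.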
+
+ \subsubsection{Method calls}
+
+ Trait method calls require a bit more machinery dealing with compiler internals than normal function
+ calls, but Miri supports them.
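For context, a trait method call names a method of a trait rather than a specific function, so the interpreter has to work out which \rust{impl} supplies the body for the concrete receiver type. In the hypothetical sketch below, the single call \rust{shape.area()} inside \rust{total_area} resolves to a different function for each type that \rust{T} is instantiated with.

\begin{minted}[autogobble]{rust}
// Hypothetical trait used to illustrate trait method resolution.
trait Area {
    fn area(&self) -> u32;
}

struct Square(u32);
struct Rect(u32, u32);

impl Area for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

impl Area for Rect {
    fn area(&self) -> u32 {
        self.0 * self.1
    }
}

// `shape.area()` refers to the trait method `Area::area`; which impl
// actually runs depends on the concrete type substituted for `T`.
fn total_area<T: Area>(shapes: &[T]) -> u32 {
    shapes.iter().map(|shape| shape.area()).sum()
}
\end{minted}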
+
+ \subsubsection{Closures}

Closures are like structs containing a field for each captured variable, but closures also have an
associated function. Supporting closure function calls required some extra machinery to get the
necessary information from the compiler, but it is all supported except for one edge case on my to-do
list\footnote{The edge case is calling a closure that takes a reference to its captures via a
closure interface that passes the captures by value.}.
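The struct analogy can be made concrete. Below, a closure that captures \rust{offset} by reference is paired with a hand-written approximation: a struct with one field per capture and a method that receives those captures alongside the ordinary argument. The names \rust{AddOffset} and \rust{call} are hypothetical; real closure types are anonymous and implement the \rust{Fn}, \rust{FnMut}, and \rust{FnOnce} traits, which stable Rust does not let user code implement directly.

\begin{minted}[autogobble]{rust}
// Hypothetical hand-written approximation of the closure `|x| x + offset`,
// which captures `offset` by reference.
struct AddOffset<'a> {
    offset: &'a i32, // one field per captured variable
}

impl<'a> AddOffset<'a> {
    // The closure's "associated function": it receives the captures
    // (here through `&self`) alongside the ordinary argument.
    fn call(&self, x: i32) -> i32 {
        x + *self.offset
    }
}

// Calls the real closure and the hand-written struct with the same input.
fn closure_vs_struct(offset: i32, x: i32) -> (i32, i32) {
    let closure = |x: i32| x + offset;
    let manual = AddOffset { offset: &offset };
    (closure(x), manual.call(x))
}
\end{minted}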

- \subsection{Intrinsics}
+ \subsubsection{Function pointers}
+
+ Function pointers are not currently supported by Miri, but there is a relatively simple way they
+ could be encoded using a relocation with a special reserved allocation identifier. The offset of the
+ relocation would determine which function it points to in a special array of functions in the
+ interpreter.
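A minimal sketch of that encoding follows, under illustrative assumptions: the \rust{Pointer} representation, the reserved identifier \rust{FN_ALLOC_ID}, and the \rust{Interpreter} function table are all hypothetical and do not reflect Miri's actual data structures. A real implementation would store identifiers of interpreted functions in the table rather than host \rust{fn} pointers.

\begin{minted}[autogobble]{rust}
// Hypothetical pointer representation: an allocation id plus a byte offset.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pointer {
    alloc_id: u64,
    offset: u64,
}

// Hypothetical allocation id reserved for function pointers.
const FN_ALLOC_ID: u64 = u64::MAX;

// Hypothetical interpreter state holding the table of known functions.
struct Interpreter {
    functions: Vec<fn(i64) -> i64>,
}

impl Interpreter {
    // Creates a function pointer whose offset is the function's table index.
    fn fn_ptr(&mut self, f: fn(i64) -> i64) -> Pointer {
        self.functions.push(f);
        Pointer { alloc_id: FN_ALLOC_ID, offset: (self.functions.len() - 1) as u64 }
    }

    // Calls through a pointer, rejecting pointers that do not point into
    // the reserved function allocation or that name no known function.
    fn call_ptr(&self, ptr: Pointer, arg: i64) -> Result<i64, String> {
        if ptr.alloc_id != FN_ALLOC_ID {
            return Err(format!("{:?} is not a function pointer", ptr));
        }
        let f = self
            .functions
            .get(ptr.offset as usize)
            .ok_or_else(|| format!("no function at offset {}", ptr.offset))?;
        Ok(f(arg))
    }
}

fn double(x: i64) -> i64 {
    x * 2
}

fn negate(x: i64) -> i64 {
    -x
}
\end{minted}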
+
+ \subsubsection{Intrinsics}

To support unsafe code, and in particular the unsafe code used to implement Rust's standard library,
it became clear that Miri would have to support calls to compiler
@@ -375,6 +402,25 @@ \subsection{Intrinsics}
been implementing intrinsics on a case-by-case basis as I write test cases which require missing
ones, so I haven't yet exhaustively implemented them all.
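The case-by-case approach amounts to dispatching on the intrinsic's name and reporting an error for anything not handled yet. The sketch below shows only that shape: the \rust{Value} type and the \rust{call_intrinsic} function are hypothetical, and although \rust{add_with_overflow} and \rust{ctpop} are names of real Rust intrinsics, their signatures are simplified here to 32-bit unsigned integers.

\begin{minted}[autogobble]{rust}
// Hypothetical interpreter value, restricted to what this sketch needs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    U32(u32),
    Bool(bool),
}

// Dispatches an intrinsic call by name. Intrinsics that have not been
// implemented yet produce an error instead of crashing the interpreter.
fn call_intrinsic(name: &str, args: &[Value]) -> Result<Vec<Value>, String> {
    match (name, args) {
        // Wrapping sum plus a flag saying whether overflow occurred.
        ("add_with_overflow", [Value::U32(a), Value::U32(b)]) => {
            let (sum, overflowed) = a.overflowing_add(*b);
            Ok(vec![Value::U32(sum), Value::Bool(overflowed)])
        }
        // Population count: the number of set bits.
        ("ctpop", [Value::U32(x)]) => Ok(vec![Value::U32(x.count_ones())]),
        _ => Err(format!("unimplemented intrinsic: {}", name)),
    }
}
\end{minted}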

+ \subsubsection{Generic function calls}
+
+ Miri needs special support for generic function calls since Rust is a \emph{monomorphizing}
+ compiler, meaning it generates a specialized copy of each function for each distinct set of type
+ parameters it gets called with. Since functions in MIR are still polymorphic, Miri has to do the
+ same thing and substitute function type parameters into all types it encounters to get fully
+ concrete, monomorphized types. For example, in\ldots
+
+ \begin{minted}[autogobble]{rust}
+ fn some<T>(t: T) -> Option<T> { Some(t) }
+ \end{minted}
+
+ \ldots{} Miri needs to know how many bytes to copy from the argument to the return value, based on
+ the size of \rust{T}. If we call \rust{some(10i32)}, Miri will execute \rust{some} knowing that
+ \rust{T = i32} and generate a representation for \rust{Option<i32>}.
+
+ Miri currently does this monomorphization on demand, or lazily, unlike the Rust back-end, which does
+ it all ahead of time.
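Lazy monomorphization can be sketched as a cache keyed by a generic function and its concrete type arguments, filled in only when a given instantiation is first needed. Everything in this sketch (the \rust{Ty} enum, the \rust{Monomorphizer} type, and the toy layout rule for \rust{Option}) is a hypothetical simplification rather than Miri's code; in particular, real layouts include alignment padding (\rust{Option<i32>} is 8 bytes on common targets) and may use niche optimizations.

\begin{minted}[autogobble]{rust}
use std::collections::HashMap;

// Hypothetical, heavily simplified type representation.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    I32,
    Bool,
    Param(usize), // the n-th type parameter of the current function
    Option(Box<Ty>),
}

// Replaces type parameters with the concrete type arguments of this call.
fn substitute(ty: &Ty, args: &[Ty]) -> Ty {
    match ty {
        Ty::Param(i) => args[*i].clone(),
        Ty::Option(inner) => Ty::Option(Box::new(substitute(inner, args))),
        other => other.clone(),
    }
}

// Toy size rule: `Option<T>` is one tag byte plus the payload
// (no padding, no niche optimization).
fn toy_size(ty: &Ty) -> usize {
    match ty {
        Ty::I32 => 4,
        Ty::Bool => 1,
        Ty::Option(inner) => 1 + toy_size(inner),
        Ty::Param(_) => panic!("type parameter was not substituted"),
    }
}

// Caches monomorphized return-type sizes per (function, type arguments),
// computing each entry only when that instantiation is first called.
#[derive(Default)]
struct Monomorphizer {
    cache: HashMap<(String, Vec<Ty>), usize>,
}

impl Monomorphizer {
    fn return_size(&mut self, func: &str, ret_ty: &Ty, args: &[Ty]) -> usize {
        *self
            .cache
            .entry((func.to_string(), args.to_vec()))
            .or_insert_with(|| toy_size(&substitute(ret_ty, args)))
    }
}
\end{minted}

With the return type of \rust{some} written as \rust{Ty::Option(Box::new(Ty::Param(0)))}, asking for the \rust{i32} instantiation twice computes its size once and then reuses the cached entry.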
+

\subsection{Heap allocations}

The next piece of the puzzle for supporting interesting programs (and the standard library) was heap
@@ -400,16 +446,58 @@ \subsection{Heap allocations}


\subsection{Destructors}

+ When values that ``own'' some resource, like a heap allocation or file handle, go out of scope, Rust
+ inserts \emph{drop glue} that calls the user-defined destructor for the type if it exists, and then
+ drops each of its fields. Destructors for types like \rust{Box<T>} and \rust{Vec<T>} deallocate
+ heap memory.
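The order described above, the user-defined destructor first and then the fields, can be observed directly. In this sketch the types \rust{Noisy} and \rust{Outer} are hypothetical; each destructor appends a name to a shared log, and the fields of \rust{Outer} are dropped in declaration order after \rust{Outer::drop} returns.

\begin{minted}[autogobble]{rust}
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical struct with its own destructor and fields that need dropping.
struct Outer {
    log: Rc<RefCell<Vec<&'static str>>>,
    first: Noisy,
    second: Noisy,
}

impl Drop for Outer {
    fn drop(&mut self) {
        self.log.borrow_mut().push("outer");
    }
}

// Drops an `Outer` and returns the order in which the destructors ran.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Outer {
            log: Rc::clone(&log),
            first: Noisy { name: "first", log: Rc::clone(&log) },
            second: Noisy { name: "second", log: Rc::clone(&log) },
        };
    } // Drop glue runs here: `Outer::drop`, then each field in declaration order.
    let order = log.borrow().clone();
    order
}
\end{minted}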
+

Miri doesn't yet support calling user-defined destructors, but it has most of the machinery in place
to do so already and it's next on my to-do list. There \emph{is} support for dropping \rust{Box<T>}
types, including deallocating their associated allocations. This is enough to properly execute the
dangling pointer example in \autoref{sec:deterministic}.

+ \subsection{Constants}
+
+ Only basic integer, boolean, string, and byte-string literals are currently supported. Evaluating
+ more complicated constant expressions in their current form would be a somewhat pointless exercise
+ for Miri. Instead, we should lower constant expressions to MIR so Miri can run them directly. (This
+ is precisely what would be done to use Miri as the actual constant evaluator.)
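For a sense of what ``more complicated constant expressions'' look like, the items below go beyond plain literals: arithmetic on other constants, a call to a \rust{const fn}, a cast, and a constant used as an array length. The names are hypothetical, and the example is written for current stable Rust, whose rules for \rust{const fn} may differ from those of the compiler described in this report.

\begin{minted}[autogobble]{rust}
// Hypothetical `const fn`: it can be called in constant contexts.
const fn square(x: u32) -> u32 {
    x * x
}

// Constant expressions beyond plain literals.
const SIDE: u32 = 3 + 4; // arithmetic
const AREA: u32 = square(SIDE); // call to a `const fn`
const BUFFER_LEN: usize = (AREA as usize) / 7; // cast and division

// The array length itself must be evaluated at compile time.
const ZEROS: [u8; BUFFER_LEN] = [0; BUFFER_LEN];
\end{minted}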
+
+ \subsection{Static variables}
+
+ While it would be invalid to write to static (i.e.\ global) variables in Miri executions, it would
+ probably be fine to allow reads. However, Miri doesn't currently support statics at all, and they
+ would need support similar to constants.
+

\subsection{Standard library}
- \blindtext

- \section{Unsupported}
- \blindtext
+ Throughout the implementation of the above features, I often followed this process:
+
+ \begin{enumerate}
+ \item Try using a feature from the standard library.
+ \item See where Miri runs into something it can't handle.
+ \item Fix the problem.
+ \item Go to 1.
+ \end{enumerate}
+
+ At present, Miri supports a number of major non-trivial features from the standard library along
+ with many minor ones. Smart pointer types such as \rust{Box}, \rust{Rc}\footnote{Reference-counted
+ shared pointer.} and \rust{Arc}\footnote{Atomically reference-counted, thread-safe shared
+ pointer.} all seem to work. I've also tested using the shared smart pointer types with \rust{Cell}
+ and \rust{RefCell}\footnote{\href{https://doc.rust-lang.org/stable/std/cell/index.html}{Rust
+ documentation for cell types}} for interior mutability, and that works as well, although a
+ \rust{RefCell} borrow is never given back until I implement destructor calls (so, for example, a
+ second mutable borrow always fails), since the destructor of the borrow guard
+ is what releases the borrow.
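The dependence on destructors shows up directly in the guard objects that \rust{RefCell} hands out: the cell records an outstanding mutable borrow until the \rust{RefMut} guard is dropped. The sketch below (the function name is hypothetical) uses the standard \rust{try_borrow} method to observe this in ordinary compiled Rust; an interpreter that never ran the guard's destructor would see the cell as borrowed forever.

\begin{minted}[autogobble]{rust}
use std::cell::RefCell;

// Reports whether the cell could be borrowed while a `RefMut` guard was
// alive, and whether it could be borrowed again after the guard was dropped.
fn borrow_states() -> (bool, bool) {
    let cell = RefCell::new(5);
    let guard = cell.borrow_mut();
    let while_held = cell.try_borrow().is_ok();
    drop(guard); // `RefMut`'s destructor releases the mutable borrow.
    let after_drop = cell.try_borrow().is_ok();
    (while_held, after_drop)
}
\end{minted}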
+
+ But the standard library collection I spent the most time on was \rust{Vec}, the standard
+ dynamically growable array type, similar to C++'s \texttt{std::vector} or Java's
+ \texttt{java.util.ArrayList}. In Rust, \rust{Vec} is an extremely pervasive collection, so
+ supporting it lets Miri run a much larger swath of Rust programs.
+
+ See \autoref{fig:vec} for an example (working in Miri today) of initializing a \rust{Vec} with a
+ small amount of space on the heap and then pushing enough elements to force it to reallocate its
+ data array. This involves cross-crate generic function calls, unsafe code using raw pointers, heap
+ allocation, handling of uninitialized memory, compiler intrinsics, and more.

\begin{figure}[t]
\begin{minted}[autogobble]{rust}
@@ -441,15 +529,57 @@ \section{Unsupported}
// B: 01 02 03 __
\end{minted}
\caption{\rust{Vec} example on 32-bit little-endian}
+ \label{fig:vec}
+ \end{figure}
+
+ You can even perform unsafe operations on a \rust{Vec}, such as \rust{v.set_len(10)} or
+ \rust{v.get_unchecked(2)}. As long as you use them carefully enough to avoid undefined behaviour
+ (just as when writing unsafe code for regular Rust), Miri handles them fine. If you do slip up,
+ Miri will stop with an appropriate error message (see \autoref{fig:vec-error}).
+
+ \begin{figure}[t]
+ \begin{minted}[autogobble]{rust}
+ fn out_of_bounds() -> u8 {
+     let v = vec![1, 2];
+     let p = unsafe { v.get_unchecked(5) };
+     *p + 10
+ //  ~~ error: pointer offset outside
+ //     bounds of allocation
+ }
+
+ fn undefined_bytes() -> u8 {
+     let v = Vec::<u8>::with_capacity(10);
+     let p = unsafe { v.get_unchecked(5) };
+     *p + 10
+ //  ~~~~~~~ error: attempted to read
+ //          undefined bytes
+ }
+ \end{minted}
+ \caption{\rust{Vec} examples with undefined behaviour}
+ \label{fig:vec-error}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

- \section{Future work}
+ \section{Future directions}

\subsection{Finishing the implementation}

- \blindtext
+ There are a number of pressing items on my to-do list for Miri, including:
+
+ \begin{itemize}
+ \item Destructors and \rust{__rust_deallocate}.
+ \item Non-trivial casts between primitive types like integers and pointers.
+ \item Handling statics and global memory.
+ \item Reporting errors for all undefined behaviour.\footnote{\href{https://doc.rust-lang.org/reference.html\#behavior-considered-undefined}{The Rust reference on what is considered undefined behaviour}}
+ \item Function pointers.
+ \item Accounting for target machine primitive type alignment and endianness.
+ \item Optimizations (e.g.\ undefined byte masks, tail calls).
+ \item Benchmarking Miri vs.\ unoptimized Rust.
+ \item Various \texttt{TODO}s and \texttt{FIXME}s left in the code.
+ \item Getting a version of Miri into rustc for real.
+ \end{itemize}

\subsection{Alternative applications}

@@ -459,32 +589,60 @@ \subsection{Alternative applications}
\item A graphical or text-mode debugger that steps through MIR execution one statement at a time,
for figuring out why some compile-time execution is raising an error or simply learning how Rust
works at a low level.
- \item A read-eval-print-loop (REPL) for Rust may be easier to implement on top of Miri than the
- usual LLVM back-end.
- \item An extended version of Miri could be developed apart from the purpose of compile-time
- execution that is able to run foreign functions from C/C++ and generally have full access to the
- operating system. Such a version of Miri could be used to more quickly prototype changes to the
- Rust language that would otherwise require changes to the LLVM back-end.
- \item Miri might be useful for unit-testing the compiler by comparing the results of Miri's
- execution against the results of LLVM-compiled machine code's execution. This would help to
- guarantee that compile-time execution works the same as runtime execution.
+ \item A read-eval-print loop (REPL) for Rust, which may be easier to implement on top of Miri than
+ the usual LLVM back-end.
+ \item An extended version of Miri, developed independently of compile-time execution, that can run
+ foreign functions from C/C++ and generally has full access to the operating system. Such a version
+ of Miri could be used to prototype, more quickly, changes to the Rust language that would otherwise
+ require changes to the LLVM back-end.
+ \item Unit-testing the compiler by comparing the results of Miri's execution against the results
+ of executing LLVM-compiled machine code. This would help ensure that compile-time execution behaves
+ the same as runtime execution.
+ \item A symbolic evaluator that explores multiple possible code paths at once to determine whether
+ undefined behaviour could be observed on any of them.
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Final thoughts}

- % TODO(tsion): Reword this.
- Making Miri work was primarily an implementation problem. Writing an interpreter which models values
- of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some
- unconventional techniques compared to many interpreters. Miri's execution remains safe even while
- simulating execution of unsafe code, which allows it to detect when unsafe code does something
- invalid.
+ Writing an interpreter which models values of varying sizes, stack and heap allocation, unsafe
+ memory operations, and more requires some unconventional techniques compared to typical
+ interpreters. However, aside from the somewhat complicated abstract memory model, making Miri work
+ was primarily a software engineering problem, and not a particularly tricky one. This is a testament
+ to MIR's suitability as an intermediate representation for Rust: it removes enough unnecessary
+ abstraction to keep the interpreter simple. For example, Miri doesn't even need to know that there
+ are different kinds of loops, or how to match patterns in a \rust{match} expression.
+
+ Another advantage to targeting MIR is that new features at the syntax or type level generally
+ require little to no change in Miri. For example, when the new ``question mark'' syntax
+ for error handling\footnote{%
+ \href{https://github.com/rust-lang/rfcs/blob/master/text/0243-trait-based-exception-handling.md}
+ {Question mark syntax RFC}}
+ was added to rustc, Miri also supported it the same day with no changes. When specialization\footnote{%
+ \href{https://github.com/rust-lang/rfcs/blob/master/text/1210-impl-specialization.md}
+ {Specialization RFC}}
+ was added, Miri supported it with just minor changes to trait method lookup.
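As a rough illustration of why no change was needed: for a \rust{Result}, the \rust{?} operator behaves approximately like the \rust{match} written out in the second function below, and Miri only ever sees the MIR produced after such desugaring. The functions are hypothetical, and the exact desugaring inside rustc has changed over time (it now goes through the unstable \rust{Try} trait).

\begin{minted}[autogobble]{rust}
use std::num::ParseIntError;

// Hypothetical function using the `?` operator.
fn add_strs(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = a.parse()?;
    let y: i32 = b.parse()?;
    Ok(x + y)
}

// Approximately what each `?` stands for: match on the `Result` and return
// early on error, converting the error value with `From::from`.
fn add_strs_expanded(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = match a.parse() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    let y: i32 = match b.parse() {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    };
    Ok(x + y)
}
\end{minted}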
+
+ Of course, Miri also has limitations. The inability to execute FFI calls and inline assembly limits
+ the set of Rust programs Miri could ever execute. The good news is that in the constant evaluator,
+ FFI can be stubbed out in cases where it makes sense, as I did with \rust{__rust_allocate}, and
+ for Miri outside of the compiler it may be possible to use libffi to call C functions from the
+ interpreter.
+
+ In conclusion, Miri was a surprisingly effective project, and a lot of fun to implement. There were
+ times when I ended up supporting Rust features I didn't even intend to while adding support for
+ some other feature, because the design of MIR collapses many source-level features into fewer
+ MIR-level ones. I am excited to work with the compiler team going forward to try to make Miri
+ useful for constant evaluation in Rust.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Thanks}

- Eduard Burtescu, Niko Matsakis, and Christopher Dutchyn.
+ A big thanks goes to Eduard Burtescu for writing the abstract machine specification and answering my
+ incessant questions on IRC, to Niko Matsakis for coming up with the idea for Miri and supporting my
+ desire to work with the Rust compiler, and to my research supervisor Christopher Dutchyn. Thanks
+ also to everyone else on the compiler team and on Mozilla IRC who helped me figure things out.

\end{document}