Commit f3d0e18

report: Fill in most of the language support section, plus data layout and determinism.
1 parent cb6a1e9 commit f3d0e18

1 file changed, +201 -17 lines changed

tex/report/miri-report.tex

@@ -20,10 +20,9 @@
\begin{document}

\title{Miri: \\ \smaller{An interpreter for Rust's mid-level intermediate representation}}
\author{Scott Olson\footnote{\href{mailto:scott@solson.me}{scott@solson.me}} \\
\smaller{Supervised by Christopher Dutchyn}}
25+
\date{April 12th, 2016}
2726
\maketitle
2827

2928
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -155,14 +154,15 @@ \subsection{Flaws}

\section{Current implementation}

Roughly halfway through my time working on Miri, Eduard
Burtescu\footnote{\href{https://github.com/eddyb}{Eduard Burtescu on GitHub}} from the Rust compiler
team\footnote{\href{https://www.rust-lang.org/team.html\#Compiler}{The Rust compiler team}} made a
post on Rust's internal forums about a ``Rust Abstract Machine''
specification\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's
``Rust Abstract Machine'' forum post}} which could be used to implement more powerful compile-time
function execution, similar to what is supported by C++14's \mintinline{cpp}{constexpr} feature.
After clarifying some of the details of the abstract machine's data layout with Burtescu via IRC, I
started implementing it in Miri.

\subsection{Raw value representation}

@@ -224,7 +224,7 @@ \subsubsection{Undefined byte mask}

See \autoref{fig:undef} for an example of an undefined byte, represented by underscores. Note that there
would still be a value for the second byte in the byte array, but we don't care what it is. The
bitmask would be $10_2$, i.e.\ \rust{[true, false]}.

\begin{figure}[hb]
\begin{minted}[autogobble]{rust}
@@ -237,12 +237,179 @@ \subsubsection{Undefined byte mask}
\label{fig:undef}
\end{figure}
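
The byte array and undefined mask described above can be sketched as follows. This is an illustrative model only. The \rust{Allocation} type, its methods, and the use of a \rust{Vec<bool>} for the mask are assumptions made for this example, not Miri's actual definitions. Bounds checking is left to Rust's slice indexing, which panics.

\begin{minted}[autogobble]{rust}
/// Hypothetical allocation: raw bytes plus a mask of which bytes are defined.
pub struct Allocation {
    bytes: Vec<u8>,
    /// `undef_mask[i]` is `true` when `bytes[i]` holds a defined value.
    undef_mask: Vec<bool>,
}

impl Allocation {
    /// Creates an allocation whose bytes all start out undefined.
    pub fn new(size: usize) -> Self {
        Allocation { bytes: vec![0; size], undef_mask: vec![false; size] }
    }

    /// Writes `data` at `offset` and marks those bytes as defined.
    /// Out-of-range offsets panic via slice indexing.
    pub fn write(&mut self, offset: usize, data: &[u8]) {
        let end = offset + data.len();
        self.bytes[offset..end].copy_from_slice(data);
        for defined in &mut self.undef_mask[offset..end] {
            *defined = true;
        }
    }

    /// Reads `len` bytes at `offset`, failing if any of them is undefined.
    pub fn read(&self, offset: usize, len: usize) -> Result<&[u8], String> {
        let end = offset + len;
        if self.undef_mask[offset..end].iter().all(|&defined| defined) {
            Ok(&self.bytes[offset..end])
        } else {
            Err(format!("read of undefined bytes in range {}..{}", offset, end))
        }
    }
}
\end{minted}

Writing one byte into a fresh two-byte allocation leaves the mask as \rust{[true, false]}, matching
\autoref{fig:undef}. Any read that touches the second byte is then rejected.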

\subsection{Computing data layout}

Currently, the Rust compiler's data layout computations used in translation from MIR to LLVM IR are
hidden from Miri, so I do my own basic data layout computation, which doesn't generally match what
translation does. In the future, the Rust compiler may be modified so that Miri can use the exact
same data layout.

Miri's data layout calculation is a relatively simple transformation from Rust types to a basic
structure with constant sizes for primitives and sets of fields with offsets for aggregate
types. These layouts are cached for performance.

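Here is a minimal sketch of such a layout computation. It assumes a tiny hypothetical \rust{Ty} enum and natural alignment for primitives. Fields are laid out in declaration order with padding, although the real compiler is free to reorder them. \rust{LayoutCache} and its \rust{HashMap} are illustrative stand-ins for the caching mentioned above, not Miri's code.

\begin{minted}[autogobble]{rust}
use std::collections::HashMap;

/// A hypothetical, heavily simplified type representation.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Ty {
    Bool,
    U8,
    I32,
    I64,
    Usize,
    /// Tuples and structs are both treated as ordered field lists here.
    Tuple(Vec<Ty>),
}

/// Size and alignment in bytes, plus field offsets for aggregates.
#[derive(Clone, Debug, PartialEq)]
pub struct Layout {
    pub size: usize,
    pub align: usize,
    pub field_offsets: Vec<usize>,
}

/// Computes layouts for one target pointer size and caches the results.
pub struct LayoutCache {
    pointer_size: usize,
    cache: HashMap<Ty, Layout>,
}

impl LayoutCache {
    pub fn new(pointer_size: usize) -> Self {
        LayoutCache { pointer_size, cache: HashMap::new() }
    }

    pub fn layout_of(&mut self, ty: &Ty) -> Layout {
        if let Some(layout) = self.cache.get(ty) {
            return layout.clone();
        }
        let layout = match ty {
            Ty::Bool | Ty::U8 => Self::primitive(1),
            Ty::I32 => Self::primitive(4),
            Ty::I64 => Self::primitive(8),
            // `usize` follows the target's pointer size, as in compiled Rust.
            Ty::Usize => Self::primitive(self.pointer_size),
            Ty::Tuple(fields) => {
                let mut offset = 0;
                let mut align = 1;
                let mut field_offsets = Vec::with_capacity(fields.len());
                for field in fields {
                    let field_layout = self.layout_of(field);
                    // Pad so the field starts at a multiple of its alignment.
                    offset = round_up(offset, field_layout.align);
                    field_offsets.push(offset);
                    offset += field_layout.size;
                    align = align.max(field_layout.align);
                }
                Layout { size: round_up(offset, align), align, field_offsets }
            }
        };
        self.cache.insert(ty.clone(), layout.clone());
        layout
    }

    /// Assumes natural alignment: a primitive is aligned to its own size.
    fn primitive(size: usize) -> Layout {
        Layout { size, align: size, field_offsets: Vec::new() }
    }
}

fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}
\end{minted}

With an 8-byte pointer size, the tuple \rust{(u8, i32, bool)} gets field offsets 0, 4, and 8 and a
total size of 12 bytes, and \rust{usize} is 8 bytes wide.
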
251+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
252+
253+
\section{Deterministic execution}
254+
\label{sec:deterministic}
255+
In order to be effective as a compile-time evaluator, Miri must have \emph{deterministic execution},
as explained by Burtescu in the ``Rust Abstract Machine'' post. That is, given a function and
arguments to that function, Miri should always produce identical results. This is important for
coherence in the type checker when constant evaluations are involved in types, such as for sizes of
array types:

\begin{minted}[autogobble,mathescape]{rust}
const fn get_size() -> usize { /* $\ldots$ */ }
let array: [i32; get_size()];
\end{minted}

Since Miri allows execution of unsafe code\footnote{In fact, the distinction between safe and unsafe
doesn't exist at the MIR level.}, it is specifically designed to remain safe while interpreting
potentially unsafe code. When Miri encounters an unrecoverable error, it reports it via the Rust
compiler's usual error reporting mechanism, pointing to the part of the original code where the
error occurred. For example:

\begin{minted}[autogobble]{rust}
let b = Box::new(42);
let p: *const i32 = &*b;
drop(b);
unsafe { *p }
// ~~ error: dangling pointer
// was dereferenced
\end{minted}
\label{dangling-pointer}

There are more examples in Miri's
repository.\footnote{\href{https://github.com/tsion/miri/blob/master/test/errors.rs}{Miri's error
tests}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Language support}

In its current state, Miri supports a large proportion of the Rust language, with a few major
exceptions such as the lack of support for FFI\footnote{Foreign Function Interface, e.g.\ calling
functions defined in Assembly, C, or C++.}, which rules out programs that read or write files, take
user input, display graphics, and more. The following is a tour of what is currently supported.

\subsection{Primitives}

Miri supports booleans and integers of various sizes and signedness (i.e.\ \rust{i8}, \rust{i16},
\rust{i32}, \rust{i64}, \rust{isize}, \rust{u8}, \rust{u16}, \rust{u32}, \rust{u64}, \rust{usize}),
as well as unary and binary operations over these types. The \rust{isize} and \rust{usize} types
are sized according to the target machine's pointer size, just like in compiled Rust. The
\rust{char} and float types (\rust{f32}, \rust{f64}) are not supported yet, but there are no known
barriers to supporting them.

When examining a boolean in an \rust{if} condition, Miri will report an error if it is not precisely
0 or 1, since any other value is undefined behaviour in Rust. The \rust{char} type has similar
restrictions to check for once it is implemented.

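The check on booleans amounts to validating the raw byte before using it. The sketch below is hypothetical: the function names and error strings are not Miri's. It also shows the analogous check for \rust{char}, whose valid values are the Unicode scalar values, using the standard library's \rust{char::from_u32}.

\begin{minted}[autogobble]{rust}
/// Hypothetical helper: interprets a byte read from memory as a `bool`,
/// e.g. for an `if` condition. Anything other than 0 or 1 is reported.
pub fn read_bool(byte: u8) -> Result<bool, String> {
    match byte {
        0 => Ok(false),
        1 => Ok(true),
        other => Err(format!("invalid boolean value: {}", other)),
    }
}

/// Hypothetical analogous check for `char`: the value must be a Unicode
/// scalar value, i.e. at most 0x10FFFF and not a surrogate (0xD800..=0xDFFF).
pub fn read_char(bits: u32) -> Result<char, String> {
    char::from_u32(bits).ok_or_else(|| format!("invalid char value: {:#x}", bits))
}
\end{minted}
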
\subsection{Pointers}

Both references and raw pointers are supported, with essentially no difference between them in Miri.
It is also possible to do basic pointer comparisons and arithmetic. However, a few operations are
considered errors and a few require special support.

Firstly, pointers into the same allocation may be compared for ordering, but pointers into
different allocations are considered unordered, and Miri will report an error if you attempt this.
The reasoning is that different allocations may have different orderings in the global address space
at runtime, making this non-deterministic. However, pointers into different allocations \emph{may} be
compared for direct equality (they always compare unequal).

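Here is a sketch of these comparison rules. It assumes that an abstract pointer is modelled as a hypothetical \rust{Pointer} struct holding an allocation identifier and an offset. Neither the struct nor the two functions are Miri's actual API.

\begin{minted}[autogobble]{rust}
use std::cmp::Ordering;

/// Hypothetical abstract pointer: an allocation identifier plus an offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pointer {
    pub alloc_id: u64,
    pub offset: u64,
}

/// Orders two pointers, but only within a single allocation. The relative
/// placement of distinct allocations at run time is unknown, so ordering
/// them would make evaluation non-deterministic.
pub fn compare_pointers(a: Pointer, b: Pointer) -> Result<Ordering, String> {
    if a.alloc_id == b.alloc_id {
        Ok(a.offset.cmp(&b.offset))
    } else {
        Err(String::from("cannot order pointers into different allocations"))
    }
}

/// Equality is always permitted: pointers into different allocations are
/// simply unequal, even if their offsets match.
pub fn pointers_equal(a: Pointer, b: Pointer) -> bool {
    a.alloc_id == b.alloc_id && a.offset == b.offset
}
\end{minted}
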
Finally, for things like null pointer checks, abstract pointers (the kind represented using
relocations) may be compared against pointers cast from integers (e.g.\ \rust{0 as *const i32}).
To handle these cases, Miri has a concept of ``integer pointers'', which are always unequal to
abstract pointers. Integer pointers can be compared and operated upon freely. However, it is
impossible to go from an integer pointer to an abstract pointer backed by a relocation, and it is
not valid to dereference an integer pointer.

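Integer pointers can be modelled by extending the pointer type with a second variant. In this hypothetical sketch, the \rust{PointerValue} enum and both functions are illustrative and are not Miri's API. The derived equality already gives the rule described above: a pointer of one kind never equals a pointer of the other kind.

\begin{minted}[autogobble]{rust}
/// Hypothetical pointer value: either an abstract pointer (backed by a
/// relocation into some allocation) or an integer pointer, such as the
/// result of `0 as *const i32`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PointerValue {
    Abstract { alloc_id: u64, offset: u64 },
    Integer(u64),
}

/// The derived equality compares integer pointers by address, abstract
/// pointers by allocation and offset, and never equates the two kinds, so a
/// null check on an abstract pointer always reports "not null".
pub fn pointer_eq(a: PointerValue, b: PointerValue) -> bool {
    a == b
}

/// Only abstract pointers can be dereferenced: an integer pointer has no
/// allocation behind it, and there is no way to recover one.
pub fn deref_target(p: PointerValue) -> Result<(u64, u64), String> {
    match p {
        PointerValue::Abstract { alloc_id, offset } => Ok((alloc_id, offset)),
        PointerValue::Integer(addr) => {
            Err(format!("attempted to dereference integer pointer {:#x}", addr))
        }
    }
}
\end{minted}
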
\subsubsection{Slice pointers}

Rust supports pointers to ``dynamically sized types'' such as \rust{[T]} and \rust{str}, which
represent arrays of indeterminate size. Pointers to such types contain an address \emph{and} the
length of the referenced array. Miri supports these fully.

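As a rough illustration, an interpreter might represent such a pointer as an address plus a length and use the length to bounds-check indexing. The \rust{SlicePointer} struct below is hypothetical. The final function only uses \rust{std::mem::size_of} to check that compiled Rust likewise represents references to \rust{[u8]} and \rust{str} as two machine words.

\begin{minted}[autogobble]{rust}
use std::mem::size_of;

/// Hypothetical interpreter representation of a pointer to a dynamically
/// sized value such as `[T]` or `str`: where it points, plus how many
/// elements are there.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SlicePointer {
    pub alloc_id: u64,
    pub offset: u64,
    pub len: u64,
}

impl SlicePointer {
    /// Returns the byte offset of element `index`, using the stored length
    /// to reject out-of-bounds accesses.
    pub fn element_offset(&self, index: u64, elem_size: u64) -> Result<u64, String> {
        if index < self.len {
            Ok(self.offset + index * elem_size)
        } else {
            Err(format!("index {} out of bounds for length {}", index, self.len))
        }
    }
}

/// Compiled Rust also uses two words (address and length) for these pointers.
pub fn slice_pointers_are_two_words() -> bool {
    size_of::<&[u8]>() == 2 * size_of::<usize>() && size_of::<&str>() == 2 * size_of::<usize>()
}
\end{minted}
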
\subsubsection{Trait objects}

Rust also supports pointers to ``trait objects'', which represent some type that implements a trait,
with the specific type unknown at compile time. These are implemented using virtual dispatch with a
vtable, similar to virtual methods in C++. Miri does not currently support trait objects at all.

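The following ordinary Rust shows the kind of code involved. The \rust{Shape} trait and its implementors are made up for illustration. Each \rust{dyn Shape} reference is a data pointer paired with a vtable pointer. The \rust{area} call inside \rust{total_area} is therefore resolved at run time through the vtable, which is the machinery Miri would need to model.

\begin{minted}[autogobble]{rust}
use std::mem::size_of;

/// An illustrative trait; calls through `&dyn Shape` are dispatched via a vtable.
pub trait Shape {
    fn area(&self) -> u64;
}

pub struct Square(pub u64);
pub struct Rect(pub u64, pub u64);

impl Shape for Square {
    fn area(&self) -> u64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> u64 {
        self.0 * self.1
    }
}

/// The concrete type of each element is unknown here, so each `area` call
/// looks up the method in that element's vtable at run time.
pub fn total_area(shapes: &[&dyn Shape]) -> u64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// A `&dyn Shape` is a (data pointer, vtable pointer) pair.
pub fn trait_object_is_two_words() -> bool {
    size_of::<&dyn Shape>() == 2 * size_of::<usize>()
}
\end{minted}
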
\subsection{Aggregates}

Aggregates include types declared as \rust{struct} or \rust{enum} as well as tuples, arrays, and
closures\footnote{Closures are essentially structs with a field for each variable captured by the
closure.}. Miri supports all common uses of these types. The main missing piece is handling
\texttt{\#[repr(..)]} annotations, which adjust the layout of a \rust{struct} or \rust{enum}.

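To make the footnote concrete, the sketch below pairs a closure that captures one variable with a hand-written struct that behaves the same way. The names \rust{AddOffset}, \rust{call}, and \rust{make_adder} are illustrative. The compiler's real closure types are anonymous.

\begin{minted}[autogobble]{rust}
/// Illustrative struct, roughly what `move |x| x + offset` amounts to: a
/// struct holding the captured variable, plus a function that receives it.
pub struct AddOffset {
    offset: i32,
}

impl AddOffset {
    pub fn call(&self, x: i32) -> i32 {
        x + self.offset
    }
}

/// Returns the hand-written version alongside a real closure capturing the
/// same variable, so the two can be compared.
pub fn make_adder(offset: i32) -> (AddOffset, impl Fn(i32) -> i32) {
    (AddOffset { offset }, move |x| x + offset)
}
\end{minted}
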
\subsection{Control flow}

All of Rust's standard control flow features, including \rust{loop}, \rust{while}, \rust{for},
\rust{if}, \rust{if let}, \rust{while let}, \rust{match}, \rust{break}, \rust{continue}, and
\rust{return}, are supported. In fact, supporting these was quite easy since the Rust compiler
reduces them all to a much smaller set of control-flow graph primitives in MIR.

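The toy interpreter below shows why this lowering helps. It is a hypothetical sketch, far simpler than real MIR, and none of its types are the compiler's. Once a \rust{while} loop has been turned into basic blocks joined by \rust{Goto} and conditional-branch terminators, the interpreter needs only a single dispatch loop rather than separate logic for each source-level construct.

\begin{minted}[autogobble]{rust}
/// A hypothetical toy control-flow graph in the spirit of MIR.
#[derive(Clone, Copy)]
pub enum Stmt {
    /// `locals[dst] += locals[src]`
    AddLocal { dst: usize, src: usize },
    /// `locals[dst] += value`
    AddConst { dst: usize, value: i64 },
}

#[derive(Clone, Copy)]
pub enum Terminator {
    Goto(usize),
    /// Branch to `then_bb` if `locals[lhs] < locals[rhs]`, else to `else_bb`.
    IfLess { lhs: usize, rhs: usize, then_bb: usize, else_bb: usize },
    Return,
}

pub struct BasicBlock {
    pub stmts: Vec<Stmt>,
    pub terminator: Terminator,
}

/// Executes blocks starting from block 0 until a `Return` is reached.
pub fn run(blocks: &[BasicBlock], locals: &mut [i64]) {
    let mut current = 0;
    loop {
        let block = &blocks[current];
        for stmt in &block.stmts {
            match *stmt {
                Stmt::AddLocal { dst, src } => {
                    let value = locals[src];
                    locals[dst] += value;
                }
                Stmt::AddConst { dst, value } => locals[dst] += value,
            }
        }
        current = match block.terminator {
            Terminator::Goto(next) => next,
            Terminator::IfLess { lhs, rhs, then_bb, else_bb } => {
                if locals[lhs] < locals[rhs] { then_bb } else { else_bb }
            }
            Terminator::Return => return,
        };
    }
}

/// `while i < n { sum += i; i += 1; }` lowered by hand, with locals [sum, i, n].
pub fn sum_below(n: i64) -> i64 {
    let blocks = [
        // bb0: loop header, tests the condition.
        BasicBlock {
            stmts: vec![],
            terminator: Terminator::IfLess { lhs: 1, rhs: 2, then_bb: 1, else_bb: 2 },
        },
        // bb1: loop body, then back to the header.
        BasicBlock {
            stmts: vec![Stmt::AddLocal { dst: 0, src: 1 }, Stmt::AddConst { dst: 1, value: 1 }],
            terminator: Terminator::Goto(0),
        },
        // bb2: loop exit.
        BasicBlock { stmts: vec![], terminator: Terminator::Return },
    ];
    let mut locals = [0, 0, n];
    run(&blocks, &mut locals);
    locals[0]
}
\end{minted}

For instance, \rust{sum_below(5)} alternates between bb0 and bb1 until the condition fails, then
returns 0 + 1 + 2 + 3 + 4 = 10.
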
\subsection{Closures}

Closures are like structs containing a field for each captured variable, but closures also have an
associated function. Supporting closure calls required some extra machinery to get the necessary
information from the compiler, but it is all supported except for one edge case on my to-do
list\footnote{The edge case is calling a closure that takes a reference to its captures via a
closure interface that passes the captures by value.}.

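The footnote's edge case can arise in ordinary code. In the illustrative example below, both function names are hypothetical. The closure only reads \rust{items}, so it captures it by reference and implements \rust{Fn}, yet \rust{call_once} invokes it through the by-value \rust{FnOnce} interface. This is one reading of the footnote; the exact MIR that triggers the problem may differ.

\begin{minted}[autogobble]{rust}
/// Hypothetical helper that promises to call its argument at most once, so it
/// takes the closure through the by-value `FnOnce` interface.
pub fn call_once<F: FnOnce() -> usize>(f: F) -> usize {
    f()
}

/// The closure only reads `items`, so it captures it by reference and
/// implements `Fn`; here it is nevertheless invoked via `FnOnce`.
pub fn count_via_fn_once(items: &[u32]) -> usize {
    let by_ref = || items.len();
    call_once(by_ref)
}
\end{minted}
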
\subsection{Intrinsics}

To support unsafe code, and in particular the unsafe code used to implement Rust's standard library,
it became clear that Miri would have to support calls to compiler
intrinsics\footnote{\href{https://doc.rust-lang.org/stable/std/intrinsics/index.html}{Rust
intrinsics documentation}}. Intrinsics are function calls which cause the Rust compiler to produce
special-purpose code instead of a regular function call. Miri simply recognizes intrinsic calls by
their unique ABI\footnote{Application Binary Interface, which defines calling conventions. Examples
include ``C'', ``Rust'', and ``rust-intrinsic''.} and name, and runs special-purpose code to handle them.

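Here is a minimal sketch of this dispatch. It assumes hypothetical \rust{Abi} and \rust{CallResult} types, and the size of the intrinsic's type parameter is passed in directly. Only \rust{size_of} is handled, and every other intrinsic is reported as unimplemented. Other C ABI calls are rejected, matching the lack of FFI support. A real implementation would first match the allocator functions discussed under heap allocations.

\begin{minted}[autogobble]{rust}
/// Hypothetical calling-convention tag attached to each callee.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Abi {
    Rust,
    C,
    RustIntrinsic,
}

/// Hypothetical result of handling a call in the interpreter.
#[derive(Debug, PartialEq, Eq)]
pub enum CallResult {
    /// Handled directly; the value is written to the return location.
    Handled(u64),
    /// An ordinary Rust function: interpret its MIR body as usual.
    InterpretBody,
}

/// Dispatches on ABI and function name. `type_size` stands in for the size
/// of the intrinsic's type parameter, which a real interpreter would compute
/// from the type's layout.
pub fn handle_call(abi: Abi, name: &str, type_size: u64) -> Result<CallResult, String> {
    match (abi, name) {
        (Abi::RustIntrinsic, "size_of") => Ok(CallResult::Handled(type_size)),
        (Abi::RustIntrinsic, other) => Err(format!("unimplemented intrinsic: {}", other)),
        // Allocator functions would need their own cases before this one;
        // any other C ABI call would require FFI, which is unsupported.
        (Abi::C, other) => Err(format!("cannot call foreign function: {}", other)),
        (Abi::Rust, _) => Ok(CallResult::InterpretBody),
    }
}
\end{minted}
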
An example of an important intrinsic is \rust{size_of}, which causes Miri to write the size of
the type in question to the return value location. The Rust standard library uses intrinsics heavily
to implement various data structures, so this was a major step toward supporting them. So far, I've
been implementing intrinsics on a case-by-case basis as I write test cases which require missing
ones, so I haven't yet implemented all of them.

\subsection{Heap allocations}

The next piece of the puzzle for supporting interesting programs (and the standard library) was heap
allocations. There are two main interfaces for heap allocation in Rust: the built-in \rust{Box}
rvalue in MIR and a set of C ABI foreign functions including \rust{__rust_allocate},
\rust{__rust_reallocate}, and \rust{__rust_deallocate}. These correspond approximately to
\mintinline{c}{malloc}, \mintinline{c}{realloc}, and \mintinline{c}{free} in C.

The \rust{Box} rvalue allocates enough space for a single value of a given type. This was easy to
support in Miri: it simply creates a new abstract allocation in the same manner as for
stack-allocated values, since there's no major difference between them in Miri.

The allocator functions, which are used to implement things like Rust's standard \rust{Vec<T>} type,
were a bit trickier. Rust declares them as \rust{extern "C" fn} so that different allocator
libraries can be linked in at the user's option. Since Miri doesn't actually support FFI and we want
full control of allocations for safety, Miri ``cheats'' and recognizes these allocator functions in
essentially the same way it recognizes compiler intrinsics. Then, a call to \rust{__rust_allocate}
simply creates another abstract allocation with the requested size and \rust{__rust_reallocate}
grows one.

In the future, Miri should also track which allocations came from \rust{__rust_allocate} so it can
reject reallocate or deallocate calls on stack allocations.

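The following sketch models the allocator functions as operations on abstract allocations and includes the stack/heap tracking proposed above. Everything here is hypothetical: \rust{Memory}, \rust{AllocKind}, and the method names are illustrative. Unlike the description above, this \rust{reallocate} may shrink as well as grow an allocation.

\begin{minted}[autogobble]{rust}
use std::collections::HashMap;

/// Hypothetical tag for where an allocation came from. Tracking this is
/// the future work described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AllocKind {
    Stack,
    Heap,
}

/// Hypothetical interpreter memory: abstract allocations keyed by ID.
pub struct Memory {
    allocs: HashMap<u64, (AllocKind, Vec<u8>)>,
    next_id: u64,
}

impl Memory {
    pub fn new() -> Self {
        Memory { allocs: HashMap::new(), next_id: 0 }
    }

    /// Stack slots, `Box`, and `__rust_allocate` all create a fresh allocation.
    pub fn allocate(&mut self, size: usize, kind: AllocKind) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.allocs.insert(id, (kind, vec![0; size]));
        id
    }

    /// Models `__rust_reallocate`: resizes a heap allocation while keeping
    /// its ID, so pointers into it stay valid in this model.
    pub fn reallocate(&mut self, id: u64, new_size: usize) -> Result<(), String> {
        match self.allocs.get_mut(&id) {
            Some((AllocKind::Heap, bytes)) => {
                bytes.resize(new_size, 0);
                Ok(())
            }
            Some((AllocKind::Stack, _)) => Err(String::from("reallocating a stack allocation")),
            None => Err(String::from("reallocating a dangling pointer")),
        }
    }

    /// Models `__rust_deallocate`, refusing to free stack allocations.
    pub fn deallocate(&mut self, id: u64) -> Result<(), String> {
        let kind = self.allocs.get(&id).map(|(kind, _)| *kind);
        match kind {
            Some(AllocKind::Heap) => {
                self.allocs.remove(&id);
                Ok(())
            }
            Some(AllocKind::Stack) => Err(String::from("deallocating a stack allocation")),
            None => Err(String::from("double free or dangling pointer")),
        }
    }

    pub fn size_of_alloc(&self, id: u64) -> Option<usize> {
        self.allocs.get(&id).map(|(_, bytes)| bytes.len())
    }
}
\end{minted}
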
\subsection{Destructors}

Miri doesn't yet support calling user-defined destructors, but it already has most of the machinery
in place to do so, and this is next on my to-do list. There \emph{is} support for dropping
\rust{Box<T>} values, including deallocating their associated allocations. This is enough to properly
execute the dangling pointer example in \autoref{sec:deterministic}.

\subsection{Standard library}
\blindtext

\section{Unsupported}
\blindtext

\begin{figure}[t]
\begin{minted}[autogobble]{rust}
@@ -280,6 +447,12 @@ \subsubsection{Undefined byte mask}

\section{Future work}

\subsection{Finishing the implementation}

\blindtext

\subsection{Alternative applications}

Other possible uses for Miri include:

\begin{itemize}
@@ -299,6 +472,17 @@ \section{Future work}
299472

300473
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
301474

\section{Final thoughts}

% TODO(tsion): Reword this.
Making Miri work was primarily an implementation problem. Writing an interpreter which models values
of varying sizes, stack and heap allocation, unsafe memory operations, and more requires some
unconventional techniques compared to many interpreters. Miri's execution remains safe even while
simulating execution of unsafe code, which allows it to detect when unsafe code does something
invalid.

484+
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
485+
302486
\section{Thanks}
303487

304488
Eduard Burtescu, Niko Matsakis, and Christopher Dutchyn.
