_posts/2015-04-24-FFI-and-Rust.md

---
layout: post
title: "FFI and Rust"
author: Alex Crichton
description: "Zero-cost and safe FFI in Rust"
---

Rust's quest for world domination was never destined to happen overnight, so
Rust needs to be able to interoperate with the existing world just as easily
as it talks to itself. To solve this problem, **Rust lets you communicate with C
APIs at no extra cost while providing strong safety guarantees**.

This capability is known as Rust's foreign function interface (FFI), and it is
the mechanism by which Rust communicates with other programming languages.
Following Rust's design principles, this is a **zero cost abstraction**:
function calls between Rust and C have the same performance as C function
calls. FFI bindings can also leverage language features such as ownership and
borrowing to provide a **safe interface**.

In this post we'll explore how to encapsulate unsafe FFI calls to C in safe,
zero-cost abstractions by looking at some examples of interacting with C.
C is just one example, however: we'll also see how Rust can talk to languages
like Python and Ruby just as seamlessly.

### Talking to C

First, let's start with an example of calling C code from Rust and then
demonstrate that Rust imposes no additional overhead. Starting off simple,
here's a C function that doubles its input:

```c
int double_input(int input) {
    return input * 2;
}
```

To call this from Rust, you would write this program:

```rust
extern crate libc;

extern {
    fn double_input(input: libc::c_int) -> libc::c_int;
}

fn main() {
    let input = 4;
    let output = unsafe { double_input(input) };
    println!("{} * 2 = {}", input, output);
}
```

And that's it! You can try this out for yourself by [checking out the code on
GitHub][rust2c] and running `cargo run` from that directory. At the source level
we can see that calling an external function imposes no extra burden, and we'll
soon see that the generated code has no overhead either. There are, however, a
few subtle aspects of this Rust program, so let's cover each piece in detail.

[rust2c]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/rust-to-c

First up we see `extern crate libc`. [This crate][libc] provides many useful
type definitions for FFI bindings when talking with C (such as `c_int` for C's
`int`), and these definitions ensure that C and Rust agree on the types
crossing the language boundary.

[libc]: https://crates.io/crates/libc

This leads us nicely into the next part of the program:

```rust
extern {
    fn double_input(input: libc::c_int) -> libc::c_int;
}
```

In Rust this is a **declaration** of an externally available function. You can
think of it as the equivalent of a C header file: it's where the compiler
learns about the inputs and outputs of the function, and you can see above that
this matches our definition in C. Next up we have the main body of the program:

82+
```rust
83+
fn main() {
84+
let input = 4;
85+
let output = unsafe { double_input(input) };
86+
println!("{} * 2 = {}", input, output);
87+
}
88+
```
89+
Here we see one of the crucial aspects of FFI in Rust: the `unsafe` block. The
compiler knows nothing about the implementation of `double_input`, so it must
assume that memory unsafety *could* happen in this scenario. This may seem
limiting, but Rust has just the right set of tools to let consumers of a binding
avoid worrying about `unsafe` (more on this in a moment).

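As a small preview, here is a minimal sketch (not from the original example code) of safe wrappers around two functions from the C standard library, `abs` and `strlen`. Each wrapper upholds the precondition the C function documents, so callers never write `unsafe` themselves. The sketch assumes Rust 1.82 or later for the `unsafe extern` block syntax, and a platform where C's `size_t` is Rust's `usize`.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

// Two functions from the C standard library, which Rust's standard library
// already links against on mainstream platforms. The `unsafe extern` form is
// what the 2024 edition requires; all editions accept it on Rust 1.82+.
unsafe extern "C" {
    fn abs(j: c_int) -> c_int;
    // Assumes `size_t` maps to `usize`, as it does on mainstream targets.
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper around `abs`. In C, `abs(INT_MIN)` is undefined behavior
/// because the result is not representable, so that input is rejected here
/// instead of being passed to C.
pub fn c_abs(j: c_int) -> Option<c_int> {
    if j == c_int::MIN {
        None
    } else {
        // SAFETY: every other `int` has a representable absolute value.
        Some(unsafe { abs(j) })
    }
}

/// Safe wrapper around `strlen`. A `&CStr` always points at a NUL-terminated
/// string, which is exactly the precondition `strlen` needs.
pub fn c_strlen(s: &CStr) -> usize {
    // SAFETY: the pointer is valid and NUL-terminated while `s` is borrowed.
    unsafe { strlen(s.as_ptr()) }
}
```

The `unsafe` keyword still appears, but only inside the wrappers, where the reasoning about each precondition lives in one place.
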
Now that we've seen how to call a C function from Rust, let's see if we can
verify this claim of zero overhead. Almost all programming languages can call
into C one way or another, but doing so often has a cost, such as runtime type
conversions or some juggling of the language's runtime. To get a handle on what
Rust is doing, let's go straight to the assembly code of the above `main`
function's call to `double_input`:

```
mov    $0x4,%edi
callq  3bc30 <double_input>
```

And as before, that's it! Calling a C function from Rust involves precisely one
call instruction after moving the arguments into place, exactly what the same
call would cost in C.

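Part of what keeps the call this cheap is that nothing is converted at the boundary: `c_int` is simply Rust's name for C's `int`. For structs, the same agreement on layout comes from `#[repr(C)]`. The following is a minimal sketch using a hypothetical C `struct point`, which does not appear in the original post.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_int;

// Mirrors a hypothetical C declaration:
//
//     struct point { int x; int y; };
//
// `#[repr(C)]` makes Rust lay the struct out by C's rules (declaration
// order, C alignment and padding). Without it, Rust may reorder fields.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: c_int,
    pub y: c_int,
}

/// Size and alignment of `Point` in bytes, for comparison with
/// `sizeof(struct point)` and `_Alignof(struct point)` on the C side.
pub fn point_layout() -> (usize, usize) {
    (size_of::<Point>(), align_of::<Point>())
}
```

Because the layouts match, a `Point` can be passed to or from C by value or by pointer with no copying or translation step.
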
### Safe Abstractions

One of Rust's core design principles is its emphasis on ownership, and FFI is no
exception here. When binding a C library in Rust you not only get zero
overhead, but you can also make the library *safer* to use than it is from C!
Bindings can leverage Rust's ownership and borrowing principles to turn the
usage rules that a C header typically documents only in comments into rules
the compiler checks.

For example, consider a C library for parsing a tarball. This library will
expose functions to read the contents of each file in the tarball, probably
something along the lines of:

```c
// Gets the data for a file in the tarball at the given index, returning NULL if
// it does not exist. The `size` pointer is filled in with the size of the file
// if successful.
const char *tarball_file_data(tarball_t *tarball, unsigned index, size_t *size);
```

This function makes an implicit assumption about how it is used, however: the
returned `char*` pointer must not outlive the input tarball. When bound in Rust,
this API might look like this instead:

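```rust
extern crate libc;

use std::slice;

// Opaque type from the C library; Rust only ever handles pointers to it.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct tarball_t { _private: [u8; 0] }

extern {
    // Declared in the C library's header (see above).
    fn tarball_file_data(tarball: *mut tarball_t, index: libc::c_uint,
                         size: *mut libc::size_t) -> *const libc::c_char;
}

pub struct Tarball { raw: *mut tarball_t }

impl Tarball {
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        unsafe {
            let mut size: libc::size_t = 0;
            let data = tarball_file_data(self.raw, index as libc::c_uint,
                                         &mut size);
            if data.is_null() {
                None
            } else {
                Some(slice::from_raw_parts(data as *const u8, size as usize))
            }
        }
    }
}
```
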
Here the `*mut tarball_t` pointer is *owned by* a `Tarball`, so we already have
rich knowledge about the lifetime of the resource. Additionally, the `file`
method returns a **borrowed slice** whose lifetime is tied to the lifetime of
the tarball itself. This is Rust's way of indicating that the returned data
cannot outlive the tarball, statically preventing the dangling-pointer bugs
that are easy to write when using the C API directly.

A key aspect of the Rust binding here is that it is a safe function! Although it
has an `unsafe` implementation (due to calling an FFI function), this interface
is safe to call and will not cause tough-to-track-down segfaults. And don't
forget, all of this comes at zero cost, as the raw types in C are representable
in Rust with no extra allocations or overhead.

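The binding above never frees the tarball. Below is a minimal, self-contained sketch of the complete ownership pattern. Its assumptions: `tarball_open` and `tarball_close` are hypothetical names (the post only shows `tarball_file_data`), and all three "C" functions are stand-ins written in Rust with the C ABI, so the example runs without a real tarball library. `Drop` then guarantees the handle is released exactly once.

```rust
use std::os::raw::{c_char, c_uint};
use std::ptr;
use std::slice;

// ---------------------------------------------------------------------------
// Hypothetical stand-in for the C library. A real binding would declare these
// three functions in an `extern` block and link against the C library instead.
// ---------------------------------------------------------------------------
#[allow(non_camel_case_types)]
struct tarball_t {
    files: Vec<Vec<u8>>,
}

unsafe extern "C" fn tarball_open() -> *mut tarball_t {
    let files = vec![b"hello".to_vec(), b"world!".to_vec()];
    Box::into_raw(Box::new(tarball_t { files }))
}

unsafe extern "C" fn tarball_file_data(
    tarball: *mut tarball_t,
    index: c_uint,
    size: *mut usize,
) -> *const c_char {
    let tarball = unsafe { &*tarball };
    match tarball.files.get(index as usize) {
        Some(file) => {
            unsafe { *size = file.len() };
            file.as_ptr() as *const c_char
        }
        None => ptr::null(),
    }
}

unsafe extern "C" fn tarball_close(tarball: *mut tarball_t) {
    drop(unsafe { Box::from_raw(tarball) });
}

// ---------------------------------------------------------------------------
// Safe binding: `Tarball` owns the raw pointer and frees it exactly once.
// ---------------------------------------------------------------------------
pub struct Tarball {
    raw: *mut tarball_t,
}

impl Tarball {
    pub fn open() -> Tarball {
        // SAFETY: the stand-in `tarball_open` has no preconditions.
        Tarball { raw: unsafe { tarball_open() } }
    }

    /// The returned slice borrows `self`, so it cannot outlive the tarball.
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        let mut size: usize = 0;
        // SAFETY: `self.raw` stays valid until `drop` runs.
        let data = unsafe { tarball_file_data(self.raw, index as c_uint, &mut size) };
        if data.is_null() {
            None
        } else {
            // SAFETY: `size` bytes live at `data` while the tarball is open.
            Some(unsafe { slice::from_raw_parts(data as *const u8, size) })
        }
    }
}

impl Drop for Tarball {
    fn drop(&mut self) {
        // SAFETY: `raw` came from `tarball_open` and is closed only here.
        unsafe { tarball_close(self.raw) }
    }
}
```

With this shape, code such as `let data = Tarball::open().file(0);` is rejected at compile time, because the borrowed slice would outlive the temporary `Tarball` it points into.
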
### Talking to Rust

A major feature of Rust is that it does not have a garbage collector or
runtime, and one of the benefits of this is that Rust can be called from C with
no setup at all. This means that the zero overhead FFI applies not only when
Rust calls into C, but also when C calls into Rust!

Let's take the example above, but reverse the roles of each language. As before,
all the code below is [available on GitHub][c2rust]. First we'll start off with
our Rust code:

[c2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/c-to-rust

```rust
#[no_mangle]
pub extern fn double_input(input: i32) -> i32 {
    input * 2
}
```

As with the Rust code before, there's not a whole lot here, but there are some
subtle aspects in play. First off, the function carries a `#[no_mangle]`
attribute. This instructs the compiler not to mangle the symbol name for the
function `double_input`. Rust employs name mangling similar to C++ to ensure
that libraries do not clash with one another, and this attribute means that you
don't have to guess a symbol name like `double_input::h485dee7f568bebafeaa`
from C.

Next comes the function signature itself, and its most interesting part is the
`extern` keyword. This is a specialized form of specifying the [ABI for a
function][abi-fn]: a bare `extern` means `extern "C"`, which makes the function
compatible with a C function call.

[abi-fn]: http://doc.rust-lang.org/reference.html#extern-functions

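Because such a function uses the C calling convention, C can also call it through a plain function pointer, not only by symbol name. The following is a minimal sketch that is not part of the original examples: it hands a Rust comparator to the C standard library's `qsort`. It assumes Rust 1.82 or later for the `unsafe extern` block and a platform where `size_t` is `usize`.

```rust
use std::cmp::Ordering;
use std::mem::size_of;
use std::os::raw::{c_int, c_void};

// From the C standard library's <stdlib.h>:
//   void qsort(void *base, size_t nmemb, size_t size,
//              int (*compar)(const void *, const void *));
unsafe extern "C" {
    fn qsort(
        base: *mut c_void,
        nmemb: usize,
        size: usize,
        compar: extern "C" fn(*const c_void, *const c_void) -> c_int,
    );
}

/// A Rust function with the C ABI. `qsort` calls it through an ordinary
/// function pointer, exactly as it would call a comparator written in C.
extern "C" fn compare_i32(a: *const c_void, b: *const c_void) -> c_int {
    // SAFETY: `qsort` only passes pointers to elements of the `i32` array.
    let (a, b) = unsafe { (*(a as *const i32), *(b as *const i32)) };
    match a.cmp(&b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Sorts `values` in place by handing them, and `compare_i32`, to C's `qsort`.
pub fn sort_with_qsort(values: &mut [i32]) {
    if values.is_empty() {
        return; // nothing to sort; avoids passing C a dangling pointer
    }
    // SAFETY: base, nmemb, and size describe exactly the elements of `values`.
    unsafe {
        qsort(
            values.as_mut_ptr() as *mut c_void,
            values.len(),
            size_of::<i32>(),
            compare_i32,
        );
    }
}
```

The comparator never panics, which matters: unwinding out of an `extern "C"` function is not allowed to cross into C.
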
Finally, if you [take a look at the `Cargo.toml`][cargo-toml] you'll see that
this library is not compiled as a normal Rust library (rlib) but instead as a
static archive, which Rust calls a 'staticlib' (selected with
`crate-type = ["staticlib"]` in the `[lib]` section). This enables all the
relevant Rust code to be linked statically into the C program we're about to
produce.

[cargo-toml]: https://github.com/alexcrichton/rust-ffi-examples/blob/master/c-to-rust/Cargo.toml#L8

Now that we've got our Rust library squared away, let's write the C program
that calls it:

```c
#include <stdint.h>
#include <stdio.h>

extern int32_t double_input(int32_t input);

int main() {
    int input = 4;
    int output = double_input(input);
    printf("%d * 2 = %d\n", input, output);
    return 0;
}
```

Here we can see that C, like Rust, needs to declare the `double_input` function
that Rust defined. Other than that, though, everything is ready to go! If you
run `make` from the [directory on GitHub][c2rust] you'll see these examples
getting compiled and linked together, and the final executable should run and
print `4 * 2 = 8`.

Rust's lack of a garbage collector and runtime enables this seamless transition
from C to Rust. The external C code does not need to perform any setup on Rust's
behalf, making the transition that much cheaper.

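Beyond plain function calls, C code often needs to hold on to Rust-owned data between calls. One common approach is sketched below with hypothetical `counter_*` functions that are not part of the original examples: C receives an opaque pointer created with `Box::into_raw` and hands it back to be released with `Box::from_raw`, so Rust's allocator always frees what it allocated. The sketch assumes Rust 1.82 or later for `#[unsafe(no_mangle)]`, the spelling of `#[no_mangle]` that the 2024 edition requires.

```rust
// C-side declarations this sketch assumes (e.g. in a header):
//   typedef struct Counter Counter;          /* opaque to C */
//   Counter *counter_new(void);
//   uint64_t counter_increment(Counter *counter);
//   void counter_free(Counter *counter);

/// State owned by Rust; C only ever sees a pointer to it.
pub struct Counter {
    count: u64,
}

/// Allocates a `Counter` on the Rust heap and hands the pointer to C.
#[unsafe(no_mangle)]
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { count: 0 }))
}

/// Increments the counter and returns the new value, or 0 for a NULL handle.
///
/// # Safety
/// `counter` must be NULL or a live pointer returned by `counter_new`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn counter_increment(counter: *mut Counter) -> u64 {
    match unsafe { counter.as_mut() } {
        Some(counter) => {
            counter.count += 1;
            counter.count
        }
        None => 0,
    }
}

/// Gives the allocation back to Rust so Rust's allocator frees it.
///
/// # Safety
/// `counter` must be NULL or a pointer from `counter_new` not yet freed.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn counter_free(counter: *mut Counter) {
    if !counter.is_null() {
        drop(unsafe { Box::from_raw(counter) });
    }
}
```

No initialization call is needed before `counter_new`, which is the "no setup" property described above.
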
### Beyond C

Up to now we've seen how FFI in Rust has zero overhead and how we can use Rust's
concept of ownership to write safe bindings to C libraries. If you're not using
C, however, you're still in luck! These features of Rust enable it to also be
called from [Python][py2rust], [Ruby][rb2rust], [JavaScript][js2rust], and many
more languages.

[py2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/python-to-rust
[rb2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/ruby-to-rust
[js2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/node-to-rust

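These languages typically reach Rust through the same C ABI, for example by loading a Rust dynamic library (`crate-type = ["cdylib"]`) with Python's `ctypes`. The main extra rule concerns memory that crosses the boundary. Below is a minimal sketch, not taken from the linked examples, in which the hypothetical `greeting_new`/`greeting_free` pair returns a Rust-allocated string. The caller must hand that string back to Rust to be freed, because `CString::into_raw` memory must not be released with C's `free`. It assumes Rust 1.82 or later for `#[unsafe(no_mangle)]`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Builds "Hello, <name>!" and returns it as a newly allocated C string.
/// The caller must release it with `greeting_free`, not with C's `free`.
///
/// # Safety
/// `name` must be NULL or point to a valid NUL-terminated string.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn greeting_new(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return ptr::null_mut();
    }
    // Invalid UTF-8 in `name` is replaced rather than rejected.
    let name = unsafe { CStr::from_ptr(name) }.to_string_lossy();
    match CString::new(format!("Hello, {}!", name)) {
        Ok(greeting) => greeting.into_raw(), // ownership passes to the caller
        Err(_) => ptr::null_mut(),           // interior NUL; cannot occur here
    }
}

/// Frees a string previously returned by `greeting_new`.
///
/// # Safety
/// `s` must be NULL or a pointer from `greeting_new` that was not freed yet.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn greeting_free(s: *mut c_char) {
    if !s.is_null() {
        // Rebuilding the `CString` lets Rust's allocator free the memory.
        drop(unsafe { CString::from_raw(s) });
    }
}
```

A Python or Ruby wrapper would usually copy the returned string into a native string object and then immediately call `greeting_free`.
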
A common reason for writing C extensions for these languages is to speed up
some performance-critical component of a library or application. With the
features of Rust we've seen here, however, Rust is just as suitable for this
sort of usage. One of Rust's first production users,
[Skylight](https://www.skylight.io), was able to improve the performance and
memory usage of their data collection agent almost instantly by switching to
Rust, and the Rust code is all published as a Ruby gem.

Moving from a language like Python or Ruby down to C to optimize performance is
often quite difficult, as it's tough to ensure that the program won't crash in a
difficult-to-debug way. Rust, however, brings not only zero cost FFI but *also*
the same safety guarantees as the original source language, enabling this sort
of optimization to happen even more frequently!

FFI is just one of many tools in Rust's toolbox, but it's a key component of
Rust's adoption, as it allows Rust to seamlessly integrate with existing code
bases today. I'm personally quite excited to see the benefits of Rust reach as
many projects as possible!
