|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "FFI and Rust" |
| 4 | +author: Alex Crichton |
| 5 | +description: "Zero-cost and safe FFI in Rust" |
| 6 | +--- |
| 7 | + |
| 8 | + |
| 9 | +Rust's quest for world domination was never destined to happen overnight, so |
| 10 | +Rust needs to be able to interoperate with the existing world just as easily |
| 11 | +as it talks to itself. To solve this problem, **Rust lets you communicate with C |
| 12 | +APIs at no extra cost while providing strong safety guarantees**. |
| 13 | + |
| 14 | +This is also referred to as Rust's foreign function interface (FFI) and is the |
| 15 | +method by which Rust communicates with other programming languages. Following |
| 16 | +Rust's design principles, this is a **zero cost abstraction** where function |
| 17 | +calls between Rust and C have identical performance to C function calls. FFI |
| 18 | +bindings can also leverage language features such as ownership and borrowing to |
| 19 | +provide a **safe interface**. |
| 20 | + |
| 21 | +In this post we'll explore how to encapsulate unsafe FFI calls to C in safe, |
| 22 | +zero-cost abstractions by looking at some examples of interacting with C. |
| 23 | +Working with C is, however, just an example, as we'll also see how Rust can |
| 24 | +talk to languages like Python and Ruby just as seamlessly as it talks to C. |
| 25 | + |
| 26 | +### Talking to C |
| 27 | + |
| 28 | +First, let's start with an example of calling C code from Rust and then |
| 29 | +demonstrate that Rust imposes no additional overhead. Starting off simple, |
| 30 | +here's a C function which simply doubles the input it's given: |
| 31 | + |
| 32 | +```c |
| 33 | +int double_input(int input) { |
| 34 | +    return input * 2; |
| 35 | +} |
| 36 | +``` |
| 37 | + |
| 38 | +To call this from Rust, one would write this program: |
| 39 | + |
| 40 | +```rust |
| 41 | +extern crate libc; |
| 42 | + |
| 43 | +extern { |
| 44 | +    fn double_input(input: libc::c_int) -> libc::c_int; |
| 45 | +} |
| 46 | + |
| 47 | +fn main() { |
| 48 | +    let input = 4; |
| 49 | +    let output = unsafe { double_input(input) }; |
| 50 | +    println!("{} * 2 = {}", input, output); |
| 51 | +} |
| 52 | +``` |
| 53 | + |
| 54 | +And that's it! You can try this out for yourself by [checking out the code on |
| 55 | +GitHub][rust2c] and running `cargo run` from that directory. At the source level |
| 56 | +we can see that there's no burden in calling an external function, and we'll see |
| 57 | +soon that the generated code indeed has no overhead. There are, however, a few |
| 58 | +subtle aspects of this Rust program so let's cover each piece in detail. |
| 59 | + |
| 60 | +[rust2c]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/rust-to-c |
| 61 | + |
| 62 | +First up we see `extern crate libc`. [This crate][libc] provides many useful |
| 63 | +type definitions for FFI bindings when talking with C, and it is necessary |
| 64 | +to ensure that both C and Rust agree on the types crossing the language |
| 65 | +boundary. |
| 66 | + |
| 67 | +[libc]: https://crates.io/crates/libc |
| 68 | + |
| 69 | +This leads us nicely into the next part of the program: |
| 70 | + |
| 71 | +```rust |
| 72 | +extern { |
| 73 | +    fn double_input(input: libc::c_int) -> libc::c_int; |
| 74 | +} |
| 75 | +``` |
| 76 | + |
| 77 | +In Rust this is a **declaration** of an externally available function. You can |
| 78 | +think of this along the lines of a C header file. Here's where the compiler |
| 79 | +learns about the inputs and outputs of the function, and you can see above that |
| 80 | +this matches our definition in C. Next up we have the main body of the program: |
| 81 | + |
| 82 | +```rust |
| 83 | +fn main() { |
| 84 | +    let input = 4; |
| 85 | +    let output = unsafe { double_input(input) }; |
| 86 | +    println!("{} * 2 = {}", input, output); |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +We see one of the crucial aspects of FFI in Rust here, the `unsafe` block. The |
| 91 | +compiler knows nothing about the implementation of `double_input`, so it must |
| 92 | +assume that memory unsafety *could* happen in this scenario. This may seem |
| 93 | +limiting, but Rust has just the right set of tools to allow consumers to not |
| 94 | +worry about `unsafe` (more on this in a moment). |
| 95 | + |
| 96 | +Now that we've seen how to call a C function from Rust, let's see if we can |
| 97 | +verify this claim of zero overhead. Almost all programming languages can call |
| 98 | +into C one way or another, but it often comes at a cost with runtime type |
| 99 | +conversions or perhaps some language runtime juggling. To get a handle on what |
| 100 | +Rust is doing, let's go straight to the assembly code of the above `main` |
| 101 | +function's call to `double_input`: |
| 102 | + |
| 103 | +``` |
| 104 | +mov $0x4,%edi |
| 105 | +callq 3bc30 <double_input> |
| 106 | +``` |
| 107 | + |
| 108 | +And as before, that's it! Here we can see that calling a C function from Rust |
| 109 | +involves precisely one call instruction after moving the arguments into place, |
| 110 | +exactly the same cost as it would be in C. |
| 111 | + |
| 112 | +### Safe Abstractions |
| 113 | + |
| 114 | +One of Rust's core design principles is its emphasis on ownership, and FFI is no |
| 115 | +exception here. When binding a C library in Rust you not only have the benefit |
| 116 | +of zero overhead, but you are also able to make it *safer* than C can! Bindings |
| 117 | +can leverage the ownership and borrowing principles in Rust to codify comments |
| 118 | +typically found in a C header about how its API should be used. |
| 119 | + |
| 120 | +For example, consider a C library for parsing a tarball. This library will |
| 121 | +expose functions to read the contents of each file in the tarball, probably |
| 122 | +something along the lines of: |
| 123 | + |
| 124 | +```c |
| 125 | +// Gets the data for a file in the tarball at the given index, returning NULL if |
| 126 | +// it does not exist. The `size` pointer is filled in with the size of the file |
| 127 | +// if successful. |
| 128 | +const char *tarball_file_data(tarball_t *tarball, unsigned index, size_t *size); |
| 129 | +``` |
| 130 | + |
| 131 | +This function makes an implicit assumption about how it is used, however: the |
| 132 | +`char*` pointer it returns must not outlive the input tarball. When bound in |
| 133 | +Rust, this API might look like this instead: |
| 134 | + |
| 135 | +```rust |
| 136 | +pub struct Tarball { raw: *mut tarball_t } |
| 137 | + |
| 138 | +impl Tarball { |
| 139 | +    pub fn file(&self, index: u32) -> Option<&[u8]> { |
| 140 | +        unsafe { |
| 141 | +            let mut size = 0; |
| 142 | +            let data = tarball_file_data(self.raw, index as libc::c_uint, |
| 143 | +                                         &mut size); |
| 144 | +            if data.is_null() { |
| 145 | +                None |
| 146 | +            } else { |
| 147 | +                Some(slice::from_raw_parts(data as *const u8, size as usize)) |
| 148 | +            } |
| 149 | +        } |
| 150 | +    } |
| 151 | +} |
| 152 | +``` |
| 153 | + |
| 154 | +Here the `*mut tarball_t` pointer is *owned by* a `Tarball`, so we already have |
| 155 | +rich knowledge about the lifetime of the resource. Additionally, the `file` |
| 156 | +method returns a **borrowed slice** whose lifetime is connected to the same |
| 157 | +lifetime as the source tarball itself. This is Rust's way of indicating that the |
| 158 | +returned data cannot outlive the tarball, statically preventing bugs that may be |
| 159 | +encountered when just using C. |
| 160 | + |
| 161 | +A key aspect of the Rust binding here is that it is a safe function! Although it |
| 162 | +has an `unsafe` implementation (due to calling an FFI function), this interface |
| 163 | +is safe to call and will not cause tough-to-track-down segfaults. And don't |
| 164 | +forget, all of this is coming at zero cost as the raw types in C are representable |
| 165 | +in Rust with no extra allocations or overhead. |
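
To make the ownership story concrete, here is a minimal, self-contained sketch of the same wrapper that also frees the tarball when it goes out of scope. Everything on the "C side" is a hypothetical stand-in written in plain Rust so the example builds without a C compiler: `tarball_open_demo` and `tarball_close` are assumed names that are not part of any real library, and a real binding would declare such functions in an `extern` block instead of defining them.

```rust
use std::slice;

/// Hypothetical stand-in for the C library's opaque `tarball_t`.
#[allow(non_camel_case_types)]
pub struct tarball_t {
    files: Vec<Vec<u8>>,
}

/// Stand-in for a C constructor (assumed name): returns an owning raw pointer.
fn tarball_open_demo() -> *mut tarball_t {
    let files = vec![b"hello".to_vec(), b"ffi".to_vec()];
    Box::into_raw(Box::new(tarball_t { files }))
}

/// Stand-in for `tarball_file_data`: null if `index` is out of range,
/// otherwise a pointer into the tarball, with the length written to `size`.
unsafe fn tarball_file_data(t: *mut tarball_t, index: u32, size: *mut usize) -> *const u8 {
    let tarball: &tarball_t = unsafe { &*t };
    match tarball.files.get(index as usize) {
        Some(file) => {
            unsafe { *size = file.len() };
            file.as_ptr()
        }
        None => std::ptr::null(),
    }
}

/// Stand-in for a C destructor (assumed name).
unsafe fn tarball_close(t: *mut tarball_t) {
    drop(unsafe { Box::from_raw(t) });
}

/// Safe wrapper: owns the raw pointer and releases it exactly once.
pub struct Tarball {
    raw: *mut tarball_t,
}

impl Tarball {
    pub fn open_demo() -> Tarball {
        Tarball { raw: tarball_open_demo() }
    }

    /// The returned slice borrows from `&self`, so it cannot outlive the tarball.
    pub fn file(&self, index: u32) -> Option<&[u8]> {
        let mut size = 0usize;
        // SAFETY: `self.raw` stays valid for as long as `self` exists.
        let data = unsafe { tarball_file_data(self.raw, index, &mut size) };
        if data.is_null() {
            None
        } else {
            // SAFETY: the library reported `size` readable bytes at `data`.
            Some(unsafe { slice::from_raw_parts(data, size) })
        }
    }
}

impl Drop for Tarball {
    fn drop(&mut self) {
        // SAFETY: `raw` came from `tarball_open_demo` and is never used again.
        unsafe { tarball_close(self.raw) }
    }
}
```

With this in place, `Tarball::open_demo().file(0)` yields the bytes of the first file, and any attempt to keep that slice around after the `Tarball` has been dropped is rejected by the borrow checker at compile time.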
| 166 | + |
| 167 | +### Talking to Rust |
| 168 | + |
| 169 | +A major feature of Rust is that it does not have a garbage collector or |
| 170 | +runtime, and one of the benefits of this is that Rust can be called from C with |
| 171 | +no setup at all. This means that the zero overhead FFI not only applies when |
| 172 | +Rust calls into C, but also when C calls into Rust! |
| 173 | + |
| 174 | +Let's take the example above, but reverse the roles of each language. As before, |
| 175 | +all the code below is [available on GitHub][c2rust]. First we'll start off with |
| 176 | +our Rust code: |
| 177 | + |
| 178 | +[c2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/c-to-rust |
| 179 | + |
| 180 | +```rust |
| 181 | +#[no_mangle] |
| 182 | +pub extern fn double_input(input: i32) -> i32 { |
| 183 | +    input * 2 |
| 184 | +} |
| 185 | +``` |
| 186 | + |
| 187 | +As with the Rust code before, there's not a whole lot here but there are some |
| 188 | +subtle aspects in play. First off we've got our function definition with a |
| 189 | +`#[no_mangle]` attribute. This instructs the compiler to not mangle the symbol |
| 190 | +name for the function `double_input`. Rust employs name mangling similar to C++ |
| 191 | +to ensure that libraries do not clash with one another, and this attribute |
| 192 | +means that you don't have to guess a symbol name like |
| 193 | +`double_input::h485dee7f568bebafeaa` from C. |
| 194 | + |
| 195 | +Next we've got our function definition, and the most interesting part about |
| 196 | +this is the keyword `extern`. This is a specialized form of specifying the [ABI |
| 197 | +for a function][abi-fn] which enables the function to be compatible with a C |
| 198 | +function call. |
| 199 | + |
| 200 | +[abi-fn]: http://doc.rust-lang.org/reference.html#extern-functions |
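
One way to see that the ABI is part of the function's type is to pass the function around as a C-style function pointer. The sketch below is only an illustration: it spells the ABI out as `extern "C"` (which is what a bare `extern` defaults to), leaves off `#[no_mangle]` because nothing here looks the symbol up by name, and uses two hypothetical names, `IntCallback` and `call_through_pointer`, that are not part of the example repository.

```rust
use std::os::raw::c_int;

/// Same body as the post's `double_input`, with the "C" ABI written explicitly.
pub extern "C" fn double_input(input: c_int) -> c_int {
    input * 2
}

/// Rust spelling of the C function-pointer type `int (*)(int)`.
pub type IntCallback = extern "C" fn(c_int) -> c_int;

/// Hypothetical helper: invokes `f` using the C calling convention,
/// just as C code holding the same pointer would.
pub fn call_through_pointer(f: IntCallback, x: c_int) -> c_int {
    f(x)
}
```

Calling `call_through_pointer(double_input, 4)` goes through the C calling convention end to end, and a pointer of type `IntCallback` is the kind of value that could be handed to a C API expecting a callback.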
| 201 | + |
| 202 | +Finally, if you [take a look at the `Cargo.toml`][cargo-toml] you'll see that |
| 203 | +this library is not compiled as a normal Rust library (rlib) but instead as a |
| 204 | +static archive which Rust calls a 'staticlib'. This enables all the relevant |
| 205 | +Rust code to be linked statically into the C program we're about to produce. |
| 206 | + |
| 207 | +[cargo-toml]: https://github.com/alexcrichton/rust-ffi-examples/blob/master/c-to-rust/Cargo.toml#L8 |
| 208 | + |
| 209 | +Now that we've got our Rust library squared away, let's write our C program |
| 210 | +which will call Rust. |
| 211 | + |
| 212 | +```c |
| 213 | +#include <stdint.h> |
| 214 | +#include <stdio.h> |
| 215 | + |
| 216 | +extern int32_t double_input(int32_t input); |
| 217 | + |
| 218 | +int main() { |
| 219 | +    int input = 4; |
| 220 | +    int output = double_input(input); |
| 221 | +    printf("%d * 2 = %d\n", input, output); |
| 222 | +    return 0; |
| 223 | +} |
| 224 | +``` |
| 225 | + |
| 226 | +Here we can see that C, like Rust, needs to declare the `double_input` function |
| 227 | +that Rust defined. Other than that though everything is ready to go! If you run |
| 228 | +`make` from the [directory on GitHub][c2rust] you'll see these examples getting |
| 229 | +compiled and linked together and the final executable should run and print |
| 230 | +`4 * 2 = 8`. |
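
Note that the C declaration uses `int32_t` to line up with Rust's `i32`, while the earlier Rust-to-C example used `c_int` to line up with C's `int`. A small sketch of why both sides need to agree: the fixed-width type has the same size everywhere, whereas C's own integer types are platform-defined. The function name `ffi_int_sizes` is made up for illustration, and the aliases come from `std::os::raw` rather than the `libc` crate so the snippet has no dependencies.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_long};

/// Sizes in bytes of `i32`, `c_int`, and `c_long` on the current target.
/// `i32` is always 4 bytes; the C aliases follow the platform's C ABI.
pub fn ffi_int_sizes() -> (usize, usize, usize) {
    (size_of::<i32>(), size_of::<c_int>(), size_of::<c_long>())
}
```

On mainstream desktop and server targets `c_int` happens to be 32 bits wide, which is why the two spellings are interchangeable there, but `c_long` is 4 bytes on some platforms and 8 on others, so matching the declared type on both sides is the safer habit.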
| 231 | + |
| 232 | +Rust's lack of a garbage collector and runtime enables this seamless transition |
| 233 | +from C to Rust. The external C code does not need to perform any setup on Rust's |
| 234 | +behalf, making the transition that much cheaper. |
| 235 | + |
| 236 | +### Beyond C |
| 237 | + |
| 238 | +Up to now we've seen how FFI in Rust has zero overhead and how we can use Rust's |
| 239 | +concept of ownership to write safe bindings to C libraries. If you're not using |
| 240 | +C, however, you're still in luck! These features of Rust enable it to also be |
| 241 | +called from [Python][py2rust], [Ruby][rb2rust], [JavaScript][js2rust], and many |
| 242 | +more languages. |
| 243 | + |
| 244 | +[py2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/python-to-rust |
| 245 | +[rb2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/ruby-to-rust |
| 246 | +[js2rust]: https://github.com/alexcrichton/rust-ffi-examples/tree/master/node-to-rust |
| 247 | + |
| 248 | +A common reason for writing C code alongside these languages is to speed up some |
| 249 | +component of a library or application that's performance critical. With the |
| 250 | +features of Rust we've seen here, however, Rust is just as suitable for this |
| 251 | +sort of usage. One of Rust's first production users, |
| 252 | +[Skylight](https://www.skylight.io), was able to improve the performance and |
| 253 | +memory usage of their data collection agent almost instantly by just using Rust, |
| 254 | +and the Rust code is all published as a Ruby gem. |
| 255 | + |
| 256 | +Moving from a language like Python or Ruby down to C to optimize performance is |
| 257 | +often quite difficult as it's tough to ensure that the program won't crash in a |
| 258 | +difficult-to-debug way. Rust, however, not only brings zero cost FFI, but *also* |
| 259 | +the same safety guarantees as the original source language, enabling this sort of |
| 260 | +optimization to happen even more frequently! |
| 261 | + |
| 262 | +FFI is just one of many tools in the toolbox of Rust, but it's a key component |
| 263 | +to Rust's adoption as it allows Rust to seamlessly integrate with existing code |
| 264 | +bases today. I'm personally quite excited to see the benefits of Rust reach as |
| 265 | +many projects as possible! |