Description
The create_header_byte_jump_table
function in binary/header.rs
will parse each possible byte value from 0 to 255 as though it were the one-octet type descriptor found at the beginning of all binary Ion values. The output from parsing each byte is stored in a newly allocated Vec
that can be used as a jump table to avoid re-calculating the meaning of same byte values repeatedly.
This is sub-optimal:
- It requires a
Vec
to be allocated when a fixed-size array would do - It requires entities that wish to use the jump table to maintain their own copy of the
Vec
despite the fact that its contents will always be identical.
Ideally, we would store a fixed-length array as a static constant in the header
module. Unlike C, however, the Rust compiler tries very hard to prevent programs from mutating global state. This means that initializing a static array is currently somewhat difficult to do without sacrificing either speed or safety.
Here are some options I've explored:
safe + slow
The lazy_static
crate provides a macro that allows you to declare and initialize a global constant. It works by wrapping your constant in a custom smartpointer type--the first time the pointer is dereferenced, lazy_static
will call a user-defined function to calculate the value of the constant. Every dereference that occurs from then on will receive the already-calculated value.
This is easy to implement, but profiling with valgrind
has shown that the indirection introduced by the smartpointer indirection adds an unacceptable amount of overhead in terms of CPU time. Derefencing the jump table each time you begin parsing a value in a stream with millions of values adds up quickly.
thread_local
copies of the jump table suffer from even worse indirection overhead, rendering them a non-starter.
If/when we add an IonSystem
concept, we may be able to have IonReader
s share a reference to a shared jump table living in the IonSystem
, but this may not be worth the complexity.
unsafe + fast
Initializing a fixed-size array (i.e. [T]
, not a Vec) is currently surprisingly challenging for non-trivial types. The best methods available only work for types that implement the Copy
trait, and types that implement Default
and where the array is <32 entries long.
We could choose to skirt this problem with unsafe
code, but this has its own problems:
- The compiler cannot guarantee the memory safety of
unsafe
code, so we would need to scrutinize it very carefully. - Because the array would be initialized at runtime (not build time), we would need to design an appropriate synchronization mechanism to guarantee that readers didn't reference the jump table before it was initialized.
- The APIs for working with not-yet-initialized data are in a state of flux. The Rust ecosystem is migrating away from
mem::uninitialized
(currently deprecated in nightly) and towardsmem::MaybeUninit
(currently unstable).
safe + fast + not possible yet
Rust is incrementally adding support for const fn
s, functions whose outputs can be calculated entirely at compile time. This is perfect for our use case -- all of the inputs are statically known (u8
values from 0 to 255) and processing them requires no information from the outside world. With this feature, we'll be able to simply write:
const HEADER_JUMP_TABLE: [IonResult<Option<IonValueHeader>>] = create_jump_table();
The compiler will calculate the definition of HEADER_JUMP_TABLE
at build time and then all interested entities can refer to it freely -- no wasted memory, no additional overhead from indirection.
At the time of this writing, const fn
s have severe restrictions on which language features may be used. Functionality like match
statements, the ?
operator, calling non-const
functions, looping, etc. are planned but not yet supported. These restrictions are so limiting that I have not been able to write an array initialization const fn
.