Skip to content

Promote the header byte jump table to a constant #4

Closed
@zslayton

Description

@zslayton

The create_header_byte_jump_table function in binary/header.rs will parse each possible byte value from 0 to 255 as though it were the one-octet type descriptor found at the beginning of all binary Ion values. The output from parsing each byte is stored in a newly allocated Vec that can be used as a jump table to avoid re-calculating the meaning of same byte values repeatedly.

This is sub-optimal:

  • It requires a Vec to be allocated when a fixed-size array would do
  • It requires entities that wish to use the jump table to maintain their own copy of the Vec despite the fact that its contents will always be identical.

Ideally, we would store a fixed-length array as a static constant in the header module. Unlike C, however, the Rust compiler tries very hard to prevent programs from mutating global state. This means that initializing a static array is currently somewhat difficult to do without sacrificing either speed or safety.

Here are some options I've explored:

safe + slow

The lazy_static crate provides a macro that allows you to declare and initialize a global constant. It works by wrapping your constant in a custom smartpointer type--the first time the pointer is dereferenced, lazy_static will call a user-defined function to calculate the value of the constant. Every dereference that occurs from then on will receive the already-calculated value.

This is easy to implement, but profiling with valgrind has shown that the indirection introduced by the smartpointer indirection adds an unacceptable amount of overhead in terms of CPU time. Derefencing the jump table each time you begin parsing a value in a stream with millions of values adds up quickly.

thread_local copies of the jump table suffer from even worse indirection overhead, rendering them a non-starter.

If/when we add an IonSystem concept, we may be able to have IonReaders share a reference to a shared jump table living in the IonSystem, but this may not be worth the complexity.

unsafe + fast

Initializing a fixed-size array (i.e. [T], not a Vec) is currently surprisingly challenging for non-trivial types. The best methods available only work for types that implement the Copy trait, and types that implement Default and where the array is <32 entries long.

We could choose to skirt this problem with unsafe code, but this has its own problems:

  • The compiler cannot guarantee the memory safety of unsafe code, so we would need to scrutinize it very carefully.
  • Because the array would be initialized at runtime (not build time), we would need to design an appropriate synchronization mechanism to guarantee that readers didn't reference the jump table before it was initialized.
  • The APIs for working with not-yet-initialized data are in a state of flux. The Rust ecosystem is migrating away from mem::uninitialized (currently deprecated in nightly) and towards mem::MaybeUninit (currently unstable).

safe + fast + not possible yet

Rust is incrementally adding support for const fns, functions whose outputs can be calculated entirely at compile time. This is perfect for our use case -- all of the inputs are statically known (u8 values from 0 to 255) and processing them requires no information from the outside world. With this feature, we'll be able to simply write:

const HEADER_JUMP_TABLE: [IonResult<Option<IonValueHeader>>] = create_jump_table();

The compiler will calculate the definition of HEADER_JUMP_TABLE at build time and then all interested entities can refer to it freely -- no wasted memory, no additional overhead from indirection.

At the time of this writing, const fns have severe restrictions on which language features may be used. Functionality like match statements, the ? operator, calling non-const functions, looping, etc. are planned but not yet supported. These restrictions are so limiting that I have not been able to write an array initialization const fn.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions