Skip to content

Improve ptr::read code in debug builds #81163

Closed
@RalfJung

Description

@RalfJung

In #80290, some people raised concerns about the quality of the code that ptr::write compiles to in debug builds. Given that, to my knowledge, reads are much more common than writes, I would think that one should be much more concerned with the code that ptr::read compiles to -- and currently, there's going to be quite a few function calls in there, so without inlining, that code will be pretty slow.

ptr::read could be improved with techniques similar to what I did for ptr::write (call intrinsics directly, and inline everything else by hand). This would result in (something like) the following implementation: (EDIT see below for why this is wrong)

pub const unsafe fn read<T>(src: *const T) -> T {
    // We are calling the intrinsics directly to avoid function calls in the generated code
    // as `intrinsics::copy_nonoverlapping` is a wrapper function.
    extern "rust-intrinsic" {
        fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize);
    }

    // For the same reason, we also side-step `mem::MaybeUninit` and use a custom `union` directly.
    #[repr(C)]
    union MaybeUninit<T> {
        init: mem::ManuallyDrop<T>,
        uninit: (),
    }

    let mut tmp: MaybeUninit<T> = MaybeUninit { uninit: () };
    // SAFETY: the caller must guarantee that `src` is valid for reads.
    // `src` cannot overlap `tmp` because `tmp` was just allocated on
    // the stack as a separate allocated object.
    //
    // `MaybeUninit` is repr(C), so we can assume `init` is at offset 0, justifying the pointer
    // casts.
    //
    // Finally, since we just wrote a valid value into `tmp`, it is guaranteed
    // to be properly initialized.
    unsafe {
        copy_nonoverlapping(src, &mut tmp as *mut _ as *mut T, 1);
        mem::transmute_copy(&tmp.init)
    }
}

However, here we have the extra difficulty that read is (unstably) a const fn, so the above implementation is rejected. &tmp.init can be replaced by &mut tmp.init and that works (or we wait for a bootstrap bump so we can make use of #80418), but transmute_copy is non-const, so there's still more work to be done. (transmute does not work since the compiler does not recognize that T and ManuallyDrop<T> have the same size.)

I will stop here, but if someone else strongly cares about ptr::read performance/codesize in debug builds, feel free to pick this up and drive it to completion.

Cc @bjorn3 @therealprof @usbalbin

Metadata

Metadata

Assignees

Labels

A-mir-optArea: MIR optimizationsA-mir-opt-inliningArea: MIR inliningA-raw-pointersArea: raw pointers, MaybeUninit, NonNullI-slowIssue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions