Skip to content

About "unleaking": what is the required pointer provenance in dealloc? #316

Open
@danielhenrymantilla

Description

@danielhenrymantilla

Sorry if this is a duplicate, but I like the "keywords" I've showcased in this issue. Other related issues:

Note that the quite loaded term "provenance" is being used here as described mainly in #134.


Unleaking

The stdlib libs docs currently state, regarding Box::leak:

Dropping the returned reference [return value of Box::leak()] will cause a memory leak. If this is not acceptable, the reference should first be wrapped with the Box::from_raw function producing a Box. This Box can then be dropped which will properly destroy T and release the allocated memory.

So, even if there is no code snippet, such statement is stating that:

fn drop_box<T> (boxed: Box<T>)
{
    drop(unsafe { Box::from_raw(Box::leak(boxed)) });
}

is sound, no matter the alloc::Global backing it.

  • A far-fetched / contrived generalization to any impl Allocator
    fn drop_box_in<T, A : Allocator> (boxed: Box<T, A>)
    {
        let (mut ptr, alloc) = Box::into_raw_with_allocator(x);
        unsafe {
            ptr = &mut *ptr;
            drop(Box::from_raw_in(ptr, alloc));
        }
    }

Which, given Box's implementation, is assuming that if somebody asks an impl GlobalAlloc —or an impl Alloc if generalizing— memory for a Layout::new::<T>() (through alloc or realloc), and gets back a non-null pointer ptr, then it is then legal to give back ptr to that impl Alloc's dealloc (or realloc), but with ptr's provenance having been "shrunk" down to that T's layout (e.g., through ptr = <*mut _>::cast(&mut *ptr.cast::<MaybeUninit<T>>());).

This, in practice, can be quite problematic for many (most?) GlobalAlloc implementations out there, since they do often perform some bookkeeping and whatnot laid out contiguously to the yielded allocation, and such metadata would thus not be accessible from such a returned pointer alone: the allocator would thus need to keep some extra data / state to be able to get back provenance over the user-facing allocation and the contiguous metadata.

A simplified example

use ::std::{
    alloc::{self, GlobalAlloc as _, Layout},
    mem::drop as stuff,
    ptr,
};

unsafe trait AllocI32 {
    fn alloc_i32() -> Option<ptr::NonNull<i32>>;

    unsafe
    fn dealloc(_: ptr::NonNull<i32>);
}

struct MyAllocUsingGlobalAllocUnderTheHood;

#[repr(C)]
struct I32AndMeta {
    alloc: i32,
    meta: u8,
}

static SYSTEM_ALLOC: alloc::System = alloc::System;

unsafe impl AllocI32 for MyAllocUsingGlobalAllocUnderTheHood {
    fn alloc_i32() -> Option<ptr::NonNull<i32>> {
        let layout = Layout::new::<I32AndMeta>();
        ptr::NonNull::new(unsafe { SYSTEM_ALLOC.alloc_zeroed(layout) })
            .map(ptr::NonNull::cast)
    }

    unsafe fn dealloc(ptr: ptr::NonNull<i32>) {
        let layout = Layout::new::<I32AndMeta>();
        let meta = ptr.as_ptr().cast::<u8>().add(4).read();
        stuff(meta);
        SYSTEM_ALLOC.dealloc(ptr.as_ptr().cast(), layout);
    }
}

The interesting lines here are:

    unsafe fn dealloc(ptr: ptr::NonNull<i32>) {
        let layout = Layout::new::<I32AndMeta>();
        let meta = ptr.as_ptr().cast::<u8>().add(4).read();

if ptr were to stem from &i32 (e.g., let r: &i32 = …; dealloc(r.into());), even if that &i32 had originated from an alloc()-yielded ptr (let r: &i32 = alloc().unwrap().as_ref();), then the operation reading meta would not be well-defined: r.add(4) would yield an off-by-one pointer, which would not be usable to perform a read-dereference with.


The two possible workarounds

In a world without any abstraction whatsoever, the answer to this problem is easy: keep a pointer with provenance over that allocated I32AndMeta around (such as the ptr returned by alloc itself), and use it to "launder" the received ptr. But since there is this Alloc / GlobalAlloc boundary, the question remains: who should be responsible for doing this?

  • Would it be the Allocator, as in:

        unsafe fn dealloc(ptr: ptr::NonNull<i32>) {
            let layout = Layout::new::<I32AndMeta>();
            let ptr = ptr.as_ptr().cast::<u8>();
    +       let ptr_with_provenance: *mut u8 = Self::get_ptr_with_provenance(…);
    +       // amend `ptr` so that it points to the same place, but with a fixed provenance
    +       let ptr = <*mut u8>::wrapping_offset(
    +           ptr_with_provenance,
    +           isize::wrapping_sub(
    +               ptr.as_ptr() as _,
    +               ptr_with_provenance as _,
    +           ),
    +       );
            let meta = ptr.add(4).read(); // <- this read is now well-defined!
            stuff(meta);
            SYSTEM_ALLOC.dealloc(ptr.as_ptr().cast(), layout);
        }
  • or would it be the user of the Allocator, by declaring "unleaking" to be a contract violation / by requiring that a pointer with the originally-obtained provenance be the only valid input for a {de,re}alloc call?

    pub struct MutRefWithOriginalProvenance<'lt, T> {
        ptr: ptr::NonNull<T>,
        _phantom: PhantomData<&'lt mut T>,
    }
    
    impl<T> Deref{,Mut} for{ type Target = T;}
    implMutRefWithOriginalProvenance{
        pub fn into_raw…
    }
    let mut r: MutRefWithOriginalProvenance<'static, i32> = Box::reversible_leak(Box::new(42));
    *r += 27;
    unsafe {
        drop(Box::from_raw(r.into_raw()));
    }

This point ought to be clarified, and if going for the latter —or until confirming the former—, then the stdlib docs should be updated to actually disincentivize unleaking.


My potentially-obvious two cents

It feels like the "legalized unleaking" approach has the drawback of requiring that extra get_ptr_with_provenance(…) operation, which could come with a non-negligible cost for all GlobalAlloc implementations, only to allow a potentially deemed niche "unleaking" operation.

But it also feels like "forbidden unleaking" approach is quite a footgun, if, for instance, even the stdlib docs have gotten it wrong for such a long time.

So this seems like the classic "let's gauge/measure the performance benefits of 'forbidden unleaking' / the performance cost of 'legalized unleaking'" to compare them against the footguns that forbidding it introduces.

Finally, and this is technically beyond Rust's reach, there is also the question of non-Rust pervasive implementations of GlobalAlloc, such as that of libc (malloc, calloc, realloc, free) and whatnot. Such implementations do use metadata, and according to @chorman0773, the cost of a get_ptr_with_provenance(…) operation would be very much non-negligible (and, technically, even more so since Rust cannot go and tweak such an implementation, and would thus have to wrap it in a black-box API kind of fashion).

  • So, from the looks of it / IIUC, the only practical approach w.r.t. a legalized unleaking would be to ban malloc & friends from being used for GlobalAlloc implementations! But I may very well be wrong; I'll let @chorman0773 (and others) chime in and clarify this hypothetical point (although if this were to be true, then I guess there is really no other choice than forbidding unleaking).

    • This, in turn, could be deemed inconvenient for FFI users which would have chosen a malloc-powered #[global_allocator] to allow free-ing to occur from the C side, but this is yet another topic…

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions