Skip to content

Can allocations have gaps? Can they be partially deallocated? Can they grow "in-place"? #430

Open
@RalfJung

Description

@RalfJung

Rust itself can only create contiguous allocations that can only be deallocated all at once. But does the compiler get to assume that all allocations, including those generated by 3rd party allocators, behave like that?

This does have impact on optimizations:

  • for a ptr: *const u8, if we see ptr.read() and ptr.wrapping_offset(x).read(), can we conclude that all memory at offset 0..x is dereferenceable?
  • if we have a x: &Cell<T>, then call unknown_fn() (which might deallocate the memory x points to), and then at least one byte of x is read again -- can we conclude that all of x (0..size_of::<T>()) is still dereferenceable?

One major example of an API that might violate this is mmap. There is a fairly clean way to model mmap: we say there is one (exposed) provenance (one allocation) for all mmap'ed memory, and with mmap/munmap ranges of memory become dereferenceable with that provenance. But clearly the "allocation" that represents all mmaped memory can have gaps, and it can be partially deallocated! If we say that the compiler can assume that allocations never have gaps, then the following is UB:

  • assume that no memory is mapped at X .. (X+4*PAGE_SIZE)
  • map a page at address X
  • map another page at address X+2*PAGE_SIZE
  • call the following function with X as *const u8:
fn break(ptr: *const u8) {
  ptr.read();
  ptr.wrapping_add(2*PAGE_SIZE).read();
}

This code must be UB because under the "no gap" assumption, the compiler is allowed to insert another read at ptr.add(PAGE_SIZE), but that address is not mapped so we'd get a pagefault.

So this leads to some questions:

  • Is the sequence of actions described above permitted? IMO the answer must be yes; making this UB is just too much of a footgun. Where does the UB even come from? wrapping_add is safe, so somehow the 2nd read is UB? But that memory was mmap'ed!
  • If that sequence of actions is permitted -- I think that implies we do not have a "no gap" assumption. At least we certainly don't get the optimizations we'd usually get from that assumption, as the example above shows.
  • What's LLVM's stance on this? Does LLVM do anything that assumes "no gaps"? Should we propose to add something to the LangRef that explicitly says gaps can happen?

Note that "mmap works in addresses, there is no provenance" does not help: in break, ptr has some provenance, and no matter where it comes from -- if we say "no gaps", no one provenance can be valid for both of these reads with an unmapped page in between them.

For the "no partial deallocation" assumption, the situation is similar -- we can also write mmap-based code that violates it:

  • map a page at address X, and another one at X+PAGE_SIZE
  • call the following function with X as *const u8:
fn break2(ptr: *const u8) {
  let _val = ptr.cast::<[u8; 2*PAGE_SIZE]>().read();
  deallocate_second_page(); // unmap the page at X+PAGE_SIZE
  let _val = ptr.read();
}

Is the compiler allowed to insert a speculative ptr.cast::<[u8; 2*PAGE_SIZE]>().read() at the end? If yes, this would segfault, so somehow the original code must have UB. I don't think it can reasonably have UB, and therefore we have to accept that partial deallocation is a thing.

For allocations growing in-place, if the compiler sees the malloc it might exploit that the geometry of the allocation cannot change, so even after calling unknown code an access outside the of the allocation per its initial size could be optimized away as UB -- but if the unknown code was actually growing the allocation to make the access in-bounds, that would be a wrong optimization.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions