Closed
Description
The work stealing algorithm uses a deque. Their algorithm has some properties that might make further optimizations possible later (one end is used only by a single thread). We only need something simple to start with, a locked vector that just pushes and pops and shifts and unshifts. Using a circular buffer would be better, lock-free better still (maybe).
Also probably relevant is the paper on data locality in work stealing though I haven't read it yet.
There are some useful data structures for atomically reference counted types and mutexes in core::private
.