Bug #30837136 RW_LOCK_X_LOCK_LOW: CONDITIONAL JUMP OR MOVE
Valgrind identified a "problem" (present since 2013) that we sometimes
read "invalid" value of writer_thread.
A thread which releases the latch does:
1. set lock->recursive to false
2. mark lock->writer_thread as invalid
A thread which tries to check if it already owns the latch does:
0. check lock->recursive is true
3. check lock->writer_thread is myself
Note that if they are executed in order 0,1,2,3 even with full memory
barriers etc. you will end up in a situation where the "invalid" value of
writer_thread is being inspected.
This is not really a bug, because if the value is "invalid" it still can not
be equal to our thread's id, unless a torn read happened (quite unlikely on
modern architectures) and the mixture of old and new bytes looks like our id.
Another (unlikely?) possibility would be that the `0x0` value we used for
initialization of `writer_thread` field is somehow our thread's id (which
anyway shouldn't matter because of initially setting `recursive` to `false`).
To remove any doubt here, instead of `0`, we will use
`std::thread().native_handle()` which is the official "invalid" value for a
thread handle - no real thread can have such a native handle.
To avoid "torn" values, this patch declares `writer_thread` as `std::atomic`,
but uses `std::memory_order_relaxed` for accesses to it, so that it does not
introduce any new ordering constraints - just makes sure operations are
atomic (so, no "torn" read is possible).
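A minimal sketch of what such a declaration could look like. It is an illustration under stated assumptions, not the code from this patch: the `OwnerField` struct and its method names are hypothetical, and it stores a `std::thread::id` (whose default-constructed value represents no thread) rather than a native handle, to stay portable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the latch's writer_thread field.
// Assumption: std::thread::id is used instead of a native handle.
struct OwnerField {
  // A default-constructed std::thread::id never equals the id of a
  // running thread, so it serves as the "invalid" value here.
  std::atomic<std::thread::id> writer_thread{std::thread::id{}};

  // Relaxed store: atomic (no torn write), but adds no ordering constraints.
  void set_owner(std::thread::id id) {
    writer_thread.store(id, std::memory_order_relaxed);
  }

  // Relaxed load: atomic (no torn read), but adds no ordering constraints.
  bool is_owner(std::thread::id id) const {
    return writer_thread.load(std::memory_order_relaxed) == id;
  }
};
```

The relaxed accesses only guarantee that each load sees some value that was actually stored; any ordering still has to come from the accesses to `recursive`.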
Rather than adding another suppression for Valgrind, this patch simply removes
the "2." step.
What *is* important (and was recently fixed) is to make sure that IF we
loaded `recursive==true`, THEN we will see `writer_thread` value set by the
thread which set `recursive` to `true`, or some LATER value. This matters in
particular if our thread recently owned the latch and left its id in
`writer_thread` - we need to make sure that the next owner overwrites it
before we look at it.
In other words, we need a guarantee that
```
recursive.load(acquire) && writer_thread==me
```
holds if and only if my thread is indeed the one that already owns the latch.
And for that it suffices to follow the simple protocol:
When acquiring the latch for yourself:
```
writer_thread = me
recursive.store(true, release)
```
When acquiring the latch for someone else (pass=true):
```
writer_thread = me
recursive.store(false, release)
```
(this recursive=false will prevent me from wrongly believing I can latch it
recursively, while my intention was to pass the ownership to another thread).
When receiving ownership from someone else, do the same thing as when acquiring
the latch.
Before releasing the latch do:
```
recursive.store(false, relaxed)
```
When checking if you already own the latch do:
```
if(recursive.load(acquire) && writer_thread.load(relaxed)==me)
```
The guarantee holds, because:
(=>) If we indeed hold the latch we set writer_thread = me and recursive to true
and nobody else has modified it since, so the `if`'s condition is true.
(<=) If the `if` condition was evaluated to `true` then it means that our
`recursive.load` has read `true`, so it synchronizes with a thread which
performed `recursive.store(true, release)`, so the value of `writer_thread`
we read, is (A) either the one it has stored, or (B) some even fresher one.
(A) As we saw `writer_thread == me` and we are the only thread which could
write this particular value to this field, it must mean that we are the
thread which performed `recursive.store(true, release)` most recently,
so we own the latch.
(B) As we are now in a source code line which does `if` (as opposed to a
source code line between write to `writer_thread` and
`recursive.store`) we had to perform some `recursive.store` (`true`, or
`false`) before getting here. So our `recursive.load` has to happen
after our own `store` to it. But this contradicts the premise of (B),
as we get a cycle: our write to `writer_thread` is before our `store`,
which is before, or the same as, the `store` we synchronize with, which is, by
premise of (B), before our write to writer_thread.
So, case B is impossible.
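The protocol above can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions: `OwnerTracker` and its method names are hypothetical, it uses `std::thread::id` rather than a native handle, and it models only the `recursive` / `writer_thread` bookkeeping, not the latch itself (no lock word, no waiting).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model of the ownership bookkeeping only.
// The caller is assumed to already hold (or be about to release)
// the exclusive latch when calling the mutating methods.
class OwnerTracker {
 public:
  // After acquiring the latch for yourself, or after receiving
  // ownership from someone else.
  void on_acquire() {
    writer_thread_.store(std::this_thread::get_id(),
                         std::memory_order_relaxed);
    // Release: publishes the writer_thread_ store above.
    recursive_.store(true, std::memory_order_release);
  }

  // After acquiring the latch in order to pass it to another thread.
  void on_acquire_for_pass() {
    writer_thread_.store(std::this_thread::get_id(),
                         std::memory_order_relaxed);
    // recursive=false stops this thread from latching recursively.
    recursive_.store(false, std::memory_order_release);
  }

  // Before releasing the latch. writer_thread_ is deliberately left
  // untouched (there is no "mark as invalid" step).
  void before_release() {
    recursive_.store(false, std::memory_order_relaxed);
  }

  // Check whether the calling thread already owns the latch.
  bool owned_by_me() const {
    return recursive_.load(std::memory_order_acquire) &&
           writer_thread_.load(std::memory_order_relaxed) ==
               std::this_thread::get_id();
  }

 private:
  std::atomic<std::thread::id> writer_thread_{std::thread::id{}};
  std::atomic<bool> recursive_{false};
};
```

Note that after `before_release()` the stale id stays in `writer_thread_`; the check still returns `false` because `recursive_` is `false`, which is the situation the (<=) argument covers.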
RB:23954
Reviewed by : Kevin Lewis <kevin.lewis@oracle.com>
Reviewed by : Nikša Skeledžija <niksa.skeledzija@oracle.com>