Add atomic loads and stores and barriers #9247
Conversation
I'm in two minds about this one - if people are directly accessing atomics with raw load or store, they're probably doing something weird and should be using something higher-level, but maybe there's no good alternative. This is being added to cover the …
I'm puzzled by the atomic load/store implementation proposed - I must be missing something 😕. From what I understand, MBED_BARRIER is a compiler fence; nothing more. I believe we should use __LDREX? and __STREX? on the M3 and M4 architectures, and we can use a critical section on the M0 architecture.
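As an illustration of what I mean - a minimal sketch only, and the helper name atomic_incr_u8_exclusive is made up, not code from this PR - a read-modify-write built on the CMSIS exclusive-access intrinsics would look like:

#include "cmsis.h"

static inline uint8_t atomic_incr_u8_exclusive(volatile uint8_t *valuePtr)
{
    uint8_t newValue;
    do {
        /* __LDREXB opens an exclusive monitor on the address and reads it */
        newValue = __LDREXB(valuePtr) + 1;
        /* __STREXB returns non-zero if the exclusive store failed; retry */
    } while (__STREXB(newValue, valuePtr));
    return newValue;
}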
platform/mbed_toolchain.h (Outdated)
 * Stop the compiler moving memory accesses.
 */
#ifdef __CC_ARM
#define MBED_BARRIER() __memory_changed()
It would be nice to have a detailed description - and maybe more appropriate naming - of what MBED_BARRIER is: where it should be used, what it does and what it doesn't do. From my understanding, asm volatile("" : : : "memory") forces the compiler not to reorder the code it produces; however, there is no guarantee that the code won't be reordered by the CPU or that memory writes have completed. It doesn't replace CPU barrier instructions.
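To illustrate the distinction (the macro names here are illustrative, not a proposed API):

/* Compiler fence: constrains compiler reordering across this point,
 * but emits no instruction - the CPU may still reorder accesses. */
#define COMPILER_ONLY_BARRIER()  asm volatile("" : : : "memory")

/* Hardware barrier: the CMSIS __DMB() intrinsic emits a DMB instruction,
 * ordering the CPU's memory accesses as well. */
#define CPU_BARRIER()            __DMB()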
MBED_FORCEINLINE void core_util_atomic_flag_clear(volatile core_util_atomic_flag *flagPtr)
{
    MBED_BARRIER();
    flagPtr->_flag = false;
Why isn't there a memory barrier after _flag is set to false?
You're right, one is needed there. I was thinking it wasn't needed for single-CPU, but sequential consistency requires it for the same reason you would need two DMBs (https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html). Assuming we want sequential consistency (the C++11 default), we need to guarantee that
atomic_store(x); // release
atomic_load(y); // acquire
can't be reordered. So that has to end up as:
BARRIER // for release semantics - keep preceding above
x = foo;
BARRIER // stop store+load being reordered
bar = y;
BARRIER // acquire semantics - keep following below
And it's store that gets the extra sequential-consistency barrier (like the extra DMB), because stores are probably less common.
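So the corrected clear ends up shaped like this (a sketch of the fix being agreed, not the final committed code):

MBED_FORCEINLINE void core_util_atomic_flag_clear(volatile core_util_atomic_flag *flagPtr)
{
    MBED_BARRIER();         /* release - keep preceding accesses above the clear */
    flagPtr->_flag = false;
    MBED_BARRIER();         /* sequential consistency - stop this store being
                               reordered with a following atomic load */
}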
 */
MBED_FORCEINLINE uint8_t core_util_atomic_load_u8(const volatile uint8_t *valuePtr)
{
    uint8_t value = *valuePtr;
Why isn't there a memory barrier before *valuePtr is read?
Because load has "acquire" semantics, as per C++11/C11. It only needs to stop stuff from afterwards moving before it.
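So the full function pairs the load with a single trailing barrier - a sketch consistent with the snippet above:

MBED_FORCEINLINE uint8_t core_util_atomic_load_u8(const volatile uint8_t *valuePtr)
{
    uint8_t value = *valuePtr;
    MBED_BARRIER();         /* acquire - keep following accesses below the load */
    return value;
}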
mbed_critical currently has no SMP support, and has no CPU barriers at all. These new functions are consistent with the rest of the file - compiler barriers only, assuming 1 CPU, and protect against other threads or interrupts on that single CPU.
LDREX and STREX don't have any barrier/ordering functionality in the CPU. So these implementations are there to add the necessary equivalent compiler barrier on a plain load or store - the one we get automatically whenever we happen to use … (Edit: oops, no, LDREX and STREX are not necessarily compiler barriers. I'm going to need to add some.) Without the barrier, this can go wrong:
The "do some stuff" could be reordered after the flag clear by the compiler, so that we would be modifying the state with the flag clear. Existing attempts to do the equivalent with a "volatile bool" suffer the same problem - they should change to use atomic_flag to get the barrier that's being added here. With LTO off, this was avoided because the "call outside this C file" to …
An alternative would be to implement the atomic loads and stores like …
Force-pushed from 2990e9e to 834c508
Thanks for taking the time to answer my concerns, @kjbracey-arm; it all makes sense now.
Force-pushed from 834c508 to 92a4afb
Reworked the comments more - provide both …
Force-pushed from de9d6a4 to 196e1ed
Don't think this needs any more work - maybe more review.
@SenRamakri @deepikabhavnani If you could review this one, it's nearly ready for CI.
Looks good to me 👍
CI started
Test run: FAILED
Summary: 1 of 1 test jobs failed
Failed test jobs: …
Force-pushed from 196e1ed to 703e440
CI restarted
Test run: SUCCESS
Summary: 11 of 11 test jobs passed
When barriers were added in ARMmbed#9247, review comments pointed out that atomic stores needed a second barrier, and they got one. But atomic_flag_clear was missed.
Description
Add atomic load and store functions, and add barriers to the existing atomic functions.
File currently has no explicit barriers - we don't support SMP, so don't need CPU barriers.
But we do need to worry about compiler barriers - particularly if link time optimisation is activated so that the compiler can see inside these functions. The assembler or intrinsics that access PRIMASK for
enter/exit critical act as barriers, but LDREX, STREX and simple volatile pointer loads and stores do not.
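As a usage sketch of the intended acquire/release pairing (producer, consumer and process are hypothetical names, not part of this change):

#include "mbed_critical.h"

extern void process(uint8_t value);

static uint8_t shared_data;          /* ordinary data */
static volatile uint8_t data_ready;  /* flag published atomically */

void producer(void)
{
    shared_data = 42;
    /* release: the write above can't be moved below this store */
    core_util_atomic_store_u8(&data_ready, 1);
}

void consumer(void)
{
    /* acquire: the read below can't be moved above this load */
    if (core_util_atomic_load_u8(&data_ready)) {
        process(shared_data);
    }
}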
Pull request type
Reviewers
@c1728p9, @pan-