Add atomic loads and stores and barriers #9247


Merged: 1 commit merged on Jan 21, 2019

Conversation

@kjbracey (Contributor) commented Jan 3, 2019

Description

Add atomic load and store functions, and add barriers to the existing atomic functions.

The file currently has no explicit barriers - we don't support SMP, so we don't need CPU barriers.

But we do need to worry about compiler barriers - particularly if link-time optimisation is activated, so that the compiler can see inside these functions. The assembler or intrinsics that access PRIMASK for enter/exit critical act as compiler barriers, but LDREX, STREX and simple volatile pointer loads and stores do not.

Pull request type

[ ] Fix
[ ] Refactor
[ ] Target update
[X] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@c1728p9, @pan-

@kjbracey (Contributor, Author) commented Jan 3, 2019

I'm in two minds about this one - if people are directly accessing atomics with raw load or store, they're probably doing something weird and should be using something higher-level, but maybe there's no good alternative. This is being added to cover the _pending flag in InternetSocket - see #9248.

@pan- (Member) left a comment

I'm puzzled by the atomic load/store implementation proposed - I must be missing something 😕 .
From what I understand, MBED_BARRIER is a compiler fence, nothing more. I believe we should use __LDREX? and __STREX? on the M3 and M4 architectures, and we can use a critical section on the M0 architecture.

* Stop the compiler moving memory accesses.
*/
#ifdef __CC_ARM
#define MBED_BARRIER() __memory_changed()
@pan- (Member) commented on this diff:

It would be nice to have a detailed description - and maybe a more appropriate name - for MBED_BARRIER: where it should be used, what it does and what it doesn't do.

From my understanding, asm volatile("" : : : "memory") forces the compiler not to reorder the code it produces; however, there is no guarantee that the code won't be reordered by the CPU or that memory writes have completed. It doesn't replace CPU barrier instructions.
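The effect of such a pure compiler fence can be sketched in a few lines of GNU C. This is an illustration, not code from the PR - the macro and function names here are hypothetical:

```c
#include <stdint.h>

/* Hypothetical compiler-fence macro for GCC/Clang: the empty asm with a
 * "memory" clobber stops the compiler reordering or caching memory
 * accesses across it, but emits no instruction, so it provides no
 * CPU-level ordering guarantee. */
#define MY_COMPILER_BARRIER() __asm volatile("" : : : "memory")

static int payload;
static volatile int ready;

void publish(int value)
{
    payload = value;          /* plain store */
    MY_COMPILER_BARRIER();    /* compiler must not sink this store below */
    ready = 1;
}

int consume(void)
{
    while (!ready) {
        /* spin until the flag is observed */
    }
    MY_COMPILER_BARRIER();    /* compiler must not hoist the load above */
    return payload;
}
```

On a single core this ordering is all that is needed; on an SMP system the same code would additionally need CPU barrier instructions, which is exactly the distinction being discussed.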

MBED_FORCEINLINE void core_util_atomic_flag_clear(volatile core_util_atomic_flag *flagPtr)
{
MBED_BARRIER();
flagPtr->_flag = false;
A Contributor commented on this diff:

Why isn't there a memory barrier after _flag is set to false?

@kjbracey (Contributor, Author) replied:

You're right, one is needed there. I was thinking it wasn't needed for single-CPU, but sequential consistency requires it for the same reason you would need two DMBs (https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html). Assuming we want sequential consistency (the C++11 default), then we need to guarantee that

atomic_store(x); // release
atomic_load(y); // acquire

can't be reordered. So that has to end up as:

BARRIER // for release semantics - keep preceding above
x = foo;
BARRIER // stop store+load being reordered
bar = y;
BARRIER // acquire semantics - keep following below

And it's store that gets the extra sequential-consistency barrier (like the extra DMB), because stores are probably less common.
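The barrier placement described above can be sketched as plain C, with MBED_BARRIER reduced to the pure compiler fence (matching the non-SMP assumption); the function name is illustrative rather than the PR's exact code:

```c
#include <stdint.h>

#define MBED_BARRIER() __asm volatile("" : : : "memory")

/* Sequentially-consistent store sketch: one barrier before the store
 * gives release semantics (preceding accesses stay above it), and one
 * after stops a following atomic load being reordered ahead of it -
 * the "extra DMB" from the C++11 mappings referenced above. */
static inline void my_atomic_store_u8(volatile uint8_t *valuePtr, uint8_t value)
{
    MBED_BARRIER();
    *valuePtr = value;
    MBED_BARRIER();
}
```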

*/
MBED_FORCEINLINE uint8_t core_util_atomic_load_u8(const volatile uint8_t *valuePtr)
{
uint8_t value = *valuePtr;
A Contributor commented on this diff:

Why isn't there a memory barrier before *valuePtr is read?

@kjbracey (Contributor, Author) replied Jan 4, 2019:

Because load has "acquire" semantics, as per C++11/C11. It only needs to stop later accesses from moving before it.
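A sketch of that acquire-only shape, with the barrier macro again assumed to be the pure compiler fence and the function name mirroring the diff hunk above:

```c
#include <stdint.h>

#define MBED_BARRIER() __asm volatile("" : : : "memory")

/* Acquire-load sketch: no barrier is needed before the read; the single
 * barrier after it stops later accesses being hoisted above the load. */
static inline uint8_t my_atomic_load_u8(const volatile uint8_t *valuePtr)
{
    uint8_t value = *valuePtr;
    MBED_BARRIER();
    return value;
}
```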

@kjbracey (Contributor, Author) commented Jan 4, 2019

> I'm puzzled by the atomic load/store implementation proposed - I must be missing something 😕 .

mbed_critical currently has no SMP support, and has no CPU barriers at all. These new functions are consistent with the rest of the file - compiler barriers only, assuming 1 CPU, and protect against other threads or interrupts on that single CPU.

> I believe we should use __LDREX? and __STREX? on M3 and M4 architecture

Those don't have any barrier/ordering functionality in the CPU. The LDREX and STREX macros/intrinsics only provide compiler barriers - if you were doing SMP you would either need to add DMB instructions for CPU barriers (ARMv7) or use LDAEX and STLEX to get CPU acquire/release semantics (ARMv8).

So these implementations are to add the necessary equivalent compiler barrier on a plain load or store, that we get automatically whenever we happen to use LDREX/STREX or touch PRIMASK.

(Edit, oops, no, LDREX and STREX are not actually compiler barriers, necessarily. I'm going to need to add some.)

Without the barrier, this can go wrong:

if (!atomic_flag_test_and_set(&flag)) {
     do_some_stuff_to_state_protected_by_flag();
     atomic_flag_clear(&flag);
}

The "do some stuff" could be reordered after the flag clear by the compiler, so that we are modifying the state with the flag clear. Existing attempts to do the equivalent with a "volatile bool" suffer the same problem - they should change to use atomic_flag to get the barrier that's being added here.

With LTO off, this was avoided because the "call outside this C file" to atomic_flag_clear is an effective compiler barrier in itself. But the moment LTO is on, and it can see inside the definition (or once it's made inline, like it is in this patch), that reordering will happen.

An alternative would be to implement the atomic loads and stores like asm volatile("STR %0,[%1]" : : : "memory"), so that they are barriers like the STREX or PRIMASK, but that would significantly impede code generation.
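A fuller version of that rejected alternative might look like the following. The operand constraints are filled in by me, and the #else branch is a portable stand-in so the sketch compiles off-target; this is not code from the PR:

```c
#include <stdint.h>

/* Store implemented as an asm statement with a "memory" clobber: the
 * store itself then acts as a compiler barrier, like STREX or a PRIMASK
 * access, but the clobber forces the compiler to spill and reload
 * registers around every call - the code-generation cost noted above. */
static inline void barrier_store_u32(volatile uint32_t *ptr, uint32_t value)
{
#if defined(__arm__)
    __asm volatile("str %1, [%0]" : : "r"(ptr), "r"(value) : "memory");
#else
    /* portable stand-in: plain store bracketed by compiler fences */
    __asm volatile("" : : : "memory");
    *ptr = value;
    __asm volatile("" : : : "memory");
#endif
}
```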

@kjbracey kjbracey changed the title Add atomic loads and stores with barriers Add atomic loads and stores and barriers Jan 4, 2019
@pan- (Member) left a comment

Thanks for taking the time to answer my concerns @kjbracey-arm ; it all makes sense now.

@kjbracey (Contributor, Author) commented Jan 4, 2019

Reworked the comments more - provide both MBED_COMPILER_BARRIER and MBED_BARRIER (which is currently MBED_COMPILER_BARRIER because non-SMP). I can see that ARMv8 atomics will want "compiler barrier" specifically, while ARMv7 atomics want "DMB or compiler barrier", so provide both.
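The two-tier scheme might be sketched like this. MY_SMP is a hypothetical configuration flag of my own, not something defined by the PR:

```c
/* Pure compiler fence: always only stops compiler reordering. */
#define MBED_COMPILER_BARRIER() __asm volatile("" : : : "memory")

/* Full barrier: identical to the compiler fence while mbed OS is
 * single-core, but would grow a real DMB if SMP support were added. */
#if defined(MY_SMP) && defined(__arm__)
#define MBED_BARRIER() __asm volatile("dmb" : : : "memory")
#else
#define MBED_BARRIER() MBED_COMPILER_BARRIER()
#endif
```

This split lets ARMv8 atomics (whose LDAEX/STLEX already order at the CPU level) ask for only the compiler fence, while ARMv7 atomics can ask for the stronger of the two.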

@kjbracey (Contributor, Author) commented:

I don't think this needs any more work - maybe just more review.

@0xc0170 0xc0170 requested a review from a team January 14, 2019 09:36
@0xc0170 (Contributor) commented Jan 16, 2019

@SenRamakri @deepikabhavnani If you could review this one, it's nearly ready for CI

@deepikabhavnani left a comment

Looks good to me 👍

@cmonr (Contributor) commented Jan 17, 2019

CI started

@mbed-ci commented Jan 17, 2019

Test run: FAILED

Summary: 1 of 1 test jobs failed
Build number: 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_unittests

@0xc0170 (Contributor) commented Jan 18, 2019

CI restarted

@cmonr cmonr removed the needs: work label Jan 18, 2019
@mbed-ci commented Jan 18, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number: 2
Build artifacts

@0xc0170 0xc0170 merged commit a23a850 into ARMmbed:master Jan 21, 2019
@kjbracey kjbracey deleted the atomic_load_store branch January 24, 2019 09:39
kjbracey added a commit to kjbracey/mbed-os that referenced this pull request Jan 29, 2019
When barriers were added in ARMmbed#9247, review comments pointed out that atomic
stores needed a second barrier, and they got one. But atomic_flag_clear
was missed.
7 participants