Some notes from a tech talk.
There are four kinds of barrier instructions: ld/st, ld/ld, st/ld, and st/st. An ld/st barrier, for example, ensures that all loads before the barrier instruction complete before any of the stores after it. Picking the weakest barrier that is still correct is tricky to get right. A full barrier, which does all four of the above, is the safest kind, and you should use it by default. Not every ISA offers all four kinds anyway; some support only a subset.
In the absence of barriers, the processor is free to reorder memory operations as it wants, as long as the result still looks correct to the thread running on that processor. Some architectures (x86) offer stronger guarantees, and uniprocessors generally offer stronger guarantees than multiprocessors, but don’t make assumptions.
Barrier instructions are no good if the compiler itself reorders code, so we also need a way to tell the compiler not to. This is done with an intrinsic function barrier() that both issues a hardware barrier and prevents the compiler from reordering code across it. Actually, that’s too restrictive, because it’s safe for the compiler to move code across the barrier in one direction. So there are separate acquire_barrier() and release_barrier() intrinsics, one for each direction: an acquire barrier keeps later memory operations from moving above it, and a release barrier keeps earlier memory operations from moving below it.
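To make the acquire/release idea concrete, here is a minimal sketch in standard C++. It assumes that std::atomic_thread_fence is a reasonable stand-in for the acquire_barrier() and release_barrier() intrinsics from the talk; those intrinsics belong to whatever toolchain the speaker had in mind, and the names payload, ready, producer, consumer and run_handoff are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; none of these come from the talk.
int payload = 0;                  // plain, non-atomic data being handed over
std::atomic<bool> ready{false};   // flag that publishes the data

void producer(int value) {
    payload = value;                                      // write the data
    // Release fence: the payload write above cannot be reordered
    // after the flag store below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);         // publish
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {      // wait for the flag
        std::this_thread::yield();
    }
    // Acquire fence: the payload read below cannot be reordered
    // before the flag load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;                                       // sees the producer's write
}

// Runs one producer and one consumer on separate threads and
// returns the value the consumer observed.
int run_handoff(int value) {
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&] { seen = consumer(); });
    std::thread writer([value] { producer(value); });
    writer.join();
    reader.join();
    return seen;
}
```

Calling run_handoff(42) should hand 42 from one thread to the other. Both fences constrain the compiler as well as the hardware, which is the double duty the notes describe.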
CPU architectures also provide atomic operations, like compare-and-swap and load-linked/store-conditional. Note that an atomic operation does not necessarily imply a barrier. Again, some architectures like x86 may implicitly do a barrier on some atomic operations, but it may not be a full barrier.
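As a rough illustration of compare-and-swap without any barrier, here is a sketch using C++ std::atomic. The helper atomic_store_max is a hypothetical name, not something from the talk. With memory_order_relaxed, the update itself is atomic, but nothing is promised about the ordering of surrounding loads and stores.

```cpp
#include <atomic>

// Hypothetical helper: raise `target` to at least `candidate`.
// Returns true if this call changed the stored value.
bool atomic_store_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    while (observed < candidate) {
        // Compare-and-swap: store `candidate` only if `target` still
        // holds `observed`. On failure, `observed` is refreshed with the
        // current value and the loop re-checks the condition.
        if (target.compare_exchange_weak(observed, candidate,
                                         std::memory_order_relaxed,
                                         std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;  // the stored value was already >= candidate
}
```

The weak form is allowed to fail spuriously, which is why it sits in a loop; on load-linked/store-conditional machines that is typically how a lost reservation shows up. If other data had to become visible along with the new maximum, the relaxed orderings would need to be strengthened or paired with explicit barriers.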
Any of these atomic ops, along with barriers, can be used to implement spinlocks, which in turn can be used to build the usual kinds of locks, such as mutexes.
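A spinlock along these lines might look like the following sketch. It is not the implementation from the talk: it assumes C++ std::atomic, uses an atomic exchange rather than compare-and-swap, and folds the barriers into the memory-order arguments instead of issuing them separately. SpinLock and count_with_spinlock are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock (illustrative only).
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Atomic read-modify-write: set the flag and get its old value.
        // Acquire ordering keeps the critical section from moving above
        // the point where the lock is taken.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // someone else holds it; retry
        }
    }
    void unlock() {
        // Release ordering keeps the critical section from moving below
        // the point where the lock is dropped.
        locked_.store(false, std::memory_order_release);
    }
};

// Increments a plain counter from several threads under the lock
// and returns the final count.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;      // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

With the lock in place, count_with_spinlock(4, 10000) should come out to 40000, since no increment can be lost. A blocking lock would typically spin like this for a short while and then ask the operating system to put the thread to sleep.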