Example POWER 5/5+ Implementation for C/C++ Memory Model
ISO/IEC JTC1 SC22 WG21 N2745 = 08-0255 - 2008-08-22 (REVISED for POWER 5/5+)
Paul E. McKenney, email@example.com
This document presents an implementation of the proposed C/C++
memory-order model for the POWER 5/5+ family of computer systems,
which requires either usage restrictions or special code sequences
to implement the proposed C/C++ sequentially consistent atomic operations.
The POWER 5/5+ family of computer systems successfully runs
parallel programs containing atomic operations as long as
at least one of the following conditions is met:
- Traditional synchronization primitives such as locking or
read-copy update (RCU) are used instead of the proposed
C/C++ sequentially consistent atomic operations.
Note that the proposed C/C++ acquire/release atomic operations
may use the standard PowerPC code sequences, as shown in the
table below.
- Simultaneous multi-threading is disabled so that only one
hardware thread is active per core (as is often done for
computationally intensive numerical workloads).
- Operating-system thread-affinity facilities are used so that
any given multithreaded application has at most one thread active
on any given core.
- Each multi-threaded application is confined to a single core
(which may have both hardware threads enabled).
- The code sequences from the following table are used to implement
the C/C++ sequentially consistent atomic operations.
Please note that other members of the Power family, for example,
Power 6 and Power 7, need not adhere to any of the above
conditions.
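As an illustration of the first condition above (locking in place of sequentially consistent atomic operations), here is a minimal C++ sketch. It assumes the `std::mutex` and `std::thread` facilities as later standardized in C++11; the names `LockedCounter` and `run_workers` are hypothetical and do not come from this paper.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared counter protected by a lock instead of a
// sequentially consistent atomic variable.
struct LockedCounter {
    std::mutex m;
    long value = 0;

    void add(long n) {
        std::lock_guard<std::mutex> guard(m);  // all access goes through the lock
        value += n;
    }

    long get() {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
};

// Spawns nthreads workers that each increment the counter iters times,
// then returns the final count after joining all workers.
long run_workers(int nthreads, int iters) {
    LockedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                counter.add(1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.get();
}
```

Calling `run_workers(4, 1000)` should return 4000, because every update is serialized by the mutex and no sequentially consistent atomic operation appears in the program.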
|Operation |POWER 5/5+ Implementation|
|Load Relaxed | |
|Load Consume | |
|Load Acquire |ld; cmp; bc; isync|
|Store Relaxed | |
|Store Release | |
|Store Seq Cst | |
|Cmpxchg Relaxed,Relaxed (32 bit) |_loop: lwarx; cmp; bc _exit; stwcx.; bc _loop; _exit:|
|Cmpxchg Acquire,Relaxed (32 bit) |_loop: lwarx; cmp; bc _exit; stwcx.; bc _loop; isync; _exit:|
|Cmpxchg Release,Relaxed (32 bit) |lwsync; _loop: lwarx; cmp; bc _exit; stwcx.; bc _loop; _exit:|
|Cmpxchg AcqRel,Relaxed (32 bit) |lwsync; _loop: lwarx; cmp; bc _exit; stwcx.; bc _loop; isync; _exit:|
|Cmpxchg SeqCst,Relaxed (32 bit) |hwsync; _loop: lwarx; cmp; bc _exit; stwcx.; bc _loop; isync; _exit:|
|Acquire Fence | |
|Release Fence | |
|AcqRel Fence | |
junk may be any memory location.
It is permissible to use junk as the loop control variable, as
long as that loop control variable is assigned to a memory location.
It is legitimate (but usually unnecessary) to replace eieio
instructions with the code sequence shown above for
“SeqCst Fence (POWER5/5+)”.
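For orientation, the operation names in the table can be related to source-level calls. The sketch below assumes the `std::atomic` interface as later standardized in C++11, which may differ in detail from the 2008 proposal. The helper names are hypothetical, and the code only requests the orderings: it makes no claim about which instructions a given compiler emits.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helpers; each comment names the table row whose ordering
// the call requests. Instruction selection is left to the compiler.

// "Load Acquire" row.
inline std::int32_t load_acquire(const std::atomic<std::int32_t>& a) {
    return a.load(std::memory_order_acquire);
}

// "Store Release" row.
inline void store_release(std::atomic<std::int32_t>& a, std::int32_t v) {
    a.store(v, std::memory_order_release);
}

// "Store Seq Cst" row.
inline void store_seq_cst(std::atomic<std::int32_t>& a, std::int32_t v) {
    a.store(v, std::memory_order_seq_cst);
}

// "Cmpxchg AcqRel,Relaxed (32 bit)" row: acq_rel on success, relaxed on failure.
// On failure, expected is overwritten with the value actually observed.
inline bool cmpxchg_acqrel_relaxed(std::atomic<std::int32_t>& a,
                                   std::int32_t& expected,
                                   std::int32_t desired) {
    return a.compare_exchange_strong(expected, desired,
                                     std::memory_order_acq_rel,
                                     std::memory_order_relaxed);
}

// "Acquire Fence", "Release Fence", and "AcqRel Fence" rows.
inline void acquire_fence() { std::atomic_thread_fence(std::memory_order_acquire); }
inline void release_fence() { std::atomic_thread_fence(std::memory_order_release); }
inline void acqrel_fence()  { std::atomic_thread_fence(std::memory_order_acq_rel); }
```

A caller would typically retry `cmpxchg_acqrel_relaxed` in a loop, reusing the updated `expected` value after each failed attempt.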