ARM Cortex™-M Programming Guide to Memory Barrier Instructions
ARM DAI 0321A Copyright © 2012 ARM Limited. All rights reserved. 14
ID091712 Non-Confidential
3.4 Architectural and implementation differences
Although the architecture permits memory re-ordering in many cases, in practice the
majority of simple processors do not re-order memory transfers. As a result, what the
architecture requires can differ from what a particular processor implementation requires. For
example, most application programs can run correctly on existing Cortex-M processors without
using any memory barrier instructions, although the architecture requires the use of memory
barriers in some situations. Also, in most scenarios applications are not sensitive to the effect
of missing memory barriers.
However, if the application is to be ported to high-end processors, the omission of memory
barrier instructions might result in glitches in the applications. The use of memory barriers can
also be important if the software will be ported to a system with multiple processors. For
example, when handling semaphores in a multiple processor system, memory barrier
instructions should be used to ensure the other processors in the system can observe the data
changes in the correct order.
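The semaphore case described above can be sketched in portable C11. This is a minimal illustration rather than code from this document: the mailbox, its function names, and the single-producer, single-consumer arrangement are all assumptions. The fences express the ordering requirement at the language level; on ARMv7-M targets, toolchains typically implement them with a DMB instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox shared by two processors (illustrative names). */
static uint32_t    shared_data;   /* payload guarded by the flag              */
static atomic_bool data_ready;    /* flag observed by the other processor     */

/* Producer: write the payload first, then publish the flag. */
bool mailbox_try_publish(uint32_t value)
{
    /* Acquire: do not overwrite the slot until the consumer has released it. */
    if (atomic_load_explicit(&data_ready, memory_order_acquire)) {
        return false;             /* slot still full */
    }
    shared_data = value;
    /* Barrier: the payload store must be observable before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&data_ready, true, memory_order_relaxed);
    return true;
}

/* Consumer: check the flag first, then read the payload. */
bool mailbox_try_consume(uint32_t *out)
{
    if (!atomic_load_explicit(&data_ready, memory_order_relaxed)) {
        return false;             /* nothing published yet */
    }
    /* Barrier: the payload load must not be satisfied before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = shared_data;
    /* Release: the payload read completes before the slot is handed back. */
    atomic_store_explicit(&data_ready, false, memory_order_release);
    return true;
}
```

On a single Cortex-M3 or Cortex-M4 this code would probably behave the same without the fences, but keeping them means the ordering still holds if the software is moved to a multiprocessor system or a processor that re-orders memory transfers.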
ARM recommends that software developers write software based on architectural
requirements rather than processor-specific behaviors. This ensures portability and reusability
of the software code. Processor-specific behaviors might also vary between different released
versions.
3.5 Cortex-M processor implementation specific requirements
The Cortex-M processor type can affect memory ordering.
Cortex-M3 and Cortex-M4 implementations
The Cortex-M3 and Cortex-M4 processors implement the ARMv7-M architecture. Both
processors have three-stage pipelines, an optional MPU, and multiple AHB-Lite interfaces.
Code that follows the ARMv7-M barrier and ordering rules is guaranteed to function on
Cortex-M3, Cortex-M4, and all other ARMv7-M compliant implementations. The Cortex-M3
and Cortex-M4 ordering model is at the simpler end of what ARMv7-M permits,
so although ARM recommends that barriers are always used, it is possible to
implement software without barriers in a number of cases because:
• all loads and stores always complete in program order, even if the first is buffered
• all Strongly-ordered load/stores are automatically synchronized to the instruction stream
• at most, one instruction plus a folded IT instruction executes at one time
• interrupt evaluation is performed on every instruction, except on the folded IT
• interrupt evaluation is performed between a CPSIE instruction and a following CPSID instruction
• NVIC side-effects complete less than two instructions after an NVIC load or store
• the maximum prefetch size is six instructions, plus a fetch of two instructions in progress
on the bus
• Device and Normal load/stores may be pipelined
• MSR side-effects are visible after one further instruction is executed.
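As one illustration of the gap between these implementation behaviors and the architectural rules, consider a short window in which interrupts are enabled and then disabled again. On Cortex-M3 and Cortex-M4 a pending interrupt is evaluated between the CPSIE and the following CPSID, but code written to the architecture would place an ISB between the two so that the pending interrupt is recognized before interrupts are disabled again. The sketch below is a hedged example only: the macro names, the function name, and the counter are hypothetical, GCC or Clang inline assembly syntax is assumed on the target, and the fallback branch exists purely so that the code compiles off-target.

```c
#include <stdatomic.h>

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* ARMv7-M target (assumes GCC/Clang inline assembly syntax). */
#define IRQ_ENABLE()   __asm volatile ("cpsie i" ::: "memory")
#define IRQ_DISABLE()  __asm volatile ("cpsid i" ::: "memory")
#define ISB()          __asm volatile ("isb"     ::: "memory")
#else
/* Off-target fallback: compiler barriers only, no hardware effect. */
#define IRQ_ENABLE()   atomic_signal_fence(memory_order_seq_cst)
#define IRQ_DISABLE()  atomic_signal_fence(memory_order_seq_cst)
#define ISB()          atomic_signal_fence(memory_order_seq_cst)
#endif

/* Hypothetical instrumentation: counts how many windows have been opened. */
static unsigned irq_window_count;

/* Briefly enable interrupts so that any pending interrupt can be taken. */
unsigned irq_window(void)
{
    IRQ_ENABLE();
    ISB();          /* architectural form: pending interrupt is recognized
                       here, before the CPSID below takes effect           */
    IRQ_DISABLE();
    return ++irq_window_count;
}
```

Omitting the ISB relies on the Cortex-M3 and Cortex-M4 behavior listed above; keeping it costs a few cycles and follows the recommendation to code to the architecture.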