ARM Cortex-M4 Architecture
The Cortex-M4 is a low-power processor with a low gate count, low interrupt latency, and low-cost debug.
Cortex-M4 is the workhorse of the mid-range microcontroller world: STM32F4, nRF52, Kinetis K, MAX32, SAM4 — all the same core, different peripherals. This post is a tour of the core itself: what's inside, how memory is laid out, and how it handles exceptions.
Key features at a glance
- 32-bit RISC core with a Harvard-ish bus structure
- Thumb-2 ISA — mix of 16-bit and 32-bit instructions, dense code, no mode switching
- Single-cycle 32×32 multiply, 2–12 cycle hardware divide
- DSP extensions — SIMD, saturating arithmetic, single-cycle MAC
- Optional FPU — single-precision IEEE 754 (the "F" in Cortex-M4F)
- 3-stage pipeline (fetch / decode / execute) with branch speculation
- NVIC — up to 240 external interrupts, 8 to 256 priority levels
- Optional 8-region MPU (memory protection unit)
- 24-bit SysTick down-counter, the usual tick source for an RTOS
- WIC (Wakeup Interrupt Controller) for ultra-low-power sleep
- Debug: SWD/JTAG, FPB for hardware breakpoints, DWT for watchpoints and the cycle counter, ITM for printf-style tracing, optional ETM for full instruction trace
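The 24-bit limit on SysTick matters in practice: the counter reloads from a 24-bit RELOAD value, so the longest tick period depends on the core clock. A minimal host-side sketch of the reload calculation, assuming a hypothetical `systick_reload()` helper that applies the same range check as CMSIS `SysTick_Config()`:

```c
#include <assert.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD     0x00FFFFFFu  /* RELOAD is a 24-bit field */
#define SYSTICK_RELOAD_INVALID UINT32_MAX   /* sentinel: period cannot be programmed */

/* Hypothetical helper: RELOAD value for a tick_hz interrupt from a core_hz
 * clock. The counter runs RELOAD, RELOAD-1, ..., 0 and pends SysTick on the
 * 1 -> 0 step, so one period is RELOAD + 1 cycles. */
static uint32_t systick_reload(uint32_t core_hz, uint32_t tick_hz)
{
    assert(tick_hz != 0u);
    uint32_t ticks = core_hz / tick_hz;  /* cycles per tick, truncated */
    if (ticks < 2u || ticks - 1u > SYSTICK_MAX_RELOAD)
        return SYSTICK_RELOAD_INVALID;   /* too short, or needs more than 24 bits */
    return ticks - 1u;
}
```

On a 168 MHz core, 2^24 cycles is about 99.9 ms, so a 1 ms RTOS tick fits easily but a 1 Hz tick does not. On target the CMSIS route is simply `SysTick_Config(SystemCoreClock / 1000u)`.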
Block diagram
The diagram below — from the ARM Cortex-M4 Technical Reference Manual — shows the major blocks and how they connect.
The standard memory map
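One of the strengths of Cortex-M is its fixed memory map. The architecture gives each 512 MB region a fixed purpose, and vendors place their Flash, SRAM and peripherals inside those regions. The core's own registers (NVIC, SCB, SysTick, MPU) sit at the same addresses on every part, which is why CMSIS-Core code is portable across vendors.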
| Address range | Purpose |
|---|---|
| 0x0000_0000 – 0x1FFF_FFFF | Code (Flash); the vector table lives at the bottom by default |
| 0x2000_0000 – 0x3FFF_FFFF | SRAM; bit-band region in the lowest 1 MB |
| 0x4000_0000 – 0x5FFF_FFFF | Peripheral: vendor SoC peripherals (USART, SPI, GPIO…); bit-band region in the lowest 1 MB |
| 0x6000_0000 – 0x9FFF_FFFF | External RAM (FSMC, QSPI, etc.) |
| 0xA000_0000 – 0xDFFF_FFFF | External devices |
| 0xE000_0000 – 0xE00F_FFFF | Private Peripheral Bus: NVIC, SCB, SysTick, MPU, debug |
| 0xE010_0000 – 0xFFFF_FFFF | Vendor-specific system region |
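Since every region is a 512 MB slice, the top three address bits identify it, and the bit-band alias of a bit is a fixed formula (alias base + byte offset × 32 + bit × 4). A host-side sketch with hypothetical names (`region_of`, `bitband_alias`), assuming the default map in the table above:

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    REGION_CODE, REGION_SRAM, REGION_PERIPHERAL, REGION_EXT_RAM,
    REGION_EXT_DEVICE, REGION_PPB, REGION_VENDOR
} region;

/* The map is carved into 512 MB (2^29-byte) slices, so the top three
 * address bits pick the region. */
static region region_of(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:  return REGION_CODE;
    case 1:  return REGION_SRAM;
    case 2:  return REGION_PERIPHERAL;
    case 3:
    case 4:  return REGION_EXT_RAM;
    case 5:
    case 6:  return REGION_EXT_DEVICE;
    default: return addr <= 0xE00FFFFFu ? REGION_PPB : REGION_VENDOR;
    }
}

/* Bit-band alias word for bit `bit`, counted from byte address `addr`
 * (0-7 for a byte, 0-31 for a word-aligned address). Only the lowest 1 MB
 * of the SRAM and Peripheral regions is bit-banded, and each alias region
 * starts 32 MB above its region base. */
static uint32_t bitband_alias(uint32_t addr, uint32_t bit)
{
    uint32_t base = addr & 0xF0000000u;  /* 0x2000_0000 or 0x4000_0000 */
    uint32_t offset = addr - base;       /* byte offset into the region */
    assert((base == 0x20000000u || base == 0x40000000u) && offset < 0x100000u);
    assert(bit < 32u);
    return base + 0x02000000u + offset * 32u + bit * 4u;
}
```

For example, bit 2 of the byte at 0x2000_0300 is aliased by the word at 0x2200_6008; writing 1 to that word sets just that bit with a single store, without a software read-modify-write.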
Operating modes & privilege
Cortex-M has two execution modes and two privilege levels:
- Thread mode: where `main()` and RTOS tasks run. Privileged out of reset; can drop to unprivileged.
- Handler mode: where exception and interrupt handlers run. Always privileged.
- Privileged: full access to all instructions and registers.
- Unprivileged: restricted; commonly used by an RTOS to sandbox user tasks (combined with the MPU).
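On target you can tell which mode and privilege level you are in by reading IPSR (a non-zero exception number means handler mode) and CONTROL.nPRIV, for example with the CMSIS intrinsics `__get_IPSR()` and `__get_CONTROL()`. The decoding itself is plain bit-testing, sketched here as a host-testable function (`decode_state` and `exec_state` are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)  /* 1 = thread mode runs unprivileged */
#define CONTROL_SPSEL (1u << 1)  /* 1 = thread mode uses PSP */
#define CONTROL_FPCA  (1u << 2)  /* M4F only: an FP context is active */

typedef struct {
    bool handler_mode;  /* running an exception handler */
    bool privileged;    /* may access all instructions and system registers */
} exec_state;

/* Hypothetical decoder: ipsr and control are the raw register values. */
static exec_state decode_state(uint32_t ipsr, uint32_t control)
{
    /* CONTROL[31:3] are reserved and read as zero */
    assert((control & ~(CONTROL_NPRIV | CONTROL_SPSEL | CONTROL_FPCA)) == 0u);
    exec_state s;
    s.handler_mode = (ipsr & 0x1FFu) != 0u;  /* IPSR[8:0]: 0 = thread mode */
    /* nPRIV only applies to thread mode; handler mode is always privileged */
    s.privileged = s.handler_mode || (control & CONTROL_NPRIV) == 0u;
    return s;
}
```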
Stack pointers — MSP and PSP
Two banked stack pointers:
- MSP (Main Stack Pointer) — used in handler mode. Default in thread mode after reset.
- PSP (Process Stack Pointer) — typically used by RTOSes for task stacks, leaving MSP exclusively for ISRs.
The CONTROL register's SPSEL bit picks which one is active in thread mode.
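The active stack is also encoded in EXC_RETURN, the special value the core loads into LR on exception entry. Fault handlers commonly test bit 2 of LR to decide whether the stacked registers are on MSP or PSP. A host-side sketch of that decoding (`decode_exc_return` is a hypothetical name; the bit meanings follow the ARMv7-M Architecture Reference Manual):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool to_thread;  /* EXC_RETURN[3]: 1 = return to thread mode */
    bool uses_psp;   /* EXC_RETURN[2]: 1 = frame was stacked on PSP */
    bool fp_frame;   /* EXC_RETURN[4]: 0 = extended frame with FP registers */
} exc_return_info;

/* Hypothetical decoder for the EXC_RETURN value found in LR inside a handler. */
static exc_return_info decode_exc_return(uint32_t lr)
{
    /* Valid EXC_RETURN values have bits [31:5] all set */
    assert((lr & 0xFFFFFFE0u) == 0xFFFFFFE0u);
    exc_return_info r;
    r.to_thread = (lr & (1u << 3)) != 0u;
    r.uses_psp  = (lr & (1u << 2)) != 0u;
    r.fp_frame  = (lr & (1u << 4)) == 0u;
    return r;
}
```

So, without an FP context, 0xFFFFFFFD is what you see when an interrupt preempts an RTOS task running on PSP, and 0xFFFFFFF9 when it preempts thread code running on MSP (for example `main()` before the scheduler starts).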
Exception model
Cortex-M reserves exception numbers 1–15 for system exceptions (ten are used on ARMv7-M; 7–10 and 13 are reserved) and numbers 16 and up for device-specific external IRQs, up to 240 of them.
| # | Exception |
|---|---|
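| 1 | Reset |
| 2 | NMI: non-maskable interrupt |
| 3 | HardFault: catch-all; other faults escalate here when their own handler is disabled or cannot run |
| 4 | MemManage: MPU violation or access to an execute-never region |
| 5 | BusFault: error response on an instruction fetch or data access |
| 6 | UsageFault: undefined instruction, invalid state, coprocessor (FPU) disabled, some unaligned accesses, divide-by-zero if trapped |
| 11 | SVCall: supervisor call via `SVC` (used by RTOSes) |
| 12 | DebugMonitor: debug events when halting debug is not in use |
| 14 | PendSV: deferred context switch (used by RTOSes) |
| 15 | SysTick |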
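CMSIS does not use these exception numbers directly: its `IRQn_Type` is offset by 16, which is why `SysTick_IRQn` is -1. The vector table, on the other hand, is indexed by the exception number itself, and VTOR has an alignment requirement that bites when relocating it. A host-side sketch (all helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* CMSIS numbers interrupts as IRQn = exception number - 16, so system
 * exceptions get negative IRQn values and external IRQ 0 is IRQn 0. */
static int32_t irqn_from_exception(uint32_t exc)
{
    assert(exc >= 1u && exc < 16u + 240u);
    return (int32_t)exc - 16;
}

/* Each vector table entry is a 32-bit word; entry 0 holds the initial MSP,
 * so exception n's handler address sits at VTOR + 4n. */
static uint32_t vector_entry_address(uint32_t vtor, uint32_t exc)
{
    assert(exc < 16u + 240u);
    return vtor + exc * 4u;
}

/* The table must be aligned to its size rounded up to a power of two,
 * and never to less than 128 bytes. */
static uint32_t vtor_min_alignment(uint32_t num_external_irqs)
{
    uint32_t bytes = (16u + num_external_irqs) * 4u;
    uint32_t align = 128u;
    while (align < bytes)
        align <<= 1;
    return align;
}
```

For a part with 82 external interrupts the table is 392 bytes, so a relocated table needs 512-byte alignment.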
The System Control Block (SCB)
The SCB groups registers that govern processor-wide behaviour: reset, sleep modes, interrupt vector base address, and configurable fault handling. The most-used registers:
- `SCB->VTOR`: vector table offset; lets you relocate the vector table (essential for bootloaders).
- `SCB->AIRCR`: priority grouping (PRIGROUP) and the system-reset request bit (SYSRESETREQ); writes are ignored unless the upper half-word carries the key 0x05FA.
- `SCB->SCR`: sleep behaviour (deep sleep, sleep-on-exit, send-event-on-pend).
- `SCB->CPACR`: coprocessor access control; this is where you enable the FPU.
- `SCB->CCR`: configuration and control (stack alignment, divide-by-zero trap, unaligned trap).
- `SCB->SHCSR`: system handler control and state (enables MemManage, BusFault and UsageFault).
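Priority grouping is the least obvious of these. AIRCR.PRIGROUP sets a binary point that splits each 8-bit priority byte into a preempt (group) field and a subpriority field, and vendors implement only the top 3 to 8 bits of that byte. A host-side sketch with hypothetical helper names, following the arithmetic of CMSIS `NVIC_SetPriorityGrouping()` and `NVIC_EncodePriority()`:

```c
#include <assert.h>
#include <stdint.h>

#define AIRCR_VECTKEY       (0x05FAu << 16)  /* writes are ignored without this key */
#define AIRCR_VECTKEY_MASK  (0xFFFFu << 16)
#define AIRCR_PRIGROUP_MASK (7u << 8)

/* Value to write back to AIRCR to change PRIGROUP while keeping the other
 * bits (same recipe as CMSIS NVIC_SetPriorityGrouping()). */
static uint32_t aircr_with_prigroup(uint32_t aircr, uint32_t prigroup)
{
    assert(prigroup <= 7u);
    aircr &= ~(AIRCR_VECTKEY_MASK | AIRCR_PRIGROUP_MASK);
    return aircr | AIRCR_VECTKEY | (prigroup << 8);
}

/* Split the prio_bits implemented bits into preempt and sub fields for a
 * given PRIGROUP, then left-align the result in the 8-bit priority byte,
 * because the unimplemented low-order bits read as zero. */
static uint8_t priority_byte(uint32_t prigroup, uint32_t prio_bits,
                             uint32_t preempt, uint32_t sub)
{
    assert(prigroup <= 7u && prio_bits >= 3u && prio_bits <= 8u);
    uint32_t preempt_bits = (7u - prigroup) < prio_bits ? (7u - prigroup) : prio_bits;
    uint32_t sub_bits = prio_bits - preempt_bits;
    uint32_t value = ((preempt & ((1u << preempt_bits) - 1u)) << sub_bits)
                   | (sub & ((1u << sub_bits) - 1u));
    return (uint8_t)(value << (8u - prio_bits));
}
```

With the 4 priority bits found on STM32F4 parts, PRIGROUP = 3 gives 16 preempt levels and no subpriority, which is the split FreeRTOS recommends on Cortex-M.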
Enabling the FPU (Cortex-M4F only)
The FPU starts disabled out of reset. You must enable access to coprocessors CP10 and CP11 in CPACR, ideally inside SystemInit() before any C code runs that might issue a VFP instruction.
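```c
// Requires the CMSIS device header for your part (it pulls in core_cm4.h)
// Grant full access to CP10 & CP11 (FPU): CPACR bits [23:20] = 0b1111
SCB->CPACR |= (0xFUL << 20);
__DSB();  // wait for the CPACR write to complete
__ISB();  // flush the pipeline so later instructions see the FPU enabled
```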
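Forget this and the first FPU instruction, which the compiler emits as soon as you build with `-mfpu=fpv4-sp-d16` and `-mfloat-abi=hard` (or `softfp`), raises a UsageFault (NOCP). Because UsageFault is disabled out of reset until you enable it in `SCB->SHCSR`, it usually shows up as a HardFault instead. Classic "works on M3, crashes on M4" bug.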
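When the crash does happen, the fault status registers name the cause: NOCP in the UsageFault half of `SCB->CFSR`, plus FORCED in `SCB->HFSR` if the fault escalated. A host-testable sketch of a decoder you could call from a HardFault handler (the helper and enum names are hypothetical; the bit positions follow the ARMv7-M Architecture Reference Manual):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UsageFault status is the top half-word of CFSR (0xE000_ED28). */
#define CFSR_UNDEFINSTR (1u << 16)  /* undefined instruction */
#define CFSR_INVSTATE   (1u << 17)  /* invalid state, e.g. Thumb bit cleared */
#define CFSR_INVPC      (1u << 18)  /* invalid EXC_RETURN on exception return */
#define CFSR_NOCP       (1u << 19)  /* coprocessor (FPU) access not enabled */
#define CFSR_UNALIGNED  (1u << 24)  /* unaligned access that faulted */
#define CFSR_DIVBYZERO  (1u << 25)  /* divide by zero with DIV_0_TRP set */
#define HFSR_FORCED     (1u << 30)  /* a configurable fault escalated to HardFault */

typedef enum {
    UF_NONE, UF_UNDEFINSTR, UF_INVSTATE, UF_INVPC, UF_NOCP, UF_UNALIGNED, UF_DIVBYZERO
} usage_fault_cause;

/* Hypothetical decoder: report the lowest-numbered UsageFault bit set in CFSR. */
static usage_fault_cause usage_fault_of(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR) return UF_UNDEFINSTR;
    if (cfsr & CFSR_INVSTATE)   return UF_INVSTATE;
    if (cfsr & CFSR_INVPC)      return UF_INVPC;
    if (cfsr & CFSR_NOCP)       return UF_NOCP;
    if (cfsr & CFSR_UNALIGNED)  return UF_UNALIGNED;
    if (cfsr & CFSR_DIVBYZERO)  return UF_DIVBYZERO;
    return UF_NONE;
}

/* True when the fault arrived as HardFault because its own handler was
 * disabled (as UsageFault is out of reset) or could not preempt. */
static bool was_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}
```

CFSR bits are write-one-to-clear, so write the value back after logging it if the handler intends to return.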
Reference manuals
- ARMv7-M Architecture Reference Manual (DDI 0403) — the ISA, exception model, memory model. The bible.
- Cortex-M4 Technical Reference Manual (DDI 0439) — the specific implementation.
- Cortex-M4 Devices Generic User Guide (DUI 0553) — friendlier programmer's reference.
- Joseph Yiu's The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors — best book on the topic.