1 Understanding Thumb-2
Let us begin the discussion of energy-saving techniques from what looks like an unlikely starting point: the instruction set. All Cortex-M CPUs use the Thumb-2 instruction set, which combines the 32-bit ARM instruction set with the 16-bit Thumb instruction set and offers a flexible trade-off between raw performance and overall code size. A typical Thumb-2 application on a Cortex-M core is roughly 25% smaller than the same application written entirely in ARM instructions, while reaching about 90% of its execution efficiency (when optimized for run time).
Thumb-2 contains many powerful instructions that reduce the number of clock cycles needed for basic operations. Fewer clock cycles mean the same work gets done with less CPU power. Take a 16-bit multiplication as an example (as shown in Figure 1). On an 8-bit 8051-core MCU the operation takes 48 clock cycles and occupies 48 bytes of flash. On a 16-bit core MCU (for example, the C166) it takes 8 clock cycles and occupies 8 bytes of flash. On a Cortex-M3 core using the Thumb-2 instruction set, the same operation takes only one clock cycle and occupies 2 bytes of flash. The Cortex-M3 core saves energy by using fewer clock cycles, and its small code footprint also reduces the number of flash accesses, which saves energy again. In addition, smaller application code lets the system use a smaller flash memory, further reducing overall system power consumption.
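As a rough illustration of the comparison above, the following Python sketch tabulates the quoted figures and expresses each core's cost relative to the Cortex-M3. The dictionary and function names are made up for this example, and cycle counts are only a proxy for energy, since the cores differ in clock rate and in power drawn per cycle.

```python
# Figures quoted in the text for one 16-bit multiplication:
# (clock cycles, flash bytes) per core.
MULTIPLY_COST = {
    "8051 (8-bit)": (48, 48),
    "C166 (16-bit)": (8, 8),
    "Cortex-M3 (Thumb-2)": (1, 2),
}


def relative_cost(core, baseline="Cortex-M3 (Thumb-2)"):
    """Return (cycle ratio, flash-size ratio) of `core` versus `baseline`."""
    cycles, flash = MULTIPLY_COST[core]
    base_cycles, base_flash = MULTIPLY_COST[baseline]
    return cycles / base_cycles, flash / base_flash


for core in MULTIPLY_COST:
    cycle_ratio, flash_ratio = relative_cost(core)
    print(f"{core}: {cycle_ratio:.0f}x cycles, {flash_ratio:.0f}x flash bytes")
```

The ratios compare instruction cost only; a real energy estimate would also need each device's supply current and clock frequency.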
2 Interrupt controller energy-saving techniques
The interrupt controller (NVIC) in the Cortex-M architecture also plays a key role in reducing CPU power consumption. Where the earlier ARM7-TDMI needed "up to" 42 clock cycles, the Cortex-M3 NVIC needs only 12 clock cycles to get from an interrupt request to executing the interrupt handler code, which clearly improves CPU efficiency and cuts wasted CPU time. Besides entering interrupt handlers faster, the NVIC also switches between interrupts more efficiently.
In the ARM7-TDMI implementation, several clock cycles are spent returning from an interrupt handler to the main program and then entering the next interrupt handler; this "push-and-pop" operation between handlers consumes up to 42 clock cycles. The Cortex-M NVIC accomplishes the same task with "tail-chaining", which needs only 6 clock cycles to obtain the information required for the next interrupt service routine. With tail-chaining, no complete stack push and pop cycle is needed, which reduces the number of clock cycles spent managing interrupts by 65% (as shown in Figure 2).
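One way to see where a saving of this size could come from is to add up the overhead for two back-to-back interrupts. The Python sketch below is a simplified model, not a measurement. It uses the 42, 12 and 6 cycle figures from the text, and it assumes (the text does not say so) that the 42 cycles split into a 16-cycle pop and a 26-cycle push, and that leaving a Cortex-M3 handler costs the same 12 cycles as entering one.

```python
# Cycle counts quoted in the text.
ARM7_POP_AND_PUSH = 42  # leave one handler, enter the next (ARM7-TDMI)
M3_ENTRY = 12           # interrupt request to handler code (Cortex-M3)
M3_TAIL_CHAIN = 6       # hand-over between back-to-back handlers

# ASSUMPTIONS, not stated in the text: the 42 cycles split into a
# 16-cycle pop plus a 26-cycle push, and leaving a Cortex-M3 handler
# costs the same 12 cycles as entering one.
ARM7_POP = 16
ARM7_PUSH = ARM7_POP_AND_PUSH - ARM7_POP
M3_EXIT = M3_ENTRY


def arm7_overhead():
    """Overhead cycles for two back-to-back interrupts: push, pop+push, pop."""
    return ARM7_PUSH + ARM7_POP_AND_PUSH + ARM7_POP


def m3_overhead():
    """Overhead cycles for two back-to-back interrupts: entry, tail-chain, exit."""
    return M3_ENTRY + M3_TAIL_CHAIN + M3_EXIT


def saving_percent():
    """Percentage of overhead cycles saved by tail-chaining in this model."""
    return 100 * (arm7_overhead() - m3_overhead()) / arm7_overhead()


print(arm7_overhead(), m3_overhead(), round(saving_percent(), 1))
```

Under these assumptions the model gives 84 overhead cycles against 30, a saving of about 64%, which is close to the 65% quoted.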
3 Memory considerations for saving energy
The memory interface and memory accelerator can significantly affect CPU power consumption. Branches and jumps in the code can flush the pipeline that feeds instructions to the CPU, in which case the CPU stalls for several clock cycles while the pipeline refills. The Cortex-M3 and Cortex-M4 cores have a 3-stage pipeline. Flushing the entire pipeline stalls the CPU for 3 clock cycles, and the refill takes even longer if the flash memory has wait states. These stalls are pure wasted power and accomplish nothing. To help reduce the delay, the Cortex-M3 and M4 cores include a feature called speculative fetch: while the branch is being fetched in the pipeline, the likely branch target is fetched as well. If the likely branch target hits, speculative fetch reduces the delay to 1 clock cycle. Useful as this feature is, many Cortex-M vendors have evidently added their own IP to enhance this capability.
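The effect of speculative fetch on the average branch cost can be sketched as a simple expected value. The Python example below uses the 3-cycle and 1-cycle figures from the text; the hit rate is a hypothetical input, and flash wait states are deliberately left out, so real penalties can be higher.

```python
FLUSH_PENALTY_CYCLES = 3    # full refill of the 3-stage pipeline (from the text)
SPECULATIVE_HIT_CYCLES = 1  # delay when the speculative fetch hits (from the text)


def expected_branch_penalty(hit_rate):
    """Average stall cycles per taken branch for a given speculative hit rate.

    `hit_rate` is a hypothetical fraction between 0.0 and 1.0; flash wait
    states are ignored in this model.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * SPECULATIVE_HIT_CYCLES + (1.0 - hit_rate) * FLUSH_PENALTY_CYCLES


for rate in (0.0, 0.5, 1.0):
    print(f"hit rate {rate:.0%}: {expected_branch_penalty(rate):.1f} cycles per branch")
```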
For example, even among popular ARM Cortex-M MCUs, the instruction caches differ. An MCU with a simple instruction cache, such as the EFM32 from Silicon Labs, can hold 128x32 bits (512 bytes) of the most recently executed instructions, with logic that determines whether a requested instruction address is in the cache. The EFM32 Reference Manual indicates that typical applications achieve a hit rate of more than 70% in this cache, which means far fewer flash accesses, faster code execution and lower overall power consumption. In contrast, an ARM MCU with a 64x128-bit branch cache can store the first few instructions of each cached branch (at most 8 and at least 4 per branch, depending on the mix of 16-bit and 32-bit instructions). A branch cache can therefore eliminate the wasted CPU clock cycles for any branch or jump that hits in the cache. Both caching techniques offer considerable performance improvements and power reductions compared with the same type of CPU without a cache.
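The buffer (cache) sizes and the effect of the hit rate quoted above can be checked with a little arithmetic. The Python sketch below is illustrative only: the function names are invented here, and the flash-fetch estimate assumes every miss costs exactly one flash access.

```python
def buffer_bytes(entries, bits_per_entry):
    """Total capacity of a buffer in bytes."""
    return entries * bits_per_entry // 8


def instructions_per_entry(bits_per_entry, instruction_bits):
    """How many instructions of one width fit in a single entry."""
    return bits_per_entry // instruction_bits


def flash_fetches(total_fetches, hit_rate):
    """Fetches that still reach flash, assuming one flash access per miss."""
    return round(total_fetches * (1 - hit_rate))


print(buffer_bytes(128, 32))            # simple instruction buffer, 128x32 bits
print(instructions_per_entry(128, 16))  # 16-bit instructions per 128-bit entry
print(instructions_per_entry(128, 32))  # 32-bit instructions per 128-bit entry
print(flash_fetches(1000, 0.70))        # flash accesses left at a 70% hit rate
```

The 128x32 buffer works out to the 512 bytes given in the text, and a 128-bit branch entry holds between 4 and 8 instructions, matching the stated range.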
4 Exploring the Cortex-M0+ core
Every nanowatt matters in power-sensitive applications, and the Cortex-M0+ core is an excellent choice for them. The M0+ is based on a von Neumann architecture (whereas the Cortex-M3 and Cortex-M4 cores are Harvard architectures), which means it consumes less power at only a minimal loss of performance (0.93 DMIPS/MHz for the Cortex-M0+ versus 1.25 DMIPS/MHz for the Cortex-M3/M4). It also uses a smaller subset of the Thumb-2 instruction set (as shown in Figure 3). Almost all instructions have 16-bit opcodes (52 16-bit opcodes and 7 32-bit opcodes; data operations are all 32-bit), which makes possible some interesting features for reducing CPU power consumption.
The first of these energy-saving features is a reduction in the number of flash memory accesses. A predominantly 16-bit instruction set means that flash only needs to be accessed on alternate clock cycles (as shown in Figure 4), with each flash access fetching two instructions for the pipeline. This assumes the two instructions sit in memory aligned within one 32-bit word; when instructions are not aligned, the Cortex-M0+ disables the unused half of the bus.
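To make the alignment point concrete, the Python sketch below counts how many distinct 32-bit flash words a straight-line instruction stream touches. It is a toy model with invented names: it assumes sequential execution with no branches and one flash access per 32-bit word.

```python
def flash_words_fetched(instruction_sizes, start_address=0):
    """Count distinct 32-bit words touched by a sequential instruction stream.

    `instruction_sizes` lists each instruction's size in bytes (2 or 4);
    `start_address` is the byte address of the first instruction.
    """
    words = set()
    address = start_address
    for size in instruction_sizes:
        for byte in range(address, address + size):
            words.add(byte // 4)  # 4 bytes per 32-bit flash word
        address += size
    return len(words)


print(flash_words_fetched([2] * 8))                   # eight aligned 16-bit instructions
print(flash_words_fetched([4] * 8))                   # eight 32-bit instructions
print(flash_words_fetched([2] * 8, start_address=2))  # same 16-bit code, misaligned
```

In this model, eight aligned 16-bit instructions need half as many word fetches as eight 32-bit instructions, and starting the same code on a half-word boundary costs one extra fetch.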
In addition, the Cortex-M0+ core reduces power consumption by shortening the pipeline to two stages. In a typical pipelined processor, the next instruction is fetched while the CPU executes the current one. If the program takes a branch, the energy spent fetching the instructions that follow it (the branch shadow) is wasted. In a two-stage pipeline this branch shadow is smaller, so energy is saved (although only a small amount). It also means that only one clock cycle is needed to refill the pipeline after a flush (as shown in Figure 5).
5 Using the GPIO port to save energy
Another energy-saving feature of the Cortex-M0+ core is its high-speed GPIO port. On the Cortex-M3 and Cortex-M4 cores, toggling a bit or a GPIO pin is a "read-modify-write" operation on a 32-bit register. The Cortex-M0+ can use this method too, but it also has a dedicated 32-bit-wide I/O port that gives single-cycle access to GPIO, which makes bit and pin toggling efficient. Note: on the Cortex-M0+ this is an optional feature, and not all vendors implement it.
6 CPU sleep mode
One of the most effective ways to reduce CPU power consumption is to switch off the CPU itself. The Cortex-M architecture has several sleep modes, each a trade-off between power consumption and the start-up time before code executes again (as shown in Figure 6). The CPU can also enter a sleep mode automatically on completing an interrupt service routine, without executing any code to do so. This saves CPU clock cycles on tasks that are common in ultra-low-power applications.
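The trade-off between sleep current and wake-up time can be explored with a simple duty-cycle model. The Python sketch below is only an illustration: every current and timing value in it is hypothetical rather than taken from a datasheet, and it assumes the CPU draws its full active current during wake-up.

```python
def average_current_ua(active_ua, sleep_ua, work_ms, wake_ms, period_ms):
    """Average current over one wake/work/sleep period, in microamps.

    The CPU is assumed to draw `active_ua` during both wake-up and work,
    and `sleep_ua` for the rest of the period.
    """
    awake_ms = work_ms + wake_ms
    if awake_ms > period_ms:
        raise ValueError("wake-up plus work time exceeds the period")
    asleep_ms = period_ms - awake_ms
    return (active_ua * awake_ms + sleep_ua * asleep_ms) / period_ms


# Hypothetical modes: the deeper sleep draws less current but wakes more slowly.
light = average_current_ua(active_ua=5000, sleep_ua=50, work_ms=1, wake_ms=0.002, period_ms=100)
deep = average_current_ua(active_ua=5000, sleep_ua=1, work_ms=1, wake_ms=0.5, period_ms=100)
print(f"light sleep: {light:.1f} uA, deep sleep: {deep:.1f} uA")
```

With a long period the deeper mode wins in this model, but if the CPU has to wake very often the longer wake-up time can outweigh the lower sleep current.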
In deep sleep mode, a wake-up interrupt controller (WIC) can also be used to take over from the NVIC. With the WIC, an external interrupt can wake the CPU from a low-power mode without any clock being supplied to the NVIC.
7 Autonomous peripherals reduce CPU load
Autonomous on-chip peripherals have the advantage of reducing power consumption. Most MCU vendors have implemented autonomous interaction between peripherals; one example is the Peripheral Reflex System (PRS) on Silicon Labs EFM32 MCUs. Autonomous peripherals can carry out quite complex chains of peripheral actions (triggers rather than data transfers) while the CPU stays asleep. For example, the PRS on an EFM32 MCU can be configured so that, in a low-power mode with the CPU asleep, detecting a voltage above a preset threshold triggers a timer to start counting down. When the timer reaches 0, it triggers the DAC to start its output, and the CPU can stay asleep throughout the whole sequence of events.
This complex interaction happens autonomously, which lets the peripherals do a great deal of work without CPU involvement. In addition, peripherals with built-in intelligence (e.g., a sensor interface or pulse counter) can be set to wake the CPU with an interrupt on a preset condition, such as interrupting the CPU once 10 pulses have accumulated. In this example, when the CPU is woken by that specific interrupt, it knows exactly what needs to be done without checking counters or registers to work out what happened, which saves a considerable number of clock cycles that are better spent on other important tasks.
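The pulse-counter example can be mimicked with a small simulation that counts how often the CPU would be woken. The Python class below is a toy model with invented names, not a driver for any real peripheral; the callback stands in for the interrupt handler.

```python
class PulseCounter:
    """Toy model of a pulse counter that wakes the CPU every `threshold` pulses."""

    def __init__(self, threshold, on_wake):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.on_wake = on_wake  # stands in for the interrupt handler
        self.count = 0

    def pulse(self):
        """Register one pulse; fire the wake-up callback at the threshold."""
        self.count += 1
        if self.count == self.threshold:
            self.count = 0
            self.on_wake()


def count_wakeups(pulses, threshold):
    """Return how many times the CPU is woken for a given number of pulses."""
    wakeups = []
    counter = PulseCounter(threshold, lambda: wakeups.append(1))
    for _ in range(pulses):
        counter.pulse()
    return len(wakeups)


print(count_wakeups(100, 1))   # CPU woken on every pulse
print(count_wakeups(100, 10))  # CPU woken once per 10 pulses
```

Letting the peripheral accumulate 10 pulses before interrupting cuts the number of wake-ups tenfold in this model, and the CPU can sleep through the rest.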
We have covered a variety of easy-to-implement ways to reduce CPU power consumption on Cortex-M devices. Of course, other factors affect power consumption as well, such as the process technology used to fabricate the device and the memory technology used to store the application code. Process and memory technology can significantly affect power consumption both while running and in low-power modes, so they should also be part of an embedded developer's overall power design.
Source: Wiku Electronic Market Network