With the arrival of the mobile Internet era, the popularity of HD multimedia video, and the emergence of large 3D mobile games, single-core embedded hardware platforms can no longer meet the demands of complex real-world computation. Heterogeneous multi-core processors have a strong advantage in video codec operations and have become the trend in embedded processor architecture. Currently, high-definition video decoding is commonly performed with the help of a DSP inside the heterogeneous multi-core processor, and multimedia data is transferred between cores by an on-chip communication mechanism. DSP decoding is faster and performs better than software decoding; for example, the DSP built into the DaVinci platform can decode 720p video in real time. However, the DSP must configure the mailbox and the DMA at run time, which consumes a large share of the on-chip communication bandwidth and lowers inter-core communication efficiency, and a DSP codec is less efficient than a hard-wired decoder. To further improve full HD H264 codec performance, this paper uses the TI SoC OMAP4430 heterogeneous multi-core processor as the processing platform. Its most distinctive feature is the combination of a built-in dual-core Cortex-A9 main processor, a dual-core Cortex-M3 coprocessor, and the IVA-HD multimedia hardware codec acceleration engine. IVA-HD contains 7 acceleration engines designed for various video codecs, and each acceleration engine has its own data memory, which minimizes contention between modules when reading and writing data. At the same time, the Virtio cache queue and the RPMSG message framework are used to implement data communication, based on asynchronous notifications, between the A9 main processing core and the M3 coprocessing core, which gives high data communication efficiency.
The dual-core Cortex-A9 processor inside the OMAP4430 runs the embedded operating system Linux and is responsible for task scheduling, audio decoding, and user interface interaction. The internal Cortex-M3 acts as the auxiliary processor and manages the IVA-HD acceleration engine to complete the decoding task. Finally, the correctness of this design is verified with an example.
1 Main technologies
1.1 Virtio cache queue
Virtio is an abstraction layer over devices in a paravirtualized hypervisor, and it provides the lowest layer of heterogeneous multi-core data communication. It uses two cache queues based on asynchronous notifications (one for transmitting data to the coprocessing core, one for receiving data from the coprocessing core) and a hash table for data communication with the remote heterogeneous processor. Each cache queue contains up to 512 caches, each cache is limited to 512 bytes, and the communication data is stored in the buffer pool. To minimize shared memory, a ring-shaped hash table is used. The hash table records information about each cache, including its size, and is stored at a specific memory address. The main processing core and the coprocessing core access it as shared memory, protected by a mutex mechanism, as shown in Figure 1:
Figure 1 Schematic diagram of the heterogeneous multi-core access Virtio cache pool
Using a shared ring table for data communication between heterogeneous processing cores has several advantages:
1) Using hash table entries to refer to the data caches reduces the size of the shared memory area and improves system memory usage, while still allowing the amount of transmitted data to grow.
2) Notifying the destination core by interrupt reduces the time the processor spends busy-waiting and improves processor utilization.
3) Allowing multiple caches of data to be in flight at the same time improves the throughput of system communication.
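The queue behaviour described above can be pictured with a small model. The following is only a minimal sketch under stated assumptions, not the Virtio implementation: the class `BufferQueue`, its methods, and the `notify` callback (which stands in for the inter-core interrupt) are hypothetical names introduced for illustration, and the mutex protection of the shared memory is omitted. Only the limits of 512 caches per queue and 512 bytes per cache come from the text.

```python
from collections import deque

MAX_BUFFERS = 512       # caches per queue, as stated in the text
MAX_BUFFER_SIZE = 512   # bytes per cache, as stated in the text

# Shared memory needed for the payload of both queues (send and receive).
POOL_BYTES = 2 * MAX_BUFFERS * MAX_BUFFER_SIZE


class BufferQueue:
    """Toy model of one direction of the cache queue (hypothetical names)."""

    def __init__(self, notify):
        # Fixed pool of caches; the ring only carries small (index, length) entries.
        self.pool = [bytearray(MAX_BUFFER_SIZE) for _ in range(MAX_BUFFERS)]
        self.free = deque(range(MAX_BUFFERS))  # indices of unused caches
        self.ring = deque()                    # entries waiting for the consumer
        self.notify = notify                   # stands in for the interrupt

    def send(self, payload):
        """Copy payload into a free cache, publish its entry, notify the peer."""
        if len(payload) > MAX_BUFFER_SIZE:
            raise ValueError("payload exceeds the 512-byte cache limit")
        if not self.free:
            raise RuntimeError("no free cache available")
        index = self.free.popleft()
        self.pool[index][:len(payload)] = payload
        self.ring.append((index, len(payload)))
        self.notify(self)
        return index

    def receive(self):
        """Take the oldest published entry and hand its cache back to the pool."""
        index, length = self.ring.popleft()
        data = bytes(self.pool[index][:length])
        self.free.append(index)
        return data
```

Because the ring carries only an index and a length, several caches can be published before the consumer reads any of them, which is the property that point 3) relies on. With the limits quoted above, the payload area for both queues comes to 2 × 512 × 512 bytes = 512 KiB.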
1.2 RPMSG message framework
RPMSG (Remote Processor Messaging) is a message framework for data communication between processor cores, built on Virtio technology. It provides coprocessing core power and reset management, message communication, and other functions.
1.2.1 Coprocessing Core Reset Management
This part is mainly responsible for loading the executable program onto the coprocessing core and running it, and for setting up the MMU unit that translates virtual addresses to physical addresses. When the coprocessing core hits an error or an internal code exception, it must output intuitive error information and provide a recovery mechanism so that the coprocessing core can be used again.
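The responsibilities listed above (load and start, address translation, error reporting, recovery) can be sketched as a small state machine. This is an illustrative model only: `CoprocessorManager`, `CoreState`, the method names, and the example addresses are all hypothetical and do not correspond to any real driver interface.

```python
from enum import Enum, auto


class CoreState(Enum):
    OFFLINE = auto()
    RUNNING = auto()
    CRASHED = auto()


class CoprocessorManager:
    """Toy lifecycle model of a coprocessing core (hypothetical names)."""

    def __init__(self, firmware_name):
        self.firmware_name = firmware_name
        self.state = CoreState.OFFLINE
        self.regions = []   # (virtual_base, physical_base, size) mappings
        self.log = []       # human-readable error reports

    def map_region(self, virtual_base, physical_base, size):
        """Record one virtual-to-physical mapping for the coprocessor MMU."""
        self.regions.append((virtual_base, physical_base, size))

    def translate(self, virtual_address):
        """Translate a coprocessor virtual address to a physical address."""
        for virtual_base, physical_base, size in self.regions:
            if virtual_base <= virtual_address < virtual_base + size:
                return physical_base + (virtual_address - virtual_base)
        raise LookupError(f"unmapped address {virtual_address:#x}")

    def boot(self):
        """Load the program and start the core; needs at least one mapping."""
        if self.state is CoreState.RUNNING:
            raise RuntimeError("core is already running")
        if not self.regions:
            raise RuntimeError("MMU has no mappings")
        self.state = CoreState.RUNNING

    def report_crash(self, reason):
        """Record readable error information and mark the core as crashed."""
        self.log.append(f"{self.firmware_name}: {reason}")
        self.state = CoreState.CRASHED

    def recover(self):
        """Reset a crashed core and boot it again so it can be reused."""
        if self.state is not CoreState.CRASHED:
            raise RuntimeError("recovery is only valid after a crash")
        self.state = CoreState.OFFLINE
        self.boot()
```

The mappings are kept across a crash in this sketch, so `recover()` can simply reset the state and boot again; a real system would also have to reload the program image and reinitialize the communication queues.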
1.2.2 Message Communication
The RPMSG message framework implements message communication between the main processing core and the coprocessing core on top of the Virtio cache queue. RPMSG registers a message bus with the system and a corresponding bus device for each M3 core. Multiple client drivers are also registered on the message bus, and each is assigned a local address port SRC and a remote address port DST. When a client driver needs to send a message, the message is packed into a Virtio cache and added to the cache queue, which completes the send. When the message bus receives a message from the coprocessor, it dispatches the message to the client driver according to the message port DST. The schematic is shown in Figure 2:
Figure 2 Schematic of RPMSG message bus operation
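The port-based dispatch described in this section can be sketched as follows. This is a simplified model, not the RPMSG code: `MessageBus`, `pack_message`, and the callback signature are hypothetical, and the 16-byte header layout (source, destination, a reserved word, length, flags) is an assumption made for illustration. Only the SRC/DST addressing and the 512-byte cache limit come from the text.

```python
import struct

MAX_BUFFER_SIZE = 512  # bytes per cache, as stated in section 1.1

# Assumed header layout: src, dst, reserved (32-bit each), length, flags (16-bit each).
HEADER = struct.Struct("<IIIHH")


def pack_message(src, dst, payload):
    """Build one cache-sized message: header followed by the payload."""
    if HEADER.size + len(payload) > MAX_BUFFER_SIZE:
        raise ValueError("message does not fit into one cache")
    return HEADER.pack(src, dst, 0, len(payload), 0) + payload


class MessageBus:
    """Toy message bus that routes incoming messages by destination port."""

    def __init__(self):
        self.endpoints = {}  # local port -> client driver callback

    def register(self, port, callback):
        """Attach a client driver callback to a local port."""
        if port in self.endpoints:
            raise ValueError(f"port {port} is already in use")
        self.endpoints[port] = callback

    def dispatch(self, buffer):
        """Deliver a received cache to the client registered on its DST port."""
        src, dst, _reserved, length, _flags = HEADER.unpack_from(buffer)
        payload = bytes(buffer[HEADER.size:HEADER.size + length])
        callback = self.endpoints.get(dst)
        if callback is None:
            return False  # no client driver listens on this port
        callback(src, payload)
        return True
```

A client driver would call `register` once with its local port, and the bus would call `dispatch` for every cache taken from the receive queue. Under the assumed header, each 512-byte cache leaves 496 bytes for payload.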
1.3 IVA-HD Acceleration Engine
H.264 / MPEG-4 Part 10 is a high-compression digital video codec standard proposed jointly by the ITU-T Video Coding Experts Group and the ISO / IEC Moving Picture Experts Group (MPEG). It is widely used in network streaming media, HDTV, and other areas. Compared to earlier standards such as MPEG4 and H263, H.264 offers a low bit rate, high image quality, a high compression ratio, and high reliability, and it is suitable for channels with severe interference and high packet loss rates.
The H264 decoding process is shown in Figure 3. The decoder receives the input data frame from the network abstraction layer (NAL) and applies entropy decoding and reordering to obtain the quantized coefficient matrix X. Inverse quantization and inverse transformation of X then give the residual Dn. The prediction block Pn is obtained by motion compensation (inter prediction) or by intra prediction. Pn and Dn are added to give uFn, which passes through the loop filter to produce the output cache image Fn.
Figure 3 H264 decoder workflow
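The reconstruction step of this workflow, uFn = Pn + Dn, can be illustrated with a few lines of code. This is a minimal sketch, assuming 8-bit samples that are clipped to the range 0..255 after the addition; the function names are hypothetical, and the entropy decoding, inverse transform, and loop filter stages are not modelled.

```python
def clip_sample(value, bit_depth=8):
    """Clamp a reconstructed sample to the valid range for the bit depth."""
    return max(0, min(value, (1 << bit_depth) - 1))


def reconstruct_block(prediction, residual):
    """Add the residual Dn to the prediction Pn sample by sample (gives uFn)."""
    return [
        [clip_sample(p + d) for p, d in zip(p_row, d_row)]
        for p_row, d_row in zip(prediction, residual)
    ]


# Example 2x2 block: one sample overflows and one underflows before clipping.
prediction = [[100, 250], [3, 128]]
residual = [[5, 10], [-7, 0]]
unfiltered = reconstruct_block(prediction, residual)
# unfiltered == [[105, 255], [0, 128]]
```

In the decoder described above, the resulting block uFn would then go through the loop filter to become the output image Fn.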
The IVA-HD engine is a third-generation hardware acceleration engine designed for multimedia codec acceleration on embedded platforms. It supports H264, MPEG4, MPEG2, H263, and other common video codec standards. To free the CPU so that it can handle data preparation and logic control more efficiently, IVA-HD integrates seven hardware acceleration engines. Their correspondence with the function modules of the H264 decoder in Figure 3 is indicated by dashed boxes. The module functions corresponding to the names Core1-5 are: entropy decoding, inverse quantization and inverse transform, loop filtering, intra prediction, and motion compensation.