1 Introduction
H.264 builds on the coding standards previously developed by ITU-T and ISO/IEC. Like most video compression standards in use, such as H.263, MPEG-2 and MPEG-4, it adopts a hybrid coding technique based on the block-based discrete cosine transform and quantization. The block-based discrete cosine transform offers a high compression ratio, low computational complexity and ease of implementation. H.264 has the following features: a bit rate about 50% lower than H.263+ and MPEG-4 (SP); strong adaptability to channel delay; improved error recovery; scalable complexity, to suit applications with different complexity budgets; and advanced techniques, including the 4 × 4 integer transform, intra-frame spatial prediction and quarter-pixel-accuracy motion estimation. These new techniques bring a higher compression ratio while greatly increasing the complexity of the algorithm. H.264 has therefore been widely used in codec equipment for HD video.
An H.264 decoder involves entropy decoding, inverse quantization, inverse transform, intra prediction, luma interpolation, chroma interpolation, filtering and other stages, so optimizing its implementation is of great significance. On the DSP-BF533 platform, this paper uses software pipelining and proposes a new optimized design in which the software modules work cooperatively.
2 H.264 decoder principle
The H.264 decoder consists of the following components: the network abstraction layer (NAL), the VCL buffer, entropy decoding, inverse scanning, inverse quantization and inverse transform, inter prediction, intra prediction, the reference frame buffer and the deblocking filter, as shown in Figure 1. The NAL unit data is first extracted from the bitstream, and the sequence parameter set, picture parameter set and image data are parsed from the RBSP. The data and parameters are stored in the VCL buffer and then entropy decoded in the video coding layer (VCL). The entropy decoding module (VLD) parses all parameters and reference image indexes, providing the control information and residual data. The inverse scanning procedure, mainly inverse zigzag scanning and inverse field scanning, maps the one-dimensional sequence of quantized transform coefficients to the corresponding coordinates of a two-dimensional array, which is then inverse quantized and inverse transformed. After the data has been read and the macroblock type determined, intra prediction or inter prediction is performed, the prediction is combined with the inverse-transformed residual data, and finally deblocking filtering is applied. The filtering greatly reduces the blocking artifacts caused by prediction and quantization, giving better subjective and objective image quality. The reconstructed image can also be selected as a reference frame for subsequent images.
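The inverse zigzag step described above can be illustrated with a minimal sketch. It assumes the standard frame-mode zigzag order for a 4 × 4 block. The names `ZIGZAG_4X4` and `inverse_zigzag_4x4` are illustrative and do not come from the decoder discussed in this paper.

```python
# Frame-mode zigzag order for a 4x4 block: entry k is the raster index
# (row * 4 + column) that receives the k-th coefficient of the 1-D sequence.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def inverse_zigzag_4x4(coeffs):
    """Map 16 scanned coefficients back to a 4x4 matrix (list of rows)."""
    if len(coeffs) != 16:
        raise ValueError("expected exactly 16 coefficients")
    block = [[0] * 4 for _ in range(4)]
    for k, raster in enumerate(ZIGZAG_4X4):
        block[raster // 4][raster % 4] = coeffs[k]
    return block


# Feeding the scan positions 0..15 shows where each position lands.
block = inverse_zigzag_4x4(list(range(16)))
for row in block:
    print(row)
```

Each printed row shows which scan position ends up at each coordinate of the block; in a real decoder the resulting matrix would then go to inverse quantization and the inverse transform.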
3 DSP-BF533 decoder design and optimization
3.1 Decoder software design block diagram
The decoding process with integrated DMA is designed around the characteristics of the direct memory access (DMA) controller of the DSP-BF533, as shown in Figure 2. Two DMA-related steps are added to a normal decoder: step 1 reads data from external memory, and step 2 writes the processed data back to external memory.
The specific process can be seen in Figure 2: (1) split off the top data of the next macroblock, that is, the data preceding the residual data, which provides the intra prediction information, reference image indexes and motion vectors; (2) start the DMA to read the data identified by the split, that is, the reference data addressed by the decoded reference image indexes and vectors; (3) perform prediction on the image data; (4) apply inverse quantization and inverse transform to the residual data split off as bottom data; (5) reconstruct the image and filter it; (6) output the image data to off-chip and on-chip memory via DMA; (7) split off the bottom data of the next macroblock, that is, extract its residual data for use when that macroblock is decoded.
To avoid having the DSP core wait for the DMA to read data, the data of each macroblock is divided in advance into top data and bottom data: the top data is everything before the residual data, and the rest is the bottom data. For a P frame, the data is split in advance and the DMA is then started. While the DSP core decodes the current macroblock, the DMA reads the data for the next macroblock. When decoding completes, the data of the current macroblock that is still needed as reference is transferred to on-chip memory by DMA. Since the top data of the current macroblock has no reference value for filtering the next macroblock, it is transferred to external memory by DMA. The first macroblock does not enter the decoding process, because the reference images and parameters have not yet been set; the first macroblock period is used only to set the decoder's reference images and initialize the parameters in preparation for decoding the next macroblock. Macroblock decoding and DMA data reads can execute in parallel: while the current macroblock is being decoded, the parameters and data required by the next macroblock are being read, which reduces the waiting time between modules and improves efficiency. This parallel execution is marked by the ellipses in Figure 2.
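The overlap between decoding and DMA reads can be sketched with two alternating buffers. This is a minimal illustration of the ordering only, not the paper's implementation. `fetch` and `decode` are hypothetical stand-ins for the DMA read and the DSP-core decode, and in plain Python they run one after the other, whereas on the hardware the DMA transfer would proceed while the core is decoding.

```python
def decode_with_prefetch(mb_ids, fetch, decode):
    """Decode macroblocks in order, requesting the next one's data first.

    fetch(mb_id)  -> data for that macroblock (stand-in for a DMA read)
    decode(data)  -> decoded result (stand-in for the DSP-core work)
    Returns the decoded results and the order in which events were issued.
    """
    buffers = [None, None]  # two alternating ("ping-pong") on-chip buffers
    events, results = [], []
    if not mb_ids:
        return results, events

    buffers[0] = fetch(mb_ids[0])  # prime the pipeline with the first read
    events.append(("fetch", mb_ids[0]))

    for i, mb in enumerate(mb_ids):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(mb_ids):
            # Issue the read for the next macroblock before decoding this one.
            buffers[nxt] = fetch(mb_ids[i + 1])
            events.append(("fetch", mb_ids[i + 1]))
        results.append(decode(buffers[cur]))
        events.append(("decode", mb))
    return results, events


results, events = decode_with_prefetch(
    [0, 1, 2],
    fetch=lambda mb: {"mb": mb, "top": f"params-{mb}", "bottom": f"residual-{mb}"},
    decode=lambda data: data["mb"] * 10,
)
print(events)
```

The event list shows that the read for macroblock n + 1 is always issued before macroblock n is decoded, which is the property that keeps the core from waiting on the DMA.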
3.2 New software pipelining algorithm
In many designs, decoding parameter preparation, decoding and DMA data output are executed sequentially. The design proposed here organizes them to execute in parallel, which fully exploits the parallel instruction execution of the DSP-BF533 and reduces the waiting time between software modules.
Take a 4 × 4 matrix of macroblocks as an example. First label the matrix with row and column coordinates, then divide the processing into five states, labeled 1, 2, 4, 8 and 16 respectively; the resulting state machine is listed in Table 1. CAVLC is the process that parses the data read in and provides the parameters and reference images for the subsequent image reconstruction. HL_Decode is the high-level decoding process, that is, the process of reconstructing the image from the prepared conditions. DMA is the transfer process for the decoded data. Comparing Table 1 and Table 2: when a new frame arrives, the current state label is 1 and only CAVLC executes; when the coordinate reaches x = 1, y = 0, the second state is entered, the state label is 2, and CAVLC and HL_Decode run in parallel; when the coordinate reaches x = 1, y = 1, the third state is entered, the label is 4, and all three modules run in parallel; when the coordinate satisfies y > 4, the fourth state is entered, the label is 8, and only HL_Decode and DMA execute, since CAVLC has already completed the preparation for decoding all macroblocks; then, when x > 0 is determined, the fifth state is entered, the label is 16, and only the DMA module runs.
Therefore, the first macroblock period is spent in state 1, the next four in state 2, the next 11 in state 3, the next one in state 4, and the final three in state 5.
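The state sequence above can be written out as a small table-driven sketch. The state labels, the modules active in each state and the run lengths are taken from the text. Treating each count as a number of pipeline periods is an assumption of this sketch, and the names `ACTIVE_MODULES`, `RUN_LENGTHS`, `build_schedule` and `active_periods` are illustrative.

```python
# State label -> modules that run in parallel while in that state.
ACTIVE_MODULES = {
    1: ("CAVLC",),
    2: ("CAVLC", "HL_Decode"),
    4: ("CAVLC", "HL_Decode", "DMA"),
    8: ("HL_Decode", "DMA"),
    16: ("DMA",),
}

# (state label, number of periods spent in it), as stated in the text.
RUN_LENGTHS = [(1, 1), (2, 4), (4, 11), (8, 1), (16, 3)]


def build_schedule(run_lengths=RUN_LENGTHS):
    """Expand the run lengths into one state label per period."""
    schedule = []
    for label, periods in run_lengths:
        schedule.extend([label] * periods)
    return schedule


def active_periods(schedule):
    """Count in how many periods each module is active."""
    counts = {"CAVLC": 0, "HL_Decode": 0, "DMA": 0}
    for label in schedule:
        for module in ACTIVE_MODULES[label]:
            counts[module] += 1
    return counts


schedule = build_schedule()
counts = active_periods(schedule)
print(len(schedule), counts)
```

With these run lengths the schedule has 20 periods, with CAVLC and HL_Decode each active in 16 of them (one per macroblock of the 4 × 4 matrix) and DMA active in 15.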
Let A be the execution time of CAVLC, B the execution time of HL_Decode and C the execution time of DMA. With the conventional sequential algorithm the total time is T = 16A + 16B + 16C; with the method proposed in this paper the total time is T2 = A + 16B + 3C, which clearly shortens the program execution time.
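The two totals can be compared numerically with a short sketch. The sample times are made-up values for illustration only, not measurements. The function `pipelined_time_from_states` reflects one possible reading of the formula, under the assumption that a period in which several modules run in parallel costs as much as the slowest active module and that HL_Decode is the slowest (B ≥ A and B ≥ C).

```python
def sequential_time(a, b, c, macroblocks=16):
    """T = 16A + 16B + 16C: every module runs in turn for every macroblock."""
    return macroblocks * (a + b + c)


def pipelined_time(a, b, c):
    """T2 = A + 16B + 3C, the total stated for the proposed method."""
    return a + 16 * b + 3 * c


def pipelined_time_from_states(a, b, c):
    """Sum per-period costs, taking the slowest active module in each period."""
    cost = {"CAVLC": a, "HL_Decode": b, "DMA": c}
    runs = [
        (1, ("CAVLC",)),
        (4, ("CAVLC", "HL_Decode")),
        (11, ("CAVLC", "HL_Decode", "DMA")),
        (1, ("HL_Decode", "DMA")),
        (3, ("DMA",)),
    ]
    return sum(n * max(cost[m] for m in modules) for n, modules in runs)


# Hypothetical per-macroblock times in arbitrary units, with B the largest.
A, B, C = 2, 5, 3
t_seq = sequential_time(A, B, C)
t_pipe = pipelined_time(A, B, C)
print(t_seq, t_pipe, pipelined_time_from_states(A, B, C))
```

For these sample values the sequential total is 160 units and the pipelined total is 91 units, and the per-state sum agrees with the closed-form T2 as long as B is the largest of the three times.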
4 Test results
Claire.cif and Pairs.cif were tested on the DSP-BF533 test platform. The test analysis shows that the optimization raises the decoding rate enough to meet real-time application requirements. The results are listed in Table 3.
5 Conclusion
For mobile video terminal applications, a new software pipelining algorithm is proposed according to the characteristics of the DSP. It makes the modules cooperate more closely, makes better use of the program's idle time, reduces waiting time and improves the decoding rate. Experimental tests show that the scheme meets the real-time decoding requirement for CIF images. With further optimization to achieve higher and more reliable decoding efficiency, the DSP-BF533-based design can be extended to different areas such as wireless 3G networks, digital TV, IP network media and storage formats.