"Keywords:
AVS, DSP, video encoder
With the development of digital video technology, many digital audio and video compression standards have emerged worldwide in recent years. AVS (Audio Video coding Standard) is an audio and video coding standard formulated independently by China, with independent intellectual property rights. Compared with other well-known audio and video coding standards, it has the following characteristics: ① high performance, with coding efficiency more than twice that of MPEG-2 and comparable to that of H.264; ② lower algorithmic complexity than H.264; ③ lower software and hardware implementation cost than H.264; ④ a simple patent licensing model whose cost is significantly lower than that of similar standards. At the same bit rate and PSNR, the coding speed of AVS is more than 4 times that of H.264.
The AVS video standard adopts a series of techniques to achieve efficient video coding, including intra prediction, inter prediction, transform and quantization, and entropy coding. Inter prediction uses block-based motion vectors to eliminate temporal redundancy between images; intra prediction uses spatial prediction modes to eliminate redundancy within an image; the prediction residual is then transformed and quantized to remove visual redundancy; finally, the motion vectors, prediction modes, quantization parameters and transform coefficients are compressed by entropy coding to remove the statistical redundancy of the code words.
DSP implementation is an important field of AVS hardware application, and real-time operation is an important requirement. However, because the standard is still young, there are few examples of DSP implementations, so implementing the AVS algorithm on a DSP is of great significance to the development of AVS. In addition, DSPs with strong processing capability are well suited to communication and image processing applications.
The system uses the digital media processor TMS320DM6446 (hereinafter referred to as "DM6446") recently launched by TI. Its DSP core runs at up to 594 MHz and has a rich instruction set optimized for multimedia operations, with integrated multimedia and communication peripherals that simplify the design and reduce system cost. The on-chip ARM926EJ-S core based on ARM9 (running at up to 297 MHz) and the rich media and peripheral interfaces provide a good hardware foundation for an AVS video codec scheme.
1 System hardware design
The system is a DSP-based video monitoring system. The YUV 4:2:0 signal captured by a CCD camera is processed in real time by the DSP, and the compressed data stream is sent to the monitoring room through the Ethernet interface.
The data compression unit is mainly built around the DSP and SDRAM. The hardware block diagram of the system is shown in Figure 1.
The DM6446 adds many peripherals and interfaces, for example:
◇ video processing subsystem VPSS (Video Processing Subsystem), including the CCD device interface;
◇ external memory interface EMIF (External Memory Interface);
◇ FPGA interface (VLYNQ interface);
◇ 10/100 Mbps Ethernet interface EMAC (Ethernet MAC).
After the video signal is captured, EDMA is used to move the data; once the data has been moved into the cache, the DM6446 compresses it. Figure 2 shows the software flow chart of the system.
After one frame has been buffered, the DSP reads the data through EDMA and compresses it, and the results are stored in SDRAM through EDMA. When the DSP has finished processing a frame, it notifies the host to read the data; on receiving the notification, the host reads the compressed data into host memory over Ethernet and saves it to the hard disk. An AVS decoder client is installed on the monitoring host, so the transmitted data can be played back in real time. This process runs in a loop, and during execution the relevant parameters can be adjusted automatically according to the video bit rate.
2 System software design
2.1 AVS video compression principle
The system adopts the AVS video standard; its principle block diagram is shown in Figure 3.
In the AVS video standard, every macroblock is coded with either intra prediction or inter prediction. The prediction residual undergoes an 8 × 8 transform (an integer cosine transform, see below) and quantization; the quantized coefficients are then scanned into a one-dimensional sequence and entropy coded. AVS applies a loop filter to the reconstructed image, which has two advantages: on the one hand, it removes blocking artifacts and improves the subjective quality of the reconstructed image; on the other hand, it improves coding efficiency. The filter strength can be adjusted adaptively.
2.2 Main technologies of AVS
(1) Transform and quantization
Considering coding performance, implementation complexity, the main applications of the AVS video standard and other factors, an 8 × 8 integer cosine transform was finally selected for the AVS video standard. AVS adopts the 8 × 8 integer cosine transform with PIT (Pre-scaled Integer Transform); that is, forward scaling, quantization and inverse scaling are combined at the encoder, while the decoder performs only inverse quantization without inverse scaling. The AVS 8 × 8 transform and quantization can be realized without mismatch in 16-bit precision.
On a PC, the large number of multiply-accumulate operations in the transform is usually replaced by additions and shifts. However, on the TMS320DM6446 used in this system, a multiply-accumulate can be completed in one cycle by arranging the pipeline properly, so there is no need to replace multiplications with large numbers of additions and shifts. Multiplication by an integral power of 2 should still be realized by a shift, because a shift consumes less power than a multiplication.
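As an illustration, a direct matrix-multiply implementation of the 8 × 8 transform maps well onto the DM6446's single-cycle multiply-accumulate units. The sketch below uses the 8 × 8 integer cosine basis commonly quoted for AVS; the coefficient matrix should be checked against the standard text, and the combined PIT scaling/quantization stage described above is deliberately not included.

/* Sketch: 8x8 forward integer transform as Y = T * X * T', 32-bit intermediates.
   The basis matrix is the ICT commonly quoted for AVS; verify against the
   standard before use.  PIT scaling/quantization is NOT included here. */
static const int T[8][8] = {
    {  8,   8,   8,   8,   8,   8,   8,   8 },
    { 10,   9,   6,   2,  -2,  -6,  -9, -10 },
    { 10,   4,  -4, -10, -10,  -4,   4,  10 },
    {  9,  -2, -10,  -6,   6,  10,   2,  -9 },
    {  8,  -8,  -8,   8,   8,  -8,  -8,   8 },
    {  6, -10,   2,   9,  -9,  -2,  10,  -6 },
    {  4, -10,  10,  -4,  -4,  10, -10,   4 },
    {  2,  -6,   9, -10,  10,  -9,   6,  -2 }
};

void ict8x8_forward(const short x[8][8], int y[8][8])
{
    int tmp[8][8];
    int i, j, k, s;

    /* tmp = T * x */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            s = 0;
            for (k = 0; k < 8; k++)
                s += T[i][k] * x[k][j];   /* single-cycle MAC on the DSP */
            tmp[i][j] = s;
        }

    /* y = tmp * T' */
    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++) {
            s = 0;
            for (k = 0; k < 8; k++)
                s += tmp[i][k] * T[j][k];
            y[i][j] = s;
        }
}

In the real encoder this transform is followed by the combined forward scaling and quantization step described above.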
(2) Intra prediction
The AVS video standard uses intra prediction to improve the coding efficiency of intra-coded macroblocks. During prediction, adjacent pixels in the block to the left of and the block above the current block are used as reference pixels. Intra prediction in the AVS video standard operates on 8 × 8 luma and chroma blocks; five 8 × 8 luma prediction modes and four 8 × 8 chroma prediction modes are defined (see Table 1 and Fig. 4), which greatly simplifies intra prediction.
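For illustration, a minimal DC-style 8 × 8 intra prediction is sketched below, assuming the left and top reconstructed neighbour pixels are available. It simply averages the neighbours; the normative AVS DC mode differs in detail, so this shows the idea only.

/* Sketch: plain DC prediction for an 8x8 block.  'left' and 'top' hold the 8
   reconstructed neighbour pixels on each side.  Not the normative AVS mode. */
void intra_pred_dc8x8(const unsigned char left[8],
                      const unsigned char top[8],
                      unsigned char pred[8][8])
{
    int sum = 0, i, j;
    unsigned char dc;

    for (i = 0; i < 8; i++)
        sum += left[i] + top[i];

    dc = (unsigned char)((sum + 8) >> 4);   /* round over 16 neighbours */

    for (i = 0; i < 8; i++)
        for (j = 0; j < 8; j++)
            pred[i][j] = dc;
}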
(3) Inter prediction
AVS supports two kinds of inter-predicted pictures: P frames and B frames. A P frame uses at most two forward reference frames, which improves coding efficiency without increasing the buffer size; a B frame uses one forward and one backward reference frame.
The motion compensation block sizes in the AVS video standard include 16 × 16, 16 × 8, 8 × 16 and 8 × 8. The motion vector accuracy is 1/4 pixel, so interpolation is needed to obtain non-integer samples. The AVS video standard defines two 4-tap FIR filters for interpolating the 1/2- and 1/4-pixel luma samples respectively. Compared with the 6-tap FIR filter used in H.264, the filter implementation complexity of the AVS video standard is lower.
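A sketch of horizontal half-pel luma interpolation with a 4-tap filter is given below. The (-1, 5, 5, -1)/8 tap set is the one commonly quoted for AVS half-sample positions and should be checked against the standard text; boundary handling is left to the caller.

/* Sketch: horizontal half-pel interpolation with a 4-tap FIR filter.
   Taps (-1, 5, 5, -1), rounding, then division by 8.  The caller must provide
   at least 1 valid pixel to the left and 2 to the right of each output. */
static unsigned char clip255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

void interp_hpel_h(const unsigned char *src, int stride,
                   unsigned char *dst, int width, int height)
{
    int x, y;
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            int v = -src[x - 1] + 5 * src[x] + 5 * src[x + 1] - src[x + 2];
            dst[x] = clip255((v + 4) >> 3);   /* round and divide by 8 */
        }
        src += stride;
        dst += stride;
    }
}

The 1/4-pixel samples are produced analogously with the second 4-tap filter, which is omitted here.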
(4) Loop filtering
Block-based video coding is prone to blocking artifacts, especially at low bit rates. The AVS video standard defines an adaptive loop filter to remove blocking artifacts, improve the subjective quality of the reconstructed image and improve coding efficiency. Loop filtering is performed on the boundaries of luma and chroma blocks; the horizontal boundaries of a block are filtered first, followed by the vertical boundaries. The filtering strength is determined by the macroblock coding mode, the quantization parameter and the motion vectors. The loop filter of H.264 uses 4 pixels on each side of the boundary, while the AVS video standard uses only 3, so the complexity of the AVS loop filter is lower than that of H.264. The loop filter used in the AVS video standard is also more amenable to parallel implementation on a DSP.
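As a toy illustration of the idea only (not the normative AVS filter decisions or equations), the sketch below smooths one vertical block edge and skips filtering when the step across the edge is large enough to be a real image edge.

/* Toy sketch of deblocking one vertical 8-pixel block edge.  Pixels on each
   side of the boundary are pulled toward their common average, and filtering
   is skipped when the step across the edge exceeds 'alpha'.  The normative
   AVS strength decisions and filter equations are different. */
void deblock_vertical_edge(unsigned char *rec, int stride, int alpha)
{
    int row;
    for (row = 0; row < 8; row++, rec += stride) {
        unsigned char *p = rec;      /* p[-2], p[-1] left side; p[0], p[1] right side */
        int step = p[0] - p[-1];
        if (step > -alpha && step < alpha) {       /* smooth only small steps */
            int avg = (p[-1] + p[0] + 1) >> 1;
            p[-1] = (unsigned char)((p[-1] + avg + 1) >> 1);
            p[0]  = (unsigned char)((p[0]  + avg + 1) >> 1);
            p[-2] = (unsigned char)((p[-2] + p[-1] + 1) >> 1);
            p[1]  = (unsigned char)((p[1]  + p[0]  + 1) >> 1);
            /* a third pixel on each side would be adjusted for strong filtering */
        }
    }
}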
(5) Entropy coding
The AVS video standard uses k-th order (k = 0 to 3) Exp-Golomb codes. CBP, macroblock mode and motion vectors are coded with the 0-th order Exp-Golomb code. All four orders of Exp-Golomb codes are used for the quantized coefficients, whose (run, level) pairs are coded with a 2D-VLC method. The code word structure of the Exp-Golomb code is very regular, so the decoder does not need to store code tables; the 19 mapping tables used for the quantized coefficients require less than 2 KB of storage. The video standard also defines a new escape coding method, which yields a coding gain of 0.05 to 0.08 dB.
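For illustration, the generic construction of a k-th order Exp-Golomb code word is sketched below; the mapping of syntax elements onto code numbers is defined by the standard and is not shown.

/* Sketch: build the k-th order Exp-Golomb code word for a code number n >= 0.
   Returns the total length in bits and writes the code word (right-aligned)
   to *code.  The leading (length - bits_of(n + 2^k)) bits are zeros. */
int exp_golomb_k(unsigned int n, int k, unsigned int *code)
{
    unsigned int m = n + (1u << k);   /* info part; its top bit is always set */
    unsigned int t = m;
    int info_bits = 0;

    while (t) { info_bits++; t >>= 1; }   /* number of bits in m */

    /* (info_bits - k - 1) leading zeros, then the info_bits bits of m. */
    *code = m;
    return (info_bits - k - 1) + info_bits;
}

For example, code number 3 with k = 0 produces the 5-bit word 00100.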
2.3 Program optimization on the DSP
The compression part of the system takes the AVS reference software RM52f as the source code of the encoder. Its structure and algorithms are adjusted and improved according to the characteristics of the AVS coding algorithm and the DSP. The program is optimized as follows:
① Set structures and variable types reasonably. Frequently used array variables are not placed in structures, because that would require two-level addressing and reduce efficiency. Variable widths should be defined reasonably, strictly distinguishing 8-bit, 16-bit and 32-bit variables and never using a wide type where a narrow one suffices. (Note: in a loop body, the loop counter should always be of type int, which matches the DSP's natural word width, rather than short.)
② Loop unrolling. Too many and too deeply nested loops hinder the compiler's software pipelining and hurt the DSP's parallelism, so the inner loop should be unrolled appropriately according to the characteristics of the DSP, enabling it to execute multiple instructions per cycle. A good way to optimize a loop is to extract it into a separate file and rewrite, compile and verify it on its own. Since only the innermost loop can be software pipelined, the following points must be observed (otherwise the loop will not be pipelined and performance suffers badly): ① the loop body may contain inline functions but no function calls; ② it must not contain conditional terminations or early-exit instructions; ③ the counter should count down and terminate at 0 (the -o2/-o3 options can perform this conversion automatically); ④ the loop count must not be modified inside the loop body. A SAD kernel written according to these rules is sketched after this list.
③ Use EDMA for data movement, which greatly saves CPU resources when the CPU frequently accesses data in external memory. EDMA mainly performs the following transfers: video data from off-chip memory to the on-chip cache; encoded data from on-chip to off-chip memory; and, during motion compensation, the corresponding off-chip reference block data to on-chip memory.
④ Use intrinsic functions and linear assembly. The DSP compiler provides many very useful intrinsics, and using them can greatly increase the running speed of the program (the SAD sketch after this list shows their use). For the most time-consuming parts, motion estimation and the transform, linear assembly can greatly improve execution efficiency. Compared with standard assembly, linear assembly does not require the programmer to schedule parallel instructions, instruction delays, register allocation or functional unit assignment, which greatly shortens coding time while achieving efficiency much higher than plain C.
⑤ Use compiler optimization options. The -o3 option enables the highest level of software pipelining optimization; the -mt option tells the compiler that the source program does not use aliasing, which improves the effect of optimization; the -pm option enables program-level optimization. When using C64x-series DSPs, the -mv6400 option should also be set so that the compiler optimizes for these devices.
⑥ Use fast algorithms. In the AVS encoder, motion estimation takes a great deal of time; optimizing the search order and adopting an adaptive search strategy, for example with FastME, can greatly increase the speed of motion estimation (a simplified search is sketched below). In addition, the 1/4-pixel interpolation can be adjusted to avoid repeated calculation.
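Combining the loop-structure rules in ② with the intrinsics mentioned in ④, a 16 × 16 SAD kernel might look like the sketch below. It assumes the TI C6000 compiler; _subabs4() and _dotpu4() are standard C64x intrinsics that process four pixels at a time, and _mem4_const() performs a 4-byte load, but the surrounding function and names are illustrative only.

#include <c6x.h>   /* TI C6000 compiler intrinsics */

/* Sketch: 16x16 SAD written so the inner loop can be software-pipelined:
   no function calls, no early exit, and a simple down-counting trip count. */
unsigned int sad16x16(const unsigned char *cur, int cur_stride,
                      const unsigned char *ref, int ref_stride)
{
    unsigned int sad = 0;
    int row;

    for (row = 16; row > 0; row--) {               /* counts down to 0 */
        unsigned int c0 = _mem4_const(cur);
        unsigned int c1 = _mem4_const(cur + 4);
        unsigned int c2 = _mem4_const(cur + 8);
        unsigned int c3 = _mem4_const(cur + 12);
        unsigned int r0 = _mem4_const(ref);
        unsigned int r1 = _mem4_const(ref + 4);
        unsigned int r2 = _mem4_const(ref + 8);
        unsigned int r3 = _mem4_const(ref + 12);

        /* absolute differences of 4 packed pixels, then sum them */
        sad += _dotpu4(_subabs4(c0, r0), 0x01010101u);
        sad += _dotpu4(_subabs4(c1, r1), 0x01010101u);
        sad += _dotpu4(_subabs4(c2, r2), 0x01010101u);
        sad += _dotpu4(_subabs4(c3, r3), 0x01010101u);

        cur += cur_stride;
        ref += ref_stride;
    }
    return sad;
}

Because the loop has a fixed down-counting trip count, no function calls and no early exit, the compiler can software-pipeline it when the program is built with -o3.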
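As a sketch of the adaptive search idea in ⑥, a small-diamond integer-pel search with early termination is shown below. The actual RM52f/FastME strategy is more elaborate; sad16x16() is the kernel from the previous sketch, and the parameter names are illustrative.

/* Sketch: small-diamond search refining a predicted motion vector (*mvx, *mvy),
   with a SAD threshold for early termination.  Illustrates the idea only. */
void diamond_search(const unsigned char *cur, const unsigned char *ref,
                    int stride, int range, int thresh,
                    int *mvx, int *mvy)
{
    static const int dx[4] = { 0, 0, -1, 1 };
    static const int dy[4] = { -1, 1, 0, 0 };
    int bx = *mvx, by = *mvy;
    unsigned int best = sad16x16(cur, stride, ref + by * stride + bx, stride);
    int improved = 1;

    while (improved && best > (unsigned int)thresh) {   /* early termination */
        int i, nbx = bx, nby = by;
        improved = 0;
        for (i = 0; i < 4; i++) {
            int x = bx + dx[i], y = by + dy[i];
            unsigned int s;
            if (x < -range || x > range || y < -range || y > range)
                continue;                               /* stay inside the window */
            s = sad16x16(cur, stride, ref + y * stride + x, stride);
            if (s < best) { best = s; nbx = x; nby = y; improved = 1; }
        }
        bx = nbx;
        by = nby;
    }
    *mvx = bx;
    *mvy = by;
}

The threshold-based exit implements the adaptive part: well-predicted blocks stop after only a few SAD evaluations.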
3 Summary
The system realizes real-time compression and transmission of video data, and supports image data read/write, memory read/write, SDRAM read/write, configuration space read/write and register read/write; these operations are coordinated to perform AVS compression of the image data. The system can encode 4 channels of CIF-format (352 × 288) video in real time, with resources reserved for performance expansion. Taking the CIF test sequence "bus" as an example, the compression results of the system are: with QP set to 36, the bit rate is 952.77 kbps, the luma PSNR is 30.80 dB and the coding speed is 36 fps. The results show that, for a video monitoring system, the PSNR (peak signal-to-noise ratio) is satisfactory and the coding speed meets the real-time requirement. With the continuous improvement of AVS video coding technology, the system can be upgraded easily. It will find wide use in video conferencing and other fields, and has great development potential.