The network transmission part of a remote real-time moving image transmission system runs over the Internet, where available bandwidth is small, delay is large, and conditions are unstable. To obtain good real-time transmission, it is therefore necessary not only to improve the transmission control mechanism but also to use a codec that achieves a high compression ratio with low resource consumption and can compress and decompress in real time. H.263 is a video codec standard for low-bit-rate real-time transmission, issued in 1995 by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). It was designed for low-bandwidth video applications at around 64 kbps, such as video conferencing and videophones. H.263 is now applied in remote real-time moving image transmission systems, but the original H.263 leaves considerable room for optimization in both real-time performance and compression ratio. Based on a specific remote real-time moving image transmission application and on extensive study, this paper proposes several H.263 optimization strategies, which have achieved a considerable effect.
1 Analysis of the H.263 Compression Algorithm
The input video frame formats of H.263 include QCIF (Quarter Common Intermediate Format, 176 × 144) and CIF (Common Intermediate Format, 352 × 288), among others. Each video frame is divided into macroblocks; each macroblock consists of four Y luminance blocks, one Cb chrominance block, and one Cr chrominance block. Each block is 8 × 8 pixels. H.263 compresses video frames in units of macroblocks.
H.263 uses the discrete cosine transform (DCT) to reduce spatial redundancy, and motion estimation and motion compensation to reduce temporal redundancy. H.263 has two coding modes: intra mode (intra-frame coding), which produces key frames (I frames), and inter mode (inter-frame coding), which produces non-key frames (P frames).
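As a small illustration of the frame layout described here, the sketch below counts the macroblocks in a frame and splits one 16 × 16 luminance area into its four 8 × 8 blocks. The names `macroblock_grid` and `split_luma` are hypothetical helpers introduced only for this example, and the 16 × 16 luminance size is implied by the four 8 × 8 Y blocks rather than stated explicitly.

```python
MB_SIZE = 16      # luminance area covered by one macroblock (4 blocks of 8 x 8)
BLOCK_SIZE = 8    # size of one block
FRAME_FORMATS = {"QCIF": (176, 144), "CIF": (352, 288)}


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks for a frame of the given size."""
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("frame size must be a multiple of the macroblock size")
    return width // MB_SIZE, height // MB_SIZE


def split_luma(mb_y):
    """Split a 16 x 16 luminance macroblock into four 8 x 8 blocks (raster order)."""
    blocks = []
    for top in (0, BLOCK_SIZE):
        for left in (0, BLOCK_SIZE):
            blocks.append([row[left:left + BLOCK_SIZE]
                           for row in mb_y[top:top + BLOCK_SIZE]])
    return blocks


if __name__ == "__main__":
    for name, (w, h) in FRAME_FORMATS.items():
        cols, rows = macroblock_grid(w, h)
        print(name, cols, "x", rows, "macroblocks")
```

Under these assumptions a QCIF frame holds 11 × 9 macroblocks and a CIF frame 22 × 18, each carrying four Y blocks plus one Cb and one Cr block.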
This analysis yields the flow chart of the H.263 compression algorithm (Figure 1).
Analysis and testing show that DCT, motion estimation, and motion compensation are the most important parts of H.263 and also the most time-consuming operations in an H.263 implementation. To speed up H.263, the optimization therefore targets these stages.
2 Optimization of the Color Space Conversion Function, DCT, and Motion Estimation
2.1 Optimization of the Color Space Conversion Function
The CIF format is based on the YUV color space, whereas most video capture programs only provide video frames in the RGB color space, so a conversion function from RGB to YUV is needed.
The RGB-to-YUV conversion function is shown below, where Y is the luminance value of the YUV color space, and U (Cb) and V (Cr) are its chrominance values.
Y = 0.299 × R + 0.587 × G + 0.114 × B;
Cr = V = (R - Y) × 127/179;
Cb = U = (B - Y) × 127/226;
The original H.263 color space conversion algorithm uses floating-point arithmetic, which consumes many CPU cycles. To speed up video processing, the floating-point multiplications are replaced with integer multiplications and right shifts, which effectively shortens the conversion time. Each coefficient is scaled by 2^20 and converted to an integer, so a right shift by 20 bits restores the original scale.
The optimized conversion function is as follows:
Y = ((R × 313524) >> 20) + ((G × 615514) >> 20) + ((B × 119538) >> 20);
Cr = V = ((R - Y) × 743962) >> 20;
Cb = U = ((B - Y) × 589244) >> 20;
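The following sketch compares the floating-point and the integer shift-based conversions, using the constants given in this section. The function names are hypothetical and chosen for illustration. Note that Python's `>>` on negative integers rounds toward negative infinity; in C, right-shifting a negative value is implementation-defined, so a real encoder would need to check that behavior.

```python
SHIFT = 20  # coefficients are pre-scaled by 2**20


def rgb_to_ycbcr_float(r, g, b):
    """Floating-point conversion, as in the unoptimized formulas."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 127 / 179
    cb = (b - y) * 127 / 226
    return y, cb, cr


def rgb_to_ycbcr_fixed(r, g, b):
    """Integer conversion using multiplications and right shifts only."""
    y = ((r * 313524) >> SHIFT) + ((g * 615514) >> SHIFT) + ((b * 119538) >> SHIFT)
    cr = ((r - y) * 743962) >> SHIFT
    cb = ((b - y) * 589244) >> SHIFT
    return y, cb, cr


if __name__ == "__main__":
    for rgb in [(0, 0, 0), (255, 0, 0), (12, 200, 90), (255, 255, 255)]:
        print(rgb, rgb_to_ycbcr_float(*rgb), rgb_to_ycbcr_fixed(*rgb))
```

Because each of the three luminance terms is truncated separately, the integer Y can fall up to about 3 levels below the floating-point value; this is the accuracy traded for speed.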
2.2 Optimization of the DCT and IDCT Algorithms
The two-dimensional DCT formula is:
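For reference, the standard two-dimensional forward DCT of an N × N block, which is presumably the formula meant here, can be written as below. The symbols f(x, y) for the input samples, F(u, v) for the coefficients, and C(·) for the normalization factor are naming assumptions; for H.263 blocks N = 8, so the leading factor is 1/4.

```latex
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N},
\qquad
C(k) = \begin{cases} \dfrac{1}{\sqrt{2}}, & k = 0 \\[4pt] 1, & k \neq 0 \end{cases}
```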
Analysis shows that a fast DCT can be implemented in two ways. One is to map the DCT onto an existing fast transform algorithm (such as the FFT or FHT), but the extra mapping steps add computational complexity; the other is to improve the computation by exploiting the regularities of the DCT itself.
In H.263 applications, two regularities stand out. First, the energy is concentrated in a small number of DCT coefficients. Second, as the quantization step increases, more DCT coefficients quantize to zero, and the accuracy required of the DCT calculation decreases. A zero-coefficient prediction strategy is therefore used: the input data of the DCT is classified according to the quantization step, and if, for the given quantization step, the input data would be quantized to 0, the DCT is skipped and the transform result is set directly to 0. Only part of the data then needs a DCT, which saves a large number of useless operations. In addition, exploiting the local parallelism of the DCT, Intel's multimedia instruction set is used to implement the DCT calculation, which greatly improves its speed.
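One possible form of the zero-coefficient prediction is sketched below; it is not necessarily the classification rule used by the author. It assumes an orthonormal 8 × 8 DCT, for which every coefficient magnitude is at most one quarter of the sum of absolute input values, and a hypothetical dead-zone quantizer that maps any magnitude below 2 × QP to zero. Under these assumptions, a block whose absolute sum is below 8 × QP can skip the DCT entirely. All function names are illustrative.

```python
import math

N = 8


def dct2_8x8(block):
    """Reference (slow) orthonormal 8 x 8 forward DCT."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


def quantize(coef, qp):
    """Assumed dead-zone quantizer: magnitudes below 2 * qp become 0."""
    sign = -1 if coef < 0 else 1
    return sign * int(abs(coef) // (2 * qp))


def predicts_all_zero(block, qp):
    """Sufficient test: |F(u,v)| <= sum|f| / 4, so sum|f| < 8*qp gives |F| < 2*qp."""
    return sum(abs(p) for row in block for p in row) < 8 * qp


def transform_and_quantize(block, qp):
    """Return (quantized coefficients, skipped flag)."""
    if predicts_all_zero(block, qp):
        return [[0] * N for _ in range(N)], True   # DCT skipped
    coeffs = dct2_8x8(block)
    return [[quantize(c, qp) for c in row] for row in coeffs], False
```

The test is conservative: it never zeroes a block that the full path would keep, but it may still run the DCT on some blocks that end up all zero.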
2.3 Optimization of Motion Estimation and Motion Compensation Algorithm
Motion estimation searches the reference frame for the image block most similar to a block of the current frame, i.e., the best matching block; the search result is expressed as a motion vector. Motion compensation reconstructs the current frame from the reference frame and the obtained motion vectors; the difference between the reconstructed frame and the current frame is compressed as the compensation value of the current frame. The two work together to achieve compression.
Motion estimation is studied here from two aspects: fast search algorithms and block matching criteria.
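The relationship between the motion vector, the prediction, and the compensation value can be sketched as follows. Frames are plain lists of rows, the motion vector is assumed to be already known, and the helper names are hypothetical.

```python
def predict_block(ref, bx, by, mv, n=16):
    """Fetch the n x n block of the reference frame displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[by + dy + j][bx + dx + i] for i in range(n)] for j in range(n)]


def residual_block(cur, ref, bx, by, mv, n=16):
    """Difference between the current block and its motion-compensated prediction."""
    pred = predict_block(ref, bx, by, mv, n)
    return [[cur[by + j][bx + i] - pred[j][i] for i in range(n)] for j in range(n)]


def reconstruct_block(ref, bx, by, mv, residual, n=16):
    """Decoder side: prediction plus residual gives back the current block."""
    pred = predict_block(ref, bx, by, mv, n)
    return [[pred[j][i] + residual[j][i] for i in range(n)] for j in range(n)]
```

The better the motion vector, the smaller the residual values, and the fewer bits the residual needs after DCT and quantization. In this sketch the residual is kept lossless, so reconstruction is exact; a real encoder quantizes it.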
The simplest search algorithm is the full search method (FS), which is highly accurate but requires too much computation. To speed up the search while preserving accuracy, many fast search algorithms have been proposed: the three-step search (TSS) and its improved variants, the two-dimensional logarithmic search, the cross search (CS), the four-step search (4SS), the predictive search (PSA), the diamond search (DS), and others. Diamond search is one of the best-performing fast search algorithms to date and was chosen for this project.
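A minimal sketch of the diamond search is given below, assuming the commonly described two-pattern form: a large diamond pattern is repeated until its best point is the center, then a small diamond pattern refines the result once. SAD is used as the matching cost. The function names, the search range of ±7, and the list-of-rows frame layout are assumptions for illustration, not details taken from the author's implementation.

```python
# Offsets (dx, dy) around the current center, excluding the center itself.
LARGE_DIAMOND = [(0, -2), (0, 2), (-2, 0), (2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SMALL_DIAMOND = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def block_sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def diamond_search(cur, ref, bx, by, n=16, search_range=7):
    """Return ((dx, dy), cost) for the block whose top-left corner is (bx, by)."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        inside = (abs(dx) <= search_range and abs(dy) <= search_range
                  and 0 <= bx + dx <= width - n and 0 <= by + dy <= height - n)
        return block_sad(cur, ref, bx, by, dx, dy, n) if inside else None

    best, best_cost = (0, 0), cost(0, 0)

    def refine(center, pattern):
        nonlocal best, best_cost
        for ox, oy in pattern:
            cand = (center[0] + ox, center[1] + oy)
            c = cost(*cand)
            if c is not None and c < best_cost:
                best, best_cost = cand, c

    while True:                      # large diamond until the center is the best point
        center = best
        refine(center, LARGE_DIAMOND)
        if best == center:
            break
    refine(best, SMALL_DIAMOND)      # one final small-diamond refinement
    return best, best_cost
```

Each large-diamond step evaluates at most eight new positions, compared with 225 positions for a full search over a ±7 range, which is where the speedup comes from. Like other fast searches, it can stop at a local minimum.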
The block matching criterion decides when the best matching block has been found and the search can terminate. Traditional criteria include the mean absolute error (MAE), the cross-correlation function (CCF), the mean square error (MSE), and the minimized maximum error (MME). Because the traditional criteria do not take the visual characteristics of the human eye into account, their judgments can differ considerably from what the eye perceives. In practice, H.263 uses the sum of absolute differences (SAD) as a substitute for MSE; the two criteria are defined as follows:
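In their usual form, and using the symbols defined in the explanation that follows, the two criteria can be written as below. The summation indices i and j and the 1/N² normalization of MSE are assumptions, since the formulas themselves are not reproduced in this text.

```latex
\mathrm{SAD}(x,y) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
    \left| F_{0}(k+i,\, l+j) - F_{-1}(x+i,\, y+j) \right|

\mathrm{MSE}(x,y) = \frac{1}{N^{2}} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
    \left( F_{0}(k+i,\, l+j) - F_{-1}(x+i,\, y+j) \right)^{2}
```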
Where: F0 and F-1 denote the current frame and the reconstructed frame (reference frame), respectively; (k, l) is the coordinate of the macroblock to be encoded in the current frame; (x, y) is the coordinate of the reference macroblock in the reconstructed frame; N is the size of the macroblock, here 16. As the formulas show, SAD replaces the multiplication in MSE with an absolute value, which significantly reduces the amount of computation and speeds up the calculation.
Tests show that SAD requires more than one third less computation than MSE, while the resulting image quality is comparable.
In addition, hardware features can be used to accelerate the block matching criterion, and Intel's MMX technology provides such a feature. SAD and other block matching criteria mainly repeat the same operation on short data items. MMX adds single-instruction, multiple-data (SIMD) instructions, so that several data items can be processed by one instruction; this parallelism speeds up the calculation.
3 Mode Selection to Improve the Compression Ratio
H.263 provides many advanced modes to increase the video compression ratio. In terms of their contribution to compression efficiency, the four most important are the large motion vector mode, the advanced prediction mode, the PB frame mode, and the enhanced PB frame mode.
In the large motion vector mode and the advanced prediction mode, motion vectors may point beyond the image boundary. This extends the range that motion vectors can express and improves the accuracy of motion compensation, and therefore the coding efficiency.
In the basic PB frame mode, a PB frame consists of a P frame and a B frame coded together. The current P frame is predicted from the previous P frame, and the B frame is predicted from both the previous P frame and the current P frame (see Figure 2). With the PB frame mode enabled, the frame rate can be raised considerably without a comparable increase in bit rate.
The main improvement of the enhanced PB frame mode lies in the prediction method. In the basic PB frame mode, a B frame image (or macroblock) can only be predicted bidirectionally, whereas the enhanced PB frame mode allows three methods: forward prediction (see Figure 3), backward prediction (see Figure 4), and bidirectional prediction (see Figure 2). During compression, the most suitable prediction method can thus be chosen for each B frame image (or macroblock), which increases the compression efficiency of B frames. Because the B frame of the basic PB frame mode can only be obtained by bidirectional prediction, it works well only for slowly moving images; when the input contains fast, irregular motion, B frame quality deteriorates sharply. The B frame of the enhanced PB frame mode solves this problem. Analysis and testing show that the enhanced PB frame mode is more robust than the basic PB frame mode and better suited to remote real-time moving image transmission.
The large motion vector mode and the advanced prediction mode improve the accuracy of motion compensation by extending the range of motion vectors, thereby increasing the compression ratio. The enhanced PB frame mode introduces B frames, which can be generated by three prediction methods; at the same frame rate, the compression ratio increases by nearly 80%, a clear gain. In the actual implementation, the author combines these modes with a transmission environment test module, so that higher compression efficiency and a higher compression ratio are obtained when network bandwidth is low.
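The choice among the three prediction methods can be illustrated with the simplified sketch below. It assumes that the forward and backward predictions for a B macroblock are already available as blocks, forms the bidirectional prediction as their rounded average, and picks the method with the smallest SAD. The selection rule and all names are assumptions for illustration; the standard and the author's encoder may decide differently.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def choose_b_prediction(block, forward_pred, backward_pred):
    """Return (best mode name, dict of SAD cost per mode) for one B block."""
    bidirectional = [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
                     for row_f, row_b in zip(forward_pred, backward_pred)]
    candidates = {
        "forward": forward_pred,
        "backward": backward_pred,
        "bidirectional": bidirectional,
    }
    costs = {mode: sad(block, pred) for mode, pred in candidates.items()}
    best_mode = min(costs, key=costs.get)   # ties go to the first mode listed
    return best_mode, costs
```

With fast, irregular motion the average of the two references is often a poor match, so the forward or backward candidate wins; with slow motion the bidirectional candidate usually does.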
4 Experimental Data and Performance Analysis
4.1 Algorithm Optimization Test
Video sequences of 100 frames in three formats (Sub-QCIF: 88 × 72, QCIF: 176 × 144, CIF: 352 × 288) were encoded with one keyframe every 20 frames and the video frame quality set to 6000. The time efficiency of the algorithm before and after optimization is compared in Fig. 5.
The vertical axis is in milliseconds and indicates the time required to complete compression. The larger the video frame to be processed, the more pronounced the speedup obtained from the optimized algorithm.
4.2 Enhanced PB Frame Mode Compression Test
Video sequences of 100 frames in three formats (Sub-QCIF: 88 × 72, QCIF: 176 × 144, CIF: 352 × 288) were encoded with one keyframe every 20 frames and the video frame quality set to 6000. The compression efficiency of the algorithm before and after enabling the enhanced PB frame mode is compared in Fig. 6.
The vertical axis is the compression ratio. The larger the video frame to be processed, the more redundant information it contains and the more pronounced the compression gain of the enhanced PB frame mode.
Editor in charge: GT