"Basic knowledge of H.264 video coding"
1、 Development of video coding technology
Video coding technology has basically developed through two series of international standards: MPEG-x, formulated by ISO/IEC, and H.26x, formulated by ITU-T. From the H.261 recommendation through H.262/H.263 and MPEG-1/2/4, there has been one common goal: to obtain the best possible image quality at the lowest bit rate (or storage capacity). Moreover, with the growing market demand for image transmission, the problem of adapting to different channel transmission characteristics has become more and more prominent. ISO/IEC and ITU-T therefore jointly formulated the new video standard H.264 to address these problems.
H.261 is the earliest video coding recommendation. Its purpose was to standardize the video coding technology used in conference television and videophone applications over ISDN networks. Its algorithm is a hybrid coding method combining inter-frame prediction, which reduces temporal redundancy, with the DCT transform, which reduces spatial redundancy. Matched to ISDN channels, its output bit rate is p × 64 kbit/s. When p is small, only images of rather low definition can be transmitted, which suits face-to-face videophone use; when p is large (e.g. p > 6), conference television images of good definition can be transmitted.
H.263 is a recommendation for low bit rate image compression; technically it is an improvement and extension of H.261 and supports applications with bit rates below 64 kbit/s. In essence, however, H.263 and the later H.263+ and H.263++ have developed into recommendations supporting full-rate applications, as can be seen from the many image formats they support, such as Sub-QCIF, QCIF, CIF, 4CIF and even 16CIF.
The MPEG-1 standard has a bit rate of about 1.2 Mbit/s and can provide 30 frame/s CIF (352 × 288) quality images; it was formulated for video storage and playback on CD-ROM discs. The basic video coding algorithm of MPEG-1 is similar to that of H.261/H.263, also adopting motion-compensated inter-frame prediction, the two-dimensional DCT, VLC run-length coding and other measures. In addition, the concepts of intra frames (I), predicted frames (P), bidirectionally predicted frames (B) and DC frames (D) were introduced to further improve coding efficiency. On the basis of MPEG-1, the MPEG-2 standard made improvements in image resolution and compatibility with digital TV. For example, its motion vector accuracy is half-pel; "frame" and "field" are distinguished in coding operations such as motion estimation and the DCT; and scalable coding techniques were introduced, such as spatial scalability, temporal scalability and signal-to-noise ratio scalability. The more recent MPEG-4 standard introduces coding based on audio-visual objects (AVO), which greatly improves the interactivity and coding efficiency of video communication. MPEG-4 also adopts some new technologies, such as shape coding, adaptive DCT and arbitrary-shape video object coding. However, the basic video encoder of MPEG-4 still belongs to the class of hybrid encoders similar to H.263.
In short, the H.261 recommendation is a classic of video coding, and H.263 is its development and will gradually replace it in practice; it is mainly used in communication, although the many options of H.263 often confuse users. The MPEG series of standards has expanded from storage-media applications to transmission-media applications, and the basic framework of its core video coding is consistent with H.261. The eye-catching "object-based coding" part of MPEG-4 is still difficult to apply widely because of technical obstacles. The new video coding recommendation H.264, developed on this basis, overcomes the weaknesses of both: it introduces new coding methods within the hybrid coding framework, improves coding efficiency, and is oriented towards practical applications. At the same time, it has been formulated jointly by the two international standardization organizations, so its application prospects should be self-evident.
2、 H.264 introduction
H.264 is a new digital video coding standard developed by the JVT (Joint Video Team), formed by the VCEG (Video Coding Experts Group) of ITU-T and the MPEG (Moving Picture Experts Group) of ISO/IEC. It is both ITU-T H.264 and Part 10 of ISO/IEC MPEG-4. The call for proposals began in January 1998, the first draft was completed in September 1999, its test model TML-8 was formulated in May 2001, and the FCD (Final Committee Draft) of H.264 was adopted at the 5th JVT meeting in June 2002. It was officially released in March 2003.
Like the previous standards, H.264 is also a hybrid coding scheme of DPCM plus transform coding. However, it adopts a simple "back to basics" design and obtains much better compression performance than H.263++ without needing many options; its adaptability to various channels is strengthened by a "network friendly" structure and syntax, which facilitate the handling of bit errors and packet loss; and its range of target applications is wide, meeting the needs of different bit rates, resolutions and transmission (storage) occasions. Its basic system is open and can be used without paying royalties.
Technically, the H.264 standard has many highlights, such as unified VLC symbol coding, high-precision and multi-mode displacement estimation, an integer transform based on 4 × 4 blocks, and a layered coding syntax. These measures give the H.264 algorithm high coding efficiency: at the same reconstructed image quality it can save about 50% of the bit rate compared with H.263. The H.264 bitstream structure also has strong network adaptability, increases the error recovery capability, and adapts well to IP and wireless network applications.
3、 Technical highlights of H.264
1. Layered design
Conceptually, the H.264 algorithm is divided into two layers: the video coding layer (VCL), which is responsible for efficiently representing the video content, and the network abstraction layer (NAL), which is responsible for packaging and transmitting the data in the appropriate way required by the network. A packet-based interface is defined between the VCL and the NAL; packetization and the corresponding signaling are part of the NAL. In this way, the tasks of high coding efficiency and network friendliness are carried out by the VCL and the NAL respectively.
The VCL layer includes block-based motion-compensated hybrid coding and some new features. As with previous video coding standards, the H.264 draft does not include pre-processing and post-processing functions, which increases the flexibility of the standard.
The NAL is responsible for encapsulating the data using the segmentation format of the underlying network, including framing, signaling of logical channels, use of timing information, end-of-sequence signals, etc. For example, the NAL supports the transmission format of video over circuit-switched channels and the format for transmitting video over the Internet using RTP/UDP/IP. The NAL includes its own header information, segment structure information and the actual payload, i.e. the VCL data of the upper layer (if data partitioning is used, the data may consist of several parts).
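As an illustration, here is a minimal sketch of how the one-byte NAL unit header is laid out (forbidden_zero_bit, nal_ref_idc, nal_unit_type); the field widths follow the published standard, while the sample byte is just an example.

```python
# Minimal sketch: parse the 1-byte NAL unit header of H.264.
# nal_ref_idc signals how important the unit is as a reference; nal_unit_type
# identifies the payload (e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS).

def parse_nal_header(first_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,   # must be 0 in a valid stream
        "nal_ref_idc":        (first_byte >> 5) & 0x3,   # 0 = disposable, >0 = reference data
        "nal_unit_type":      first_byte & 0x1F,         # what the payload contains
    }

print(parse_nal_header(0x67))  # 0x67 -> nal_ref_idc = 3, nal_unit_type = 7 (SPS)
```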
2. High-precision, multi-mode motion estimation
H.264 supports motion vectors with 1/4 or 1/8 pixel accuracy. At 1/4 pixel accuracy, a 6-tap filter can be used to reduce high-frequency noise; for motion vectors with 1/8 pixel accuracy, a more complex 8-tap filter can be used. During motion estimation, the encoder can also select an "enhanced" interpolation filter to improve the prediction.
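A minimal sketch of half-pel interpolation with the standard 6-tap filter (1, -5, 20, 20, -5, 1), followed by bilinear averaging for a quarter-pel position; the sample row of luma values is invented for illustration.

```python
# Sketch of H.264 half-pel luma interpolation with the 6-tap filter (1, -5, 20, 20, -5, 1),
# followed by bilinear averaging for a quarter-pel position. Sample values are illustrative.

def clip255(v):
    return max(0, min(255, v))

def half_pel(row, x):
    """Half-pel sample between integer positions x and x+1 of a 1-D row of luma samples."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[x - 2 + i] for i, t in enumerate(taps))
    return clip255((acc + 16) >> 5)            # round and divide by 32

def quarter_pel(row, x):
    """Quarter-pel sample: bilinear average of the integer sample and the half-pel sample."""
    return (row[x] + half_pel(row, x) + 1) >> 1

row = [10, 12, 40, 200, 180, 60, 20, 15]       # one line of luma samples
print(half_pel(row, 3), quarter_pel(row, 3))
```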
In the motion prediction of H.264, a macroblock (MB) can be divided into sub-blocks in different ways, as shown in Fig. 2, forming seven different block-size modes. This flexible and detailed multi-mode partitioning fits the shapes of actual moving objects in the image better and greatly improves the accuracy of motion estimation. In this way, each macroblock can contain 1, 2, 4, 8 or 16 motion vectors.
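The sketch below simply counts motion vectors for a chosen partitioning, assuming one motion vector per partition; the partition lists mirror the seven block sizes described above.

```python
# Sketch: the seven H.264 inter block sizes and how many motion vectors a 16x16
# macroblock carries for a given partitioning (one MV per partition).

MB_PARTITIONS  = [(16, 16), (16, 8), (8, 16), (8, 8)]   # macroblock level
SUB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]       # inside each 8x8 sub-block

def motion_vectors(mb_part, sub_parts=None):
    """Number of MVs in one macroblock; sub_parts lists the split chosen for each 8x8 block."""
    if mb_part != (8, 8):
        return (16 // mb_part[0]) * (16 // mb_part[1])
    return sum((8 // w) * (8 // h) for w, h in sub_parts)

print(motion_vectors((16, 8)))                                    # 2 MVs
print(motion_vectors((8, 8), [(8, 8), (4, 8), (8, 4), (4, 4)]))   # 1 + 2 + 2 + 4 = 9 MVs
```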
In H.264, the encoder is allowed to use more than one previously coded frame for motion estimation, the so-called multi-frame reference technique. For example, with the 2 or 3 just-coded reference frames available, the encoder selects, for each target macroblock, the frame that gives the better prediction and indicates for each macroblock which frame was used.
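A much simplified sketch of this selection, assuming the encoder compares only co-located blocks by SAD (a real encoder also runs a motion search against each reference frame); all sample values are invented.

```python
# Simplified sketch of multi-frame reference selection: for each target macroblock the
# encoder tries the last few decoded frames and keeps the one with the smallest SAD.

def sad(block_a, block_b):
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def pick_reference(target_mb, reference_frames):
    """Return (index, cost) of the reference frame that predicts the target block best."""
    costs = [sad(target_mb, ref) for ref in reference_frames]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]

target = [100, 102, 98, 101]
refs   = [[90, 95, 90, 92], [101, 103, 97, 100], [120, 118, 119, 121]]
print(pick_reference(target, refs))   # -> (1, 4): the second frame gives the smallest SAD
```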
3. Integer transform of 4 × 4 blocks
H.264 is similar to the previous standards in that block-based transform coding is applied to the residual, but the transform is an integer operation rather than a real-valued one, and its process is basically similar to the DCT. The advantage of this approach is that the transform and inverse transform have exactly the same accuracy in the encoder and the decoder, which makes it easy to use a simple fixed-point arithmetic scheme; in other words, there is no "inverse transform mismatch". The transform unit is 4 × 4 instead of the usual 8 × 8. Because the size of the transform block is reduced, the partitioning of moving objects is more accurate; not only is the amount of transform computation relatively small, but the blocking errors at the edges of moving objects are also greatly reduced. So that the small-size transform does not produce gray-level differences between blocks in large smooth areas of the image, the DC coefficients of the 16 4 × 4 luma blocks of an intra macroblock (one per block, 16 in total) can undergo a second 4 × 4 transform, and the DC coefficients of the 4 4 × 4 chroma blocks (one per block, 4 in total) undergo a 2 × 2 transform.
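As a sketch, the 4 × 4 forward core transform can be written as W = Cf · X · CfT with the integer matrix Cf used by H.264; the scaling that is normally folded into quantization is omitted here for clarity, and the residual block is invented.

```python
# Sketch of the H.264 4x4 forward core transform W = Cf * X * Cf^T, using only integer
# arithmetic. The scaling normally folded into the quantization step is omitted.

CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core_transform(x):
    """4x4 residual block -> 4x4 integer transform coefficients."""
    return matmul(matmul(CF, x), transpose(CF))

residual = [[ 5, 11,  8, 10],
            [ 9,  8,  4, 12],
            [ 1, 10, 11,  4],
            [19,  6, 15,  7]]
for row in forward_core_transform(residual):
    print(row)
```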
To improve rate control capability, H.264 keeps the step-to-step change of the quantization step size at about 12.5%, rather than increasing it by a constant amount. The normalization of the transform coefficient amplitudes is handled in the dequantization process to reduce computational complexity. To emphasize color fidelity, a smaller quantization step size is used for the chroma coefficients.
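A small sketch of where the roughly 12.5% figure comes from: the quantization step size approximately doubles for every 6 increments of QP, so each single increment multiplies it by 2^(1/6) ≈ 1.12. The base step value used below is an assumption for illustration, not a table taken from the standard.

```python
# Sketch of how a ~12.5% per-step change arises: the quantization step size roughly
# doubles every 6 QP increments, i.e. each +1 of QP multiplies it by 2**(1/6) ~= 1.12.
# The base value here is an assumption for illustration.

BASE_QSTEP = 0.625   # assumed step size at QP = 0

def qstep(qp: int) -> float:
    return BASE_QSTEP * 2 ** (qp / 6)

for qp in (0, 1, 6, 12, 24):
    print(qp, round(qstep(qp), 4), f"(+{(2 ** (1 / 6) - 1) * 100:.1f}% per step)")
```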
4. Unified VLC
There are two entropy coding methods in H.264. One applies a unified VLC (UVLC: Universal VLC) to all symbols to be coded; the other is context-adaptive binary arithmetic coding (CABAC). CABAC is optional; its coding performance is slightly better than UVLC, but its computational complexity is also higher. UVLC uses an unlimited set of codewords with a very regular design structure, so different objects can be coded with the same code table. This method makes it easy to generate a codeword, and the decoder can easily recognize the prefix of a codeword, so UVLC can quickly regain synchronization when bit errors occur.
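The UVLC is an exponential-Golomb code: a code number N is written as M leading zeros, a 1, and then the M low-order bits of N + 1, which is what makes the codeword prefixes easy to recognize. A minimal encode/decode sketch:

```python
# Sketch of the universal VLC (exponential-Golomb code) used by H.264 for ue(v) syntax
# elements: code number N -> M leading zeros, then the binary form of (N + 1).

def ue_encode(code_num: int) -> str:
    bits = bin(code_num + 1)[2:]           # binary of N+1; its leading bit is always 1
    return "0" * (len(bits) - 1) + bits    # zero prefix + the value itself

def ue_decode(bitstring: str) -> int:
    zeros = len(bitstring) - len(bitstring.lstrip("0"))
    return int(bitstring[zeros:2 * zeros + 1], 2) - 1

for n in range(6):
    cw = ue_encode(n)
    print(n, cw, ue_decode(cw))            # 0->'1', 1->'010', 2->'011', 3->'00100', ...
```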
5. Intra prediction
In the earlier H.26x and MPEG-x series of standards, inter-frame prediction was adopted. In H.264, intra prediction is available when coding intra images. For each 4 × 4 block (edge blocks are handled specially), every pixel can be predicted by a different weighted sum of the 17 nearest previously coded pixels (some weights may be 0), i.e. the 17 pixels above and to the left of the block in which the pixel is located. Obviously, this intra prediction coding works not in the time domain but in the spatial domain; it removes the spatial redundancy between adjacent blocks and achieves more effective compression.
As shown in Figure 4, in a 4 × 4 block, a, b, ..., p are the 16 pixels to be predicted, while A, B, ..., P are the already coded pixels. For example, the value of point m can be predicted by the formula (J + 2K + L + 2)/4, or by the formula (A + B + C + D + I + J + K + L)/8, and so on. Depending on which reference points are selected for prediction, there are 9 different modes for luma, but only 1 intra prediction mode for chroma.
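A small sketch that evaluates the two example predictors quoted above for one pixel; the neighbour values and the A..L labelling (four pixels above the block, four to its left) are assumptions following the figure's convention.

```python
# Sketch of the two candidate predictions quoted above for one pixel of a 4x4 block:
# a 3-tap directional weighted sum and an 8-pixel average. Neighbour names follow the
# figure's labelling (A..D above the block, I..L to its left); values are invented.

neighbours = {"A": 64, "B": 66, "C": 70, "D": 75,    # pixels above the 4x4 block
              "I": 60, "J": 58, "K": 57, "L": 55}    # pixels to the left of it

def predict_directional(n):
    """Directional prediction quoted in the text: (J + 2K + L + 2) / 4."""
    return (n["J"] + 2 * n["K"] + n["L"] + 2) >> 2

def predict_average(n):
    """Average prediction quoted in the text: (A + B + C + D + I + J + K + L) / 8."""
    return sum(n[k] for k in "ABCDIJKL") // 8

print(predict_directional(neighbours), predict_average(neighbours))
```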
6. For IP and wireless environments
The H.264 draft contains error resilience tools that make the transmission of compressed video robust in environments prone to bit errors and packet loss, such as mobile channels or IP channels.
To resist transmission errors, time synchronization in an H.264 video stream can be achieved by intra-image refresh, and spatial synchronization is supported by slice-structured coding. At the same time, to facilitate resynchronization after a bit error, certain resynchronization points are also provided within the video data of an image. In addition, intra macroblock refresh and multiple reference macroblocks allow the encoder to consider not only coding efficiency but also the characteristics of the transmission channel when determining the macroblock mode.
Besides using changes of the quantization step size to adapt to the channel bit rate, H.264 often uses data partitioning to cope with changes of the channel bit rate. Generally speaking, the idea of data partitioning is to generate video data with different priorities in the encoder, in order to support quality of service (QoS) in the network. For example, the syntax-based data partitioning method divides the data of each frame into several parts according to its importance, which allows less important information to be discarded when the buffer overflows. A similar temporal data partitioning method can also be used, employing multiple reference frames in P frames and B frames.
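A simplified sketch of syntax-based partitioning, routing syntax elements into partitions by importance so that the least important partition can be dropped first; the three categories used here are a simplification for illustration, not the exact partition definitions of the standard.

```python
# Simplified sketch of syntax-based data partitioning: each element of a coded frame is
# routed to a partition by importance, so the least important partition can be dropped
# first when the buffer overflows. The categories below are a simplification.

PRIORITY = {
    "header":      "A",   # slice headers, macroblock types, motion vectors
    "intra_coeff": "B",   # residual coefficients of intra macroblocks
    "inter_coeff": "C",   # residual coefficients of inter macroblocks (dropped first)
}

def partition(elements):
    parts = {"A": [], "B": [], "C": []}
    for kind, payload in elements:
        parts[PRIORITY[kind]].append(payload)
    return parts

frame = [("header", "slice0"), ("intra_coeff", "mb0"), ("inter_coeff", "mb1"),
         ("inter_coeff", "mb2"), ("header", "mv_mb1")]
print({name: len(items) for name, items in partition(frame).items()})
```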
In wireless communication applications, large bit-rate variations of the wireless channel can be supported by changing the quantization accuracy or the spatial/temporal resolution of each frame. However, in the multicast case, the encoder cannot be required to respond to the various bit rates. Therefore, unlike the fine granular scalability (FGS) method used in MPEG-4, which has low efficiency, H.264 uses stream-switching SP frames instead of hierarchical coding.
4、 Performance comparison of H.264
TML-8 is the test model of H.264; it is used to compare and test the video coding efficiency of H.264. The PSNR figures provided by the test results clearly show that H.264 has an obvious advantage over the performance of MPEG-4 (ASP: Advanced Simple Profile) and H.263++ (HLP: High Latency Profile).
The PSNR of H.264 is clearly better than that of MPEG-4 (ASP) and H.263++ (HLP): in comparison tests at six bit rates, the PSNR of H.264 is 2 dB higher than that of MPEG-4 (ASP) and 3 dB higher than that of H.263++ (HLP). The six test rates and their related conditions are:
32 kbit/s rate, 10 frame/s frame rate, QCIF format;
64 kbit/s rate, 15 frame/s frame rate, QCIF format;
128 kbit/s rate, 15 frame/s frame rate, CIF format;
256 kbit/s rate, 15 frame/s frame rate, QCIF format;
512 kbit/s rate, 30 frame/s frame rate, CIF format;
1024 kbit/s rate, 30 frame/s frame rate, CIF format.