A summary of the development of video coding standards: from H.261 to H.264

Keywords: encoding, video
Digital video technology is widely used in communications, computing, radio and television, and other fields. It has enabled applications such as videoconferencing, videophones, digital television, and media storage, which in turn has driven the creation of many video coding standards. ITU-T and ISO/IEC are the two major organizations that develop video coding standards. The ITU-T standards, H.261, H.263, and H.264, are mainly used for real-time video communication such as videoconferencing. The MPEG series, developed by ISO/IEC, is mainly used for video storage (DVD), broadcast television, and streaming media over the Internet or wireless networks. The two organizations have also developed some standards jointly: H.262 is equivalent to the MPEG-2 video coding standard, and the latest standard, H.264, is also published as Part 10 of MPEG-4. Following the development of the ITU-T video coding standards, this paper introduces H.261, H.263, and H.264.
H.261 video coding standard
H.261 was developed by ITU-T for two-way audiovisual services (videophone and videoconferencing) over the Integrated Services Digital Network (ISDN), at rates that are integer multiples of 64 kbit/s. H.261 processes only the CIF and QCIF picture formats, and each frame is organized into four layers: the picture layer, the group of blocks (GOB) layer, the macroblock (MB) layer, and the block layer. H.261 is one of the earliest digital video compression standards. It specifies every part of video coding in detail, including motion-compensated inter prediction, the DCT, quantization, entropy coding, and rate control suited to constant-rate channels.
H.263 video coding standard
H.263 is the first ITU-T standard for low-bit-rate video coding. Its second version (H.263+) and H.263++ added many options that make it applicable to a wider range of uses.
H.263 video compression standard
H.263 is a video coding standard developed by ITU-T for narrowband channels below 64 kbit/s, building on H.261. Its standard input picture formats are sub-QCIF, QCIF, CIF, 4CIF, and 16CIF, all with 4:2:0 chroma subsampling. Compared with H.261, H.263 uses half-pixel motion compensation and adds four optional coding modes that improve compression.
Unrestricted motion vector mode allows motion vectors to point outside the picture. When the reference block referred to by a motion vector lies outside the coded picture, the missing pixels are replaced by the pixel values at the picture edge. When there is motion across the picture boundary, this mode can give a large coding gain, especially for small pictures. The mode also extends the motion vector range, allowing larger motion vectors, which is particularly beneficial for camera motion.
Syntax-based arithmetic coding mode replaces Huffman-style variable-length coding with arithmetic coding, which reduces the bit rate at the same signal-to-noise ratio and reconstructed picture quality.
Advanced prediction mode allows each of the four 8×8 luminance blocks in a macroblock to have its own motion vector, which improves prediction accuracy; the motion vector of the two chrominance blocks is derived from the four luminance motion vectors. Motion compensation uses overlapped block motion compensation (OBMC): each pixel of an 8×8 luminance block is predicted as a weighted average of three predicted values. This mode can produce a significant coding gain; OBMC in particular reduces blocking artifacts and improves subjective quality.
PB-frames mode codes two pictures, a P picture and a B picture, together as one unit called a PB-frame. It can nearly double the frame rate without a large increase in bit rate.
H.263 video compression standard, version 2
After H.263 was released, ITU-T revised it and published version 2, informally called H.263+. While keeping the core syntax and semantics of the original H.263 unchanged, it adds several options that improve compression efficiency or extend functionality. The original H.263 restricts its input to only five source formats. H.263+ allows a wider range of input formats, including custom picture sizes, which broadens the scope of the standard so that it can handle window-based computer images, sequences with higher frame rates, and widescreen pictures. To improve compression efficiency, H.263+ adds an advanced intra coding mode; an improved PB-frames mode addresses the weaknesses of the original PB-frames mode and strengthens inter prediction; and a deblocking filter both improves compression efficiency and improves the subjective quality of the reconstructed picture. For network transmission, H.263+ adds temporal, SNR, and spatial scalability, which is valuable for transmitting video over noisy channels and over networks with heavy packet loss. In addition, the slice structured mode and the reference picture selection mode make video transmission more resilient to errors.
H.263++ video compression standard
H.263++ adds three options to H.263+, mainly to make the bitstream more resilient to errors on poor channels and to improve coding efficiency. The three options are:
Option U, enhanced reference picture selection, improves coding efficiency and error resilience (especially under packet loss). It requires multiple buffers to store several reference pictures.
Option V, data-partitioned slice, improves error resilience (especially when part of the data is corrupted during transmission) by separating the DCT coefficient data from the header and motion vector data in the bitstream, so that the motion vectors can be protected.
Option W adds supplemental information to the H.263+ bitstream to improve backward compatibility. This additional information includes: a fixed-point IDCT indication, picture messages and message types, arbitrary binary data, text, repeated picture headers, interlaced field indications, and spare reference picture identification.
H.264 video coding standard
H.264 is a new-generation video compression standard developed by the Joint Video Team (JVT) formed by ISO/IEC and ITU-T. Its development can be traced back to the research that followed H.263. After H.263 was completed in 1996, the ITU-T Video Coding Experts Group (VCEG) pursued research in two directions: a short-term plan that added options to H.263 (resulting in H.263+ and H.263++), and a long-term plan to develop a new standard for low-bit-rate video communication. The long-term plan produced the H.26L draft, which offered clearly better compression efficiency than the previous ITU-T video compression standards. In 2001, ISO's MPEG group recognized the potential of H.26L, and ISO and ITU-T formed the JVT from ISO/IEC MPEG and ITU-T VCEG. The main task of the JVT was to develop the H.26L draft into an international standard. In ISO/IEC the standard is named AVC (Advanced Video Coding) and published as Part 10 of MPEG-4; in ITU-T it is officially named H.264.
The main advantages of H.264 are as follows:
At the same reconstructed picture quality, H.264 reduces the bit rate by about 50% compared with H.263+ and MPEG-4 SP.
It adapts well to different delay requirements: it can operate in a low-delay mode for real-time services such as videoconferencing, and also in applications with no delay constraint, such as video storage.
It improves network adaptability by adopting a "network-friendly" structure and syntax, handling bit errors and packet loss better, and strengthening the decoder's error recovery.
Its encoder and decoder use a complexity-scalable design that trades off picture quality against processing effort, so it can suit applications with different complexity budgets.
Compared with previous video compression standards, H.264 introduces many advanced techniques, including a 4×4 integer transform, spatial-domain intra prediction, motion estimation with quarter-pixel accuracy, and inter prediction with multiple reference frames and multiple block sizes. These techniques yield a high compression ratio but also greatly increase algorithmic complexity.
4×4 integer transform
Previous standards, such as H.263 and MPEG-4, used an 8×8 DCT. The integer transform proposed in H.26L approximates a 4×4 DCT. Using integer arithmetic reduces algorithmic complexity and avoids mismatch between the encoder's and decoder's inverse transforms, and the smaller 4×4 block reduces blocking artifacts. The 4×4 integer transform adopted in H.264 reduces complexity further: compared with the integer transform proposed in H.26L, for 9-bit input residual data the required arithmetic precision drops from 32 bits to 16 bits. Moreover, the transform needs no multiplications, only additions and a few shifts. The new transform has little effect on coding performance; in practice it codes slightly better.
Intra prediction in the spatial domain
Video coding achieves compression by removing the spatial and temporal correlation of images.
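Before turning to intra prediction, the 4×4 integer transform discussed above can be sketched in Python. This is a minimal sketch of the forward core transform only, assuming the widely published H.264 core matrix Cf; the post-scaling that H.264 folds into quantization is left out, and the function names are illustrative rather than taken from any reference codec. Each one-dimensional pass is a butterfly built from additions and left shifts, with no multiplications.

```python
# H.264 forward core transform matrix Cf (post-scaling/quantization omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def core_1d(x0, x1, x2, x3):
    """Multiply the vector (x0..x3) by CF using only additions and shifts."""
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return (
        s03 + s12,          # row [1, 1, 1, 1]
        (d03 << 1) + d12,   # row [2, 1, -1, -2]
        s03 - s12,          # row [1, -1, -1, 1]
        d03 - (d12 << 1),   # row [1, -2, 2, -1]
    )


def forward_transform_4x4(block):
    """Return Y = CF * X * CF^T for a 4x4 residual block X given as 4 rows."""
    # Horizontal pass: transform every row, giving X * CF^T.
    rows = [core_1d(*row) for row in block]
    # Vertical pass: transform every column of the intermediate result.
    cols = [core_1d(*(rows[r][c] for r in range(4))) for c in range(4)]
    # cols[c][r] holds coefficient (r, c); transpose back to row-major order.
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def matmul(a, b):
    """Plain 4x4 matrix product, used only to check the butterfly version."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(r) for r in zip(*m)]


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    # A flat block puts all its energy into the DC coefficient.
    print(forward_transform_4x4(flat))  # [[16, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Because every entry of Cf is ±1 or ±2, a 9-bit residual (-256 to 255) can grow to at most 256 × 6 × 6 = 9216 in magnitude, which fits in a signed 16-bit word, consistent with the precision figure quoted above.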
Spatial correlation is removed by a transform, such as the DCT or the H.264 integer transform; temporal correlation is removed by inter prediction. The transform removes spatial correlation only within the transformed block, for example an 8×8 or 4×4 block, and does nothing across block boundaries. H.263+ and MPEG-4 introduced intra prediction that predicts some coefficients of the current block from adjacent blocks in the transform domain. H.264 instead predicts the pixels of the current block directly in the spatial domain from its neighboring pixels, which removes the correlation between adjacent blocks more effectively and greatly improves intra coding efficiency. In the basic part of H.264, intra prediction comprises 9 modes for 4×4 luminance blocks, 4 modes for 16×16 luminance blocks, and 4 modes for chrominance blocks.
Motion estimation
H.264 motion estimation has three new features: quarter-pixel accuracy, block matching with 7 different block sizes, and multiple forward and backward reference frames. In inter coding, a 16×16 macroblock can be partitioned into 16×8, 8×16, or 8×8 blocks, and each 8×8 block, called a sub-macroblock, can be further partitioned into 8×4, 4×8, or 4×4 blocks. In total, motion estimation can choose among 7 block sizes to find the best match. Unlike the single-reference P and B frames of previous standards, H.264 predicts from multiple forward and backward reference frames. Half-pixel motion estimation improves the compression ratio considerably over integer-pixel motion estimation, and quarter-pixel motion estimation improves it further.
Using multiple block sizes for motion estimation can save more than 15% of the bit rate compared with using only 16×16 blocks.
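Returning to the spatial intra prediction described above, here is a simplified Python sketch. It implements only three of the nine 4×4 luminance modes (vertical, horizontal, and DC, which are modes 0, 1, and 2 in H.264), assumes the top and left neighbours are both available, and chooses a mode by plain SAD. Real encoders usually use a rate-distortion or SATD cost, and the function names here are hypothetical.

```python
def predict_vertical(top, left):
    """Mode 0: copy the 4 pixels above the block down every row."""
    return [list(top) for _ in range(4)]


def predict_horizontal(top, left):
    """Mode 1: copy each pixel to the left of the block across its row."""
    return [[left[r]] * 4 for r in range(4)]


def predict_dc(top, left):
    """Mode 2: rounded mean of the 8 neighbours (assumes top and left exist)."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


# Only a subset of the nine H.264 4x4 luma modes is sketched here.
MODES = {0: predict_vertical, 1: predict_horizontal, 2: predict_dc}


def sad(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_intra_4x4(block, top, left):
    """Return (mode, prediction, residual) for the lowest-SAD sketched mode."""
    best = None
    for mode, predict in MODES.items():
        pred = predict(top, left)
        cost = sad(block, pred)
        if best is None or cost < best[0]:
            best = (cost, mode, pred)
    _, mode, pred = best
    residual = [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
    return mode, pred, residual


if __name__ == "__main__":
    top = [10, 50, 90, 130]
    block = [list(top) for _ in range(4)]  # vertical stripes continuing the row above
    mode, _, residual = best_intra_4x4(block, top, [0, 0, 0, 0])
    print(mode)  # 0
```

When the block continues the pattern of its neighbours, the residual left for the transform is all zeros, which is exactly the correlation between adjacent blocks that transform-only coding cannot remove.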
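The half-pixel and quarter-pixel accuracy discussed above depends on how sub-pixel samples are interpolated. The sketch below compares, in one dimension only, an H.263-style bilinear half-pixel sample with the H.264 six-tap half-pixel filter (1, -5, 20, 20, -5, 1)/32 and the H.264 quarter-pixel step, which averages an integer sample with the neighbouring half-pixel sample. It assumes 8-bit samples, handles picture edges by repeating the edge pixel, and omits the two-dimensional cases; the function names are illustrative.

```python
def sample(row, j):
    """Fetch row[j], repeating the edge pixel when j falls outside the row
    (the same edge-extension idea as H.263's unrestricted motion vector mode)."""
    return row[min(max(j, 0), len(row) - 1)]


def clip255(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))


def h263_half_pel(row, i):
    """H.263-style half-pel sample between row[i] and row[i + 1]: rounded bilinear average."""
    return (sample(row, i) + sample(row, i + 1) + 1) >> 1


H264_TAPS = (1, -5, 20, 20, -5, 1)


def h264_half_pel(row, i):
    """H.264 half-pel sample between row[i] and row[i + 1]: six-tap filter, rounded, /32."""
    acc = sum(t * sample(row, i - 2 + k) for k, t in enumerate(H264_TAPS))
    return clip255((acc + 16) >> 5)


def h264_quarter_pel(row, i):
    """H.264 quarter-pel sample between row[i] and the half-pel sample to its right."""
    return (sample(row, i) + h264_half_pel(row, i) + 1) >> 1


if __name__ == "__main__":
    spike = [0, 0, 100, 0, 0, 0]
    print(h263_half_pel(spike, 2), h264_half_pel(spike, 2), h264_quarter_pel(spike, 2))  # 50 63 82
```

On a smooth ramp both filters give the same half-pixel value, but around sharp detail the longer six-tap filter keeps more of the original contrast than plain averaging, which is part of why finer motion accuracy pays off in H.264.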
Quarter-pixel motion estimation can save about 20% of the bit rate compared with integer-pixel prediction. For multi-reference prediction, using five reference frames reduces the bit rate by 5% to 10% compared with using one. These percentages are statistical averages; actual results vary from video to video with detail and motion.
Entropy coding
The H.264 standard adopts two entropy coding schemes. One combines context-based adaptive variable-length coding (CAVLC) with universal variable-length coding (UVLC, the Exp-Golomb codes used for most syntax elements other than residual coefficients); the other is context-based adaptive binary arithmetic coding (CABAC). Both CAVLC and CABAC code the current block according to the state of adjacent blocks, which improves coding efficiency. CABAC compresses better than CAVLC but is more complex.
Deblocking filter
The H.264 standard introduces a deblocking filter that filters block boundaries. The filtering strength depends on the coding mode, motion vectors, and coefficients of the blocks on either side of the boundary. The deblocking filter improves the subjective quality of the picture while also improving compression efficiency.
Other video coding standards
Besides the ITU-T video compression standards above, there are other popular standards, such as MPEG-4, AVS, and WM9. H.264 is also known as MPEG-4 AVC, but in industry "MPEG-4" currently refers generally to the Simple Profile (SP) or Advanced Simple Profile (ASP) of MPEG-4 Part 2. These profiles mainly target low-bit-rate applications, such as Internet streaming and video transmission and storage over wireless networks. Their core is similar to H.263: MPEG-4 SP and H.263 have many similarities, as shown in the attached table. However, there are also significant differences between the two standards, mainly in bitstream structure and header information, some of the entropy coding tables, and some details of the coding tools.
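Returning to H.264 entropy coding: the UVLC codes mentioned earlier are Exp-Golomb codes. Below is a minimal Python sketch of unsigned (ue) and signed (se) Exp-Golomb coding. It uses bit strings instead of a real bit writer for readability, does not sketch CAVLC or CABAC themselves, and the function names are illustrative.

```python
def ue(k):
    """Unsigned Exp-Golomb codeword for k >= 0, returned as a bit string."""
    if k < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(k + 1)[2:]                # k + 1 in binary, e.g. k = 3 -> '100'
    return "0" * (len(info) - 1) + info  # prefix of zeros, then k + 1


def se(v):
    """Signed Exp-Golomb: map 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ... and code with ue."""
    k = 2 * v - 1 if v > 0 else -2 * v
    return ue(k)


def decode_ue(bits, pos=0):
    """Decode one ue(v) codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


if __name__ == "__main__":
    print([ue(k) for k in range(5)])  # ['1', '010', '011', '00100', '00101']
```

Small values get short codewords and the code needs no stored table, which is why it suits the many header and mode syntax elements whose small values are the most frequent.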
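Likewise, the statement that H.264 deblocking strength depends on coding mode, coefficients, and motion vectors can be made concrete with a simplified boundary-strength (bS) decision, where bS runs from 0 (no filtering) to 4 (strongest). This sketch assumes frame (non-interlaced) coding and P-slice blocks with a single motion vector in quarter-pixel units; it ignores the bi-predictive and field cases, and the BlockInfo type is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Hypothetical summary of a 4x4 block on one side of an edge."""
    intra: bool = False
    nonzero_coeffs: bool = False
    ref_pic: int = 0        # index of the reference picture (P slice: one per block)
    mv: tuple = (0, 0)      # motion vector in quarter-pixel units


def boundary_strength(p, q, mb_edge):
    """Simplified bS for the edge between blocks p and q (frame coding, P slices only)."""
    if p.intra or q.intra:
        # Intra blocks get the strongest filtering on macroblock edges.
        return 4 if mb_edge else 3
    if p.nonzero_coeffs or q.nonzero_coeffs:
        return 2
    if p.ref_pic != q.ref_pic:
        return 1
    # A difference of 4 quarter-pixel units is one whole luma pixel.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0


if __name__ == "__main__":
    print(boundary_strength(BlockInfo(mv=(4, 0)), BlockInfo(), mb_edge=False))  # 1
```

Edges between blocks that were predicted the same way and carry no residual get bS 0 and are left untouched, so filtering effort goes only where block edges are likely to be visible.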
Compared with SP, MPEG-4 ASP adds several tools, mainly quarter-pixel motion estimation, B-frames, and global motion compensation (GMC), which improve compression efficiency.
AVS is an audio/video coding standard developed independently in China, mainly for high-definition television, high-density optical storage media, and similar applications. AVS builds on the MPEG-4 AVC/H.264 framework, emphasizes independent intellectual property rights, and gives full consideration to implementation complexity. Compared with H.264, the main characteristics of AVS are:
(1) an 8×8 integer transform and 64-level quantization;
(2) intra prediction for both luminance and chrominance in units of 8×8 blocks, with 5 prediction modes for luminance blocks and 4 for chrominance blocks;
(3) four block partitions for motion compensation: 16×16, 16×8, 8×16, and 8×8;
(4) for quarter-pixel motion estimation, different four-tap filters for half-pixel and quarter-pixel interpolation;
(5) P frames can use up to 2 forward reference frames, while B frames use one reference frame before and one after.
Windows Media 9 (WM9) is a new generation of digital media technology developed by Microsoft. Some tests show that the video compression efficiency of WM9 is much higher than that of MPEG-2, MPEG-4 SP, and H.263, and comparable to that of H.264.
Conclusion
H.261 and H.263 are currently widely used in video communication, and there are many mature products. Compared with H.261, H.263 adds several options, offers more flexible coding modes, greatly improves compression efficiency, and is better suited to network transmission. The introduction of the H.264 standard is an important advance in video coding standards.
Compared with the existing MPEG-2, MPEG-4 SP, and H.263 standards, it has clear advantages, especially in coding efficiency, which allows it to be used in many new fields. Although the algorithmic complexity of H.264 is more than four times that of existing coding standards, the rapid development of integrated circuit technology will make H.264 applications practical.