Keywords:
AVS, video
One of the most important developments in video coding technology in the past few years is the H.264/MPEG-4 AVC standard, developed by the Joint Video Team (JVT) of the ITU and ISO/IEC. During its development, the industry used many different names for the new standard. The ITU began work on H.26L (long term) in 1997, using significant new coding tools. The results were encouraging, so ISO decided to cooperate with the ITU, forming the JVT to adopt a common standard. For this reason the standard is sometimes called JVT, although that is not an official name. The ITU approved the new H.264 standard in May 2003. ISO approved it in October 2003 under the name MPEG-4 Part 10, Advanced Video Coding (AVC).
The improvements achieved by H.264 create new market opportunities
H.264/AVC is a major breakthrough in compression efficiency. It typically achieves about twice the compression efficiency of MPEG-2 and MPEG-4 Simple Profile. In formal tests conducted by the JVT, H.264 achieved a coding efficiency gain of more than 1.5 times in 78% of the 85 test cases, more than 2 times in 77% of the cases, and as much as 4 times in some cases. These improvements create new market opportunities. For example, VHS-quality video at 600 kbps makes video on demand over ADSL lines possible, and high-definition movies can fit on ordinary DVDs without a new laser head.
The H.264 standard initially defined three profiles: Baseline, Main and Extended. Later, an amendment called the fidelity range extensions (FRExt) introduced four additional profiles, known as the High profiles. Early on, the Baseline and Main profiles attracted the most interest. The Baseline profile reduces computing and system memory requirements and is optimized for low latency. It includes neither B frames, because of their inherent delay, nor CABAC, because of its computational complexity. The Baseline profile is well suited to videophone applications and other applications requiring low-cost real-time encoding.
The Main profile provides the highest compression efficiency, but it requires much more processing power than the Baseline profile, which makes it difficult to use in low-cost real-time encoding and low-delay applications. Broadcast and content storage applications are most interested in the Main profile, since they aim for the highest video quality at the lowest possible bit rate.
Although H.264 uses the same main coding functions as earlier standards, it also adds many new features that together improve coding efficiency. The main differences are summarized as follows:
Intra prediction and coding: H.264 uses spatial intra prediction to predict the pixels of an intra macroblock (MB) from the neighboring pixels of adjacent blocks. It encodes the prediction residual and the prediction mode instead of the actual pixels in the block, which can significantly improve intra coding efficiency.
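As a rough illustration of this idea, the following Python sketch predicts a 4×4 block from the reconstructed pixels above and to the left of it, then computes the residual that would be passed on to the transform stage. The function names and sample values are hypothetical, only three simple modes are shown, and mode selection by sum of absolute differences (SAD) is an assumption; this is not the normative H.264 procedure.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 prediction from neighbouring reconstructed pixels.

    top  -- the 4 pixels directly above the block
    left -- the 4 pixels directly to the left of the block
    mode -- "vertical", "horizontal" or "dc" (illustrative subset of modes)
    """
    if mode == "vertical":
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of the 8 neighbours
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def residual(block, pred):
    """Difference between the actual block and its prediction."""
    return [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]


def sad(res):
    """Sum of absolute differences, used here as a simple mode cost."""
    return sum(abs(v) for row in res for v in row)


def best_mode(block, top, left):
    """Pick the mode whose residual has the smallest SAD."""
    modes = ("vertical", "horizontal", "dc")
    costs = {m: sad(residual(block, predict_4x4(top, left, m))) for m in modes}
    return min(costs, key=costs.get)


# A block of vertical stripes that continues the row above it.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(block, top, left))  # vertical
```

Only the chosen mode and the (here all-zero) residual would need to be coded, rather than the sixteen pixel values themselves.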
Inter prediction and coding: inter coding in H.264 keeps the main functions of earlier standards but adds flexibility and capability, including several block size options for motion compensation, quarter-pixel motion compensation, multiple reference frames, generalized bidirectional prediction and adaptive in-loop deblocking.
Variable block size: motion compensation can be performed with different block sizes. A separate motion vector can be transmitted for blocks as small as 4×4, so up to 32 motion vectors can be transmitted for a single MB in the case of bidirectional prediction. Block sizes of 16×8, 8×16, 8×8, 8×4 and 4×8 are also supported. Smaller blocks handle fine motion detail better, which improves subjective quality, including the elimination of large blocking artifacts.
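The motion vector counts quoted above follow from simple arithmetic on the 16×16 macroblock. The sketch below assumes, for simplicity, that the whole MB is split uniformly into one partition size; the function name is hypothetical.

```python
# Partition sizes (width, height) named in the text.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def motion_vectors_per_mb(block_w, block_h, bidirectional=False):
    """Motion vectors needed when a 16x16 MB is split uniformly into blocks."""
    if (block_w, block_h) not in PARTITIONS:
        raise ValueError("unsupported partition size")
    blocks = (16 // block_w) * (16 // block_h)
    # Bidirectional prediction carries one forward and one backward vector per block.
    return blocks * (2 if bidirectional else 1)


for w, h in PARTITIONS:
    uni = motion_vectors_per_mb(w, h)
    bi = motion_vectors_per_mb(w, h, bidirectional=True)
    print(f"{w}x{h}: {uni} vectors, {bi} with bidirectional prediction")
```

For 4×4 blocks this gives 16 vectors, or 32 with bidirectional prediction, matching the figure in the text.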
Quarter-pixel motion estimation: motion compensation is improved by allowing half-pixel and quarter-pixel motion vector resolution.
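To make sub-pixel resolution concrete, the sketch below interpolates one row of luma samples. It assumes the 6-tap filter (1, -5, 20, 20, -5, 1) that H.264 uses for half-sample luma positions, with quarter-sample values formed by a rounded average; these details are not stated in the text above, and the function names are hypothetical. It works on a single row only, as a simplification.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Requires samples row[i - 2] .. row[i + 3] to exist.
    """
    e, f, g, h, i2, j = row[i - 2:i + 4]
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i2 + j + 16) >> 5)


def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the neighbouring half-sample."""
    return (row[i] + half_pel(row, i) + 1) >> 1


ramp = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_pel(ramp, 3))     # 35, midway between 30 and 40
print(quarter_pel(ramp, 3))  # 33, between 30 and the half-sample 35
```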
Multiple reference frame prediction: up to 16 different reference frames can be used for inter coding, which can improve subjective video quality and coding efficiency. Multiple reference frames also help improve the error resilience of the H.264 bitstream. Note that this feature increases the memory requirements of both encoder and decoder, because multiple reference frames must be kept in memory.
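A back-of-the-envelope estimate shows why the memory cost matters. The sketch below assumes 8-bit 4:2:0 video (1.5 bytes per pixel) and ignores any limits a particular profile or level may place on the number of reference frames; the function name and the frame sizes are illustrative only.

```python
def reference_memory_bytes(width, height, num_refs):
    """Bytes needed to hold num_refs reference frames of 8-bit 4:2:0 video."""
    luma = width * height
    chroma = luma // 2  # two chroma planes, each a quarter of the luma size
    return (luma + chroma) * num_refs


print(reference_memory_bytes(352, 288, 1))     # 152064 bytes for one CIF frame
print(reference_memory_bytes(1920, 1088, 16))  # 50135040 bytes, roughly 48 MiB
```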
Adaptive in-loop deblocking filter: H.264 uses an adaptive deblocking filter that processes the horizontal and vertical block edges inside the prediction loop to remove artifacts caused by block prediction errors. The filtering is usually based on 4×4 block boundaries, where up to 3 pixels on each side of the boundary can be updated by a 4-tap filter.
Integer transform: earlier DCT-based standards had to define a tolerance for rounding errors in fixed-point implementations of the inverse transform. The drift caused by mismatched IDCT precision between encoder and decoder was a source of quality loss. H.264 solves this problem by using an integer 4×4 spatial transform that approximates the DCT. The small 4×4 block size also helps reduce blocking and ringing artifacts.
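The following sketch illustrates why an integer transform avoids mismatch. It uses the 4×4 forward core matrix commonly given for H.264, which is an assumption here since the text does not list it. The inverse shown is not the normative H.264 inverse transform, which folds the scaling into dequantization; exact rational scaling is used only to demonstrate that the integer coefficients carry enough information to reconstruct the block exactly.

```python
from fractions import Fraction

# Forward core transform matrix (integer approximation of the 4x4 DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward(block):
    """Y = CF * X * CF^T, computed entirely in integers."""
    return matmul(matmul(CF, block), transpose(CF))


def inverse(coeffs):
    """Exact inverse using the fact that the rows of CF are orthogonal.

    The squared row norms are 4, 10, 4, 10, so CF^-1 = CF^T * diag(1/4, 1/10, 1/4, 1/10).
    """
    d = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 4), Fraction(1, 10)]
    scaled = [[coeffs[i][j] * d[i] * d[j] for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)


block = [[5, 11, 8, 10],
         [9, 8, 4, 12],
         [1, 10, 11, 4],
         [19, 6, 15, 7]]
coeffs = forward(block)
print(coeffs[0][0])              # 140, the sum of all 16 samples
print(inverse(coeffs) == block)  # True: no rounding drift
```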
Quantization and transform coefficient scanning: transform coefficients are quantized by scalar quantization without a widened dead zone. As in previous standards, each MB can select a different quantization step, but the step size increases at a compound rate of about 12.5% rather than by a fixed increment. In addition, finer quantization steps can be used for the chroma components, especially when the luma coefficients are coarsely quantized.
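The compound growth of the step size can be sketched as follows. The six base step values and the 0 to 51 range of the quantization parameter (QP) are the commonly cited H.264 values and are an assumption here, since the text does not list them; the function name is hypothetical.

```python
# Step sizes for QP 0..5; each further group of 6 QP values doubles them.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a quantization parameter in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


print(qstep(0), qstep(6), qstep(12))  # 0.625 1.25 2.5
print(qstep(51))                      # 224.0

# Doubling every 6 steps is an average growth of 2 ** (1 / 6) per step.
print(round((2 ** (1 / 6) - 1) * 100, 1))  # 12.2 (percent)
```

Under these assumed values the average increase per QP step is a little over 12%, in line with the "about 12.5%" quoted above.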
Entropy coding: unlike previous standards, which provided multiple static VLC tables according to the type of data involved, H.264 uses context-adaptive VLC (CAVLC) for transform coefficients and a single Universal VLC (UVLC) method for all other symbols. The Main profile also supports a new context-adaptive binary arithmetic coder (CABAC). CAVLC performs better than previous VLC implementations, but at a higher cost.
CABAC uses a probability model at both encoder and decoder to process all syntax elements, including transform coefficients and motion vectors. To improve the efficiency of arithmetic coding, the underlying probability model adapts to the changing statistics within a video frame through a process called context modeling. Context modeling provides conditional probability estimates for the coded symbols. With suitable context models, the coder can switch between different probability models according to the already coded symbols surrounding the symbol to be coded, and so exploit the redundancy between symbols. Each syntax element can maintain its own model (for example, motion vectors and transform coefficients have different models). Compared with the VLC entropy coding methods (UVLC/CAVLC), CABAC can save about 10% in bit rate.
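The benefit of context modeling can be illustrated with a toy experiment. The sketch below is not the CABAC state machine: it uses a simple count-based probability estimate and measures the ideal arithmetic-code length (minus log2 of the estimated probability) instead of producing a bitstream. The class and function names, and the choice of the previous bit as context, are assumptions made for illustration.

```python
import math


class AdaptiveBitModel:
    """Count-based estimate of bit probabilities; a stand-in for one context."""

    def __init__(self):
        self.counts = [1, 1]  # smoothed counts of zeros and ones seen so far

    def probability(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1


def ideal_bits(bits, contexts, num_contexts):
    """Ideal code length when each bit is coded with the model of its context."""
    models = [AdaptiveBitModel() for _ in range(num_contexts)]
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        total += -math.log2(models[ctx].probability(bit))
        models[ctx].update(bit)  # encoder and decoder adapt identically
    return total


# Bits that tend to repeat their predecessor: runs of eight 0s and eight 1s.
bits = ([0] * 8 + [1] * 8) * 4
no_context = [0] * len(bits)    # one shared model for every bit
prev_bit = [0] + bits[:-1]      # model selected by the previous bit

print(round(ideal_bits(bits, no_context, 1), 1))
print(round(ideal_bits(bits, prev_bit, 2), 1))
```

With a single model the cost stays near one bit per symbol, while selecting the model by the previous bit gives a clearly shorter code, which is the redundancy between symbols that context modeling exploits.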
Weighted prediction: the prediction for a bidirectionally interpolated macroblock is formed as a weighted sum of the forward and backward predictions, which can improve coding efficiency when the scene changes, especially during fades.
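A minimal sketch of the weighted sum, assuming integer weights that add up to 64 and 8-bit samples. The function name, the parameters and the fixed 6-bit precision are hypothetical and do not reproduce the exact H.264 weighting formula.

```python
def weighted_bipred(fwd, bwd, w_fwd, w_bwd, shift=6):
    """Blend two prediction blocks with integer weights.

    For a normalised blend, w_fwd + w_bwd should equal 1 << shift.
    """
    rounding = 1 << (shift - 1)
    return [max(0, min(255, (f * w_fwd + b * w_bwd + rounding) >> shift))
            for f, b in zip(fwd, bwd)]


# A fade from a bright reference (200) towards a darker one (100).
fwd = [200, 200, 200, 200]
bwd = [100, 100, 100, 100]
print(weighted_bipred(fwd, bwd, 32, 32))  # [150, 150, 150, 150] plain average
print(weighted_bipred(fwd, bwd, 16, 48))  # [125, 125, 125, 125] late in the fade
```

When the current frame lies three quarters of the way through the fade, the unequal weights predict its brightness much more closely than a plain average would, leaving a smaller residual to code.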
Fidelity range extensions: in July 2004, a new amendment called the fidelity range extensions (FRExt) was added to the H.264 standard [11]. This extension adds a further set of tools to H.264 and allows additional color spaces, video formats and bit depths. Support for lossless inter coding and stereoscopic video is also added. The FRExt amendment introduces four new profiles in H.264:
• High profile (HP): standard 4:2:0 chroma sampling, 8 bits per color component. This profile introduces new tools, described later.
• High 10 profile (Hi10P): standard 4:2:0 chroma sampling, 10 bits per component, for higher-definition video display.
• High 4:2:2 profile (H422P): 10 bits per component, used for source editing.
• High 4:4:4 profile (H444P): 12 bits per component, for the highest-quality source editing and color fidelity, with support for lossless coding of video regions and a new integer color-space transform (from RGB to YUV and back).
Among new applications, H.264 HP is particularly advantageous for broadcasting and DVD. Some experiments show that the performance of H.264 HP is three times that of MPEG-2. The main additional tools introduced in H.264 HP are described below.
Adaptive residual block size and integer 8×8 transform: the residual block used for transform coding can be switched between 8×8 and 4×4. A new 16-bit integer transform for 8×8 blocks is introduced. Smaller blocks can still use the existing 4×4 transform.
8×8 luma intra prediction: 8 modes are added. In addition to the existing 16×16 and 4×4 blocks, luma intra macroblocks can also use intra prediction on 8×8 blocks.
Quantization weighting: new quantization weighting matrices are used to quantize the 8×8 transform coefficients.
Monochrome: support for coding black-and-white video.
AVS
In 2002, the Audio Video coding Standard (AVS) working group established by China's Ministry of Information Industry announced that it would prepare a national standard for mobile multimedia, broadcasting, DVD and other applications. The video standard is called AVS and consists of two related parts: AVS-M for mobile video applications and AVS 1.0 for broadcasting and DVD. The AVS standard is similar to H.264.
AVS 1.0 supports both interlaced and progressive scanning. In AVS, P frames can use 2 forward reference frames, while B frames can use one frame before and one after. In interlaced mode, 4 fields can be used as references. Frame/field coding in interlaced mode can be selected only at the frame level, unlike H.264, which allows this choice to be adapted at the MB level. AVS has a loop filter similar to that of H.264, which can be turned off at the frame level; B frames do not need the loop filter. Intra prediction is performed on 8×8 blocks. Motion compensation (MC) allows 1/4-pixel accuracy for luma blocks. The block size for motion estimation (ME) can be 16×16, 16×8, 8×16 or 8×8. The transform is a 16-bit 8×8 integer transform (similar to WMV9). The VLC is context-based adaptive 2D run/level coding. Four different Exp-Golomb codes are used, and the code for each quantized coefficient adapts to the preceding symbols in the same 8×8 block. Because the Exp-Golomb codes are parameterized, the tables are smaller. For progressive video sequences, the video quality of AVS 1.0 is slightly inferior to that of the H.264 Main profile at the same bit rate.
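The remark that parameterized Exp-Golomb codes need little table storage can be illustrated with a small sketch. It assumes that the parameter is the Exp-Golomb order k, so that a codeword is computed from the value rather than looked up; the mapping from AVS run/level pairs to code numbers is not shown, and the function names are hypothetical.

```python
def exp_golomb_encode(n, k=0):
    """Order-k Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    body = bin(n + (1 << k))[2:]  # binary digits of n + 2**k
    return "0" * (len(body) - k - 1) + body


def exp_golomb_decode(bits, k=0):
    """Decode one codeword from the front of a bit string.

    Returns (value, number of bits consumed).
    """
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = zeros + k + 1
    value = int(bits[zeros:zeros + length], 2) - (1 << k)
    return value, zeros + length


print([exp_golomb_encode(n, 0) for n in range(4)])  # ['1', '010', '011', '00100']
print([exp_golomb_encode(n, 1) for n in range(4)])  # ['10', '11', '0100', '0101']
print(exp_golomb_decode("0100", 1))                 # (2, 4)
```

A higher order gives shorter codewords for larger values at the cost of longer ones for small values, so switching the order according to the preceding symbols is one plausible way to adapt the code without storing separate tables.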
AVS-M is mainly aimed at mobile video applications and overlaps with the H.264 Baseline profile. It supports only progressive video and I and P frames, not B frames. The main AVS-M coding tools include intra prediction based on 4×4 blocks, 1/4-pixel motion compensation, integer transform and quantization, context-adaptive VLC and a highly simplified loop filter. As in the H.264 Baseline profile, the motion vector block size in AVS-M goes down to 4×4, so an MB can have up to 16 motion vectors. Multi-frame prediction is used, but only 2 reference frames are supported. AVS-M also defines a subset of the H.264 HRD/SEI messages. Under the same settings, the coding efficiency of AVS-M is about 0.3 dB lower than that of the H.264 Baseline profile, while decoder complexity is reduced by about 20%.
H.264 and AVS background
H.264/MPEG-4 AVC is a new-generation video coding standard jointly developed by the VCEG (Video Coding Experts Group) of ITU-T and the MPEG (Moving Picture Experts Group) of ISO/IEC. Applications include videophone, video conferencing and so on. The main feature of H.264 is its greatly improved compression, which is more than twice as efficient as MPEG-2 and MPEG-4. The core technology of H.264 is the same as that of earlier standards and still uses a hybrid coding framework based on prediction and transform, but the implementation details differ greatly, and it is these detailed improvements that yield the large gain in compression efficiency. H.264 also offers good network adaptability and error resilience.
The birth of AVS can be described as a historic opportunity. China's digital video industry faces serious challenges from the high patent fees of the H.264 and MPEG-2 standards, and China is also committed to improving the core competitiveness of its domestic digital audio and video industry. The "Digital Audio and Video Codec Technology Standard Working Group" was approved by the Science and Technology Department of the Ministry of Information Industry in June 2002. It unites domestic research institutions and enterprises engaged in digital audio and video codec research and development and, to meet the needs of China's audio and video industry, has proposed a source coding standard with independent Chinese intellectual property rights: "Information Technology: Advanced Audio and Video Coding", referred to as AVS (Audio Video coding Standard). The AVS standard is at an internationally advanced level in technology and performance. If this opportunity is seized, China may gain the initiative across the whole industrial chain, from technology to patents, standards, chips, systems and industry.
Analysis and comparison of H.264 and AVS core technologies
Like earlier standards, H.264 still uses the hybrid coding framework. The AVS video standard uses a technical framework similar to that of H.264, including transform, quantization, entropy coding, intra prediction, inter prediction, loop filtering and so on. The differences in their core technologies include the following:
1. Transform and quantization
H.264 applies block-based transform coding to the residual data to remove the spatial redundancy of the original image, so that the image energy is concentrated in a small number of coefficients, of which the DC coefficient is generally the largest. This improves the compression ratio and enhances robustness to interference. Earlier standards generally used the DCT. Its disadvantage is mismatch: the original data are not recovered exactly after the forward and inverse transforms, and because it operates on real numbers, the computational load is also relatively large. H.264 instead uses an integer transform based on 4×4 blocks.
AVS uses an 8×8 integer transform, which can be implemented without mismatch on a 16-bit processor. For high-resolution video images, it decorrelates more effectively than the 4×4 transform. AVS also uses 64-level quantization, which can meet the bit rate and quality requirements of different applications and services.
2. Intra prediction
Both H.264 and AVS use intra prediction, which predicts the current block from adjacent pixels and uses a variety of prediction modes representing spatial texture. H.264 luma prediction uses two block sizes, 4×4 and 16×16. For 4×4 blocks there are 9 prediction modes: directional predictions from -135 degrees to +22.5 degrees plus a DC prediction. For 16×16 blocks there are 4 prediction modes. Chroma prediction works on 8×8 blocks with 4 prediction modes, similar to the intra 16×16 modes, in which DC is mode 0, horizontal is mode 1, vertical is mode 2 and plane is mode 3.
3. Inter prediction
H.264 interframe