FMUSER Wireless Transmit Video And Audio More Easier !

[email protected] WhatsApp +8618078869184

    H.264 / MPEG-4 AVC

     

    The purpose of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (i.e., half or less of the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without increasing design complexity so much that implementation would be impractical or excessively expensive. A further goal was to provide enough flexibility for the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low- and high-resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems. The H.264 standard can be viewed as a "family of standards" composed of a number of different profiles. A particular decoder decodes at least one, but not necessarily all, profiles; the decoder specification describes which profiles can be decoded. H.264 is typically used for lossy compression, although it is also possible to create truly lossless-coded regions within lossy-coded pictures, or to support rare use cases in which the entire encoding is lossless.

     

    H.264 was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC JTC1 Moving Picture Experts Group (MPEG). The project partnership is known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10, MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content. The final drafting of the first version of the standard was completed in May 2003, and various extensions of its capabilities have been added in subsequent editions. High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a successor to H.264/MPEG-4 AVC developed by the same organizations, while the earlier standard remains in common use.

     

    H.264 is perhaps best known as one of the video encoding standards for Blu-ray Discs; all Blu-ray Disc players must be able to decode H.264. It is also widely used by streaming Internet sources such as videos from Vimeo, YouTube and the iTunes Store, web software such as Adobe Flash Player and Microsoft Silverlight, and various HDTV broadcasts over terrestrial (ATSC, ISDB-T, DVB-T or DVB-T2), cable (DVB-C) and satellite (DVB-S and DVB-S2) systems.

     

    H.264 is protected by patents owned by various parties. A license covering most (but not all) patents essential to H.264 is administered by the patent pool MPEG LA.[3] Commercial use of patented H.264 technologies requires the payment of royalties to MPEG LA and other patent holders. MPEG LA has allowed the free use of H.264 technologies for streaming Internet video that is free to end users, and Cisco Systems pays royalties to MPEG LA on behalf of users of binaries of its open-source H.264 encoder.

     

    1. Naming
    The H.264 name follows the ITU-T naming convention, where the standard is a member of the H.26x line of VCEG video coding standards; the MPEG-4 AVC name relates to the naming convention in ISO/IEC MPEG, where the standard is Part 10 of ISO/IEC 14496, the suite of standards known as MPEG-4. The standard was developed jointly in a partnership of VCEG and MPEG, after earlier development work in the ITU-T as a VCEG project called H.26L. It is thus common to refer to the standard with names such as H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC or MPEG-4/H.264 AVC, to emphasize the common heritage. Occasionally it is also referred to as "the JVT codec", in reference to the Joint Video Team (JVT) organization that developed it. (Such partnership and multiple naming are not uncommon. For example, the video compression standard known as MPEG-2 also arose from a partnership between MPEG and the ITU-T, where MPEG-2 video is known in the ITU-T community as H.262.[4]) Some software programs (such as VLC media player) internally identify this standard as AVC1.

     

    2. History
    In early 1998, the Video Coding Experts Group (VCEG, ITU-T SG16 Q.6) issued a call for proposals on a project called H.26L, with the target of doubling coding efficiency (which means halving the bit rate necessary for a given level of fidelity) relative to any other existing video coding standard, for a broad variety of applications. VCEG was chaired by Gary Sullivan (Microsoft, formerly PictureTel, USA). The first draft design for the new standard was adopted in August 1999. In 2000, Thomas Wiegand (Heinrich Hertz Institute, Germany) became co-chair of VCEG.

     

    In December 2001, VCEG and the Moving Picture Experts Group (MPEG, ISO/IEC JTC 1/SC 29/WG 11) formed a Joint Video Team (JVT) with the charter to finalize the video coding standard.[5] Formal approval of the specification came in March 2003. The JVT was chaired by Gary Sullivan, Thomas Wiegand and Ajay Luthra (Motorola, USA; later Arris, USA). In June 2004, the Fidelity Range Extensions (FRExt) project was finalized. From January 2005 to November 2007, the JVT worked on an extension of H.264/AVC towards scalability through an annex (G) called Scalable Video Coding (SVC). The JVT management team was expanded to include Jens-Rainer Ohm (RWTH Aachen University, Germany). From July 2006 to November 2009, the JVT worked on Multiview Video Coding (MVC), an extension of H.264/AVC towards free-viewpoint television and 3D television. That work included the development of two new profiles of the standard: the Multiview High Profile and the Stereo High Profile.

     

    The standardization of the first version of H.264/AVC was completed in May 2003. In the first project to extend the original standard, the JVT then developed what were called the Fidelity Range Extensions (FRExt). These extensions enabled higher-quality video coding by supporting increased sample bit-depth precision and higher-resolution color information, including the sampling structures known as Y'CbCr 4:2:2 (= YUV 4:2:2) and Y'CbCr 4:4:4. The Fidelity Range Extensions project also included several other features, such as adaptive switching between 4×4 and 8×8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support for additional color spaces. The design work on the Fidelity Range Extensions was completed in July 2004, and the drafting work on them was completed in September 2004.

     

    Further recent extensions of the standard have included adding five new profiles intended primarily for professional applications, adding extended-gamut color space support, defining additional aspect ratio indicators, defining two additional types of "supplemental enhancement information" (post-filter hints and tone mapping), and deprecating one of the prior FRExt profiles (the High 4:4:4 profile), which industry feedback indicated should have been designed differently.

     

    The next major feature added to the standard was Scalable Video Coding (SVC). Specified in Annex G of H.264/AVC, SVC allows the construction of bitstreams that contain sub-bitstreams that also conform to the standard, including one such bitstream known as the "base layer" that can be decoded by an H.264/AVC codec that does not support SVC. For temporal bitstream scalability (i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the main bitstream), complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, the high-level syntax and inter-prediction reference pictures in the bitstream are constructed accordingly. For spatial and quality bitstream scalability (i.e., the presence of a sub-bitstream with lower spatial resolution or quality than the main bitstream), NAL (Network Abstraction Layer) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction (i.e., the prediction of the higher spatial resolution or quality signal from the data of the lower spatial resolution or quality signal) is typically used for efficient coding. The Scalable Video Coding extensions were completed in November 2007.

     

    The next major feature added to the standard was Multiview Video Coding (MVC). Specified in Annex H of H.264/AVC, MVC enables the construction of bitstreams that represent more than one view of a video scene. An important example of this functionality is stereoscopic 3D video coding. Two profiles were developed in the MVC work: the Multiview High Profile supports an arbitrary number of views, and the Stereo High Profile is designed specifically for two-view stereoscopic video. The Multiview Video Coding extensions were completed in November 2009.

     

    3. Application

    The H.264 video format has a very broad application range that covers all forms of digitally compressed video, from low-bit-rate Internet streaming applications to HDTV broadcast and digital cinema applications with nearly lossless coding. With the use of H.264, bit rate savings of 50% or more compared with MPEG-2 Part 2 are reported. For example, H.264 has been reported to give the same digital satellite TV quality as current MPEG-2 implementations with less than half the bit rate, with current MPEG-2 implementations working at around 3.5 Mbit/s and H.264 at only 1.5 Mbit/s.[23] Sony claims that its 9 Mbit/s AVC recording mode is equivalent in image quality to the HDV format, which uses approximately 18-25 Mbit/s.

     

    To ensure compatibility and problem-free adoption of H.264/AVC, many standards bodies have amended or added to their video-related standards so that users of those standards can employ H.264/AVC. Both the Blu-ray Disc format and the now-discontinued HD DVD format include H.264/AVC High Profile as one of three mandatory video compression formats. The Digital Video Broadcasting project (DVB) approved the use of H.264/AVC for broadcast television in late 2004.

     

    The Advanced Television Systems Committee (ATSC) standards body in the United States approved H.264/AVC for broadcast television in July 2008, although the standard is not yet used for fixed ATSC broadcasts within the United States.[25][26] It has also been approved for use with the more recent ATSC-M/H (Mobile/Handheld) standard, using the AVC and SVC portions of H.264.

     

    The CCTV (closed-circuit television) and video surveillance markets have included the technology in many products. Many common DSLR cameras use H.264 video wrapped in QuickTime MOV containers as the native recording format.


    4. Derived format

    AVCHD is a high-definition recording format designed by Sony and Panasonic, using H.264 (compliant with H.264, while adding other application-specific functions and constraints).

    AVC-Intra is an intra-frame compression format developed by Panasonic.

    XAVC is a recording format designed by Sony that uses level 5.2 of H.264/MPEG-4 AVC, the highest level supported by that video standard.[28][29] XAVC can support 4K resolution (4096×2160 and 3840×2160) at up to 60 frames per second (fps).[28][29] Sony has announced that cameras supporting XAVC include two CineAlta cameras, the Sony PMW-F55 and the Sony PMW-F5.[30] The Sony PMW-F55 can record XAVC with 4K resolution at 30 fps at 300 Mbit/s and 2K resolution at 30 fps at 100 Mbit/s.[31] XAVC can record 4K resolution at 60 fps with 4:2:2 chroma subsampling at 600 Mbit/s.

     

    5. Features


    Block diagram of H.264

    H.264/AVC/MPEG-4 Part 10 contains a number of new features that allow it to compress video much more efficiently than older standards and to provide more flexibility for application to a wide variety of network environments. In particular, some such key features include:

     

    1) Multi-picture inter-picture prediction includes the following features:


    Using previously encoded pictures as references in a much more flexible way than in past standards, allowing up to 16 reference frames (or 32 reference fields, in the case of interlaced encoding) to be used in some cases. In profiles that support non-IDR frames, most levels specify that sufficient buffering should be available to allow for at least 4 or 5 reference frames at maximum resolution. This is in contrast to prior standards, where the limit was typically one, or, in the case of conventional "B pictures" (B frames), two. This particular feature usually allows modest improvements in bit rate and quality in most scenes. But in certain types of scenes, such as those with repetitive motion, back-and-forth scene cuts, or uncovered background areas, it allows a significant reduction in bit rate while maintaining clarity.


    Variable block-size motion compensation (VBSMC) with block sizes as large as 16×16 and as small as 4×4, enabling precise segmentation of moving regions. The supported luma prediction block sizes include 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4, many of which can be used together within a single macroblock. Chroma prediction block sizes are correspondingly smaller according to the chroma subsampling in use.
    The ability to use multiple motion vectors per macroblock (one or two per partition), with a maximum of 32 in the case of a B macroblock constructed of 16 4×4 partitions. The motion vectors for each 8×8 or larger partition region can point to different reference pictures.


    The ability to use any macroblock type in B-frames, including I-macroblocks, resulting in much more efficient encoding when using B-frames. This feature was notably left out of MPEG-4 ASP.
    Six-tap filtering for derivation of half-pel luma sample predictions, for sharper subpixel motion compensation. Quarter-pixel motion is derived by linear interpolation of the half-pel values, to save processing power.


    Quarter-pixel precision for motion compensation, enabling precise description of the displacements of moving areas. For chroma, the resolution is typically halved both vertically and horizontally (see 4:2:0), so the motion compensation of chroma uses one-eighth-pixel grid units.
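    As an illustration of the interpolation described above, here is a minimal Python sketch of the half-pel and quarter-pel luma interpolation rules (the function names are ours; samples are assumed to be 8-bit values):

```python
def half_pel(samples):
    """Half-pel luma value between samples[2] and samples[3],
    using H.264's 6-tap filter (1, -5, 20, 20, -5, 1),
    with rounding and clipping to the 8-bit range."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * s for t, s in zip(taps, samples))
    return max(0, min(255, (acc + 16) >> 5))

def quarter_pel(a, b):
    """Quarter-pel value: rounded average of the two nearest
    integer- or half-pel positions (simple linear interpolation)."""
    return (a + b + 1) >> 1
```

    On a flat region the filter is transparent: `half_pel((100,) * 6)` returns 100, so sub-pel interpolation introduces no bias in smooth areas.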


    Weighted prediction, allowing an encoder to specify the use of a scaling and offset when performing motion compensation, and providing a significant benefit in performance in special cases, such as fade-to-black, fade-in, and cross-fade transitions. This includes implicit weighted prediction for B-frames and explicit weighted prediction for P-frames.


    Spatial prediction from the edges of neighboring blocks for "intra" coding, rather than the "DC"-only prediction found in MPEG-2 Part 2 and the transform-coefficient prediction found in H.263v2 and MPEG-4 Part 2.
    This includes luma prediction block sizes of 16×16, 8×8 and 4×4 (of which only one type can be used within each macroblock).

     

    2) Lossless macroblock coding functions include:


    A lossless "PCM macroblock" representation mode in which video data samples are represented directly,[34] allowing perfect representation of specific regions and allowing a strict limit to be placed on the quantity of coded data for each macroblock.


    An enhanced lossless macroblock representation mode allowing perfect representation of specific regions while ordinarily using substantially fewer bits than the PCM mode.
    Flexible interlaced-scan video coding features, including:


    Macroblock-adaptive frame-field (MBAFF) coding, using a macroblock pair structure for pictures coded as frames, allowing 16×16 macroblocks in field mode (compared with MPEG-2, where field-mode processing in a picture coded as a frame results in the processing of 16×8 half-macroblocks).


    Picture-adaptive frame-field coding (PAFF or PicAFF), allowing a freely selected mixture of pictures coded either as complete frames, where both fields are combined together for encoding, or as individual single fields.
    New transform design features, including:


    An exact-match integer 4×4 spatial block transform, allowing precise placement of residual signals with little of the "ringing" often found with prior codec designs. This design is conceptually similar to the well-known discrete cosine transform (DCT), introduced in 1974 by N. Ahmed, T. Natarajan and K. R. Rao, but is simplified and made to provide exactly specified decoding.
    An exact-match integer 8×8 spatial block transform, allowing highly correlated regions to be compressed more efficiently than with the 4×4 transform. This design is also conceptually similar to the well-known DCT, but simplified and made to provide exactly specified decoding.
    Adaptive encoder selection between the 4×4 and 8×8 transform block sizes for the integer transform operation.
    A secondary Hadamard transform performed on "DC" coefficients of the primary spatial transform applied to chroma DC coefficients (and also luma in one special case) to obtain even more compression in smooth regions.
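    The 4×4 integer core transform is simple enough to write down directly. The following Python sketch (function names ours) applies the forward transform Y = C·X·Cᵀ with the integer matrix used by H.264; note that the scaling needed for exact inversion is folded into the quantization stage in the real codec and is omitted here:

```python
# H.264 4x4 forward core transform matrix (integer, exact-match).
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul4(a, b):
    """Plain 4x4 integer matrix multiply."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_4x4(block):
    """Y = CF * X * CF^T, exact in integer arithmetic with no rounding."""
    cft = [[CF[j][i] for j in range(4)] for i in range(4)]
    return matmul4(matmul4(CF, block), cft)
```

    A flat block of residuals produces a single DC coefficient: for a 4×4 block of all ones, the result is 16 at position (0, 0) and zero everywhere else, since the first basis row is all ones.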

     

    3) Quantization design includes:
    Logarithmic step size control for easier bit rate management by encoders and simplified inverse-quantization scaling
    Frequency-customized quantization scaling matrices selected by the encoder for perceptual-based quantization optimization
    An in-loop deblocking filter that helps prevent the blocking artifacts common to other DCT-based image compression techniques, resulting in better visual appearance and compression efficiency
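    The logarithmic step-size control mentioned above means the quantizer step size doubles for every increase of 6 in the quantization parameter (QP, range 0 to 51). A small Python sketch (table values follow the standard's well-known step-size progression; the function name is ours):

```python
# Step sizes for QP 0..5; every further +6 in QP doubles the step.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantizer step size for a QP in 0..51."""
    assert 0 <= qp <= 51
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))
```

    For example, qstep(4) is 1.0 and qstep(10) is 2.0, while at the top of the range qstep(51) is 224: every 6-step change in QP corresponds to a factor of 2 in step size, i.e. roughly 12.5% per QP step, which is what makes bit rate management simpler for encoders.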

     

    4) Entropy coding design includes:
    Context-adaptive binary arithmetic coding (CABAC), an algorithm to losslessly compress syntax elements in the video stream knowing the probabilities of syntax elements in a given context. CABAC compresses data more efficiently than CAVLC but requires considerably more processing to decode.
    Context-adaptive variable-length coding (CAVLC), a lower-complexity alternative to CABAC for the coding of quantized transform coefficient values. Although lower in complexity than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.
    A common simple and highly structured variable-length coding (VLC) technique for many of the syntax elements not coded by CABAC or CAVLC, referred to as Exponential-Golomb coding (or Exp-Golomb).
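    Exp-Golomb codes are simple enough to show in full: a value k is written as the binary form of k+1, preceded by one fewer zero bits than that form has bits. A minimal Python sketch of the unsigned ue(v) encoder and decoder (function names ours):

```python
def ue_encode(k):
    """Unsigned Exp-Golomb: non-negative integer k -> bit string."""
    x = bin(k + 1)[2:]               # binary of k+1, without '0b'
    return "0" * (len(x) - 1) + x    # zero prefix, then the value

def ue_decode(bits, pos=0):
    """Decode one ue(v) code word starting at bit index pos.
    Returns (value, position just after the code word)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```

    The first few code words are 1, 010, 011, 00100, 00101, ..., so small (frequent) values get the shortest codes, which is exactly the property wanted for header-level syntax elements.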

     

    5) Loss recovery functions include:


    The Network Abstraction Layer (NAL) definition, allowing the same video syntax to be used in many network environments. One very fundamental design concept of H.264 is to generate self-contained packets, to remove the header duplication seen in, for example, MPEG-4's Header Extension Code (HEC). This was achieved by decoupling information relevant to more than one slice from the media stream. The combination of the higher-level parameters is called a parameter set.[35] The H.264 specification includes two types of parameter sets: the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS). An active sequence parameter set remains unchanged throughout a coded video sequence, and an active picture parameter set remains unchanged within a coded picture. The sequence and picture parameter set structures contain information such as picture size, optional coding modes employed, and the macroblock-to-slice-group map.
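    Each NAL unit begins with a one-byte header whose bit layout is fixed by the standard: a forbidden_zero_bit, a two-bit nal_ref_idc, and a five-bit nal_unit_type (type 7 identifies an SPS and type 8 a PPS). A minimal Python sketch of parsing it (function name ours):

```python
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice",
                  6: "SEI", 7: "SPS", 8: "PPS"}

def parse_nal_header(first_byte):
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        "nal_unit_type": first_byte & 0x1F,
    }
```

    The byte 0x67, commonly seen at the start of an H.264 stream, parses to nal_unit_type 7, i.e. a Sequence Parameter Set, with nal_ref_idc 3 marking it as needed for reference.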


    Flexible macroblock ordering (FMO), also known as slice groups, and arbitrary slice ordering (ASO), which are techniques for restructuring the ordering of the representation of the fundamental regions (macroblocks) in pictures. Typically considered error/loss robustness features, FMO and ASO can also be used for other purposes.
    Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.
    Redundant slices (RS), an error/loss robustness feature that lets an encoder send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.
    Frame numbering, a feature that allows the creation of "sub-sequences", enabling temporal scalability by the optional inclusion of extra pictures between other pictures, and the detection and concealment of losses of entire pictures, which can occur due to network packet losses or channel errors.
    Switching slices, called SP and SI slices, allowing an encoder to direct a decoder to jump into an ongoing video stream for such purposes as video streaming bit-rate switching and "trick mode" operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures, or no pictures at all, as references prior to the switch.
    A simple automatic process for preventing the accidental emulation of start codes, which are special sequences of bits in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.
    Supplemental enhancement information (SEI) and video usability information (VUI), which are extra information that can be inserted into the bitstream for various purposes, such as signaling the 3D arrangement of stereoscopic video via an SEI frame packing arrangement (FPA) message. Other features include:

    Auxiliary pictures, which can be used for such purposes as alpha compositing.
    Support of monochrome (4:0:0), 4:2:0, 4:2:2, and 4:4:4 chroma subsampling (depending on the selected profile).
    Support of sample bit-depth precision ranging from 8 to 14 bits per sample (depending on the selected profile).
    The ability to encode individual color planes as distinct pictures with their own slice structures, macroblock modes, motion vectors, etc., allowing encoders to be designed with a simple parallelization structure (supported only in the three 4:4:4-capable profiles).


    Picture order counting, a feature that serves to keep the ordering of the pictures and the values of samples in the decoded pictures isolated from timing information, allowing timing information to be carried and controlled/changed separately by a system without affecting decoded picture content.
    These techniques, along with several others, help H.264 to perform significantly better than any prior standard under a wide variety of circumstances in a wide variety of application environments. H.264 can often perform radically better than MPEG-2 video, typically obtaining the same quality at half of the bit rate or less, especially on high-bit-rate and high-resolution content.
    Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software implementation that can be freely downloaded. Its main purpose is to give examples of H.264/AVC features, rather than being a useful application per se. Some reference hardware design work has also been conducted in the Moving Picture Experts Group. The above are the complete features of H.264/AVC, covering all profiles of H.264. A profile for a codec is a set of features of that codec identified to meet a certain set of specifications for intended applications; this means that many of the features listed are not supported in some profiles. The various profiles of H.264/AVC are discussed in the next section.
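    The start-code emulation prevention mentioned in the loss-recovery list above works by byte stuffing: whenever two zero bytes would be followed by a byte of value 0x03 or less, the encoder inserts an extra 0x03 byte so that the start-code prefix 00 00 01 can never occur inside a payload. A Python sketch of both directions (function names ours):

```python
def escape_rbsp(payload: bytes) -> bytes:
    """Insert emulation-prevention bytes (encoder side)."""
    out, zeros = bytearray(), 0
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)        # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def unescape_rbsp(data: bytes) -> bytes:
    """Remove emulation-prevention bytes (decoder side)."""
    out, zeros, i = bytearray(), 0, 0
    while i < len(data):
        if zeros >= 2 and data[i] == 0x03:
            zeros = 0               # drop the stuffing byte
            i += 1
            continue
        out.append(data[i])
        zeros = zeros + 1 if data[i] == 0 else 0
        i += 1
    return bytes(out)
```

    For example, the payload bytes 00 00 01 are transmitted as 00 00 03 01, and the decoder removes the stuffing byte to recover the original payload exactly.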

     

     

     

     
