"H.264 / AVC Video Code Technology Detailed" video tutorial has been on "CSDN", which details the background, standard protocol and implementation of H.264, and through a practical engineering form to H.264 Standard analysis and implementation, welcome to watch!
"The paper is very shallow, perceived this matter", only the standard document is operated in the form of code, in order to have a sufficient understanding and understanding of the video compression coding standards!
Link: H.264/AVC Video Codec Technology in Detail
The videos for this section are free.
I. H.264 video coding standard
The H.264 video coding standard is another major result of the cooperation between ITU-T and MPEG, and it has had a huge impact on the industry since its publication. Strictly speaking, H.264 belongs to the MPEG-4 family: it is Part 10 of the MPEG-4 document series ISO/IEC 14496, and is therefore also referred to as MPEG-4 AVC. Unlike the earlier MPEG-4 video standard, which emphasizes flexibility and interactivity, H.264 emphasizes a higher compression ratio and transmission reliability, and it is widely used in digital TV broadcasting, real-time video communication and network streaming media.
II. Introduction to H.264 Video Coding Method
In its overall coding framework, H.264 still uses a structure similar to that of earlier standards, namely a block-based hybrid coding framework. Its main structure is shown in the following figure:
During H.264 encoding, each frame of the image is divided into one or more slices. Each slice contains multiple macroblocks (MB, Macroblock). The macroblock is the basic coding unit of the H.264 standard; it consists of one 16 × 16 luma pixel block, two 8 × 8 chroma pixel blocks and some macroblock header information. When a macroblock is encoded, it is divided into sub-blocks of various sizes for prediction. The block size used by intra prediction may be 16 × 16 or 4 × 4, and the blocks used by inter prediction / motion compensation may have seven different shapes: 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4. Compared with earlier standards, which could perform motion compensation only on whole macroblocks or half macroblocks, the finer macroblock partitioning used by H.264 provides higher prediction accuracy and coding efficiency.
In transform coding, the transform block size for the prediction residual is 4 × 4 or 8 × 8 (the latter only in the FRExt version). Compared with earlier standards, which used only 8 × 8 transform blocks, the H.264 transform avoids the mismatch problems that often appeared between the forward and inverse transforms.
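The macroblock geometry described above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of any real codec API: the function names are hypothetical, and it assumes that frame dimensions are rounded up to a whole number of 16 × 16 macroblocks.

```python
MB_SIZE = 16  # luma samples per macroblock side

# The seven inter-prediction block shapes listed in the text, as (width, height).
INTER_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def macroblocks_per_frame(width, height):
    """Return (columns, rows, total) of macroblocks, rounding up to whole macroblocks."""
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows, cols * rows


def blocks_per_macroblock(part_w, part_h):
    """Number of blocks of the given shape needed to tile one 16x16 macroblock."""
    return (MB_SIZE // part_w) * (MB_SIZE // part_h)


if __name__ == "__main__":
    print(macroblocks_per_frame(1920, 1080))
    for w, h in INTER_PARTITIONS:
        print(f"{w}x{h}: {blocks_per_macroblock(w, h)} block(s) per macroblock")
```

A 1080-line picture is not a multiple of 16 lines high, which is why the row count is rounded up in this sketch.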
The entropy coding methods used in the H.264 standard are mainly context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC); different coding methods are specified for different syntax elements. Together, these two entropy coding methods strike a balance between coding efficiency and computational complexity.
As in earlier standards, H.264 slices come in different types, including I slices, P slices and B slices. In addition, SI and SP slices are defined in the Extended profile to support bitstream switching.
I slice: intra-coded slice, containing only I macroblocks;
P slice: unidirectional inter-coded slice, which may contain P macroblocks and I macroblocks;
B slice: bidirectional inter-coded slice, which may contain B macroblocks and I macroblocks.
Coding tools such as predictive coding, transform and quantization, and entropy coding operate mainly at the slice layer and below; this layer is usually called the Video Coding Layer (VCL). In contrast, the data and algorithms that operate above the slice layer are usually called the Network Abstraction Layer (NAL). The main purpose of defining the NAL is to make H.264 video friendlier to network transmission and data storage.
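To make the VCL/NAL split more tangible, here is a minimal sketch of reading the one-byte NAL unit header. The bit layout (1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type) and the type numbers are taken from the H.264 specification rather than from the text above; the function name and the returned dictionary are hypothetical.

```python
# Assumed header layout of the first byte of a NAL unit:
# forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,
        "nal_ref_idc": (first_byte >> 5) & 0x3,
        "nal_unit_type": nal_unit_type,
        # Types 1 to 5 carry coded slice data, i.e. VCL content.
        "is_vcl": 1 <= nal_unit_type <= 5,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "other"),
    }


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x41):
        print(hex(byte), parse_nal_header(byte))
```

Parameter sets and SEI messages are examples of data that live above the slice layer, while the slice types carry the actual coded video.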
To suit different application scenarios, H.264 also defines three different profiles:
Baseline Profile: mainly used for low-latency applications such as video conferencing and video telephony; supports I slices and P slices; entropy coding supports the CAVLC algorithm.
Main Profile: mainly used for digital TV broadcasting, digital video storage and similar applications; supports field coding of video, B slices (bidirectional prediction) and weighted prediction; entropy coding supports the CAVLC and CABAC algorithms.
Extended Profile: mainly used for live and on-demand network video; supports all features of the Baseline Profile, SI and SP slices, data partitioning to improve error resilience, B slices and weighted prediction, but does not support CABAC.
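The profile descriptions above can be summarized as a small lookup table. This is only an illustrative sketch: the dictionary layout and the helper name are hypothetical, the `profile_idc` values (66, 77, 88) come from the H.264 specification rather than from the text, and only the slice types and entropy coders are recorded.

```python
# Feature summary of the three profiles described in the text.
PROFILES = {
    "Baseline": {
        "profile_idc": 66,
        "slices": {"I", "P"},
        "entropy": {"CAVLC"},
    },
    "Main": {
        "profile_idc": 77,
        "slices": {"I", "P", "B"},
        "entropy": {"CAVLC", "CABAC"},
    },
    "Extended": {
        "profile_idc": 88,
        "slices": {"I", "P", "B", "SI", "SP"},
        "entropy": {"CAVLC"},
    },
}


def profiles_supporting(slice_type, entropy_coder):
    """List the profiles that allow both the slice type and the entropy coder."""
    return [
        name
        for name, features in PROFILES.items()
        if slice_type in features["slices"] and entropy_coder in features["entropy"]
    ]


if __name__ == "__main__":
    print(profiles_supporting("B", "CABAC"))
    print(profiles_supporting("SP", "CAVLC"))
```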
III. Coding tools adopted by H.264 standard
The coding techniques used in H.264 mainly include the following:
Intra prediction
H.264 uses intra prediction based on pixel blocks, which falls mainly into the following types:
16 × 16 luma block: 4 prediction modes;
4 × 4 luma block: 9 prediction modes;
chroma block: 4 prediction modes, the same as for the 16 × 16 luma block.
The four prediction modes of the 16 × 16 luma block and the chroma block are shown below:
The nine prediction modes of the 4 × 4 luma block are shown below:
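As a rough illustration of how such prediction modes work, the sketch below implements three of the nine 4 × 4 luma modes (vertical, horizontal and DC) and picks the one with the smallest sum of absolute differences (SAD). It assumes that both the top and left neighbours are available; the function names are hypothetical, and choosing the mode by SAD is just one simple encoder strategy, not something the standard prescribes.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 prediction from 4 top and 4 left reconstructed neighbour samples."""
    if mode == "vertical":      # each column copies the sample above it
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # each row copies the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the eight neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for row_a, row_b in zip(block, prediction)
               for a, b in zip(row_a, row_b))


def best_mode(block, top, left):
    """Return the (mode, cost) pair with the lowest SAD among the three modes."""
    costs = {m: sad(block, predict_4x4(top, left, m))
             for m in ("vertical", "horizontal", "dc")}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


if __name__ == "__main__":
    top = [10, 20, 30, 40]
    left = [1, 2, 3, 4]
    block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
    print(best_mode(block, top, left))
```

Only the residual between the block and the chosen prediction is then passed on to the transform stage.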
Inter-frame prediction
Inter prediction in H.264 uses block-based motion estimation and compensation. Its main features are:
multiple candidate reference frames;
B frames can be used as reference frames;
arbitrary ordering of reference frames;
multiple motion compensation block shapes, including 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8 and 4 × 4 pixels;
sub-pixel interpolation with 1/4-pixel accuracy (luma);
frame-based or field-based motion estimation for interlaced video.
For macroblocks that use inter prediction, the macroblock and sub-macroblock partitioning is shown in the figure:
Sub-pixel interpolation is illustrated below. The red points mark the positions of integer pixels in the image, the green points mark the 1/2-pixel positions interpolated between two integer pixels, and the purple points mark the 1/4-pixel positions.
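The interpolation idea can be sketched in one dimension. The example below assumes the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and a shift by 5 that the H.264 specification uses for luma half-pixel positions, and a rounded average for quarter-pixel positions; the function names are hypothetical, and the two-dimensional centre position, which needs intermediate values, is left out.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed 6-tap half-pixel filter for luma


def clip_to_8bit(value):
    """Clamp a value to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def half_pel(samples, i):
    """Half-pixel value between samples[i] and samples[i + 1] in one row or column."""
    if i < 2 or i + 3 >= len(samples):
        raise IndexError("need two samples before and three after position i")
    window = samples[i - 2:i + 4]
    acc = sum(t * s for t, s in zip(TAPS, window))
    return clip_to_8bit((acc + 16) >> 5)


def quarter_pel(a, b):
    """Quarter-pixel value as the rounded average of two neighbouring positions."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]   # a sharp edge
    h = half_pel(row, 2)             # between row[2] and row[3]
    print(h, quarter_pel(row[2], h))
```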
Interlaced video encoding
For interlaced video, H.264 specifically defines algorithms for handling such content:
PicAFF: Picture Adaptive Frame Field, frame/field adaptation at the picture level;
MBAFF: Macroblock Adaptive Frame Field, frame/field adaptation at the macroblock level.
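The frame/field choice can be illustrated with a toy picture-level decision. In the sketch below a frame is a list of rows, the top field takes the even rows and the bottom field the odd rows. The cost measure (absolute differences between vertically adjacent lines) and the function names are hypothetical illustrations; a real encoder would typically compare actual coding costs instead.

```python
def split_fields(frame):
    """Split a frame (list of rows) into its top (even rows) and bottom (odd rows) fields."""
    return frame[0::2], frame[1::2]


def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent lines."""
    return sum(abs(a - b)
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))


def choose_picture_coding(frame):
    """Toy picture-level decision: pick 'field' when the fields are smoother than the frame."""
    top, bottom = split_fields(frame)
    frame_cost = vertical_activity(frame)
    field_cost = vertical_activity(top) + vertical_activity(bottom)
    return "field" if field_cost < frame_cost else "frame"


if __name__ == "__main__":
    moving = [[0, 0], [100, 100], [0, 0], [100, 100]]   # strong inter-field motion
    static = [[0, 0], [1, 1], [2, 2], [3, 3]]           # smooth vertical gradient
    print(choose_picture_coding(moving), choose_picture_coding(static))
```

MBAFF applies the same kind of decision to macroblock pairs rather than to the whole picture.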
Transform and quantization coding
The transform coding of H.264 innovatively adopts a DCT-like integer transform, which effectively reduces computational complexity. In the base version of H.264 the transform matrix is 4 × 4; the FRExt extension also supports an 8 × 8 transform matrix.
The quantization algorithm of H.264 still uses scalar quantization.
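A minimal sketch of the 4 × 4 forward core transform follows. It assumes the integer matrix Cf given in the H.264 specification and computes Y = Cf · X · Cf^T using only integer arithmetic; the scaling that would make this equivalent to a true DCT is folded into quantization in H.264 and is omitted here. The helper names are hypothetical.

```python
# Assumed 4x4 forward core transform matrix of H.264.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_transform_4x4(residual):
    """Integer core transform Y = CF * X * CF^T of a 4x4 residual block."""
    return matmul(matmul(CF, residual), transpose(CF))


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]   # a constant residual block
    for row in forward_transform_4x4(flat):
        print(row)
```

For a constant block, all of the energy ends up in the top-left (DC) coefficient, which is the behaviour one expects from a DCT-like transform.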
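Scalar quantization can be sketched as a division by a step size that depends on the quantization parameter (QP). The example assumes the relation from the H.264 specification in which QP ranges from 0 to 51 and the step size doubles every 6 QP values, starting from the base values listed below. The floating-point arithmetic and function names are simplifications for illustration; real codecs use integer multiplier tables instead.

```python
# Assumed quantizer step sizes for QP 0..5; the step doubles for every +6 in QP.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


def quantize(coeff, qp):
    """Scalar quantization: divide by the step size and round half away from zero."""
    level = int(abs(coeff) / qstep(qp) + 0.5)
    return level if coeff >= 0 else -level


def dequantize(level, qp):
    """Reconstruct an approximate coefficient from its quantized level."""
    return level * qstep(qp)


if __name__ == "__main__":
    for qp in (4, 10, 28, 51):
        print(qp, qstep(qp))
    print(quantize(100, 28), dequantize(quantize(100, 28), 28))
```

The gap between the original coefficient and the reconstructed one is the information lost to quantization, which grows as QP increases.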
Lossless entropy encoding algorithm
The H.264 standard specifies different entropy coding algorithms for different syntax elements, mainly:
UVLC (Universal Variable Length Coding): mainly uses Exp-Golomb coding;
CAVLC (Context Adaptive Variable Length Coding): context-adaptive variable-length coding;
CABAC (Context Adaptive Binary Arithmetic Coding): context-adaptive binary arithmetic coding.
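Of these three, Exp-Golomb coding is simple enough to sketch in a few lines. The example below encodes and decodes unsigned order-0 Exp-Golomb codes, written ue(v) in the specification, and shows the usual mapping from signed values to unsigned code numbers for se(v). Bits are represented as strings of "0" and "1" for readability, and the function names are hypothetical.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb: (n - 1) leading zeros followed by the n-bit binary of value + 1."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def ue_decode(bitstring):
    """Decode one ue(v) code from the start of bitstring; return (value, bits_consumed)."""
    leading_zeros = 0
    while bitstring[leading_zeros] == "0":
        leading_zeros += 1
    length = 2 * leading_zeros + 1
    return int(bitstring[leading_zeros:length], 2) - 1, length


def se_to_code_num(k):
    """Map a signed value to its unsigned code number: 1, -1, 2, -2 map to 1, 2, 3, 4."""
    return 2 * k - 1 if k > 0 else -2 * k


if __name__ == "__main__":
    for v in range(5):
        print(v, ue_encode(v))
    print(ue_decode("00101" + "1"))   # trailing bits belong to the next code
```

Small values get short codes, which suits syntax elements whose values are usually close to zero.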
Other techniques
In addition to the core algorithms described above, H.264 also defines a variety of other techniques, including the deblocking loop filter, SI/SP frames and rate control.