"H.264 basic overview
With the rise of HDTV, the H.264 specification began appearing everywhere: HD DVD and Blu-ray adopted the standard, and from the second half of 2005 both NVIDIA and ATI promoted H.264 hardware decoding acceleration as their flagship video technology. So what exactly is H.264? H.264, also known as MPEG-4 Part 10, is a highly compressed digital video codec standard proposed by the Joint Video Team (JVT) formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Two organizations develop video coding standards internationally: the ITU-T, which produced H.261, H.263 and H.263+, and the International Organization for Standardization (ISO), which produced MPEG-1, MPEG-2 and MPEG-4. H.264 is the new digital video coding standard developed jointly by both organizations through the JVT; it is at once ITU-T H.264 and ISO/IEC MPEG-4 Advanced Video Coding (AVC), Part 10 of the MPEG-4 standard. MPEG-4 AVC, MPEG-4 Part 10 and ISO/IEC 14496-10 therefore all refer to the same standard, H.264. The greatest advantage of H.264 is its high data compression ratio: at the same image quality, its compression ratio is about twice that of MPEG-2 and 1.5 to 2 times that of MPEG-4. For example, an original file of 88 GB compresses to 3.5 GB with MPEG-2, a ratio of 25:1, while H.264 compresses it to 879 MB, an impressive ratio of about 102:1. Why is the compression ratio of H.264 so high? Its low bit rate is the key: compared with compression techniques such as MPEG-2 and MPEG-4 ASP, H.264 greatly reduces users' download time and data traffic charges. It is particularly worth noting that H.264 delivers high-quality images while achieving this high compression ratio.
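To make the quoted figures concrete, here is a small arithmetic sketch of those compression ratios. The 88 GB / 3.5 GB / 879 MB values are the illustrative numbers from the text above, not measurements:

```python
# Rough compression-ratio arithmetic for the example quoted above.

def ratio(original_bytes: float, compressed_bytes: float) -> float:
    """Return the compression ratio original:compressed."""
    return original_bytes / compressed_bytes

GB = 1024 ** 3
MB = 1024 ** 2

source = 88 * GB      # original file size from the text
mpeg2  = 3.5 * GB     # MPEG-2 result from the text
h264   = 879 * MB     # H.264 result from the text

print(f"MPEG-2: {ratio(source, mpeg2):.1f}:1")   # ~25:1
print(f"H.264 : {ratio(source, h264):.1f}:1")    # ~102.5:1, quoted as 102:1 in the text
```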
Advantages of H.264 algorithm
H.264 is built on the foundation of MPEG-4 technology. Its codec process consists mainly of five parts: inter-frame and intra-frame prediction, transform and inverse transform, quantization and inverse quantization, loop filtering, and entropy coding. H.264/MPEG-4 AVC is the newest and most promising video compression standard to follow the MPEG-2 video compression standard of 1995. It was developed jointly by ITU-T and ISO/IEC, and at the same image quality its compression efficiency is more than twice that of the previous standards; for this reason H.264 is generally considered the most influential industry standard.
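The five stages listed above can be pictured as a data-flow skeleton. The sketch below is purely conceptual: the stage bodies are trivial placeholders rather than the real H.264 algorithms, and only the order of operations, including the reconstruction loop an encoder keeps so that its predictions match the decoder's, is the point.

```python
# Conceptual data flow of the five stages (prediction, transform, quantization,
# loop filtering, entropy coding). All stage bodies are placeholders.

import numpy as np

def predict(current: np.ndarray, reference: np.ndarray) -> np.ndarray:
    # Placeholder: "predict" the frame as the co-located reference frame.
    return reference.astype(np.int32)

def transform(block: np.ndarray) -> np.ndarray:
    return block.astype(np.int32)            # stand-in for the 4x4 integer transform

def quantize(coeffs: np.ndarray, qstep: int) -> np.ndarray:
    return coeffs // qstep

def dequantize(levels: np.ndarray, qstep: int) -> np.ndarray:
    return levels * qstep

def loop_filter(frame: np.ndarray) -> np.ndarray:
    return frame                              # stand-in for the deblocking filter

def entropy_encode(levels: np.ndarray) -> bytes:
    return levels.astype(np.int8).tobytes()   # stand-in for CAVLC/CABAC

def encode_frame(current, reference, qstep=8):
    prediction = predict(current, reference)
    residual   = current.astype(np.int32) - prediction
    levels     = quantize(transform(residual), qstep)
    # The encoder keeps a decoded copy (no clipping here) so that encoder and
    # decoder work from identical reference pictures.
    recon      = loop_filter(prediction + dequantize(levels, qstep))
    return entropy_encode(levels), recon
```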
H.264 development history
The project began in 1997 as H.26L within the ITU-T Video Coding Experts Group; after ITU-T and ISO/IEC joined forces in the JVT, the resulting standard became known both as H.264 and as MPEG-4 Part 10 (MPEG-4 AVC).
Technical background of H.264
The main objective of the H.264 standard is to provide better image quality than other existing video coding standards at the same bandwidth. Compared with earlier international standards such as H.263 and MPEG-4, its biggest advantages show up in the following four areas:
1. Each video frame is divided into blocks of pixels, so coding of a frame can proceed at the block level.
2. Some blocks of a video frame are coded using spatial (intra) prediction, transform, quantization and entropy coding (variable-length coding), which removes spatial redundancy.
3. Temporal prediction is applied to blocks of successive frames, so that only the parts that change from frame to frame need to be coded. This is accomplished by motion estimation and motion compensation: for a given block, a search over one or more previously coded frames determines its motion vector, which the encoder and decoder then use to predict the block (a minimal motion-search sketch follows this list).
4. The remaining residual blocks are coded by removing the remaining spatial redundancy: the difference between a source block and its prediction is again transformed, quantized and entropy coded.
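As a minimal illustration of the motion-estimation step in point 3, the sketch below performs a brute-force search for the motion vector of a single block, using the sum of absolute differences (SAD) as the cost. Real encoders use much faster search strategies plus sub-pixel refinement; the array shapes and helper names here are assumptions made only for illustration.

```python
# Full-search motion estimation for one block of an 8-bit grayscale frame.

import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_search(cur: np.ndarray, ref: np.ndarray, x: int, y: int,
                  block: int = 16, radius: int = 8):
    """Return ((dy, dx), cost) minimizing SAD for the block at (y, x) of `cur`."""
    target = cur[y:y + block, x:x + block]
    best = (0, 0)
    best_cost = sad(target, ref[y:y + block, x:x + block])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cost = sad(target, ref[ry:ry + block, rx:rx + block])
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```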
H.264 characteristics and advantages
H.264 is the new generation of digital video compression format from the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU). It retains the advantages and essence of earlier compression techniques while adding strengths that other techniques cannot match.
1. Low bit rate: compared with compression techniques such as MPEG-2 and MPEG-4 ASP, H.264 produces only about 1/8 of the data of MPEG-2 and 1/3 of the data of MPEG-4 at comparable quality. Using H.264 compression therefore greatly reduces users' download time and data traffic charges.
2. High image quality: H.264 provides continuous, smooth, high-quality (DVD-quality) images.
3. Strong fault tolerance: H.264 provides the tools needed to handle errors such as packet loss that occur in unstable network environments.
4. Network adaptability: H.264 defines a Network Abstraction Layer (NAL), which makes it easy to transmit H.264 streams over different networks (for example the Internet, CDMA, GPRS, WCDMA and CDMA2000).
H.264 standard overview
Like previous standards, H.264 uses a hybrid coding scheme of DPCM plus transform coding. However, it adopts a "back to basics" simple design without numerous options and still achieves better compression performance than H.263++; it strengthens adaptability to various channels through a "network friendly" structure and syntax that ease the handling of errors and packet loss; and it targets a wider range of applications, covering different bit rates, different resolutions and different transmission (storage) scenarios. Technically, it consolidates the strengths of past standards and the experience accumulated in their development. With a well-designed encoder, H.264 can save up to 50% of the bit rate at most rates compared with H.263 V2 (H.263+) or MPEG-4 Simple Profile, and it delivers consistently higher video quality across the whole bit-rate range. H.264 can work in a low-delay mode to suit real-time communication applications (such as video conferencing), and works equally well without delay restrictions, as in video storage and server-based streaming applications. It provides tools for handling lost packets in packet-switched networks and bit errors in error-prone wireless networks. At the system level, H.264 introduces a new concept: a separation between the video coding layer (VCL) and the network abstraction layer (NAL). The former is the core representation of the compressed video content; the latter packages that content for delivery over a particular type of network. This structure makes packetization and priority control of the information easier. The system-level block diagram of H.264 is shown in Figure 1.
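On the NAL side of this split, every NAL unit starts with a one-byte header carrying forbidden_zero_bit, nal_ref_idc and nal_unit_type. A minimal parsing sketch follows; the header layout is taken from the standard, while the surrounding helper is purely illustrative.

```python
# Parse the one-byte NAL unit header of H.264.

def parse_nal_header(first_byte: int) -> dict:
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x1,  # must be 0 in a valid stream
        "nal_ref_idc":        (first_byte >> 5) & 0x3,  # 0 = disposable for prediction
        "nal_unit_type":      first_byte & 0x1F,        # e.g. 5 = IDR slice, 7 = SPS
    }

print(parse_nal_header(0x65))  # ref_idc 3, type 5: coded slice of an IDR picture
```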
Main features of H.264 standard
The H.264 standard is the new generation digital video coding standard proposed by the JVT (Joint Video Team). The JVT was established in Pattaya, Thailand, in December 2001 and consists of two international standards groups: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group). Its goal was to develop a new video coding standard achieving high compression ratio, high image quality and good network adaptability. The H.264 standard was approved as a new part of the MPEG-4 standard and is intended as the digital video compression coding standard for future IP and wireless environments. The main features of the H.264 standard are as follows:
1. Higher coding efficiency: on average more than 50% of the bit rate can be saved compared with standards such as H.263 at the same quality.
2. High-quality video: H.264 can provide high-quality video images even at low bit rates; high-quality image transmission over limited bandwidth is one of its highlights.
3. Improved network adaptability: H.264 can operate in a low-delay mode for real-time communication applications (such as video conferencing), or without delay constraints for video storage or streaming servers.
4. Hybrid coding structure: like H.263, H.264 uses a hybrid structure of DCT-style transform coding plus DPCM differential coding, but it adds new techniques such as multi-mode motion estimation, intra prediction, multi-frame prediction, content-based variable-length coding and a 4x4 two-dimensional integer transform, which improve coding efficiency.
5. Fewer coding options: coding with H.263 often requires setting a considerable number of options, which complicates the encoder; H.264 achieves a simpler "back to basics" design that reduces coding complexity.
6. Suitability for different applications: H.264 can use different transmission and playback rates depending on the environment and provides a rich set of error-handling tools, so packet loss and errors can be controlled or eliminated.
7. Error recovery features: H.264 provides tools for dealing with lost network packets, which suit transmitting video data over wireless networks with high error rates.
8. Higher complexity: the performance gains of H.264 come at the cost of increased complexity. It is estimated that H.264 encoding is roughly three times as complex as H.263, and decoding roughly twice as complex.
An H.264 stream contains structures such as the Access Unit Delimiter, SEI (Supplemental Enhancement Information), the Primary Coded Picture and the Redundant Coded Picture, as well as IDR (Instantaneous Decoding Refresh) pictures, the Hypothetical Reference Decoder (HRD) and the Hypothetical Stream Scheduler (HSS). [4]
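The stream structures named above (access unit delimiter, SEI, IDR and non-IDR coded slices) are each carried in NAL units with their own nal_unit_type values. Below is a simplified sketch that scans an Annex B byte stream for start codes and labels what it finds; the type numbers are the standard ones, while the scanner itself is deliberately naive (it only looks for three-byte start codes and ignores emulation-prevention details).

```python
# Walk an Annex B H.264 byte stream and name the NAL units it contains.

NAL_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
}

def list_nal_units(stream: bytes):
    units, i = [], 0
    while True:
        i = stream.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(stream):
            break
        nal_type = stream[i + 3] & 0x1F      # low 5 bits of the NAL header byte
        units.append(NAL_NAMES.get(nal_type, f"type {nal_type}"))
        i += 3
    return units

print(list_nal_units(b"\x00\x00\x01\x09\x10" b"\x00\x00\x01\x67\x42"))
# ['access unit delimiter', 'sequence parameter set']
```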
H.264 key technology
1. Intra prediction coding: Intra coding is used to reduce the spatial redundancy of the image. To increase the efficiency of H.264 intra coding, the spatial correlation between adjacent blocks within a frame is exploited, since adjacent blocks usually have similar properties. When coding a given block, a prediction is therefore first formed from the surrounding blocks (typically those above and to the left, because they have already been coded), and then only the difference between the prediction and the actual values is coded; this greatly reduces the bit rate compared with coding the samples directly. For 4x4 luma blocks, H.264 provides a DC prediction mode plus a set of directional prediction modes (nine modes in total in the final standard), as shown in Figure 2; a minimal sketch of two such modes appears at the end of this subsection. In the figure, samples A to I of the neighbouring blocks have already been coded and can be used for prediction; with a directional mode, each group of four pixels in the block is predicted from one of these neighbouring samples. For flat regions containing very little spatial detail, H.264 also supports 16x16 intra coding. (Figure 2: intra coding modes.)
2. Inter prediction coding: Inter prediction exploits the temporal redundancy between consecutive frames through motion estimation and motion compensation. H.264 motion compensation supports most of the key features of previous video coding standards and adds more flexibility. Besides P frames and B frames, H.264 supports a new stream-switching frame type, the SP frame, shown in Figure 3. When SP frames are included in the bit stream, the decoder can switch quickly between streams that carry similar content at different bit rates, and random access and fast-playback modes are also supported. (Figure 3: SP frames.) The motion estimation of H.264 has the following four characteristics.
(1) Macroblock partitions of different sizes and shapes: motion compensation for each 16x16 macroblock can use partitions of different sizes and shapes; H.264 supports seven modes, as shown in Figure 4. Motion compensation with the smaller block modes improves performance, reduces blocking artifacts and improves image quality. (Figure 4: macroblock partition modes.)
(2) High-precision sub-pixel motion compensation: H.263 uses half-pixel precision, whereas H.264 can use 1/4 or even 1/8 pixel precision for motion estimation. At the same required accuracy, the residual left by 1/4- or 1/8-pixel motion compensation is smaller than that left by H.263's half-pixel motion compensation, so for the same quality H.264 needs a lower bit rate for inter coding.
(3) Multi-frame prediction: H.264 provides an optional multi-frame prediction feature; up to 5 different reference frames can be used in inter coding, which gives better error resilience and can improve video image quality. This feature is mainly useful for periodic motion, translation, and switching back and forth between two different scenes.
(4) Deblocking filter: H.264 defines an adaptive deblocking filter that processes the horizontal and vertical block edges inside the prediction loop, greatly reducing blocking artifacts.
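Returning to the 4x4 intra prediction described at the start of this subsection, here is a minimal sketch of two representative modes, vertical and DC, assuming the row of pixels above and the column to the left of the block have already been reconstructed. Rounding and boundary handling are simplified.

```python
# Two 4x4 intra prediction modes: vertical and DC.

import numpy as np

def predict_vertical(top: np.ndarray) -> np.ndarray:
    # Each column is predicted from the reconstructed pixel directly above it.
    return np.tile(top, (4, 1))

def predict_dc(top: np.ndarray, left: np.ndarray) -> np.ndarray:
    # Every pixel is predicted as the rounded mean of the 8 neighbouring samples.
    dc = (int(top.sum()) + int(left.sum()) + 4) >> 3
    return np.full((4, 4), dc, dtype=np.int32)

top  = np.array([100, 102, 104, 106])   # reconstructed row above the block
left = np.array([ 98,  99, 101, 103])   # reconstructed column to the left
print(predict_vertical(top))
print(predict_dc(top, left))            # every sample predicted as 102 here
```

Only the difference between the chosen prediction and the actual block is transformed and coded, which is why a good mode choice directly reduces the bit rate.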
3. Integer transform: For the transform, H.264 uses a transform based on 4x4 pixel blocks, but it is an integer spatial transform, so there is no mismatch between forward and inverse transforms caused by rounding error. The transform matrix is shown in Figure 5. Compared with a floating-point DCT, the integer transform introduces a small additional error, but since quantization follows the transform anyway, the overall quantization error it causes is not significant. In addition, the integer transform reduces the amount and complexity of computation, which helps porting to DSPs.
4. Quantization: H.264 allows a range of quantization step sizes (the standard defines 52 values of the quantization parameter), comparable in spirit to the 31 steps of H.263, but the step size grows at a compound rate of about 12.5% from one value to the next rather than by a fixed constant. The transform coefficients can be read out in two scan orders: zigzag scan and double scan, as shown in Figure 6. In most cases the simple zigzag scan is used; the double scan is used only in blocks with small quantization step sizes, where it helps coding efficiency.
5. Entropy coding: The final step of the video coding process is entropy coding. H.264 provides two different entropy coding methods: universal variable-length coding (UVLC) and context-based adaptive binary arithmetic coding (CABAC). In H.263 and other standards, different VLC tables are used for different data types such as transform coefficients and motion vectors. The UVLC table of H.264 takes a simpler approach: whatever the data type, a single unified variable-length code table is used. Its advantage is simplicity; its disadvantage is that a single table derives from one probability model and does not account for correlations between coded symbols, so its performance drops at medium and high bit rates. H.264 therefore also provides the optional CABAC method, in which arithmetic coding uses probability models for all syntax elements (transform coefficients, motion vectors and so on). To improve the efficiency of the arithmetic coding, the probability models adapt to the changing statistics of the video through context modelling: context models provide conditional probability estimates for the symbols being coded, and by selecting the appropriate model for the current symbol, the correlation between symbols can be exploited. Different syntax elements usually maintain different models. (A minimal sketch of the UVLC code appears at the end of this section.)
Application of H.264 in video conferencing
Currently most video conferencing systems use the H.261 or H.263 video coding standards. The arrival of H.264 cuts the bit rate of H.263 roughly in half: in other words, with only 384 kbit/s of bandwidth, users can enjoy video quality that would require about 768 kbit/s under H.263. H.264 therefore not only saves considerable cost and improves resource efficiency, but also brings commercial-quality video conferencing within reach of many more potential customers. A few vendors' video conferencing products already support the H.264 protocol, and manufacturers are committed to popularizing this new industry standard. As other video conferencing solutions follow, we will all experience the advantages of H.264 video services.
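As a concrete note on the entropy coding discussed above: the "universal VLC" of H.264 is the Exp-Golomb code used for most syntax elements. A minimal sketch of the unsigned variant, ue(v), for illustration only:

```python
# Unsigned Exp-Golomb coding: codeNum k is written as (k + 1) in binary,
# preceded by as many zero bits as there are bits after the leading one.

def exp_golomb_ue(k: int) -> str:
    assert k >= 0
    binary = bin(k + 1)[2:]                    # binary representation of k + 1
    return "0" * (len(binary) - 1) + binary

for k in range(5):
    print(k, exp_golomb_ue(k))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101
```

Each code word is self-delimiting: the run of leading zeros tells the decoder how many information bits follow, which is one reason resynchronization after a bit error is straightforward.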
H.264 technology highlights
1. Layered design: The H.264 algorithm can conceptually be divided into two layers. The video coding layer (VCL: Video Coding Layer) is responsible for efficiently representing the video content, while the network abstraction layer (NAL: Network Abstraction Layer) is responsible for packaging and transmitting the data in a way appropriate to the network. A packet-based interface is defined between VCL and NAL; packetization and the corresponding signalling belong to the NAL. In this way, the tasks of high coding efficiency and network friendliness are handled by the VCL and the NAL respectively. The VCL layer includes block-based motion-compensated hybrid coding together with some new features. As with previous video coding standards, H.264 does not specify pre-processing and post-processing in the standard itself, which increases its flexibility. The NAL encapsulates the data using the segmentation format of the underlying network, including framing, signalling of logical channels, use of timing information, sequence end signals and so on. For example, the NAL supports the transmission format of video over circuit-switched channels, and supports video over the Internet using the RTP/UDP/IP transmission format. A NAL unit includes its own header information, segment structure information and the actual payload, i.e. the VCL data from the layer above (if data partitioning is used, the data may consist of several parts).
2. High-precision, multi-mode motion estimation: H.264 supports motion vectors with 1/4 or 1/8 pixel precision. A 6-tap filter can be used at 1/4-pixel precision to reduce high-frequency noise, while for 1/8-pixel motion vectors a more complex 8-tap filter can be used. When performing motion estimation, the encoder can also select an "enhanced" interpolation filter to improve the prediction. In H.264 motion prediction, a macroblock (MB) can be divided into different sub-blocks, as shown in Figure 2, giving seven different block-size modes. This flexible, fine-grained partitioning better matches the shapes of actual moving objects and greatly improves the accuracy of motion estimation. In this way, each macroblock can carry 1, 2, 4, 8 or 16 motion vectors. H.264 also allows the encoder to use more than one previous frame for motion estimation, the so-called multi-frame reference technique. For example, with 2 or 3 just-encoded reference frames available, the encoder selects the frame that gives the better prediction for each target macroblock and indicates, for each macroblock, which frame was used for prediction.
3. 4x4 integer transform: H.264 resembles previous standards in applying transform coding to the residual, but the transform uses integer operations rather than real arithmetic; the process is otherwise similar. The advantage of this approach is that the transform and inverse transform reach exactly the same precision in the encoder and the decoder, and simple fixed-point arithmetic can be used; in other words, there is no "inverse transform mismatch". The transform unit is a 4x4 block rather than the conventional 8x8 block. Because the transform block is smaller, moving objects are divided more precisely, so not only is the amount of computation relatively small, but artifacts at the edges of moving objects are also reduced.
To keep the smooth areas of the image, where the small transform blocks cover a larger uniform region, free of visible grey-level differences between blocks, the DC coefficients of the 16 4x4 luma blocks of an intra macroblock (one coefficient per block, 16 in total) are transformed a second time with a 4x4 block transform, and the DC coefficients of the 4 4x4 chroma blocks (one per block, 4 in total) undergo a 2x2 block transform. To improve rate-control capability, H.264 controls the quantization step size so that it grows by roughly 12.5% from step to step rather than by a constant increment (see the sketch at the end of this section). The normalization of the transform-coefficient amplitudes is moved into the inverse quantization process to reduce computational complexity. To preserve colour fidelity, a smaller quantization step is used for the chroma coefficients.
4. Unified VLC: H.264 offers two entropy coding methods. One is a unified VLC (UVLC: Universal VLC) applied to all symbols to be coded; the other is content-adaptive binary arithmetic coding (CABAC: Context-Adaptive Binary Arithmetic Coding). CABAC is optional; its coding performance is slightly better than UVLC, but its computational complexity is also higher. UVLC uses a code-word set of unlimited length with a very regular structure, so different objects can be coded with the same table. This makes code words easy to generate, the decoder can easily identify a code word by its prefix, and UVLC can resynchronize quickly when a bit error occurs.
5. Intra prediction: Earlier H.26x and MPEG-x standards used only inter-frame prediction. In H.264, intra prediction can be used when coding intra pictures: for each 4x4 block (except for specially handled edge blocks), each pixel is predicted as a weighted sum (some weights may be 0) of previously coded pixels, namely the 17 neighbouring pixels above and to the left of the block. This prediction is clearly not temporal but a predictive coding carried out in the spatial domain; it removes the spatial redundancy between adjacent blocks and achieves more efficient compression. As shown in Figure 4, a, b, ..., p are the 16 pixels of the 4x4 block to be predicted, and A, B, ... are the already coded neighbouring pixels. For example, one pixel may be predicted as (J + 2K + L + 2) / 4, or as (A + B + C + D + I + J + K + L) / 8, and so on, depending on the prediction direction selected. For luma there are 9 different prediction modes in total, while chroma intra prediction has far fewer modes.
6. For IP and wireless environments: H.264 includes tools for error resilience so that compressed video can be transmitted over error-prone channels such as mobile channels or IP channels. To defend against transmission errors, time synchronization in an H.264 video stream can be achieved using intra-picture refresh, and spatial synchronization is supported by slice-structured coding. At the same time, to make resynchronization easier after an error, certain resynchronization points are also provided within the video data of a picture. In addition, intra macroblock refresh and multiple reference macroblocks allow the encoder to consider not only coding efficiency but also the characteristics of the transmission channel when deciding macroblock modes.
Besides adjusting the quantization step size to adapt to the channel bit rate, H.264 often uses data partitioning to cope with changes in channel rate. In general, the idea of data partitioning is to generate video data with different priorities in the encoder in order to support quality of service (QoS). For example, with syntax-based data partitioning, each frame's data is divided into several parts, which allows less important information to be discarded when buffers overflow. A similar temporal data-partitioning effect can be achieved by using multiple reference frames in P and B frames. In wireless communication applications, large bit-rate variations of the wireless channel can be accommodated by changing the quantization precision or the spatial/temporal resolution of each frame. In the case of multicast, however, the encoder cannot respond to each receiver's bit rate individually; therefore, unlike the fine granularity scalability (FGS) method used in MPEG-4 (which is rather inefficient), H.264 uses stream-switching SP frames in place of hierarchical coding.
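Two numeric sketches of points made in this section: the forward 4x4 integer core transform (the scaling that the standard folds into quantization is omitted here), and the compounding quantization-step progression, in which each step is roughly 12% larger than the previous one so that the step size doubles every six steps. The example input block is arbitrary.

```python
# 4x4 integer core transform and quantization-step growth, for illustration.

import numpy as np

Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def core_transform(block4x4: np.ndarray) -> np.ndarray:
    """Forward 4x4 integer core transform: Y = Cf . X . Cf^T (integer arithmetic only)."""
    return Cf @ block4x4 @ Cf.T

x = np.arange(16).reshape(4, 4)      # arbitrary residual block
print(core_transform(x))

# Quantization-step progression: ~12% growth per step, doubling every 6 steps.
step = 1.0
for qp in range(13):
    print(qp, round(step, 3))
    step *= 2 ** (1 / 6)
```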
H.264 encoding technology
The target applications of H.264 cover most current video services, such as cable TV, remote monitoring, interactive media, digital TV, video conferencing, video on demand and streaming services. To deal with the differences in network transport across these applications, H.264 defines two layers: the video coding layer (VCL: Video Coding Layer), responsible for the efficient representation of the video content, and the network abstraction layer (NAL: Network Abstraction Layer), responsible for packaging and transmitting the data in a way appropriate to the network (as shown in the overall framework diagram of the standard). Three profiles are defined.
Baseline Profile: uses all H.264 features except B slices, CABAC and interlaced coding. This profile is mainly intended for low-delay real-time applications.
Main Profile: builds on the Baseline feature set and adds B slices, CABAC and interlaced coding. It mainly targets applications where some delay is acceptable and the requirements on compression ratio and quality are high.
Extended Profile (Profile X): supports the Baseline features but not CABAC or macroblock-based adaptive frame/field coding. This profile is mainly aimed at various network video streaming applications.
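These profiles are signalled by the profile_idc field of the sequence parameter set. A minimal lookup for the profiles mentioned above (the High profile, added to the standard later, is included only for context):

```python
# Map profile_idc values from the sequence parameter set to profile names.

PROFILES = {
    66:  "Baseline (no B slices, no CABAC, no interlace tools)",
    77:  "Main (adds B slices, CABAC, interlaced coding)",
    88:  "Extended (streaming-oriented tools, no CABAC)",
    100: "High",
}

def profile_name(profile_idc: int) -> str:
    return PROFILES.get(profile_idc, f"unknown ({profile_idc})")

print(profile_name(77))   # Main (adds B slices, CABAC, interlaced coding)
```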