H264 basic principles

Foreword

The H264 video compression algorithm is now undoubtedly the most widely used and most popular of all video compression technologies. With the launch of x264 / OpenH264 and FFmpeg, most users no longer need to study the details of H264, which greatly reduces the cost of using it. But to use H264 well, we still need to understand its basic principles. Today we will take a look at them.

H264 overview

H264 compression mainly uses the following methods to compress video data:

1. Intra-frame prediction, which addresses spatial data redundancy.
2. Inter-frame prediction (motion estimation and compensation), which addresses temporal data redundancy.
3. Integer discrete cosine transform (DCT), which converts spatially correlated data into largely uncorrelated frequency-domain data that is then quantized.
4. CABAC compression.

After compression, frames fall into three types: I frames, P frames, and B frames.

I frame: a key frame, compressed using intra-frame compression.
P frame: a forward reference frame. When compressed, it refers only to frames already processed before it. It uses inter-frame compression.
B frame: a bidirectional reference frame. When compressed, it refers both to frames before it and to frames after it. It uses inter-frame compression.

In addition to I / P / B frames, there is the image sequence, or GOP.

GOP: the frames between two I frames form one image sequence, and each image sequence contains only one I frame. As shown below:

Next, let's describe H264 compression in detail.

H264 compression technology

The basic principles of H264 are actually quite simple. Here is a brief description of how H264 compresses data. The video frames captured by the camera (at 30 frames per second) are sent to the buffer of the H264 encoder. The encoder first divides each picture into macroblocks.

The picture below is an example:

Dividing macroblocks

By default H264 uses a 16x16 area as a macroblock, which can also be divided into 8x8 blocks.
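
To make the division step concrete, here is a minimal Python sketch that cuts one plane of a picture into 16x16 macroblocks. The function name `split_into_macroblocks`, the list-of-lists frame layout, and the synthetic 48x32 frame are assumptions made for illustration; they do not come from any H264 library, and a real encoder would pad pictures whose size is not a multiple of 16.

```python
MB_SIZE = 16  # macroblock edge length in pixels


def split_into_macroblocks(frame, mb_size=MB_SIZE):
    """Yield (mb_row, mb_col, block) for each mb_size x mb_size tile.

    `frame` is a 2-D list of pixel values. This sketch assumes the height
    and width are exact multiples of mb_size.
    """
    height, width = len(frame), len(frame[0])
    for top in range(0, height, mb_size):
        for left in range(0, width, mb_size):
            block = [row[left:left + mb_size] for row in frame[top:top + mb_size]]
            yield top // mb_size, left // mb_size, block


# Synthetic frame, 48 pixels wide and 32 pixels high (hypothetical data).
frame = [[(x + y) % 256 for x in range(48)] for y in range(32)]
blocks = list(split_into_macroblocks(frame))
print(len(blocks))  # 6 macroblocks: 2 rows x 3 columns
```

Each yielded block can then be analysed on its own, which is what the later prediction and transform stages rely on.
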
After the macroblocks are divided, the pixel values of each macroblock are calculated.

In the same way, the pixel values of every macroblock in the image are calculated; the result after all macroblocks are processed is shown below.

Dividing sub-blocks

H264 uses 16x16 macroblocks for relatively flat image areas. For a higher compression ratio, however, a 16x16 macroblock can be divided further into smaller sub-blocks. The sub-block sizes can be 8x16, 16x8, 8x8, 4x8, 8x4, or 4x4, which is very flexible.

In the figure above, most of the 16x16 macroblock in the red frame is blue background, while part of the three eagles also falls inside it. To handle the eagle portions better, H264 divides the 16x16 macroblock into several sub-blocks.

In this way, intra-frame compression can produce more compact data. The figure below shows the result of compressing the macroblock above with MPEG-2 and with H264. The left half is the result of MPEG-2 sub-block division and the right half is the result of H264 sub-block division; the H264 division is clearly more advantageous.

Once the macroblocks are divided, all the images in the H264 encoder buffer can be grouped.

Frame grouping

Video data has two main kinds of redundancy: temporal redundancy and spatial redundancy. Temporal redundancy is the larger of the two. Let's start with temporal redundancy.

Why is temporal redundancy the largest? Suppose the camera captures 30 frames per second. In most cases the data in those 30 frames is correlated. The correlation may also extend beyond 30 frames, to dozens of frames whose data is especially close.

For such closely related frames, we only need to save the data of one frame; the other frames can be predicted from it according to certain rules. That is why video data is most redundant in time.
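
The amount of temporal redundancy is easy to see with a toy example. The Python sketch below subtracts one frame from the next and reports how many pixels actually changed. The helper names (`make_frame`, `frame_difference`, `changed_fraction`) and the 8x8 test frames are hypothetical and only serve the illustration; this is plain frame differencing, not the prediction an H264 encoder performs.

```python
def make_frame(obj_x, obj_y, size=8):
    """Build a size x size frame: flat background (50) with a 2x2 bright object (200)."""
    frame = [[50] * size for _ in range(size)]
    for y in range(obj_y, obj_y + 2):
        for x in range(obj_x, obj_x + 2):
            frame[y][x] = 200
    return frame


def frame_difference(prev, curr):
    """Per-pixel difference curr - prev for two equally sized frames."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def changed_fraction(diff):
    """Fraction of pixels whose value differs between the two frames."""
    total = sum(len(row) for row in diff)
    changed = sum(1 for row in diff for value in row if value != 0)
    return changed / total


prev_frame = make_frame(2, 3)   # object at column 2
curr_frame = make_frame(3, 3)   # same object, moved one pixel to the right
diff = frame_difference(prev_frame, curr_frame)
print(changed_fraction(diff))   # 0.0625, i.e. only 4 of 64 pixels changed
```

Because the difference is almost entirely zeros, storing `prev_frame` plus `diff` carries the same information as storing both frames, which is the intuition behind predicting one frame from another.
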
To compress related frames by prediction, the video frames first need to be grouped. So how do we decide that certain frames are closely related and can form a group? Let's look at an example. Below is a group of captured video frames of a moving billiard ball, which rolls from the upper right corner to the lower left corner.

The H264 encoder takes two adjacent frames at a time, in order, and compares them macroblock by macroblock to calculate how similar the two frames are. As shown below:

Through macroblock scanning and macroblock search, the two frames are found to be highly correlated. Going further, the whole set of frames turns out to be highly correlated, so the frames above can be placed in one group. The rule is: between adjacent pictures, the pixels that differ generally amount to no more than 10% of the points, the luminance difference changes by no more than 2%, and the chrominance difference changes by no more than 1%. Pictures like this can be placed in one group.

Within such a group of frames, after encoding we keep the complete data only for the first frame; the other frames are calculated by referring to the previous frame. We call the first frame the IDR / I frame and the other frames P / B frames. The group of frames encoded this way is called a GOP.

Motion estimation and compensation

After the frames are grouped in the H264 encoder, the motion vectors of the objects within the frame group are calculated. Using the billiard video frames above, let's see how the motion vector is calculated.

The H264 encoder first takes two frames of video data from the head of the buffer, in order, and then performs a macroblock scan. When an object is found in one of the pictures, the encoder searches the neighbourhood of the same position in the other picture (the search window). If the object is found in the other picture, its motion vector can be calculated. The following picture shows the positions of the billiard ball as it moves.
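
The search step described above can also be sketched in code. The Python example below performs an exhaustive block-matching search inside a small search window, using the sum of absolute differences (SAD) as the matching cost. The function names, the 16x16 test frames, the 4x4 "ball", and the search range of 3 are all assumptions chosen for illustration; real encoders typically use faster search strategies and sub-pixel refinement.

```python
def make_frame(ball_x, ball_y, size=16, ball=4):
    """Black frame with a bright ball x ball square whose top-left corner is (ball_x, ball_y)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(ball_y, ball_y + ball):
        for x in range(ball_x, ball_x + ball):
            frame[y][x] = 255
    return frame


def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def find_motion_vector(ref, cur, bx, by, size, search_range):
    """Find the offset (dx, dy) into `ref` that best matches the block at (bx, by) in `cur`."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, bx, by, ref, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# The ball moves from the upper right toward the lower left between frames.
ref_frame = make_frame(10, 2)
cur_frame = make_frame(8, 4)
vector, cost = find_motion_vector(ref_frame, cur_frame, bx=8, by=4, size=4, search_range=3)
print(vector, cost)  # (2, -2) 0
```

The vector (2, -2) says that the block now at (8, 4) came from two pixels to the right and two pixels up in the reference frame, and a cost of 0 means nothing is left to encode for this block beyond the vector itself.
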
From the difference between the ball positions in the figure above, the direction and distance of the ball's movement can be calculated. H264 records, frame by frame, the distance and direction the ball has moved.

After the motion vectors are calculated, the parts that are identical (the green portion) are subtracted, which leaves the compensation data. In the end we only need to compress and store the compensation data, and the original picture can be restored from it when decoding. The compressed compensation data takes up very little space. As follows:

We refer to motion vectors and compensation together as inter-frame compression, which addresses the temporal redundancy of video frames. In addition to inter-frame compression, data must also be compressed within each frame; intra-frame compression addresses spatial redundancy. Let's look at intra-frame compression next.

Intra prediction

The human eye has a limited ability to discern detail in an image: it is very sensitive to low-frequency brightness and much less sensitive to high-frequency brightness. Based on such studies, the data in an image that the human eye is insensitive to can be removed. This led to intra prediction.

The intra-frame compression of H264 is very similar to JPEG. After an image is divided into macroblocks, each macroblock can be predicted in 9 modes, and the prediction mode closest to the original picture is chosen.

The figure below shows the process of predicting every macroblock in the whole picture.

The image after intra prediction compares with the original image as follows:

Then the intra-predicted image is subtracted from the original image to obtain the residual values.

The prediction mode information obtained earlier is stored together with the residual so that the original picture can be restored when decoding. The effect is as follows:

After intra-frame and inter-frame compression, the amount of data is significantly reduced, but there is still room for optimization.
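
As an illustration of "try several modes and keep the closest one", the Python sketch below predicts a 4x4 block from its already-decoded top and left neighbours using three of the nine modes (vertical, horizontal, and DC) and keeps the mode with the smallest sum of absolute differences. The function names and sample pixel values are hypothetical, and choosing purely by SAD is a simplification; real encoders usually also weigh the number of bits each choice costs.

```python
def predict_vertical(top, left):
    """Copy the row of pixels above the block down into every row."""
    return [list(top) for _ in range(4)]


def predict_horizontal(top, left):
    """Copy the column of pixels left of the block across every row."""
    return [[left[y]] * 4 for y in range(4)]


def predict_dc(top, left):
    """Fill the block with the rounded mean of the 8 neighbouring pixels."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


MODES = {
    "vertical": predict_vertical,
    "horizontal": predict_horizontal,
    "dc": predict_dc,
}


def choose_mode(block, top, left):
    """Return (mode_name, residual) for the mode with the lowest SAD."""
    best_name, best_pred, best_cost = None, None, None
    for name, predictor in MODES.items():
        pred = predictor(top, left)
        cost = sum(abs(b - p) for brow, prow in zip(block, pred) for b, p in zip(brow, prow))
        if best_cost is None or cost < best_cost:
            best_name, best_pred, best_cost = name, pred, cost
    residual = [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, best_pred)]
    return best_name, residual


# A block of vertical stripes that simply continues the pixels above it.
block = [[10, 20, 30, 40] for _ in range(4)]
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
mode, residual = choose_mode(block, top, left)
print(mode)  # vertical
```

For this block the vertical mode reproduces the pixels exactly, so the residual is all zeros and only the mode name needs to be stored to rebuild the block.
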
DCT on the residual data

The residual data can be put through an integer discrete cosine transform to remove the correlation in the data and compress it further. As shown in the figure below, the macroblock on the left is the original data and the macroblock on the right is the calculated residual data.

Expressed as numbers, the residual macroblock looks like the following figure:

The residual macroblock then goes through the DCT.

After the correlation is removed, we can see that the data is compressed further.

The DCT alone is still not enough; lossless compression with CABAC follows.

CABAC

The intra-frame compression described above is a lossy compression technique. That is to say, after the image is compressed, it cannot be restored completely. CABAC, by contrast, is a lossless compression technique.

The lossless compression technique most people know best is Huffman coding, which gives short codes to high-frequency symbols and long codes to low-frequency symbols in order to compress data. The VLC used in MPEG-2 is this kind of algorithm. Take A-Z as an example, where A is high-frequency data and Z is low-frequency data, and see how it works.

CABAC likewise gives short codes to high-frequency data and long codes to low-frequency data. In addition, it compresses according to context, which makes it much more efficient than VLC. The effect is as follows:

Now replace A-Z with video frames, and it looks like this.

From the figure above it is clear that lossless compression with CABAC is much more efficient than VLC.

Summary

At this point we have covered the principles of H264 encoding. This article mainly covered the following:

1. A brief introduction to some basic concepts in H264, such as I / P / B frames and GOP.
2. A detailed explanation of the basic principles of H264 encoding, including:
   macroblock division
   image grouping
   the principle of intra-frame compression
   the principle of inter-frame compression
   DCT
   the principle of CABAC compression

I hope this helps. Thanks!