[H.264/AVC Video Codec Technology Explained] 23. Inter-Frame Predictive Coding (1): Basic Principles of Inter Prediction Coding

The video tutorial "H.264/AVC Video Codec Technology Explained" is now available on CSDN. It covers the background, the standard, and the implementation of H.264 in detail, and analyzes and implements the H.264 standard through a hands-on engineering project. You are welcome to watch! As the saying goes, "what is learned on paper always feels shallow; to truly understand a thing, you must practice it yourself." Only by turning the standard document into working code can one gain a thorough understanding of a video compression coding standard.

Tutorial link: H.264/AVC Video Codec Technology Explained. The accompanying code is available on GitHub.

1. Temporal redundancy in video

In the third post of this series we took a first look at the basic structure and framework of H.264 video coding:

[H.264/AVC Video Codec Technology Explained] 3. Introduction to H.264

In H.264, predictive coding is an important component alongside transform/quantization coding and entropy coding, and it has a significant impact on codec performance. Predictive coding consists of two parts: intra prediction and inter prediction. In earlier posts we discussed the basic principles and implementation of intra predictive coding:

[H.264/AVC Video Codec Technology Explained] 16. Basic Principles of Intra Prediction Coding
[H.264/AVC Video Codec Technology Explained] 17. Implementation of Intra Prediction Coding

The most important property of an intra-coded image is that it can be decoded independently, without any reference image. It can therefore serve as the starting point of a GOP and as a random access point, i.e., an IDR frame. On the other hand, the bit rate of intra coding is relatively high, that is, the compression ratio is low.
The reason is that intra coding, in order to guarantee independent decodability (its most critical property), compresses only the spatial redundancy of the image and cannot exploit the correlation between preceding and following frames of the video. Unlike intra coding, inter coding exploits the temporal redundancy of the video. In typical video content, the objects contained in successive frames are related by motion, and this motion relationship constitutes the temporal redundancy between frames. Because the correlation between frames is greater than the correlation between adjacent pixels within one frame, temporal redundancy is more pronounced than spatial redundancy, especially between images that are close in time. The motion relationship of an object between images can be represented by the following figure:

[Figure: motion of an object between frames]

2. Block-based motion estimation and motion compensation

In H.264, temporal redundancy is not removed by simply taking the frame difference between the preceding and the following frame. H.264 is a block-based hybrid coding standard, so inter coding is also carried out on pixel blocks. As with intra coding, the basic unit of inter coding is the macroblock (MB).

Within the overall H.264 process, inter coding can be divided into the following steps:

- predictive coding (including motion estimation and motion compensation);
- transform/quantization coding;
- entropy coding;
- reference frame management.

Of these, transform/quantization coding and entropy coding use the same or similar schemes as intra coding. Predictive coding uses block-based motion estimation (ME) and motion compensation (MC). As the counterpart of intra prediction, this method is also known as inter prediction.
Its position in the H.264 framework is shown below:

[Figure: inter prediction within the H.264 coding framework]

3. Motion estimation

Motion estimation, sometimes called motion search, searches the reference frame for the reference pixel block that corresponds to the current pixel block, such that the final coding cost is minimized. To achieve this, inter coding defines more numerous and more complex partitioning methods than the 16 × 16 and 8 × 8 partitions defined for intra coding.

3.1 Macroblock partitioning for motion estimation

When a macroblock is inter coded, it is partitioned according to one of several predefined methods. For inter prediction, H.264 defines four macroblock partitions and four sub-macroblock partitions:

- Macroblock partitions: 16 × 16, 16 × 8, 8 × 16, 8 × 8;
- Sub-macroblock partitions: 8 × 8, 8 × 4, 4 × 8, 4 × 4.

The macroblock partitions for inter prediction are shown below:

[Figure: macroblock and sub-macroblock partitions for inter prediction]

When a macroblock is partitioned into 8 × 8 blocks, each 8 × 8 sub-macroblock is further partitioned according to one of the sub-macroblock partitioning methods.

3.2 Motion vector

In an inter-coded macroblock, a motion search is performed for each partitioned sub-block to find a pixel block of the same size in the reference frame to use as its reference. The offset between the position of the current pixel block in the current frame and the position of its reference block in the reference frame represents the motion of that block between the two frames. This offset is expressed as a vector of two coordinate values, (MV_x, MV_y), called the motion vector (MV). A macroblock may contain up to 16 MVs. An example of a motion vector is shown in the following figure:

[Figure: example of a motion vector]

In the figure, one pixel block has not moved between the reference frame and the current frame, so its motion vector is (0, 0).
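The motion search described above can be sketched as an exhaustive block-matching loop. This is only an illustration under stated assumptions, not how any particular encoder works: the matching cost is the sum of absolute differences (SAD) rather than a full rate-distortion cost, the search is restricted to integer-pixel positions in a single reference frame, frames are plain lists of rows, and the names `sad` and `full_search` are hypothetical.

```python
import random


def sad(cur, ref, cx, cy, rx, ry, w, h):
    """Sum of absolute differences between the w x h block of `cur` at
    (cx, cy) and the w x h block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))


def full_search(cur, ref, x, y, w, h, search_range):
    """Exhaustive integer-pixel search; returns ((mv_x, mv_y), cost).

    The motion vector is the position of the best reference block minus
    the position (x, y) of the current block.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, x, y, x, y, w, h)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + w > width or ry + h > height:
                continue
            cost = sad(cur, ref, x, y, rx, ry, w, h)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic frames (an assumption for the demo): random texture whose
# content moves 3 pixels right and 2 pixels down between ref and cur.
rng = random.Random(0)
ref_frame = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur_frame = [[ref_frame[(r - 2) % 32][(c - 3) % 32] for c in range(32)]
             for r in range(32)]

# The 8 x 8 block at (8, 8) came from (5, 6) in the reference frame.
print(full_search(cur_frame, ref_frame, 8, 8, 8, 8, 4))
# A block that did not move yields the zero vector.
print(full_search(ref_frame, ref_frame, 8, 8, 8, 8, 4))
```

The second call corresponds to the static block in the figure: the best match is at the same position, so the motion vector is (0, 0).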
Motion vector prediction

An inter-coded macroblock can be split into as many as 16 sub-blocks of 4 × 4 pixels for motion estimation. Encoding and transmitting the actual motion vector of every sub-block would require a large number of bits. To improve coding efficiency, H.264 defines a motion vector prediction method. The motion vector MV of each sub-block is obtained by adding the predicted vector MVP and the motion vector difference MVD. The MVD is parsed from the corresponding syntax element in the bitstream, while the MVP is calculated from information about adjacent pixel blocks.

Because adjacent macroblocks or sub-blocks usually have similar motion and are spatially correlated, the MVP is calculated from the MVs of adjacent pixel blocks. The relationship between the current block and its adjacent blocks is shown in the following figure:

[Figure: the current block and its neighboring blocks A, B, C, and D]

The MVP of the current block is calculated from the MVs of blocks A, B, and C. If block C is not available, block D is used in its place. If the current macroblock is skipped (i.e., no data for it is transmitted), the MVP is calculated as for a macroblock in 16 × 16 mode.

Sub-pixel interpolation of motion vectors

To further improve the accuracy of motion estimation and thus the compression ratio, sub-pixel precision for motion vectors was introduced in video compression standards earlier than H.264, and H.264 inherits and extends it. In H.264, the MV of the luma component has a precision of up to 1/4 pixel, and the MV of the chroma components can have a precision of up to 1/8 pixel. Whether at 1/2, 1/4, or 1/8 precision, these sub-pixel values do not actually exist in the image; they exist only as temporary values during motion estimation.
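As a rough illustration of how a motion vector can be rebuilt from MVP and MVD, the sketch below takes the MVP as the component-wise median of the neighboring MVs A, B, and C. That median rule is the common case in H.264; the special cases (16 × 8 and 8 × 16 partitions, unavailable neighbors, differing reference indices) are deliberately ignored here. The sketch also assumes that luma MVs are stored in quarter-pixel units, so the two low bits of each component select the sub-pixel position. All function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the three neighboring motion vectors
    (simplified: no special cases for partition shape or availability)."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def reconstruct_mv(mvp, mvd):
    """MV = MVP + MVD, applied to each component."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


def split_quarter_pel(mv):
    """Split an MV given in quarter-pixel units into its integer-pixel
    part and its fractional part (0..3 quarter pixels per component)."""
    return (mv[0] >> 2, mv[1] >> 2), (mv[0] & 3, mv[1] & 3)


# Hypothetical neighbor MVs and a decoded MVD, all in quarter-pixel units.
mvp = predict_mv((4, -2), (6, 0), (5, -8))
mv = reconstruct_mv(mvp, (1, -1))
print(mvp, mv, split_quarter_pel(mv))
```

Because only the difference from the prediction is transmitted, a block that moves like its neighbors costs very few bits: its MVD is close to (0, 0).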
Sub-pixel positions at 1/2-pixel precision are shown in the following figure:

[Figure: sub-pixel positions at 1/2-pixel precision]

Sub-pixel positions at 1/4-pixel precision are shown in the following figure:

[Figure: sub-pixel positions at 1/4-pixel precision]

Sub-pixel positions at 1/8-pixel precision are shown in the following figure:

[Figure: sub-pixel positions at 1/8-pixel precision]

As the figures show, sub-pixel values are calculated from neighboring pixel values and represent theoretical intermediate values between the actual pixels. They are obtained as a weighted mean of several pixels; the exact procedure is defined in section 8.4.2.2.1 of the standard document. The interpolation of the luma samples is as follows:

[Figure: integer and fractional sample positions used for luma interpolation]

In the figure above, the uppercase letters A to U denote integer pixels that actually exist in the image, and the lowercase letters denote sub-pixel positions obtained by interpolation. The half-pixel positions b and h, in the horizontal and vertical directions respectively, are calculated as the weighted mean of the six adjacent integer pixels:

    b = ((E - 5 * F + 20 * G + 20 * H - 5 * I + J) + 16) >> 5;
    h = ((A - 5 * C + 20 * G + 20 * M - 5 * R + T) + 16) >> 5;

For the point j at the center of four integer pixels, the calculation is similar to that of b and h, except that the weighted mean is taken over six half-pixel positions in the same direction:

    j = ((cc - 5 * dd + 20 * h1 + 20 * m1 - 5 * ee + ff) + 512) >> 10;

or

    j = ((aa - 5 * bb + 20 * b1 + 20 * s1 - 5 * gg + hh) + 512) >> 10;

The values cc, dd, h1, m1, and so on in these formulas are intermediate sums, calculated with the same weights (1, -5, 20, 20, -5, 1) as half-pixel positions such as b or h but before rounding and shifting. Since the filter is applied twice, the rounding offset and the shift grow to 512 and 10. For the 1/4-pixel positions the calculation is simpler: each value is the average of the adjacent integer-pixel or half-pixel values.
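The formulas above translate almost directly into code. The following is a minimal sketch, assuming 8-bit samples: the clipping of results to the range 0..255 is an assumption added here (the formulas in the text omit it), the quarter-pixel average is assumed to round upward, and the function names are hypothetical.

```python
def clip255(v):
    """Clamp a value to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))


def six_tap(p):
    """Unrounded weighted sum with the weights (1, -5, 20, 20, -5, 1)."""
    return p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5]


def half_pel(p):
    """Half-pixel value between p[2] and p[3], e.g. b from E, F, G, H, I, J."""
    return clip255((six_tap(p) + 16) >> 5)


def center_half_pel(sums):
    """Value of j from six unrounded intermediate sums,
    e.g. cc, dd, h1, m1, ee, ff."""
    return clip255((six_tap(sums) + 512) >> 10)


def quarter_pel(s0, s1):
    """Quarter-pixel value: average of two adjacent integer-pixel or
    half-pixel values, rounded upward (assumed rounding)."""
    return (s0 + s1 + 1) >> 1


# A flat area stays flat; a sharp edge is interpolated between its sides.
row = [10, 10, 10, 200, 200, 200]
b = half_pel(row)
print(half_pel([100] * 6), b, quarter_pel(row[2], b))
```

The weights sum to 32, which is why a single filter pass is normalized by a right shift of 5 and two passes by a right shift of 10.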
4. Fast motion search algorithms

A complete motion search is an extremely time-consuming operation, because:

- the search has to cover every pixel and sub-pixel position in the search area;
- the search has to be performed in multiple reference frames.

To address this problem, researchers have proposed a variety of fast motion search algorithms, designed to reduce the total amount of computation in motion search. Common ones include:

- three-step search;
- diamond search;
- hexagon search.

4.1 Three-step search

Compared with full search, three-step search requires only about 1/10 of the computation, while its performance remains basically the same. Three-step search is illustrated in the following figure:

[Figure: three-step search]

Three-step search proceeds as follows:

- Starting from the center of the search window, search the center point and the 8 surrounding points with a step size of 4, and select the best match, i.e., the point with the minimum SAD;
- Taking the best match from the first step as the center, search the corresponding 9 points with a step size of 2 to obtain the second best match;
- Starting from the second best match, repeat the procedure with a step size of 1 to obtain the final match of the motion search.

4.2 Diamond search

Diamond search uses two templates, a large diamond and a small diamond. The large diamond contains 9 points and the small diamond contains 5 points, as shown below:

[Figure: large and small diamond templates]

Diamond search proceeds as follows:

- Starting from the center of the search window, search the 9 points of the large diamond template and check whether the center point of the diamond is the best match within the large diamond;
- If the best match is the center point of the diamond, continue with the small diamond template;
- If the best match is not the center point of the diamond, continue searching with the large diamond template centered on the actual best match, until the best match is the center point of the large diamond template; then search with the small diamond template.

4.3 Hexagon search

The principle of hexagon search is similar to that of diamond search. The difference is that its large template contains 7 points, while the small hexagon template has the same shape as the small diamond template, as shown in the figure below:

[Figure: large and small hexagon templates]

Hexagon search proceeds as follows:

- Starting from the center of the search window, search the 7 points of the large hexagon template and check whether the center point of the hexagon is the best match within the large hexagon;
- If the best match is the center point of the hexagon, continue with the small hexagon template;
- If the best match is not the center point of the hexagon, continue searching with the large hexagon template centered on the actual best match, until the best match is the center point of the large hexagon template; then search with the small hexagon template.
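Diamond search and hexagon search share the same control flow and differ only in the large template, so both can be sketched with one function. This is a sketch under assumptions: the point offsets below follow the commonly used layouts (the original figures are not reproduced here), the cost is SAD at integer-pixel positions, and `make_sad_cost` and `template_search` are hypothetical names.

```python
LARGE_DIAMOND = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1),
                 (0, 2), (-1, 1), (-2, 0), (-1, -1)]               # 9 points
SMALL_DIAMOND = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]         # 5 points
LARGE_HEXAGON = [(0, 0), (-2, 0), (2, 0), (-1, -2), (1, -2),
                 (-1, 2), (1, 2)]                                  # 7 points


def make_sad_cost(cur, ref, x, y, size, max_range):
    """Build cost(mv): SAD between the size x size block of `cur` at
    (x, y) and the block of `ref` displaced by mv. Candidates outside the
    frame or beyond max_range get infinite cost."""
    height, width = len(ref), len(ref[0])

    def cost(mv):
        rx, ry = x + mv[0], y + mv[1]
        outside = rx < 0 or ry < 0 or rx + size > width or ry + size > height
        if outside or max(abs(mv[0]), abs(mv[1])) > max_range:
            return float("inf")
        return sum(abs(cur[y + j][x + i] - ref[ry + j][rx + i])
                   for j in range(size) for i in range(size))

    return cost


def template_search(cost, large, small, start=(0, 0)):
    """Repeat the large template until its center is the best match,
    then refine once with the small template."""
    center = start
    while True:
        candidates = [(center[0] + dx, center[1] + dy) for dx, dy in large]
        # The center is listed first, so it wins ties and the loop stops.
        best = min(candidates, key=cost)
        if best == center:
            break
        center = best
    candidates = [(center[0] + dx, center[1] + dy) for dx, dy in small]
    return min(candidates, key=cost)


# Toy cost with a single minimum at (5, -3), standing in for a SAD surface.
def toy_cost(mv):
    return abs(mv[0] - 5) + abs(mv[1] + 3)


print(template_search(toy_cost, LARGE_DIAMOND, SMALL_DIAMOND))
print(template_search(toy_cost, LARGE_HEXAGON, SMALL_DIAMOND))
```

Both calls reach the minimum after evaluating only a few dozen candidates. On real images the SAD surface can have several local minima, so a template search may stop at a match that full search would improve on; that is the price of the reduced computation.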