"H.264 / AVC Video Code Technology Detailed" video tutorial has been on "CSDN", which details the background, standard protocol and implementation of H.264, and through a practical engineering form to H.264 Standard analysis and implementation, welcome to watch!
"The paper is very shallow, perceived this matter", only the standard document is operated in the form of code, in order to have a sufficient understanding and understanding of the video compression coding standards!
Link address: H.264 / AVC video codec technology detailed
This section video free
First, the Interaction Between Humans and the World
Ever since their origin in ancient times, human beings have struggled constantly to adapt to and transform their environment. The most basic prerequisite for doing so is using the senses to obtain information from the outside world. Through the various senses, humans can exchange many kinds of information with their environment, for example:
Smell: identifying various odors, detecting changes in the environment and the quality of food and drinking water;
Hearing: recognizing danger signals, communicating, detecting natural enemies, and so on;
Taste: picking the most suitable food;
Touch: essential when making and using tools.
In addition, the most important sense is naturally vision. According to statistics, vision accounts for more than 70% of all human sensory input, and it gives people the most direct perception of changes in the environment.
In the development of civilization, people were not satisfied with recording what they saw only orally; they hoped to record it in more intuitive forms. After years of development, video has become the most efficient way of recording and reproducing information: it can convey a large amount of information in a relatively short period of time.
Video expresses information through the image of each frame; the audio contained in a video can carry a large amount of additional information; and the motion of objects and the changes of scene between frames provide information as well.
In summary, video conveys information in the form closest to people's direct experience.
Second, the Representation of Video Signals: RGB and YUV
Imaging in the real world, as well as early video processing, used analog signals. However, to fit modern computer, network transmission, and digital video processing systems, the analog video signal must be converted into a digital format.
In the digital format of a video signal, the basic structure is a sequence of associated frames. Each frame is composed of tightly packed pixels, and each pixel represents one colored point in the image. To give pixels their color, each pixel is built from components corresponding to the three primary colors:
R: the red component;
G: the green component;
B: the blue component.
Representing color images with this method is known as the RGB color space, which is often used in display systems. In this form, each color component of each pixel is represented by one byte, so 256 × 256 × 256 different colors can be expressed. Among common image formats, the bitmap (BMP) format saves its data in RGB form.
In actual video processing such as encoding and decoding, the YUV format is more common than RGB. In YUV, a pixel is represented by luminance and chrominance components: each pixel consists of one luma component Y and two chroma components U/V. The chroma components may correspond one-to-one with the luma component, or they may be subsampled so that there are fewer chroma samples than luma samples.
This works because the human visual system is far more sensitive to luminance than to chrominance. The greatest advantage of YUV is therefore that the sampling rate of the chroma components can be reduced appropriately without any significant impact on perceived image quality. Moreover, this representation is compatible with both black-and-white and color display devices: a black-and-white device simply displays the luma component alone.
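To make the relationship between the two color spaces concrete, here is a minimal sketch of a per-pixel RGB-to-YUV conversion, assuming the full-range BT.601 coefficients (the JPEG/JFIF variant; broadcast video uses a limited-range version with slightly different scaling). The function name is illustrative, not from any codec API.

```python
def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YUV (YCbCr).

    Y carries brightness; U and V carry color differences, offset by 128
    so they fit in an unsigned byte.
    """
    def clamp(x):
        # Keep each component inside the valid 8-bit range.
        return max(0, min(255, int(round(x))))

    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return (clamp(y), clamp(u), clamp(v))
```

For a neutral gray pixel (equal R, G, B), both chroma components come out at the midpoint 128, which is exactly why a black-and-white device can ignore them and still show a correct image.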
The common chroma subsampling modes in YUV are 4:4:4 (no subsampling), 4:2:2 (chroma sampled at half the horizontal resolution), and 4:2:0 (chroma sampled at half the resolution both horizontally and vertically), as shown below:
Third, Video Compression Coding
The concept of coding is widely used in communication and information processing. Its basic principle is to represent and transmit information as a code stream built according to certain rules. Commonly coded information includes text, voice, video, and control information.
1. Why video needs to be coded
For video data, the most important purpose of coding is data compression. The raw pixel representation of moving images produces an enormous amount of data, far beyond what available storage space and transmission bandwidth can accommodate. For example, if the three color components of each pixel each require one byte, then each pixel needs at least 3 bytes, and a single image at 1280 × 720 resolution occupies about 2.76 MB.
If video of this resolution runs at 25 frames per second, the bit rate required for transmission reaches about 553 Mbit/s! For higher-definition video, such as 1080p, 4K, or 8K, the required rate is even more staggering. A data volume of this size is unbearable for both storage and transmission, so compressing video data is inevitable.
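The arithmetic behind that figure can be checked in a couple of lines (function name is illustrative; 3 bytes per pixel assumes uncompressed RGB or 4:4:4 YUV):

```python
def raw_bitrate_mbps(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video bit rate in megabits per second (1 Mb = 10**6 bits)."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6
```

1280 × 720 × 3 bytes × 8 bits × 25 fps works out to 552.96 Mbit/s, matching the roughly 553 Mbit/s quoted above.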
2. Why video information can be compressed
Video information leaves large room for compression because it contains a great deal of data redundancy. The main types are:
Temporal redundancy: the content of adjacent frames is similar, and related by motion;
Spatial redundancy: neighboring pixels within a frame are similar;
Coding (statistical) redundancy: different symbols in the video data occur with different probabilities;
Visual redundancy: the viewer's visual system is not equally sensitive to all parts of the video.
For each of these types of redundancy, the various video coding standards include dedicated techniques that improve the compression ratio from different angles.
3. Video coding standardization organizations
There are two main organizations engaged in standardizing video coding algorithms: ITU-T and ISO.
ITU-T is the International Telecommunication Union – Telecommunication Standardization Sector. Its VCEG (Video Coding Experts Group) is mainly responsible for standards oriented toward real-time communication, and has developed standards such as H.261, H.263, H.263+, and H.263++.
ISO is the International Organization for Standardization. Its MPEG (Moving Picture Experts Group) is mainly responsible for video storage, broadcasting, and network transmission, and has developed standards such as MPEG-1 and MPEG-4.
In fact, the standards with the strongest influence in industry were produced jointly by the two organizations, for example MPEG-2, H.264/AVC, and H.265/HEVC.
The timeline of the video coding standards developed by the different standardization organizations is shown below:
Fourth, the Basic Techniques of Video Compression Coding
To handle the various kinds of redundancy in video information, video compression coding uses multiple techniques to improve the compression ratio. Common ones include predictive coding, transform coding, and entropy coding.
1. Predictive Coding
Predictive coding removes redundancy in the temporal and spatial domains of video. Predictive coding in video processing is divided into two categories: intra prediction and inter prediction.
Intra prediction: the predicted values come from within the same frame, eliminating the spatial redundancy of the image. Intra prediction yields a relatively low compression ratio, but the frame can be decoded independently, without relying on other frames. The key frames of ordinary video are coded with intra prediction.
Inter prediction: the actual values belong to the current frame while the predicted values come from a reference frame, eliminating temporal redundancy. Inter prediction achieves a higher compression ratio than intra prediction, but the frame cannot be decoded independently: the current frame can only be reconstructed after the reference frame data is available.
Typically, in a video stream, I frames use only intra coding, while the data in P and B frames may use either intra or inter coding.
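The core of inter prediction is motion estimation: for each block of the current frame, the encoder searches a window in the reference frame for the best-matching block and transmits only the motion vector plus the (small) residual. A minimal full-search sketch using the sum of absolute differences (SAD) as the matching cost is shown below; function names and the tiny search range are illustrative only, not taken from any real codec.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def full_search(ref_frame, cur_block, top, left, search_range=2):
    """Exhaustive block matching.

    Searches +/- search_range pixels around (top, left) in the reference
    frame and returns the motion vector (dy, dx) with the lowest SAD,
    together with that SAD value.
    """
    n = len(cur_block)
    h, w = len(ref_frame), len(ref_frame[0])
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate block falls outside the frame
            candidate = [row[x:x + n] for row in ref_frame[y:y + n]]
            cost = sad(cur_block, candidate)
            if cost < best_sad:
                best_sad, best_mv = cost, (dy, dx)
    return best_mv, best_sad
```

Real encoders replace the exhaustive scan with fast search patterns and sub-pixel refinement, but the objective, minimizing a residual cost over candidate displacements, is the same.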
2. Transform Coding
Current mainstream video coding algorithms are all lossy: by tolerating a limited loss of video quality, they obtain much higher coding efficiency. The information loss occurs in the quantization of the transform coefficients. Before quantization, the image information must first be converted from the spatial domain to the frequency domain, and its transform coefficients computed for subsequent coding.
Video coding algorithms typically use orthogonal transforms. Commonly used orthogonal transforms include the discrete cosine transform (DCT), the discrete sine transform (DST), the Karhunen–Loève transform (KLT), and so on.
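As an illustration of the spatial-to-frequency conversion, here is a naive (O(N⁴), textbook-style) 2-D DCT-II on an 8×8 block, the block size used by classic image and video codecs. This is a reference sketch for clarity, not the optimized integer transform that standards like H.264 actually specify.

```python
import math

def dct2_8x8(block):
    """Naive 2-D DCT-II (orthonormal scaling) of an 8x8 block.

    out[0][0] is the DC coefficient (average energy); the remaining
    coefficients capture progressively higher spatial frequencies.
    """
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = cu * cv * s
    return out
```

For a perfectly flat block, all the energy lands in the single DC coefficient and every AC coefficient is zero; for natural image blocks most energy concentrates in a few low-frequency coefficients, which is exactly what makes quantization after the transform so effective.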
3. Entropy Coding
Entropy coding in video coding is mainly used to eliminate statistical redundancy. Because the symbols emitted by a source occur with unequal probabilities, representing every symbol with the same code length wastes bits. Entropy coding assigns codewords of different lengths to different syntax elements, effectively eliminating the redundancy caused by these unequal symbol probabilities.
The entropy coding methods commonly used in video coding algorithms are variable-length coding and arithmetic coding; specifically, the main ones are context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC).
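A simple variable-length code that H.264 really does use (for many header-level syntax elements, via the ue(v) descriptor) is the unsigned Exp-Golomb code: small, frequent values get short codewords. A minimal sketch, with codewords represented as bit strings for readability (function names are my own):

```python
def exp_golomb_encode(n):
    """Unsigned Exp-Golomb ue(v) codeword for n >= 0, as a bit string.

    Encode n + 1 in binary, then prefix it with as many zeros as there
    are bits after the leading one. Smaller n => shorter codeword.
    """
    assert n >= 0
    bits = bin(n + 1)[2:]          # binary of n + 1, without the '0b'
    prefix = "0" * (len(bits) - 1)  # leading-zero prefix signals the length
    return prefix + bits

def exp_golomb_decode(bits):
    """Decode one ue(v) codeword from the front of a bit string."""
    m = 0
    while bits[m] == "0":           # count the leading zeros
        m += 1
    return int(bits[m:2 * m + 1], 2) - 1
```

So 0 encodes as "1", 1 as "010", 2 as "011", 3 as "00100", and so on; no codeword is a prefix of another, which lets the decoder split the bitstream without any delimiters.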