I. H.264 Overview
H.264 is usually also referred to as H.264/AVC (or H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC).
1. The meaning of the H.264 video codec
H.264 was created to provide a more efficient compression standard than earlier video coding standards: better compression algorithms shrink the storage footprint of video and improve storage and transmission efficiency, while keeping the distortion introduced by compression to a minimum. H.264/MPEG-4 AVC is currently one of the most mainstream coding standards. It mainly defines two things: the encoded representation (syntax) of compressed video data, and the method of decoding that syntax to reconstruct the video. The purpose is to guarantee that compliant encoders and decoders can interoperate, while still allowing manufacturers to develop competitive, innovative products: vendors only need to produce results that conform to the standard, not use the identical internal methods.
2. The theoretical basis of the H.264 codec
Before discussing the H.264 codec itself, a brief word on video compression in general. Video compression is achieved by removing temporal and spatial redundancy. Over a short period of time, adjacent images differ very little in pixel values, luminance, and chrominance, so we do not need to encode every image in full: the first picture of the period (that is, the first frame) is encoded completely, while the images that follow only need to record their differences in pixels, luminance, chrominance, and so on. By removing these different kinds of redundancy the data can be compressed significantly, at the cost of some information loss.
Within the overall video data processing pipeline, the H.264 codec belongs to the codec layer. The specific position of the codec stage can be seen in the codec flow chart: Thinking-In-AV / audio and video codec / audio and video decoding process overview.png. The encoding process can be understood as this flow in reverse.
II. H.264 Related Concepts
1. Basic units of H.264
In the structure defined by H.264, the encoded data of one video image is called a frame. A frame is composed of one or more slices, a slice is composed of one or more macroblocks (MB, the basic coding unit of H.264), and a macroblock is composed of 16x16 YUV data.
2. Frame type
The H.264 protocol defines three frame types, called I frames, B frames, and P frames. The I frame is the fully encoded image frame mentioned above, while B frames and P frames correspond to the frames that are not encoded in full. The difference between them is that a P frame is generated with reference to the preceding I (or P) frame, while a B frame is generated with reference to image frames both before and after it.
During video playback, if an I frame is lost, the following P frames cannot be decoded and the picture may go black; if a P frame is lost, artifacts such as mosaicking appear on the screen.
3. GOP (Group of Pictures)
A GOP (Group of Pictures) is a group of consecutive pictures. A GOP structure is usually described by two numbers: one is the GOP length (that is, the number of B and P frames between two I frames), and the other is the interval between the I/P reference frames (that is, the number of B frames between them). Within a GOP, the I frame is decoded without reference to any other frame; a P frame is decoded with reference to the preceding I or P frame; and a B frame is decoded with reference to the preceding I or P frame and the nearest P frame following it.
Note: the larger the GOP value, the more P and B frames there are, and the more bytes each I, P, and B frame can occupy, so it is easier to obtain good image quality; likewise, the more B frames between reference frames, the easier it is to obtain good image quality. However, improving image quality by increasing the GOP value has limits. When the H.264 encoder encounters a scene change, it automatically inserts an I frame, and at that point the actual GOP is shortened. On the other hand, within one GOP the P and B frames are predicted from the I frame, so when the image quality of the I frame is poor it affects the quality of the subsequent P and B frames in that GOP, and recovery only becomes possible at the next GOP; so the GOP value should not be set too large. At the same time, since P and B frames are more complex to process than I frames, too many of them reduces coding efficiency. In addition, an overly long GOP also hurts the response speed of seek operations: because P and B frames are predicted from earlier I or P frames, a seek that lands directly on some P or B frame must first decode the I frame and the N preceding predicted frames within that GOP. The longer the GOP, the more predicted frames must be decoded, and the longer the seek response time.
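The GOP bookkeeping described above can be sketched as a toy model. This assumes a closed GOP with a fixed repeating pattern; both function names are made up for illustration, and the seek estimate counts only backward references (it ignores the forward reference B frames also need, so the example targets a P frame).

```python
def gop_frame_types(gop_len: int, b_between: int) -> list[str]:
    """Display-order frame types for one closed GOP (toy model).

    gop_len:   total frames in the GOP (the I-to-I distance)
    b_between: number of B frames between consecutive reference
               frames (the I/P interval mentioned above)
    """
    types = []
    for i in range(gop_len):
        if i == 0:
            types.append("I")
        elif i % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types

def frames_to_decode_for_seek(types: list[str], target: int) -> int:
    """Rough count of frames decoded to show frame `target` (0-based,
    display order): every reference (I/P) frame before it, plus the
    target itself. Ignores the forward reference of B frames."""
    refs = sum(1 for t in types[:target] if t in ("I", "P"))
    return refs + 1

pattern = gop_frame_types(12, 2)
print("".join(pattern))                        # IBBPBBPBBPBB
print(frames_to_decode_for_seek(pattern, 6))   # 3: I, P(3), then P(6) itself
```

A longer GOP makes the reference chain longer, which is exactly why the seek response time grows with the GOP value.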
4. IDR frames
The I frames in a GOP are divided into ordinary I frames and IDR frames. The IDR frame is the first I frame of a GOP; the distinction exists to make it easier to control the encoding and decoding flow. An IDR frame must be an I frame, but an I frame is not necessarily an IDR frame.
What the decoder must do upon receiving an IDR frame is update its decoding parameters: the SPS and PPS parameters are refreshed when an IDR frame arrives.
It can be seen that the role of the IDR frame is to make the decoder immediately refresh the relevant parameter information, avoiding the problem of larger decoding errors.
The IDR frame mechanism was introduced for decoder resynchronization. When the decoder decodes an IDR frame, it immediately clears the reference frame queue, outputs or discards all data decoded so far, looks up the parameter sets again, and starts a new sequence. In this way, if a serious error occurred in the previous sequence, there is an opportunity to resynchronize. Frames after an IDR frame never reference data from before the IDR frame.
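The decoder-side behavior described here can be sketched as a toy model (not a real decoder; the class and method names are hypothetical): receiving an IDR clears the reference queue and re-latches the active SPS/PPS, so nothing after the IDR can reference frames before it.

```python
class DecoderState:
    """Toy model of the reference handling described above (not a real
    decoder): an IDR clears the reference queue and starts a new
    sequence, so later frames cannot reference anything before it."""

    def __init__(self):
        self.reference_frames = []   # the reference frame queue
        self.active_sps = None
        self.active_pps = None

    def on_idr(self, frame, sps, pps):
        self.reference_frames.clear()                 # drop everything decoded so far
        self.active_sps, self.active_pps = sps, pps   # re-read the parameter sets
        self.reference_frames.append(frame)

    def on_reference_frame(self, frame):
        """Ordinary I or P frame: just joins the reference queue."""
        self.reference_frames.append(frame)

dec = DecoderState()
dec.on_reference_frame("P0")
dec.on_idr("IDR1", sps="sps0", pps="pps0")
print(dec.reference_frames)  # ['IDR1'] - earlier frames can no longer be referenced
```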
III. H.264 Compression Method
1. H.264 compression algorithm
The core compression algorithms of H.264 are intra-frame compression and inter-frame compression. Intra-frame compression is the algorithm that generates I frames; inter-frame compression is the algorithm that generates B and P frames. The principle of intra-frame compression is: when compressing one frame, only the data of that frame is considered, without considering redundant information between adjacent frames. Since intra-frame compression encodes a complete image, it can be decoded and displayed independently; its compression ratio is generally not high. The principle of inter-frame compression is that the data of adjacent frames is highly correlated; in other words, consecutive frames differ very little. Adjacent frames of continuous video carry redundant information, and by exploiting this feature, removing the redundancy between adjacent frames can further increase the amount of compression.
Inter-frame compression is also referred to as temporal compression: it compresses by comparing data between different frames along the time axis. It compares a frame with its adjacent frames and records only the differences between them, which can greatly reduce the amount of data.
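A toy illustration of the temporal-compression idea, assuming plain per-pixel subtraction (real H.264 inter prediction is motion-compensated, not a simple difference):

```python
# Toy illustration of temporal compression: store the first frame in
# full, then only per-sample differences for the next frame. Real H.264
# inter prediction is motion-compensated, not a plain subtraction.
frame1 = [10, 10, 12, 200, 200, 12, 10, 10]   # "pixels" of frame N
frame2 = [10, 10, 12, 201, 200, 12, 10, 10]   # frame N+1, nearly identical

residual = [b - a for a, b in zip(frame1, frame2)]
print(residual)  # [0, 0, 0, 1, 0, 0, 0, 0] - mostly zeros, cheap to code

nonzero = sum(1 for d in residual if d != 0)
print(f"{nonzero}/{len(residual)} samples changed")  # 1/8 samples changed
```

A residual that is mostly zeros compresses far better than the raw frame, which is the redundancy-removal point made above.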
2. H.264 compression mode description
The specific way H.264 compresses video data is as follows:
a). Grouping: a series of frames with little change is collected into one group, i.e. a GOP;
b). Defining frames: the image frames in each group are classified into the three types I frame, P frame, and B frame;
c). Predicting frames: with the I frame as the base frame, the P frames are predicted from the I frame, and the B frames are then predicted from the I and P frames;
d). Data transmission: finally, the I frame data is stored and transmitted together with the predicted difference information.
IV. H.264 Hierarchical Structure
The main goals of H.264 are a high video compression ratio and good network friendliness. To achieve this, H.264 divides the system framework into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).
1. VCL Layer (Video Coding Layer)
VCL layer: contains the core compression engine and the syntax-level definitions for blocks, macroblocks, and slices; its design goal is efficient coding that is as independent of the network as possible;
2. NAL Layer (Network Abstraction Layer)
NAL layer: responsible for adapting the bit strings produced by the VCL to various networks and diverse environments; it covers all syntax levels above the slice level.
3. NALU (NAL Unit)
An H.264 original code stream (bare stream) is composed of NALUs. The structure is shown below: one NALU = one set of NALU header information corresponding to the video coding + one Raw Byte Sequence Payload (RBSP).
An original H.264 NALU unit usually consists of [StartCode] [NALU Header] [NALU Payload].
3.1 Start Code
The start code indicates the beginning of a NALU unit, and must be "00 00 00 01" (4 bytes) or "00 00 01" (3 bytes).
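A minimal sketch of locating NALUs by their start codes, assuming a plain Annex-B byte stream (the function name is hypothetical, and a real parser would also remove the emulation-prevention bytes inside each payload):

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex-B H.264 stream into NALU payloads. Both the
    3-byte (00 00 01) and 4-byte (00 00 00 01) start codes are
    recognised; the start code itself is not part of the NALU."""
    positions = []  # (payload_start, start_code_start)
    i = 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            # a preceding zero byte means this was a 4-byte start code
            code_start = i - 1 if i > 0 and stream[i - 1] == 0 else i
            positions.append((i + 3, code_start))
            i += 3
        else:
            i += 1
    nalus = []
    for n, (payload, _) in enumerate(positions):
        end = positions[n + 1][1] if n + 1 < len(positions) else len(stream)
        nalus.append(stream[payload:end])
    return nalus

# hypothetical stream: a 4-byte start code + SPS, then a 3-byte code + PPS
data = b"\x00\x00\x00\x01\x67\x64\x00\x1f" + b"\x00\x00\x01\x68\xee\x3c\x80"
print([n.hex() for n in split_annexb(data)])  # ['6764001f', '68ee3c80']
```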
3.2 NAL Header
The NAL header consists of three parts: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits, priority), and nal_unit_type (5 bits).
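The three fields fit in the single byte that follows the start code, so they can be unpacked with a few shifts. A minimal sketch (the field layout and the small type table follow the H.264 spec; the helper function itself is hypothetical):

```python
# A few common nal_unit_type values from the H.264 spec
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
             7: "SPS", 8: "PPS"}

def parse_nal_header(byte: int) -> tuple[int, int, int]:
    """Unpack the one-byte NAL header: forbidden_zero_bit (1 bit),
    nal_ref_idc (2 bits, priority), nal_unit_type (5 bits)."""
    forbidden = (byte >> 7) & 0x01
    ref_idc = (byte >> 5) & 0x03
    nal_type = byte & 0x1F
    return forbidden, ref_idc, nal_type

f, ref, t = parse_nal_header(0x67)  # first byte of a typical SPS NALU
print(f, ref, t, NAL_TYPES.get(t))  # 0 3 7 SPS
```

For example, the bytes 0x67 and 0x68 that commonly begin a stream decode to SPS and PPS NALUs, and 0x65 to an IDR slice.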
3.3 RBSP (Raw Byte Sequence Payload)
The following figure shows a sample RBSP sequence and a description of the related type parameters:
The SPS (sequence parameter set) contains parameters that apply to a whole continuous coded video sequence, such as the identifier seq_parameter_set_id, constraints on the frame number and POC, the number of reference frames, the decoded picture size, and the frame/field coding mode selection.
The PPS (picture parameter set) corresponds to one picture or several pictures in a sequence. Its parameters include the identifier pic_parameter_set_id, a reference to an optional seq_parameter_set_id, the entropy coding mode selection flag, the number of slice groups, the initial quantization parameter, the deblocking filter coefficient adjustment flag, and so on.
A parameter set is an independent data unit that does not depend on syntax elements outside the parameter set. A parameter set does not correspond to any particular picture or sequence: the same sequence parameter set can be referenced by one or more picture parameter sets, and similarly, one picture parameter set can be referenced by one or more pictures. A new parameter set is issued only when the encoder decides the contents of the parameter set need to change.
V. H.264 Limitations
With the rapid development of the digital video application industry chain, video applications are clearly trending in the following directions:
(1) Higher definition: digital video application formats have fully upgraded from 720p to 1080p, and 4K digital video formats have now become common.
(2) Higher frame rate: digital video frame rates are upgrading from 30 fps to 60 fps, 120 fps, or even 240 fps;
(3) Higher compression rate: transmission bandwidth and storage space have always been the most critical resources in video applications, so delivering the best video experience within limited space and bandwidth has always been the unremitting pursuit of users.
However, in the face of the current trend toward high definition, high frame rate, and high compression ratio in video applications, the limitations of the current mainstream video compression standard H.264 are increasingly apparent, mainly in the following respects:
(1) The explosive growth in the number of macroblocks causes macroblock-level parameter information, such as prediction modes, motion vectors, reference frame indices, and quantization levels, to occupy an excessive share of the codewords, significantly reducing the share available for coding the residual data.
(2) Since the resolution is greatly increased, the amount of image content represented by a single macroblock is greatly reduced, which causes the low-frequency coefficients of adjacent 4x4 or 8x8 transforms to be very similar, producing a large amount of redundancy.
(3) Since the resolution is greatly increased, the magnitude of the motion vectors for the same motion increases greatly. H.264 uses a single motion vector predictor and codes the motion vector difference with exponential-Golomb coding, whose characteristic is that the smaller the value, the fewer bits are used. Therefore, as motion vector magnitudes grow, the compression efficiency of H.264's motion vector prediction and coding method gradually declines.
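A short sketch of the exponential-Golomb (Exp-Golomb) coding used for motion vector differences in point (3): small values get short codewords and large values long ones, which is exactly why large motion vectors hurt compression efficiency (helper names are mine; the ue/se mappings follow the H.264 spec).

```python
def ue(n: int) -> str:
    """Unsigned Exp-Golomb codeword ue(v) as a bit string:
    leading zeros followed by the binary form of n + 1."""
    m = n + 1
    return "0" * (m.bit_length() - 1) + format(m, "b")

def se(v: int) -> str:
    """Signed Exp-Golomb se(v), used for motion vector differences:
    positive v maps to 2v-1, non-positive v maps to -2v."""
    return ue(2 * v - 1 if v > 0 else -2 * v)

for mvd in (0, 1, 4, 32, 256):
    print(mvd, se(mvd), f"({len(se(mvd))} bits)")
# small differences get short codes; large magnitudes grow quickly
```

For instance se(0) is a single bit, while se(32) already takes 13 bits, so doubling resolutions (and thus motion vector magnitudes) steadily inflates the motion vector cost.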
(4) Some key algorithms of H.264, such as the two context-based entropy coding methods CAVLC and CABAC and the deblocking filter, must be computed serially and parallelize poorly. For highly parallel computing hardware such as GPUs, DSPs, FPGAs, and ASICs, this serialization in H.264 is increasingly becoming a bottleneck that limits computing performance.
Thus the HEVC (H.265) standard, aimed at higher-definition, higher-frame-rate, higher-compression-ratio video applications, came into being. At the cost of 2 to 4 times the complexity of H.264, its compression efficiency is more than double that of the H.264 standard.