"1. What is H264?
H264 is one of the newest coding formats defined in the MPEG-4 standard (MPEG-4 Part 10) and one of the most technically sophisticated; the standard spelling is H.264.
H.264 video compression is lossy, but it is designed to reduce storage volume and allow fast image transmission over low-bandwidth links while keeping the loss in quality as small as possible.
2, Explanations of some related terms
The picture below shows the H264 code stream.
2.1, VCL & NAL
A raw H264 stream is composed of a sequence of NALUs (NAL units). Functionally, H.264 is divided into two layers: the VCL (Video Coding Layer) and the NAL (Network Abstraction Layer).
VCL: includes the core compression engine and the syntax-level definitions of blocks, macroblocks, and slices; its design goal is efficient coding that is as independent of the network as possible.
NAL: responsible for adapting the bit string generated by the VCL to a variety of networks and environments; it covers all syntax levels above the slice level.
NAL is a data packaging layer developed to adapt to network transmission applications. With traditional video coding algorithms, the resulting video code stream has one uniform format in every application domain (storage, transmission, and so on) and consists only of a video coding layer (VCL). H.264 can instead apply different NAL packaging for different applications, so that the stream suits different networks and transmission errors are reduced.
Before VCL data is transmitted or stored, the encoded data is mapped or encapsulated into NAL units (NALUs).
One NALU = a NALU header carrying information about the video coding + one raw byte sequence payload (RBSP, Raw Byte Sequence Payload)
 
A raw H.264 NALU usually consists of [StartCode] [NALU Header] [NALU Payload], where the start code marks the beginning of a NALU and must be "00 00 00 01" or "00 00 01".
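The start-code framing described above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: the function name `split_annexb` is hypothetical, the whole stream is assumed to fit in memory, and both the 4-byte start code "00 00 00 01" and the shorter 3-byte form "00 00 01" are accepted.

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split a start-code-delimited H.264 byte stream into NALUs.

    Each returned item is [NALU Header][NALU Payload] with the start code removed.
    """
    # Splitting on the 3-byte pattern also matches the 4-byte start code,
    # because "00 00 00 01" ends with "00 00 01".
    chunks = stream.split(b"\x00\x00\x01")
    nalus = []
    # chunks[0] is whatever precedes the first start code (normally empty or zeros).
    for chunk in chunks[1:]:
        # The extra leading zero of a following 4-byte start code ends up at the
        # tail of the previous chunk, so strip trailing zero bytes.
        nalu = chunk.rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
    return nalus


# Made-up example: three tiny NALUs, using both start code lengths.
stream = bytes.fromhex("00000001" "6742" "000001" "68ce" "00000001" "6588")
print([n.hex() for n in split_annexb(stream)])  # ['6742', '68ce', '6588']
```

Stripping trailing zero bytes is safe here because a NALU never ends with a zero byte; the payload ends with a stop bit, as described in section 4.2.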
 
The actual video image data is stored in the VCL-type NAL units.
 
2.2, Slice
A slice = slice header + slice data
The slice is an important concept in H.264. The actual video image data is stored in the VCL-type NAL units, and this part of the code stream is called a slice. A slice contains some or all of the data of one image; in other words, one video frame can be encoded as one or several slices. A slice contains at least one macroblock and at most the data of the entire frame. Different encoder implementations do not necessarily use the same number of slices per frame.
 
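After a slice is encoded it is packaged into one NALU, so a slice corresponds to a NALU (the reverse does not hold: parameter sets such as SPS and PPS are also carried in NALUs).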
 
So why use slices?
The purpose of slices is to limit the spread and propagation of errors, so coded slices should be independent of each other. Prediction in one slice cannot reference macroblocks in other slices, so a prediction error in one slice does not propagate into other slices.
In the figure above, you can see that each image is made up of a number of macroblocks. A video image can be divided into one or more slices, each containing an integer number of macroblocks (MB), with at least one macroblock per slice.
Slice type
I slice: contains only I macroblocks
P slice: contains P and I macroblocks
B slice: contains B and I macroblocks
SP slice: contains P and/or I macroblocks; used for switching between different code streams
SI slice: a special type of coded macroblock
SLICE composition
Each slice generally consists of two parts. The first is the slice header, which stores overall information about the slice (such as the slice type). The second is the slice body, which is usually a run of consecutive macroblock structures (or macroblock skip information).
 
2.3, macroblock
Macroblocks were just mentioned as the content of a slice, so what is a macroblock?
The macroblock is the main carrier of video information. An encoded image is usually divided into multiple macroblocks, which hold the luma and chroma information of every pixel. The main work of video decoding is to recover the pixel arrays of the macroblocks from the code stream efficiently.
A macroblock consists of one 16 × 16 block of luma samples plus one 8 × 8 Cb and one 8 × 8 Cr block of chroma samples.
Macroblock classification
I macroblock: uses already decoded pixels in the current slice as a reference for intra prediction
P macroblock: uses previously encoded images as a reference for inter prediction
B macroblock: uses reference images in both directions (previous and future encoded frames) for inter prediction
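As a small worked example of the macroblock layout, the sketch below counts how many macroblocks are needed to cover a picture and how many samples one macroblock carries. The function names are hypothetical, and it assumes the 16 × 16 luma plus two 8 × 8 chroma layout described above, with pictures padded up to a whole number of macroblocks.

```python
import math

MB_SIZE = 16  # a macroblock covers 16 x 16 luma samples


def macroblock_grid(width: int, height: int) -> tuple[int, int]:
    """Return (macroblocks per row, macroblock rows) needed to cover the picture."""
    return math.ceil(width / MB_SIZE), math.ceil(height / MB_SIZE)


def samples_per_macroblock() -> int:
    """16 x 16 luma samples plus one 8 x 8 Cb block and one 8 x 8 Cr block."""
    luma = 16 * 16
    chroma = 2 * (8 * 8)
    return luma + chroma


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)   # 120 68 8160
print(samples_per_macroblock())  # 384
```

Note that 1080 is not a multiple of 16, so the last macroblock row extends past the visible picture; this sketch only counts macroblocks and does not model how the extra rows are cropped.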
2.4, Frame and field
A frame or a field of video can be used to generate one encoded image, and a frame is usually one complete image. If the video signal is captured with interlaced scanning (odd and even lines), each scanned frame is divided into two parts, each called a [field]; in order, they are the [Top Field] and the [Bottom Field].
Extended reading: why was the concept of the field introduced?
The human eye's flicker threshold for video refresh is about 0.02 seconds, i.e. when a television system refreshes fewer than 50 times per second, the eye can perceive the picture flickering. Conventional systems have lower frame rates: PAL runs at 25 frames per second and NTSC at 30 frames per second, so with progressive scanning flicker would be inevitable when the picture is refreshed. On the other hand, simply raising the frame rate to remove the flicker would increase the bandwidth of the system.
This led to interlaced scanning and the concept of the [field].
In interlaced scanning, each frame contains two fields, and each field contains half of the horizontal lines: the top field contains all odd lines and the bottom field contains all even lines. On a TV display, the electron gun scans every other line: it first draws the odd lines 1, 3, 5, 7, 9 ... (top field), then returns and draws the even lines 2, 4, 6, 8 ... (bottom field), so two scans complete one image. Because of persistence of vision, we perceive the two fields as one complete image. If the frame rate of NTSC video is 30 per second, the field rate is 60 per second, which is above the flicker threshold.
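The split into top and bottom fields can be illustrated with a short sketch. It is a toy model that treats a frame as a plain list of rows; the names `split_fields` and `weave` are hypothetical, and line 1 (index 0) is assumed to belong to the top field, as in the description above.

```python
def split_fields(frame_rows: list) -> tuple[list, list]:
    """Split a frame into (top field, bottom field).

    The top field holds lines 1, 3, 5, ... (indices 0, 2, 4, ...),
    the bottom field holds lines 2, 4, 6, ... (indices 1, 3, 5, ...).
    """
    return frame_rows[0::2], frame_rows[1::2]


def weave(top: list, bottom: list) -> list:
    """Interleave two fields back into a full frame."""
    rows = []
    for top_row, bottom_row in zip(top, bottom):
        rows.append(top_row)
        rows.append(bottom_row)
    if len(top) > len(bottom):  # odd number of lines: last line is a top-field line
        rows.append(top[-1])
    return rows


frame = ["line1", "line2", "line3", "line4", "line5", "line6"]
top, bottom = split_fields(frame)
print(top)     # ['line1', 'line3', 'line5']
print(bottom)  # ['line2', 'line4', 'line6']
```

Weaving the two fields back together reproduces the original frame only when nothing moved between the two scans, which is why the choice between frame and field coding depends on the amount of motion.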
Applicable scope of each coding mode
Frame coding mode: images with little or no motion
Field coding mode: images with motion
 
2.5, I frame, P frame, B frame and PTS / DTS
Frame classification
I frame: intra-coded frame, also called Intra Picture. The I frame is typically the first frame of each GOP (a video compression technique used by MPEG). It is moderately compressed, serves as a random access point, and can be treated as a still image. An I frame can be seen as the compressed product of a single image.
P frame: forward-predicted frame, also called Predictive Frame. It is compressed by removing the temporal redundancy relative to previously encoded frames in the image sequence.
B frame: bidirectionally predicted frame, also called Bi-directional Interpolated Prediction Frame. It is compressed by taking into account the temporal redundancy with both the encoded frames before it and the encoded frames after it in the source image sequence.
I frame: can be decompressed into a single complete picture on its own by the video decompression algorithm;
P frame: needs to reference a preceding I frame or P frame to generate a complete picture;
B frame: needs to reference the preceding I or P frame and a following P frame to generate a complete picture;
PTS / DTS
PTS (Presentation Time Stamp): mainly indicates when a decoded video frame should be displayed
DTS (Decode Time Stamp): mainly indicates when the bit stream read into memory should be sent to the decoder for decoding
 
The difference between DTS and PTS: DTS is mainly used for video decoding, in the decoding phase. PTS is mainly used for synchronization and output of the video, in the display phase. When there are no B frames, the DTS order and the PTS order are the same.
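To see why the two orders differ once B frames are present, here is a simplified sketch. It assumes each B frame depends only on the nearest I or P frame before it and after it in display order, so the later reference frame must be decoded first. The function names and frame labels are hypothetical, and the indices it prints are ordinal positions, not real timestamps.

```python
def decode_order(display_frames: list[str]) -> list[str]:
    """Reorder frames from display order into a plausible decode order."""
    ordered = []
    pending_b = []  # B frames waiting for their later reference frame
    for frame in display_frames:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            ordered.append(frame)      # the I or P frame is decoded first
            ordered.extend(pending_b)  # then the B frames that reference it
            pending_b = []
    ordered.extend(pending_b)  # trailing B frames, if any
    return ordered


def order_table(display_frames: list[str]) -> list[tuple[str, int, int]]:
    """Return (frame, decode index, display index) rows; frame names must be unique."""
    return [
        (name, decode_index, display_frames.index(name))
        for decode_index, name in enumerate(decode_order(display_frames))
    ]


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(decode_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

With a stream that has no B frames, for example `["I0", "P1", "P2"]`, the function returns the input unchanged, which matches the statement that the two orders coincide in that case.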
2.6, GOP
GOP stands for Group of Pictures: a GOP is a set of consecutive pictures. A GOP structure typically has two parameters, M and N: M specifies the distance between an I frame and the following P frame, and N specifies the distance between two I frames. For example, with M = 3 and N = 12 the GOP structure is
I BBP BBP BBP BB I
Increasing the GOP length can effectively reduce the size of the encoded video, but it also reduces the video quality; how to trade one against the other depends on the requirements.
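The relationship between M, N and the frame pattern can be written down directly. The sketch below is only an illustration of the regular pattern described above; the function name `gop_pattern` is hypothetical, and real encoders are free to deviate from this fixed layout.

```python
def gop_pattern(m: int, n: int) -> str:
    """Build the frame-type pattern of one GOP in display order.

    m: distance between consecutive reference frames (I or P)
    n: distance between two I frames, i.e. the GOP length
    """
    frames = []
    for position in range(n):
        if position == 0:
            frames.append("I")  # each GOP starts with an I frame
        elif position % m == 0:
            frames.append("P")  # every m-th frame is a P frame
        else:
            frames.append("B")  # everything in between is a B frame
    return "".join(frames)


print(gop_pattern(3, 12))  # IBBPBBPBBPBB
print(gop_pattern(1, 4))   # IPPP
```

The first result is the "I BBP BBP BBP BB" group from the example, after which the next I frame starts a new GOP.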
2.7, IDR
The first frame of a sequence is called an IDR frame (Instantaneous Decoding Refresh).
I frames and IDR frames both use intra prediction and are essentially the same thing. For convenience in encoding and decoding, the first I frame of a video sequence is distinguished from the other I frames and called an IDR frame, which makes it easier to control the encoding and decoding flow.
The role of the IDR frame is to refresh the decoder immediately so that errors do not propagate: starting from the IDR frame, a new sequence is started and encoding begins afresh.
Core role
H.264 introduces IDR frames so that the decoder can resynchronize. When the decoder decodes an IDR frame, it immediately clears the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence. Thus, if a major error occurred in the previous sequence, there is a chance to resynchronize here. Frames after an IDR frame never use data from images before the IDR frame for decoding.
3, H264 code stream hierarchical structure
 
As shown in the figure above, syntax elements in H264 are organized into five levels: sequence, image (frame), slice, macroblock, and sub-macroblock.
This hierarchical structure of syntax elements helps to save bits in the code stream. For example, the slices of one image often share a lot of identical data; if every slice carried that data, bits would be wasted. It is more efficient to extract the information common to the whole image into image-level syntax elements, so that each slice carries only its own slice-level syntax elements.
4, NALU Header & RBSP structure
 
As shown above: NALU = NALU Header + RBSP
4.1, Nalu HEADER
As mentioned earlier, each NALU consists of a one-byte header and an RBSP (Raw Byte Sequence Payload).
The NALU header consists of three parts: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), which represents the priority, and nal_unit_type (5 bits), which represents the type of the NALU.
forbidden_zero_bit
1 bit; H264 stipulates that this bit must be 0
nal_ref_idc
Indicates the importance of the current NALU; the greater the value, the more important the NALU
When the decoder cannot keep up with decoding, NALUs with an importance of 0 can be dropped.
When nal_ref_idc is not equal to 0, the content of the NAL unit may be an SPS, a PPS, or a slice of a reference frame. When nal_ref_idc is equal to 0, the content of the NAL unit may be a slice of a non-reference image; in that case all slices of that image should have nal_ref_idc equal to 0.
nal_unit_type
Depending on whether they contain VCL-layer coded data, NAL units are divided by nal_unit_type into VCL NAL units and non-VCL NAL units.
VCL NAL units contain the coded output of the VCL layer, while non-VCL NAL units do not.
nal_unit_type 1 ~ 5: VCL NAL units
nal_unit_type others (SPS / PPS etc.): non-VCL NAL units
All values are as follows:
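The one-byte header layout from section 4.1 can be unpacked with a few bit operations. The sketch below is an illustration only: the names `NaluHeader`, `parse_nalu_header`, and `is_vcl` are hypothetical, and the dictionary lists just a handful of commonly seen nal_unit_type values rather than the full table.

```python
from typing import NamedTuple

# A few commonly seen nal_unit_type values (not the complete table).
NAL_UNIT_TYPE_NAMES = {
    1: "coded slice of a non-IDR picture",
    5: "coded slice of an IDR picture",
    6: "SEI (supplemental enhancement information)",
    7: "SPS (sequence parameter set)",
    8: "PPS (picture parameter set)",
    9: "access unit delimiter",
}


class NaluHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, must be 0
    nal_ref_idc: int         # 2 bits, importance / priority
    nal_unit_type: int       # 5 bits, type of the NALU


def parse_nalu_header(first_byte: int) -> NaluHeader:
    """Unpack the first byte of a NALU into its three fields."""
    return NaluHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1 to 5 carry VCL data; the other types are non-VCL."""
    return 1 <= nal_unit_type <= 5


for byte in (0x67, 0x68, 0x65, 0x41):
    header = parse_nalu_header(byte)
    name = NAL_UNIT_TYPE_NAMES.get(header.nal_unit_type, "other")
    print(f"0x{byte:02x}", header.nal_ref_idc, header.nal_unit_type, name, is_vcl(header.nal_unit_type))
```

For example, 0x67 is 0110 0111 in binary, which gives forbidden_zero_bit 0, nal_ref_idc 3, and nal_unit_type 7 (an SPS), while 0x65 gives nal_unit_type 5 (a slice of an IDR picture).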
 
4.2, RBSP
 
The above figure is an RBSP sequence example
 
The above figure is a description of RBSP
SODB and RBSP
SODB (String Of Data Bits) -> the raw encoded data. RBSP (Raw Byte Sequence Payload) -> the SODB with trailing bits appended: one "1" bit followed by as many "0" bits as needed to make the data byte aligned. "
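The trailing-bit rule can be shown with a short sketch that works on bit strings. The function names are hypothetical, bits are represented as a Python string of "0" and "1" characters purely for readability, and the sketch ignores the emulation-prevention bytes that a real encoder inserts when it places an RBSP into a NALU.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Append the stop bit and alignment zeros to an SODB bit string."""
    bits = sodb_bits + "1"          # one "1" stop bit
    bits += "0" * (-len(bits) % 8)  # pad with "0" bits up to a byte boundary
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def rbsp_to_sodb(rbsp: bytes) -> str:
    """Recover the SODB bit string by cutting at the last "1" bit."""
    bits = "".join(f"{byte:08b}" for byte in rbsp)
    return bits[: bits.rindex("1")]


rbsp = sodb_to_rbsp("10110")
print(rbsp.hex())          # b4  (10110 + 1 + 00)
print(rbsp_to_sodb(rbsp))  # 10110
```

When the SODB already ends on a byte boundary, the stop bit still has to be added, so the RBSP grows by a full byte: `sodb_to_rbsp("11111111")` yields the two bytes ff 80.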
			
			
			
			
			