H.264 format analysis

"I.H.264 basic flow structure The structure of the Elementary Stream, ES is divided into two layers, including video encoding layers (VCLs) and network adaptation layers (NAL). The video coding layer is responsible for efficient video content, and the network adaptation layer is responsible for packaging and transmitting data in the appropriate way of the network. The benefits of introducing NAL and bringing the VCL separation include two aspects: 1, so that signal processing and network transmission separation, VCL, and NAL can be implemented on different processing platforms; 2, VCL and NAL separation design, making it in different In the network environment, the gateway does not require reconstruction and reconstruction of the VCL bitstream because the network environment is different. ☆ VCL (Video Coding Layer): VCL is the definition of the grammatical level of the core algorithm engine, block, macroblock, and the film, and he finally outputs the data SODB ☆ NAL (NET Abstract Layer): NAL packets SODB into RBSP and add NAL heads to form a NALU (NAL unit) A typical NALU as shown below: ☆ SODB (SODB (STRING OF DATA BITS): The original data bitstream, the length is not necessarily 8 times, so it needs to be completed Be ☆ RBSP (Raw Byte Sequence Payload): Original data byte stream, SODB + RBSP TRAILING BITS = RBSP, adding trailing bits to make an RBSP as the entirety The basic stream of H.264 is composed of a series of NETWORK ABSTRACTION Layer Unit, and different NALU data is different. The draft of H.264 pointed out that when the data stream is stored in the medium, start codes are added before each NALU: 0x000001 or 0x00000001 to indicate the start and termination position of a NALU. Under such a mechanism, the start code is detected in the code stream as a NALU to start identification, and when the next start code is detected, the current NALU ends. 
The first 3 or 4 bytes of each frame in an H.264 stream are the H.264 start code (start_code), 0x00000001 or 0x000001. The 3-byte form 0x000001 is used in one case only: when a complete frame is coded as multiple slices, the NALUs that carry the second and later slices use the 3-byte start code. In other words, a NALU whose slice begins a frame uses 0x00000001; otherwise it uses 0x000001. This can be seen in both the "ITU-T H.264 Recommendation" and the x264 source code. Part of the x264 source follows.

    // Process each NALU
    for( int i = start; i < h->out.i_nal; i++ )
    {
        int old_payload_len = h->out.nal[i].i_payload;
        // Use the long start code for the first NALU, for SPS and PPS, and for AVC-Intra
        h->out.nal[i].b_long_startcode = !i ||
                                         h->out.nal[i].i_type == NAL_SPS ||
                                         h->out.nal[i].i_type == NAL_PPS ||
                                         h->param.i_avcintra_class;
        // Add the start code
        x264_nal_encode( h, nal_buffer, &h->out.nal[i] );
        nal_buffer += h->out.nal[i].i_payload;
        if( h->param.i_avcintra_class )
        {
            h->out.nal[i].i_padding -= h->out.nal[i].i_payload - (old_payload_len + NALU_OVERHEAD);
            if( h->out.nal[i].i_padding > 0 )
            {
                memset( nal_buffer, 0, h->out.nal[i].i_padding );
                nal_buffer += h->out.nal[i].i_padding;
                h->out.nal[i].i_payload += h->out.nal[i].i_padding;
            }
            h->out.nal[i].i_padding = X264_MAX( h->out.nal[i].i_padding, 0 );
        }
    }

In this code, b_long_startcode decides whether the long start code, that is, the 4-byte 0x00000001, is written before the NALU. The x264_nal_encode function is then called to add the start code.
    // Add the start code
    void x264_nal_encode( x264_t *h, uint8_t *dst, x264_nal_t *nal )
    {
        uint8_t *src = nal->p_payload;
        uint8_t *end = nal->p_payload + nal->i_payload;
        uint8_t *orig_dst = dst;

        // Annex B format: the start code is 0x000001 or 0x00000001
        if( h->param.b_annexb )
        {
            if( nal->b_long_startcode )
                *dst++ = 0x00;
            *dst++ = 0x00;
            *dst++ = 0x00;
            *dst++ = 0x01;
        }
        else /* save room for size later */
            dst += 4; // MP4 format
        ................
    }

II. NAL header structure

The NAL header structure is shown below:
[Figure: NAL header layout]

Length: 1 byte, made up of forbidden_bit (1 bit) + nal_reference_bit (2 bits) + nal_unit_type (5 bits).

☆ F (forbidden_zero_bit): 1 bit, initially 0. When the network detects a bit error in the unit, it may set this bit to 1 so that the receiver discards the unit.
☆ NRI (nal_ref_idc): 2 bits, indicating the importance of the NALU. The larger the value, the more important the NALU. How the individual values greater than 0 are to be used is not clearly specified.
☆ Type (nal_unit_type): 5 bits, giving the NALU type.

    nal_unit_type   NALU type                                        nal_reference_bit
    0               Unspecified                                      0
    1               Slice of a non-IDR picture                       Non-zero if the slice belongs to a reference frame, 0 otherwise
    2               Slice data partition A                           Same as above
    3               Slice data partition B                           Same as above
    4               Slice data partition C                           Same as above
    5               Slice of an IDR picture                          Non-zero
    6               Supplemental enhancement information (SEI)       0
    7               Sequence parameter set (SPS)                     Non-zero
    8               Picture parameter set (PPS)                      Non-zero
    9               Access unit delimiter                            0
    10              End of sequence                                  0
    11              End of stream                                    0
    12              Filler data                                      0
    13..23          Reserved (some assigned in later editions)       0
    24..31          Unspecified                                      0

nal_unit_type = 5: the current NAL is a slice of an IDR picture. In that case every slice of the IDR picture must have nal_unit_type equal to 5. Note that an IDR picture cannot use data partitioning.
nal_unit_type = 7 or 8: each SPS or PPS corresponds to exactly one NALU.
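The one-byte header layout above maps directly onto shifts and masks. The following is a minimal sketch for illustration only: the struct `nal_header_t` and the functions `parse_nal_header` and `nal_unit_type_name` are hypothetical names, not part of x264 or of the standard, and the name lookup covers only the types listed in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct for illustration; one member per header field. */
typedef struct {
    uint8_t forbidden_zero_bit; /* 1 bit,  bit 7     */
    uint8_t nal_ref_idc;        /* 2 bits, bits 6..5 */
    uint8_t nal_unit_type;      /* 5 bits, bits 4..0 */
} nal_header_t;

/* Splits the first byte of a NALU into its three fields. */
nal_header_t parse_nal_header(uint8_t b)
{
    nal_header_t h;
    h.forbidden_zero_bit = (uint8_t)((b >> 7) & 0x01);
    h.nal_ref_idc        = (uint8_t)((b >> 5) & 0x03);
    h.nal_unit_type      = (uint8_t)(b & 0x1F);
    return h;
}

/* Returns a short label for the types named in the table above. */
const char *nal_unit_type_name(uint8_t type)
{
    assert(type <= 31); /* the field is only 5 bits wide */
    switch (type) {
    case 1:  return "slice of a non-IDR picture";
    case 2:  return "slice data partition A";
    case 3:  return "slice data partition B";
    case 4:  return "slice data partition C";
    case 5:  return "slice of an IDR picture";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    case 9:  return "access unit delimiter";
    case 10: return "end of sequence";
    case 11: return "end of stream";
    case 12: return "filler data";
    default: return "unspecified or reserved";
    }
}
```

For a header byte of 0x67 (binary 0110 0111), `parse_nal_header` yields forbidden_zero_bit 0, nal_ref_idc 3, and nal_unit_type 7, which `nal_unit_type_name` labels as an SPS.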
The corresponding RBSP data types are shown in the following table:

    RBSP type                              Abbreviation   Description
    Parameter set                          PS             Global information for the sequence, covering the SPS and PPS, such as picture size and video format
    Supplemental enhancement information   SEI            Enhancement information for decoding the video sequence
    Picture delimiter                      PD             Boundary between video pictures
    Coded slice                            SLICE          Header information and data of a coded slice
    Data partition                         DP             Slice-layer data split into partitions, used for error-resilient decoding
    End of sequence                                       Indicates the end of a sequence; the next picture is an IDR picture
    End of stream                                         Indicates that there are no more pictures in the stream
    Filler data                                           Dummy data used as padding bytes

Sequence and picture parameter sets: these reduce the transmission of repeated parameters. Each VCL NAL unit contains an identifier that points to the relevant picture parameter set, and each picture parameter set contains an identifier that points to the relevant sequence parameter set. A small amount of pointer information thus references a large number of parameters, which greatly reduces the information that each VCL NAL unit would otherwise have to repeat.

Data partitioning: the coded data of a slice is stored in three separate data partitions (DP A, B, and C), each containing a subset of the coded slice. Partition A contains the slice header and the header data of each macroblock in the slice. Partition B contains the coded residual data of intra and SI slice macroblocks. Partition C contains the coded residual data of inter macroblocks. Each partition can be placed in its own NAL unit and transmitted independently.

III. Frame, Field, Slice, and Macroblock

☆ Frame: when a video signal is sampled with progressive scanning, the resulting signal is a frame. Typical frame rates are 25 frames per second (PAL) and 30 frames per second (NTSC). At the macro level, the SPS, PPS, IDR frames (each containing one or more I slices), P frames (one or more P slices), and B frames (one or more B slices) together make up a typical H.264 stream.
☆ Field: when a video signal is sampled with interlaced scanning (odd and even lines), one frame is divided into two fields (each pass, over the odd lines or over the even lines, is called a field). Typical field rates are 50 Hz (PAL) and 60 Hz (NTSC).
☆ Slice: a frame can be coded as one or more slices, each containing an integer number of macroblocks: at least one macroblock per slice, and at most all the macroblocks of the picture. The purpose of slices is to limit the spread and propagation of bit errors by keeping coded slices independent of one another. There are 5 slice types: I slices (I macroblocks only), P slices (P and I macroblocks), B slices (B and I macroblocks), SP slices (for switching between different coded streams), and SI slices (a special type of intra-coded macroblock). The syntax structure of a slice is shown below:
[Figure: slice syntax structure]
☆ Macroblock: a picture to be coded is first divided into blocks (4x4 pixels) for processing. A macroblock must consist of a whole number of blocks, and the usual macroblock size is 16x16 pixels. Macroblocks are classified as I, P, and B macroblocks. An I macroblock can only use already decoded pixels in the current slice as a reference for intra prediction. A P macroblock can use previously decoded pictures as reference pictures for inter prediction. A B macroblock uses both forward and backward reference pictures for inter prediction.

Pictures are organized in sequences, and a picture is often referred to as a frame. The relationship among frames, slices, and macroblocks is shown below:
[Figure: frame, slice, and macroblock hierarchy]
When a frame contains multiple slices, the layout is as shown below:
[Figure: a frame containing multiple slices]
The relationship among frames, slices, and parameter sets is shown in the following figure:
[Figure: frames, slices, and parameter sets]

If data partitioning (DP) is not used, one slice is one NALU and one NALU is one slice. Otherwise a slice consists of three NALUs, namely DPA, DPB, and DPC, whose nal_unit_type values are 2, 3, and 4 respectively.

Since one frame may be coded as multiple slices, the decoder has to make sure the frame is complete when decoding.
For example, an IDR frame may be split into multiple IDR slices, so a complete IDR frame is obtained by extracting from the stream the consecutively stored NALUs whose nal_unit_type equals 5. This is really a matter of frame boundary detection. H.264 calls the set of all NALUs that make up one frame an AU (Access Unit), so detecting frame boundaries amounts to identifying AUs. Because H.264 removed the frame-level syntax, AUs cannot simply be read off the stream. The decoder can only decide whether a frame has ended from a combination of certain syntax elements seen during decoding.

IV. NALU decoding process

V. Analyzing an H.264 file with UltraEdit

Open Test.264 in UltraEdit, as shown below:
[Figure: hex view of Test.264 in UltraEdit]
Test.264 played back in MPlayer looks like this:
[Figure: Test.264 playing in MPlayer]
Because the file contains a large amount of data, I have selected 2 segments of data to analyze.

1. Analyze the first segment:

☆ 00 00 00 01 67
00 00 00 01 is the NALU start code. The 67 after 00 00 00 01 is the 1-byte NALU header described earlier. Converting hexadecimal 67 to binary gives 0110 0111.

    Field               Bits   Binary   Decimal   Type
    forbidden_bit       1      0        0
    nal_reference_bit   2      11       3         NALU importance level
    nal_unit_type       5      00111    7         Sequence parameter set (SPS)

☆ 00 00 00 01 68
00 00 00 01 is the NALU start code. The 68 after 00 00 00 01 is the 1-byte NALU header. Converting hexadecimal 68 to binary gives 0110 1000.

    Field               Bits   Binary   Decimal   Type
    forbidden_bit       1      0        0
    nal_reference_bit   2      11       3         NALU importance level
    nal_unit_type       5      01000    8         Picture parameter set (PPS)

☆ 00 00 03 00
H.264 specifies that detecting 0x000000 can also mark the end of the current NAL. So what happens when the data inside a NAL itself contains 0x000001 or 0x000000? H.264 introduces an emulation prevention mechanism for this case.
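The idea behind emulation prevention is that the encoder inserts a 0x03 byte after any two consecutive zero bytes that would otherwise be followed by a byte in the range 0x00 to 0x03, and the decoder removes it again. A minimal sketch of both directions follows. It is an illustration under stated assumptions, not the x264 implementation: the function names `escape_rbsp` and `unescape_rbsp` are hypothetical, the caller is assumed to supply an output buffer of at least len + len/2 bytes for escaping, and the special case of an RBSP that ends in a zero byte is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoder side (hypothetical helper): copies `len` payload bytes from `src`
 * to `dst`, inserting 0x03 whenever two zero bytes would be followed by a
 * byte <= 0x03. Returns the number of bytes written to `dst`. */
size_t escape_rbsp(const uint8_t *src, size_t len, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    size_t zeros = 0; /* consecutive zero bytes already written */
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (zeros >= 2 && src[i] <= 0x03) {
            dst[out++] = 0x03; /* emulation prevention byte */
            zeros = 0;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
    }
    return out;
}

/* Decoder side (hypothetical helper): copies `len` bytes from `src` to
 * `dst`, dropping every 0x03 that follows two zero bytes. Returns the
 * number of bytes written; `dst` needs room for at most `len` bytes. */
size_t unescape_rbsp(const uint8_t *src, size_t len, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    size_t zeros = 0; /* consecutive zero bytes already read */
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            zeros = 0; /* discard the emulation prevention byte */
            continue;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
    }
    return out;
}
```

Escaping the three bytes 00 00 01 with `escape_rbsp` produces 00 00 03 01, and passing that result through `unescape_rbsp` restores the original three bytes, so a start code can never appear inside an escaped payload.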
If the encoder detects any of the three-byte sequences 0x000000, 0x000001, 0x000002, or 0x000003 in the NAL data, it inserts a new byte 0x03 before the last byte, like this:

    0x000000 -> 0x00000300
    0x000001 -> 0x00000301
    0x000002 -> 0x00000302
    0x000003 -> 0x00000303

When the decoder detects 0x000003, it discards the 03 and restores the original data (the "unshelling" operation). When decoding, the decoder first reads the NAL data, works out the length of the NAL, and then starts decoding.

☆ 00 00 00 01 65
00 00 00 01 is the NALU start code. The 65 after 00 00 00 01 is the 1-byte NALU header. Converting hexadecimal 65 to binary gives 0110 0101.

    Field               Bits   Binary   Decimal   Type
    forbidden_bit       1      0        0
    nal_reference_bit   2      11       3         NALU importance level
    nal_unit_type       5      00101    5         Slice of an IDR picture

2. Analyze the second segment:

☆ 00 00 00 01 41
00 00 00 01 is the NALU start code. The 41 after 00 00 00 01 is the 1-byte NALU header described earlier. Converting hexadecimal 41 to binary gives 0100 0001.

    Field               Bits   Binary   Decimal   Type
    forbidden_bit       1      0        0
    nal_reference_bit   2      10       2         NALU importance level
    nal_unit_type       5      00001    1         Slice of a non-IDR picture

In the Baseline profile, a NALU with nal_unit_type = 1 is normally a P frame, because Baseline has no B frames.

For the profiles and levels of H.264, see: H.264 Video Compression Standard

Reference book: "New Generation Video Compression Coding Standard H.264-AVC"
Reference link: http://depthlove.github.io/2015/09/23/Use-tool-to-analyze-h264-file/
Reference link: http://www.cnblogs.com/taigacon/p/5215448.html
Reference link: http://blog.csdn.net/chinadragon76/Article/details/22408727

This is original work; please credit the source when reposting: https://blog.csdn.net/caoshangpa/Article/details/53019793 "