Essentials of Transmitting H.264 Video over RTMP (1)

"Transmitting H.264 over RTMP": sending the start frame

Streaming H.264 video over RTMP is a common task these days, and I recently did some research on it and summarized the results here. To push a stream smoothly to an RTMP server, the encoded H.264 data has to be given a header and packaged according to a few rules that the RTMP side expects.

When an RTMP client connects to the server, how does it learn the frame size and frame rate of the video source? The publisher (the side pushing the stream) has to provide them. How? When it first connects to the RTMP server, the publisher sends the necessary SPS and PPS before any video data.

What are the SPS and PPS? H.264 defines the following frame (NAL unit) types:

    // H.264 values for nal_unit_type
    typedef enum {
        NALU_TYPE_SLICE    = 1,   // non-key frame
        NALU_TYPE_DPA      = 2,
        NALU_TYPE_DPB      = 3,
        NALU_TYPE_DPC      = 4,
        NALU_TYPE_IDR      = 5,   // key frame
        NALU_TYPE_SEI      = 6,
        NALU_TYPE_SPS      = 7,   // SPS
        NALU_TYPE_PPS      = 8,   // PPS
        NALU_TYPE_AUD      = 9,
        NALU_TYPE_EOSEQ    = 10,
        NALU_TYPE_EOSTREAM = 11,
        NALU_TYPE_FILL     = 12,
    #if (MVC_EXTENSION_ENABLE)
        NALU_TYPE_PREFIX   = 14,
        NALU_TYPE_SUB_SPS  = 15,
        NALU_TYPE_SLC_EXT  = 20,
        NALU_TYPE_VDRD     = 24   // View and Dependency Representation Delimiter NAL Unit
    #endif
    } NALUTYPE;

SPS stands for Sequence Parameter Set. It holds the global parameters of a coded video sequence, that is, the sequence of encoded frames produced from the pixel data of the original video. The parameters that each individual encoded frame depends on are stored in the Picture Parameter Set (PPS). The SPS and PPS NAL units are normally located at the very start of the stream.
In some special cases, however, these two structures can also appear in the middle of the stream. The main reasons are:

The decoder needs to begin decoding in the middle of the stream;
the encoder changes the parameters of the stream (such as the image resolution) during encoding.

For playback, the SPS must be parsed so that the later decoding steps can use the parameters it contains. Below we look at the SPS and PPS parts of a piece of H.264 data, without analyzing H.264 itself in detail. For a detailed analysis of H.264 data, please see another article.

The data below is shown in hexadecimal. Each H.264 frame is delimited by 00 00 00 01 or 00 00 01. For example, the following H.264 file fragment contains three frames:

00 00 00 01 67 42 C0 33 A6 81 E0 51 A1 00 00 03
00 01 00 00 03 00 32 8F 18 32 A0 00 00 00 01 68
CE 1F 20 00 00 00 01 06 05 FF FF 4C DC 45 E9 BD
E6 D9 48 B7 96 2C D8 20 D9 23 EE EF 78 32 36 34
20 2D 20 63 6F 72 65 20 31 34 38 20 2D 20 48 2E
32 36 34 2F 4D 50 45 47 2D 34 20 41 56 43 20 63
6F

The first frame is
00 00 00 01 67 42 C0 33 A6 81 E0 51 A1 00 00 03 00 01 00 00 03 00 32 8F 18 32 A0

The second frame is
00 00 00 01 68 CE 1F 20

The third frame is
00 00 00 01 06 05 FF FF 4C DC 45 E9 BD E6 D9 48 B7 96 2C D8 20 D9 23 EE EF 78 32 36 34 20 2D 20 63 6F 72 65 20 31 34 38 20 2D 20 48 2E 32 36 34 2F 4D 50 45 47 2D 34 20 41 56 43 20 63 6F

Let's analyze this in detail. Strip the delimiter from the first frame, which leaves 23 bytes:
67 42 C0 33 A6 81 E0 51 A1 00 00 03 00 01 00 00 03 00 32 8F 18 32 A0

The byte that identifies this frame as an SPS is the first one, 67. The frame type is given by the low five bits (the nal_unit_type field) of the first byte after the delimiter.
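The splitting step described above can be sketched in C. This is a minimal illustration rather than code from the article: the `NalUnit` struct and the function names `split_annexb` and `nal_unit_type` are assumptions made for the example, and the sketch only handles a complete buffer held in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* One NAL unit located inside an Annex B buffer (start code excluded). */
typedef struct {
    const uint8_t *data; /* points at the NAL header byte */
    size_t size;         /* number of bytes up to the next start code */
} NalUnit;

/* Returns the length (3 or 4) of a start code at p, or 0 if there is none. */
static size_t start_code_len(const uint8_t *p, size_t remaining)
{
    if (remaining >= 3 && p[0] == 0 && p[1] == 0 && p[2] == 1)
        return 3;
    if (remaining >= 4 && p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 1)
        return 4;
    return 0;
}

/* Splits buf into NAL units. Writes at most max_units entries to out
 * and returns the number of entries written. */
static size_t split_annexb(const uint8_t *buf, size_t len,
                           NalUnit *out, size_t max_units)
{
    size_t count = 0;
    size_t i = 0;
    const uint8_t *cur = NULL; /* start of the NAL unit being scanned */

    while (i < len) {
        size_t sc = start_code_len(buf + i, len - i);
        if (sc == 0) {
            i++;
            continue;
        }
        /* A start code closes the previous NAL unit, if there was one. */
        if (cur != NULL && buf + i > cur && count < max_units) {
            out[count].data = cur;
            out[count].size = (size_t)(buf + i - cur);
            count++;
        }
        i += sc;
        cur = buf + i;
    }
    /* The last NAL unit runs to the end of the buffer. */
    if (cur != NULL && buf + len > cur && count < max_units) {
        out[count].data = cur;
        out[count].size = (size_t)(buf + len - cur);
        count++;
    }
    return count;
}

/* nal_unit_type is the low five bits of the NAL header byte. */
static int nal_unit_type(uint8_t header)
{
    return header & 0x1F;
}
```

Calling `split_annexb` on the fragment above should yield three units whose first bytes are 0x67, 0x68 and 0x06; passing each first byte to `nal_unit_type` classifies them.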
The frame type of the first frame is 0x67 & 0x1F = 7, so it is an SPS frame.
The frame type of the second frame is 0x68 & 0x1F = 8, so it is a PPS frame.
The frame type of the third frame is 0x06 & 0x1F = 6, so it is an SEI frame.
(nal_unit_type occupies the low five bits of the byte, so the mask is 0x1F.)

So the first frame in this example is the SPS frame. Its data has to be assembled into an FLV video tag, following the FLV rules mentioned earlier (audio is likewise assembled into an FLV audio tag). The FLV and F4V formats are described in the standard document Video_file_format_spec_v10.pdf.

According to that document, buffer[0] of the data sent to the RTMP server for the SPS must be 0x17. The high 4 bits, 1, mean key frame, and the low 4 bits, 7, mean the AVC format (that is, H.264. AVC is effectively an alias for H.264, but since SVC was added to the H.264 standard, people customarily call the part of H.264 without SVC "AVC" and refer to the SVC part separately as "SVC").

Also according to the document, AVCPacketType has to be set to AVC sequence header, which is 0, so buffer[1] = 0x00. Next comes the CompositionTime field shown in the figure. When AVCPacketType = 0 these three bytes are also 0, so buffer[2] = 0x00, buffer[3] = 0x00, buffer[4] = 0x00.

After that comes the AVCDecoderConfigurationRecord, which is the actual AVC sequence header. It contains the SPS and PPS information that matters for H.264 decoding, and it must be sent before any video data; otherwise the decoder cannot decode normally. In addition, whenever the decoder is stopped and started again (for example on a seek, or when switching between fast-forward and rewind), the SPS and PPS have to be sent again. In an FLV file the AVCDecoderConfigurationRecord appears once, as the first video tag.
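The five leading bytes described above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the article's code: the enum names and the function `write_flv_avc_prefix` are made up for the example, while the numeric values (frame type 1 for key frame, codec ID 7 for AVC, packet type 0 for sequence header, a signed 24-bit big-endian CompositionTime) follow the FLV specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the FLV video tag definition; the names are illustrative. */
enum { FLV_FRAME_KEY = 1, FLV_FRAME_INTER = 2 };
enum { FLV_CODEC_AVC = 7 };
enum { AVC_PACKET_SEQUENCE_HEADER = 0, AVC_PACKET_NALU = 1 };

/* Writes the 5-byte prefix that precedes AVC data in an FLV video tag:
 *   byte 0    : frame type (high 4 bits) | codec id (low 4 bits)
 *   byte 1    : AVCPacketType
 *   bytes 2-4 : CompositionTime, signed 24-bit, big-endian
 * dst must have room for 5 bytes. Returns the number of bytes written. */
static size_t write_flv_avc_prefix(uint8_t *dst, int frame_type,
                                   int avc_packet_type,
                                   int32_t composition_time)
{
    /* Converting to unsigned keeps the two's-complement bit pattern,
     * so the low 24 bits are correct for negative offsets too. */
    uint32_t cts = (uint32_t)composition_time;

    dst[0] = (uint8_t)((frame_type << 4) | FLV_CODEC_AVC);
    dst[1] = (uint8_t)avc_packet_type;
    dst[2] = (uint8_t)((cts >> 16) & 0xFF);
    dst[3] = (uint8_t)((cts >> 8) & 0xFF);
    dst[4] = (uint8_t)(cts & 0xFF);
    return 5;
}
```

For the sequence header, `write_flv_avc_prefix(buf, FLV_FRAME_KEY, AVC_PACKET_SEQUENCE_HEADER, 0)` produces 17 00 00 00 00, matching buffer[0] through buffer[4].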
H.264 standard document: H.264-AVC-ISO_IEC_14496-15.PDF

As the figure shows, the next byte, buffer[5] = 0x01, is the configuration version number and is fixed at 1. Some of the remaining fields are defined in another document.

H.264 standard document: H.264-AVC-ISO_IEC_14496-10

The values of AVCProfileIndication are defined in Annex A of that document (13 profiles in the figure). For example: "Conformance of a bitstream to the Baseline profile is indicated by profile_idc being equal to 66."

Recall the first frame of the H.264 data with its delimiter removed: 67 42 C0 33 A6 81 E0 51 A1 00 00 03 00 01 00 00 03 00 32 8F 18 32 A0. The 67 marks it as an SPS, and the 42 that follows is the AVCProfileIndication, that is, 66, so this stream is Baseline. So here buffer[6] = 0x42 (decimal 66). Analysis of the Baseline profile itself is left to later articles.

That leaves C0 33 A6 81 E0 51 A1 00 00 03 00 01 00 00 03 00 32 8F 18 32 A0.

What is C0? In H.264-AVC-ISO_IEC_14496-10, each constraint_set flag is set to 1 if the stream obeys the constraints of the corresponding profile and to 0 otherwise. C0 = binary 1100 0000, which corresponds to the eight bits in the figure: the first two are constraint_set0_flag = 1 and constraint_set1_flag = 1. constraint_set0_flag refers to the Baseline profile and constraint_set1_flag to the Main profile, so with both set to 1 the stream is compatible with both. Therefore buffer[7] = 0xC0.

Next is level_idc. I did not find the relevant document myself, so I quote the information someone else found (figures: profile_idc and level_idc).

From the figure we can see that, of the remaining 33 A6 81 E0 51 A1 00 00 03 00 01 00 00 03 00 32 8F 18 32 A0, the first byte gives buffer[8] = 0x33 (decimal 51; level_idc is ten times the level number, so this is Level 5.1). What comes next is a key point that confuses many people, so some background first.
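Before moving on, the three bytes decoded so far can be read straight out of the SPS. The following is a minimal C sketch of that step, assuming an SPS NAL unit with the start code already removed; the struct `SpsProfileInfo` and both function names are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* The three bytes that follow the NAL header in an SPS. */
typedef struct {
    uint8_t profile_idc;      /* becomes AVCProfileIndication, e.g. 66 = Baseline */
    uint8_t constraint_flags; /* constraint_set0_flag is the most significant bit */
    uint8_t level_idc;        /* becomes AVCLevelIndication, e.g. 51 */
} SpsProfileInfo;

/* Reads profile, constraint flags and level from an SPS NAL unit
 * (start code removed). Returns 0 on success, -1 if the buffer is
 * too short or the NAL unit is not an SPS (nal_unit_type 7). */
static int read_sps_profile(const uint8_t *sps, size_t len,
                            SpsProfileInfo *out)
{
    if (len < 4 || (sps[0] & 0x1F) != 7)
        return -1;
    out->profile_idc = sps[1];
    out->constraint_flags = sps[2];
    out->level_idc = sps[3];
    return 0;
}

/* Returns constraint_setN_flag (0 or 1) for n in 0..5. */
static int constraint_set_flag(const SpsProfileInfo *info, int n)
{
    return (info->constraint_flags >> (7 - n)) & 1;
}
```

Applied to the SPS in this article, the sketch reports profile_idc 66, constraint_set0_flag and constraint_set1_flag both 1, and level_idc 51, which are the values placed in buffer[6], buffer[7] and buffer[8].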
The following is quoted from someone else's article:

H.264 stream formats

The H.264 standard specifies how video is encoded into separate packets, but it does not specify how those packets are stored or transmitted. The standard does include an annex, Annex B, that describes one possible format, but that format is not mandatory. To meet different storage and transmission requirements, two packaging formats have emerged: one is the Annex B format, and the other is called the AVCC format.

Annex B

As can be seen from the above, the data in a NALU does not include its own size (length), so NALUs cannot simply be concatenated into a stream: the receiving end would not know where one NALU ends and the next begins. The Annex B format solves this with start codes: a three-byte or four-byte start code, 0x000001 or 0x00000001, is added at the beginning of each NALU. By locating the start codes, the decoder can easily identify NALU boundaries.

Using start codes to locate NALU boundaries does have one problem: a NALU may contain data that is identical to a start code. To prevent this, when a NALU is built, an emulation prevention byte 0x03 is inserted wherever 0x000000, 0x000001, 0x000002 or 0x000003 occurs in the data, turning them into:

0x000000 -> 0x00000300
0x000001 -> 0x00000301
0x000002 -> 0x00000302
0x000003 -> 0x00000303

When the decoder detects 0x000003, it discards the 0x03 and restores the original data. Because every NALU carries a start code, the decoder can start decoding from a random point in the video stream, so this format is often used for real-time streaming. SPS and PPS are usually repeated periodically in this format, often before every keyframe.

AVCC (the AVC format used here)

The AVCC format does not use start codes as NALU boundaries. Instead, each NALU is preceded by a prefix that gives the NALU length in big-endian format.
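The length prefix described above can be sketched in C as follows. This is a minimal illustration and not code from the quoted article; the function name `write_avcc_nalu` and its parameters are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes one NAL unit in AVCC form: a big-endian length prefix of
 * prefix_size bytes followed by the NAL unit itself (without any
 * start code). Returns the number of bytes written, or 0 if
 * prefix_size is invalid, the length does not fit in the prefix,
 * or dst is too small. */
static size_t write_avcc_nalu(uint8_t *dst, size_t dst_cap,
                              const uint8_t *nalu, uint32_t nalu_len,
                              size_t prefix_size)
{
    size_t i;

    if (prefix_size != 1 && prefix_size != 2 && prefix_size != 4)
        return 0;
    /* A 1- or 2-byte prefix can only describe short NAL units. */
    if (prefix_size < 4 && (nalu_len >> (8 * prefix_size)) != 0)
        return 0;
    if (dst_cap < prefix_size || dst_cap - prefix_size < nalu_len)
        return 0;

    /* Most significant byte first (big-endian). */
    for (i = 0; i < prefix_size; i++)
        dst[i] = (uint8_t)(nalu_len >> (8 * (prefix_size - 1 - i)));

    memcpy(dst + prefix_size, nalu, nalu_len);
    return prefix_size + nalu_len;
}
```

With a 4-byte prefix, the PPS 68 CE 1F 20 from this article would be written as 00 00 00 04 68 CE 1F 20.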
This prefix can be 1, 2 or 4 bytes, so parsing the AVCC format requires knowing the prefix size. That size is stored in a header object that is usually called extradata or the sequence header. The SPS and PPS data are also stored in the extradata. The H.264 extradata syntax is as follows:

Bits  Field                      Remark
8     version                    always 0x01
8     avc profile                sps[0][1]
8     avc compatibility          sps[0][2]
8     avc level                  sps[0][3]
6     reserved                   all bits on
2     NALULengthSizeMinusOne
3     reserved                   all bits on
5     number of SPS NALUs        usually 1
16    SPS size
N     variable SPS NALU data
8     number of PPS NALUs        usually 1
16    PPS size
N     variable PPS NALU data

The last two bits of the 5th byte give the number of bytes used for the NAL size. Note that NALULengthSizeMinusOne is the length of the NALU prefix minus one: if the prefix length is 4, the value is 3. Note also that although the AVCC format does not use start codes, the emulation prevention bytes are still present.

One advantage of the AVCC format is that the decoder configuration parameters are provided at the very beginning. The system can easily identify NALU boundaries without extra start codes, which reduces wasted resources, and playback can jump to the middle of the video. This format is usually used for multimedia data that can be accessed randomly, such as files stored on a hard disk.

Does it now make sense that our buffer[9] = 0xFF? Why 0xFF?
The reason is in the document: the first six bits are 111111, and the last two bits give the size of the length header. That is, when H.264 is converted into the FLV-style media stream that RTMP can carry over the network, a NALU is no longer marked by the original start code 00 00 01. Instead, a fixed four-byte length field precedes each NALU. For these four bytes, the lengthSizeMinusOne field is not written as 0x04 but as 0x04 - 1 = 0x03, which is binary 11. The complete byte is therefore binary 1111 1111, hexadecimal 0xFF. This is a rule of the format, so buffer[9] = 0xFF.

Next is numOfSequenceParameterSets, the number of SPSs. We usually have only one SPS, so the count is 1. The upper three bits are reserved as 111, which gives 1110 0001, that is, 0xE1, so buffer[10] = 0xE1.

The next two bytes are the SPS length. From the earlier data, the first frame without its H.264 start code is 23 bytes long, which is less than 255, so buffer[11] = 0x00 and buffer[12] = 0x17 (decimal 23).

Then the 23 bytes of SPS data are appended, so the buffer is now 13 + 23 = 36 bytes long. That is, buffer[13] to buffer[35] = 67 42 C0 33 A6 81 E0 51 A1 00 00 03 00 01 00 00 03 00 32 8F 18 32 A0, and the index of the next byte is 36.

Next is numOfPictureParameterSets, the number of PPSs, so buffer[36] = 0x01.

The next two bytes are the PPS length. From the earlier data, the second frame without its H.264 start code is 4 bytes long, which is less than 255, so buffer[37] = 0x00 and buffer[38] = 0x04.
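The record layout walked through here, including the PPS bytes that complete it, can be sketched in C. This is a minimal sketch under stated assumptions and not the article's implementation: the function name `build_avc_decoder_config` is hypothetical, it handles exactly one SPS and one PPS, and it always uses 4-byte NALU lengths. It produces the bytes that occupy buffer[5] onward, after the 5-byte FLV prefix.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds an AVCDecoderConfigurationRecord from one SPS and one PPS
 * (both without start codes). Returns the number of bytes written,
 * or 0 if the inputs are invalid or dst is too small. */
static size_t build_avc_decoder_config(uint8_t *dst, size_t dst_cap,
                                       const uint8_t *sps, size_t sps_len,
                                       const uint8_t *pps, size_t pps_len)
{
    size_t n = 0;

    /* The SPS must hold at least its header, profile, flags and level;
     * both lengths must fit in the 16-bit size fields. */
    if (sps_len < 4 || sps_len > 0xFFFF || pps_len == 0 || pps_len > 0xFFFF)
        return 0;
    /* 6 fixed bytes + 2 (SPS size) + SPS + 1 (PPS count) + 2 (PPS size) + PPS */
    if (dst_cap < 11 + sps_len + pps_len)
        return 0;

    dst[n++] = 0x01;        /* configurationVersion, fixed at 1 */
    dst[n++] = sps[1];      /* AVCProfileIndication (profile_idc) */
    dst[n++] = sps[2];      /* profile_compatibility (constraint flags) */
    dst[n++] = sps[3];      /* AVCLevelIndication (level_idc) */
    dst[n++] = 0xFC | 0x03; /* 111111 reserved + lengthSizeMinusOne = 3 */
    dst[n++] = 0xE0 | 0x01; /* 111 reserved + numOfSequenceParameterSets = 1 */

    dst[n++] = (uint8_t)(sps_len >> 8);   /* SPS length, big-endian */
    dst[n++] = (uint8_t)(sps_len & 0xFF);
    memcpy(dst + n, sps, sps_len);
    n += sps_len;

    dst[n++] = 0x01;                      /* numOfPictureParameterSets = 1 */
    dst[n++] = (uint8_t)(pps_len >> 8);   /* PPS length, big-endian */
    dst[n++] = (uint8_t)(pps_len & 0xFF);
    memcpy(dst + n, pps, pps_len);
    n += pps_len;

    return n;
}
```

For the 23-byte SPS and 4-byte PPS of this article, the sketch writes 38 bytes, which together with the 5-byte prefix gives the 43-byte buffer.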
Then the 4 bytes of PPS data are appended, so the buffer is now 39 + 4 = 43 bytes long. That is, buffer[39] to buffer[42] = 68 CE 1F 20.

Finally, the RTMP header data to be transmitted is:

buffer[0] = 0x17
buffer[1] = 0x00
buffer[2] = 0x00
buffer[3] = 0x00
buffer[4] = 0x00
buffer[5] = 0x01
buffer[6] = 0x42
buffer[7] = 0xc0
buffer[8] = 0x33
buffer[9] = 0xff
buffer[10] = 0xe1
buffer[11] = 0x00
buffer[12] = 0x17
buffer[13] = 0x67
buffer[14] = 0x42
buffer[15] = 0xc0
buffer[16] = 0x33
buffer[17] = 0xa6
buffer[18] = 0x81
buffer[19] = 0xe0
buffer[20] = 0x51
buffer[21] = 0xa1
buffer[22] = 0x00
buffer[23] = 0x00
buffer[24] = 0x03
buffer[25] = 0x00
buffer[26] = 0x01
buffer[27] = 0x00
buffer[28] = 0x00
buffer[29] = 0x03
buffer[30] = 0x00
buffer[31] = 0x32
buffer[32] = 0x8f
buffer[33] = 0x18
buffer[34] = 0x32
buffer[35] = 0xa0
buffer[36] = 0x01
buffer[37] = 0x00
buffer[38] = 0x04
buffer[39] = 0x68
buffer[40] = 0xce
buffer[41] = 0x1f
buffer[42] = 0x20

That completes the introduction to the start frame for transmitting H.264 over RTMP, and then we will call and send“