Introduction to H.264 video codec technology in H.323 system

I. Introduction
In recent years, with the rapid build-out of communication network infrastructure in China, video services have grown quickly. Because they deliver audio, video and other information to participants at multiple sites, cut costs and improve working efficiency, they are expected to become a main service of the NGN.
With the rise of HDTV, the H.264 standard has become increasingly prominent. HD DVD and Blu-ray both plan to use it for their programme content. Moreover, since the second half of 2005, both NVIDIA and ATI have promoted hardware-accelerated H.264 decoding as their flagship video technology. So what makes H.264 so special?
H.264 is a high-performance video codec technology. Two international bodies currently develop video coding standards. One is ITU-T, which produced H.261, H.263 and H.263+; the other is the International Organization for Standardization (ISO), which produced MPEG-1, MPEG-2 and MPEG-4. H.264 is a digital video coding standard developed by the Joint Video Team (JVT) that the two organizations set up together, so it is both ITU-T H.264 and ISO/IEC MPEG-4 Advanced Video Coding (AVC), which forms Part 10 of the MPEG-4 standard. MPEG-4 AVC, MPEG-4 Part 10 and ISO/IEC 14496-10 therefore all refer to H.264.
An H.323 system places three main requirements on a video codec standard:
(1) Some IP access methods, such as xDSL, offer limited bandwidth. Once audio and data have taken their share, even less is left for video, so the codec needs a high compression ratio in order to deliver good picture quality at a given bit rate.
(2) It must be resilient to packet loss and bit errors so that it can cope with varied network environments, including wireless networks with heavy packet loss and frequent bit errors.
(3) It must adapt well to the network so that video streams are easy to transport.
II. Three technical advantages of H.264 for H.323 systems
H.264 takes full account of the requirements that multimedia communication places on video coding and builds on the results of earlier video standards, which gives it clear advantages. The following sections describe three of them in relation to the H.323 requirements listed above.
1. Compression ratio and image quality
The compression ratio describes how effectively a file is compressed: it is the size after compression divided by the size before compression. For example, compressing a 100 MB file to 90 MB gives a compression ratio of 90 / 100 × 100% = 90%. A smaller ratio is generally better, but achieving it usually takes more processing time.
H.264 refines established techniques such as intra prediction, inter prediction, transform coding and entropy coding, which raises coding efficiency and picture quality beyond those of earlier standards.
(1) Variable block size: the block size can be chosen flexibly during inter prediction. A macroblock (MB) can be partitioned in four modes: 16 × 16, 16 × 8, 8 × 16 and 8 × 8. In 8 × 8 mode, each sub-macroblock can be divided further using three sub-macroblock partition modes: 8 × 4, 4 × 8 and 4 × 4. This lets the partitioning follow moving objects more closely, which reduces the prediction error and improves coding efficiency.
(2) High-precision motion estimation: in H.264, motion-compensated prediction of the luma signal has quarter-pixel accuracy.
If the motion vector points to an integer-pixel position in the reference picture, the predicted value is simply the reference pixel at that position. Otherwise, values at half-pixel positions are interpolated with a 6-tap FIR filter, and values at quarter-pixel positions are obtained by averaging the neighbouring integer-pixel and half-pixel values. High-precision motion estimation therefore further reduces the inter prediction error.
(3) Multi-reference-frame motion estimation: the motion vector and reference picture index of each M × N luma block are obtained through motion-compensated prediction. Each sub-macroblock partition within a sub-macroblock can have its own motion vector, but the reference picture is selected at the sub-macroblock level. All partitions within one sub-macroblock therefore use the same reference picture, while different sub-macroblocks of the same slice may select different reference pictures. This is multi-reference-frame motion estimation.
(4) More flexible reference picture selection: the reference picture can even be a picture coded with bidirectional prediction. This allows the encoder to choose a reference that matches the current picture more closely, which reduces the prediction error.
(5) Weighted prediction: the encoder may scale the motion-compensated prediction by a weighting coefficient, which improves picture quality in certain scenes such as fades.
(6) In-loop deblocking filter: to remove the blocking artifacts introduced by prediction and transform, H.264 also uses a deblocking filter. What distinguishes H.264 is that its deblocking filter sits inside the motion-compensation loop, so the deblocked picture can serve as a reference for predicting other pictures, which further improves prediction accuracy.
2. Resilience to packet loss and bit errors
Key techniques such as parameter sets, slices, FMO and redundant slices can greatly improve the system's resilience to packet loss and bit errors.
(1) Parameter sets: parameter sets and their flexible transmission greatly reduce the chance of errors caused by losing key header information. To make sure a parameter set reaches the decoder reliably, the same parameter set can be sent several times, or multiple parameter sets can be transmitted.
(2) Slices: a picture can be divided into one or more slices. With multiple slices, the visible spatial damage is much smaller when one slice cannot be decoded, and each slice also provides a resynchronization point.
(3) PAFF and MBAFF: in interlaced video the two fields are scanned at different times, so for moving content the spatial correlation between adjacent lines of a frame is lower than with progressive scanning. In that case, coding the two fields separately saves bits. Choosing between frame coding and field coding picture by picture is called PAFF coding. Field mode is more efficient for moving regions, whereas frame mode is more efficient for static regions, because adjacent lines there are strongly correlated.
When a picture contains both moving and static regions, it is more efficient to choose, at the macroblock level, field mode for the moving regions and frame mode for the static regions. This method is called MBAFF.
(4) FMO: flexible macroblock ordering (FMO) further improves the error resilience of slices. By means of slice groups, FMO changes how a picture is divided into slices and macroblocks. The macroblock-to-slice-group map defines which slice group each macroblock belongs to. With FMO, H.264 defines seven macroblock-to-slice-group map types.
(5) Intra prediction: H.264 builds on the intra prediction experience of earlier video codec standards. Notably, an IDR picture invalidates the reference picture buffer, so pictures that follow it are decoded without reference to any picture that precedes it. An IDR picture is therefore an effective resynchronization point. On channels with heavy packet loss and frequent bit errors, sending IDR pictures at irregular intervals can further improve the error and packet-loss resilience of H.264.
(6) Redundant pictures: to make an H.264 decoder more robust against data loss, redundant pictures can be transmitted. When the primary picture is lost, the original picture can be reconstructed from the redundant one.
(7) Data partitioning: because information such as motion vectors and macroblock types matters more than other information, H.264 introduces data partitioning, which groups related syntax elements of a slice into the same partition. H.264 defines three types of data partition, and the three are transmitted separately. If the second or third type is lost, the error recovery tools can still recover the missing information reasonably well from the first type.
(8) Multi-reference-frame motion estimation: besides improving coding efficiency, multiple reference frames also improve error recovery. In an H.323 system, when the encoder learns through RTCP that a reference picture has been lost, it can switch to a picture that the decoder has received correctly as the reference.
(9) Constrained intra prediction: to stop errors from spreading spatially, the encoder can signal that, when macroblocks in P slices or B slices use intra prediction, neighbouring macroblocks that are not intra coded must not be used as a reference.
3. Network adaptability
To suit a wide range of network environments and applications, H.264 defines a video coding layer (VCL) and a network abstraction layer (NAL). The VCL performs the video coding and decoding itself, including motion-compensated prediction, transform coding and entropy coding. The NAL packages the VCL video data in a suitable format.
(1) NAL units: video data is encapsulated in NAL units (NALUs), each an integer number of bytes long, whose first byte indicates the type of data the unit carries. Packet-switched networks (such as H.323 systems) can carry NALUs using the RTP payload format. Other systems may require NALUs to be transmitted as a sequential bitstream, so H.264 also defines a byte-stream format in which a start_code_prefix precedes each NALU to mark the NAL unit boundaries.
(2) Parameter sets: in earlier video coding standards, header information such as GOB, GOP and picture headers is critical, and losing a packet that carries it often makes the associated pictures undecodable. H.264 therefore carries this rarely changing information, which applies to a large number of VCL NALUs, in parameter sets. There are two kinds: the sequence parameter set and the picture parameter set. To suit different network environments, parameter sets can be transmitted in band or out of band.
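The NAL unit structure described above can be illustrated with a short Python sketch. It assumes the byte-stream format, in which each NAL unit is preceded by the three-byte start code prefix 0x000001 (optionally with an extra leading zero byte), and it reads the one-byte NAL header as a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc and a 5-bit nal_unit_type. The function names and the sample bytes are illustrative only and do not come from any library, and emulation prevention bytes inside the payload are not handled.

```python
from typing import Iterator, NamedTuple

START_CODE_PREFIX = b"\x00\x00\x01"  # three-byte start code prefix


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, expected to be 0
    nal_ref_idc: int         # 2 bits, non-zero when the unit is used for reference
    nal_unit_type: int       # 5 bits, e.g. 5 = IDR slice, 7 = SPS, 8 = PPS


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its three header fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def split_byte_stream(stream: bytes) -> Iterator[bytes]:
    """Yield the NAL units found between start code prefixes."""
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        begin = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next prefix belong to the byte-stream
        # framing (a NAL unit never ends in 0x00), so drop them.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            yield nalu
        pos = nxt


# Hypothetical sample: an SPS, a PPS and an IDR slice, heavily truncated.
sample = bytes.fromhex("00000001" "6742001e" "000001" "68ce" "000001" "6588")
for nalu in split_byte_stream(sample):
    print(parse_nal_header(nalu[0]))
```

In an RTP-based H.323 terminal the start code prefixes are not needed, because the packet boundaries already delimit the NAL units; the header byte is interpreted in the same way.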
III. Implementation of H.264 in H.323 systems
H.323 is part of the ITU H.32x series of multimedia communication standards, which makes videoconferencing possible over existing communication networks. H.320 covers multimedia communication over N-ISDN; H.321 over B-ISDN; H.322 over LANs with guaranteed quality of service; and H.324 over the GSTN and wireless networks. H.323 is the multimedia communication standard for existing packet-based networks (PBN) such as IP networks. Combined with other IP technologies such as the IETF Resource Reservation Protocol (RSVP), it enables multimedia communication over IP networks.
IP-based LANs are becoming more and more capable: technologies such as IP over SDH/SONET and IP over ATM are developing rapidly, and LAN bandwidth keeps increasing. Because H.323 provides interoperability across devices, applications and vendors, all H.323-compliant equipment can interwork. Faster processors, increasingly capable graphics devices and powerful multimedia acceleration chips are making the PC an ever stronger multimedia platform.
Because H.264 is a newer video codec standard, using it in an H.323 system raises some questions, for example how to describe an entity's H.264 capability during H.245 capability negotiation. The H.323 standard therefore had to be supplemented and amended, and ITU-T developed the H.241 standard for this purpose. This article covers only the changes that relate to H.323.
The first step is to specify how H.264 capability is defined in H.245 capability negotiation. The H.264 capability set is a list of one or more H.264 capabilities. Each H.264 capability contains two mandatory parameters, profile and level, and several optional parameters such as CustomMaxMBPS and CustomMaxFS.
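As a rough illustration of this capability list, the Python sketch below models one H.264 capability and the two mandatory entries it contributes to the collapsing field. The parameter identifiers (41 and 42), the value types and the profile values (64, 32, 16) are the ones this article gives for H.241. The class, the function names and the level integer in the example are hypothetical, and the mapping from H.264 level numbers to the integer carried in parameter 42 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Identifier values as stated in this article for H.241.
PROFILE_PARAM_ID = 41
LEVEL_PARAM_ID = 42

# Profile values as stated in this article.
PROFILE_VALUES = {"baseline": 64, "main": 32, "extended": 16}


@dataclass
class H264Capability:
    """One entry of an H.264 capability set (hypothetical model)."""
    profile: str       # "baseline", "main" or "extended"
    level_value: int   # integer carried in parameter 42 (encoding per H.241)
    # Optional parameters such as CustomMaxMBPS, keyed by name only,
    # because their identifier numbers are not given in this article.
    optional: Dict[str, int] = field(default_factory=dict)

    def collapsing_entries(self) -> List[Tuple[int, str, int]]:
        """Return (parameterIdentifier, parameterValue type, value) triples."""
        return [
            (PROFILE_PARAM_ID, "booleanArray", PROFILE_VALUES[self.profile]),
            (LEVEL_PARAM_ID, "unsignedMin", self.level_value),
        ]


def profile_from_value(value: int) -> str:
    """Map a received profile value back to its profile name."""
    for name, bits in PROFILE_VALUES.items():
        if bits == value:
            return name
    raise ValueError(f"unknown profile value: {value}")


# A capability set is a list of one or more capabilities.
# The level_value 43 is a placeholder, not checked against H.241.
capability_set = [
    H264Capability("baseline", level_value=43, optional={"CustomMaxFS": 6}),
]
for cap in capability_set:
    print(cap.collapsing_entries())
```

Keeping the optional parameters in a separate dictionary mirrors the distinction the text draws between the two mandatory parameters and the optional ones.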
In H.264, a profile defines the coding tools and algorithms used to generate the bitstream, and a level sets limits on certain key parameters. The first entry in the collapsing field is the profile: its parameterIdentifier is of type standard with the value 41, and its parameterValue is of type booleanArray, whose value identifies the profile and can be 64, 32 or 16, denoting the Baseline, Main and Extended profiles respectively. The second entry in the collapsing field is the level: its parameterIdentifier is of type standard with the value 42, and its parameterValue is of type unsignedMin, whose value identifies one of the 15 level values defined in Annex A of H.264. The remaining parameters are optional.
H.323 can also serve as an interworking standard for multimedia communication between PBNs and other networks. Many computer and networking companies, such as Intel, Microsoft and Netscape, support the H.323 standard. The standard covers the technical requirements for multimedia communication over packet networks that offer no QoS guarantee, including LANs, WANs, the Internet, and dial-up or point-to-point connections over the GSTN or ISDN that use packet protocols such as PPP.
IV. Conclusion
As a new international standard, H.264 has succeeded in coding efficiency, image quality, network adaptability and error resilience. However, with the rapid development of terminals and networks, the demands on video coding keep growing, so H.264 continues to be improved and extended to meet new requirements. Current research on H.264 focuses mainly on reducing encoding and decoding delay, optimizing the algorithms and further improving image quality.