"This paper focuses on the H.323 system suitable for providing multimedia services on IP network. H. 264 is a new video codec standard proposed by JVT to achieve higher compression ratio, better image quality and good network adaptability. Facts have proved that H.264 coding saves more code stream. Its inherent anti packet loss, anti error code ability and good network adaptability make it very suitable for IP transmission. H.264 is expected to become the preferred video standard in H.323 system.
An H.323 system places the following three main requirements on a video codec standard:
(1) Some IP access methods, such as xDSL, provide only limited bandwidth. Once the bandwidth used by audio and data is subtracted, even less is left for video, so the video codec needs a high compression ratio in order to deliver good image quality at a given bit rate.
(2) It must be robust against packet loss and bit errors so that it can adapt to various network environments, including wireless networks with severe packet loss and bit errors.
(3) It must have good network adaptability so that the video stream is easy to transport over the network.
II. Three technical advantages of H.264 for H.323 systems
H.264 takes full account of the requirements that multimedia communication places on video coding and decoding and builds on the research results of earlier video standards, so it has clear advantages. The following describes three advantages of H.264 in light of the requirements that H.323 systems place on video codec technology.
1. Compression ratio and image quality
H.264 improves on traditional techniques such as intra prediction, inter prediction, transform coding and entropy coding, which raises coding efficiency and image quality beyond those of earlier standards.
(1) Variable block size: the block size can be chosen flexibly during inter prediction. For macroblock (MB) partitioning, H.264 provides four modes: 16×16, 16×8, 8×16 and 8×8. In 8×8 mode, each 8×8 sub-macroblock can be divided further using the three sub-macroblock partition modes 8×4, 4×8 and 4×4. This allows moving objects to be partitioned more accurately, which reduces the prediction error and improves coding efficiency. Intra prediction generally uses two luma prediction modes: Intra_4×4 and Intra_16×16. Intra_4×4 is suited to areas of the picture with rich detail, while Intra_16×16 is better suited to smooth, low-detail areas.
(2) High-precision motion estimation: in H.264, motion-compensated prediction of the luma signal has 1/4-pixel accuracy. If the motion vector points to an integer-pixel position in the reference picture, the predicted value is the reference pixel at that position. Otherwise, the value at a 1/2-pixel position is obtained by interpolation with a 6-tap FIR filter, and the value at a 1/4-pixel position is obtained by averaging the neighboring integer-pixel and 1/2-pixel values. High-precision motion estimation therefore further reduces the inter prediction error.
(3) Multi-reference-frame motion estimation: the motion vector and reference picture index of each M×N luma block are obtained by motion-compensated prediction. Each sub-macroblock partition within a sub-macroblock can have its own motion vector, but the reference picture is selected at the sub-macroblock level. Therefore, all sub-macroblock partitions within one sub-macroblock use the same reference picture, while different sub-macroblocks in the same slice can select different reference pictures. This is multi-reference-frame motion estimation.
(4) More flexible reference picture selection: the reference picture can even be a picture coded with bidirectional prediction. This allows the encoder to choose a reference picture that matches the current picture more closely, which reduces the prediction error.
(5) Weighted prediction: the encoder may scale the motion-compensated prediction value by a weighting coefficient, which improves image quality in certain scenes.
(6) In-loop deblocking filter: to remove the blocking artifacts introduced by prediction and transform coding, H.264 also uses a deblocking filter. The distinguishing point is that the H.264 deblocking filter sits inside the motion-compensation loop, so the deblocked picture can be used as a reference when predicting other pictures, which further improves prediction accuracy.
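The sub-pixel interpolation described in item (2) can be illustrated with a minimal one-dimensional sketch. It assumes 8-bit luma samples, the tap set (1, -5, 20, 20, -5, 1) with a divide by 32, and edge samples repeated at the picture border. The function names are hypothetical, and the full two-dimensional case is deliberately left out.

```python
# Assumption: 8-bit samples and the 6-tap filter (1, -5, 20, 20, -5, 1) / 32.
HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)


def clip_pixel(value):
    """Clamp a value to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1] (1-D case only)."""
    last = len(row) - 1
    # Six integer samples around the half-pixel position; indices outside
    # the row are clamped, which repeats the edge sample (an assumption).
    window = [row[min(max(i + k, 0), last)] for k in range(-2, 4)]
    acc = sum(tap * sample for tap, sample in zip(HALF_PEL_TAPS, window))
    return clip_pixel((acc + 16) >> 5)  # round, then divide by 32


def quarter_pel(a, b):
    """Quarter-pixel sample: rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1


row = [0, 10, 20, 30, 40, 50, 60, 70]
h = half_pel(row, 3)           # between the samples 30 and 40
q = quarter_pel(row[3], h)     # between the sample 30 and the half-pixel value
print(h, q)
```

On a linear ramp the half-pixel value falls midway between its two integer neighbours, which is a quick sanity check for the filter taps.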
2. Resilience to packet loss and bit errors
Key techniques such as parameter sets, slices, FMO and redundant slices greatly improve the system's resilience to packet loss and bit errors.
(1) Parameter sets: parameter sets and their flexible transmission greatly reduce the chance of errors caused by losing key header information. To make sure a parameter set reaches the decoder reliably, the same parameter set can be sent several times, or several parameter sets can be transmitted.
(2) Slices: a picture can be divided into one or more slices. When a picture is divided into several slices, the visible spatial impact of a slice that cannot be decoded is much smaller, and each slice also provides a resynchronization point.
(3) PAFF and MBAFF: when interlaced pictures of a moving scene are coded, the time interval between the scanning of the two fields makes two adjacent lines in the frame less correlated spatially than in progressive scanning. In that case, coding the two fields separately saves bit rate. A frame can be coded in three ways: combine the two fields and code them as one frame; code the two fields separately; or combine the two fields into one frame but group each pair of vertically adjacent macroblocks into a macroblock pair for coding. Choosing between the first two is called PAFF coding. Field mode is more efficient for moving regions, while frame mode is more efficient for static regions because adjacent lines are highly correlated. When a picture contains both moving and static regions, it is more efficient to choose, at the MB level, field mode for the moving regions and frame mode for the static regions. This method is called MBAFF.
(4) FMO: flexible macroblock ordering (FMO) further improves the error resilience of slices. Through slice groups, FMO changes the way a picture is divided into slices and macroblocks. The macroblock-to-slice-group map defines which slice group each macroblock belongs to. With FMO, H.264 defines seven macroblock-to-slice-group map types.
(5) Intra prediction: H.264 draws on the intra coding experience of earlier video codec standards. Notably, an IDR picture in H.264 invalidates the reference picture buffer, so pictures that follow it are decoded without reference to any picture before the IDR picture. An IDR picture is therefore a good resynchronization point. On channels with severe packet loss and bit errors, IDR pictures can be sent from time to time to further improve the error and packet-loss resilience of H.264.
(6) Redundant pictures: to make an H.264 decoder more robust against data loss, redundant pictures can be transmitted. When the primary picture is lost, the picture can be reconstructed from the redundant picture.
(7) Data partitioning: because information such as motion vectors and macroblock types is more important than the other information, H.264 introduces data partitioning, which places related syntax elements of a slice in the same partition. H.264 has three types of data partition, and the three are transmitted separately. If the second or third partition is lost, error recovery tools can still recover the lost information reasonably well from the first partition.
(8) Multi-reference-frame motion estimation: multi-reference-frame motion estimation improves not only the coding efficiency of the encoder but also its error recovery ability. In an H.323 system, RTCP lets the encoder learn that a reference picture has been lost, and it can then choose a picture that the decoder received correctly as the reference.
(9) Constrained intra prediction: to prevent errors from spreading spatially, the encoder can specify that neighboring macroblocks that are not intra coded are not used as references when macroblocks in P or B slices are intra predicted.
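To make the slice-group idea behind FMO more concrete, the following sketch builds a simple two-group checkerboard map and counts how many neighbours of a lost macroblock survive. The checkerboard pattern and the function names are illustrative assumptions, not one of the map types as written in the standard.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern.

    Hypothetical helper: a simplified stand-in for a macroblock-to-slice-group
    map, indexed as mb_map[y][x].
    """
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbours(mb_map, x, y, lost_group):
    """Count the 4-connected neighbours of (x, y) that are not in the lost group."""
    height, width = len(mb_map), len(mb_map[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and mb_map[ny][nx] != lost_group:
            count += 1
    return count


mb_map = checkerboard_slice_groups(4, 3)
# Macroblock (1, 1) belongs to group 0. If every slice of group 0 is lost,
# all four of its neighbours belong to group 1 and are still available.
print(mb_map[1][1], surviving_neighbours(mb_map, 1, 1, lost_group=0))
```

With this layout, a decoder that loses one slice group still has correctly decoded macroblocks on every side of each missing one, which is what makes concealment easier than losing a contiguous run of macroblocks.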
3. Network adaptability
To adapt to various network environments and applications, H.264 defines a video coding layer (VCL) and a network abstraction layer (NAL). The VCL performs video coding and decoding, including motion-compensated prediction, transform coding and entropy coding; the NAL packages the VCL video data in an appropriate format.
(1) NAL units: video data is encapsulated in NAL units (NALUs), each an integer number of bytes long, and the first byte of a NALU identifies the type of data it carries. H.264 defines two encapsulation formats. Packet-switched networks (such as H.323 systems) can carry NALUs using the RTP encapsulation format. Other systems may require NALUs to be delivered as an ordered bit stream. For these, H.264 defines a bit-stream transmission format in which each NALU is preceded by a start_code_prefix that marks the NAL unit boundary.
(2) Parameter sets: in earlier video codec standards, header information such as GOB, GOP and picture headers is critical. Losing a packet that contains this information often makes the pictures that depend on it undecodable. H.264 therefore carries this rarely changing information, which applies to a large number of VCL NALUs, in parameter sets. There are two kinds of parameter set: the sequence parameter set and the picture parameter set. To adapt to a variety of network environments, parameter sets can be transmitted in band or out of band.
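The start-code framing and the type byte described in item (1) can be sketched as follows. The sketch assumes the three-byte start code prefix 0x000001 and the usual one-byte header layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type). The function names and the sample bytes are made up for illustration, and emulation-prevention bytes are not removed.

```python
START_CODE = b"\x00\x00\x01"  # assumed three-byte start code prefix


def split_nal_units(stream: bytes):
    """Split a start-code-delimited byte stream into NAL units."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding, so strip them.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            units.append(nalu)
        pos = nxt
    return units


def parse_nal_header(first_byte: int):
    """Decode the first byte of a NAL unit into its three fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }


# Illustrative stream: three tiny NAL units with made-up payload bytes.
stream = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce" + b"\x00\x00\x01\x65\x88"
units = split_nal_units(stream)
types = [parse_nal_header(u[0])["nal_unit_type"] for u in units]
print(types)  # 7 = sequence parameter set, 8 = picture parameter set, 5 = IDR slice
```

A receiver can use the type field to pick out parameter sets before any slice data is decoded, which is why the loss of these units matters so much.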
III. Implementation of H.264 in H.323 systems
Because H.264 is a new video codec standard, using it in H.323 systems raises some questions, such as how an entity's H.264 capability is described during H.245 capability negotiation. The H.323 standard therefore had to be supplemented and amended, and ITU-T developed the H.241 standard for this purpose. This article covers only the changes related to H.323.
First, H.241 specifies how the H.264 capability is defined in H.245 capability negotiation. The H.264 capability set is a list of one or more H.264 capabilities. Each H.264 capability contains two mandatory parameters, Profile and Level, and several optional parameters such as CustomMaxMBPS and CustomMaxFS. In H.264, the profile defines the coding tools and algorithms used to generate the bitstream, and the level sets limits on certain key parameters. The H.264 capability is carried in the GenericCapability structure, in which capabilityIdentifier is of type standard with the value 0.0.8.241.0.0.1, identifying the H.264 capability. maxBitRate defines the maximum bit rate. The collapsing field holds the H.264 capability parameters. Its first entry is Profile: the parameterIdentifier is of type standard with the value 41, and the parameterValue is of type booleanArray, whose value identifies the profile and can be 64, 32 or 16, representing the Baseline, Main and Extended profiles respectively. Its second entry is Level: the parameterIdentifier is of type standard with the value 42, and the parameterValue is of type unsignedMin, whose value identifies one of the 15 level values defined in Annex A of H.264. The other parameters appear as optional entries.
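The capability layout described above can be mirrored in a small data-structure sketch. The plain dictionary form, the function name and the placeholder level and bit-rate values are assumptions made for illustration; a real H.323 stack would encode this structure with its own ASN.1 tooling, and the numeric level codes are not reproduced here.

```python
# Values quoted in the text above.
H264_CAPABILITY_OID = "0.0.8.241.0.0.1"
PARAM_PROFILE = 41
PARAM_LEVEL = 42
PROFILE_VALUES = {"baseline": 64, "main": 32, "extended": 16}


def build_h264_capability(profile, level_value, max_bit_rate):
    """Return a dict shaped like the GenericCapability described in the text.

    Hypothetical helper: level_value and max_bit_rate are passed through
    unchanged because their encodings are not covered here.
    """
    return {
        "capabilityIdentifier": {"standard": H264_CAPABILITY_OID},
        "maxBitRate": max_bit_rate,
        "collapsing": [
            {
                "parameterIdentifier": {"standard": PARAM_PROFILE},
                "parameterValue": {"booleanArray": PROFILE_VALUES[profile]},
            },
            {
                "parameterIdentifier": {"standard": PARAM_LEVEL},
                "parameterValue": {"unsignedMin": level_value},
            },
        ],
    }


# Placeholder arguments: 1 and 1000 are not real level or bit-rate codes.
capability = build_h264_capability("baseline", level_value=1, max_bit_rate=1000)
print(capability["collapsing"][0]["parameterValue"])
```

Keeping Profile and Level as the first two collapsing entries follows the order given in the text; optional parameters such as CustomMaxMBPS would be appended after them.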
Second, because pictures are organized differently in H.264 than in earlier standards, some existing H.245 signaling no longer applies to H.264, for example videoFastUpdateGOB in MiscellaneousCommand. H.241 therefore redefines several signals to provide the corresponding functions.
Finally, RTP encapsulation of H.264 follows RFC 3550, and no fixed value is assigned to the payload type (PT) field.
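As a rough illustration of where the PT field sits, the sketch below packs the 12-byte fixed RTP header from RFC 3550 with no padding, extension or CSRC entries. The function name is hypothetical, and the payload type 96 is only an example of a dynamically negotiated value, not something the text above prescribes.

```python
import struct


def build_rtp_header(payload_type, sequence_number, timestamp, ssrc, marker=False):
    """Pack the 12-byte fixed RTP header (version 2, no padding/extension/CSRC)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6                               # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type    # marker bit + 7-bit PT
    return struct.pack(
        "!BBHII",                                # network byte order
        byte0,
        byte1,
        sequence_number & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


# Assumption: 96 stands in for whatever dynamic PT the endpoints negotiate.
header = build_rtp_header(96, sequence_number=1, timestamp=90000,
                          ssrc=0x12345678, marker=True)
print(header.hex())
```

Because the PT value is not fixed, both endpoints have to agree on it during call setup before the receiver can interpret the payload as H.264.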
IV. Conclusion
As a new international standard, H.264 has succeeded in coding efficiency, image quality, network adaptability and error resilience. However, with the rapid development of terminals and networks, the demands on video coding and decoding keep rising, so H.264 continues to be improved and extended to meet new requirements. Current research on H.264 focuses mainly on reducing encoding and decoding delay, optimizing the algorithms and further improving image quality. More and more video conferencing systems now use H.264 for encoding and decoding, and most of them interoperate on the Baseline profile. As H.264 itself continues to improve and video communication becomes more widespread, H.264 is expected to be used ever more widely.