H.264 is an open, licensable standard that supports the most efficient video compression techniques on the market today. Also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), H.264 is the latest video compression standard and is expected to become the industry's preferred video standard within the next few years. Without sacrificing image quality, an H.264 encoder can reduce the size of a digital video file by more than 80% compared with M-JPEG and by more than 50% compared with MPEG-4 Part 2. This greatly reduces the network bandwidth and storage space required for video files; viewed another way, video image quality improves significantly at a given bit rate.
H.264 was jointly developed by standardization organizations in the telecommunications and IT industries, and it is therefore expected to be adopted more widely than previous standards.
H.264 has already been applied in a new generation of electronic products such as mobile phones and digital video players, where it has quickly won the favor of end users. Service providers such as online video storage companies and telecommunications companies have also begun to adopt the standard.
In the video surveillance industry, H.264 is likely to be adopted fastest in settings that demand high frame rates and high resolution, such as highways, airports and entertainment venues, where a frame rate of 30/25 fps (NTSC/PAL) has become the common standard. Because H.264 reduces bandwidth and storage requirements so significantly, it can help enterprises cut costs considerably.
In addition, because H.264 is an extremely efficient compression technology that can shrink large files and reduce the bit rate without affecting image quality, it is expected to boost the adoption of megapixel cameras. There is a trade-off, however: although H.264 saves network bandwidth and storage costs, it places higher performance demands on network cameras and display terminals.
2. Development of H.264
H.264 is a new-generation video compression standard jointly developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). ITU-T is the sector of the International Telecommunication Union that coordinates the development of telecommunication standards. ISO is the International Organization for Standardization, and IEC is the International Electrotechnical Commission, which develops standards for electronic, electrical and related technologies. H.264 is the name used by ITU-T, while ISO/IEC named it MPEG-4 Part 10/AVC because it is a new standard in the MPEG-4 family. That family also includes MPEG-4 Part 2, a standard used in IP-based video encoders and network cameras.
To address the shortcomings of previous video compression standards, H.264 was designed to support:
Efficient compression that, for a given video quality, reduces the bit rate by an average of 50% compared with any other video standard.
Stronger error resilience, tolerating the transmission errors of various networks.
Low-latency operation, as well as higher-quality images where higher latency is acceptable.
Straightforward syntax specifications that simplify implementation.
Exact-match decoding, which strictly defines how encoders and decoders perform numerical calculations so that errors do not accumulate.
In addition, H.264 flexibly supports surveillance applications with very different bit rate requirements. In entertainment video applications (including broadcast, satellite TV, cable TV and DVD), H.264 can deliver 1-10 Mbit/s with high latency, while for telecommunication services it can deliver bit rates below 1 Mbit/s with low latency.
3. Working principle of video compression
Video compression reduces and removes redundant video data so that digital video files can be sent and stored efficiently. A compression algorithm is applied to the source video to create a compressed file for transmission and storage; to play the compressed file, the inverse decompression algorithm is applied to restore the video, whose content is virtually identical to the original source. The time required to compress, send, decompress and display a file is called latency. Given the same processing power, the more advanced the compression algorithm, the longer the latency.
A video codec (encoder/decoder) is a pair of cooperating compression and decompression algorithms. Codecs built on different standards are generally incompatible with each other: video compressed with one standard cannot be decompressed with another. An MPEG-4 Part 2 decoder, for example, will not work with an H.264 encoder, simply because one algorithm cannot correctly decode the output of the other. However, multiple algorithms can be implemented in the same software or hardware, allowing files in multiple formats to be compressed and decompressed.
Because different video compression standards use different methods to reduce the amount of data, their results also differ in bit rate, quality and latency.
In addition, because encoder designers may choose different toolsets defined within a standard, results can differ even between encoders that use the same compression standard. As long as the encoder's output conforms to the standard's format and decoder requirements, different implementations are permitted. This is an advantage, since different implementations can pursue different goals and meet different budgets: a non-real-time professional software encoder used for mastering optical media should deliver higher-quality encoded video than a real-time hardware encoder integrated into a handheld video-conferencing device. A given standard therefore cannot guarantee a specific bit rate or quality, and a standard cannot be meaningfully compared with other standards, or even with other implementations of the same standard, without first fixing the implementation.
Unlike an encoder, a decoder must implement all required parts of a standard in order to decode a conforming bitstream, because the standard specifies exactly how the decompression algorithm must restore every bit of the compressed video.
The following figure compares the bit rates of these video standards at the same image quality level: M-JPEG, MPEG-4 Part 2 (without motion compensation), MPEG-4 Part 2 (with motion compensation) and H.264 (baseline profile).
Figure 1. For a sample video sequence, an H.264 encoder reduces the bit rate (bit/s) by 50% compared with an MPEG-4 encoder with motion compensation. Without motion compensation, the H.264 encoder is at least 3 times more efficient than an MPEG-4 encoder and 6 times more efficient than an M-JPEG encoder.
4. H.264 profiles and levels
The joint group that developed the H.264 standard aimed to create a clean and simple solution, keeping options and features to a minimum. As with other video standards, an important aspect of H.264 is that it supports common applications and common formats efficiently through profiles (sets of algorithmic features) and levels (performance tiers).
H.264 defines seven profiles, each targeting a particular class of applications. Each profile defines which feature sets the encoder may use and limits the implementation complexity of the decoder.
Network cameras and video encoders will most likely use the baseline profile, which targets applications with limited computing resources. For real-time encoders embedded in network video products, the baseline profile is the best fit for the available performance. It enables low latency, an important requirement for surveillance video and particularly critical for real-time pan/tilt/zoom (PTZ) control of PTZ network cameras.
H.264 also defines 11 levels, which constrain performance, bandwidth and memory requirements. Each level specifies the bit rate and encoding rate (in macroblocks per second) for resolutions ranging from QCIF up to HDTV. The higher the resolution, the higher the level required.
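To make the "macroblocks per second" measure concrete, the sketch below computes the decoder load for a given resolution and frame rate, assuming the usual 16×16-pixel macroblock. The function name and the rounding-up of partial macroblocks are our own illustration, not values taken from the H.264 level tables.

```python
def macroblocks_per_second(width: int, height: int, fps: float) -> int:
    """Macroblocks (16x16 pixel areas) processed per second at a given
    resolution and frame rate, rounding partial macroblocks up."""
    mbs_per_frame = ((width + 15) // 16) * ((height + 15) // 16)
    return int(mbs_per_frame * fps)

# QCIF at 15 fps versus 720p HDTV at 30 fps:
print(macroblocks_per_second(176, 144, 15))    # 1485
print(macroblocks_per_second(1280, 720, 30))   # 108000
```

The two orders of magnitude between these figures illustrate why higher resolutions require higher levels.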
5. Frame basics
Depending on the H.264 profile in use, the encoder will employ different frame types: I-frames, P-frames and B-frames.
An I-frame (intra-coded frame) is a self-contained frame that can be decoded independently, without reference to other images. The first frame in a video sequence is always an I-frame. I-frames serve as starting points or resynchronization points when the transmitted bitstream is corrupted, and they enable random-access functions such as fast-forward and rewind. An encoder automatically inserts I-frames at regular intervals or on demand, for example when a new client joins the video stream. The drawback of I-frames is that they consume many more bits; on the other hand, they do not introduce perceptible blur.
A P-frame (inter-predicted frame) is coded with reference to parts of an earlier I-frame and/or P-frame. P-frames usually require fewer bits than I-frames, but they are very sensitive to transmission errors because of their chain of dependencies on earlier P- and I-reference frames.
A B-frame (bi-directionally predicted frame) references both an earlier frame and a later frame.
Figure 2. A typical video sequence with I-, B- and P-frames. A P-frame references only the preceding I- or P-frame, while a B-frame references both the preceding and the following I- or P-frame.
When a video decoder reconstructs the video by decoding the bitstream frame by frame, decoding must always begin at an I-frame. P-frames and B-frames, if used, must be decoded together with their reference frames.
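The decode-from-an-I-frame rule can be sketched as a toy function (the names and the simplified forward-only seek are our own, not from the standard): a viewer joining mid-stream must wait for the next I-frame before any frame can be displayed.

```python
def decodable_frames(frame_types, start):
    """Indices of frames a viewer can decode after joining at `start`,
    given that decoding can only begin at an I-frame. B-frames are
    omitted for simplicity, matching the baseline profile."""
    i = start
    while i < len(frame_types) and frame_types[i] != "I":
        i += 1  # skip P-frames whose references we never received
    return list(range(i, len(frame_types)))

gop = ["I", "P", "P", "P", "I", "P", "P"]
print(decodable_frames(gop, 0))  # joined at the start: all 7 frames
print(decodable_frames(gop, 2))  # joined mid-GOP: must wait for frame 4
```

This is why encoders insert I-frames periodically: the interval bounds how long a new viewer waits for a usable picture.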
The H.264 baseline profile uses only I-frames and P-frames. Because it avoids B-frames, the baseline profile achieves low latency, making it an ideal choice for network cameras and video encoders.
6. Basic methods to reduce the amount of data
The amount of video data can be reduced both within an individual image frame and across a series of frames, using a variety of methods.
Within an image frame, the data amount can be reduced simply by removing unnecessary information, at the cost of lower image resolution.
Across a series of frames, video data can be reduced by differential coding, which is used by most video compression standards, including H.264. In differential coding, a frame is compared with a reference frame (the preceding I- or P-frame) and only the pixels that have changed relative to the reference are encoded, reducing the number of pixel values that must be coded and transmitted.
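A minimal sketch of this idea, treating a frame as a flat list of pixel values (a real codec works on macroblocks of transform coefficients, so these function names and representations are purely illustrative):

```python
def encode_diff(reference, current):
    """Return (index, value) pairs only for pixels that changed
    relative to the reference frame."""
    return [(i, cur) for i, (ref, cur) in enumerate(zip(reference, current))
            if ref != cur]

def decode_diff(reference, diff):
    """Reconstruct the current frame from the reference plus the changes."""
    frame = list(reference)
    for i, value in diff:
        frame[i] = value
    return frame

ref = [10, 10, 10, 20, 20, 20]
cur = [10, 10, 99, 20, 20, 21]
diff = encode_diff(ref, cur)
print(diff)                           # [(2, 99), (5, 21)]
print(decode_diff(ref, diff) == cur)  # True
```

Only two of six pixels are transmitted here; in a mostly static surveillance scene the saving is far larger.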
Figure 3. In M-JPEG format, the three images in the sequence above are encoded and transmitted as independent images (I-frames), with no dependencies between them.
Figure 4. With differential coding (used by most video compression standards, including H.264), only the first image (I-frame) encodes the full frame.
The amount of information to encode can be reduced further by detecting and coding differences per block of pixels (macroblock) rather than per pixel: larger regions are compared, and only blocks with significant differences are encoded. This also greatly reduces the overhead of signaling where the changed areas are.
Differential coding, however, does not significantly reduce the data volume when the video contains a lot of object motion. In that case, block-based motion compensation can be used. It exploits the fact that much of the information that makes up a new frame can already be found in the previous frame, only possibly at a different position. The technique divides a frame into macroblocks and constructs, or "predicts", the new frame (e.g., a P-frame) block by block by finding matching blocks in the reference frame. When a match is found, the encoder only needs to encode its position in the reference frame; coding this motion vector occupies far fewer bits than coding the block's actual content.
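The following toy full-search block matcher illustrates motion compensation: for one block of the new frame, it scans the reference frame for the closest match and returns only the resulting position (the motion vector). Real encoders search with sub-pixel accuracy and variable block sizes; this sketch uses whole pixels, a fixed block size, and function names of our own invention.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def block(frame, top, left, size):
    """Extract a size x size block from a frame (list of rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def best_motion_vector(reference, target_block, size):
    """Exhaustively search the reference frame for the closest block
    and return its (top, left) position."""
    best_cost, best_pos = None, None
    for top in range(len(reference) - size + 1):
        for left in range(len(reference[0]) - size + 1):
            cost = sad(block(reference, top, left, size), target_block)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (top, left)
    return best_pos

ref = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
target = [[9, 9], [9, 9]]            # a 2x2 block from the new frame
print(best_motion_vector(ref, target, 2))  # (1, 1): the block was found here
```

Transmitting the pair (1, 1) instead of four pixel values is the saving motion compensation delivers, and it grows with block size.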
7. Efficiency of H.264
H.264 takes video compression technology to a new level. In H.264, I-frames are encoded with an advanced intra prediction scheme: by successively predicting the smaller pixel blocks within each macroblock of the frame, the bits occupied by an I-frame can be greatly reduced while high quality is maintained. This is achieved by searching, among the previously encoded pixels bordering each 4×4 pixel block, for pixels that match the block being intra-coded. By reusing already-encoded pixel values, the number of bits to encode drops sharply. The new intra prediction capability is a key part of H.264 technology, and experiments show it to be highly effective: an H.264 video stream using only I-frames produces much smaller files than an M-JPEG stream, which likewise uses only I-frames.
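A toy version of the 4×4 intra prediction idea: predict a block by copying the row of already-decoded pixels directly above it (the spirit of H.264's "vertical" mode), then encode only the residual. H.264 defines several such modes and picks the best one per block; this sketch shows a single mode, with names and sample values of our own.

```python
def vertical_predict(above_row, size=4):
    """Predict each row of the block as a copy of the pixels above it."""
    return [list(above_row) for _ in range(size)]

def residual(actual_block, prediction):
    """Per-pixel difference that remains to be encoded."""
    return [[a - p for a, p in zip(arow, prow)]
            for arow, prow in zip(actual_block, prediction)]

above = [50, 52, 54, 56]             # neighbouring pixels, already decoded
actual = [[50, 52, 54, 57],
          [50, 53, 54, 56],
          [51, 52, 54, 56],
          [50, 52, 54, 56]]
res = residual(actual, vertical_predict(above))
print(res)  # mostly zeros: only small residual values need to be coded
```

Because smooth image regions are well predicted from their neighbours, the residual is mostly zeros and compresses far better than the raw pixels.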
Figures 5 and 6. Illustration of the efficiency of intra prediction: the intra-predicted portion of the image is sent "for free". The output image is generated by encoding only the residual and the intra prediction modes.
The block-based motion compensation used when encoding P- and B-frames has also been improved in H.264. The H.264 encoder can search for matching blocks with sub-pixel accuracy in selected or wide areas of one or more reference frames, and the size and shape of blocks can be adjusted to improve the matching rate. Where no matching block can be found in a reference frame, an intra-coded macroblock is used instead. H.264's block-based motion compensation is highly flexible and well suited to crowded surveillance scenes, where it maintains the high quality that demanding applications require. Motion compensation is the most computationally demanding part of a video encoder, and the manner and degree to which an H.264 encoder implements it affect how efficiently the video is compressed.
H.264 also applies an in-loop deblocking filter that reduces the blocking artifacts common in highly compressed video produced by M-JPEG and earlier MPEG standards. The filter smooths block edges with adaptive strength, ensuring nearly artifact-free decompressed video output.
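A very rough one-dimensional sketch of the idea behind such a filter: soften the step at a block boundary when the discontinuity looks like a compression artifact (a small jump), but leave real image edges (a large jump) alone. The threshold and the averaging here are illustrative inventions of ours, not H.264's actual filter equations.

```python
def deblock_1d(pixels, boundary, threshold=8):
    """Smooth the two pixels straddling `boundary` only when the jump
    across it is small enough to look like a blocking artifact."""
    out = list(pixels)
    p, q = pixels[boundary - 1], pixels[boundary]
    if 0 < abs(p - q) <= threshold:   # adaptive: big jumps are real edges
        avg = (p + q) // 2
        out[boundary - 1] = (p + avg) // 2
        out[boundary] = (q + avg) // 2
    return out

row = [40, 40, 40, 40, 46, 46, 46, 46]   # mild blocking step at index 4
print(deblock_1d(row, 4))                # step softened to 41 -> 44
edge = [0, 0, 0, 0, 100, 100, 100, 100]  # a genuine object edge
print(deblock_1d(edge, 4))               # left untouched
```

Running the filter inside the coding loop (on the frames that later serve as references) is what makes it "in-loop": the smoothed frames improve subsequent predictions as well as the displayed picture.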
Figure 7. The image on the right shows that, after the deblocking filter is applied, the blocking artifacts of the highly compressed image on the left are greatly reduced.
H.264 represents a great leap forward in video compression technology. With more accurate prediction and stronger error resilience, it delivers markedly higher compression efficiency without sacrificing image quality.