Tencent wireless destination is a meeting product developed by the Tencent Audio and Video Laboratory for frequently occurring conference-room scenarios. It improves meeting efficiency and simplifies the meeting workflow, and it has been widely used since launch. The product spans a broad range of technologies, many of them technically demanding. To improve the user experience, the team has done extensive work on network adaptability, mouse optimization, extended screens, and video codec technology, which has put the product among the industry leaders in these areas. This article describes the screen coding technology behind Tencent wireless destination (TSE). For screen content images, TSE improves compression efficiency by about 55% compared with x265 (Normal mode).
1. Introduction to screen content
Screen content images are captured directly from the display units of various devices (computers, mobile terminals, and so on). Common types include computer graphics and text images, mixed images that combine natural video with graphics or text, and computer-generated animation. Screen images typically appear in scenarios such as desktop collaboration, desktop sharing, second-screen display, and cloud gaming.
Figure 1 Typical screen content image
Computer-generated screen images differ significantly from natural images captured by a camera. Computer-generated graphics and text are usually noise-free, with discrete tones, fine lines, and sharp edges, whereas camera video usually contains noise and has continuous tones and more complex textures.
The hybrid coding framework used by traditional video coding does not handle screen content well. Fine high-frequency textures lose detail after encoding, and ringing artifacts appear easily, as shown in the figure below:
Figure 2 Screen content encoded with hybrid coding
Because screen images differ so much from traditional video, new coding tools are needed that fully exploit the characteristics of screen images in order to substantially improve their coding efficiency. In 2016, the fourth version of HEVC was officially released with HEVC-SCC (the HEVC Screen Content Coding Extension), which adds new coding tools optimized for screen content and greatly improves its coding efficiency.
2. Introduction to key HEVC-SCC technologies
HEVC-SCC was officially released in 2016. It adds a set of coding tools on top of HEVC and HEVC-RExt, as shown below:
Figure 3 New coding tool sets in each HEVC version
As the figure above shows, the main HEVC-SCC tools are:
Intra block copy (IBC): In addition to the traditional intra and inter prediction modes, HEVC-SCC introduces a new coding mode, IBC. A PU in this mode uses a reconstructed block of the current frame as its prediction block, so IBC can be regarded as motion compensation within the picture currently being encoded.
Palette mode: In screen content images, many blocks contain only a limited number of colors. Palette mode enumerates these color values to build a color table, then transmits an index for each sample indicating which color in the table it uses. Compared with conventional prediction-plus-transform coding, palette mode is often more effective for blocks whose colors are concentrated in a few values.
Adaptive color transform (ACT): Screen content generally uses the RGB color space, so removing redundancy between color components is very important for coding efficiency. HEVC-SCC supports adaptively transforming the residual into a different color space: a block in RGB can be coded directly, or adaptively converted to the YCoCg color space before coding.
Adaptive motion vector resolution (AMVR): Motion in camera-captured video is usually continuous, whereas motion in screen content is usually discrete, with a granularity of whole pixels. For most screen content, fractional-pixel motion compensation is therefore unnecessary. In HEVC-SCC, the motion vector (MV) precision can be switched at the slice level between integer-pixel and fractional-pixel.
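To make the color transform idea concrete, here is a minimal sketch of a reversible RGB to YCoCg conversion using the well-known lifting form (often called YCoCg-R). It is an illustration under assumptions: HEVC-SCC applies the transform to residuals rather than to sample values, the function names are hypothetical, and the article does not say whether TSE uses ACT at all.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward reversible YCoCg-R transform built from integer lifting steps."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse transform: undo the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# A gray pixel carries no chroma, so Co and Cg are both zero.
gray = rgb_to_ycocg_r(128, 128, 128)
```

Because every step is an integer lifting step, the inverse recovers the input exactly, which is what makes this form usable for lossless coding as well.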
These new coding tools greatly improve the coding efficiency of screen content images. According to published statistics, for screen images HEVC-SCC improves compression efficiency over HEVC-RExt by 36% in AI (all intra) mode and 20% in LD (low delay) mode; in lossless mode, the improvement over HEVC-RExt is 56% (AI mode) and 40% (LD mode). IBC and palette coding contribute the most: IBC alone yields a coding gain of more than 30%, and palette coding adds a further 10-15% on top of IBC. The following sections focus on IBC and palette coding.
1) IBC encoding
In screen content sequences such as text and graphics, the same frame contains many repeated textures, that is, strong spatial correlation. As shown in Figure 4, the textures in the regions marked by the red and blue boxes are almost identical. If the current block is encoded with reference to an already encoded block, coding efficiency can be greatly improved.
Figure 4 Example of spatial correlation in a screen image
To exploit this spatial correlation, screen content coding introduces a new prediction technology, IBC. IBC is similar to inter-picture prediction, except that the prediction block is generated from a reconstructed block of the picture currently being encoded. IBC is performed at the PU level, so an IBC PU can be treated as an inter PU. Unifying IBC with inter mode allows IBC and ordinary inter prediction to be combined flexibly. For example, an inter-coded CU can have two PUs, one using traditional inter prediction and the other using IBC.
Although IBC is unified with inter mode, the inter prediction model cannot be used for IBC without changes. Compared with traditional inter prediction, IBC has the following restrictions:
1) IBC references reconstructed pixels before loop filtering;
2) When the current picture is used as a reference, it is marked as a long-term reference frame. After the whole picture has been decoded, it is loop filtered and then added to the DPB as a short-term reference frame;
3) An IBC prediction block must not overlap the current CU, so that unreconstructed samples are never used for prediction;
4) The prediction block and the current CU must be in the same slice and the same tile;
5) The search area for the prediction block is strictly limited to the gray area shown in Figure 5, so that parallel processing is not affected;
6) The IBC block vector must have integer-pixel precision.
Figure 5 IBC search area (gray part)
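The non-overlap and integer-precision restrictions can be illustrated with a small sketch. It is a simplified illustration, not the HEVC-SCC search or the TSE implementation: it assumes square blocks on a fixed grid coded in raster order, treats the source samples as the reconstruction, and uses an exhaustive SAD search. The names `is_valid_reference`, `ibc_search`, and `search_range` are hypothetical, and the availability test is only a stand-in for the gray area of Figure 5 (it ignores slice, tile, and CTU constraints).

```python
def sad(frame, x0, y0, x1, y1, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame[y0 + dy][x0 + dx] - frame[y1 + dy][x1 + dx])
        for dy in range(size)
        for dx in range(size)
    )


def is_valid_reference(cur_x, cur_y, ref_x, ref_y, size, width):
    """True if the reference block lies entirely in already coded samples.

    Assumes raster-order coding of grid-aligned blocks: everything above the
    current block row is coded, plus the part of the current block row that
    lies to the left of the current block.
    """
    if ref_x < 0 or ref_y < 0 or ref_x + size > width:
        return False
    entirely_above = ref_y + size <= cur_y
    entirely_left = ref_y <= cur_y and ref_x + size <= cur_x
    return entirely_above or entirely_left


def ibc_search(frame, cur_x, cur_y, size, search_range):
    """Exhaustive integer-pixel block vector search inside the current frame.

    Returns (cost, bv_x, bv_y) for the first lowest-cost candidate,
    or None if no valid reference block exists.
    """
    width = len(frame[0])
    best = None
    for ref_y in range(max(0, cur_y - search_range), cur_y + 1):
        last_x = min(width - size, cur_x + search_range)
        for ref_x in range(max(0, cur_x - search_range), last_x + 1):
            if not is_valid_reference(cur_x, cur_y, ref_x, ref_y, size, width):
                continue
            cost = sad(frame, cur_x, cur_y, ref_x, ref_y, size)
            if best is None or cost < best[0]:
                best = (cost, ref_x - cur_x, ref_y - cur_y)
    return best


# A frame whose columns repeat with period 4, like evenly spaced glyphs.
frame = [[(x % 4) * 10 + y for x in range(16)] for y in range(8)]
result = ibc_search(frame, cur_x=8, cur_y=0, size=4, search_range=8)
```

In this toy frame the block at (8, 0) is matched exactly by the block eight pixels to its left, so the search finds a zero-cost block vector without any fractional-pixel interpolation.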
2) Palette mode
Palette mode is particularly well suited to coding blocks that contain few colors. Unlike traditional prediction plus transform, palette mode reconstructs pixels through a color table and indexes. The encoder generates a color table for the CU and transmits a color table index for each pixel in the CU; the decoder reconstructs the pixels from the color table and the indexes. If no suitable color is found in the color table for a pixel, the pixel is coded in escape mode: its value is quantized and then dequantized to complete the reconstruction. Figure 6 shows an example of palette mode.
Figure 6 Palette mode encoding example
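The color table, index map, and escape handling described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the HEVC-SCC syntax or the TSE color table algorithm: the palette is simply the most frequent colors in the block, escape pixels are stored as-is rather than quantized, and `palette_encode`, `palette_decode`, and `max_colors` are hypothetical names.

```python
from collections import Counter


def palette_encode(block, max_colors):
    """Encode a block (list of rows) as (palette, index map, escape values).

    The index len(palette) marks an escape pixel, whose value is carried
    separately in the order the pixels are scanned.
    """
    counts = Counter(pixel for row in block for pixel in row)
    palette = [color for color, _ in counts.most_common(max_colors)]
    lookup = {color: i for i, color in enumerate(palette)}
    escape_index = len(palette)

    indices, escapes = [], []
    for row in block:
        index_row = []
        for pixel in row:
            if pixel in lookup:
                index_row.append(lookup[pixel])
            else:
                index_row.append(escape_index)
                escapes.append(pixel)
        indices.append(index_row)
    return palette, indices, escapes


def palette_decode(palette, indices, escapes):
    """Rebuild the block from the palette, the index map, and escape values."""
    escape_values = iter(escapes)
    return [
        [palette[i] if i < len(palette) else next(escape_values) for i in row]
        for row in indices
    ]


# A 4x4 block of black text on white with one stray anti-aliased pixel.
block = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 7],
    [0, 0, 0, 255],
]
palette, indices, escapes = palette_encode(block, max_colors=2)
```

With a two-entry palette, only the single stray pixel needs an escape value; every other sample is described by a one-bit index.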
3. Implementation and optimization of screen coding technology
Last year, the Audio and Video Laboratory launched the wireless destination application. Most of its users share documents such as PPT and Word files, a scenario well suited to screen content coding. Because IBC and palette mode contribute most to the coding efficiency of screen content images, we implemented these two tools in our screen coding technology.
Although IBC and palette mode offer high compression efficiency, their encoding complexity is also very high, and encoding speed is critical for real-time screen applications. To meet real-time requirements, we optimized IBC and palette coding extensively. At the algorithm level, we use a fast and efficient color table generation algorithm, and we replace traditional motion estimation with hash-based search. We also added a large number of early-termination algorithms to increase speed. In addition, key modules are optimized with SIMD to further improve encoding speed.
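The idea behind hash-based search can be sketched as follows: instead of comparing the current block against every candidate position, the encoder indexes already coded blocks by a hash of their contents and looks up the current block directly. This is a hedged illustration only. The article does not describe the hash function or table layout used in TSE; CRC32 over 8-bit samples, the `avail_rows` parameter, and the function names are all assumptions made for this example.

```python
import zlib


def block_hash(frame, x, y, size):
    """CRC32 of the block's samples (assumes 8-bit sample values)."""
    data = bytes(frame[y + dy][x + dx] for dy in range(size) for dx in range(size))
    return zlib.crc32(data)


def blocks_equal(frame, x0, y0, x1, y1, size):
    """Exact sample-by-sample comparison of two blocks."""
    return all(
        frame[y0 + dy][x0 + dx] == frame[y1 + dy][x1 + dx]
        for dy in range(size)
        for dx in range(size)
    )


def build_hash_table(frame, size, avail_rows):
    """Index every size x size block lying fully inside the first avail_rows rows."""
    table = {}
    width = len(frame[0])
    for y in range(avail_rows - size + 1):
        for x in range(width - size + 1):
            table.setdefault(block_hash(frame, x, y, size), []).append((x, y))
    return table


def hash_search(frame, table, cur_x, cur_y, size):
    """Return the block vector of an exact match, or None if there is none."""
    for ref_x, ref_y in table.get(block_hash(frame, cur_x, cur_y, size), []):
        # Verify the samples to guard against hash collisions.
        if blocks_equal(frame, cur_x, cur_y, ref_x, ref_y, size):
            return (ref_x - cur_x, ref_y - cur_y)
    return None


# Top half: unique samples. Bottom half: the top half shifted left by 2 columns.
top = [[x + 8 * y for x in range(8)] for y in range(4)]
bottom = [[top[y][(x + 2) % 8] for x in range(8)] for y in range(4)]
frame = top + bottom
table = build_hash_table(frame, size=4, avail_rows=4)
block_vector = hash_search(frame, table, cur_x=0, cur_y=4, size=4)
```

A lookup costs one hash computation plus a verification per candidate, so it suits screen content, where exact repeats are common; it only finds exact matches, which is why a real encoder would still need a fallback for blocks that differ slightly.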
After optimization, the performance of IBC and palette encoding improved significantly. The comparison data below was obtained after optimization, with x265 run in Ultrafast and Normal mode using the following command line:
-p ultrafast/normal --psnr --no-psy --input-res 1920x1080 --fps 15/1 in.yuv -o out.265 --qp 37 --pools 4 --frame-threads 1 --keyint 1000 -f 500 --ipratio 1 --bframes 0 --rc-lookahead 0
Figure 7 TSE vs. x265: compression efficiency and encoding time on screen content sequences
Figure 8 TSE vs. x265: compression efficiency and encoding time on camera-captured sequences
As the figures show, for camera-captured sequences the coding-efficiency gain of TSE over x265 Ultrafast mode is under 20%. For screen content sequences, the gain is more than 70% over x265 Ultrafast mode and about 55% over x265 Normal mode. In terms of encoding time on screen content sequences, with IBC and PLT (palette mode) disabled, the average encoding time of TSE is only about 33% of that of x265 Ultrafast; with PLT and IBC enabled, it is about 50% of that of x265 Ultrafast. For camera-captured sequences, the average encoding time of TSE is about 88% of that of x265 Ultrafast.
In terms of subjective quality, for document sequences the reconstructed picture from TSE is clearly better than that from x265. Figure 9 shows text after encoding with TSE and with x265: the ringing artifacts in the x265 result are very obvious, while none are visible in the TSE result.
Figure 9 Subjective comparison of TSE and x265 encoding
4. Summary
Screen coding technology targets screen content images and offers a significant compression-efficiency advantage over standard H.265 encoding. It is of great value for wireless destination, conferencing, online education, and similar scenarios.
The Audio and Video Laboratory has implemented IBC and palette mode coding and deployed them online in Tencent wireless destination and Tencent Meeting. The optimized TSE improves compression efficiency by about 55% over x265 Normal mode. Its encoding speed is also significantly higher than that of x265 and is fast enough for real-time use. For screen content images, TSE improves subjective quality while also saving bitrate.
Transfer from: https://cloud.tenceent.com/developer/Article/1427159