H.265 (HEVC) in-depth analysis
Bandwidth crisis, H.265 to the rescue
Digital video keeps moving toward ultra HD, and frame rates are climbing from 30 fps to 60 fps, 120 fps or even 240 fps. At the same time, physical media are in decline, and content is delivered over networks to terminal devices in every corner of the world. Such data-intensive video poses a great challenge to bandwidth and storage. The current mainstream standard, H.264, is starting to fall short, and the new-generation video coding standard H.265 looks like the "savior" of the digital 4K era.
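To put the bandwidth challenge in numbers, the following is a rough back-of-the-envelope sketch of the uncompressed bit rate. The 3840 × 2160 resolution, 8-bit samples and YUV 4:2:0 sampling (1.5 samples per pixel) are assumptions chosen for illustration, not figures from this article.

```python
def raw_bitrate_bps(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Uncompressed bit rate in bits per second.

    samples_per_pixel=1.5 corresponds to YUV 4:2:0 (assumed here).
    """
    return width * height * samples_per_pixel * bits_per_sample * fps


# Assumed 4K UHD frame size; the frame rates are the ones named in the text.
for fps in (30, 60, 120, 240):
    gbps = raw_bitrate_bps(3840, 2160, fps) / 1e9
    print(f"3840x2160 @ {fps} fps: {gbps:.2f} Gbit/s uncompressed")
```

Under these assumptions, even 30 fps needs close to 3 Gbit/s before compression, and each doubling of the frame rate doubles that figure.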
H.265, also known as HEVC (High Efficiency Video Coding; referred to as H.265 in this article), is the successor to the ITU-T H.264 / MPEG-4 AVC standard. Work on it began in 2004 by the ISO / IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG), and it is published as ISO / IEC 23008-2 MPEG-H Part 2 and as ITU-T H.265. The first edition of the HEVC / H.265 video compression standard was accepted as a formal standard of the International Telecommunication Union (ITU-T) on April 13, 2013.
In theory, its compression efficiency is 30-50% higher than that of H.264 (especially at higher resolutions), but is it really that simple?
H.265 changes
H.265 reuses many concepts defined in H.264. Both are block-based video coding techniques, so they share the same roots and similar encoding methods, including:
1. Partition the picture into macroblocks, which are further subdivided into blocks.
2. Reduce spatial redundancy using intra-frame compression.
3. Reduce temporal redundancy using inter-frame compression (motion estimation and compensation).
4. Compress the residual data using transform and quantization.
5. Remove the remaining redundancy in the residuals, motion vectors and signaling using entropy coding.
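The core of the steps above, predicting a block and then coding only the quantized residual, can be sketched as follows. This is a deliberately minimal illustration: the transform and entropy coding stages are omitted, and the function names and the fixed quantization step are assumptions, not part of either standard.

```python
def encode_block(block, prediction, qstep):
    """Quantize the residual (block minus prediction) into integer levels."""
    return [round((s - p) / qstep) for s, p in zip(block, prediction)]


def decode_block(levels, prediction, qstep):
    """Rebuild the block from the prediction and the dequantized residual."""
    return [p + level * qstep for p, level in zip(prediction, levels)]


prediction = [100, 102, 104, 106]   # e.g. taken from neighbouring samples
block = [101, 103, 108, 105]        # the samples actually being coded
levels = encode_block(block, prediction, qstep=2)
recon = decode_block(levels, prediction, qstep=2)
print("levels:", levels)
print("reconstruction:", recon)
```

A good prediction makes most levels zero, which is what the entropy coder then exploits; the reconstruction error per sample is bounded by half the quantization step.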
In fact, video codecs have not changed fundamentally since MPEG-1; H.265 is simply a more powerful evolution of H.264, with some key aspects refined and simplified.
So where is the problem, and what does H.265 do better?
Like H.264, H.265 can be tuned to bandwidth requirements. But whether you want to deliver 4K content over the ordinary Internet or to achieve the best image quality, you need to distinguish two concepts: "more compression" and "better compression". With more compression alone, 4K and ultra HD are not guaranteed to look better than today's 1080p or HD. Compressed hard enough, streamed 4K may well look worse than current 1080p Blu-ray, because 1080p Blu-ray has more bandwidth available for video than online streaming does. Better compression means smarter compression: given the same source material, it reduces the amount of data without sacrificing quality. More compression is easy; better compression takes more thought and better techniques, processing images with more intelligent algorithms to lower the bit rate while maintaining quality. This is what H.265 sets out to do.
How is better compression achieved? In a lot of video material, such as video conferences or movies, most of the content in each frame changes little from the previous one. In a video conference, usually only the speaker's head moves (sometimes only the lips), while the background stays still. In such cases, rather than encoding every pixel of every frame, we encode the initial frame and then encode only what changes.
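The idea of coding only what changes can be illustrated with a toy sketch that compares two frames block by block and reports which blocks differ. Real codecs use motion estimation and compensation rather than a plain difference test; the 4 × 4 block size, the threshold and the function name here are illustrative assumptions.

```python
def changed_blocks(prev, curr, block=4, threshold=0):
    """Return (row, col) origins of blocks that differ from the previous frame."""
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            if diff > threshold:
                changed.append((by, bx))
    return changed


prev = [[10] * 8 for _ in range(8)]   # static background
curr = [row[:] for row in prev]
curr[5][6] = 200                      # a small moving detail, e.g. the lips
print(changed_blocks(prev, curr))
```

In this 8 × 8 example only one of the four blocks needs to be coded again; the other three can be copied from the previous frame.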
In the figure, the left picture shows a picture partitioned into macroblocks by AVC / H.264, while the right picture shows the more flexible partitioning of H.265.
H.265 moves toward "better compression" in the following ways.
Image partition
H.265 divides the image into coding tree units (CTUs) rather than 16 × 16 macroblocks as in H.264. Depending on the coding settings, the CTU size can be set to 64 × 64, or restricted to 32 × 32 or 16 × 16. Many studies have shown that larger CTUs provide higher compression efficiency (at the cost of longer encoding time). Each CTU can be split recursively using a quadtree structure into sub-regions of 32 × 32, 16 × 16 and 8 × 8; the figure below shows an example partition of a 64 × 64 CTU. Each image is further organized into special groups of CTUs called slices and tiles. The coding tree unit is the basic coding unit of H.265, as the macroblock is for H.264. A CTU can be divided further into coding units (CU), prediction units (PU) and transform units (TU).
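The recursive quadtree split described above can be sketched as follows. The decision callback `needs_split` is a hypothetical stand-in: a real encoder typically decides by comparing rate-distortion costs, which is outside the scope of this sketch.

```python
def split_ctu(x, y, size, needs_split, min_size=8):
    """Recursively split a square region into coding units via a quadtree.

    Returns a list of (x, y, size) leaves.
    """
    if size > min_size and needs_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(split_ctu(x + dx, y + dy, half, needs_split, min_size))
        return leaves
    return [(x, y, size)]


def detail_in_top_left(x, y, size):
    """Hypothetical rule: only the region touching the top-left corner has detail."""
    return x == 0 and y == 0


cus = split_ctu(0, 0, 64, detail_in_top_left)
for cu in cus:
    print(cu)
```

With this rule a 64 × 64 unit ends up as four 8 × 8, three 16 × 16 and three 32 × 32 coding units, so small units are spent only where the detail is.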
Each coding tree unit contains one luma coding tree block and two chroma coding tree blocks, plus syntax elements that record additional information. Most video is compressed in YUV 4:2:0 format, so a 16 × 16 coding tree unit, for example, contains one 16 × 16 luma coding tree block and two 8 × 8 chroma coding tree blocks.
The coding unit is the basic unit at which H.265 chooses between intra and inter prediction. Typically, smaller coding units are used in detailed regions (e.g. edges), while larger coding units are used in flat, predictable regions.
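The block sizes in this example follow directly from the chroma subsampling factors. A small sketch, with the function name and the dictionary layout chosen here purely for illustration:

```python
# (horizontal, vertical) chroma subsampling divisors per format
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def ctb_sizes(ctu_size, chroma_format="4:2:0"):
    """Return (width, height) of the luma and the two chroma tree blocks."""
    div_x, div_y = SUBSAMPLING[chroma_format]
    chroma = (ctu_size // div_x, ctu_size // div_y)
    return {"luma": (ctu_size, ctu_size), "cb": chroma, "cr": chroma}


for size in (16, 32, 64):
    print(size, ctb_sizes(size))
```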
Transform size
Each coding unit can be split recursively into transform units using a quadtree. H.264 mainly uses a 4 × 4 transform and occasionally an 8 × 8 transform, whereas H.265 offers several transform sizes: 32 × 32, 16 × 16, 8 × 8 and 4 × 4. From a mathematical perspective, larger transform units encode stationary signals better, while smaller transform units encode small "impulse" signals better.
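The trade-off between large and small transforms can be seen with a one-dimensional toy experiment. The sketch below uses a floating-point DCT-II as a stand-in (HEVC itself specifies integer approximations) and counts how many non-zero coefficients a 32-sample signal produces when it is transformed as one 32-point block or as eight 4-point blocks. The signals and helper names are assumptions for illustration.

```python
import math


def dct2(signal):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, s in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def nonzero_coeffs(signal, size, eps=1e-6):
    """Count significant coefficients when the signal is cut into size-sample transforms."""
    count = 0
    for start in range(0, len(signal), size):
        count += sum(1 for c in dct2(signal[start:start + size]) if abs(c) > eps)
    return count


flat = [50] * 32                 # stationary signal
impulse = [50] + [0] * 31        # single spike
for name, sig in (("flat", flat), ("impulse", impulse)):
    print(name, {size: nonzero_coeffs(sig, size) for size in (4, 32)})
```

The flat signal needs a single coefficient with the 32-point transform but eight with 4-point transforms, whereas the impulse spreads over all 32 coefficients of the large transform and only four of the small ones.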
Prediction unit
The prediction stage (intra prediction or inter prediction) comes first, before transform and quantization.
A coding unit can be predicted using one of eight partition modes: 2N × 2N, 2N × N, N × 2N, N × N, 2N × nU, 2N × nD, nL × 2N and nR × 2N.
A coding unit contains one, two or four prediction units, each of which is predicted with a dedicated inter or intra prediction technique. Intra-coded coding units can only use the square partitions 2N × 2N or N × N. Inter-coded coding units can use both square and non-square partitions.
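As a rough illustration of these partitions, the sketch below lists the prediction-unit sizes that each of the eight HEVC partition modes produces for a given coding unit. The mode names follow the usual HEVC naming (2Nx2N, 2NxnU and so on); the function itself is hypothetical and ignores the additional size restrictions the standard places on some modes.

```python
def pu_partitions(cu_size, mode):
    """Return the (width, height) of each prediction unit for a partition mode."""
    s, h, q = cu_size, cu_size // 2, cu_size // 4
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, h), (s, h)],
        "Nx2N": [(h, s), (h, s)],
        "NxN": [(h, h)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # asymmetric: small part on top
        "2NxnD": [(s, s - q), (s, q)],   # asymmetric: small part at bottom
        "nLx2N": [(q, s), (s - q, s)],   # asymmetric: small part on the left
        "nRx2N": [(s - q, s), (q, s)],   # asymmetric: small part on the right
    }
    return table[mode]


INTRA_MODES = ("2Nx2N", "NxN")   # intra-coded units are limited to square splits
ALL_MODES = ("2Nx2N", "2NxN", "Nx2N", "NxN", "2NxnU", "2NxnD", "nLx2N", "nRx2N")

for mode in ALL_MODES:
    print(mode, pu_partitions(32, mode))
```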
Intra prediction: HEVC has 35 different intra prediction modes (compared with 9 in AVC), comprising DC mode, Planar mode and 33 directional modes. Intra prediction can follow the quadtree of the transform units, so a prediction mode can be applied to transform units of 4 × 4, 8 × 8, 16 × 16 and 32 × 32.
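Of these modes, DC is the simplest to show: every sample in the block is predicted as the rounded mean of the reconstructed neighbours above and to the left. The sketch below is a simplified illustration that leaves out the boundary smoothing HEVC applies to some DC-predicted blocks; the function name is an assumption.

```python
def dc_predict(top, left):
    """Predict an n x n block as the rounded mean of its top and left neighbours."""
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must have the same length")
    dc = (sum(top) + sum(left) + n) // (2 * n)   # + n rounds to nearest
    return [[dc] * n for _ in range(n)]


top = [100, 100, 100, 100]     # reconstructed row above the block
left = [104, 104, 104, 104]    # reconstructed column left of the block
for row in dc_predict(top, left):
    print(row)
```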
Inter prediction: For motion vectors, H.265 has two reference lists, L0 and L1. Each can hold 16 reference entries, but the maximum number of unique pictures is 8. Motion estimation in H.265 is more complicated than in H.264. It uses list indices, and there are two main prediction modes: Merge and Advanced Motion Vector Prediction (AMVP).
In the encoding process, the prediction unit is the basic unit of prediction, and the transform unit is the basic unit of transform and quantization. Separating the three units (coding, prediction and transform) decouples the transform, prediction and coding stages of the process.
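The difference between the two modes can be sketched in a heavily simplified way: in merge mode only an index into a candidate list is signaled and the motion is inherited, while in AMVP an index plus a motion vector difference is signaled. The candidate list, the nearest-candidate selection rule and the function name below are illustrative assumptions; a real encoder builds separate candidate lists for the two modes and chooses by rate-distortion cost.

```python
def code_motion_vector(mv, candidates):
    """Return what would be signaled for mv given a list of candidate vectors.

    ('merge', index)        if a candidate matches exactly
    ('amvp', index, mvd)    otherwise: closest candidate plus the difference
    """
    if mv in candidates:
        return ("merge", candidates.index(mv))

    def distance(c):
        return abs(mv[0] - c[0]) + abs(mv[1] - c[1])

    idx = min(range(len(candidates)), key=lambda i: distance(candidates[i]))
    best = candidates[idx]
    return ("amvp", idx, (mv[0] - best[0], mv[1] - best[1]))


# Hypothetical candidates, e.g. taken from neighbouring blocks
candidates = [(4, -2), (0, 0), (5, -2)]
print(code_motion_vector((5, -2), candidates))
print(code_motion_vector((6, -2), candidates))
```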
Deblocking
Unlike H.264, which deblocks on a 4 × 4 grid, HEVC applies deblocking only on an 8 × 8 grid. This allows parallel processing, because the filtered regions do not overlap. Deblocking is applied first to all vertical edges in the picture and then to all horizontal edges. The filter itself is similar to the one used in H.264.
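The effect of the coarser grid and the edge ordering can be sketched as follows. This only enumerates candidate edge positions; whether and how strongly a given edge is actually filtered depends on decisions not modeled here, and the helper names are assumptions.

```python
def edge_positions(length, grid):
    """Interior positions along one dimension where a block edge may be filtered."""
    return list(range(grid, length, grid))


def deblock_order(width, height, grid=8):
    """All vertical edges first, then all horizontal edges."""
    vertical = [("vertical", x) for x in edge_positions(width, grid)]
    horizontal = [("horizontal", y) for y in edge_positions(height, grid)]
    return vertical + horizontal


print("8x8 grid, 64 wide:", edge_positions(64, 8))
print("4x4 grid, 64 wide:", len(edge_positions(64, 4)), "candidate edges")
print(deblock_order(16, 16))
```

For a width of 64 samples, the 8 × 8 grid leaves 7 candidate vertical edges where a 4 × 4 grid would leave 15.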
Sample adaptive offset
After deblocking there is a second, optional filter called sample adaptive offset (SAO). Like the deblocking filter, it is applied inside the prediction loop, and its output is stored in the reference frame list. The goal of this filter is to correct prediction errors, encoding drift and similar distortions by adaptively applying offsets to the samples.
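One of the SAO variants in HEVC is the band offset, which is not described in this article but makes the idea concrete: the sample range is divided into 32 equal bands, and an offset is added to samples falling into four consecutive bands. The sketch below is a simplified illustration; the function name and the example offsets are assumptions, and the edge-offset variant is not covered.

```python
def sao_band_offset(samples, band_start, offsets, bit_depth=8):
    """Add offsets to samples in consecutive bands starting at band_start.

    The sample range is split into 32 bands, so for 8-bit video each band
    spans 8 values (band index = sample >> 3).
    """
    shift = bit_depth - 5
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = (s >> shift) - band_start
        if 0 <= k < len(offsets):
            s = min(max(s + offsets[k], 0), max_val)   # clip to valid range
        out.append(s)
    return out


samples = [15, 16, 24, 47, 48, 255]
# Bands 2..5 cover sample values 16..47 for 8-bit video.
corrected = sao_band_offset(samples, band_start=2, offsets=[1, -1, 2, 0])
print(corrected)
```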
Parallel processing
Since decoding HEVC is more complicated than decoding AVC, several techniques have been included to allow parallel decoding. The most important are tiles and wavefronts. With tiles, the image is divided into a rectangular grid of coding tree units. Chip architectures have gradually shifted from improving single-core performance to multi-core parallelism, so in order to suit highly parallel chip implementations, H.265 introduces many provisions for parallel operation.
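The wavefront idea can be sketched as a dependency schedule. In wavefront processing, a row of coding tree units can start once the first two units of the row above are finished. Assuming one time step per unit and one thread per row (both simplifications), the earliest start step of each unit can be computed as below; the function name is an assumption.

```python
def wavefront_schedule(cols, rows):
    """Earliest start step of each CTU, one step per CTU, one thread per row.

    A CTU waits for its left neighbour and for the above-right CTU
    (or the last CTU of the row above when it is in the last column).
    """
    start = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            deps = []
            if x > 0:
                deps.append(start[y][x - 1])
            if y > 0:
                deps.append(start[y - 1][min(x + 1, cols - 1)])
            start[y][x] = 1 + max(deps) if deps else 0
    return start


schedule = wavefront_schedule(cols=5, rows=3)
for row in schedule:
    print(row)
total_steps = schedule[-1][-1] + 1
print("steps with wavefront:", total_steps, "vs sequential:", 5 * 3)
```

Each row trails the one above by two units, so a 5 × 3 picture finishes in 9 steps instead of 15 under these assumptions.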
All in all, HEVC pushes the traditional block-based video coding model to a higher level of efficiency. In summary:
- Variable-size transforms (from 4 × 4 to 32 × 32).
- Quadtree-structured prediction regions (from 64 × 64 to 4 × 4).
- Motion vector prediction based on candidate lists.
- A wide variety of intra prediction modes.
- More precise motion compensation filters.
- Optimized deblocking, sample adaptive offset filtering, etc.
Key coding characteristics
Challenges faced by H.265
The significant improvements in H.265 are not limited to inter-frame compression; they also show in intra-frame compression. Thanks to variable-size transforms, H.265 compresses blocks much better, but the gain in compression efficiency also brings some new challenges.
Difficult scenes
Video coding is a complex problem that depends heavily on the content. It is well known that brightly lit, low-motion scenes with a static background can be compressed far more than dark pictures. For a modern codec like H.264, therefore, the priority is to handle the most difficult scenes and situations first. These include:
Key frames with detail: However it is done, compressing key frames is very difficult, especially when the image is rich in detail (such as a forest). If the key frame starts from a quiet scene, efficient prediction and compensation of the low motion can still yield effective compression overall, but if complexity rises suddenly within a group of pictures, the encoder can easily run into trouble.
High-motion "crisp" images: Predicting highly complex motion is quite difficult. When it is combined with high spatial complexity, the result is a sustained surge in bit rate and / or more and more artifacts.
Slow motion in dark areas: Coding dark areas is a big challenge because the eye is more sensitive in dark environments than in bright ones. In addition, in slow-moving scenes with texture, smoke, color gradients or shadows, annoying artifacts are very easy to see even when adaptive quantization or similar optimizations are used.
Noise / texture: Noise is almost impossible to compress. Fortunately, the eye is more sensitive to texture and noise in specific areas of the image, such as flat areas and dark areas, so a smart codec can give those areas more bits than areas that need them less. Even so, noise is very difficult to compress, especially in fast-moving scenes. Compressed noise is easy to spot because it creates ugly low-frequency patterns and interferes with motion estimation / compensation. Noise is not always suitable or urgent