Abstract: H.26L is the next-generation video coding standard, and its coding efficiency exceeds that of all existing standards, including H.263+ and MPEG-4 (Simple Profile). This article analyzes the new coding features introduced by H.26L, focusing on the 4 × 4 integer transform, and proposes a fast transform algorithm implemented on the TM1300.
Introduction
H.26L is the next-generation video coding standard. It was initially developed by the ITU-T Video Coding Experts Group (VCEG). In November 2001, MPEG and VCEG jointly established the Joint Video Team (JVT) to take part in drafting H.26L. Because of MPEG's participation, H.26L will also become Part 10 of MPEG-4. Since the H.26L standard is still under development, this paper uses the TML8 test model provided by the JVT for its tests.
The basic framework of H.26L source coding is similar to that of current popular video coding standards: it is a hybrid scheme combining transform coding and predictive coding. Its excellent performance derives mainly from newly introduced coding features: the 4 × 4 integer transform, UVLC (universal variable-length code) entropy coding, motion vectors with 1/4- to 1/8-pixel precision, motion estimation with multiple block sizes, and so on. These techniques improve compression performance and error resilience in different ways. The 4 × 4 integer transform in particular is unique among video compression standards.
Although the H.26L standard is still under development, preliminary tests show that its coding performance exceeds that of existing standards, including H.263+ and MPEG-4 (Simple Profile). According to these results, H.26L saves 20% to 50% of the bit rate compared with H.263+, and up to 50% compared with MPEG-4 (SP). H.26L therefore shows great promise as the next-generation video coding standard.
1 The H.26L 4 × 4 Integer Transform
1.1 Overview of the Transform
In H.26L coding, the 4 × 4 integer transform can be regarded as an integer version of the DCT. Its main role is to remove spatial correlation in the image, and it has essentially the same properties as the 4 × 4 DCT. First consider the one-dimensional integer transform: let a, b, c, d be the four points to be transformed and A, B, C, D the corresponding four transform coefficients. The forward transform of a, b, c, d is then:
$$
\begin{aligned}
A &= 13a + 13b + 13c + 13d \\
B &= 17a + 7b - 7c - 17d \\
C &= 13a - 13b - 13c + 13d \\
D &= 7a - 17b + 17c - 7d
\end{aligned}
$$
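As a minimal sketch, the four forward equations translate directly into C. The function name `fwd4` and the array interface are illustrative assumptions, not taken from the TML8 source; the range check assumes the inputs are prediction residuals in [-255, 255].

```c
#include <assert.h>

/* Hypothetical helper (not from TML8): 1-D forward 4-point transform.
   x[0..3] holds the inputs a, b, c, d; y[0..3] receives A, B, C, D. */
void fwd4(const int x[4], int y[4])
{
    /* Assumption: inputs are prediction residuals in [-255, 255]. */
    for (int i = 0; i < 4; i++)
        assert(x[i] >= -255 && x[i] <= 255);

    y[0] = 13 * x[0] + 13 * x[1] + 13 * x[2] + 13 * x[3]; /* A */
    y[1] = 17 * x[0] +  7 * x[1] -  7 * x[2] - 17 * x[3]; /* B */
    y[2] = 13 * x[0] - 13 * x[1] - 13 * x[2] + 13 * x[3]; /* C */
    y[3] =  7 * x[0] - 17 * x[1] + 17 * x[2] -  7 * x[3]; /* D */
}
```

For example, the input (1, 2, 3, 4) yields the coefficients (130, -58, 0, -4), which can be checked by hand against the equations.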
The inverse transform is:
$$
\begin{aligned}
a' &= 13A + 17B + 13C + 7D \\
b' &= 13A + 7B - 13C - 17D \\
c' &= 13A - 7B - 13C + 17D \\
d' &= 13A - 17B + 13C - 7D
\end{aligned}
$$
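The reconstructed values satisfy a' = 676a, and likewise b' = 676b, c' = 676c, d' = 676d, because the four coefficient vectors are mutually orthogonal and each has squared norm 4 × 13² = 2 × (17² + 7²) = 676. Therefore, after the inverse transform, a normalization step (division by 676) is required so that the forward and inverse transforms match in scale.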
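A companion sketch of the inverse transform and the normalization step follows. The names `inv4` and `norm676` are hypothetical, and rounding to the nearest integer in `norm676` is an assumed rule; the text above only states that a normalization by 676 is needed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: 1-D inverse transform. y[0..3] holds A, B, C, D;
   r[0..3] receives the unnormalized outputs a', b', c', d'. */
void inv4(const int y[4], int r[4])
{
    r[0] = 13 * y[0] + 17 * y[1] + 13 * y[2] +  7 * y[3]; /* a' */
    r[1] = 13 * y[0] +  7 * y[1] - 13 * y[2] - 17 * y[3]; /* b' */
    r[2] = 13 * y[0] -  7 * y[1] - 13 * y[2] + 17 * y[3]; /* c' */
    r[3] = 13 * y[0] - 17 * y[1] + 13 * y[2] -  7 * y[3]; /* d' */
}

/* Hypothetical normalization: divide by 676 = 4 * 13 * 13, rounding to the
   nearest integer with halves away from zero (an assumed rounding rule).
   The result is exact when the coefficients were not quantized. */
int norm676(int v)
{
    assert(v > INT_MIN + 338 && v < INT_MAX - 338); /* keep v +/- 338 in range */
    return (v >= 0 ? v + 338 : v - 338) / 676;
}
```

Applied to (130, -58, 0, -4), the coefficients of the input (1, 2, 3, 4), `inv4` returns (676, 1352, 2028, 2704), and `norm676` recovers (1, 2, 3, 4) exactly.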
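Like the DCT, the two-dimensional 4 × 4 integer transform is separable: it can be computed as 1-D transforms of the rows followed by 1-D transforms of the columns. For an N × N block, separation reduces the computational complexity from O(N⁴) to O(N³).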
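The effect of separability can be illustrated by computing the 2-D transform both ways. In this sketch the function names are hypothetical and the inputs are assumed to be residuals in [-255, 255]. The separable version performs 2 × 4³ = 128 multiply-accumulates, the direct version sums 4⁴ = 256 terms, and both produce the same coefficients.

```c
#include <assert.h>

/* Rows of the 4-point transform matrix, taken from the forward equations. */
static const int T[4][4] = {
    {13,  13,  13,  13},
    {17,   7,  -7, -17},
    {13, -13, -13,  13},
    { 7, -17,  17,  -7},
};

/* Separable form Y = T X T^T: 1-D transform of every row, then of every
   column; 2 * 4^3 = 128 multiply-accumulates. */
void fwd4x4_separable(int x[4][4], int y[4][4])
{
    int z[4][4]; /* row-transform results */

    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            assert(x[i][j] >= -255 && x[i][j] <= 255); /* assumed residual range */

    for (int i = 0; i < 4; i++)          /* row transforms */
        for (int v = 0; v < 4; v++) {
            z[i][v] = 0;
            for (int j = 0; j < 4; j++)
                z[i][v] += T[v][j] * x[i][j];
        }
    for (int u = 0; u < 4; u++)          /* column transforms */
        for (int v = 0; v < 4; v++) {
            y[u][v] = 0;
            for (int i = 0; i < 4; i++)
                y[u][v] += T[u][i] * z[i][v];
        }
}

/* Direct (non-separable) evaluation: 4^4 = 256 terms per block. */
void fwd4x4_direct(int x[4][4], int y[4][4])
{
    for (int u = 0; u < 4; u++)
        for (int v = 0; v < 4; v++) {
            y[u][v] = 0;
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    y[u][v] += T[u][i] * T[v][j] * x[i][j];
        }
}
```

For a constant block of ones, only the DC coefficient is nonzero, with value 52 × 52 = 2704.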
1.2 Comparison with the 8 × 8 DCT
Compared with the traditional 8 × 8 DCT, the 4 × 4 integer transform used in H.26L offers the following advantages for video coding:
1 It helps reduce blocking and ringing artifacts, improving image quality. Because the transform coefficients are quantized, some high-frequency coefficients are lost, which produces blocking and ringing artifacts in the reconstructed image. In H.26L, the smaller 4 × 4 transform effectively suppresses these artifacts.
2 The integer transform reduces accumulated error. In traditional codecs, accumulated error has two sources: mismatch between the forward and inverse transforms, and quantization. Quantization error is unavoidable if compression is to be achieved. However, because H.26L uses an exact integer transform, the forward and inverse transforms introduce no mismatch error, which effectively reduces accumulated error.
3 Fast computation. Because the H.26L transform consists of simple integer equations, it is computed in integer rather than floating-point arithmetic. This reduces the cost of each transform and also makes the transform well suited to fixed-point DSP implementation.
2 Implementation on the TM1300
The TM1300 is a 32-bit high-performance multimedia processor. Its core uses a VLIW (very long instruction word) architecture that can execute 5 operations in each clock cycle. It supports highly parallel custom operations, which greatly speed up special-purpose operations in digital signal processing and multimedia applications. Custom operations are invoked like C function calls, which makes programming convenient.
Taking into account both the properties of the 4 × 4 integer transform and the TM1300 custom instructions, this paper restructures the integer transform as follows: the row transforms are performed first, followed by the column transforms. Because the row-transform results fit within 16 bits, the data are repacked before the column transform is applied. The design rests on the following two points.
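Before turning to those two points, the 16-bit claim can be checked with a worst-case bound. The sketch below assumes the row-transform inputs are residuals in [-255, 255]; the bound is the largest row sum of |coefficients| times the input bound. It also shows one hypothetical way (`pack16x2`, `unpack16`) to repack two 16-bit row results into a 32-bit word before the column transform; the paper does not specify its packing layout.

```c
#include <assert.h>
#include <stdint.h>

/* Rows of the 4-point transform matrix. */
static const int T[4][4] = {
    {13,  13,  13,  13},
    {17,   7,  -7, -17},
    {13, -13, -13,  13},
    { 7, -17,  17,  -7},
};

/* Worst-case |output| of the 1-D transform when every input satisfies
   |x| <= max_in: the largest row sum of |T[u][j]|, times max_in. */
long worst_case(long max_in)
{
    long best = 0;
    for (int u = 0; u < 4; u++) {
        long s = 0;
        for (int j = 0; j < 4; j++)
            s += T[u][j] < 0 ? -T[u][j] : T[u][j];
        if (s > best)
            best = s;
    }
    return best * max_in;
}

/* Hypothetical repacking: two signed 16-bit row results in one 32-bit word,
   'lo' in bits 0..15 and 'hi' in bits 16..31. */
uint32_t pack16x2(int32_t lo, int32_t hi)
{
    assert(lo >= INT16_MIN && lo <= INT16_MAX);
    assert(hi >= INT16_MIN && hi <= INT16_MAX);
    return (uint32_t)(uint16_t)lo | (uint32_t)(uint16_t)hi << 16;
}

/* Sign-extends halfword k (0 = low, 1 = high) of a packed word. */
int32_t unpack16(uint32_t w, int k)
{
    int32_t h = (int32_t)((w >> (16 * k)) & 0xFFFFu);
    return h > INT16_MAX ? h - 65536 : h;
}
```

Under these assumptions `worst_case(255)` gives 52 × 255 = 13260, which fits in a signed 16-bit value, while `worst_case(13260)` gives 689520, so the column-transform results need 32-bit storage.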
First, the video input data are unsigned bytes, while the TM1300 is a 32-bit processor; accessing memory in 32-bit words (four pixels at a time) improves access efficiency. The layout of the current 4 × 4 block (pointer P1) and the reference-frame 4 × 4 block (pointer P2) is shown below. The values to be transformed are the differences between the pixels of the current block and the corresponding pixels of the reference block.
P1 (current block):        P2 (reference block):
CA1, CB1, CC1, CD1         RA1, RB1, RC1, RD1
CA2, CB2, CC2, CD2         RA2, RB2, RC2, RD2
CA3, CB3, CC3, CD3         RA3, RB3, RC3, RD3
CA4, CB4, CC4, CD4         RA4, RB4, RC4, RD4
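A minimal sketch of the word-oriented access follows. It assumes 8-bit samples stored row by row with a given stride, and packs pixel k of a row into bits 8k to 8k+7 of a 32-bit word. `load_row` and `block_residual` are hypothetical names; the byte-wise load is written to be endian-neutral, so whether it compiles to a single word load depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs four consecutive 8-bit pixels into one 32-bit word, pixel k in
   bits 8k..8k+7. Written byte-wise so the lane order is endian-neutral. */
static uint32_t load_row(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Residual of one 4x4 block: d[i][k] = cur - ref at row i, column k.
   cur and ref (the P1 and P2 blocks) point to the top-left pixel; stride is
   the distance in bytes between rows (hypothetical interface). */
void block_residual(const uint8_t *cur, const uint8_t *ref, int stride,
                    int d[4][4])
{
    assert(cur != NULL && ref != NULL && stride >= 4);
    for (int i = 0; i < 4; i++) {
        uint32_t c = load_row(cur + (size_t)i * stride); /* CA..CD of row i+1 */
        uint32_t r = load_row(ref + (size_t)i * stride); /* RA..RD of row i+1 */
        for (int k = 0; k < 4; k++)
            d[i][k] = (int)((c >> (8 * k)) & 0xFFu) - (int)((r >> (8 * k)) & 0xFFu);
    }
}
```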
Second, the 8-bit multiply/accumulate custom operations can be used: a single operation performs four 8-bit multiply-accumulates, and the processor can execute 5 operations per machine cycle (CLK). Compared with non-custom multiply/accumulate, this reduces the number of operations and increases the parallelism of the program. Figure 1 is a schematic diagram of the IFIR8UI custom operation.
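The four-way byte multiply-accumulate can be modeled portably in C. This sketch assumes IFIR8UI-like semantics (the four unsigned bytes of one operand multiplied by the four signed bytes of the other and summed into a 32-bit result); the exact operand order and byte lanes of the real TM1300 custom operation should be checked against the TriMedia documentation. Because pixel bytes are unsigned while residuals are signed, the sketch relies on linearity, T(cur - ref) = T(cur) - T(ref). This is one possible arrangement, not necessarily the one used in the paper, and `fir8ui_model`, `pack_s8`, and `row_transform_residual` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Rows of the 4-point transform matrix; every value fits in a signed byte. */
static const int8_t T[4][4] = {
    {13,  13,  13,  13},
    {17,   7,  -7, -17},
    {13, -13, -13,  13},
    { 7, -17,  17,  -7},
};

/* Portable model of a four-way 8-bit multiply-accumulate with assumed
   IFIR8UI-like semantics: byte k of u is unsigned, byte k of s is signed,
   and the four products are summed. Byte k sits in bits 8k..8k+7. */
int32_t fir8ui_model(uint32_t u, uint32_t s)
{
    int32_t acc = 0;
    for (int k = 0; k < 4; k++) {
        int32_t ub = (int32_t)((u >> (8 * k)) & 0xFFu);
        int32_t sb = (int32_t)((s >> (8 * k)) & 0xFFu);
        if (sb > 127)
            sb -= 256; /* reinterpret as a two's-complement signed byte */
        acc += ub * sb;
    }
    return acc;
}

/* Packs four signed coefficients into one word, coefficient k in byte k. */
static uint32_t pack_s8(const int8_t c[4])
{
    return (uint32_t)(uint8_t)c[0] | (uint32_t)(uint8_t)c[1] << 8 |
           (uint32_t)(uint8_t)c[2] << 16 | (uint32_t)(uint8_t)c[3] << 24;
}

/* Row transform of the residual cur - ref for one row of four pixels.
   By linearity both multiply operands stay unsigned pixel bytes. */
void row_transform_residual(uint32_t cur_row, uint32_t ref_row, int32_t out[4])
{
    for (int v = 0; v < 4; v++) {
        uint32_t coef = pack_s8(T[v]);
        out[v] = fir8ui_model(cur_row, coef) - fir8ui_model(ref_row, coef);
        assert(out[v] >= INT16_MIN && out[v] <= INT16_MAX); /* 16-bit row result */
    }
}
```

For a current row (10, 20, 30, 40) and a reference row (9, 18, 27, 36), the residual is (1, 2, 3, 4) and `row_transform_residual` returns (130, -58, 0, -4), four modeled operations per block row for each of the two rows.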
3 Experimental Results
The proposed fast algorithm for the 4 × 4 integer transform on the TM1300 uses parallel processing to greatly reduce computation time. Experiments show that computing one 4 × 4 integer transform directly with multiply and add operations requires 80 machine cycles, whereas the improved algorithm requires only 28. By comparison, one 8 × 8 DCT on the TM1300 requires 180 machine cycles, significantly more than the time needed for four 4 × 4 integer transforms. The computational complexity of transform coding in H.26L is therefore lower than in other coding methods.
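The reported cycle counts can be put side by side. The following arithmetic uses only the figures above and assumes that four 4 × 4 transforms are the fair comparison for one 8 × 8 DCT because they cover the same 64 pixels:

$$
\frac{80}{28} \approx 2.86, \qquad 4 \times 28 = 112 \ \text{cycles} < 180 \ \text{cycles}, \qquad \frac{112}{180} \approx 0.62
$$

That is, the fast algorithm is roughly 2.9 times faster than the direct computation, and four fast 4 × 4 transforms take about 62% of the cycles of one 8 × 8 DCT.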