Background
The latest High Efficiency Video Coding (HEVC) standard achieves a significant improvement in compression efficiency (about 50%) over the H.264 Advanced Video Coding standard, and because of this superior compression performance it is being adopted rapidly in many applications. Compared with H.264, however, HEVC encoding has very high computational complexity, which makes real-time, high-quality encoding difficult to achieve on general-purpose processors in widely used multimedia transcoding. Because H.264 is so widely and deeply deployed, a large amount of existing content has already been encoded with it. A system that transcodes H.264 to HEVC would therefore be of great value: it provides technical support for migrating the large body of existing H.264 video, improves interoperability between the two standards, and eases the transition to the new technology.
An innovative encoder
A conventional encoder uses a three-level structure: WPP (wavefront parallel processing), fast mode decision, and SIMD (single instruction, multiple data) acceleration. Here a fourth level is added on top of these three: task allocation at the GOP (group of pictures) level. This task allocation divides the input bitstream into independent GOPs and assigns them to different compute units of the multi-core processors running the transcoder.
Introduction to GOP-level task allocation
The computational power needed for real-time video encoding and transcoding usually exceeds what a single existing server can provide, so many real-world real-time software encoder systems are built from several identical standard 1RU, 2RU, or 3RU servers connected by a high-speed network.
Theoretical basis: A GOP is a set of pictures that starts with a leading instantaneous decoding refresh (IDR) frame, followed by consecutive P/B frames, and it is encoded independently of other GOPs. Because of this independence, GOPs can be distributed across a system of multiple processing nodes and processed in parallel. This level of parallelism causes no rate-distortion (RD) loss.
RU: rack unit, the unit used to express the external size (height) of a server
P frame: forward-predicted frame
B frame: bidirectionally predicted frame
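The GOP definition above can be illustrated with a minimal sketch that cuts a sequence of frame types into GOPs at every IDR frame. The function name `split_into_gops` and the string labels for frame types are assumptions made for this illustration; a real transcoder would read the frame type from the H.264 NAL units instead.

```python
from typing import List


def split_into_gops(frame_types: List[str]) -> List[List[int]]:
    """Group frame indices into GOPs, opening a new GOP at every IDR frame.

    frame_types holds one label per frame in coding order: "IDR", "P" or "B".
    (Hypothetical representation, chosen only for this sketch.)
    """
    gops: List[List[int]] = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "IDR":
            gops.append([index])        # an IDR frame starts a new GOP
        elif gops:
            gops[-1].append(index)      # P/B frames belong to the current GOP
        else:
            raise ValueError("the stream must start with an IDR frame")
    return gops


frames = ["IDR", "P", "B", "B", "IDR", "P", "P", "IDR", "B"]
print(split_into_gops(frames))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8]]
```

Each inner list can be handed to a different processing node, because no frame in one GOP refers to a frame in another.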
The figure is a block diagram of GOP-level task allocation implemented on nine identical Intel i7 processors connected by a high-speed network with a bandwidth of 1 Gb/s. During H.264-to-HEVC transcoding, the master node parses the input H.264 bitstream into GOPs and sends each GOP to one of the worker nodes running the transcoder. The overall processing speed of the system is roughly the frame rate of a single node multiplied by the number of worker nodes. The data is in compressed form in both directions: the H.264 bitstream sent from the master to the worker nodes and the HEVC bitstream returned from the worker nodes to the master are both compressed. The time needed for data transmission is therefore negligible compared with the time each worker node spends on transcoding.
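The master/worker flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: threads stand in for the networked worker nodes, `fake_transcode` is a placeholder for a real H.264-to-HEVC transcoder call, and the names `transcode_gops` and `estimated_fps` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def transcode_gops(gops: Sequence[bytes],
                   transcode_one: Callable[[bytes], bytes],
                   num_workers: int) -> List[bytes]:
    """Hand each GOP to a worker and return the results in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map() yields results in submission order, so the output
        # bitstream can be reassembled by plain concatenation.
        return list(pool.map(transcode_one, gops))


def estimated_fps(single_node_fps: float, num_workers: int) -> float:
    """Rough aggregate speed, assuming transfer time is negligible."""
    return single_node_fps * num_workers


def fake_transcode(gop: bytes) -> bytes:
    # Placeholder only: a real worker would decode H.264 and encode HEVC.
    return b"hevc:" + gop


h264_gops = [b"gop0", b"gop1", b"gop2"]
hevc_stream = b"".join(transcode_gops(h264_gops, fake_transcode, num_workers=8))
print(hevc_stream)
print(estimated_fps(single_node_fps=3.0, num_workers=8))
```

The worker count of 8 and the per-node rate of 3 frames per second are made-up inputs, not measurements from the system described here.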
Wavefront Parallel Processing (WPP)
Next, a brief introduction to WPP. WPP is a data-parallel method for picture units that depend on one another: a CU can start encoding as soon as its upper-right neighbor has been encoded. Let W and H be the frame width and height measured in CUs. The coding order of the CUs is shown in the figure: a CU with a smaller index is encoded earlier, and CUs with the same index can be encoded in parallel. The higher the resolution, the more CUs share an index, so more parallelism is available to speed up encoding. Since most mainstream servers have four to eight cores, this parallelism is enough to make full use of a multi-core processor.
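The wavefront order can be made concrete with a small sketch. It assumes the usual WPP dependency pattern, in which a CU waits for its left neighbor and for the upper-right neighbor in the row above; under that assumption the CU in column x and row y gets index x + 2y. The function names are hypothetical and the figure's exact numbering may differ.

```python
import math
from collections import Counter
from typing import Tuple


def wavefront_index(x: int, y: int) -> int:
    """Earliest step at which the CU in column x, row y can be encoded."""
    return x + 2 * y


def wavefront_stats(width_cus: int, height_cus: int) -> Tuple[int, int, float]:
    """Return (number of steps, peak parallelism, average parallelism)."""
    counts = Counter(
        wavefront_index(x, y)
        for y in range(height_cus)
        for x in range(width_cus)
    )
    steps = len(counts)                      # equals W + 2 * (H - 1)
    peak = max(counts.values())              # most CUs sharing one index
    average = width_cus * height_cus / steps
    return steps, peak, average


print(wavefront_stats(4, 3))  # (8, 2, 1.5)

# A 1920x1080 frame with 64x64 CUs (frame size chosen only as an example).
w, h = math.ceil(1920 / 64), math.ceil(1080 / 64)
print(w, h, wavefront_stats(w, h))
```

The average parallelism W·H / (W + 2(H − 1)) grows with the frame size, which is why larger resolutions keep more cores busy.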
Application of multi-level H.264 information in transcoding
Principle: Because H.264 and HEVC are inherently similar, much of the information in the H.264 bitstream can be reused when transcoding from H.264 to HEVC, including partition sizes, prediction modes, reference pictures, and motion vectors.
Partition size and mode decision
Compared with H.264/AVC, HEVC's most important improvement is that it offers more partition sizes for motion prediction: they range from 4 × 4 to 64 × 64, compared with 4 × 4 to 16 × 16 in H.264/AVC. CUs are first classified by partition size:
<= 16 × 16: legacy CUs. Each such CU in HEVC has a corresponding MB (macroblock) or sub-MB partition in H.264/AVC.
> 16 × 16: extended CUs. Each of these covers multiple MBs in H.264/AVC.
The PU (prediction unit) is the basic unit for prediction; H.265 uses PUs to carry out the prediction of each CU. A PU is no larger than its CU. It may be, for example, a block of 64 × 64 pixels or a rectangle of 64 × 32 pixels. There is also a new asymmetric motion partition (AMP) scheme, which splits a coding unit into two prediction blocks of unequal size. This takes the likely texture distribution into account and can effectively improve prediction efficiency for large blocks.
Whereas the HM reference software tries splitting every block into all possible smaller sizes, the authors check only a subset of partition sizes, determined by the input H.264 bitstream. Specifically, for legacy CUs only the partition size that was used in H.264/AVC is checked. If the H.264/AVC partition size lies at the current partition depth, the CU is not split further into smaller sizes. If the H.264/AVC partition size lies at the next partition depth, the encoder jumps directly to the next depth without checking any partition at the current depth.
For 32 × 32 extended CUs, the following rules decide which partitions and modes to check, depending on the four MBs the CU covers:
2N × 2N is always checked (presumably the merge mode, since Inter 2N × 2N and Intra 2N × 2N have their own conditions below).
Inter 2N × 2N is checked when more than two MBs use Inter 16 × 16 mode.
Inter N × 2N is checked when both left MBs or both right MBs use Inter 16 × 16 mode.
Inter 2N × N is checked when both top MBs or both bottom MBs use Inter 16 × 16 mode.
Intra 2N × 2N is checked when more than two MBs use intra mode.
For 64 × 64 extended CUs, the recursive partition check from 32 × 32 down to 4 × 4 is performed first, to determine whether 32 × 32 is the best partition size. If it is, each 32 × 32 CU is treated like a 16 × 16 MB in the steps above, and similar rules are used to check the 64 × 64 partitions and modes.
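The rules above can be written down as a short sketch. The function names, the string labels for modes, and the order of the four MBs are assumptions made for illustration, not taken from any encoder's API. The always-checked 2N × 2N candidate is kept under a neutral label because the text does not name its mode.

```python
from typing import List, Sequence


def legacy_cu_action(h264_depth: int, current_depth: int) -> str:
    """Decide what to do for a legacy CU at the current partition depth."""
    if h264_depth == current_depth:
        return "check_h264_partition_and_stop"   # no further splitting
    if h264_depth > current_depth:
        return "skip_to_next_depth"              # nothing checked at this depth
    raise ValueError("H.264 partition is shallower than the current depth")


def candidates_32x32(mb_modes: Sequence[str]) -> List[str]:
    """List the candidates to check for a 32x32 extended CU.

    mb_modes gives the modes of the four covered MBs in the order
    (top-left, top-right, bottom-left, bottom-right); each entry is
    "inter16x16", "intra" or "other" (hypothetical labels).
    """
    if len(mb_modes) != 4:
        raise ValueError("a 32x32 CU covers exactly four 16x16 MBs")
    is16 = [mode == "inter16x16" for mode in mb_modes]
    top_left, top_right, bottom_left, bottom_right = is16

    candidates = ["2Nx2N"]                       # always checked
    if sum(is16) > 2:
        candidates.append("Inter2Nx2N")
    if (top_left and bottom_left) or (top_right and bottom_right):
        candidates.append("InterNx2N")           # left pair or right pair
    if (top_left and top_right) or (bottom_left and bottom_right):
        candidates.append("Inter2NxN")           # top pair or bottom pair
    if sum(mode == "intra" for mode in mb_modes) > 2:
        candidates.append("Intra2Nx2N")
    return candidates


print(candidates_32x32(["inter16x16"] * 4))
# ['2Nx2N', 'Inter2Nx2N', 'InterNx2N', 'Inter2NxN']
print(candidates_32x32(["inter16x16", "intra", "inter16x16", "intra"]))
# ['2Nx2N', 'InterNx2N']
```

For a 64 × 64 CU the same function could be reused by mapping each 32 × 32 CU's chosen mode onto the MB labels, in line with the "similar rule" mentioned in the text.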
Reference pictures
For PUs in a legacy CU, only the reference pictures of the corresponding MB or sub-MB partition in the input H.264/AVC bitstream are checked. For PUs in an extended CU, the reference pictures of all MBs covered by the PU are tried. When a P slice in H.264 is transcoded, it is encoded as a B slice in HEVC; only the reference pictures in List_0 of H.264 are tried, while all reference pictures in HEVC List_1 are tried as well.
Motion vector estimation
Legacy CU: for the reference picture of the corresponding MB or sub-MB partition, the corresponding MV from the input H.264 bitstream is used.
Extended CU: the MVs of all covered MBs are used together with the MV predictor given by HEVC itself. The estimate is the median of these MVs, where the median is determined from the Manhattan distance to their average. Once the MV of the extended CU has been estimated, a search within a range of four pixels around that MV makes the final MV selection.
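One possible reading of the extended-CU estimate is sketched below: take the average of the candidate MVs, then pick the candidate with the smallest Manhattan distance to that average. This interpretation, the function names, and the treatment of "four pixels" as a ±4 window in whatever unit the MVs use are all assumptions; the text does not pin these details down.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def estimate_extended_cu_mv(candidates: List[MV]) -> MV:
    """Pick the candidate MV closest (Manhattan distance) to the average.

    candidates should hold the MVs of all covered MBs plus the HEVC
    MV predictor.
    """
    if not candidates:
        raise ValueError("at least one candidate MV is required")
    mean_x = sum(mv[0] for mv in candidates) / len(candidates)
    mean_y = sum(mv[1] for mv in candidates) / len(candidates)
    return min(
        candidates,
        key=lambda mv: abs(mv[0] - mean_x) + abs(mv[1] - mean_y),
    )


def refinement_window(center: MV, search_range: int = 4) -> List[MV]:
    """All MV positions within +/- search_range of the estimated MV."""
    cx, cy = center
    return [
        (cx + dx, cy + dy)
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
    ]


mvs = [(4, 0), (5, 1), (6, 0), (40, -20), (5, 0)]   # made-up example values
start = estimate_extended_cu_mv(mvs)
print(start)                          # (6, 0)
print(len(refinement_window(start)))  # 81
```

Selecting an actual candidate rather than the raw average keeps the starting point on an MV that some covered MB really used, and the outlier (40, -20) shifts the average without being chosen itself.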
Supplement: GOP stands for group of pictures; a GOP is a set of consecutive pictures. MPEG encoding divides pictures (that is, frames) into three types, I, P, and B: I is an intra-coded frame, P is a forward-predicted frame, and B is a bidirectionally interpolated frame. Simply put, an I frame is a complete picture, while P and B frames record changes relative to the I frame. Without the I frame, the P and B frames cannot be decoded. This is why the MPEG format is hard to edit with frame accuracy, and why the head and tail of a clip have to be fine-tuned. Generally, the longer the GOP and the higher the proportion of B frames, the better the rate-distortion performance of the encoding.