A New Inter Prediction Mode Selection Algorithm for H.264 Coding

"Keywords: Coding, inter prediction 1 Introduction In H.264 video coding, for inter prediction, a 16 × The macroblock (MB) of 16 can be divided into 16 × 16，16 × 8，8 × 16，8 × 8 motion estimation, of which 8 × 8 can be further divided into 8 × 4，4 × 8，4 × 4. In this way, each subdivision module looks for more accurate matching blocks, which can increase the prediction accuracy and improve the compression rate. However, because each classification needs motion estimation, the direct cost is a huge amount of computation. In view of the increase of computation caused by multi-mode prediction, the inter mode selection algorithm in recent years has been deeply studied. The idea of mode selection using threshold early cut-off has been widely used to reduce the computational complexity at the cost of small performance loss. As mentioned in the literature, if found 16 × If the 16 mode is applicable, skip the 16 mode directly × 8 and 8 × 16 mode check, otherwise full search; The literature proposes the method of using multi-level threshold, and the threshold changes according to QP. The research of this paper is also based on the method of threshold prediction for macroblock selection. 2 fast inter mode selection algorithm 2.1 inter mode selection algorithm using adaptive threshold Using threshold for prediction can indeed reduce the coding complexity as much as possible when the video quality degradation can be ignored. However, the above methods have some limitations. The algorithm proposed in the literature only considers three modes and still uses full search in many cases. Although the literature proposes a variable threshold, its threshold only changes with the change of QP and does not take into account the characteristics of different videos. Because different video sequences have different characteristics, even different frames in the same video sequence have different characteristics, and there are many factors affecting the threshold. 
Based on these observations, this paper proposes a statistical classification method that classifies the inter prediction modes and uses an adaptive threshold to select the macroblock mode. The 16 × 16 mode generally has the highest usage rate and its SAD (sum of absolute differences) value must always be calculated, so relating the SAD value of the 16 × 16 mode (hereinafter SAD16) to the finally selected mode is a standard pattern classification problem. Exploiting the temporal correlation between adjacent frames, the SAD16 distribution of each mode class in the previous frame is used to train the threshold for the next frame, which is then used to select the macroblock modes of that frame. The specific method is as follows.

1) Classification

The inter modes of H.264 are first divided into two classes: BSM (big-size mode), comprising the 16 × 16, 16 × 8 and 8 × 16 modes, and SSM (small-size mode), comprising the 8 × 8, 8 × 4, 4 × 8 and 4 × 4 modes. Statistics on several CIF test sequences show that BSM is generally more probable than SSM (see Table 1). As described above, SSM requires more computation, so a threshold T can be set between BSM and SSM to represent the acceptable mode prediction accuracy. If SAD16 is less than T, the macroblock evaluates only BSM; if SAD16 is greater than T, the macroblock evaluates both classes.

2) Statistics

SAD16 is calculated for macroblocks of both classes. Tests show that SAD16 is generally less than 8000. To simplify the statistics, the calculated SAD16 value is shifted right by 7 bits, that is, divided by 128, so that SAD16 falls into 64 value ranges.
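The classification and the SAD16 quantization described above can be illustrated with a minimal Python sketch. The mode labels, the function names `mode_class` and `sad16_bin`, and the clamping of very large SAD16 values into the last interval are assumptions made for illustration; they are not taken from the encoder used in the paper.

```python
# Mode classes as described in the text; the string labels are illustrative.
BSM_MODES = {"16x16", "16x8", "8x16"}
SSM_MODES = {"8x8", "8x4", "4x8", "4x4"}

NUM_BINS = 64  # number of SAD16 value ranges


def mode_class(mode: str) -> str:
    """Return "BSM" or "SSM" for a partition mode label."""
    if mode in BSM_MODES:
        return "BSM"
    if mode in SSM_MODES:
        return "SSM"
    raise ValueError(f"unknown mode: {mode}")


def sad16_bin(sad16: int) -> int:
    """Map a SAD16 value to one of the 64 intervals (each 128 wide)."""
    # A right shift by 7 bits is an integer division by 128.
    # Assumption: values of 8192 or more are clamped into the last interval.
    return min(sad16 >> 7, NUM_BINS - 1)
```

With an interval width of 128, a SAD16 value of 8000 lands in interval 62, which is why 64 intervals cover the range observed in the tests.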
Each macroblock in a frame goes through mode selection with the original mode selection algorithm and its SAD16 value is recorded, which gives, for each mode class, the number of macroblocks in each SAD16 interval of the frame. Because the SSM counts are relatively small, they are multiplied by 10 to make them easier to see. For each interval k, the BSM count is the sum of $n_{BSM}$ over all macroblocks with SAD16 ∈ k, and the plotted SSM count is 10 times the sum of $n_{SSM}$ over the same macroblocks, where: k indexes the 64 value ranges of SAD16 and takes values in [0, 63]; SAD16 ∈ k means that the SAD16 value lies in interval k; $n_{BSM} = 1$ and $n_{SSM} = 0$ if the macroblock is judged as BSM; and $n_{SSM} = 1$ and $n_{BSM} = 0$ if it is judged as SSM.

Figures 1 and 2 show the statistical distributions of two adjacent frames for two representative CIF sequences, football and foreman, respectively. The results show that BSM macroblocks are the majority and that their SAD16 values are concentrated in the low range, with a few in the high range. SSM macroblocks are few, and their SAD16 values are concentrated in the high range (the SSM counts in Figures 1 and 2 have been multiplied by 10). Other test sequences lead to the same conclusion. A macroblock can therefore be assigned to BSM directly by choosing a threshold T and calculating only SAD16. When SAD16 < T, the macroblock is determined as BSM directly; when SAD16 > T, both BSM and SSM modes are calculated.

Figures 1 and 2 also show that the distributions of any two adjacent frames are very similar, both for the football sequence with intense motion and for the foreman sequence with gentle motion. Based on this temporal correlation between adjacent frames, the distribution of the previous frame can be used to predict the threshold for the next frame.
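The per-frame statistics and the threshold test described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: the input format (a list of `(sad16, is_ssm)` pairs), the function names, the clamping into the last interval and the handling of the case SAD16 = T (treated here as requiring the SSM search) are all assumptions.

```python
from typing import Iterable, List, Tuple

NUM_BINS = 64
SSM_DISPLAY_SCALE = 10  # SSM counts are multiplied by 10, as in the text


def frame_histograms(
    macroblocks: Iterable[Tuple[int, bool]],
) -> Tuple[List[int], List[int]]:
    """Count BSM and SSM macroblocks per SAD16 interval for one frame.

    Each item is (sad16, is_ssm): the 16x16 SAD of the macroblock and
    whether the full mode decision ended in a small-size mode.
    """
    bsm = [0] * NUM_BINS
    ssm = [0] * NUM_BINS
    for sad16, is_ssm in macroblocks:
        k = min(sad16 >> 7, NUM_BINS - 1)  # clamping is an assumption
        if is_ssm:
            ssm[k] += 1
        else:
            bsm[k] += 1
    # Return the raw BSM counts and the SSM counts scaled for display.
    return bsm, [SSM_DISPLAY_SCALE * n for n in ssm]


def needs_ssm_search(sad16: int, threshold: int) -> bool:
    """True when the small-size modes must also be evaluated."""
    # Assumption: the boundary case sad16 == threshold takes the full search.
    return sad16 >= threshold
```

Here `threshold` is expressed in the same units as SAD16. A threshold derived from the histogram as an interval index would need to be multiplied by 128 before this comparison.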
3) Misjudgment rate

The algorithm saves a large amount of computation with almost no loss of video quality. The cost is that some SSM macroblocks are judged as BSM, which loses some compression ratio. As shown in Figure 3, for a threshold T the shaded part should be SSM but is judged as BSM because its SAD16 lies to the left of T. Although the shaded part is misjudged, it lies to the left of the threshold, where SAD16 is relatively small, so the change in the final SAD is small and the loss of compression ratio is small. The criterion for choosing the threshold is to place as much of the BSM distribution as possible to the left of T, which reduces decision time, and as much of the SSM distribution as possible to the right of T, which reduces the loss of compression ratio.

2.2 Four adaptive threshold selection methods

Given the threshold selection criterion above, there are many ways to set the threshold of the next frame from the statistics of the previous frame. This paper puts forward four preliminary threshold calculation methods.

1) Minimum value threshold (MVT)

This method takes the maximum SAD16 of the BSM curve and the minimum SAD16 of the SSM curve and uses the smaller of the two as the threshold, as shown in Figure 4. The threshold calculation formula is

$$T_{MVT} = \min\{\max(SAD16, BSM),\ \min(SAD16, SSM)\}$$

Here min(SAD16, SSM), the minimum SAD16 value among SSM macroblocks, is normally the value taken. max(SAD16, BSM) is included in the comparison to cover frames that contain no SSM macroblocks. Since min(SAD16, SSM) tends to be large and max(SAD16, BSM) is itself very large, both can be multiplied by a scale factor less than 1 to control their size.
The modified threshold calculation formula multiplies the two terms by correction coefficients α1, α2 ∈ (0, 1), which can be taken as α1 = 1/2 and α2 = 3/4.

2) Area percentage threshold (APT)

This method calculates the total area under the BSM curve and uses as the threshold the SAD16 value at which the accumulated area reaches β% of the total, as shown in Figure 5. Here β ∈ (0, 100) is the area percentage factor; β = 75 gives good results.

3) Highest point threshold (HPT)

This method uses the SAD16 value at the highest point of the BSM curve as the threshold, as shown in Figure 6.

4) Attenuation factor threshold (AFT)

This method finds the highest point of the BSM curve and, past that point, uses as the threshold the SAD16 value at which the curve has fallen to (1 − ω) of the peak height, as shown in Figure 7. Here ω ∈ (0, 1) is the attenuation factor; ω = 0.75 gives good results, and the value with $T_{AFT} > T_{HPT}$ is taken.

Only these four threshold calculation methods were tested in this paper. Using the statistical distributions obtained in Section 2.1 and the idea of pattern classification, other calculation methods can also yield usable thresholds.

2.3 Algorithm flow

At the frame level, the threshold T needed for inter mode selection in the current frame is calculated, by one of the methods described above, from the per-macroblock information recorded in step 4 of the macroblock-level procedure for the previous frame, and T is used to classify each macroblock in the current frame.
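The four threshold calculations of Section 2.2 could be sketched in Python as below, operating on the 64-interval BSM and SSM histograms of the previous frame. This is an illustration under stated assumptions rather than the paper's code: thresholds are returned as interval indices (multiply by 128 to return to SAD16 units), the BSM histogram is assumed to be non-empty, the function names are hypothetical, and the fallback to the last interval when a curve never reaches the target is an assumption. The scale factors of `threshold_mvt` default to 1 because the text does not state which of α1 and α2 applies to which term.

```python
from typing import List


def _nonzero_bins(hist: List[int]) -> List[int]:
    """Indices of the intervals that contain at least one macroblock."""
    return [k for k, n in enumerate(hist) if n > 0]


def threshold_mvt(bsm: List[int], ssm: List[int],
                  bsm_scale: float = 1.0, ssm_scale: float = 1.0) -> float:
    """MVT: smaller of the scaled max BSM bin and the scaled min SSM bin."""
    t = bsm_scale * _nonzero_bins(bsm)[-1]  # assumes BSM is non-empty
    ssm_bins = _nonzero_bins(ssm)
    if ssm_bins:  # frames without SSM fall back to the BSM term alone
        t = min(t, ssm_scale * ssm_bins[0])
    return t


def threshold_apt(bsm: List[int], beta: float = 75.0) -> int:
    """APT: first bin where the cumulative BSM area reaches beta percent."""
    target = beta / 100.0 * sum(bsm)
    running = 0
    for k, n in enumerate(bsm):
        running += n
        if running >= target:
            return k
    return len(bsm) - 1


def threshold_hpt(bsm: List[int]) -> int:
    """HPT: bin of the highest point of the BSM curve (first one on ties)."""
    return max(range(len(bsm)), key=lambda k: bsm[k])


def threshold_aft(bsm: List[int], omega: float = 0.75) -> int:
    """AFT: first bin past the peak at or below (1 - omega) of its height."""
    peak = threshold_hpt(bsm)
    level = (1.0 - omega) * bsm[peak]
    for k in range(peak + 1, len(bsm)):
        if bsm[k] <= level:
            return k
    return len(bsm) - 1  # assumption: use the last bin if never reached
```

Because the AFT search starts one interval past the peak, the returned value is always greater than the HPT value whenever the peak is not in the last interval, which matches the condition that the AFT threshold lies to the right of the HPT threshold.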
The specific steps at the macroblock level are as follows:

1) Take 16 × 16, 16 × 8 and 8 × 16 as candidate modes, calculate SAD(16 × 16), SAD(16 × 8) and SAD(8 × 16), and denote the minimum as SADmin;
2) If SADmin < T, select the mode corresponding to SADmin and skip to step 4; otherwise continue with step 3;
3) Take 16 × 16, 16 × 8, 8 × 16 and 8 × 8 (where 8 × 8 includes the sub-blocks 8 × 4, 4 × 8 and 4 × 4) as candidate modes, calculate SAD(8 × 8), compare it with the previously calculated SAD(16 × 16), SAD(16 × 8) and SAD(8 × 16), find the minimum, select the corresponding mode, and go to step 4;
4) End mode selection for the macroblock and record its information for calculating the threshold of the next frame.

3 Simulation results

Simulation conditions: an MPEG-2 encoder is first used to encode the first 100 frames of six CIF (352 × 288) sequences, mobile, football, bus, news, table and foreman, with parameters N = 12, M = 3 and a bit rate of 6 Mbit/s, to obtain the source video. The transcoding process is implemented with reference to an MPEG-2 decoder and the T264 encoder. The frame rate is 30 f/s, QP is 30 and the GOP is 200. The experimental platform is an Intel P4 2.0 GHz with 512 Mbyte of memory running Windows XP.

The adaptive threshold algorithm using the minimum value threshold method is abbreviated ATH_MVT, and the other abbreviations are formed in the same way. The best empirical factors are β = 75 for the ATH_APT algorithm and ω = 0.75 for the ATH_AFT algorithm. The simulation results are shown in Table 2 (three typical sequences are given: mobile, which has more detail; football, which has more intense motion; and news, which is relatively steady; the results for the other sequences are omitted).

The simulation results show the following.

The ATH_MVT method gives average reductions in search time and coding time. Except for the football sequence with its intense motion, search time is reduced by more than 38% and coding time by more than 26%.
The PSNR of this method decreases by no more than 0.01 dB and the bit rate increases by no more than 0.39%, so the performance loss is small.

The ATH_APT method gives better reductions in search time and coding time. For most sequences, search time is reduced by more than 45% and coding time by more than 30%. Notably, it saves more time than the other methods on the football sequence with its intense motion. The PSNR of this method decreases by no more than 0.01 dB and the bit rate increases by no more than 1.1%, so the performance loss is slightly larger.

The ATH_HPT method gives the smallest reductions in search time and coding time for all sequences. The minimum reduction is about 7% in search time and about 5% in coding time. This method has the best video quality: the PSNR is not lower than that of the original algorithm and the bit rate increases by no more than 0.47%, so the performance loss is small.

The ATH_AFT method gives the best reductions in search time and coding time. Except for the football sequence, search time is reduced by more than 50% and coding time by more than 30%. The video quality of this method is also good: the PSNR is essentially equal to that of the original algorithm and the bit rate increases by no more than 0.15%, so the performance loss is very small.

The shaded part in Figure 3 is the misjudged part of SSM. The misjudgment rate, obtained by comparing the number of SSM macroblocks determined by the adaptive threshold algorithm with the number determined by the original algorithm, is shown in Table 3 (again only three typical sequences are listed). The misjudgment rate of ATH_HPT (the algorithm using the highest point threshold method) is relatively small, and that of the other three methods is relatively large, but the maximum does not exceed 30%.
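One plausible way to compute a misjudgment rate from the per-frame statistics is sketched below. It follows the description of the shaded region in Figure 3, counting the SSM macroblocks (as decided by the original algorithm) whose SAD16 interval lies to the left of the threshold. The exact formula used in the paper is not reproduced in the text, so the definition, the function name and the interval-index threshold are assumptions.

```python
from typing import List


def misjudgment_rate(ssm_hist: List[int], threshold_bin: int) -> float:
    """Fraction of SSM macroblocks whose SAD16 interval is below the threshold.

    ssm_hist holds the SSM count per SAD16 interval from the original
    algorithm; threshold_bin is the threshold as an interval index.
    """
    total = sum(ssm_hist)
    if total == 0:
        return 0.0  # no SSM macroblocks, so nothing can be misjudged
    # Intervals strictly left of the threshold are decided as BSM only.
    missed = sum(ssm_hist[:threshold_bin])
    return missed / total
```

Because the result is a ratio, it is the same whether the SSM counts are raw or have been multiplied by 10 for display.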
Overall: ATH_MVT performs moderately in all respects; ATH_HPT has a small misjudgment rate but saves little search time and coding time, so it is not practical; ATH_APT suits cases where the video motion is intense and the requirement on performance loss is not strict; the threshold obtained by the ATH_AFT method gives the best overall effect, saving coding time while maintaining good performance. Among the four methods, AFT is the better choice for general sequences.

4 Summary

Based on the temporal correlation between adjacent frames and on the classification and statistics methods of pattern recognition, this paper proposes an inter mode selection algorithm for H.264 coding and gives four methods for calculating its threshold. Experiments on several test sequences show that, compared with the original algorithm, the proposed algorithm saves significant coding time with negligible performance degradation; among the four, the ATH_AFT threshold selection method is the best for general sequences. The algorithm can be extended further, for example by adding multi-level thresholds and designing better threshold calculation methods.