H.264/AVC is an international video compression standard published jointly by ITU-T and ISO. Compared with MPEG-4, H.263 and MPEG-2, it reduces the bit rate by 39%, 49% and 64% respectively, making it a new standard with a high compression ratio. Context-based adaptive variable length coding (CAVLC) is one of the key technologies in H.264. It is used in the Baseline and Extended profiles to encode and decode the residual data of luminance and chrominance blocks. CAVLC offers high coding efficiency and strong error resilience and error correction capability, but its computational complexity is high, and software encoding cannot meet the real-time requirements of high-definition video. The H.264 encoding process does not involve any floating-point operations, which makes it particularly suitable for implementation in hardware circuits. The literature proposes dividing CAVLC coding into a scanning part and a coding part: the scanning part performs a reverse zig-zag scan of the residual data and extracts the run-level flag information, which is then passed to the coding part for encoding. The literature also optimizes the scanning module. Within the encoding module, coding the nonzero coefficient levels (Level) requires the largest amount of computation and has the highest complexity. This paper takes advantage of the high-speed, real-time characteristics of FPGAs and uses parallel processing and pipelining to optimize the CAVLC encoder structure and the level coding sub-module, thereby improving the performance of the CAVLC encoder.
1 CAVLC principle
CAVLC is an entropy coding algorithm applied to the zig-zag scanned sequence of a 4 × 4 block of transform coefficients. The nonzero coefficients of a block are small in magnitude and concentrated in the low-frequency band, so the zig-zag scanned sequence contains many consecutive zeros. CAVLC uses run-level coding and achieves efficient lossless compression by encoding five syntax elements. The encoding process is shown in Figure 1. After the zig-zag scan, the following elements are encoded in turn: the coefficient token (coeff_token), the signs of the trailing ones (trailing_ones_sign_flag), the levels of the nonzero coefficients other than the trailing ones (level), the total number of zeros before the last nonzero coefficient (total_zeros), and the zero run lengths (run_before). Here TC, T1 and T0 denote the number of nonzero coefficients, the number of trailing ones, and the number of zeros before the last nonzero coefficient, respectively. Because CAVLC coding is a serial process, it is easy to implement in software, but execution is slow and inefficient.
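As an illustration of the quantities TC, T1 and T0 defined above, the following is a minimal Python sketch that derives them, together with the trailing-one signs, the remaining levels and the zero runs, from a zig-zag ordered coefficient list. The function name `cavlc_statistics` and the dictionary keys are hypothetical and chosen only for this example; it assumes, as in H.264, that at most three trailing ±1 coefficients are counted as trailing ones, and it does not perform any table lookup or bit-level encoding.

```python
def cavlc_statistics(zigzag):
    """Collect CAVLC statistics from a zig-zag ordered coefficient list.

    Hypothetical helper for illustration; all lists are returned in
    reverse scan order (highest-frequency coefficient first).
    """
    # Positions of the nonzero coefficients, visited from the tail.
    nz = [i for i in range(len(zigzag) - 1, -1, -1) if zigzag[i] != 0]
    tc = len(nz)

    # Trailing ones: consecutive +/-1 values at the tail, at most three.
    t1 = 0
    for i in nz:
        if abs(zigzag[i]) == 1 and t1 < 3:
            t1 += 1
        else:
            break

    # Sign flag is 1 for a negative trailing one, 0 for a positive one.
    t1_sign = [1 if zigzag[i] < 0 else 0 for i in nz[:t1]]

    # Remaining nonzero coefficients, whose levels are coded explicitly.
    coeffs = [zigzag[i] for i in nz[t1:]]

    # Zeros located before the last nonzero coefficient in scan order.
    t0 = (nz[0] + 1 - tc) if tc else 0

    # Zero run immediately preceding each nonzero coefficient.
    run_before = []
    for k, i in enumerate(nz):
        prev = nz[k + 1] if k + 1 < tc else -1
        run_before.append(i - prev - 1)

    return {"TC": tc, "T1": t1, "T0": t0, "T1_sign": t1_sign,
            "coeffs": coeffs, "run_before": run_before}


stats = cavlc_statistics([0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8)
print(stats)
```

For this block the sketch reports TC = 5, T1 = 3 and T0 = 3; the individual zero runs always sum to T0, which is a convenient sanity check.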
2 CAVLC encoder hardware architecture
2.1 Parallel coding structure
To improve computing speed and efficiency, the CAVLC coding process of Figure 1 is parallelized for FPGA implementation. Following the idea proposed in the literature, CAVLC coding is divided into a scanning part and a coding part, as shown in Figure 2. The encoder consists of four modules: reverse zig-zag scan, statistics, encoding, and bitstream integration. The zig-zag scan module and the statistics module form the scanning part, the encoding module and the bitstream integration module form the coding part, and the system is controlled by a state machine. Because trailing_ones_sign_flag, level and run_before are all encoded starting from the tail of the zig-zag scanned sequence, the design uses a reverse zig-zag scan. From the reverse zig-zag output sequence, the statistics module counts TC, T1 and T0 and extracts the signs of the trailing ones (T1_sign), the nonzero coefficients other than the trailing ones (coeffs), and the zero run lengths (RunBefore), which are stored in buffers for output. The encoding module is divided into six sub-modules: the NC generation module, the coeff_token module, the trailing_ones_sign_flag module, the level module, the total_zeros module and the run_before module. The statistics module supplies the input data of every encoding sub-module, so that all encoding sub-modules can run in parallel, which reduces the number of clock cycles needed for CAVLC encoding and improves encoder efficiency. Because CAVLC is a variable-length code, the codewords output by the encoding sub-modules have variable length, and the codeword registers of the sub-modules have different widths. To ensure that the codewords generated by all sub-modules can be stored compactly and concatenated seamlessly, each sub-module outputs a flag signal and the code length together with the codeword. When the flag signal is high, the output codeword and its length are valid; when it is low, they are invalid. The bitstream integration module then merges the valid codewords into the output stream.
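The flag-and-length mechanism described above can be modelled in software as follows. This is only a behavioral sketch in Python, not the hardware design: the function name `integrate_stream` and the tuple layout `(flag, codeword, length)` are assumptions made for this example, and the codeword values in the usage lines are arbitrary placeholders rather than entries from the H.264 code tables.

```python
def integrate_stream(outputs):
    """Concatenate valid variable-length codewords into one bit string.

    `outputs` is a list of (flag, codeword, length) tuples, one per
    encoding sub-module output, in the order they must appear in the
    stream. This layout is an assumption made for this sketch.
    """
    acc = 0      # accumulated bits, most significant bit first
    nbits = 0    # number of valid bits accumulated so far
    for flag, codeword, length in outputs:
        if not flag or length == 0:
            continue  # flag low: codeword and length are ignored
        mask = (1 << length) - 1
        acc = (acc << length) | (codeword & mask)
        nbits += length
    return format(acc, "0{}b".format(nbits)) if nbits else ""


# Placeholder codewords; the second entry is marked invalid and skipped.
stream = integrate_stream([
    (1, 0b0000100, 7),
    (0, 0b111, 3),
    (1, 0b011, 3),
    (1, 0b1, 1),
    (1, 0b0010, 4),
])
print(stream)
```

Carrying the length next to each codeword is what lets the integrator keep leading zeros, which would otherwise be lost when codewords of different widths are packed back to back.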
2.2 Level coding optimization
Coding the levels of the nonzero coefficients is the most complex and computationally intensive part of CAVLC coding and has the longest coding delay, which makes it one of the bottlenecks for a high-speed, efficient CAVLC encoder. The corresponding encoding process can be derived from the level decoding steps of CAVLC in H.264, as shown in Figure 3.
(1) Initialize suffixLength to 0; if TC > 10 and T1 < 3, initialize it to 1.
(2) Calculate the intermediate variable levelCode[i].
(5) Write the codeword. The codeword of a nonzero coefficient level is the prefix codeword followed by the suffix codeword. The prefix codeword consists of prefix 0s followed by a 1 (that is, its value is prefix and its code length is prefix + 1); the suffix codeword has the value suffix and the code length levelSuffixSize.
With the encoding process of Figure 3, the number of clock cycles needed to encode the levels depends on the difference between TC and T1, so different data blocks require different numbers of clock cycles, and level coding is also constrained by the scan and statistics stages that precede it. When there are many nonzero coefficients, conventional serial level coding may need more clock cycles than the statistics module consumes, which results in unstable throughput. In addition, obtaining a level codeword requires the prefix, suffix and levelSuffixSize of the coefficient, and levelSuffixSize changes adaptively according to the absolute values of the coefficients already coded, which makes parallel processing difficult. For this reason, the design combines a two-stage pipeline with parallel processing of two nonzero coefficients, as shown in the corresponding figure. The first stage initializes suffixLength and computes the absolute value of each coefficient in coeffs and the intermediate variable levelCode; the second stage updates suffixLength and computes prefix, suffix and levelSuffixSize. The coeffs module is a serial-in, parallel-out (SIPO) buffer whose input-output relationship is shown in the corresponding figure.
3 Experimental verification and analysis
The level coding circuit was described in Verilog HDL, simulated in ModelSim SE 6.0, and synthesized with Synplicity's Synplify Pro. It was then implemented and verified on a Xilinx Virtex-II series XC2V250 FPGA. Figure 6 shows the ModelSim simulation waveforms; the results are consistent with the values produced by JM16.2, the JVT verification software model. As can be seen from Figure 6, encoding the TC - T1 level values in parallel saves (TC - T1)/2 clock cycles compared with the serial approach, so stable throughput is obtained even when there are many nonzero coefficients. Table 1 gives the hardware resource report from Synplify Pro synthesis. The maximum clock frequency allowed by the system is 158.1 MHz. In summary, the design meets the requirements of real-time HD H.264 video coding.
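For reference, the level coding steps of Section 2.2 can be modelled serially in software. The sketch below is written in Python and follows the level decoding rules of the H.264 standard run in reverse; it is not the pipelined circuit described in this paper. The function name `encode_levels` is hypothetical, the input is assumed to be the coeffs list in reverse scan order (trailing ones excluded), and escape codes with a prefix longer than 15 are deliberately left out, so very large levels raise an error.

```python
def encode_levels(coeffs, tc, t1):
    """Return the level codewords (as bit strings) for one block.

    Serial reference sketch; `coeffs` holds the nonzero coefficients
    other than the trailing ones, in reverse scan order.
    """
    # Initialization of suffixLength.
    suffix_length = 1 if (tc > 10 and t1 < 3) else 0
    codewords = []
    for idx, level in enumerate(coeffs):
        mag = abs(level)
        # Intermediate variable: even for positive, odd for negative.
        level_code = 2 * mag - 2 if level > 0 else 2 * mag - 1
        # With fewer than three trailing ones, the first coded level
        # cannot be +/-1, so its code is shifted down by 2.
        if idx == 0 and t1 < 3:
            level_code -= 2

        if suffix_length == 0 and level_code < 14:
            prefix, suffix, size = level_code, 0, 0
        elif suffix_length == 0 and level_code < 30:
            prefix, suffix, size = 14, level_code - 14, 4
        elif suffix_length > 0 and level_code < (15 << suffix_length):
            prefix = level_code >> suffix_length
            suffix = level_code & ((1 << suffix_length) - 1)
            size = suffix_length
        else:
            # Escape code with prefix 15 and a 12-bit suffix.
            base = 30 if suffix_length == 0 else 15 << suffix_length
            prefix, suffix, size = 15, level_code - base, 12
            if suffix >= (1 << 12):
                raise ValueError("level too large for this sketch")

        # Codeword: `prefix` zeros, a single 1, then `size` suffix bits.
        bits = "0" * prefix + "1"
        if size:
            bits += format(suffix, "0{}b".format(size))
        codewords.append(bits)

        # Adaptive update of suffixLength for the next coefficient.
        if suffix_length == 0:
            suffix_length = 1
        if mag > (3 << (suffix_length - 1)) and suffix_length < 6:
            suffix_length += 1
    return codewords


print(encode_levels([1, 3], tc=5, t1=3))
print(encode_levels([-3, 3, 4, -2], tc=5, t1=1))
```

The update at the end of the loop is the data dependency noted in Section 2.2: the suffix size of each codeword depends on the magnitudes of the levels coded before it, which is why a pipelined, two-coefficient structure needs care.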