FMUSER Wirless Transmit Video And Audio More Easier !

[email protected] WhatsApp +8618078869184

    Implementation of a CABAC Hardware Accelerator in an H.264 Decoder

     

Key words: CABAC, decoder, hardware accelerator

H.264 is a new-generation video coding standard developed jointly by the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO). It uses a range of advanced coding techniques and surpasses previous video standards in many respects, including coding efficiency and network adaptability. H.264 defines two entropy coding schemes: context-based adaptive variable-length coding (CAVLC), developed from variable-length coding, and context-based adaptive binary arithmetic coding (CABAC), developed from arithmetic coding. Compared with CAVLC, CABAC saves about 7% of the bit rate but increases computation time by about 10%. When decoding high-definition bitstreams, a software implementation of a complex entropy decoder such as CABAC cannot meet real-time requirements, so a hardware accelerator is necessary.

The data that CABAC decodes from the input bitstream of an H.264 decoder are syntax elements: the bitstream is a concatenation of syntax elements, each consisting of several bits with a specific meaning. In the bitstream defined by H.264, syntax elements are organized hierarchically and describe information at the sequence, picture, slice, macroblock, and sub-macroblock (sub-block) levels. CABAC mainly decodes the syntax elements at the slice level and below.

The overall CABAC decoding process can be divided into three steps: initialization, binary arithmetic decoding with normalization, and de-binarization.

Initialization

Initialization is performed at the start of each slice and covers both the context variables and the arithmetic decoding engine (Decoding Engine).
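The initialization step can be sketched as follows. This is a minimal Python sketch of the initialization rules in clause 9.3.1 of the H.264 standard: each context's (m, n) pair and the slice QP yield pStateIdx and valMPS, and the decoding engine starts with codIRange = 510 and a 9-bit codIOffset. The function and class names are hypothetical, and the (m, n) values in the usage lines are illustrative; the real values come from the standard's initialization tables, which this design keeps in the context_init ROM.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clip3 as defined in the H.264 standard."""
    return max(lo, min(hi, x))


def init_context(m: int, n: int, slice_qp: int) -> tuple[int, int]:
    """Derive (pStateIdx, valMPS) for one context variable from its (m, n) pair."""
    # Python's >> is an arithmetic shift, matching the standard for negative m.
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp)) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0
    return pre_ctx_state - 64, 1


class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def init_engine(reader: BitReader) -> tuple[int, int]:
    """Return the initial (codIRange, codIOffset) of the arithmetic decoding engine."""
    cod_i_range = 510
    cod_i_offset = reader.read_bits(9)
    if cod_i_offset >= 510:  # 510 and 511 do not occur in a conforming bitstream
        raise ValueError("invalid codIOffset")
    return cod_i_range, cod_i_offset


print(init_context(20, -15, 26))
print(init_engine(BitReader(bytes([0x80, 0x80]))))
```

In hardware, the (m, n) pairs are read from ROM and the same clip-and-shift arithmetic is applied to every context at slice start.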
Binary arithmetic decoding and normalization

Binary arithmetic decoding is the core of CABAC decoding. Each invocation decodes one bin, and it is invoked repeatedly for every syntax element. H.264 defines three binary arithmetic decoding modes: decision decoding (DecodeDecision), bypass decoding (DecodeBypass), and terminate decoding (DecodeTerminate). Depending on the syntax element being decoded, one or more of these modes are used.

De-binarization

CABAC defines four binarization methods: unary (U), truncated unary (TU), k-th order Exp-Golomb (EGk), and fixed-length (FL). A syntax element uses one of these methods or a concatenation of two of them. The exceptions are mb_type and sub_mb_type, whose binarizations do not follow any of the four methods and are instead de-binarized by table lookup.

Architecture design of the CABAC hardware accelerator

Software/hardware partitioning of the H.264 decoder

The H.264 decoder uses a joint software/hardware decoding scheme: the complete decoder consists of a 32-bit CPU, an arithmetic unit with a DSP architecture, and hardware accelerators. The CABAC entropy decoding part consists mainly of condition checks, branch operations, and data interfacing, with relatively small data throughput, and these tasks are shared between software and hardware acceleration. The CABAC decoding module designed in this article is a CABAC hardware accelerator.

Overall architecture of the CABAC hardware accelerator

The overall architecture of the CABAC hardware accelerator is shown in Figure 1. It has two levels: the top level is Cabac_top, and the level below it contains 7 modules: cabac_center_control_unit, context, neighbor_mb_information, context_init, ac_next_state_lps, ac_next_state_mps, and Rangelps.
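The four binarization methods can each be inverted with a short loop. The Python sketch below follows the binarization definitions in clause 9.3.2 of the H.264 standard; `next_bin` is a hypothetical callable standing in for whichever arithmetic decoding mode supplies the bins, and `bin_source` is a test helper. Elements such as mvd and coeff_abs_level_minus1 use UEGk, a TU prefix followed by an EGk suffix, which is how one syntax element combines two methods.

```python
from typing import Callable, Iterable

NextBin = Callable[[], int]  # returns the next decoded bin (0 or 1)


def bin_source(bins: Iterable[int]) -> NextBin:
    """Hypothetical helper: feed a fixed bin sequence to the decoders below."""
    it = iter(bins)
    return lambda: next(it)


def decode_unary(next_bin: NextBin) -> int:
    """U: value = number of 1 bins before the terminating 0."""
    value = 0
    while next_bin() == 1:
        value += 1
    return value


def decode_truncated_unary(next_bin: NextBin, c_max: int) -> int:
    """TU: like unary, but no terminating 0 is sent when value == cMax."""
    value = 0
    while value < c_max and next_bin() == 1:
        value += 1
    return value


def decode_exp_golomb(next_bin: NextBin, k: int) -> int:
    """EGk: a run of 1 bins that grows k, a 0, then k suffix bins MSB first."""
    value = 0
    while next_bin() == 1:
        value += 1 << k
        k += 1
    while k > 0:
        k -= 1
        value += next_bin() << k
    return value


def decode_fixed_length(next_bin: NextBin, c_max: int) -> int:
    """FL: Ceil(Log2(cMax + 1)) bins, binIdx 0 being the least significant bit."""
    value = 0
    for bin_idx in range(c_max.bit_length()):
        value |= next_bin() << bin_idx
    return value


print(decode_exp_golomb(bin_source([1, 1, 0, 0, 1]), 0))
```

Because each loop consumes exactly one bin per step, the same structure maps directly onto a counter-based control path in hardware.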
The cabac_center_control_unit module initializes the context model variables, updates the contexts, and passes residual data to the IQ & IDCT module. The context module is a dual-port RAM that stores 459 context model variables; it can read the context model variable at one address while writing the context model variable at another. The neighbor_mb_information module is an SRAM that stores macroblock information. When parsing the syntax elements of the current macroblock, the CABAC decoder needs information from the macroblocks above and to the left, so the SRAM holds the macroblock row above the current macroblock and the macroblocks preceding it in the current row, and it is updated after each macroblock is decoded. The context_init module is an internal ROM that holds the initialization values for the context variables. The three lookup-table modules ac_next_state_lps, ac_next_state_mps, and Rangelps are implemented in combinational logic and perform the table lookups required during binary arithmetic decoding.

Performance analysis of the CABAC hardware

The design goal is for the H.264 decoder chip to decode HD video (1920 × 1088) in real time. Assuming the chip runs at 166 MHz and the playback rate is 25 fps, the time available to decode one macroblock is about 813 clock cycles: 166 × 10^6 / (25 × 8160), where 8160 = 120 × 68 is the number of macroblocks per frame. Because H.264 entropy decoding is inherently serial and offers little parallelism, the CABAC hardware accelerator must decode one bin (one binary symbol) within three clock cycles. Assume a compression ratio of 20:1 and 4:2:0 YUV sampling with 8-bit samples, so each pixel takes 8 bit × 1.5 = 12 bit. The ratio of bitstream bits to decoded bins in CABAC is about 1:1.2, so one frame requires about (1920 × 1088 × 12 bit / 20) × 1.2 ≈ 1.504 × 10^6 bins (about 1.43 × 2^20).
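The budget estimate above can be reproduced in a few lines of Python. All inputs are the article's assumptions (166 MHz, 25 fps, 20:1 compression, 1.2 bins per bitstream bit, 3 clock cycles per bin, and 90% of the decoding time available to CABAC, the last two used in the throughput estimate that follows); the variable names are illustrative. Counting bins in decimal units throughout keeps the per-frame load and the bin rate directly comparable.

```python
# The article's assumptions; variable names are illustrative.
WIDTH, HEIGHT = 1920, 1088
CLOCK_HZ = 166e6
TARGET_FPS = 25
COMPRESSION_RATIO = 20
BINS_PER_BIT = 1.2          # CABAC bitstream bits : bins = 1 : 1.2
CYCLES_PER_BIN = 3
CABAC_TIME_SHARE = 0.9

mbs_per_frame = (WIDTH // 16) * (HEIGHT // 16)             # 120 x 68 macroblocks
cycles_per_mb = CLOCK_HZ / (TARGET_FPS * mbs_per_frame)    # per-macroblock budget

bits_per_pixel = 8 * 1.5                                   # 4:2:0, 8-bit samples
bins_per_frame = WIDTH * HEIGHT * bits_per_pixel / COMPRESSION_RATIO * BINS_PER_BIT

bin_rate = CLOCK_HZ / CYCLES_PER_BIN * CABAC_TIME_SHARE    # bins per second for CABAC
achievable_fps = bin_rate / bins_per_frame

print(f"macroblocks/frame : {mbs_per_frame}")
print(f"cycles/macroblock : {cycles_per_mb:.1f}")
print(f"bins/frame        : {bins_per_frame:.4g} ({bins_per_frame / 2**20:.2f} x 2^20)")
print(f"CABAC bin rate    : {bin_rate / 1e6:.1f} Mbin/s")
print(f"achievable fps    : {achievable_fps:.1f} ({1000 / achievable_fps:.1f} ms/frame)")
```

Under these assumptions it reports about 813.7 cycles per macroblock, about 1.50 × 10^6 bins per frame, 49.8 Mbin/s, and about 33.1 fps (30.2 ms per frame), above the 25 fps target.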
At 166 MHz with one bin every 3 clock cycles, the decoding rate is about 55.3 Mbin/s; if CABAC decoding can use 90% of that time, the effective rate is about 49.8 Mbin/s. Against a per-frame load of about 1.504 × 10^6 bins (1.43 × 2^20), the decoding speed is 49.8 × 10^6 / 1.504 × 10^6 ≈ 33.1 fps; that is, one 1920 × 1088 frame takes about 30.2 ms, which meets the 25 fps target.

To reach this goal, the CABAC hardware accelerator must be optimized around the core binary arithmetic decoding. Given the characteristics of the normalization algorithm, the number of normalization iterations can be determined from the inputs codIRange and codIOffset together with codIRangeLPS obtained from the table lookup, so arithmetic decoding and normalization can be merged and completed in one clock cycle. Due to limited space, only decision (regular) decoding, one of the three modes, is described below; bypass decoding and terminate decoding follow the H.264 standard.

Regular-mode binary arithmetic decoding and normalization consist mainly of comparison, subtraction, table lookup, and shift operations. To reduce computational complexity, H.264 CABAC uses a precomputed 64 × 4 two-dimensional table, rangeTabLPS[64][4], that stores the results of the multiplications. The table is indexed by pStateIdx and qCodIRangeIdx, where qCodIRangeIdx is the quantized value of codIRange, computed as (codIRange >> 6) & 3. This lookup is implemented in Verilog HDL.

Once the probability model and the multiplication table are in place, CABAC must maintain the following variables during decoding: codIOffset, the offset of the bitstream value within the current interval; codIRange, the width of the current interval; valMPS, the value of the most probable symbol (MPS); and pStateIdx, the probability state index of the least probable symbol (LPS). transIdxLPS[pStateIdx] and transIdxMPS[pStateIdx] are 64-entry state transition tables, with pStateIdx ranging from 0 to 63. Next, the decoder determines whether normalization is needed.
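The regular-mode step described above can be sketched in Python following the DecodeDecision and RenormD processes in clause 9.3.3.2 of the H.264 standard. Renormalization is folded into a single shift whose amount is the number of leading zeros of the new codIRange in a 9-bit field; this is one common way to realize the one-cycle merge the text describes, not necessarily the author's Verilog. The 64 × 4 rangeTabLPS values from the standard are passed in rather than reproduced, and all function and class names are hypothetical.

```python
from typing import Callable, List

ReadBits = Callable[[int], int]  # read_bits(n) -> next n bitstream bits, MSB first

# transIdxLPS state transition table from the H.264 standard.
TRANS_IDX_LPS = [
    0, 0, 1, 2, 2, 4, 4, 5, 6, 7, 8, 9, 9, 11, 11, 12,
    13, 13, 15, 15, 16, 16, 18, 18, 19, 19, 21, 21, 22, 22, 23, 24,
    24, 25, 26, 26, 27, 27, 28, 29, 29, 30, 30, 30, 31, 32, 32, 33,
    33, 33, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 63,
]


def trans_idx_mps(p_state_idx: int) -> int:
    """transIdxMPS: increment the state, saturating at 62; state 63 maps to itself."""
    return p_state_idx if p_state_idx >= 62 else p_state_idx + 1


class Context:
    """One context model variable: LPS probability state and MPS value."""

    def __init__(self, p_state_idx: int, val_mps: int) -> None:
        self.p_state_idx = p_state_idx
        self.val_mps = val_mps


def renorm_iterative(cod_i_range: int, cod_i_offset: int, read_bits: ReadBits):
    """RenormD as written in the standard: one bit per loop iteration."""
    while cod_i_range < 0x100:
        cod_i_range <<= 1
        cod_i_offset = (cod_i_offset << 1) | read_bits(1)
    return cod_i_range, cod_i_offset


def renorm_merged(cod_i_range: int, cod_i_offset: int, read_bits: ReadBits):
    """Same result in one step: shift by the leading zeros of codIRange in 9 bits
    (a priority encoder plus a barrel shifter in hardware)."""
    shift = 9 - cod_i_range.bit_length()
    if shift > 0:
        cod_i_range <<= shift
        cod_i_offset = (cod_i_offset << shift) | read_bits(shift)
    return cod_i_range, cod_i_offset


def decode_decision(ctx: Context, cod_i_range: int, cod_i_offset: int,
                    range_tab_lps: List[List[int]], read_bits: ReadBits):
    """Regular-mode decode of one bin; returns (binVal, codIRange, codIOffset)."""
    q_cod_i_range_idx = (cod_i_range >> 6) & 3
    cod_i_range_lps = range_tab_lps[ctx.p_state_idx][q_cod_i_range_idx]
    cod_i_range -= cod_i_range_lps
    if cod_i_offset >= cod_i_range:            # LPS path
        bin_val = 1 - ctx.val_mps
        cod_i_offset -= cod_i_range
        cod_i_range = cod_i_range_lps
        if ctx.p_state_idx == 0:
            ctx.val_mps = 1 - ctx.val_mps
        ctx.p_state_idx = TRANS_IDX_LPS[ctx.p_state_idx]
    else:                                      # MPS path
        bin_val = ctx.val_mps
        ctx.p_state_idx = trans_idx_mps(ctx.p_state_idx)
    cod_i_range, cod_i_offset = renorm_merged(cod_i_range, cod_i_offset, read_bits)
    return bin_val, cod_i_range, cod_i_offset
```

In hardware, the leading-zero count and the shift replace the renormalization loop, which is what allows a bin decode and its normalization to finish in the same cycle.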
When codIRange is less than 0x0100 (256), normalization is required. Because the required number of shifts is already known at this point, binary arithmetic decoding and normalization are completed together within one clock cycle. This logic is implemented in Verilog HDL.

CABAC acceleration strategies

State machine design

The binary arithmetic decoding state machine is the core of this design and directly determines the decoding speed of the CABAC hardware accelerator. While the CABAC module is inactive, the state machine stays in its initial state. When a new slice begins, it initializes the decoding engine. When it receives a decoding request from the CPU, it first enters a prefetch state to read the context model variable, and in the next clock cycle it enters the binary arithmetic decoding state to decode one bin. During CABAC decoding, the system selects the decoding mode according to the type of syntax element and the position of the current bin.

Pipeline design

Decoding one bin can be divided into two steps: reading the context model variable, then decoding and updating the context model variable. This design uses a two-stage pipeline that reads the context model variable for the next bin while the current bin is being decoded, which increases decoding speed.

Double-buffered bitstream reading

To improve transfer efficiency, the bitstream is read through a double buffer. While the bus transfers data into one buffer, the decoder decodes data from the other, so transfer and decoding proceed simultaneously.

Design results and performance simulation

After the design was completed, it was tested with the standard test bitstreams provided by the JVT and passed simulation verification. The results show that the design decodes one bin every 2 clock cycles on average.
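The double-buffer scheme can be modeled in software as a ping-pong buffer. This Python sketch assumes byte-granularity transfers and a decoder that reports an underflow (returns None) when both buffers are empty; the class and method names are hypothetical, and the real design's bus width and handshaking are not specified in the text.

```python
from typing import List, Optional


class PingPongBuffer:
    """Two bitstream buffers: the bus fills one while the decoder drains the other."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buffers: List[bytearray] = [bytearray(), bytearray()]
        self.read_idx = 0  # buffer currently owned by the decoder
        self.read_pos = 0

    def bus_fill(self, chunk: bytes) -> bool:
        """Bus side: write into the buffer the decoder is not using.
        Returns False if that buffer still holds unread data."""
        write_idx = 1 - self.read_idx
        if self.buffers[write_idx]:
            return False
        self.buffers[write_idx] = bytearray(chunk[: self.size])
        return True

    def read_byte(self) -> Optional[int]:
        """Decoder side: next byte, swapping buffers when the current one runs out."""
        if self.read_pos >= len(self.buffers[self.read_idx]):
            self.buffers[self.read_idx] = bytearray()  # hand the drained buffer back to the bus
            self.read_idx = 1 - self.read_idx
            self.read_pos = 0
            if not self.buffers[self.read_idx]:
                return None  # underflow: the decoder must wait for the bus
        byte = self.buffers[self.read_idx][self.read_pos]
        self.read_pos += 1
        return byte


pp = PingPongBuffer(4)
pp.bus_fill(b"\x01\x02")
print(pp.read_byte())
```

In hardware the same idea becomes two RAM banks and a select bit; the decoder stalls only if it drains one bank before the bus has refilled the other.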
Synthesized with Design Compiler (DC) on the SMIC 0.18 μm CMOS standard-cell library, the hardware accelerator occupies 0.38 mm² (excluding the external SRAM) and reaches an operating frequency of 166 MHz, meeting the design requirements. To show the advantage of the hardware accelerator, consider the function biari_decode_symbol in the reference software JM7.4, which performs binary arithmetic decoding and normalization. Compiled with Visual C++ 6.0, this function contains 109 assembly instructions, so decoding one bin in software takes roughly 100 clock cycles. This design completes the same step in at most 3 clock cycles, a substantial speedup that fulfills the role of an accelerator.

Conclusion

Through a series of optimizations, and by considering the balance between decoding speed and the rest of the decoding system during design, this work achieves fast CABAC entropy decoding capable of real-time decoding of HD bitstreams. It has good application value in video decoding chips.

     

     

     

     
