FMUSER Wireless: Transmit Video and Audio More Easily!

[email protected] WhatsApp +8618078869184

    Design and Implementation of a Real-Time H.264 Decoder Based on the ADSP-BF533 Hardware Platform

     

Introduction

H.264 is a new video coding standard formulated by the Joint Video Team (JVT), which was established jointly by VCEG of ITU-T and MPEG of ISO/IEC. It is intended to cover the whole range of video applications. H.264 adopts new techniques such as motion compensation with variable block sizes, multiple reference frames, an integer transform, motion estimation with 1/4-pixel accuracy, and a deblocking filter. These give it better compression performance but also greatly increase the amount of computation.

The Blackfin processor uses the Micro Signal Architecture (MSA) developed jointly by Analog Devices (ADI) and Intel, which adds dedicated video processing instructions. It runs at up to 756 MHz and can perform 1,200 million multiply-accumulate operations per second. Compared with DSPs built on superscalar or very long instruction word (VLIW) architectures, such as TI's C6000 series, the Blackfin processor has clear advantages in power consumption and cost and is well suited to embedded video applications.

1 H.264 video coding standard

The basic structure of the H.264 video codec is similar to that of earlier coding standards (H.263, MPEG-4, etc.): it is composed of functional units such as motion compensation, transform, quantization, entropy coding, and an in-loop deblocking filter. The improvements in H.264 lie mainly within each functional module. The major ones are:

① High-precision motion prediction with 1/4-pixel accuracy.
② Multiple macroblock partition modes. The luminance component of each macroblock (16 × 16 pixels) can be partitioned in seven ways: 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, and 4 × 4.
③ Multi-frame prediction. During inter-frame coding, up to five different reference frames can be selected.
④ Integer transform. An integer transform on 4 × 4 pixel blocks replaces the DCT.
⑤ Two entropy coding methods. H.264/AVC supports CAVLC (context-based adaptive variable-length coding) and CABAC (context-based adaptive binary arithmetic coding). CAVLC is more error-resilient but less efficient than CABAC; CABAC has high coding efficiency but needs more computation and storage.
⑥ Intra prediction coding. H.264 adopts a variety of well-designed intra prediction modes, which greatly reduce the bit rate of I frames.
⑦ Network abstraction layer. The NAL provides a unified, network-independent interface for the video coding layer, so that coded video data can adapt to different network environments.

H.264 defines seven profiles: Baseline, Main, Extended, High, High 10, High 4:2:2, and High 4:4:4. Each represents a different set of technical constraints and algorithms. The Baseline Profile can be used without paying royalties.

2 Software and hardware implementation platform based on ADSP-BF533

The hardware platform is the ADSP-BF533 EZ-KIT Lite evaluation board from ADI. The board includes one ADSP-BF533 processor, 32 MB of SDRAM, and 2 MB of flash; an AD1836 audio codec connected to 4-input/6-output audio interfaces; an ADV7183 video decoder and an ADV7171 video encoder connected to 3-input/3-output video interfaces; one UART interface; one USB debugging interface; and one JTAG debugging interface. The system block diagram of the evaluation board is shown in Figure 1. The ADSP-BF533 processor on the board runs at 756 MHz.
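
To put the 756 MHz clock in perspective, a rough cycle budget per macroblock can be worked out. The sketch below is illustrative only: the function names are hypothetical, and the CIF resolution (352 × 288) and 30 frames/s target used in the note after it are assumed values for the example, not figures fixed by the platform.

```c
#include <assert.h>

/* Number of 16 x 16 macroblocks in a frame whose sides are multiples of 16. */
long macroblocks_per_frame(long width, long height)
{
    assert(width > 0 && height > 0);
    assert(width % 16 == 0 && height % 16 == 0);
    return (width / 16) * (height / 16);
}

/* Average core clock cycles available per macroblock at a target frame rate.
   Integer division rounds down, so the result is a conservative budget. */
long long cycles_per_macroblock(long long clock_hz, long width, long height,
                                long fps)
{
    assert(clock_hz > 0 && fps > 0);
    return clock_hz / ((long long)macroblocks_per_frame(width, height) * fps);
}
```

Under these assumptions a CIF frame holds 396 macroblocks, so `cycles_per_macroblock(756000000LL, 352, 288, 30)` leaves roughly 63,636 cycles for all decoding work on one macroblock.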
The processor has the following characteristics: two 16-bit multiply-accumulate units (MACs); two 40-bit arithmetic logic units (ALUs); four 8-bit video ALUs; one 40-bit shifter; dedicated video signal processing instructions; 148 KB of on-chip memory (16 KB of which can be used as instruction cache and 32 KB as data cache); and dynamic power management. The Blackfin processor also includes rich peripherals and interfaces: an external bus interface unit (EBIU) supporting up to 128 MB of SDRAM and four 1 MB asynchronous memory banks, three timers/counters, one UART, one SPI interface, two synchronous serial ports, one parallel peripheral interface (PPI, supporting the ITU-R 656 data format), and so on. The architecture of the Blackfin processor is clearly designed to support media algorithms, especially video.

The software is verified as follows. First, the H.264 encoded file is copied to the memory of the evaluation board through the DSP emulator. Then the software reads the encoded data from memory and decodes it. Finally, the decoded data is output through the PPI to the ADV7171 chip, which encodes the video data into PAL format and sends it to an external display. The software development platform for the Blackfin processor is VisualDSP++ 4.0.

3 Software design of the H.264 real-time decoder

3.1 Overall software design

To meet the requirements of real-time decoding, the program design must be optimized. The optimization process is as follows:

① Verify and evaluate the algorithm, and optimize the program flow and data structures, on a PC.
② Port the program code to the Blackfin processor. Compile in the VisualDSP++ integrated development environment, delete the PC-specific code, and add the DSP-specific code.
③ Optimize for the DSP platform.
Set the compiler options for speed optimization, optimize at the C language level, rewrite the most time-consuming functions in assembly, and use dedicated vector and parallel instructions to reduce function execution time.

3.2 Implementing and optimizing the decoder program on a PC

The decoder program is based on JM 9.6 and is optimized in the following respects:

① Since only the Baseline Profile is supported, redundant code for unsupported features such as B frames, SI slices, SP slices, and data partitioning is deleted.
② JM 9.6 allocates memory each time it processes a slice, reads the slice information, and then frees the memory. This is changed so that memory is allocated and released in a more reasonable pattern.
③ I frames and P frames are decoded by separate routines, and macroblock decoding is likewise split into separate modules by prediction mode and prediction direction, which eliminates repeated conditional checks and improves decoding speed.
④ The CAVLC code table lookup method is optimized.

3.3 Program porting

VisualDSP++ is an integrated development and debugging environment for the Blackfin processor. It includes the VisualDSP++ Kernel (VDK), a C/C++ compiler, advanced plotting tools, debugging tools, a device simulator, and other components, and it supports C/C++ development on the Blackfin processor well.

The first porting step is to remove all functions not supported by the compilation environment (such as some time-related functions), change file operations into reads from a memory buffer holding the file data, and delete code that the DSP platform does not need, such as SNR statistics and information printing. The second step is to add hardware-related code: system initialization code, output module code, interrupt service routines, the decoding rate control routine, and so on.
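
The first porting step, replacing file operations with reads from a memory buffer, might look like the following minimal sketch. The `mem_stream` type and the `mem_open` and `mem_read` functions are hypothetical names introduced here for illustration; they are not taken from JM 9.6 or from the decoder described in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream that stands in for a FILE pointer. */
typedef struct {
    const unsigned char *data; /* start of the encoded stream in memory */
    size_t size;               /* total number of bytes available */
    size_t pos;                /* current read offset */
} mem_stream;

/* Attach the stream to a buffer that already holds the encoded file. */
void mem_open(mem_stream *s, const unsigned char *data, size_t size)
{
    assert(s != NULL && data != NULL);
    s->data = data;
    s->size = size;
    s->pos = 0;
}

/* Counterpart of fread(buf, 1, n, fp): copies at most n bytes and
   returns the number of bytes actually copied (0 at end of data). */
size_t mem_read(mem_stream *s, void *buf, size_t n)
{
    size_t left;
    assert(s != NULL && buf != NULL);
    left = s->size - s->pos;
    if (n > left)
        n = left;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```

Because `mem_read` mirrors the return convention of `fread`, existing bitstream parsing code can usually switch over by changing only the call sites.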
After porting, the H.264 decoder runs on the DSP platform; however, its speed does not meet the requirements of real-time decoding, and it needs to be optimized.

3.4 Optimization based on the DSP platform

Optimization on the DSP platform is divided into system-level, C-level, and assembly-level optimization.

(1) System-level optimization

Turn on the optimization switch in the compiler and set it to optimize for speed; turn on automatic inlining; turn on the "interprocedural optimization" switch; and use the profile-guided optimization (PGO) of the VisualDSP++ compiler.

(2) C-level optimization

C-level optimization relies mainly on the specific characteristics of the Blackfin processor:

① Write a linker description file that places frequently used data, such as the CAVLC entropy decoding tables, in on-chip memory. Enable the instruction cache and data cache, and set the instruction and data address ranges to which caching applies.
② Convert division into multiplication, or use a lookup table.
③ Reduce the number of accesses to off-chip memory. For frequently accessed off-chip memory areas, enable caching and cache locking to prevent cached data from being replaced and to reduce the probability of cache misses.
④ Use a shorter data type wherever the data allows it. For example, the input data of the 4 × 4 inverse integer transform, originally defined as int, can be defined as short.

(3) Assembly-level optimization

Assembly-level optimization usually follows these principles:

① Use registers instead of local variables. When a local variable holds an intermediate result of a calculation, replacing it with a register saves much of the time spent accessing memory.
② Use hardware loops instead of software loops. The Blackfin processor has dedicated hardware that supports zero-overhead hardware loops nested two levels deep. Replacing software loops with hardware loops avoids pipeline stalls and improves speed.
③ Use parallel instructions and vector instructions. These make full use of the SIMD architecture of the Blackfin processor and the parallelism of its internal hardware resources, reducing the number of instructions executed and improving execution efficiency. One parallel instruction can execute two or three ordinary instructions at the same time, and a vector instruction performs the same operation on multiple data streams at once.
④ Use video processing instructions. Video applications can use the Blackfin processor's dedicated video processing instructions to improve execution efficiency.

The most time-consuming functions are rewritten in assembly language to exploit the SIMD architecture of the Blackfin processor and the parallelism of the hardware, executing multiple operations per instruction cycle and reducing the number of instruction cycles each function needs. The most time-consuming functions include the macroblock decoding function decode_one_macroblock, the inverse integer transform function itrans, the deblocking filter function edgeloop, and the filter strength calculation function get_strength. The 4 × 4 inverse integer transform function itrans and the 1/4-pixel interpolation function get_block() are used below to show the performance gain from assembly optimization.

The 4 × 4 inverse integer transform function itrans uses a two-stage butterfly operation. It performs a one-dimensional inverse transform on each row of the matrix and then on each column.
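
In plain C, the row-then-column structure can be sketched as follows. This is a reference sketch under stated assumptions, not the article's optimized routine. The butterfly is the 4 × 4 inverse core transform defined in the H.264 standard. The names `butterfly4` and `itrans4x4_sketch` are hypothetical, and the final `(x + 32) >> 6` rounding is the standard's scaling step, assumed here to belong to the same function. The block uses `short`, in line with the data-type optimization described earlier.

```c
#include <assert.h>
#include <stddef.h>

/* One-dimensional inverse butterfly on four values spaced `stride` apart.
   Stage 1 forms e0..e3; stage 2 combines them into the four outputs.
   Note: >> on negative values is assumed to be an arithmetic shift. */
static void butterfly4(short *v, int stride)
{
    int y0 = v[0], y1 = v[stride], y2 = v[2 * stride], y3 = v[3 * stride];
    int e0 = y0 + y2;
    int e1 = y0 - y2;
    int e2 = (y1 >> 1) - y3;
    int e3 = y1 + (y3 >> 1);
    v[0]          = (short)(e0 + e3);
    v[stride]     = (short)(e1 + e2);
    v[2 * stride] = (short)(e1 - e2);
    v[3 * stride] = (short)(e0 - e3);
}

/* In-place 4 x 4 inverse transform of a row-major coefficient block. */
void itrans4x4_sketch(short blk[16])
{
    int i;
    assert(blk != NULL);
    for (i = 0; i < 4; i++)      /* row transforms */
        butterfly4(blk + 4 * i, 1);
    for (i = 0; i < 4; i++)      /* column transforms */
        butterfly4(blk + i, 4);
    for (i = 0; i < 16; i++)     /* rounding and scaling by 64 */
        blk[i] = (short)((blk[i] + 32) >> 6);
}
```

A block that contains only a DC coefficient of 64 comes out as a flat block of ones, which is a convenient sanity check when comparing an assembly version against this reference.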
The one-dimensional transform uses the butterfly algorithm shown in Figure 2. The SIMD architecture of the Blackfin processor supports vector operations and can complete up to four 16-bit additions in one cycle, and its parallel instructions can perform an arithmetic operation and two data load/store operations at the same time. For example, the butterfly operation above can be implemented with the following instructions (register I0 holds the address of the input data y, I2 the address of the coefficient array COF = {0x7FFF, 0x4000}, and I1 the address of the temporary variable TMP; R2 and R1 hold the intermediate results):

    R7 = [I0++];
    A1 = R6.L * R7.L, A0 = R6.L * R7.L (IS) || R5 = [I0++] || [I1++] = R2;
    R4.H = (A1 -= R5.L * R6.L), R4.L = (A0 += R5.L * R6.L) (IS) || W[I1++] = R1.H;
    R7.L = R6.L * R5.H (IS) || W[I1++] = R1.L;
    R5 = R7 >>> 1 (V);
    A1 = R6.L * R5.H, A0 = R6.L * R5.L (IS);
    R3.H = (A1 -= R6.L * R7.L), R3.L = (A0 += R6.L * R7.H) (IS);
    R2 = R4 +|+ R3, R1 = R4 -|- R3;

Only 8 instructions are needed for one one-dimensional inverse transform. Including function-call overhead and other auxiliary instructions, a complete 4 × 4 inverse integer transform takes 82 instruction cycles in total. Table 1 shows the comparison before and after optimization.

The get_block function performs 1/4-pixel interpolation on the pixel matrix. First a six-tap filter is used for 1/2-pixel interpolation, and then linear interpolation is used for 1/4-pixel interpolation. The 1/2-pixel value b is calculated as:

    b = round((E - 5F + 20G + 20H - 5I + J) / 32)

The schematic diagram is shown in Figure 3. E, F, G, H, I, and J are integer pixels, and b is the half-pixel position between G and H. The luminance values of the pixels are of type unsigned char.
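
A scalar C version of the two interpolation steps can serve as a reference when checking an optimized routine. The sketch below is an assumption-laden illustration: `half_pel` and `quarter_pel` are hypothetical names, the clipping to 0..255 comes from the H.264 standard rather than from the formula as printed here, and the 1/4-pixel step uses the standard's rounded average of two neighbouring samples.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an intermediate value to the unsigned char range. */
static unsigned char clip255(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Half-pixel sample between G and H.
   p[0..5] hold the integer pixels E, F, G, H, I, J in that order.
   Implements b = round((E - 5F + 20G + 20H - 5I + J) / 32). */
unsigned char half_pel(const unsigned char *p)
{
    int sum, rounded;
    assert(p != NULL);
    sum = p[0] - 5 * p[1] + 20 * p[2] + 20 * p[3] - 5 * p[4] + p[5];
    rounded = sum + 16;          /* add half of 32 before dividing */
    if (rounded < 0)
        return 0;                /* negative results clip to 0 */
    return clip255(rounded >> 5);
}

/* Quarter-pixel sample: rounded average of two neighbouring samples
   (integer or half-pixel positions). */
unsigned char quarter_pel(unsigned char a, unsigned char b)
{
    return (unsigned char)((a + b + 1) >> 1);
}
```

In a flat region the filter taps sum to 32, so `half_pel` returns the input value unchanged, while sharp edges can overshoot and are caught by the clipping step.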
First, parallel instructions read the luminance values of 8 pixels into registers in one instruction cycle. Video-specific instructions then unpack 4 bytes into a register pair (R1:0 or R3:2), and vector instructions perform two multiply-accumulate operations per cycle. Using video-specific, vector, and parallel instructions together reduces the number of instruction cycles the function needs.

4 Experimental results

The decoder is tested on the ADSP-BF533 EZ-KIT Lite development board. For the foreman test sequence in CIF format (352 × 288), the decoding speed reaches 45 to 50 frames/s; for the mobile test sequence in CIF format, it reaches 40 to 44 frames/s. With the decoding rate control module added, CIF test sequences play back steadily at 30 frames/s. The experimental results show that in Blac

     

     

     

     
