FMUSER Wirless Transmit Video And Audio More Easier !


    Implementation and optimization of H.264 encoder based on TMS320DM6446

     

1 Introduction

H.264 is a video compression standard jointly developed by the Video Coding Experts Group (VCEG) of ITU-T and the Moving Picture Experts Group (MPEG) of ISO/IEC. It builds on H.263/H.263++. While inheriting the strengths of earlier coding and compression technologies, it introduces many new coding tools and the concept of a network abstraction layer (NAL), so it achieves higher coding efficiency and better network adaptability. It provides a general-purpose tool for video compression in applications ranging from low-bit-rate real-time communication and wireless environments to high-bit-rate HDTV and digital storage systems.

However, the performance of the H.264 standard comes at the cost of high coding complexity and a large amount of computation. On a general-purpose PC platform it consumes substantial CPU and memory resources. With the rapid development of digital signal processor (DSP) technology, the processing speed and capability of DSPs have improved to the point where they meet the speed requirements of H.264 encoding and decoding. Implementing the H.264 standard on a stable media processor platform therefore has good engineering significance and application prospects. This paper describes the hardware design of a video coding system based on the TMS320DM6446 DSP and focuses on porting and optimizing an H.264 encoder for the TMS320DM6446 on the CCS platform.

2 Hardware design of the video coding system

2.1 Selection of the DSP

The DSP is the TMS320DM6446 (DM6446), a dedicated DaVinci media processor from TI. It uses an ARM + DSP dual-core architecture comprising a TMS320C64x+ core and an ARM926EJ-S core. The C64x+ core uses an enhanced VLIW architecture with 8 parallel functional units and a clock frequency of about 600 MHz (594 MHz, which at 8 instructions per cycle gives the peak processing capacity of 4752 MIPS).
The DM6446 has a two-level on-chip cache structure, an independent 32-bit DDR2 SDRAM interface and a 16-bit asynchronous EMIF interface. It also integrates a variety of on-chip resources and interfaces suited to video and audio processing, such as the video processing front end (VPFE), which connects to an external video decoder, the video processing back end (VPBE), which connects to a video display device, and a multi-channel audio serial port. The DM6446 not only meets the processing performance requirements of the H.264 standard; its internal structure, on-chip resources and external interfaces are also optimized for video applications, which greatly reduces the difficulty and cost of developing them.

2.2 System block diagram

The block diagram of the video coding system hardware is shown in Figure 1. The host initializes the DSP and loads the program over the PCIe bus. The analog video signal from the camera is converted into a digital signal by the video decoding module, level-shifted by the FPGA, and sent to the DSP through the VPFE interface of the DM6446 for compression coding. The encoded video data is output from the EMIF interface of the DM6446 and sent back to the host over the PCIe bus for further processing. The VPBE module of the DM6446 can convert the captured digital video signal into an analog signal and output it to a TV for monitoring. The DDR2 SDRAM stores the original image, reference frames, coding parameters and other data used during coding. The DM6446 configures the A/D converter over the I2C bus. A dual-port RAM is added between the FPGA and the PCIe bridge PEX8311 to improve data transfer efficiency.

2.3 Design of the video decoding module

Analog video signals come in many transmission formats, whereas the transmission format of digital video signals is governed by clear international standards.
Therefore, a general-purpose A/D converter is not suitable for video applications. The dedicated video decoder ADV7189B is selected instead. It supports 12 analog video input channels and includes three 12-bit, 54 MHz A/D converters with noise-shaping. It accepts analog video input in CVBS, S-Video and YPrPb formats, automatically detects NTSC/PAL/SECAM, and outputs a digital video signal conforming to ITU-R BT.656. Three of the 12 analog channels are selected and multiplexed to support the three analog video formats. The ADV7189B outputs a 10-bit digital video signal together with a separate vertical synchronization signal VD, horizontal synchronization signal HD and pixel clock LLC1. These signals use 3.3 V levels; the FPGA converts them to the 1.8 V required by the DM6446 and then passes them to the DSP through the dedicated digital video interface of the VPFE module. Before compression coding, the VPFE module converts the ITU-R BT.656 video data into the YUV 4:2:0 format used by H.264 and stores it in the DDR2 SDRAM. The VPFE module also supports preprocessing operations such as white balance and scaling. An ADG3301 performs level conversion for the I2C bus.

2.4 Design of the video encoding module

The VPBE module in the DM6446 contains four 54 MHz D/A converters, which convert digital video signals into analog video signals directly on the chip. It provides four outputs and supports three analog video formats: CVBS, S-Video and YPrPb. The design of the video encoding module is therefore relatively simple: the four analog output signals only need to be amplified before being connected to the monitoring equipment. The OPA357, a voltage-feedback CMOS operational amplifier from TI, is selected for amplification.

2.5 Control circuit design

The video signal interface and EMIF interface of the DM6446 operate at 1.8 V, whereas the ADV7189B interface and the PCIe bridge interface operate at 3.3 V. The system therefore needs a large amount of level conversion.
At the same time, it must implement a large amount of control logic and the communication protocol between the PCIe bridge and the DM6446. An FPGA is the most suitable choice. The Altera EP2C35 is selected; it can convert between 1.8 V, 2.5 V and 3.3 V levels on chip and meets the system's logic control requirements. The EP2C35 also integrates on-chip memory, which can be used as a buffer between the ADV7189B and the DM6446 to improve data transfer efficiency. The interface circuits between the FPGA and the DM6446, the ADV7189B and the PCIe bridge are shown in Figure 2.

3 DSP porting and optimization of the H.264 encoder

The main H.264 encoder implementations at present are JM, T264 and x264. JM is the official H.264 reference software. It implements all the features of H.264, but its program structure is lengthy; it introduces every new feature to improve coding performance without regard to coding complexity, so it is extremely complex and impractical. The T264 encoder outputs a standard H.264 bitstream, but its decoder can only decode bitstreams generated by the T264 encoder. The x264 encoder emphasizes practicality and tries to reduce the computational complexity of coding without significantly reducing coding performance. The x264 encoder is therefore the one ported to and optimized for the DSP platform here. Implementing and optimizing x264 on the DSP platform mainly involves three steps: program simplification, code porting and code optimization.

3.1 Program simplification

In addition to the H.264 Baseline profile, the x264 encoder includes some optional features and other modules from the Main profile, so its code size is large. Unnecessary modules therefore need to be deleted to reduce the code size.
The main changes are as follows: the decoder part of the x264 program is deleted, together with CABAC and B slices, which lie outside the Baseline profile; the x86-specific optimizations such as SSE and MMX are deleted, because x264 targets the x86 PC platform and these optimizations have no effect on a DSP; and the file structure of the remaining code is adjusted to suit the DSP platform.

3.2 Code porting

TI's DSP development tool CCS has its own ANSI C compiler and optimizer, with its own syntax rules and definitions. After the simplification in the previous step, the pure C version of the x264 encoder must be modified before it can be built for a specific DSP under CCS. The main changes are: ① Visual C++ and CCS handle "repeated definition" of variables and structures differently, so the location of variable and structure definitions in the header files needs to be changed; ② library functions that are not available in CCS, such as strncasecmp(), are replaced with functions that provide the same functionality; ③ data types differ, so long is used in place of the __int64 type, which CCS does not provide; ④ arrays are defined according to the C language rules enforced by CCS; ⑤ the way system configuration parameters are read is modified; ⑥ a CMD (linker command) file is written for the memory layout of the TMS320DM6446. With these changes, x264 can be compiled and run under CCS.

3.3 Code optimization

The pure C version of the x264 program does not exploit the resources and parallelism of the DM6446, so it runs very slowly. The code must therefore be optimized to improve processing performance.
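Before turning to the individual optimization levels, porting step ② of Section 3.2 (replacing library functions that CCS lacks) can be made concrete. The following is a minimal sketch of a case-insensitive comparison that could stand in for strncasecmp(). The name port_strncasecmp is hypothetical and does not come from x264 or CCS, and the replacement actually used in the port is not given in the text.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for strncasecmp(): compares at most n characters
   of a and b, ignoring case. Returns 0 if they match, otherwise the
   difference of the first pair of lower-cased characters that differ. */
int port_strncasecmp(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);

        if (ca != cb)
            return ca - cb;   /* first mismatch decides the result */
        if (ca == '\0')
            return 0;         /* both strings ended before n characters */
    }
    return 0;                 /* first n characters are equal */
}
```

A call such as port_strncasecmp(option, "baseline", 8) would then behave the same way on the PC build and the CCS build, which keeps the two code bases comparable.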
The x264 code is optimized at three levels: project level, algorithm level and instruction level.

(1) Project-level optimization. Project-level optimization mainly consists of selecting, combining and tuning the compilation options provided by CCS, such as -o3 and -pm; using the optimizations provided by the CCS compiler to improve the performance of loops and nested loops through software pipelining and greater parallelism; and rewriting statements that the compiler cannot optimize well, so that CCS can optimize the program more effectively.

(2) Algorithm-level optimization. The pure C version in the VC environment is kept in sync with the version under CCS. Verifying each change in the VC version first confirms that the algorithm is correct, and it also speeds up development and reduces problems. The algorithm optimization work mainly covers the following points.

① Selection of the motion estimation method. The x264 encoder provides three optional integer-pixel motion estimation methods: X264_ME_ESA (full search), X264_ME_HEX (hexagon search) and X264_ME_DIA (small diamond search). In the VC environment, the pure C code is used to encode the same video sequence with each of the three search methods, and their coding speed, peak signal-to-noise ratio (PSNR) and bit rate are compared. X264_ME_ESA gives the highest PSNR, followed by X264_ME_HEX, with X264_ME_DIA the lowest, but the differences in quality and in bit rate are small, whereas the difference in coding speed is obvious: X264_ME_DIA is clearly faster than the other two. The X264_ME_DIA motion estimation algorithm is therefore selected.
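To show why a small diamond search is so much cheaper than a full search, here is a minimal sketch of the idea in plain C. It is not the x264 implementation: the function names block_sad and diamond_search are hypothetical, the cost is the SAD alone (x264 also adds a motion-vector bit cost), and both frames are assumed to share one stride equal to the frame width. Each step tests only the four neighbours of the current best position and stops when none of them improves the cost.

```c
#include <stdlib.h>

/* Sum of absolute differences between two bw x bh blocks. */
static int block_sad(const unsigned char *cur, const unsigned char *ref,
                     int stride, int bw, int bh)
{
    int sad = 0;
    for (int y = 0; y < bh; y++)
        for (int x = 0; x < bw; x++)
            sad += abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Small-diamond integer-pixel search (sketch).
   cur, ref : current and reference frames, width x height, stride = width
   bx, by   : top-left corner of the current block, bs: block size
   range    : maximum displacement in each direction
   Writes the best motion vector to *mvx, *mvy and returns its SAD. */
int diamond_search(const unsigned char *cur, const unsigned char *ref,
                   int width, int height, int bx, int by, int bs,
                   int range, int *mvx, int *mvy)
{
    static const int dx[4] = { 0, 0, -1, 1 };
    static const int dy[4] = { -1, 1, 0, 0 };
    const unsigned char *cblk = cur + by * width + bx;
    int best_x = 0, best_y = 0;
    int best = block_sad(cblk, ref + by * width + bx, width, bs, bs);
    int moved = 1;

    while (moved) {
        int cx = best_x, cy = best_y;   /* centre of this diamond */
        moved = 0;
        for (int i = 0; i < 4; i++) {
            int nx = cx + dx[i], ny = cy + dy[i];
            if (abs(nx) > range || abs(ny) > range)
                continue;               /* outside the search range */
            if (bx + nx < 0 || by + ny < 0 ||
                bx + nx + bs > width || by + ny + bs > height)
                continue;               /* outside the reference frame */
            int sad = block_sad(cblk, ref + (by + ny) * width + (bx + nx),
                                width, bs, bs);
            if (sad < best) {           /* strictly better: move there */
                best = sad;
                best_x = nx;
                best_y = ny;
                moved = 1;
            }
        }
    }
    *mvx = best_x;
    *mvy = best_y;
    return best;
}
```

A full search over a range of ±R evaluates (2R+1)² candidate positions per block, while this loop evaluates four per step and typically takes only a few steps, at the price of possibly stopping in a local minimum. That trade-off is consistent with the small PSNR loss and large speed gain reported above.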
② Improvement of the intra prediction mode decision. Early-termination conditions are added to the intra prediction process of x264. During the 16×16 macroblock intra mode search, the search is terminated as soon as the cost of the current mode is less than half of the minimum cost of the modes already searched, and the current mode is taken as the best 16×16 intra prediction mode. The same condition is added for 4×4 blocks. In addition, if the prediction cost of the current 4×4 intra prediction mode is less than 1/16 of the cost of the 16×16 intra prediction mode, the 4×4 mode selection is terminated and the current mode is taken as the best 4×4 intra prediction mode. The main flow of the improved intra prediction is shown in Figure 3, where the gray parts are the added conditions.

③ Improvement of the inter prediction mode decision. The current 16×16 macroblock is divided into four 8×8 blocks and a motion vector is predicted for each. The differences between the motion vectors of horizontally adjacent and vertically adjacent 8×8 blocks are then compared with a threshold to decide whether the 16×8 and 8×16 partitions need to be evaluated. Finally, the partition mode with the lowest cost is selected as the best inter partition mode.

(3) Instruction-level optimization. The DM6446 can issue 8 instructions in parallel in one clock cycle, can access 64 bits of data at a time, has 64 32-bit general-purpose registers, and can operate on four 8-bit values or two 16-bit values packed in a register, which gives it strong parallel processing capability. The dimensions of a video image in pixels are generally multiples of 4, and x264 stores pixel values regularly, in matrix form, as 8-bit or 16-bit data. This storage layout matches the parallel processing model of the DM6446 very well.
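The packed 8-bit processing described above can be illustrated in portable C. The sketch below averages four 8-bit pixels held in one 32-bit word, with rounding, which is the kind of operation a C64x+ packed-average instruction performs in a single step. The names avg4_u8 and avg_row are hypothetical, and the code is an emulation for illustration only, not DSP code from the port.

```c
#include <stdint.h>
#include <string.h>

/* Rounded average of four unsigned 8-bit lanes packed in 32-bit words:
   each lane becomes (a + b + 1) >> 1. Uses the identity
   ceil((a + b) / 2) = (a | b) - ((a ^ b) >> 1), with a mask that stops
   bits shifted out of one lane from leaking into the next. */
uint32_t avg4_u8(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) >> 1) & 0x7F7F7F7Fu);
}

/* dst[i] = rounded average of src1[i] and src2[i], four pixels per
   iteration. n must be a multiple of 4. memcpy is used for the 32-bit
   loads and stores so that unaligned rows are handled safely. */
void avg_row(uint8_t *dst, const uint8_t *src1, const uint8_t *src2, int n)
{
    for (int i = 0; i < n; i += 4) {
        uint32_t a, b, r;
        memcpy(&a, src1 + i, 4);
        memcpy(&b, src2 + i, 4);
        r = avg4_u8(a, b);
        memcpy(dst + i, &r, 4);
    }
}
```

Because every lane is processed independently, the result does not depend on byte order. The loop body handles four pixels with a handful of word operations instead of four separate byte-wide add, round and shift sequences.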
Therefore, optimizing x264 at the instruction level to exploit the parallel computing capability of the DM6446 is the key to improving encoder speed. The work falls into two parts.

① Optimization using intrinsics. The C6000 compiler provides many intrinsics, which are inline functions that map directly to assembly instructions. Assembly instructions that are not easily expressed in C have corresponding intrinsics, so they can be used directly in C code to operate on multiple data elements in parallel. For example, before this optimization, one call to the bilinear interpolation function in x264 computes the value of only one sub-pixel position; after optimization with intrinsics such as _mem4() and _avgu4(), the values of four sub-pixel positions are computed at once, which greatly improves speed.

② Optimization using linear assembly. Linear assembly does not require the programmer to handle register allocation, instruction latency or the scheduling of parallel instructions. The profiling tool provided by CCS is therefore used to identify the functions that are called most often and consume the most time. These functions are then rewritten in linear assembly and optimized by hand, using information known in advance such as data dependencies. The algorithms involved include SAD and SSD calculation, the DCT, the inverse DCT and sub-pixel search.

4 Experimental results

The representative video sequences Carphone (large subject motion), News (changing background, small subject motion) and Container (simple background, slow scene motion) are selected for coding. The video is in YUV 4:2:0 QCIF format, the quantization parameter (QP) is set to 26, 50 frames are encoded in total, and the IPPP... coding structure is used. The clock frequency of the DM6446 is 600 MHz.
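For reference, the PSNR used as the quality measure is conventionally defined as follows for 8-bit video. The notation is an assumption introduced here and does not come from the original text: f is the original frame, f-hat the reconstructed frame, and W × H the frame size (176 × 144 for the QCIF luma plane).

```latex
\mathrm{MSE} = \frac{1}{WH}\sum_{y=0}^{H-1}\sum_{x=0}^{W-1}
               \bigl(f(x,y) - \hat{f}(x,y)\bigr)^{2},
\qquad
\mathrm{PSNR} = 10\log_{10}\frac{255^{2}}{\mathrm{MSE}}\ \text{dB}
```

A higher PSNR means the reconstructed frame is closer to the original, so at a fixed QP the measure allows the quality of different search methods or encoder versions to be compared directly.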
Table 1 shows the experimental results after optimization, including the peak signal-to-noise ratio, the clock cycles consumed and the bit rate. Table 2 compares the coding clock cycles before and after optimization. On average, the coding speed of I frames increases by a factor of 9 and that of P frames by a factor of 11. The Miss America sequence is used to compare the image compression quality of the ported and optimized encoder under different quantization parameter (QP) values, as shown in Figure 4.

5 Conclusion

The ported and optimized x264 encoder encodes correctly in the CCS environment.

     

     

     

     
