Developing an H.265 Player Based on WebAssembly

I. Background

Live streaming technology has evolved continuously in recent years. In this context, the H.264 standard has become the industry mainstream, and the newer HEVC (H.265) standard is increasingly used in live streaming. Huajiao Live has been researching, applying, and continuously optimizing HEVC (H.265).

II. Technical Research

HEVC (H.265)

High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard regarded as the successor to ITU-T H.264/MPEG-4 AVC. HEVC is considered to improve not only image quality but also the compression ratio, roughly halving the bitrate of H.264/MPEG-4 AVC at the same picture quality. Hereafter it is referred to as H.265.

Some major improvements of H.265 over H.264 include:

1. More Flexible Block Partitioning

H.265 divides the image into more flexible coding tree units (CTUs), rather than the fixed 16×16 macroblocks (with sub-partitions down to 4×4) used by H.264. A CTU uses a quadtree structure and can be recursively split into sub-regions of 64×64, 32×32, 16×16, and 8×8. As video resolution keeps rising from 720p and 1080p to 2K and 4K, H.264's relatively small macroblocks produce a large amount of redundancy, whereas H.265 provides more flexible, dynamic partitioning. H.265 also adopts a new coding structure made up of coding units (CUs), prediction units (PUs), and transform units (TUs), which helps reduce the bitrate.

2. More Complete Prediction

Intra prediction: H.265 provides 35 intra prediction modes (Planar, DC, and 33 angular modes), compared with the 9 intra prediction modes of H.264, and also offers improved motion compensation and motion vector prediction methods.
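To make the Planar mode concrete, here is a minimal TypeScript sketch of HEVC planar intra prediction for one N×N block, following the interpolation formula of the H.265 specification. It assumes 8-bit samples and already-reconstructed neighbors passed in as plain arrays (`planarPredict`, `top`, and `left` are names chosen for this sketch), and it omits the reference-sample smoothing a real decoder applies to larger blocks first.

```typescript
// HEVC planar intra prediction for one N×N block (N = 4, 8, 16 or 32), 8-bit samples.
// top[x]  : reconstructed sample above the block, x = 0..N (top[N] is the top-right neighbor)
// left[y] : reconstructed sample left of the block, y = 0..N (left[N] is the bottom-left neighbor)
// Returns pred[y][x]. Reference-sample smoothing is intentionally omitted.
function planarPredict(top: number[], left: number[], n: number): number[][] {
  if (top.length < n + 1 || left.length < n + 1) {
    throw new Error("need N + 1 neighbors on each side");
  }
  const shift = Math.log2(n) + 1; // weights of both interpolations sum to 2N
  const pred: number[][] = [];
  for (let y = 0; y < n; y++) {
    const row: number[] = [];
    for (let x = 0; x < n; x++) {
      // Horizontal interpolation between the left neighbor and the top-right sample.
      const horiz = (n - 1 - x) * left[y] + (x + 1) * top[n];
      // Vertical interpolation between the top neighbor and the bottom-left sample.
      const vert = (n - 1 - y) * top[x] + (y + 1) * left[n];
      // Average the two with rounding: (sum + N) / 2N.
      row.push((horiz + vert + n) >> shift);
    }
    pred.push(row);
  }
  return pred;
}
```

Planar prediction suits smooth gradients: with flat neighbors it reproduces the flat value exactly, and a bright top-right neighbor produces a ramp that grows toward the right edge.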
Inter prediction: predicting the current block from reference blocks in neighboring (already coded) pictures, which removes the temporal redundancy of the video signal. H.265 has 8 inter prediction partition modes: 4 symmetric (2N×2N, 2N×N, N×2N, N×N) and 4 asymmetric (2N×nU, 2N×nD, nL×2N, nR×2N).

3. Better Quality at a Lower Bitrate

H.265 adds a new Sample Adaptive Offset (SAO) filter, comprising edge offset (EO), band offset (BO), and a parameter merge mode (Merge), which reduces the distortion between the source image and the reconstructed image and lowers the bitrate. Test data indicate that although SAO increases codec complexity by about 2%, it can reduce the bitstream size by 2% to 6%.

In short, H.265 offers a higher compression ratio, a lower bitrate, and better picture quality, at the cost of added codec complexity: statistics show that H.265 decoding already requires several times the computation of H.264.

Since this work is about playing live streams on the web, this article focuses on decoding.

Hardware Decoding and Software Decoding

Decoding is usually divided into hardware decoding and software decoding. Hardware decoding uses dedicated decoding hardware rather than the CPU, such as a GPU, DSP, FPGA, or ASIC chip; software decoding runs decoding software on the CPU. Strictly speaking, there is no purely hardware decoding, because the hardware decoding process still needs software to control it.

Although hardware decoding can achieve better performance, it is not yet widespread because of patent licensing and limited device support (currently only some GPUs on the market support H.265 hardware decoding). Meanwhile, as CPU performance keeps improving, H.265 software decoding has come into wide use.
Software Decoding on the Web

Mainstream browsers currently have poor H.265 support, and web browsers cannot play H.265 natively, so H.265 playback on the web has to be done with software decoding.

The first option that comes to mind for software decoding on the web is JavaScript. libde265.js is the JavaScript version of the open-source H.265 decoder libde265 (strictly speaking, the asm.js build of libde265, explained later). In our tests, libde265.js could not deliver proper audio and video playback: it suffered from a low frame rate and audio/video desynchronization. Moreover, as an interpreted scripting language, JavaScript is not an ideal choice for a heavily CPU-bound task like H.265 decoding, so we kept looking for a better approach.

How Chrome's Native Audio/Video Player Works

According to the Chromium Projects documentation, the overall process is:

1. The video tag creates a DOM object, which instantiates a WebMediaPlayer.
2. The player drives the buffer to request multimedia data.
3. FFmpeg demuxes the data and decodes the audio and video.
4. The decoded data is passed to the corresponding renderer objects.
5. The result is rendered in the video tag for display, or sent to the sound card for playback.

The purpose of video decoding is decompression: restoring the video data to raw pixels. Audio decoding restores MP3, AAC, and other formats to raw PCM. FFmpeg is a long-established, cross-platform audio and video processing tool with outstanding performance, and many codecs and players are built on it; as shown above, Chrome also uses its decoders. Following the same principle as the native audio/video player, we can use FFmpeg to implement H.265 playback.
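On the audio side of this pipeline, decoded PCM eventually has to reach the Web Audio API, whose AudioBuffer stores one Float32Array per channel with samples in [-1, 1]. The sketch below assumes the decoder hands back interleaved signed 16-bit samples (FFmpeg's AV_SAMPLE_FMT_S16 layout); `deinterleaveS16` is a hypothetical helper, and a decoder that outputs planar float (AV_SAMPLE_FMT_FLTP) would need a different conversion.

```typescript
// Convert interleaved signed 16-bit PCM (L R L R ...) into one Float32Array per
// channel, the layout Web Audio's AudioBuffer uses. Assumes S16 interleaved input.
function deinterleaveS16(pcm: Int16Array, channels: number): Float32Array[] {
  if (channels <= 0 || pcm.length % channels !== 0) {
    throw new Error("sample count must be a multiple of the channel count");
  }
  const frames = pcm.length / channels;
  const planes: Float32Array[] = [];
  for (let c = 0; c < channels; c++) planes.push(new Float32Array(frames));
  for (let f = 0; f < frames; f++) {
    for (let c = 0; c < channels; c++) {
      // Dividing by 32768 maps -32768 exactly to -1.0 and keeps 32767 just below 1.0.
      planes[c][f] = pcm[f * channels + c] / 32768;
    }
  }
  return planes;
}
```

In the browser, each returned plane can then be copied into an AudioBuffer created with `AudioContext.createBuffer(channels, frames, sampleRate)` via `copyToChannel(plane, channelIndex)`.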
FFmpeg has supported H.265 video since the early version 2.1, but our live streams deliver H.265 over HTTP-FLV, which FFmpeg did not support at the time of writing (see the discussion "HEVC over FLV (and thus RTMP)"). This is not because of any problem in FFmpeg itself, but because Adobe has never officially supported H.265 data in FLV.

HTTP-FLV Extension

HTTP-FLV is one of the three major live streaming protocols (the other two being RTMP and HLS). As the name suggests, it packages audio and video data in FLV format and transmits it over HTTP. HTTP-FLV has low latency, travels over port 80 so the stream can pass through firewalls, and supports HTTP 302 redirects for scheduling and load balancing.

As mentioned above, FFmpeg does not officially support encapsulating H.265 data in FLV, but unofficial solutions already exist. For example, the Chinese vendor Kingsoft (Jinshan) Video Cloud has extended FFmpeg with support for H.265 data encapsulated in FLV. With such an extended FFmpeg, the FLV-encapsulated H.265 data of an HTTP-FLV live stream can be decoded.

But FFmpeg is written in C: how can we run it in a web browser and feed it the live stream data to be decoded? WebAssembly solves this problem.

WebAssembly

WebAssembly is a new type of code that can run in modern web browsers. It is a low-level, assembly-like language with a compact binary format that provides a compilation target for other languages so that they can run on the web. It is also designed to coexist with JavaScript, allowing the two to work together.
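To illustrate the compact binary format and the JavaScript interop described above, the following TypeScript sketch hand-assembles a complete 41-byte WebAssembly module that exports a single `add(i32, i32) -> i32` function, then calls it from JavaScript. It is a toy for illustration only; the player itself loads the Emscripten-generated FFmpeg module instead.

```typescript
// A complete WebAssembly module, assembled by hand: it exports one function,
// add(i32, i32) -> i32. Toy example only.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one 7-byte body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compile/instantiate is acceptable for a module this small.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exports arrive untyped; assert the signature we know this module has.
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(wasmBytes.length, add(2, 3)); // prints "41 5"
```

Synchronous compilation is used here only because the module is tiny. Chrome, for example, rejects synchronous compilation of larger modules on the main thread, so a real player would rely on the asynchronous `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming`, which Emscripten's generated loader uses. Note also that the i32 result wraps around exactly as the C code compiled to Wasm would.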
In recent years, WebAssembly has gained broad support in mainstream browsers.

Before looking at the characteristics and advantages of Wasm, let's look at how JavaScript code is parsed and run in the V8 engine. The process can be divided into the following steps (details differ between JS engines and between versions of the same engine):

1. The parser converts the JavaScript source into an abstract syntax tree (AST).
2. Ignition generates bytecode from the AST. (Before V8 5.9 this step did not exist and code was compiled directly to machine code; since V8 5.9 the Ignition bytecode interpreter is enabled by default.)
3. TurboFan (the JIT) optimizes the code, compiling the bytecode into native machine code.

In the first step, generating the AST takes longer the more JS code there is, and it is one of the slower stages of the whole process. Wasm is already delivered as binary bytecode and does not need this step, so it runs faster overall.

In step 3, because Wasm data types are already fixed, the JIT does not need to guess types from information collected at runtime, and there is no cycle of repeated re-optimization. In addition, because Wasm is a compact binary code, it is much smaller than JavaScript code implementing the same functionality (even after compression).

So for equivalent functionality, Wasm beats JavaScript in both download speed and execution speed. The asm.js mentioned earlier is essentially JavaScript, so it also goes through all of the steps above in the JS engine.

Many high-level languages can now be compiled to Wasm, from the earliest C/C++ and Rust to later TypeScript, Kotlin, Scala, and Golang, and even long-established server-side languages such as Java and C#. This breadth of language support is itself a sign that WebAssembly technology has a promising future.
As noted above, WebAssembly lets us run FFmpeg in the browser: with the Emscripten toolchain, we compile a customized, trimmed-down FFmpeg into a WASM file, load it in the web page, and have it interact with JavaScript code.

III. Implementation

Overall Architecture

Technology stack: WebAssembly, FFmpeg, Web Worker, WebGL, Web Audio API

Key Points

WASM receives HTTP-FLV live stream data from JavaScript, decodes it, and passes the decoded YUV video data and PCM audio data back to JavaScript via callbacks. The video frames are then drawn on a Canvas with WebGL, while the audio is played through the Web Audio API.

Web Worker

Web Workers provide a simple way to run scripts for web content in background threads. A worker thread can perform tasks without interfering with the user interface, and can also perform I/O using XMLHttpRequest. Once created, a worker can send messages to the JavaScript code that created it by posting them to an event handler specified by that code.

Besides the main thread, we use two Web Workers, Downloader and Decoder, for pulling the stream and decoding it respectively; the Decoder communicates with the WASM module. The three threads communicate via postMessage. When passing stream data we use transferable objects, which transfer ownership of the underlying buffer instead of copying the data, improving performance.

The Downloader pulls the live stream with the Streams API. A fetch request returns a Response whose body is a ReadableStream; calling getReader() on it returns a ReadableStreamDefaultReader, which reads the stream chunk by chunk. The reader's read() method returns a promise that resolves to a {done, value} object: done indicates whether the stream has ended, and if it has not, value holds the binary data segment pulled this time, whose value.buffer is sent to the Decoder via postMessage.
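The Downloader's read loop can be sketched as below. To keep the sketch runnable outside a browser, it declares its own minimal `ChunkReader` interface matching the `read()` contract of `ReadableStreamDefaultReader`; `pumpStream`, `ChunkReader`, `url`, and `decoderPort` are hypothetical names, not part of the original implementation.

```typescript
// Minimal shape of ReadableStreamDefaultReader<Uint8Array> needed here,
// declared locally so the sketch does not depend on DOM typings.
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Reads chunks until the stream ends, passing each non-empty chunk to onChunk.
// Resolves with the total number of bytes read.
async function pumpStream(
  reader: ChunkReader,
  onChunk: (chunk: Uint8Array) => void,
): Promise<number> {
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value && value.byteLength > 0) {
      total += value.byteLength;
      onChunk(value);
    }
  }
  return total;
}

// Browser usage inside the Downloader worker (hypothetical `url` and
// `decoderPort`, e.g. a MessagePort connected to the Decoder worker):
//
//   const response = await fetch(url);
//   const reader = response.body!.getReader();
//   await pumpStream(reader, (chunk) => {
//     // Transfer the ArrayBuffer instead of copying it. This assumes the chunk
//     // covers its whole buffer; otherwise transfer a copy made with chunk.slice().
//     decoderPort.postMessage(chunk, [chunk.buffer]);
//   });
```

Keeping the loop independent of where chunks go makes it easy to swap the postMessage hand-off for, say, a write into the ring buffer discussed later in this section.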
The Decoder sends the raw data to be decoded to the WASM module compiled from FFmpeg and receives the decoded data back. When sending raw data to WASM, it puts each data segment into a Uint8Array, allocates a buffer of the same size with Module._malloc, writes the data into the memory that buffer points to with Module.HEAPU8.set, and finally passes buffer to WASM together with the length of the data. When receiving decoded data from WASM, it uses two callbacks defined in the Decoder, one for video data and one for audio data, and then forwards the data to the main thread via postMessage.

Decoded audio data is placed in the main thread's AudioQueue and decoded video data in its VideoQueue, where they wait to be read by the main thread. This keeps playback smooth and makes audio/video synchronization possible.

FFmpeg

FFmpeg is mainly composed of several libraries:

libavcodec: encoding and decoding
libavformat: muxing and demuxing
libswscale: image scaling and pixel format conversion

First, libavformat's API is used to demux the container and obtain information such as where the audio and video streams are located in the file; then libavcodec decodes them into image and audio data.

YUV Video Data Storage

The main YUV sampling formats are YUV 4:4:4, YUV 4:2:2, and YUV 4:2:0. In 4:4:4 every Y component has its own pair of UV components, in 4:2:2 every two Y components share a pair of UV components, and in 4:2:0 every four Y components share a pair of UV components; YUV 4:2:0 therefore requires the lowest bitrate.

YUV data is arranged in one of two formats: planar or packed. Planar YUV stores the Y data of all pixels, then all the U data, then all the V data; packed YUV stores the Y, U, and V data of each pixel interleaved.
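The sampling and layout rules above translate directly into buffer sizes and offsets. This sketch computes where each plane starts in a planar 8-bit frame. `planarLayout` is a hypothetical helper, and it assumes tightly packed rows, whereas a decoded FFmpeg AVFrame may pad each row (its linesize can exceed the width), so real frames may need to be copied row by row.

```typescript
type Subsampling = "444" | "422" | "420";

interface PlanarLayout {
  ySize: number;   // bytes in the Y plane
  cSize: number;   // bytes in each of the U and V planes
  uOffset: number; // where U starts in the frame buffer
  vOffset: number; // where V starts in the frame buffer
  total: number;   // bytes per frame
}

// Planar 8-bit layout: all Y samples, then all U, then all V (e.g. YUV420P).
// Assumes rows are tightly packed (no linesize padding).
function planarLayout(width: number, height: number, s: Subsampling): PlanarLayout {
  // 4:2:2 and 4:2:0 halve the chroma width; 4:2:0 also halves the chroma height.
  // Odd dimensions round up so the last column/row still has chroma.
  const chromaW = s === "444" ? width : Math.ceil(width / 2);
  const chromaH = s === "420" ? Math.ceil(height / 2) : height;
  const ySize = width * height;
  const cSize = chromaW * chromaH;
  return { ySize, cSize, uOffset: ySize, vOffset: ySize + cSize, total: ySize + 2 * cSize };
}
```

For a 1920×1080 YUV420P frame this gives a 2,073,600-byte Y plane and 518,400-byte U and V planes, 3,110,400 bytes in total, which is half the size of the same frame in 4:4:4.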
The video data we decode is in YUV420P format, but Canvas cannot render YUV data directly; it only accepts RGBA. Converting YUV to RGBA on the CPU costs a considerable amount of performance. Instead, we use WebGL to render the YUV data onto the Canvas, which avoids the CPU-side conversion and uses GPU hardware acceleration to improve performance.

Ring Buffer (Circular Buffer)

A live stream is a continuously transmitted data source of unknown total length. The fetched data is stored temporarily until the Decoder Worker reads it, and after it has been read it must be cleared or overwritten promptly; otherwise it would take up too much of the client's memory and disk resources.

One possible approach is to write the fetched data segments into a ring of memory, using a head pointer that points to the start of the data the Decoder will decode next and a tail pointer that points to where the next incoming stream segment will be written. As data is written and decoded, both pointers move forward, so the stream data is continuously written into the ring, decoded, and overwritten. Overall memory usage stays under control, and the player does not consume excessive resources during a long live broadcast.

FFmpeg Custom Data IO

FFmpeg allows developers to define custom data IO sources, such as a file system or memory.
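A minimal sketch of the ring buffer described above, assuming a fixed capacity chosen by the player. Its `read` method follows the shape of the read_packet callback used by FFmpeg's custom IO (copy up to the requested number of bytes, report how many were copied), so a decoder-side read_packet could be built on top of it. `RingBuffer` and its methods are illustrative names, not the original code. This version never overwrites unread bytes: space is reused only after it has been read, and the caller decides whether to wait or drop data when the ring is full.

```typescript
// Fixed-capacity byte ring buffer. The producer (stream data) writes at `tail`;
// the consumer (decoder) reads from `head`. Space is reused only after it has
// been read, so unread data is never overwritten.
class RingBuffer {
  private readonly buf: Uint8Array;
  private head = 0;  // next byte to read
  private tail = 0;  // next byte to write
  private count = 0; // bytes currently stored

  constructor(capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.buf = new Uint8Array(capacity);
  }

  get available(): number {
    return this.count;
  }

  // Copies as much of `data` as fits (in up to two pieces when it wraps past
  // the end of the ring); returns the number of bytes written.
  write(data: Uint8Array): number {
    const cap = this.buf.length;
    const n = Math.min(data.length, cap - this.count);
    const first = Math.min(n, cap - this.tail);
    this.buf.set(data.subarray(0, first), this.tail);
    this.buf.set(data.subarray(first, n), 0);
    this.tail = (this.tail + n) % cap;
    this.count += n;
    return n;
  }

  // Copies up to dst.length bytes into dst; returns the number copied (0 if empty).
  // A C read_packet built on this must not return 0: FFmpeg expects a proper
  // AVERROR code (such as AVERROR_EOF) when no data is available.
  read(dst: Uint8Array): number {
    const cap = this.buf.length;
    const n = Math.min(dst.length, this.count);
    const first = Math.min(n, cap - this.head);
    dst.set(this.buf.subarray(this.head, this.head + first), 0);
    dst.set(this.buf.subarray(0, n - first), first);
    this.head = (this.head + n) % cap;
    this.count -= n;
    return n;
  }
}
```

Copying with `set` on `subarray` views moves each contiguous run in one call instead of looping byte by byte, which matters at live-stream data rates.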
In our scheme, the data to be decoded is fed to FFmpeg from memory through an AVIOContext created with avio_alloc_context, whose parameters are:

buffer: a pointer to a custom memory buffer;
buffer_size: the length of this buffer;
write_flag: 1 if the buffer is written to (encoding), 0 if data is read from it (decoding);
opaque: an optional pointer to user-defined data describing the custom source, passed to the callbacks below;
read_packet and write_packet: callbacks for reading from and writing to the custom data source; note that they are called repeatedly as long as there is data to process;
seek: a callback for seeking to a specified byte position in the custom data source.

WASM Size Optimization

FFmpeg provides muxing/demuxing and encoding/decoding support for a large number of media formats, as well as support for various protocols, color spaces, filters, hardware acceleration, and more; the ffmpeg command-line tool can show the details of the current build.

Since we only need H.265 decoding, FFmpeg can be customized at compile time to include only the necessary decoders. Unlike a regular FFmpeg build with ./configure, compiling FFmpeg to WASM requires the emconfigure ./configure wrapper provided by Emscripten.

After customizing FFmpeg and compiling it together with our decoding C file, the WASM file is 1.2 MB, about 1.4 MB smaller than before optimization, which speeds up loading.

IV. Results

Huajiao Live's web client can now decode and play H.265 live streams. In tests on a MacBook Pro (2.2 GHz Intel Core i7, 16 GB RAM) running Chrome for an extended period, memory usage stayed stable between 270 MB and 320 MB, and CPU usage between 40% and 50%.

V. Main References

FFmpeg official website (http://ffmpeg.org/)
Discussion of FFmpeg's lack of HEVC support in FLV/RTMP (http://trac.ffmpeg.org/ticket/6389)
WebAssembly official website (https://webassembly.org/)
Google V8 engine (https://v8.dev/)
Emscripten official website (https://emscripten.org/)