Web end H.265 player research and development decryption
"The audio and video codec is a relatively small area, involving the text, graphics, image, audio, and video of the text, graphics, image, audio, and video in streaming media technology can be applied to specific practice. The multimedia field has more than two years, and it is necessary to have a certain output. We have a self-developing web player and support H.265 decoding. In the context of the code rate optimization (maintain the image quality, apply image enhancement, ROI area Detection, intelligent scene classification, and H265 codec can reduce the code stream 50%. Decrease bandwidth cost, improve video service QoE's purpose), and truly achieve the whole area coverage of H265 decoding.
Player overall architecture
Our player's overall architecture is designed as follows:
This article mainly shares the solution, decoding, and playback of our WEBASSEMBLY implementation H.265 format.
background
H.265 is also known as HEVC (full name High Efficiency Video Coding, high efficiency video coding), is the successor of the ITU-T H.264 / MPEG-4 AVC standard. Compared to H.264, H.265 has a higher compression ratio, it means that the same rate (also known bit rate refers to the number of bits (BIT) per second. The unit is BPS (Bit Per Second), bit The higher the rate, the more data is transmitted per second, the clearer the picture quality), the picture quality of the H.265 will be clearer, the higher compression rate can use lower storage and transmission costs.
Bandwidth Cost: H. 265 at a limited bandwidth can transmit higher quality network video, in theory, H.265 is maximum only H.264's half bandwidth can transmit the same quality video. Lower bandwidth can better reduce storage and transfer costs, and make a pavement for more complex and fun interactive gameplay for future video and live broadcast fields. Transcoding cost: But the current mainstream browser does not support H.265 native video playback, so usually, the video production side needs to make a transcodation of H.264 video for the browser to adapt the browser, such as the PC scene, and Added transcoding costs. For example, in Taobao live, it is assumed to calculate 50,000 games per day, each live transcoding cost is 20 yuan, one day is 1 million transcoding costs. To this end, our team has explored the feasibility and compatibility of the browser-end H.265 video playback, preparing for the full amount of H.265 of the mobile terminal and the PC side, and also conducts the browser end audio processing, WebAssembly practice An in-depth attempt.
H.264 vs H.265
H.264 is the most widely used video coding format, and the H.265 standard surrounds the existing video coding standard H.264, reserves the original technologies, and improves some related technologies. New technology uses advanced technology to improve the relationship between code stream, coding quality, delay and algorithm complexity, reaching optimization settings. Both H.265 and H.264 are block-based video coding techniques, the main difference is the size of the coding unit and some encoding algorithm details, and the H.265 divides the image as "Coding Tree Unit, CTU", Instead of 16 × 16 macroblocks like H.264. According to different coding, the size of the coding tree unit can be set to 64 × 64 or limited 32 × 32 or 16 × 16. In general, the larger the block size, the better the compression efficiency. The specific algorithm and related details are not specifically expanded here, and some other compression algorithms such as the open coding formats such as AV1, such as the H.265 patent restriction, readers can refer to other related articles.
As shown below, you can see that the same subjective picture quality, H.265 (500K) requires only H.264 (800K) with a bandwidth code rate.
Browser status
As shown below, because H.265 patent and hard solution support is not perfect, mainstream modern browsers are not compatible with H.265 encoded video playback (EDGE new version is supported by plug-in mode, "but because Apple is H.265 Support (here the author thinks this may be a very important sign, because the development of technology is not just the decision of this technology itself, but many factors work together, business is also important factor), mobile iOS Safari supports native playback at version 11.0 or more.
Want to play the H.265 video native tag in the browser side, but because the video format itself is a collection of continuous image screens and audio, refer to the source code of Chromium and the implementation principle inside the video tag, The combination of + Web Audio API simulates a virtual VIDEO tag to implement the player function.
Demo
Because of the performance of live stream, I posted a play H.265 MP4 video (the video address played directly in the browser and there is only a sound without the picture), the reader can have an intuitive experience. Address: https://g.alicdn.com/videox/mp4-h265/1.0.2/index.html effect:
Preliminary investigation
VioF foundation
Because there are not many scenes involving the video field, a tag can meet most of the scene, but experience the outbreak of live broadcast and short video in the past few years, the demand and function of video becomes more complex. I have read a lot of books and articles related to the audio field before developing, and a brief introduction to the video audio foundation first.
The format of our usual video, such as. MP4, .mov, .wmv, .m3u8, .flv, etc. are called Container. In a video file, video data is stored separately, and the compression algorithm used is different. The container mainly includes video data, AUDIO data, metadata (for retrieving information such as Victorial Payload format). The package format of each format is different, such as the basic unit of the FLV format is TAG, and the basic unit of the MP4 format is Box, and the auxiliary META information is used to retrieve the corresponding raw data.
The video coding criteria such as H.264, H.265, which usually hear, is called CODEC (Compress and Decompress). A video format such as MP4 can use any standardized compression algorithm, which is included in the meta information of a video file to tell the player what codec algorithm is played.
Client player
A traditional client player playing a video stream has passed the following links:
Take data => Decoding => audio and video decoding => Sampling / pixel data is sent to the device for rendering.
For streaming media, the player client performs pipelined transmission by drawing a data source (audio and video stream). During this period, a series of operations such as reading, conversion, classification, and copying of video streams, with the packaged MP4 stream as an example, the steps of demarchers, decoding, rendering, etc. are required:
Browser Video Label
In the process of exploring, in order to understand the reason for the mainstream browser does not support H.265 video playback, and the browser is realized by the browser principle, through the reading of the official document and the VIDEO tab of the Chromium, the reading of the source code is organized. flow chart.
You can see the implementation of the video stream playback inside the browser, which is synchronized and rendered by the FFMPEG soft solution or GPU hard solution after processing the PIPELINECONTROLLLER. The H.265 video is not perfect because of hard solution support, and the soft solution may have performance risks, so it is closed in Chrome, which can be opened by parameters in chromium. In accordance with this idea, we use the interface provided by the browser to implement an analog VIDEO tag.
designing process
Development ideas
The development idea is split from the simple to complex process, and the task is subjected to the coverage of the various scenes such as H.265 video on demand and live broadcast. The playback process is complete in MP4 short video, then cover live scenes, consider the network jitter, Complex factors such as memory control, playback of playback files such as live M3U 8 and develop video SEEK, multiplier and other functions.
MP4 play => FLV play => HLS play => Add seek, multiplier and other functions
Feasibility Analysis
Idea: When the feasibility analysis is started, the reference combined with the tools VideoConverter.js and libde265.js to extract the H.265 video FFmpeg's compilation, the HEVC file and the MP3 audio file are played on the browser. Demo Address: https://sparkmorry.github.io/mse-learning/H265: Performance: Separation of 720p MP4 video and audio, draw images by , play audio via tag, The picture is around 23fps under the Chrome browser on the MacBook Pro. problem:
Cannot meet the decoding performance standard: 720p video is only around 23fps on the MacBook Pro, and the original video is 25fps, and the decoding performance standard cannot be achieved and cannot be played smoothly. Unable to do a synchronization: This program cannot obtain a strict sounds synchronization because the PTS timestamp of the video and audio cannot be obtained directly by directly extracting the HEVC bare stream file. solution:
Performance: Because libde265.js is ASM.JS, through the transformation of libde265.js open source library, package WebAssembly test performance scenario synchronization: Refer to FLV.JS, HLS.JS, etc. Open source video library, according to the previous practical experience The performance of JS in the decoder can complete the video stream file decoration, obtain each frame video, audio playback PTS and raw data hypothesis decoder for decoding rendering. Solution adjustment:
MP4 diverted stream play
Idea: According to the solution adjusted by the previous process, the MP4 stream is decorated by JS because the complexity of the audio decoding is not high, and it is first decoded with JS, only the video decoding module uses the existing triplet module libde265 and replaced To resolve performance issues for WASM, audio and video decoding modules are self-maintained a caching area, responsible for storing Packet data passed by the encapsulation module, solving the problem of synchronization of the sounds. Performance: The video decoding module implemented by the open source libde265 is targeted to the 720P video stream, the average decoding time is 45ms, and the audio play time interval (40ms) is not met. Problem: Video decoding performance is still not enough. solution:
Frame: Social synchronization is guaranteed, and some non-reference frames are lost, but partial experience is lost. So improving the decoding performance and improving playback strategy can meet the feasibility of the current program. Enhance decoding performance and improve playback strategy. Enhance the decoding performance: FFMPEG with better decoding performance is replaced by libde265. Improve playback processes: Because each RequestaniMationFrame loop task is synchronized, the decoding is played while playing. Introduced with the Webworker thread. By improving the video decoding module, the interior of the decoder is turned on, and when the external video playback device needs to play the next frame, the next frame data is read directly from the decoder decoded. Implementation of Worker and main threads. Solution adjustment:
Demo Address: https://static-assets.cyt-rain.cn/h265/index.html design process
FLV live stream playback
Idea: MP4 video smooth play, but in live broadcast scenarios (such as FLV video stream), the client needs to establish a long link, constantly receive flow messages, borrow the FFMPEG itself to convective media support, to encapsulate and decode video data . Performance: The FFMPEG network library cannot be compiled, and TCP cannot establish a connection. Question: Unable to compile FFMPEG network libraries: TCP Establish Connection Create a Socket Time Fair, EMScripten Tools Unable to compile TCP connection related configuration
CODEC does not support: The FLV official agreement does not support H.265.
solution:
Unable to compile FFMPEG network libraries: The main thread uses the FETCH method to draw, put it in the FFMPEG custom buffer for decoding and decoding. Because the live broadcast of time is playing, the memory space is released, and the memory space is released, and the annular data buffer is used. The FLV official agreement does not support H.265: H.265 is expanded to the FFMPEG and the encoding end because the internal data structure of the FFMPEG is nestled deeply, and the replacement JS decoction function directly uses FFMPEG's decapsulation function. Solution adjustment:
Design Flow
Current plan
Playback process
Because FFMPEG supports multiple format decoders, only need to pull the original stream data through the browser API in the main thread and put it on the Worker for unpacking when it is initially cached to a threshold. Decoding; in the Subroutine (Worker), data callback receiving data is triggered by the main thread fetch method, in the ring buffer (memory loop); the sub-thread will be sent to the main thread, through the Web Audio API. Cache audio data, decoded to the decoded video frame cache queue cycle decoded to cache 10 frames RGBA image data; the main thread is consumed according to the PTS of the audio playback, and renders the video image; looped or more operation until the FETCH interface returns end.
Decoder compile
The FFMPEG library written in the C language is translated into a WASM through the EMScripten tool. It is applied to the video audio decoding in the browser. We have achieved the work of decoding and decoding by relying on FFMPEG's universal interfaces in our video decoding scenario. First compile the FFMPEG library through the EMScripten, rely on a static library to decompose in the decoder and decoded inlet program..
Test performance
Performance Testing
Test video
Because FLV live video is affected by timeliness, take 720P HD H.265 MP4 video as a stable input test
Address: https://gw.alicdn.com/bao/uploaded/lb1l2ixiszqk1rjszfjxxblcfxa.mp4? File = lb1l2ixiszqk1rjszfjxxblcfxa.mp4 video parameters:
Input # 0, mov, mp4, m4a, 3gp, 3g2, mj2, from 'https://gw.alicdn.com/bao/uploaded/LB1l2iXISzqK1RjSZFjXXblCFXa.mp4?file=LB1l2iXISzqK1RjSZFjXXblCFXa.mp4': Metadata: major_brand: isom minor_version: 512 compatible_brands: isomiso2mp41 encoder: www.aliyun.com - Media Transcoding Duration: 00: 01: 00.10, Start: 0.000000, Bitrate: 907 KB / S Stream # 0: 0 (Und): Video: HEVC (Main) (HVC1 / 0x31637668), YUV420P (TV, BT709, Progressive), 1280x720, 854 Kb / s, 25 fps, 25 TBR, 12800 TBN, 25 TBC (Default) metadata: handler_name: VideoHandler Stream # 0: 1 (Und): Audio: AAC (Lc) (MP4A / 0X6134706D), 44100 Hz, Stereo, FLTP, 48 KB / S (Default) metadata: Handler_name: SoundHandler
Testing Machine
Lenovo ThinkPad T430
CPU: Intel (R) Core (TM) i5-3230M [email protected] X64 processor memory: 8 GB system: Windows 10 MacBook Pro (Retina, 15-Inch, MID 2015)
CPU: 2.2 GHz Intel Core i7 Memory: 16 GB System: MacOS 10.14.2
Performance
MBP performance
Decoder.wasm size decoder.js size average per frame decoding time memory occupied CPU occupied 1.4M168K26.79MS27M17 ~ 25%
Tested for two PC notebooks, the average period per frame decoding (including YUV420 RGBA) The performance of each browser is as follows: Note: This Native indicates FFMPEG-based FFMPEG for the Mac system as a dependent decoder. (Relative to the cross-platform scheme of the WebAssembly of the computer architecture such as X86, ARM) is considered.
Equipment ChromesafarifirefoxedgenATIVEMACOS (I7) 26.79ms22.19ms24.77ms-5.08msWindows (I5) 33.51MS-36.74MS86.72MS untested
It means that the highest can provide 720P HD video, the following frame rate video smooth playback is:
Equipment Chromesafarifirefoxedge Video Benchmark NativeMacos (i7) 37FPS45FPS40FPS-25FPS196FPSWINDOWS (I5) 30fps-27fps12fps25FPS untested
You can see that in these two machines, the HD 720p video of 25FPS frame rates such as non-high-speed motion such as high-speed sports has been able to achieve standards for the production environment, but there is a distance from the native speed.
Browser compatibility
Mainly supported by Webasembly and Webworker, the mainstream browser Chrome, Safari, Firefox, and EDGE can be tested in the actual test.
WebassemblyWebWorker
Todo
The current technical solution has been able to play 720p high-definition live stream on most machine mainstream browsers, but in the EDGE browser and a slightly poor performance, there is still a high-definition video decoding performance that does not meet the risk of smooth playback. Webassembly achieves a distance of Native speed, especially the support of compilation parallel computing, is a very common performance optimization strategy in viewing audio and large-scale data processing, the author organizes several optimized directions, and more Multi-exploration space:
Compilation FFMPEG solutions have more optimizations for parallel compilation using assembly, but assembly instructions are CPU SPECific (such as X86 instructions and ARM instructions), and WASM is a stack-based virtual machine across platforms. Emscripten does not support compilation compilation, consider compiling FFMPEG's .c and assembly .asm files into LLVM IR (LLVM Intermediate Reperesent) with CLANG and other LLVM front ends, then compile testing through FastComp or other rear ends. Hard solution FFMPEG3.3 began supporting automatic hard solution detection, supported hardware devices have different support according to different operating systems and hardware, specific reference: https: //trac.ffmpeg.org/wiki/hwaccelintro. Because WASM is a cross-platform virtual instruction set, the level of support is still in further exploration. Multi-thread FFMPEG internal decoding multi-threaded to improve decoding performance, can support cross-platform multi-threaded support through PThread, but if shared memory is not supported, the data transmission between threads will have many performance consumption (deep copy or transfered object) ). The browser-side shared memory is implemented through SharedArrayBuffer, because there is a security hazard, most of the mainstream browsers closed SharedArrayBuffer, chrome67 + started recovery. Considering that compatibility multi-threaded support is also trying again. WEBGL rendering decoding average time has 4 ms (15%) on YUV to RGBA, can use the GPU to accelerate the calculation of the image via WebGL, but at the same time as the WebGL data exchange, a certain performance loss will need to be tested to view performance results.
Future lookout
Through H.265 video playback, the capabilities of open source view FFMPEG and the advantages of WebAssembly performance have a deep attempt on the browser end audio processing. Video as a multimedia form, compared to existing text, images, audio can have more vivid and more information. Especially after the explosion of live broadcast and short video, it became a basic multimedia form, and it is also an embodiment of technology development such as network and mobile phone performance. Future With 5G and higher performance hardware equipment development will be more widely used in various fields. The browser is also an indispensable part of this video revolution. Through this exploration, the general capability of expanding the video and audio processing in the future browser is provided with the universal capabilities of the visual audio processing, and also in the browser side via Webassembly to Native. The performance and ability is close to the road to do a landing attempt, although the performance from the test is not as good as Native, but with the evolution of standards and technologies, the performance of graphic images and artificial intelligence such as performance requirements for performance requirements The browser is handled by the browser, such as the following directions:
Extended browser-side video playback capabilities With FFMPEG powerful codec, except for H.265 video playback, future video playback of various formats and encoded types in the future. Such as different encoding formats AV1, different container format MOV formats, etc. Extended browser ends audio processing capabilities can make more complex operations such as video filters such as video filter, video shear, video format conversion, and reduce network transmission and Store costs. WEBASSEMBLY high-performance web applications allow the open source framework of traditional other languages such as graphics-related open source departments such as graphics-related open source library OpenGL, SDL, and the like to browser. A traditional image, 3D computing capacity can be extended to the browser.
refer to
Chromium Media Elements Source: https://github.com/chromium/chromium/tree/master/media webassembly: https://webassembly.org/ Excellent Open Source View FFMPEG: https://www.ffmpeg.org / LLVM-based Webassembly Packing Tool Set Emscripten: https: //eml Based on Webassembly OGG Player: https: //github.com/brion/ogv.js Based on FFMPEG: HTTPS : //github.com/leixiaohua1020/simplest_ffmpeg_player
Later words
This article describes the processes and progress we have developed on the web-side H.265 player, and there are many points that continue to optimize and in-depth. Students who are interested in related knowledge are welcome to communicate, attached to our front-end team:
Taobao Technical Department content and open platform front team is the core of the core of the Northern China, which is more in-depth in the field of content e-commerce, audio and video technology, and explores innovative business models and industry leaders relative to the front-end team of horizontal resources.
This team is currently hot in the 20th excellent graduates, welcome to join the people to join!
Or contact the team directly: Ling Yu [email protected] "