"At the livevideostackcon 2020 Beijing offline summit, we invited Li Zhitao, technical director of Beijing 263 Enterprise Communication Co., Ltd. to share. After years of building, the 263 cloud video platform has supported the use scenarios of multiple protocols and multiple terminals. He will share the technology iteration process of the 263 operation level cloud video platform guided by the video + strategy.
Hello, I'm Li Zhitao from Beijing 263 Enterprise Communication Co., Ltd., mainly responsible for the development of the company's audio and video business line.
You may be unfamiliar with 263 Network Communication Co., Ltd. In one sentence: 263 has been deeply engaged in the industry for 20 years and is the service provider that knows enterprise Internet communication best. The company was founded in 1993, formerly known as Haicheng Paging. In 1999 it became the largest dial-up access service provider in China outside the basic carriers. In 2001, 263's self-built data centers were among the first four-star IDC facilities in China. In 2004, 263, as a telecom value-added service provider, obtained the national multi-party communication license. In 2005, the enterprise mailbox launched by 263 became the leading brand of enterprise mailbox outsourcing services in China. In 2010, 263 Network Communication was listed in Shenzhen (stock code 002467); in the same year the company launched the 263 teleconference system and became the fastest-growing teleconference service provider in China. In 2015, 263 acquired Zhanshi Interactive and became the largest multimedia interactive live-streaming technology service provider in China at that time. In 2018, 263 obtained one of the first mobile resale licenses in China and launched enterprise-customized mobile communication services. In the same year, 263 launched the "video+" strategy, making a major strategic transformation from paging, access, messaging, mailbox and teleconferencing to today's real-time service provider centered on audio and video.
Next, I will introduce 263 audio and video from four aspects, focusing on how our technology construction has iterated under the guidance of the "video+" strategy.
1. 263 cloud video product introduction

After years of construction, the 263 video cloud supports multi-protocol, multi-terminal scenarios, mainly including 263 cloud terminals. Cloud terminals cover a variety of hardware devices; based on enterprise office scenarios, we offer terminal solutions for individual participants, small meetings, medium-sized meetings and large meetings. Beyond the meeting-room level, we also support real-time audio and video conferencing from mobile and PC clients (Windows, Mac, iOS and Android) in any scene, any region, at any time. The 263 video cloud is compatible with a full range of audio and video communication protocols: it is primarily based on the WebRTC protocol, supports VP8/VP9/H.264 and other codecs, adapts to multiple browsers, and supports access via the Microsoft Lync protocol. A considerable number of enterprise customers still own hardware terminals based on the SIP and H.323 protocols; the 263 video cloud provides protocol support for these terminals so they can be accessed directly. Some customers have purchased expensive hardware MCUs from Cisco and Polycom, and we have opened up the video cloud for their access as well. The video cloud also provides integrated access to teleconferencing over the traditional PSTN network: mobile phones and landlines join the audio through the teleconference platform, and that audio is merged into the video cloud. Real-time content from the video cloud can be pushed to the cloud for live streaming or on-demand playback via the standard RTMP protocol.
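As a small illustration of how codec compatibility can be handled on the WebRTC side, the sketch below reorders a browser's codec preferences so that H.264 is offered first, which tends to simplify interop with SIP/H.323 hardware behind a gateway. This is a minimal sketch using the standard browser WebRTC API, not 263's actual client code; the choice of H.264 as the interop codec is an assumption for illustration.

```typescript
// Minimal sketch: prefer H.264 on a video transceiver so that streams interoperate
// more easily with SIP/H.323 hardware endpoints behind a gateway.
function preferH264(pc: RTCPeerConnection, track: MediaStreamTrack): void {
  const transceiver = pc.addTransceiver(track, { direction: "sendrecv" });
  const capabilities = RTCRtpSender.getCapabilities("video");
  if (!capabilities || !("setCodecPreferences" in transceiver)) return; // browser lacks support

  // Put H.264 first; keep the remaining entries (VP8/VP9, RTX, FEC, ...) as fallbacks.
  const score = (c: RTCRtpCodecCapability) =>
    c.mimeType.toLowerCase() === "video/h264" ? 0 : 1;
  const codecs = [...capabilities.codecs].sort((a, b) => score(a) - score(b));
  transceiver.setCodecPreferences(codecs);
}
```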
1.1 Capability matrix

The capability matrix of the 263 video cloud system starts with the business management system: operation support, user management, multi-service platform management, and user authentication and permission management. The 263 video cloud provides a variety of video service scenarios: conference services for enterprise remote work; education services such as large classes, small classes, dual-teacher classes and K12; and telemedicine services such as remote medical training and remote surgery. The message system covers several message types: IM messages, application messages and message notifications. The signaling relay system handles voice signaling, audio/video signaling and dispatch signaling. The presence system mainly addresses locating a user after login and controlling the scheduling of their messages.

The real-time RTC system is our core, consisting of the WebRTC service and the streaming service: the WebRTC service handles web and app access for users' real-time audio and video communication, while the streaming service handles users' streaming and live broadcast access. The core system manages and schedules the whole cluster, including hardware availability management, bringing hardware servers online and offline, load balancing and failover, parallel expansion of the system, and room-level task scheduling according to system load. The MCU performs transcoding and screen mixing according to the audio and video configuration. The SIP service handles interconnection with the SIP modules of external systems, including teleconferencing, third-party hardware and third-party SIP-based systems. Recording provides on-demand recording for conference, education, telemedicine and other services.

The 263 live broadcast network mainly connects to 263's existing live streaming system and can also push streams to Alibaba Cloud and Tencent Cloud. The broadcast directing station provides value-added functions on top of live streaming. Live broadcast management handles live streaming permissions, venue control and so on. Cloud storage is associated with recordings: once recordings are stored in object-storage-based cloud storage, VOD playback can be provided according to business needs. The public application system hosts multiple application services such as questionnaires, voting and rewards. The teleconference system is a PSTN conference system centered on a hardware conference bridge. The SIP MCU can interconnect with external SIP MCUs or SIP terminals.
1.2 SaaS & PaaS

We provide both SaaS and PaaS interface capabilities. The upper half is mainly the SaaS layer: conference, education and telemedicine applications. The whole system includes a message SDK, sharing SDK, annotation SDK, RTC SDK, on-demand (VOD) SDK and live broadcast SDK. For customers with in-depth development capabilities, we also provide PaaS-layer development interfaces. Calls are made as methods and functions, and the underlying layer communicates over sockets or RPC.
2. Technical architecture

Next are the iterative steps our technical architecture has gone through in recent years.
The open-source foundation of 263 cloud video technology is Google's open-source WebRTC and Intel's open-source OWT (Open WebRTC Toolkit).
2.1 Architecture topology v1.0

The technical framework was built starting from the first generation. The first-generation system is divided into two layers by function and adopts clustered, distributed deployment across roughly four types of IDC. The first layer is our core Beijing DC; the second layer mainly solves domestic north-south interconnection and cross-carrier access. We also provide overseas access points so that overseas users can reach the system. For cost reasons it is not feasible to deploy machine rooms and nodes in every major city across the country, so we deployed nodes where usage is heaviest and currently use Alibaba Cloud to supplement the other nodes, relying mainly on Alibaba Cloud ECS and its bandwidth. For users across the country accessing the 263 video cloud, the quality of the Alibaba Cloud nodes is still good and ensures full domestic coverage. This is the 1.0 topology of our system architecture. Its biggest problem is that interaction between different IDCs, even in the same region and on the same carrier, has to be exchanged at the central node, which is costly. In addition, the data path is long and user latency is high: for RTC applications a delay within roughly 400 milliseconds is acceptable, but a physical round trip of 2,000 kilometers already adds considerable delay. These problems led us to develop version 2.0.
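To make the latency argument concrete, here is an illustrative back-of-the-envelope calculation (not from the talk) of what a 2,000 km detour through a central node costs. It assumes light travels at roughly 200,000 km/s in optical fiber; real paths add routing, queuing and jitter-buffer delay on top of this floor.

```typescript
// Propagation-only delay for a detour of the given round-trip distance.
const FIBER_KM_PER_MS = 200; // ~200,000 km/s in fiber => 200 km per millisecond

function propagationRttMs(roundTripKm: number): number {
  return roundTripKm / FIBER_KM_PER_MS;
}

console.log(propagationRttMs(2000)); // ~10 ms of pure propagation for a 2,000 km round trip
// Every extra hop also adds queuing, forwarding and jitter-buffer delay, so the detour's
// real contribution to the ~400 ms end-to-end budget is much larger than this floor.
```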
2.2 Architecture topology v2.0

To address the problems of topology 1.0, we developed version 2.0, which mainly adds a layer and a relay pool. Users at access-layer nodes in the same region and on the same carrier, or on the same carrier across regions, can interconnect directly; if they cannot reach each other, traffic is relayed through the relay pool, which can be scaled out in a balanced way. The architecture thus grew from two layers in 1.0 to three layers in 2.0. The Beijing IDC and the relay IDCs are deployed in multi-line BGP data centers. Overseas, relay-layer nodes were added in the United States, Germany/Europe and Hong Kong. Users interact locally and traffic goes directly from the local node, with no need to haul data back to the core machine room, so perceived latency is much better. At the same time, the core machine rooms in Beijing were made highly available: if one core machine room is attacked, a hot standby can quickly take over in another machine room.
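The sketch below is a minimal rendering of the v2.0 routing rule described above, not 263's actual scheduler: access nodes that share a carrier interconnect directly (even across regions), and everything else is forwarded through a node from the relay pool. The round-robin relay selection is an illustrative placeholder for the real balancing logic.

```typescript
interface AccessNode {
  region: string;  // e.g. "north-china"
  carrier: string; // e.g. "telecom", "unicom"
}

let relayCursor = 0;

function pickPath(
  a: AccessNode,
  b: AccessNode,
  relayPool: string[],
): { kind: "direct" } | { kind: "relay"; node: string } {
  if (a.carrier === b.carrier) {
    return { kind: "direct" }; // same carrier: interconnect at the access layer
  }
  const node = relayPool[relayCursor++ % relayPool.length]; // otherwise go via a relay
  return { kind: "relay", node };
}
```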
2.3 Media signaling logic

The media signaling logic in the system is embodied in three layers: the background core layer, the middle relay layer of multi-line access machine rooms, and the access layer where users connect nearby. The core logic, OWT Core, is a secondary development based on Intel's OWT and is mainly responsible for computation, control and scheduling of the whole system. SRC solves intelligent routing: users connect from all over the world, and SRC is responsible for finding the access point with the best network quality. The MC system is responsible for screening servers by availability and assigning users to servers that have low load and are already running the same type of service.
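A minimal sketch of that two-step selection follows, under assumed data structures: SRC picks the access point with the best measured network quality for the user, then MC filters available servers and picks the least-loaded one running the required service. The quality weighting and the field names are illustrative.

```typescript
interface AccessPoint { id: string; rttMs: number; lossRate: number }
interface MediaServer { id: string; service: "webrtc" | "streaming"; available: boolean; load: number }

// SRC: lower RTT and lower packet loss is better; the weighting here is illustrative.
function srcPickAccess(points: AccessPoint[]): AccessPoint {
  return points.reduce((best, p) =>
    p.rttMs + p.lossRate * 1000 < best.rttMs + best.lossRate * 1000 ? p : best);
}

// MC: keep only available servers of the right type, then take the least-loaded one.
function mcPickServer(servers: MediaServer[], service: MediaServer["service"]): MediaServer | undefined {
  return servers
    .filter(s => s.available && s.service === service)
    .sort((a, b) => a.load - b.load)[0];
}
```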
3. Media communication modes

3.1 Basic modes
So far, the technical capabilities of 263 cloud video can solve most access quality problems. However, if a business scenario mistakenly uses the wrong basic communication mode, the result is increased traffic, bandwidth bottlenecks, excessive consumption of server computing resources, and a large impact of network jitter, packet loss and delay on real-time audio and video quality. The figure above describes the communication modes used with WebRTC today. In the first, mesh mode, WebRTC endpoints connect directly and media flows peer-to-peer. The second, SFU, shares some traits with mesh; the difference is that all media is forwarded through a server, so for each client the upstream is 1 stream and the downstream is n-1 streams. The disadvantages of the two are basically the same; the advantage of the SFU is that, because streams are relayed through a server, it is convenient to mix them for live broadcast push or recording. The third mode is based on an MCU. Its advantage is one uplink and one downlink per client, so clients consume little bandwidth; because this mode requires mixing the audio and video, it consumes server CPU and GPU computing resources. Each of these modes has its own strengths and weaknesses; later we will discuss combining the basic modes into a hybrid mode according to the business scenario.
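The small helper below is an illustration (not from the talk) that makes the per-client stream counts of the three basic modes explicit for a call with n participants, matching the description above.

```typescript
type Mode = "mesh" | "sfu" | "mcu";

function perClientStreams(mode: Mode, n: number): { up: number; down: number } {
  switch (mode) {
    case "mesh": return { up: n - 1, down: n - 1 }; // one direct peer connection per other participant
    case "sfu":  return { up: 1, down: n - 1 };     // publish once, subscribe to everyone else
    case "mcu":  return { up: 1, down: 1 };         // publish once, receive one mixed stream
  }
}

// e.g. a 6-party call: mesh 5 up / 5 down, SFU 1 up / 5 down, MCU 1 up / 1 down per client.
console.log(perClientStreams("sfu", 6)); // { up: 1, down: 5 }
```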
3.2 Version 1.0

The figure above shows the data flow used in 1.0. The client publishes and subscribes to audio and video according to the business logic; on the right is the server. In 1.0 we supported MCU and SFU at the same time. Take four accessing parties as an example: the dotted lines represent user-layer access over the UDP protocol. If SFU mode is used, server-side computing power is not involved. If the clients use different codecs, the transcoding capability of the server's MCU module is used, and the streams are distributed to the clients after transcoding. Business scenarios that use the MCU mix audio and video in the MCU module. In this version the SFU lacked SVC or simulcast support, so audio and video quality was not guaranteed.
3.3 Version 1.5

In between, we launched version 1.5, which mainly uses the server-side MCU to compose the screen and lets the client cut the composed screen, giving users more flexibility. After cutting, each stream can be displayed separately and laid out flexibly.
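A minimal sketch of the client-side cutting idea follows, under the assumption that the MCU composes participants into a uniform grid: the client computes the crop rectangle of each tile so it can render or re-layout the tiles independently. The grid convention and frame size are illustrative.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Crop rectangle of tile `index` in a columns x rows mix of a frame of the given size.
function tileRect(index: number, columns: number, rows: number, frameWidth: number, frameHeight: number): Rect {
  const tileW = frameWidth / columns;
  const tileH = frameHeight / rows;
  return {
    x: (index % columns) * tileW,
    y: Math.floor(index / columns) * tileH,
    width: tileW,
    height: tileH,
  };
}

// e.g. participant 3 in a 2x2 mix of a 1920x1080 frame occupies the bottom-right quadrant.
console.log(tileRect(3, 2, 2, 1920, 1080)); // { x: 960, y: 540, width: 960, height: 540 }
```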
3.4 Version 2.0

Version 2.0 evolved from version 1.0: the original SFU gained simulcast, and the MCU is still used alongside it. We also extended some MCU functions, including RTMP streaming, which can be pushed with a user-defined encoding format and a user-defined layout. It can connect to a SIP gateway, and through the SIP gateway integrate with hardware MCU systems or the PSTN teleconference.
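The sketch below shows, using the standard browser WebRTC API, how a client might enable simulcast on its video sender, which is what an SFU needs in order to forward different quality layers to different subscribers. The three layers and their downscale factors and bitrates are illustrative values, not 263's actual configuration.

```typescript
function publishWithSimulcast(pc: RTCPeerConnection, track: MediaStreamTrack, stream: MediaStream): void {
  pc.addTransceiver(track, {
    direction: "sendonly",
    streams: [stream],
    sendEncodings: [
      { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 200_000 }, // low layer, e.g. mobile
      { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 700_000 }, // medium layer, e.g. PC
      { rid: "f", maxBitrate: 2_500_000 },                         // full layer, e.g. big screen
    ],
  });
}
```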
3.5 Hybrid mode

As mentioned earlier, our whole system is based on SFU and MCU. The advantages of SFU are flexible distribution, high concurrency and low latency; its disadvantages are many downlink forwarding channels and high bandwidth usage, which hurts the experience, and the high cost of maintaining multiple connections on the client. The advantage of MCU is low downlink bandwidth usage; its disadvantages are high server performance requirements and deployment cost, plus an added server hop that makes real-time performance slightly worse. The hybrid mode we adopt combines MCU and SFU, and the business scenario determines which one is used: with fewer than five parties, the advantages of SFU fit well and the cost is low; with more than six parties and higher customer value, MCU computing resources are used. The interaction mode determines the communication mode.
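A minimal sketch of that scenario-driven choice follows. The thresholds come from the talk (SFU below five parties, MCU above six); how the gap in between and the "customer value" signal are handled here is an assumption for illustration.

```typescript
type RoomMode = "sfu" | "mcu";

function chooseRoomMode(participants: number, highValueCustomer: boolean): RoomMode {
  if (participants < 5) return "sfu";                      // small rooms: cheap, low latency
  if (participants > 6 && highValueCustomer) return "mcu"; // large, high-value rooms: spend MCU resources
  return "sfu";                                            // assumed default for everything in between
}
```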
4. Operation-level technical enhancements

We just talked about WebRTC and OWT. In practice, we optimize the NACK and FEC functions of audio and video transmission according to the weak-network problems we encounter, and solve audio/video lip-sync issues. Through the stream-cutting function, under MCU mode, TV endpoints, computers and mobile devices can each receive a different resolution: the large screen gets 1080p, the PC gets 720p, and the mobile device may get 360p. This also lets different users obtain different bitstreams according to their own network quality. When the system spans IDCs, the cluster's internal network occasionally flaps and exceptions occur, so a fault-tolerance mechanism was added to ensure the robustness of the system. On the database side, a MongoDB cluster is used, and the RabbitMQ message bus is made highly available with HAProxy plus three RabbitMQ nodes. The above covers the transformations that make this system suitable for operation-level service.
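As an illustration of the per-endpoint resolution policy described above (not 263's code), the sketch below picks a target resolution from the device class and then steps down if the endpoint's measured network quality cannot sustain it. The bitrate thresholds are assumed values for illustration.

```typescript
type Device = "tv" | "pc" | "mobile";
const LADDER = ["1080p", "720p", "360p"] as const;

function targetResolution(device: Device, estimatedKbps: number): (typeof LADDER)[number] {
  let index = device === "tv" ? 0 : device === "pc" ? 1 : 2; // base: 1080p / 720p / 360p
  if (estimatedKbps < 1500) index = Math.max(index, 1);      // not enough headroom for 1080p
  if (estimatedKbps < 600) index = 2;                        // poor links fall back to 360p
  return LADDER[index];
}

console.log(targetResolution("tv", 400)); // a TV on a poor link still drops to "360p"
```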