In the cloud era, online audio and video services such as live video and real-time audio and video communication face complex network environments and explosive traffic growth, which pose severe new challenges for quality monitoring and cost optimization. For this talk we invited Kang Yonghong, R&D director of Huawei Cloud audio and video big data. He describes in detail Huawei Cloud's big-data-based full-process quality monitoring and evaluation system for audio and video, the optimization practice at each link, and how to improve experience quality and optimize cost by tracking different key data indicators for different businesses and scenarios.
By Kang Yonghong
Edited by LiveVideoStack
Hello, everyone. It's a great honor to have this opportunity to share. First of all, thank you, LiveVideoStack. I'm Kang Yonghong from Huawei, with more than ten years of R&D experience in big data and audio and video. I'm responsible for QoS, QoE and QoC management for live broadcast, video conferencing, RTC and VR, focusing on using big data to improve the experience and optimize the cost of audio and video products. Personally, I think this is one of the hard problems in the whole audio and video field.
2020 was an extraordinary year. Objectively speaking, it drove explosive growth of our audio and video business. Huawei Cloud is built on a high-capacity, low-latency, fully interconnected media network. With more than 2,000 nodes and hundreds of Tbps of bandwidth, Huawei Cloud, together with our customers, has served hundreds of millions of online users. In this process, using big data to solve video experience quality and cost optimization is particularly important, and we have accumulated some experience. Today I will share Huawei Cloud's practice in audio and video quality monitoring and optimization in the cloud-native era.
This talk is divided into four parts: first, why we need to build an audio and video data service system in the cloud-native era; second, practical cases of experience quality in Huawei Cloud video live broadcast and RTC real-time audio and video; third, how Huawei quickly builds a full-process quality monitoring platform for audio and video services in the cloud-native era; and finally, a summary and outlook on our thinking and technical planning for audio and video experience quality.
1 "Building the audio and video data service system"
From the development trend of audio and video experience, we can identify three generations: live broadcast, RTC and XR, with two main characteristics. First, the user experience is becoming ever more realistic: transmission resolution has grown from 720p to 1080p, and on to 4K, 6K, 8K and even higher for XR. Second, the business demands more and more interactivity: latency has dropped from 30 s for live broadcast to under 100 ms for XR, and ever lower latency is required.
This trend in quality experience requires back-end technical support, and our use of big data to provide it has gone through three stages. Five years ago we used a big data platform to solve technical problems; three years ago we used a data middle platform to solve efficiency problems; and in the last two years, building on the characteristics of those first two eras, we have used a data service middle platform model of "middle platform + trusted data services" to solve value problems. We believe the data service middle platform is the best framework for handling business differences and market uncertainty.
In live broadcast we often run into stalling, and in real-time audio and video calls into latency, both of which seriously hurt the user experience. The usual way to address these problems is to build an audio and video quality monitoring platform, collect data, and use big data to monitor quality. In this process we hit new problems: data collection is delayed, lossy and inaccurate; big data computing power is insufficient; and delivery latency is long. These experience and technical problems bring many challenges, such as not knowing in which scenarios a problem will appear, being unable to determine whether it is a network, device or environment problem, and not knowing which customers are affected.
How can we solve these problems? In the era of device + edge + cloud computing, technology offers some answers. Our best practice is to handle business differences and market uncertainty with a cloud-native, data-driven capability of "data lake + data services". This architecture has six layers, through which we resolve the contradiction between the relatively steady state of the back-end system and the fast-changing state of the front-end business.
Concretely, the architecture is implemented on cloud service infrastructure. First we build a unified audio and video data lake and a data value chain from collection through production to consumption. Combining the two, we support online access for all kinds of services, and provide QoS, QoE and QoC data services for seven types of internal and external customers, covering three categories and seven sub-categories of scenario demands, including operations and maintenance. Of course, this architecture alone is not enough for experience quality optimization; it is only the technical solution.
From the business point of view, we think QoE experience is a management problem, and some design is needed on the business side. We have built an audio and video service experience system with two major phases and three sub-phases in total: diagnose first, then improve, where diagnosis itself splits into monitoring and diagnosis.
Specifically, the first step is to build a multi-dimensional real-time monitoring system for QoE and QoS, plus AI-assisted anomaly detection, so that problems are discovered in real time. The second step is to use second-level diagnostic capability to quickly identify the cause of a problem, down to the level of individual user behavior. The third step, based on the diagnosis results, is to improve the experience. There are generally two methods: optimization based on human expertise, and intelligent scheduling. With intelligent scheduling strategies tailored to different industries and scenarios, we can achieve the best user experience while keeping cost under control.
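To make the anomaly-detection step concrete, here is a minimal sketch of a detector that flags a QoS metric (such as stall rate) when it deviates sharply from its recent history. The window size, warm-up length and z-score threshold are illustrative assumptions, not the actual parameters of Huawei Cloud's system.

```python
from collections import deque
import statistics


class StallRateAnomalyDetector:
    """Rolling z-score detector for one QoS time series.

    A stand-in for the AI-assisted anomaly detection described in
    the talk; real systems would use richer models per metric.
    """

    def __init__(self, window=30, z_threshold=3.0, warmup=10):
        self.window = deque(maxlen=window)  # recent metric values
        self.z_threshold = z_threshold
        self.warmup = warmup

    def observe(self, value):
        """Record one sample; return True if it looks anomalous."""
        anomalous = False
        if len(self.window) >= self.warmup:
            mean = statistics.mean(self.window)
            stdev = statistics.pstdev(self.window) or 1e-9
            anomalous = abs(value - mean) / stdev > self.z_threshold
        self.window.append(value)
        return anomalous
```

In practice one detector instance would run per metric per dimension (stream, node, region), feeding alarms into the diagnosis stage.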
Based on this experience quality optimization system, I will next share Huawei Cloud's experience optimization practice in video live broadcast and RTC real-time audio and video.
2 "Huawei Cloud video live broadcast experience optimization practice"
Let's first look at the Huawei Cloud video live broadcast experience optimization case, where we achieved low latency, no stalling, high definition and controllable cost. It proceeds in three stages: quality monitoring, problem diagnosis and experience improvement.
For live video quality monitoring, we first built a multi-dimensional quality monitoring system covering six dimensions: stream quality, experience, scale, network, cost and device. It tracks more than 30 QoE, QoS and QoC indicators, including core QoS indicators such as frame rate and bit rate, QoE experience indicators such as second-open rate (first frame within one second) and stall rate, and cost-related indicators such as bandwidth and origin-pull rate.
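The talk names the indicators but not their formulas, so the sketch below computes two of the QoE indicators using common industry definitions (an assumption on my part): second-open rate as the share of sessions whose first frame renders within one second, and stall rate as stalled time over total watch time. The session field names are hypothetical.

```python
def second_open_rate(sessions):
    """Share of playback sessions whose first frame arrived within 1 s."""
    opened = [s for s in sessions if s["first_frame_ms"] is not None]
    fast = [s for s in opened if s["first_frame_ms"] <= 1000]
    return len(fast) / len(opened) if opened else 0.0


def stall_rate(sessions):
    """Total stalled time as a share of total watch time."""
    stall = sum(s["stall_ms"] for s in sessions)
    watch = sum(s["watch_ms"] for s in sessions)
    return stall / watch if watch else 0.0
```

Aggregating these per customer, node or region yields the multi-dimensional views the monitoring system exposes.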
The second stage is problem diagnosis: second-level quality diagnosis of live video, a full-link monitoring system based on network data plus device data. Diagnosis runs through the first kilometer (host side), monitoring QoS indicators such as push frame rate; through the network nodes, monitoring QoS indicators such as frame rate and bit rate along with cost indicators such as bandwidth and origin-pull rate; and through the last kilometer (audience side), monitoring QoS indicators such as stalling, second-open and black screen. This enables end-to-end, real-time, second-level monitoring. Any anomaly is fed back promptly to customers and the scheduling system. For example, if we find abnormal frame rate or bit rate in the first kilometer, we ask the customer to adjust policy on the anchor side; if anomalies appear in the live network, we migrate users between nodes or apply other policy optimizations; and if the audience experience is abnormal, the intelligent scheduling system adjusts its scheduling strategy. This full-link monitoring system covers 12 live broadcast channels across all scenarios and all protocols.
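The per-segment feedback loop described above can be sketched as a simple rule table that maps each hop's metrics to the remediation the talk mentions. The threshold values and metric names here are hypothetical; the talk gives the actions but not concrete numbers.

```python
# Hypothetical per-segment thresholds; the talk does not give values.
THRESHOLDS = {
    "first_km_push_fps_min": 20,     # anchor-side push frame rate
    "network_node_fps_min": 20,      # media-network relay frame rate
    "last_km_stall_rate_max": 0.05,  # audience-side stall rate
}


def localize_fault(metrics):
    """Map per-hop metrics to the remediation for that segment."""
    actions = []
    if metrics["push_fps"] < THRESHOLDS["first_km_push_fps_min"]:
        actions.append("notify customer: adjust anchor-side push policy")
    if metrics["node_fps"] < THRESHOLDS["network_node_fps_min"]:
        actions.append("migrate users off node / optimize routing")
    if metrics["viewer_stall_rate"] > THRESHOLDS["last_km_stall_rate_max"]:
        actions.append("intelligent scheduler: adjust scheduling strategy")
    return actions
```

A real system would evaluate such rules per stream and per node, in real time, rather than on a single metrics snapshot.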
Live streaming full-link monitoring system: from the first kilometer, through the media network, to the last kilometer, the whole chain is visualized, which improves the efficiency of problem diagnosis.
The third step is experience improvement, which roughly takes two forms. One relies on the expertise of operations engineers; the other is intelligent scheduling based on the collaboration of device, edge and cloud data. The latter uses unified video data lake technology to combine QoS, QoE and QoC data from device, edge and cloud, and an intelligent analysis engine generates real-time profiles of streams, customers, network links, nodes and audiences. Based on these real-time profiles plus scheduling policy, the intelligent scheduling system achieves the best experience at controllable cost. Two kinds of indicators measure the result: cost indicators, such as origin-pull rate, to judge whether cost has fallen; and experience indicators, such as stall rate and second-open rate, to judge whether the user experience has improved. These are some practical cases of video live broadcast quality monitoring and experience improvement.
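One common way to express "best experience at controllable cost" is as a constrained choice: pick the cheapest node whose predicted experience stays within budget. This is my own minimal sketch of such a policy, not Huawei Cloud's actual scheduling algorithm; the candidate fields (`pred_stall_rate`, `cost_per_gb`) are illustrative.

```python
def pick_node(candidates, max_stall_rate=0.03):
    """Choose the cheapest candidate node whose predicted stall rate
    satisfies the experience constraint; if none qualifies, fall back
    to the node with the best predicted experience."""
    ok = [c for c in candidates if c["pred_stall_rate"] <= max_stall_rate]
    if ok:
        return min(ok, key=lambda c: c["cost_per_gb"])
    return min(candidates, key=lambda c: c["pred_stall_rate"])
```

The predictions themselves would come from the real-time profiles of nodes, links and audiences that the analysis engine produces.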
3 "Real-time audio and video RTC experience optimization practice"
Next we share the experience optimization practice for RTC real-time audio and video. RTC belongs to the second generation of audio and video services and differs from first-generation live broadcast in many ways: we pay more attention to latency and to behavior-level monitoring. Based on these differences, we adopt a correspondingly different three-stage optimization system.
The first stage is quality monitoring. The RTC quality monitoring system likewise establishes a multi-dimensional monitoring system covering communication, network, cost, device and other dimensions, tracking more than 30 QoE, QoS and QoC indicators. Core indicators include QoS quality indicators such as bit rate, frame rate, packet loss rate and jitter; QoE experience indicators such as second-open rate, latency, stall rate, and room-join and viewing success rates; and QoC cost indicators such as bandwidth. Compared with live broadcast monitoring, the end-to-end latency indicator in particular is emphasized, reflecting the differences mentioned above.
Based on the monitoring system, the second task is problem diagnosis. We first establish three types of experience quality data services. The first is the monitoring indicator data service, covering server, client, device, QoE, QoS and QoC data, stored in statistical and time-series databases. The second is the event data service for all control-plane and media-plane events in the network. The third is the terminal event data service, which includes user behavior events on the terminal side, such as joining rooms, switching roles, and operating the microphone or camera, as well as terminal device data such as CPU, memory and camera state.
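As an illustration of what a terminal-side behavior event record might look like, here is a small dataclass sketch. The field names and structure are my assumptions for the example; the talk only lists the kinds of events and device data collected.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TerminalEvent:
    """Illustrative shape of one terminal-side behavior event record."""
    user_id: str
    room_id: str
    event: str  # e.g. "join_room", "switch_role", "mute_mic", "open_camera"
    ts_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    device: dict = field(default_factory=dict)  # e.g. cpu load, memory, camera
```

Streams of such records, keyed by user and room and ordered by timestamp, are what the behavior-level diagnosis in the next layers consumes.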
Based on these three kinds of experience quality data services, RTC builds a three-layer problem diagnosis system.
The first layer is a real-time QoE/QoS monitoring system covering all links and dimensions, which supports minute-level experience diagnosis and rapid recovery. In the case above, marker 1 in red is a success-rate alarm at 11 o'clock. Drilling down by dimension, we found that one customer's app success rate had dropped sharply, and the node dimension finally showed that the service on an SFU node in Tianjin was abnormal. The whole fault recovery was completed within minutes.
The second layer is a monitoring system based on network behavior data and terminal behavior data, which helps us quickly resolve single-user experience problems and complaints in the RTC service. In the case above, room information and user behavior events quickly showed that the user could not be heard because of a mute operation, and the whole process finished at the minute level.
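The mute-diagnosis case above can be sketched as a replay of the user's event stream to find a behavioral cause of silence. The event type names are hypothetical, and a real diagnosis would also cross-check media-plane statistics.

```python
def diagnose_no_audio(events):
    """Replay a user's behavior events (ordered by timestamp) and
    return the most likely behavioral explanation for silence."""
    in_room, mic_on = False, True
    for e in events:
        if e["type"] == "join_room":
            in_room = True
        elif e["type"] == "leave_room":
            in_room = False
        elif e["type"] == "mute_mic":
            mic_on = False
        elif e["type"] == "unmute_mic":
            mic_on = True
    if not in_room:
        return "user is not in the room"
    if not mic_on:
        return "microphone muted by user operation"
    return "no behavioral cause found; check media-plane QoS"
```

Because the cause is derived from recorded events rather than manual log-digging, the answer comes back at the minute level, as in the case described.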
The third layer of diagnosis is an advanced capability for automatic diagnosis of experience problems, built on top of the first layer's global QoE/QoS indicator monitoring and the second layer's QoS behavior-level troubleshooting. It derives more than 20 anomaly events from the 30-plus monitored indicators and, through a machine learning model, maps them to six kinds of experience scenarios. The system can thus quickly and automatically determine the cause of an abnormal experience and pass it on to customers.
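To show the shape of that events-to-scenarios mapping, here is a toy rule-overlap classifier. The anomaly event names, the scenario names, and the rule-based approach itself are all stand-ins: the talk says a learned model performs this step and does not enumerate the 20-plus events or six scenarios.

```python
# Hypothetical mapping from anomaly events to experience scenarios.
SCENARIO_RULES = {
    "poor_uplink": {"high_packet_loss_up", "low_push_fps"},
    "poor_downlink": {"high_packet_loss_down", "high_stall_rate"},
    "device_overload": {"high_cpu", "low_render_fps"},
}


def classify_scenario(anomaly_events):
    """Pick the scenario whose rule set overlaps most with the
    observed anomaly events (a stand-in for the learned model)."""
    best, best_hits = "unknown", 0
    for scenario, rule in SCENARIO_RULES.items():
        hits = len(rule & set(anomaly_events))
        if hits > best_hits:
            best, best_hits = scenario, hits
    return best
```

The classified scenario is what gets surfaced directly to the customer, skipping manual root-cause analysis for the common cases.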
4 "Audio and video service full-process quality monitoring platform"
The above are practical cases of Huawei Cloud RTC experience optimization. Optimizing experience quality requires a platform, so let me share how Huawei builds the full-process quality monitoring platform for audio and video services. Spanning data collection, transmission, computation and consumption, this hundred-million-user-scale audio and video quality monitoring big data platform includes a data network supporting device, edge and cloud data collection and transmission; a multi-mode data processing system supporting real-time computing, offline computing and machine learning; and a data consumption service system supporting operations and maintenance, business operation, and customers.
When building the platform we encounter many problems of performance, quality, efficiency and real-time processing. How do we build a platform with large capacity, low cost, high efficiency and reliable data quality? We adopt an architecture of unified batch-stream processing plus storage-compute separation. Unified batch-stream processing solves development efficiency: the same indicator is computed once and used by all services, without repeated development, and a one-stop data development platform further improves efficiency. On cost, we separate storage from compute: storage uses object storage, which is relatively cheap, while the compute engine uses the unified batch-stream approach just described, achieving the best cost. On quality, a four-layer "ODS-DWD-DWS-ADS" data governance model ensures all data can be traced and managed, and every indicator is real-time, complete and accurate.
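The "compute once, use everywhere" idea behind unified batch-stream processing can be illustrated by sharing one metric definition between an incremental streaming job and a one-pass batch job, so both produce identical results. The record fields and class names are illustrative; real systems would express this in an engine such as Flink rather than plain Python.

```python
def stall_rate(stall_ms, watch_ms):
    """Single metric definition shared by batch and streaming paths."""
    return stall_ms / watch_ms if watch_ms else 0.0


class StreamingStallRate:
    """Streaming path: incremental update per arriving record."""

    def __init__(self):
        self.stall_ms = 0
        self.watch_ms = 0

    def update(self, record):
        self.stall_ms += record["stall_ms"]
        self.watch_ms += record["watch_ms"]
        return stall_rate(self.stall_ms, self.watch_ms)


def batch_stall_rate(records):
    """Batch path: one pass over the day's records, same definition."""
    return stall_rate(sum(r["stall_ms"] for r in records),
                      sum(r["watch_ms"] for r in records))
```

Because both paths call the same `stall_rate` definition, the real-time dashboard and the offline report cannot drift apart, which is the development-efficiency and quality benefit the architecture targets.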
Even with a large-capacity, low-cost platform, we still face network outages, equipment failures and similar problems. We build platform availability on cloud services, adopting cross-region disaster recovery and a multi-AZ deployment; the overall SLA reaches 99.99%. None of the six types of data from device, edge and cloud is lost, and none of the six types of services, such as monitoring and scheduling, is degraded. In this way, quality monitoring and service improvement continue to work normally even when some link in the environment fails.
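For context on what a 99.99% SLA means in practice, the implied downtime budget is easy to compute: about 52.6 minutes per year. A quick sketch of the arithmetic:

```python
def allowed_downtime_minutes_per_year(sla):
    """Downtime budget implied by an availability SLA (fraction)."""
    return (1 - sla) * 365 * 24 * 60


# 99.99% availability leaves roughly 52.6 minutes of downtime per year,
# which is why cross-region DR and multi-AZ deployment are needed to
# keep individual failures from consuming the whole budget.
```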
5 "Summary and prospect"
Looking back on this talk, the development of audio and video experience has three characteristics: first, users demand an ever more realistic experience, with higher requirements for services such as live broadcast and RTC; second, the experience requires more and more interactivity; third, the environment of networks and terminal devices is becoming more and more complex.
To ensure audio and video experience quality, we have three key tools: first, for different business scenarios, build an experience quality system of "monitor first, then diagnose, then improve"; second, use "data lake + data services" to handle user diversity and market uncertainty; third, balance cost against experience during implementation.
For the future of audio and video experience, we have three planning directions: first, continue to drive QoE, QoS and QoC optimization through device-edge-cloud data collaboration; second, build an intelligent evaluation system for audio and video content quality; third, establish experience quality standards, such as immersion, for the third generation of XR audio and video.
That concludes this talk. Thank you.