Abstract: Over several generations of development, Internet video has rapidly raised the bar for user experience, with ever higher demands for innovative formats and interactivity. How can these demands be met? The industry consensus is media AI. Huawei Cloud provides cloud native real-time media AI capabilities, working with partners to build AI algorithms, open an algorithm marketplace, and accelerate video business innovation. Cloud-rendered effects are one of the use cases.
As video services continue to evolve, users expect ever more from the experience: stronger interaction, more varied formats, and more impressive effects. At the same time, live streaming and RTC services suffer from serious content homogenization, and both content creation and user experience have hit a ceiling. Breaking through requires new technology. After long discussions between Huawei Cloud and Betta, we arrived at a shared proposition: do it through media AI. Live video today already uses many effects, including beautification, face reshaping, virtual anchors, and background replacement; Huawei Cloud is also building capabilities such as classroom evaluation for online education scenarios, all of which rest on media AI.
Three pain points in building media AI
At present, building media AI faces three major pain points:
1. Device side: fragmented device types and limited computing power.
Today many media AI capabilities run on the device. Although device computing power keeps improving, high-complexity effects such as virtual characters remain out of reach. Current on-device background replacement is also mediocre: the subject's outline is clearly visible, let alone film-grade background replacement.
2. Cloud side: weak real-time interaction and high cost.
Most video AI in the cloud today targets offline workloads, and its support for interactive experiences is limited. In the era of live streaming and real-time interaction, that no longer meets demand. Moreover, because audio and video data must travel between the edge and the cloud origin site, higher bandwidth costs are introduced.
3. High innovation threshold and a closed ecosystem.
AI capabilities are currently developed independently by each vendor, everyone going their own way, and the ecosystem is relatively closed.
Huawei Cloud hopes to build device-cloud collaborative real-time media AI with partners on a cloud native foundation, solving the pain points above and accelerating video business innovation.
Definition and core values of real-time media AI
Huawei Cloud defines real-time media AI as follows: on top of Huawei Cloud's native edge, compute, container, storage, and network services, it builds rich real-time media AI processing capabilities, and together with partners it creates an open AI algorithm marketplace that accelerates video business innovation, gives customers differentiated competitiveness, and gives users a better experience.
A simple example: through the edge cloud, we apply effects within the current live streaming and RTC pipeline, adding richer and more impressive effects such as higher-quality background replacement, plus AR cartoons and virtual characters to improve interactivity.
Real-time media AI is still at the exploration and promotion stage. We hope it can deliver the following four core values:
1. More ways to play. In the future, live streaming and RTC services will have more AI capabilities to use and combine, enabling more (and more impressive) innovative formats.
2. Better experience. With these innovative formats, users set a high bar for real-time interaction. We aim for an "imperceptible delay" experience that matches running the effect locally on a high-end device. For algorithm developers, a cloud based platform also means faster development, faster publishing, and faster experience validation.
3. Lower cost. Most media AI capabilities today run on the device side, because moving to the cloud brings to mind the high cost of GPUs and other hardware platforms; we hope to bring that cost down. There are two parts. One is using Huawei Cloud's unified hardware and software resources to raise resource reuse and lower the per-channel cost. The other is using a unified cloud platform to update with one click, avoiding adaptation to many device types and lowering the cost of algorithm development and app updates.
4. Open ecosystem. We hope to build an open AI algorithm marketplace, avoiding the state where everyone reinvents the wheel behind closed doors, and lower the threshold of algorithm development through AI algorithm interoperability and sharing.
Realizing these core values rests on cloud native. The following sections detail how, covering the cloud native architecture, the real-time processing framework, and the algorithm openness of real-time media AI.
Real-time media AI cloud native architecture
First, let's look at the cloud native architecture of real-time media AI.
From the bottom up: first, the architecture is based on Huawei Cloud edge nodes, with the Huawei Cloud IEF (Intelligent EdgeFabric) service managing and scheduling the hardware and software resources of each node. Second, it builds on the Huawei Cloud EI platform, which provides capabilities such as ModelArts training and supports mainstream deep learning frameworks like TensorFlow and PyTorch. The architecture also includes the SWR (SoftWare Repository for Container) service, through which both Huawei EI self-developed algorithm images and third-party algorithm images built on EI can be integrated and published. A further key point is a high-performance edge function computing capability: a function-level processing framework for real-time media AI services that can schedule AI algorithms to edge nodes in real time with very high performance and orchestrate edge functions. The real-time media AI platform offloads media processing from the device, reduces the work of adapting media processing to different device platforms, guarantees a consistent experience across platforms, and provides normalized media processing capabilities for different solutions.
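The function-level framework described above can be pictured with a minimal sketch: each media AI capability is one small function that transforms a frame, and the scheduler composes the requested capabilities into a per-stream pipeline. All names below are illustrative assumptions, not the actual Huawei Cloud APIs.

```python
# Hypothetical sketch of a function-level media AI framework: algorithms
# register themselves, and the scheduler composes them per stream.
from typing import Callable, Dict, List

Frame = Dict[str, object]          # stand-in for a decoded video frame
Effect = Callable[[Frame], Frame]  # one media AI algorithm = one function

REGISTRY: Dict[str, Effect] = {}   # algorithms published to the edge node

def register(name: str):
    """Register an algorithm so the scheduler can find it by name."""
    def wrap(fn: Effect) -> Effect:
        REGISTRY[name] = fn
        return fn
    return wrap

@register("beauty")
def beauty(frame: Frame) -> Frame:
    frame.setdefault("applied", []).append("beauty")
    return frame

@register("background_replace")
def background_replace(frame: Frame) -> Frame:
    frame.setdefault("applied", []).append("background_replace")
    return frame

def build_pipeline(names: List[str]) -> Effect:
    """Compose the requested effects into a single per-stream function."""
    effects = [REGISTRY[n] for n in names]
    def run(frame: Frame) -> Frame:
        for fx in effects:
            frame = fx(frame)
        return frame
    return run

pipeline = build_pipeline(["beauty", "background_replace"])
result = pipeline({"stream": "anchor-001"})
print(result["applied"])  # ['beauty', 'background_replace']
```

Because every effect shares one function signature, new algorithms can be added to the registry without touching the scheduler, which mirrors the "publish algorithm images, then orchestrate them" flow the architecture describes.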
Through this architecture, we can deliver more and better media AI effects, while its unified algorithm innovation and development platform avoids per-device adaptation.
Real-time processing function framework
As mentioned earlier, the core of real-time media AI is the "real time" part: how do we minimize end-to-end processing delay so that users do not perceive it?
Besides sinking real-time processing to the edge to process streams nearby and cut link delay, real-time media AI focuses on a real-time processing function framework that reduces processing delay in three ways: 1) accelerating AI algorithms and video codecs on Huawei Cloud Ascend, Kunpeng, and other hardware; 2) accelerating the transfer of raw video data between AI algorithm containers over a high-speed bus; 3) pre-loading AI algorithms through a function resource pool pre-warming mechanism to cut startup delay. Across the whole pipeline, we aim to finish processing within 100 milliseconds, and to stay within 300 milliseconds including network delay, a latency users cannot perceive.
Cloud native algorithm openness
As mentioned earlier, Huawei Cloud is building not only a real-time media AI service capability but also a real-time media AI algorithm ecosystem covering a variety of business scenarios, and it hopes more partners will join to drive business innovation and user experience together.
With this in mind, Huawei Cloud has built an algorithm openness process, covering a standard algorithm interface and integration process for all real-time media AI algorithms, plus the algorithm marketplace, all of which will launch in stages. Through this process, any AI algorithm can be integrated into real-time media AI, whether it was built on the Huawei EI platform or, for data-privacy reasons, on a vendor's own platform.
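One plausible shape for the "standard algorithm interface" mentioned above is a small contract every integrated algorithm implements, so the platform can load, pre-warm, and invoke it regardless of where it was trained. The names below are hypothetical, not the actual Huawei Cloud interface.

```python
# Hypothetical contract a third-party algorithm image might fulfil to
# plug into the real-time media AI pipeline.
from abc import ABC, abstractmethod

class MediaAIAlgorithm(ABC):
    """Minimal contract every integrated algorithm must implement."""

    @abstractmethod
    def warm_up(self) -> None:
        """Pre-load model weights so the first frame is not slow."""

    @abstractmethod
    def process(self, frame: bytes) -> bytes:
        """Transform one raw frame and return the result."""

class WhiteningFilter(MediaAIAlgorithm):
    """Toy example algorithm; a real one would run model inference."""

    def __init__(self) -> None:
        self.ready = False

    def warm_up(self) -> None:
        self.ready = True  # real code would load model weights here

    def process(self, frame: bytes) -> bytes:
        assert self.ready, "warm_up() must run before the first frame"
        return frame + b"|whitened"

algo = WhiteningFilter()
algo.warm_up()                       # called by the pre-warming mechanism
print(algo.process(b"frame0"))       # b'frame0|whitened'
```

A shared contract like this is what lets algorithms trained on ModelArts and algorithms trained on a vendor's private platform enter the same marketplace: the platform only depends on the interface, not on how the model behind it was built.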
Real-time media AI application case: Betta
Based on Huawei Cloud's native RTC real-time audio/video services and real-time media AI capabilities, Betta has implemented real-time cloud effects, moving effects that are hard to achieve on the device side to the cloud and giving users a real-time interactive experience with imperceptible delay, which drives business innovation and improves business stickiness. Huawei Cloud's leading cloud native technology lets Betta focus on cloud-side innovation, avoid adapting to a variety of devices, validate innovations quickly, and greatly improve R&D efficiency. It also spares users the repeated downloads caused by frequent SDK updates, improving the user experience.
On top of Huawei Cloud's and Betta's algorithms, the two sides have also built an algorithm ecosystem full of possibilities. Looking forward, they will continue to deepen cooperation, deliver more innovative AR/VR-based formats, and combine them with the Huawei Cloud algorithm marketplace to bring streamers more scene choices and pursue a better user experience.
At present, we are working with Betta on real-time cloud effects such as beautification, face reshaping, filters, and stickers. These will be integrated into Betta's live streaming platform, with virtual avatars, background replacement, and other effects to follow.
Low-latency cloud beauty and background replacement demos
The first demo is the low-latency cloud beauty effect currently built with Betta, covering a series of treatments such as whitening and skin smoothing. Under good network conditions, end-to-end delay reaches 150 milliseconds: the two comparison videos are essentially synchronized, with no difference visible to the naked eye.
The other demo is background replacement, here applied in a knowledge-sharing video. The effect will keep improving, eventually approaching film grade; combining and layering background replacement, beautification, and virtual avatars will produce even better and more dazzling results.
Huawei Cloud hopes that real-time media AI, as a key means of accelerating video business innovation, can deliver more processing power, better interactive experience, and lower innovation cost, and that more AI algorithm partners will join to build an open ecosystem together!
This article is shared from the Huawei Cloud community post "Real-time media AI: breaking the ceiling of content creation and accelerating video innovation", original author: audio and video housekeeper.