Introduction: This article comes from the LiveVideoStack online sharing series (season 3, episode 10), presented by Chen Lujun, a wireless development expert at Alibaba's Xianyu business division. Based on the Xianyu app's large-scale practice with the currently popular cross-platform framework Flutter, it introduces some difficulties and solutions in the audio and video domain.
Hello everyone, I'm Chen Lujun from Alibaba's Xianyu division. The theme of this sharing is the exploration of audio and video development under the wave of Flutter. The main content introduces some difficulties and solutions in the audio and video domain, based on the Xianyu app's large-scale practice with the currently popular cross-platform framework Flutter.
The sharing is divided into four parts. First, a brief introduction to Flutter and the reasons for choosing it as a cross-platform framework. Second, the concept of the external texture in Flutter, which is closely related to audio and video, along with some optimizations of it. Third, solutions to the problems Flutter poses in audio and video practice: the TPM audio/video framework. The last part introduces Xianyu's open-source Flutter components.
Flutter is a cross-platform framework. Traditionally, the audio, video, and network modules are sunk into the C++ layer or the ARM layer and encapsulated into an audio/video SDK used by the UI layers on PC, iOS, and Android.
As a cross-platform UI framework, Flutter achieves cross-platform development at the UI layer. It can be expected that, with the development of Flutter, this will gradually become full-link cross-platform development from the bottom layer up to the UI layer, with engineers responsible for the SDK and the UI layer respectively.
There were many cross-platform UI solutions before Flutter, so why choose Flutter? We focused on performance and cross-platform capabilities.
Earlier cross-platform solutions such as Weex, React Native, and Cordova cannot meet the performance requirements because of their architecture, especially in audio and video scenarios, where the demands are almost the harshest.
Solutions such as Xamarin can match native performance, but most of the logic still has to be implemented separately on each platform.
Let's look at why Flutter can achieve high performance:
Native component rendering, taking iOS as an example: Apple's UIKit implements UI rendering by calling the platform's own rendering framework, QuartzCore; graphics rendering also calls underlying APIs such as OpenGL and Metal.
Flutter's logic is consistent with the native approach: its UI layer is realized by calling the underlying rendering framework, Skia. In effect, Flutter implements a UI framework of its own, which makes it possible for a cross-platform framework to match, or even surpass, the native one.
But the ultimate performance of a framework depends on its designers and developers. As for the current situation:
In Xianyu's practice, we found that with normal development and no deliberate optimization of the UI code, the fluency of the Flutter interface is better than that of the native interface on some low-end devices.
Although jank and crashes still appear in some scenarios, these are inevitable problems in the growth of any new technology. We believe performance will not become a bottleneck for Flutter's development.
In the process of adopting Flutter, the hybrid stack and audio/video were the two hardest problems to solve. The hybrid stack arises because an app cannot rewrite all of its features in Flutter at once; migration is a step-by-step, iterative process, during which native interfaces and Flutter interfaces coexist. Xianyu has produced some good results on the hybrid stack, such as FlutterBoost.
Before talking about audio and video, we need to briefly introduce the concept of the external texture, which we call the bridge between Flutter and native frame data.
To render a frame, the first thing Flutter does is this: when the Vsync signal from the GPU arrives, the Flutter UI thread uses the AOT-compiled machine code together with the current Dart runtime to generate the layer tree (the UI tree). Each leaf node on the layer tree represents an element that needs to be rendered on the current screen, including the content that element renders. The layer tree is then handed to the GPU thread, where Skia is called to complete the whole UI rendering process.
There are two important node types in the layer tree: PictureLayer and TextureLayer. PictureLayer is mainly responsible for rendering images; Flutter implements its own image decoding logic. After the IO thread reads an image from disk or pulls it from the network, it is decoded and uploaded into a texture on the IO thread, and the GPU thread then renders the image to the screen.
However, audio and video scenarios involve too many system APIs and their business scenarios are too complex, so Flutter does not implement a set of cross-platform audio/video components. Instead, it offers third-party developers a way to implement audio/video components themselves, and the video rendering outlet of such components is the TextureLayer.
In the layer-tree rendering process, the texture data of a TextureLayer is supplied by an external third-party developer. Video data and player output can be fed into the TextureLayer and rendered by Flutter.
TextureLayer rendering process: first determine whether the layer has been initialized; if not, create a texture and attach it to a SurfaceTexture. This SurfaceTexture is an object that native audio/video code can obtain. Through the Surface created from it, we can decode video data and camera data into the Surface; the Flutter side then listens for SurfaceTexture data updates, smoothly updates the new data into its texture, and hands the texture to Skia to render to the screen.
However, if we want to use Flutter to implement beauty, filters, face stickers, and similar features, we need to read the video data out, update it into a texture, and run the beauty/filter processing on the GPU texture to generate a processed texture. With the capabilities Flutter currently provides, that texture's data must be read back from the GPU to the CPU, converted into a bitmap, and written into the Surface before the video data can be updated in Flutter, which consumes a lot of system performance.
From the analysis of Flutter's rendering process, we know that the underlying data Flutter renders is a GPU texture, and the result of our beauty/filter processing is also a GPU texture. If we could render it to Flutter directly, we would avoid the useless GPU → CPU → GPU round trip. This is feasible, but on one condition: OpenGL context sharing.
Before we talk about contexts, we have to mention a concept closely related to them: threads.
After the flutter engine starts, it will start four threads:
The first is the UI thread (in Flutter's terminology). It is mainly responsible for creating the layer tree from the AOT-compiled machine code and the current runtime environment when the Vsync signal arrives from the GPU.
Then there are the IO thread and the GPU thread. As in most OpenGL processing solutions, Flutter adopts the design in which one thread is responsible for resource loading and another for resource rendering.
There are two ways to share textures between two threads. One is EGLImage (on iOS, CVOpenGLESTextureCache); the other is an OpenGL shared context. Flutter achieves texture sharing through shared contexts: the contexts of the IO thread and the GPU thread are placed in the same share group, so resources on the two threads are visible to and shared with each other.
The platform thread is the main thread. Flutter has a peculiar design here: the GPU thread and the main thread share the same context, and many OpenGL operations take place on the main thread.
Such a design will bring a lot of problems to audio and video development, which will be described in detail later.
The condition for an OpenGL texture produced by the audio/video side's beauty processing to be used directly by Flutter is that Flutter's context must be in the same share group as the OpenGL context of the platform's audio/video code.
Also, because the context of Flutter's main thread is the GPU context, any OpenGL operations on the main thread from the audio/video side may destroy Flutter's entire OpenGL environment. Therefore, all OpenGL operations need to be confined to child threads.
With these two conditions satisfied, we can implement beauty and filter functions without extra GPU consumption.
After verifying the scheme with a demo, we applied it to Xianyu's audio/video components, but some problems surfaced during the migration.
The image above shows the code for converting camera data into a texture. It involves two operations: first, all OpenGL operations are dispatched onto the cameraQueue; then the context is set once. Restrictions and hidden rules like these are easily overlooked during development, and once one is violated, the consequence is inexplicable, strange problems that are extremely hard to track down. We therefore wanted to abstract a framework in which thread switching, context handling, and module lifecycle management are done by the framework itself. After integrating the framework, developers only need to implement their own algorithms, without caring about these hidden rules and other repetitive logic.
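As an illustration of the kind of helper such a framework provides, here is a minimal Java sketch of a dispatcher that pins all OpenGL work to one dedicated thread, so GL calls can never accidentally run on the platform/main thread. The class and method names (`GlQueue`, `runSync`) are hypothetical, not Xianyu's actual API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: pin all "GL" work to one dedicated worker thread.
public class GlQueue {
    private final ExecutorService worker;
    private volatile Thread workerThread;

    public GlQueue(String name) {
        // Single-thread executor: every submitted task runs on the same thread.
        worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            workerThread = t;
            return t;
        });
    }

    // Run a task on the GL thread; if we are already on it, run inline
    // to avoid deadlocking on our own queue.
    public void runSync(Runnable task) throws Exception {
        if (Thread.currentThread() == workerThread) {
            task.run();
        } else {
            worker.submit(task).get(); // block until the GL thread finishes
        }
    }

    public void shutdown() {
        worker.shutdown();
    }
}
```

With such a helper, the "all GL work on the cameraQueue" rule becomes impossible to forget, because callers never touch the GL thread directly.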
Before introducing Flutter, Xianyu's audio/video architecture was the same as that of most audio/video applications:
1. The bottom layer consists of independent modules
2. The SDK layer encapsulates the underlying modules
3. The top layer is the UI layer
After introducing Flutter, by analyzing the usage scenarios of each module, we can draw a hypothesis, or abstraction: an audio/video application on the device can be summarized as the process of decoded video frames flowing between modules. Based on this assumption, we built the abstraction of the Flutter audio/video framework.
Open-source multimedia components of Xianyu Flutter
The whole framework is divided into four parts: pipeline and data abstraction, module abstraction, unified thread management, and unified context management.
The pipeline is, in essence, the channel through which video frames flow. The data involved in audio/video includes textures, bitmaps, and timestamps. Combining our existing application scenarios, we define the texture as the main data flowing through the pipeline, with bitmaps and other auxiliary data optionally attached. Defining the data this way avoids both the performance overhead of repeatedly creating and destroying textures and the problems caused by multi-threaded access to textures, while still satisfying special modules' need for special data. A texture pool is also designed to manage the texture data in the pipeline.
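The texture-pool idea can be sketched as follows, assuming textures are keyed by size and identified by integer ids; a `Supplier` stands in for `glGenTextures` so the recycling logic can be shown without a GL context (all names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal sketch of a texture pool: released textures of a given size are
// reused instead of being destroyed and re-created.
public class TexturePool {
    private final Map<String, ArrayDeque<Integer>> free = new HashMap<>();
    private final Supplier<Integer> allocator; // stand-in for glGenTextures
    private int created = 0;

    public TexturePool(Supplier<Integer> allocator) {
        this.allocator = allocator;
    }

    public int acquire(int width, int height) {
        ArrayDeque<Integer> q = free.get(width + "x" + height);
        if (q != null && !q.isEmpty()) {
            return q.pop(); // reuse a released texture of the same size
        }
        created++;
        return allocator.get();
    }

    public void release(int width, int height, int texture) {
        free.computeIfAbsent(width + "x" + height, k -> new ArrayDeque<>())
            .push(texture);
    }

    public int createdCount() {
        return created;
    }
}
```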
Modules: if the pipeline and its data are the blood vessels and blood, then the modules in an audio/video scene are the organs. According to a module's position in the pipeline, we abstract three base classes: capture, process, and output. These base classes implement common logic such as thread switching, context switching, and format conversion. Deriving each functional module from these base classes avoids a great deal of repetitive work.
Threads: when each module is initialized, its initialization function asks the thread management module for a thread of its own. The thread management module can decide whether to assign a new thread or a thread already assigned to other modules.
This has three advantages:
- You can decide how many modules a thread can mount according to your needs, so as to achieve load balancing between threads.
- Multithreaded parallelism ensures that a module's OpenGL operations stay on their own thread instead of running on the main thread, completely avoiding destruction of Flutter's OpenGL environment.
- Multithreading parallel can make full use of CPU multi-core architecture to improve processing speed.
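The assignment policy can be sketched like this, assuming a fixed cap on thread count and a least-loaded heuristic (both are illustrative choices; the real module may use different policies):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of the thread-management idea: modules ask the manager for a
// thread; the manager creates new threads up to a cap, then reuses the
// least-loaded existing one (load balancing).
public class ThreadManager {
    private final int maxThreads;
    private final List<String> threads = new ArrayList<>();      // thread names
    private final Map<String, Integer> load = new HashMap<>();   // modules per thread
    private final Map<String, String> moduleThread = new HashMap<>();

    public ThreadManager(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    // Returns the name of the thread the module should run on.
    public String acquire(String moduleName) {
        String chosen;
        if (threads.size() < maxThreads) {
            chosen = "av-thread-" + threads.size();
            threads.add(chosen);
            load.put(chosen, 1);
        } else {
            // All threads exist: pick the least-loaded one.
            chosen = threads.get(0);
            for (String t : threads) {
                if (load.get(t) < load.get(chosen)) chosen = t;
            }
            load.put(chosen, load.get(chosen) + 1);
        }
        moduleThread.put(moduleName, chosen);
        return chosen;
    }

    public String threadOf(String moduleName) {
        return moduleThread.get(moduleName);
    }
}
```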
Contexts: on the Flutter side, we modify the Flutter engine to expose its context, and build a unified context management module around it. During initialization each module first obtains its thread, then calls the context management module to get its own context. This guarantees that every module's context is in the same share group as Flutter's, so resources are shared and visible between modules, and Flutter and the native audio/video side can see each other's resources as well.
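The contract of such a context manager can be modeled like this, with an opaque record standing in for a real EGLContext/EAGLContext wrapper; the share-group id and all names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the context-management module: every worker thread
// gets exactly one context, and every context is created in the same share
// group as the root (Flutter) context, so textures are visible across all.
public class ContextManager {
    public static final class GlContext {
        public final int shareGroup; // id of the share group this context belongs to
        GlContext(int shareGroup) { this.shareGroup = shareGroup; }
    }

    private final GlContext rootContext; // taken from the Flutter engine
    private final Map<String, GlContext> byThread = new HashMap<>();

    public ContextManager(int flutterShareGroup) {
        this.rootContext = new GlContext(flutterShareGroup);
    }

    // One context per thread, always in the root's share group.
    public synchronized GlContext contextFor(String threadName) {
        return byThread.computeIfAbsent(
                threadName, t -> new GlContext(rootContext.shareGroup));
    }
}
```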
Based on the above framework, realizing a simple scenario such as real-time preview with filter processing looks like this:
1. Select the functional modules: the camera module, the filter processing module, and the Flutter image rendering module;
2. Configure the module parameters, such as capture resolution, filter parameters, and front/rear camera;
3. Create the video pipeline, then create the modules with the configured parameters;
4. Mount the modules onto the pipeline; opening the pipeline brings the feature to life.
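The four steps above can be sketched end to end in Java, with simplified stand-ins for the real capture/process/output modules (all class names and the string-based "frame" are illustrative, not the actual TPM API):

```java
import java.util.ArrayList;
import java.util.List;

// End-to-end sketch: pick modules, configure them, build a pipeline,
// mount the modules, and open it. Frames flow from module to module.
public class PipelineDemo {
    interface Module { String onFrame(String frame); }

    static class CameraModule implements Module {       // capture (produces frames)
        final String resolution;
        CameraModule(String resolution) { this.resolution = resolution; }
        public String onFrame(String frame) { return "frame@" + resolution; }
    }

    static class FilterModule implements Module {       // process (transforms frames)
        final String filter;
        FilterModule(String filter) { this.filter = filter; }
        public String onFrame(String frame) { return frame + "+" + filter; }
    }

    static class RenderModule implements Module {       // output (consumes frames)
        final List<String> rendered = new ArrayList<>();
        public String onFrame(String frame) { rendered.add(frame); return frame; }
    }

    static class Pipeline {
        private final List<Module> modules = new ArrayList<>();
        void mount(Module m) { modules.add(m); }
        void open() {
            String frame = null;
            for (Module m : modules) frame = m.onFrame(frame); // frame flows downstream
        }
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline();            // 3. create the pipeline
        RenderModule render = new RenderModule();
        pipeline.mount(new CameraModule("1280x720"));  // 1+2. modules with config
        pipeline.mount(new FilterModule("beauty"));
        pipeline.mount(render);                        // 4. mount the modules
        pipeline.open();                               //    and open the pipeline
        System.out.println(render.rendered.get(0));    // frame@1280x720+beauty
    }
}
```

The point of the design is that each step is declarative: a new feature is a new combination of modules and parameters, not new plumbing code.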
The figure above shows the code and structure of the whole function.
Combined with the above audio and video framework, Xianyu implements the flutter multimedia open source component.
The suite contains four basic components:
1. Video capture component
2. Player component
3. Video editing component
4. Album selection component
Now these components are going through the internal open source process. It is expected that the album and player will be open source in September.
Future outlook and planning
1. Realize the full-link cross-platform development, from the underlying SDK to the UI, mentioned at the beginning. At present the underlying framework layer and module layer are implemented separately on each platform; only the Flutter UI side is unified across platforms. Following common audio/video practice, the underlying logic will be sunk into the C++ layer to achieve full-link cross-platform development as far as possible.
2. The second part is open-source co-construction. Xianyu's open-source content includes not only the shooting and editing components but also many underlying modules. We hope that developers building audio/video applications on Flutter can make full use of Xianyu's audio/video module capabilities, so that they only need to implement the modules for their own special needs, reducing duplicated labor as much as possible.
Author: Chen Lujun
This article is original content from the Yunqi community and may not be reproduced without permission.