As developers, we need servers that can support the Internet-ization of the new video industries. Which open source solution can support these new businesses, and what key capabilities must it provide? This article is compiled from a talk shared on LiveVideoStack by Yang Jianli, head of the Alibaba Cloud RTC server team.
By Yang Jianli
Edited by LiveVideoStack
Hello, I’m Yang Jianli from Alibaba Cloud. This talk introduces the key technologies and future development of the open source streaming media server SRS in detail.
I started working on FFmpeg and streaming media in 2009, began developing streaming media servers in 2012, and started the open source streaming media server SRS in 2013, so it has been more than seven years now. In those years, with the explosion of live video, SRS also saw rapid growth. After I joined Alibaba Cloud in 2017, I shifted my focus to WebRTC. Live streaming and WebRTC are now widely used across the whole video industry, including online work, online education, online entertainment and other fields; audio and video have become an indispensable means of communication and information dissemination on the Internet.
This talk will focus mainly on the birth and history of SRS and its upcoming development plan, and explore the value and significance of SRS in depth.
Thanks to the improvement of China’s communication infrastructure, especially the widespread adoption of Wi-Fi and 4G networks, audio and video products and services in China’s Internet market grew explosively from 2015 to 2018. At that time, consumers generally had about 1 Mbps of bandwidth, and the network environment was relatively stable.
The technology behind live streaming had already matured in the PC era. From 2010 to 2012, for example, consumers mainly watched live video through Flash on PCs, because Flash worked across all mainstream PC browsers. Although mobile devices also supported Flash, the experience was poor; mobile platforms such as Android and iOS mainly supported HLS. Android’s HLS support was weak at first but improved significantly later.
In both the traditional PC era and the mobile Internet era, the main streaming protocols were RTMP/FLV and Apple’s HLS, and the main streaming media servers included Red5, nginx-rtmp, CRtmpServer, Wowza and AMS. Since about 2017, Flash has been disabled by default in Chrome, and Flash will gradually exit the stage of Internet history.
With the development of the Internet, mobile live streaming gradually emerged and became mainstream. In today’s mobile live streaming, for example, native apps mostly play FLV streams, while browsers mostly use HLS; together they form a relatively mature and complete protocol system for Internet live streaming.
With the continuous upgrading of mobile network infrastructure and the arrival of the mobile, IoT and 5G era, the demand for real-time communication is becoming increasingly strong. After Flash was disabled, a more complete replacement emerged: the H5 player, whose underlying technical specification is MSE (Media Source Extensions). The H5 player is now supported by most PC browsers and can play FLV and other formats. MSE plays a role similar to Flash: it exposes a JS interface, so a JS library can demux FLV or HLS, repackage it as fragmented MP4 and feed it to the MSE interface for playback. H5 is the standards-based replacement for Flash; FLV, HLS and DASH can all be played directly through MSE.
The other direction we are pursuing is low-latency live streaming. The delay of ordinary delivery protocols can exceed ten seconds, while RTMP can reduce it to 3-5 seconds. TCP on the public network sometimes jitters, and the delay grows when that happens.
We are currently exploring better ways to reduce live streaming latency, and WebRTC is widely regarded as the ideal solution here. Although 5G can bring lower latency, from a communications point of view availability matters more. The rollout of 5G means the whole network infrastructure is stable and more communication devices can meet the corresponding requirements. During the epidemic, for example, the number of live video users grew explosively, yet existing live streaming services suffered no major outages. This is mainly due to the communication infrastructure built over the past decade, as well as the progress and guarantees provided by the whole open source ecosystem, commerce and cloud computing.
At present, the development of SRT, IoT and the like still faces great challenges. Especially now that the possibilities of the domestic Internet keep multiplying, the ecosystem of the live streaming industry will keep improving, and new scenarios will emerge endlessly.
First, different scenarios place different requirements on network infrastructure and the overall business environment. Second, business and open source often promote each other: business drives new open source solutions to land, and open source solutions provide technical support for business. Finally, there are often deep divides between industries; for example, the surveillance industry usually does not need apps, while the live streaming industry does not use private protocols. We need SRT for long-distance transmission, GB28181 for surveillance and IoT access, and WebRTC for interaction and online communication.
We hope to have an open source solution that meets the needs of low-latency live streaming across different industries and scenarios. Cloud computing is now converging: both CDNs and cloud platforms are gradually meeting the needs of online live streaming. As developers, we need servers that can support the Internet-ization of these new video industries. Which open source solution can support the new business, and what key capabilities must it provide?
To implement such an open source streaming server, we need to consider many key constraints and capabilities.
The first is that the platform must be scalable, that is, elastic enough. An Internet business can expand from a small region to a very large one. When adopting an open source solution, we must be sure that if the business grows, the existing resources and experience can support service at that scale; this requires maintenance by many developers and support from cloud vendors. Without open source platforms and cloud vendors, we could only build our own platforms and deploy our own servers, and many enterprises simply do not have the ability or resources to run so much themselves, which is why open source solutions matter so much.
A premise of open source is the support of cloud computing. The CDNs we see today, including Alibaba Cloud and Tencent Cloud, all support RTMP, FLV and HLS, and are now starting to support WebRTC. On this basis they have expanded and produced many commercial applications with the ability to run at scale. We can build our own platform on an open source solution and connect it to a CDN, which properly solves the elasticity problem. Without the backing of cloud services, the value of an open source platform is out of the question.
Low latency is the second thing we must pay attention to, as it is one of the main trends in video. For example, live streaming over TCP with RTMP typically has 3-5 seconds of latency, and that is not caused by TCP alone: HLS slicing, player buffering and encoding delay can push it to 8-10 seconds or more. The latency of WebRTC communication is generally under one second, and can even reach 400 milliseconds. In ordinary voice communication, once latency exceeds 400 ms, two speakers have to consciously coordinate their turns to avoid talking over each other.
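As a rough illustration of where those numbers come from, the latency of each protocol can be treated as a budget and added up. All numbers below are illustrative assumptions for the sketch, not measurements:

```python
# Rough latency-budget sketch for common live protocols.
# All numbers are illustrative assumptions, not measurements.

def hls_latency(segment_sec=4.0, buffered_segments=2, encode_sec=0.5):
    """HLS players typically buffer a few whole segments before starting,
    so latency grows with segment length times the buffer depth."""
    return encode_sec + segment_sec * buffered_segments

def rtmp_latency(encode_sec=0.5, tcp_buffer_sec=1.0, player_buffer_sec=2.0):
    """RTMP latency is dominated by TCP send buffers and player buffering."""
    return encode_sec + tcp_buffer_sec + player_buffer_sec

print(f"HLS : ~{hls_latency():.1f} s")   # segment-based delivery: 8-10 s range
print(f"RTMP: ~{rtmp_latency():.1f} s")  # matches the 3-5 s range above
# WebRTC keeps its jitter buffer in the tens of milliseconds, which is how
# it reaches sub-second (even ~400 ms) end-to-end delay.
```

Shrinking any one term helps, but only WebRTC attacks all three at once, which is why it is the recognized path to sub-second latency.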
The third point is that the platform must offer excellent usability. There are many servers, such as Red5, nginx-rtmp, CRtmpServer, Wowza, AMS and Helix, but ease of deployment and operation varies widely. Another key capability is interworking between protocols: a service may need to run on multiple protocols, so bridging the gaps between them is very important. If you want to deploy a system quickly, these three points are crucial.
1.1 Internet live streaming and co-streaming (Lianmai)
The application scenarios of Internet live streaming and co-streaming (Lianmai) are familiar to everyone, but some technical details deserve attention. In terms of codecs, H.264 support is relatively complete, and PCs and other devices offer hardware encoding and decoding. Commercial codecs include Hongruan in China and Haivision abroad, and some broadcasting companies also have their own codecs. Above the codec layer, for stream publishing, OBS and FFmpeg are mostly integrated into systems; when streaming directly from a host, OBS-based modifications are the most common approach.
In terms of delivery, we need to distribute content to many viewers. Open source solutions here include nginx-rtmp and SRS; commercial solutions include Wowza and AMS, and commercial deployments mostly distribute directly through CDN networks.
The playback solution is mainly the H5 player; most devices integrate a player that handles decoding, and there are also open source SDKs for this. Live co-streaming (Lianmai) is mainly realized by combining live streaming with RTC/WebRTC.
1.2 Internet real-time communication
The typical application of Internet real-time communication is video conferencing. The video codecs are similar to the previous scenario, but for audio, Internet live streaming mostly uses AAC, while Internet real-time communication uses Opus, because Opus has lower latency. The client handles both publishing and playing, mainly through the WebRTC framework, and a server is needed to distribute streams to many participants. At this point you will find that this server is completely different from the live streaming servers mentioned above; options include Janus, mediasoup, OWT and SRS. Online conferencing also has a special requirement: interconnection with the telephone network. The open source solution for this is FreeSWITCH, which is a huge system in itself.
1.3 Internet Media Center
As a major application scenario, the Internet media center is mainly about content control. For example, when we record a video such as a training course, we want it to be watchable repeatedly; other content, such as a National Day broadcast or a live football match, will not be watched again. Here we need proper control over recorded content, such as identifying inappropriate content and automatic editing. The design of the media center is closely tied to content and needs to cover the whole processing pipeline, including transcoding, encoding and storage. The traditional approach is to push the media stream to multiple CDNs, or to fan it out to multiple CDNs via one CDN, which wastes resources; a better approach is to build a media center.
In terms of security, CDNs also provide playback authentication, such as limiting the number of viewers and encrypting content; tokens are another authentication method. In addition, we need an access standard, such as GB28181, the technical requirements for information transmission, switching and control in security video surveillance networking systems. Although it is a standard, it is quite closed, and it is hard for CDNs to support. Cloud and CDN vendors are better suited to standardized things; infrastructure and distribution need standards to regulate them. If the access protocol is very private, then building a media center better fits the needs of the enterprise: converting content into standard protocols and sending it to CDNs or other enterprises makes Internet-ization relatively easy.
In special scenarios, such as the long-distance transmission required by transnational live streaming, most data travels over dedicated networks, or over the Internet via SRT. These scenarios are strongly tied to specific businesses and are not suited to a unified standard; their scale is not large enough to justify one.
2. Scalability: building on the cloud or interconnecting with CDNs
The figure above shows deployment on the cloud or in connection with a CDN, which is how the SRS demo network is deployed. It is mainly deployed with k8s or plain binaries, and includes edge clusters, a media center and origin servers. Incoming streams in non-standard protocols are converted and pushed to the origin as standard RTMP, then distributed along the CDN edge via RTMP, FLV and other standard protocols. If the scale is not large, streams are served and distributed directly from the cloud machine room; segment-based protocols can even be distributed through nginx. Because traffic can be offloaded to CDNs, the system is scalable. The main protocol is RTMP, carried through CDNs. CDNs now also support WebRTC and can be connected through RTC, but RTC still contains many private pieces; in the future RTC will be able to go through CDNs, but that will take some time.
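The edge/origin idea behind this topology can be sketched in a few lines. This is a conceptual sketch only: the hostnames, cache structure and hash choice are all assumptions for illustration, not SRS's actual implementation (SRS does this in C++ via its cluster configuration):

```python
import zlib

# Hypothetical origin servers for the sketch.
ORIGINS = ["origin-0.example.com", "origin-1.example.com"]

def origin_for(key: str) -> str:
    # Deterministic hash (crc32, not Python's randomized hash) so that
    # every edge maps the same stream to the same origin server.
    return ORIGINS[zlib.crc32(key.encode()) % len(ORIGINS)]

def edge_play(app: str, stream: str, cache: dict) -> str:
    """On a cache miss the edge pulls the stream from its origin; all
    later viewers are served from the edge's local relay, which is how
    the 'many viewers' case scales without hitting the origin again."""
    key = f"{app}/{stream}"
    if key not in cache:
        cache[key] = f"rtmp://{origin_for(key)}/{app}/{stream}"
    return cache[key]

cache = {}
url = edge_play("live", "room42", cache)
print(url)  # the pull URL this edge uses; repeated plays hit the cache
```

The key property is that edges are stateless relays that can be stacked behind a CDN, while origins hold the streams, which matches the origin/edge cluster split described later for SRS v3.0.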
3.1 Live streaming
Regarding latency, SRS now supports WebRTC playback, and WebRTC publishing will be supported soon. The video above shows a clock: OBS captures the running clock, and OBS itself has a delay of about 100 ms. Playing the captured clock through both an RTMP player and a WebRTC player, you can see a clear difference in the clock readings, which reflects the latency difference between the two.
3.2 Real-time streaming media system
Testing GB28181: from the experimental results above, the delay of a Hikvision surveillance camera on the intranet is 280 ms, WebRTC through an Alibaba Cloud server is 210 ms, and RTMP through an Alibaba Cloud server is 1100 ms. The WebRTC server's latency is lower than that of the intranet surveillance camera, mainly because latency is not simply a network problem. Since WebRTC's latency beats the surveillance system's in this scenario, it is well suited to extend down into surveillance. Most surveillance systems have their own dedicated transmission links, and traditional schemes require installing an IE plug-in to play at all. With standard protocols, a phone only needs to integrate the SDK, and a browser can show the picture directly, so no plug-ins are needed and every camera's stream can be viewed.
4.1 Cloud Native
The third part is deployment. SRS supports Kubernetes (k8s) and Docker deployment, and every new release ships with Docker support. The figure above shows how to deploy on k8s; I will not repeat it here, and you can consult the corresponding documentation.
In the past we mainly deployed with binary installation packages. SRS maintains multiple repository mirrors to speed up code download; the repository is small, downloads are fast, and it is relatively easy to compile, install and start. Docker deployment is even easier. Recently some users have repeatedly reported compilation problems, but Docker can be deployed on any platform: Windows can run SRS fully inside Docker, and ARM cross-compilation, which otherwise raises many problems, works without issue when building inside an ARM Docker image. Because the Docker environment is fixed, Docker solves environment and compilation problems in a unified way. Together with k8s, it enables upgrades without service interruption, releasing new versions during off-peak hours.
4.2 Errors & logs
The figure above shows SRS logs, which contain the process ID and a connection ID. An ID represents one connection on the server; a server serves hundreds of users and streams, and the ID is used to locate a problem and its context logs. Unlike HTTP, a streaming session has context: it is a long-lived transport stream, so its log is not a single line, and the events along the way are all recorded. RTC in particular produces a great many logs, so how do we extract the key information on the server? SRS has a mechanism to know which log lines belong to which user and extract them in time.
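The idea of a per-connection context log can be shown with a small sketch. The log format below is made up for illustration (SRS's real format differs); the point is that once every line carries a connection ID, one session's full history can be pulled out of an interleaved log:

```python
import re

# Made-up log lines in an SRS-like shape: [pid][connection-id] message.
LOG = """\
[12345][abc1] RTMP client connect, ip=1.2.3.4
[12345][def2] RTMP client connect, ip=5.6.7.8
[12345][abc1] publish stream live/room42
[12345][def2] play stream live/room42
[12345][abc1] disconnect, error=1007
"""

def context_of(log: str, conn_id: str) -> list:
    """Return every line belonging to one connection: its context log."""
    pat = re.compile(r"\[\d+\]\[" + re.escape(conn_id) + r"\]")
    return [line for line in log.splitlines() if pat.match(line)]

for line in context_of(LOG, "abc1"):
    print(line)  # only the publisher's three events, in order
```

Filtering by the ID reconstructs the whole lifetime of one connection (connect, publish, disconnect) without wading through every other session's lines.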
Besides logs, the figure above also shows error feedback in SRS. The error mechanism borrows from Go: in Go an error can be wrapped, so when reporting an error you can attach the corresponding context and know what the stack is. Normally when an error occurs only an error code is presented, and the developer has no idea what happened; but if the error carries its stack and the variables at each level, querying and locating the error becomes very convenient. When we first look at a new open source project we rarely pay attention to this, but when a problem occurs and we need to find its source, the stack is critical: it lets us not only identify the source of the problem but also fix it properly.
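The Go-style wrapping the talk refers to can be sketched in Python with exception chaining (this is an analog of the concept only; SRS implements it in C++, and the function names here are hypothetical):

```python
# Sketch of Go-style error wrapping via Python exception chaining.
# Each layer adds its own context, so the final error carries the whole
# path from the root cause upward, like a wrapped Go error with a stack.

class SrsError(Exception):
    pass

def read_packet():
    raise SrsError("socket closed by peer")  # root cause

def demux_rtmp():
    try:
        read_packet()
    except SrsError as e:
        raise SrsError("demux rtmp chunk failed") from e  # wrap with context

def serve_connection():
    try:
        demux_rtmp()
    except SrsError as e:
        raise SrsError("serve connection conn=abc1 failed") from e

chain = []
try:
    serve_connection()
except SrsError as e:
    # Walk the cause chain: one entry per layer, ending at the root cause.
    while e is not None:
        chain.append(str(e))
        e = e.__cause__
print(" <- ".join(chain))
```

Instead of a bare error code, the caller sees every layer's context at once, which is exactly what makes locating the source of a failure fast.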
5. High performance
Performance is a basic requirement. In general, SRS's performance is about twice that of other servers. Performance requirements in RTC are even more stringent, because RTC consumes more resources.
6. SRS development
SRS has developed steadily since 2013. In the initial stage, because application scenarios were relatively fixed, updates were infrequent. Now, with support for origin clusters and edge clusters, coverage of live streaming scenarios is increasingly complete, and RTC support is also advancing. As various video industries have been moving onto the Internet recently, SRS's recent activity is also very high.
Around 2019, SRS surpassed nginx-rtmp in forks, and the growth of SRS forks is expected to be twice that of nginx-rtmp in the future.
Looking back at SRS's development: in 2013, v1.0 implemented the basic live streaming protocols, RTMP and HLS. v2.0 mainly added FLV and other mobile Internet capabilities, while v3.0 spent a long time building support for origin clusters and edge clusters. Edge clusters mainly handle streams played by many viewers; origin clusters mainly support publishing, such as from surveillance cameras. Edges do not store streams, while origins do, which is why clusters are needed. Support for live streaming scenarios is now relatively complete.
In early 2020, SRS added SRT support, which mainly solves long-distance transmission; it serves combined broadcast and Internet streaming scenarios, such as professional events and overseas live streaming. The latest releases of SRS also support GB28181 and WebRTC.
In the future, we need to meet a wider range of Internet live streaming scenarios and requirements, such as SFU, IoT, AI capabilities, cloud storage and recording, security, MCU, AV1, SIP and so on. We hope to basically cover all of the above scenarios by 2024.
Thanks to all contributors for their outstanding contributions.
The image above shows our existing online demo. You are welcome to visit it.