As a toolbox, WebRTC offers advantages over traditional video conferencing in security and compatibility, and it can also be used to enhance the Web itself.
Author / Eric Rescorla
The wide availability of high-quality video conferencing is one of the real successes of the Internet. The idea of video conferencing has a long history, of course (you can see Heywood Floyd calling his family on a Bell videophone in the film 2001: A Space Odyssey), but until recently it still required special equipment or at least a special software download. Simply put, WebRTC is video conferencing (VC) in a web browser with no download: you just visit a website and make a call. Most major VC services have a WebRTC version, including Google Meet, Cisco WebEx, and Microsoft Teams, along with many smaller companies.
It’s a toolbox, not a cell phone
WebRTC is not a complete video conferencing system. It is a set of tools built into the browser that handles many of the hard parts of building a VC system, so you don’t have to reinvent them. These tools include:
- Capturing audio and video from your computer’s microphone and camera. This includes what’s called acoustic echo cancellation, which (hopefully) removes echo even when people aren’t wearing headphones.
- Allowing the two endpoints to negotiate their capabilities (e.g., “I want to send and receive 1080p video with the AV1 codec”) and agree on a common set of parameters.
- Establishing a secure connection between you and the other people on the call. This includes getting your data through any NATs or firewalls on the network.
- Compressing the audio and video for transmission to the other side and reassembling it on arrival. This also means coping with some of the data being lost, ideally without frozen video or audible glitches.
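To make the negotiation step above concrete, here is a toy model. It is not the SDP offer/answer format that WebRTC actually uses, and the `CodecOffer` type and `negotiate` function are hypothetical names for this sketch. Each side lists the codecs it supports, most preferred first, plus the largest video height it can handle. The agreed parameters are the first codec both sides support, together with the smaller of the two heights.

```typescript
// Toy model of capability negotiation (hypothetical; real WebRTC uses SDP offer/answer).
interface CodecOffer {
  codecs: string[]; // supported codecs, most preferred first
  maxHeight: number; // largest video height this side can handle, in pixels
}

interface Agreement {
  codec: string;
  height: number;
}

// Returns the agreed parameters, or null if the two sides share no codec.
function negotiate(offer: CodecOffer, local: CodecOffer): Agreement | null {
  // Keep the caller's order of preference, dropping codecs we cannot handle.
  const common = offer.codecs.filter((codec) => local.codecs.includes(codec));
  if (common.length === 0) {
    return null;
  }
  // Neither side can exceed its own limit, so settle on the smaller one.
  return { codec: common[0], height: Math.min(offer.maxHeight, local.maxHeight) };
}

const callerCaps: CodecOffer = { codecs: ["AV1", "VP9", "VP8"], maxHeight: 1080 };
const calleeCaps: CodecOffer = { codecs: ["VP8", "VP9"], maxHeight: 720 };
console.log(negotiate(callerCaps, calleeCaps)); // codec "VP9", height 720
```

Real offer/answer exchanges carry much more than this (payload types, bitrates, network candidates), and the answerer may list codecs in its own order of preference. This sketch simply keeps the caller's order.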
This functionality is exposed through what’s called an application programming interface (API): a set of commands the programmer gives the browser to set up a video call. As a result, you can write a very basic VC system in a few lines of code. Building a production system takes more work, but with WebRTC the browser does much of the work of building the client for you.
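To illustrate how few lines a basic call takes, the sketch below connects two `RTCPeerConnection` objects inside the same page. This is a common way to try out the API. It assumes a browser environment. `startLoopbackCall` and its `onRemoteStream` callback are hypothetical names. `getUserMedia`, `RTCPeerConnection`, `createOffer`, `createAnswer`, `setLocalDescription`, `setRemoteDescription`, and `addIceCandidate` are the standard WebRTC APIs. In a real application, the offer, the answer, and the ICE candidates would travel between the two browsers over a signaling channel of the developer’s choosing, which WebRTC deliberately leaves unspecified.

```typescript
// Minimal sketch: a "call" between two peer connections in the same page.
// A real app would relay the offer, answer, and ICE candidates through its
// own signaling server instead of handing them over directly.
async function startLoopbackCall(
  onRemoteStream: (stream: MediaStream) => void,
): Promise<[RTCPeerConnection, RTCPeerConnection]> {
  // The browser asks the user for permission unless it was already granted.
  const local = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });

  const caller = new RTCPeerConnection();
  const callee = new RTCPeerConnection();

  // Pass each side's network candidates straight to the other side.
  caller.onicecandidate = (e) => {
    if (e.candidate) callee.addIceCandidate(e.candidate).catch(console.error);
  };
  callee.onicecandidate = (e) => {
    if (e.candidate) caller.addIceCandidate(e.candidate).catch(console.error);
  };

  // Fires once per incoming track (audio and video), each with the same stream.
  callee.ontrack = (e) => onRemoteStream(e.streams[0]);

  // Send every captured track.
  for (const track of local.getTracks()) caller.addTrack(track, local);

  // Offer/answer exchange: this is where codecs and parameters are agreed on.
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  await callee.setRemoteDescription(offer);
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  await caller.setRemoteDescription(answer);

  return [caller, callee];
}
```

In a page with a `<video autoplay>` element, you might call `startLoopbackCall((stream) => { video.srcObject = stream; })`. Setting the same stream twice, once per track, is harmless.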
Importantly, this functionality is fully standardized. The API itself is published by the World Wide Web Consortium (W3C), and the network protocols (encryption, compression, NAT traversal, etc.) are standardized by the Internet Engineering Task Force (IETF). The result is a large collection of specifications, including the API specification, protocols for negotiating which media to send or receive, and a mechanism for sending peer-to-peer data. All told, this represents a great deal of work by many people over the past decade, resulting in hundreds of pages of specifications.
The upshot is that you can build a VC system in the browser that works for anyone, with no software to install.
Ironically, the actual publication of the standards is something of an anticlimax. Every major browser has shipped WebRTC for years, and as mentioned above, there are plenty of WebRTC VC systems. This is a good thing: widespread deployment is the only way to gain confidence that a technology really works as intended and that the documents are clear enough to implement from. The standards reflect the technical community’s collective judgment that we have a system that works and that its fundamental pieces will not change. This also means that VC vendors who implemented nonstandard mechanisms should now update to what the standards require.
Why should you care?
At this point you might be thinking, “Well, you’ve all done a lot of work, but why does it matter? Can’t I just download Zoom?” There are several important reasons why WebRTC matters.
Perhaps the most important reason is security. Because WebRTC runs entirely in the browser, you don’t need to worry about security problems in software that a VC vendor wants you to download. For example, last year Zoom had a number of high-profile security vulnerabilities. One allowed websites to add you to calls without your permission. Others were so-called remote code execution vulnerabilities, which let attackers run their own code on your computer. In contrast, because WebRTC doesn’t require a download, you aren’t exposed to any vulnerabilities that may exist in the vendor’s client. Of course, browsers don’t have a perfect security record either, but every major browser has invested heavily in security technologies such as sandboxing. Moreover, you are already running a browser, so every additional application you install adds to your security risk. For this reason, Kaspersky recommends using the Zoom web client, even though the experience is much worse than with the app.
The second security advantage of WebRTC-based conferencing is that the browser controls access to the camera and microphone. This means you can easily prevent sites from using them, and you can see when they are in use. For example, Firefox asks you before letting a site use the camera and microphone, and then shows an indicator in the URL bar while they are running.
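Because the browser mediates access, a page has to ask for the devices and must cope with being refused. The sketch below assumes a browser environment and uses hypothetical helper names (`explainMediaError`, `withMicrophone`). The error names are ones that `getUserMedia` can reject with under the W3C Media Capture and Streams specification. The sketch requests only the microphone, and it stops every track when it is done, which releases the device.

```typescript
// Map a getUserMedia rejection to a human-readable reason.
// (In browsers, DOMException inherits from Error, so `name` is available.)
function explainMediaError(err: unknown): string {
  const name = err instanceof Error ? err.name : "";
  switch (name) {
    case "NotAllowedError":
      return "Permission was denied by the user or the browser.";
    case "NotFoundError":
      return "No matching microphone was found.";
    case "NotReadableError":
      return "The device is in use or could not be opened.";
    default:
      return "Could not access media devices.";
  }
}

// Hypothetical helper: run `use` with a microphone stream, then release it.
// Returns false if access was not granted.
async function withMicrophone(use: (stream: MediaStream) => Promise<void>): Promise<boolean> {
  const stream = await navigator.mediaDevices
    .getUserMedia({ audio: true }) // ask for the microphone only, not the camera
    .catch((err: unknown) => {
      console.warn(explainMediaError(err));
      return null;
    });
  if (stream === null) return false;
  try {
    await use(stream);
  } finally {
    // Stopping every track releases the device, so the browser's
    // "in use" indicator can go away.
    stream.getTracks().forEach((track) => track.stop());
  }
  return true;
}
```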
WebRTC is always encrypted in transit, without the VC system having to do anything extra, so you don’t have to wonder whether the vendor has done its encryption properly. This is one of the areas of WebRTC in which Mozilla was most involved. It is in line with principle 4 of the Mozilla Manifesto: “Individuals’ security and privacy on the internet are fundamental and must not be treated as optional.” Even more exciting, we are beginning to see built-in end-to-end encrypted conferencing for WebRTC, built on MLS and SFrame. This will help provide an important security property that not every client offers: preventing the service itself from listening in on your calls. I am glad to see progress here.
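One piece of this always-on encryption is visible to applications. WebRTC media is protected with DTLS-SRTP, and each side’s session description carries an `a=fingerprint:` line that identifies its DTLS certificate. The hypothetical `extractFingerprints` helper below is a sketch rather than a full SDP parser. It pulls those lines out of an SDP string, such as `pc.localDescription.sdp` in a browser.

```typescript
// Sketch: list the DTLS certificate fingerprints advertised in an SDP string.
// It only looks for "a=fingerprint:" attribute lines, whose value is
// "<hash function> <colon-separated hex bytes>" (RFC 8122).
interface Fingerprint {
  algorithm: string; // e.g. "sha-256"
  value: string; // e.g. "AB:CD:..."
}

function extractFingerprints(sdp: string): Fingerprint[] {
  const prefix = "a=fingerprint:";
  const found: Fingerprint[] = [];
  for (const rawLine of sdp.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.startsWith(prefix)) continue;
    // The attribute may appear at session level or inside each m= section.
    const [algorithm, value] = line.slice(prefix.length).trim().split(/\s+/);
    if (algorithm && value) {
      found.push({ algorithm: algorithm.toLowerCase(), value });
    }
  }
  return found;
}
```

These fingerprints reach the other side through the VC service’s signaling server, so a malicious service could in principle substitute its own. That is one reason the end-to-end work on MLS and SFrame matters.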
Because WebRTC-based video calling applications work in standard web browsers, they can greatly improve compatibility. For users, this means they can join a call without installing anything, which makes life much easier. I’ve been on many calls where someone couldn’t join, usually because their company used a different VC system and they didn’t have the right software downloaded. This happens much less now that all you need is a browser. The problem can be even worse at companies that restrict what software employees can install.
For anyone who wants to offer a new VC service, WebRTC means there is no need to write new client software and get people to download it. This makes it easier to enter the market without worrying that users are locked into an existing VC system and can’t use yours.
This doesn’t mean you can’t build your own client. Many popular systems, such as WebEx and Meet, have downloadable endpoints (and for WebEx, you can even buy hardware devices). But it does mean you don’t have to. If you do it well, browser users will be able to talk to your custom endpoints without much extra investment, and ordinary users get an easy way to try your service.
Because WebRTC is part of the Web rather than a separate application, it can be used not only for conferencing apps but also to enhance the Web itself. Want to add an audio stream to a game? Share your screen in a webinar? Upload video from your camera? No problem: just use WebRTC.
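Screen sharing is a good example of how these pieces combine. The sketch assumes a browser and an existing `RTCPeerConnection` (`pc`) that is already sending camera video. `shareScreen` is a hypothetical name, while `getDisplayMedia`, `getSenders`, and `replaceTrack` are standard APIs. The sketch asks the user to pick a screen or window and swaps it in for the camera on the same connection.

```typescript
// Sketch: switch an ongoing call from the camera to a shared screen.
// Returns the screen track, or null if there was nothing to replace.
async function shareScreen(pc: RTCPeerConnection): Promise<MediaStreamTrack | null> {
  // The browser shows its own picker; the page only gets what the user selects.
  const display = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const [screenTrack] = display.getVideoTracks();

  // Find the sender currently carrying our outgoing video.
  const sender = pc.getSenders().find((s) => s.track?.kind === "video");
  if (!sender || !screenTrack) {
    display.getTracks().forEach((t) => t.stop()); // release the capture we don't need
    return null;
  }

  // replaceTrack swaps the outgoing media in place, without a new offer/answer round.
  await sender.replaceTrack(screenTrack);
  return screenTrack;
}
```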
One exciting thing about WebRTC is that many web applications besides video calling can use it. Perhaps the most interesting is WebRTC “data channels”. These let a pair of clients set up a connection between them and use it to exchange data directly. There are many interesting applications, including games, file transfer, and even BitTorrent in the browser. It’s still early days, and I think we’ll see many more uses of data channels in the future.
WebRTC itself is a big step forward for the Internet: if you had told people 20 years ago that they would be making video calls from their browsers, they would have laughed at you. I have to admit I was skeptical at first, but now I do it almost every day at work. More importantly, it is a great example of how the power of the Web can make people’s lives better, and of what we can do when we work together.
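A data channel is created with `pc.createDataChannel(label)` on one side and arrives through the `datachannel` event on the other. With the default options, it is reliable and ordered. The file-transfer sketch below assumes a browser and a channel that is already open. `splitIntoChunks`, `sendFile`, the 16 KiB chunk size, and the 1 MiB buffering threshold are illustrative choices, not values taken from the specifications. The sketch splits the file into pieces and pauses whenever too much data is queued, using the channel’s `bufferedAmount` property and `bufferedamountlow` event.

```typescript
// Sketch: send a file over an open RTCDataChannel in fixed-size pieces.
// The chunk size and buffering threshold are illustrative assumptions.
const CHUNK_SIZE = 16 * 1024; // 16 KiB per message
const LOW_WATER_MARK = 1024 * 1024; // resume sending once under 1 MiB is queued

// Split a buffer into copies of at most `size` bytes each.
function splitIntoChunks(data: ArrayBuffer, size: number = CHUNK_SIZE): ArrayBuffer[] {
  const chunks: ArrayBuffer[] = [];
  for (let offset = 0; offset < data.byteLength; offset += size) {
    chunks.push(data.slice(offset, offset + size));
  }
  return chunks;
}

// Assumes channel.readyState === "open" and default (ordered, reliable) options,
// so the receiver can append incoming messages in arrival order.
async function sendFile(channel: RTCDataChannel, data: ArrayBuffer): Promise<void> {
  channel.bufferedAmountLowThreshold = LOW_WATER_MARK;
  for (const chunk of splitIntoChunks(data)) {
    if (channel.bufferedAmount > channel.bufferedAmountLowThreshold) {
      // Wait for the browser to drain its send queue before adding more.
      // (A production version would also give up if the channel closes.)
      await new Promise<void>((resolve) => {
        channel.addEventListener("bufferedamountlow", () => resolve(), { once: true });
      });
    }
    channel.send(chunk);
  }
}
```

The receiver also needs some way to know when the file is complete, such as being sent the total size first. That part is not shown.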
1. Technical note: for Firefox users, probably the biggest problem is that some services implemented a Chrome-specific mechanism for handling multiple media streams, called “Plan B”. The IETF ultimately adopted something called “Unified Plan”, which Chrome also supports (as does Google Meet). However, some services, such as Slack and Facebook video calling, still use only Plan B, which means they don’t work properly with Firefox, which implements Unified Plan.
2. The Zoom web client is an interesting example because it uses only part of WebRTC. Unlike, for example, Google Meet, Zoom Web uses WebRTC to capture audio and video and to transmit media over the network, but handles all the audio and video processing itself in WebAssembly. This demonstrates the power of WebAssembly, but if you compare Zoom Web with other clients such as Meet or Jitsi, you will see the advantage of using the browser’s built-in WebRTC APIs.
3. Google has open-sourced its WebRTC stack. This makes it easier to write your own downloadable client, including one that interoperates with browsers.
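The Plan B versus Unified Plan distinction in the first footnote shows up directly in the SDP. In Unified Plan, each RTCRtpTransceiver (roughly, each track) gets its own `m=` section, so sending two videos produces two `m=video` lines. Plan B instead put several tracks of the same kind into one `m=` section and told them apart by their SSRC lines. The hypothetical `countMediaSections` helper below tallies `m=` lines by media kind. The sample is a hand-written SDP fragment, not one captured from a real browser.

```typescript
// Sketch: count media sections ("m=" lines) by kind in an SDP string.
// Under Unified Plan, two outgoing video tracks give video: 2;
// under Plan B, both would share a single m=video section.
function countMediaSections(sdp: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of sdp.split(/\r?\n/)) {
    if (!line.startsWith("m=")) continue;
    const kind = line.slice(2).split(" ")[0]; // "audio", "video", or "application"
    counts.set(kind, (counts.get(kind) ?? 0) + 1);
  }
  return counts;
}

// Hand-written Unified Plan-style fragment: one audio and two video sections.
const unifiedFragment = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
].join("\r\n");
console.log(countMediaSections(unifiedFragment)); // audio: 1, video: 2
```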