How does the new Azure Communication Services (ACS) implement WebRTC?

Date: 2021-05-11

This article is by Gustavo Garcia, a software engineer at Houseparty. He evaluated Azure Communication Services (ACS) in depth, covering browser compatibility, codecs, and the bandwidth estimation algorithm, and concluded that it still lags its main competitors in maturity.

By Gustavo Garcia

Translated by Helen Lyu

Original link: https://webrtchacks.com/how-d…

We have a long tradition of analyzing the major services that use WebRTC. With the success of WebRTC, the list has grown faster than we can keep up with. Fortunately, Gustavo Garcia Bernardo, one of our favorite authors, recently found time to review the new Microsoft Azure Communication Services. He found some interesting results, and we are happy to share them here. Gustavo has deep professional experience in real-time communications and has been closely involved with WebRTC since its early days.

Every time a $1.6 trillion company launches a product, it is usually a big deal, especially for those of us who work with communication APIs regularly. Microsoft and WebRTC have a long and unique history, so we were curious to see how WebRTC is used in this new product.

A few weeks ago, Microsoft announced Azure Communication Services (ACS), a new product in its cloud services catalog that provides chat, SMS, PSTN calling, and video communications, along with some interesting features.

It competes with Vonage, Twilio, Agora, and other major players in the communications platform as a service (CPaaS) category, as well as with the video API products from Zoom and Amazon. Functionally, the Microsoft product is not very different from its competitors. This article focuses on voice and video, which are based on WebRTC.

As the details later in this article show, ACS reuses a large portion of existing Microsoft infrastructure (from Skype and/or Microsoft Teams). At a high level, there are two APIs:

  1. Management API: a server-side SDK for creating users and access tokens.
  2. Client SDK: available for web, Android, and iOS; it connects endpoints to the communication servers to send and receive audio, video, and screen sharing, and to exchange media with PSTN and Microsoft Teams.

API and features

There are two basic primitives in the client API: call and room. With the call interface, you can call any other user connected to the system; with the room primitive, you can join a room. The client API has stronger support for identity and calling than other platforms, probably because it reuses existing infrastructure and provides integration with the Teams platform.

The lack of room access control is interesting: apparently any access token can join any room, as long as the room ID is known.

On the client side, besides some APIs for managing audio and video devices, the SDK provides basic call-control operations (mute/unmute, hold/unhold, screen sharing), which keeps client configuration simple.

WebRTC compliance

In summary, ACS departs from the WebRTC standards (W3C specifications and IETF drafts) in several areas covered below, including Plan B SDP semantics, SDES key exchange, and REMB-based bandwidth estimation.

Client SDK

The client SDK is available for web, iOS, and Android. Browser support is currently limited: only Chrome, the new Chromium-based Edge (on Windows only), and limited, receive-only support for Safari.

While testing the web and Android SDKs, it was clear they still need polish. For example, the browser console log is very verbose and frequently shows warnings about statistics or failed requests, although this is to be expected in a first release.

Server-side management SDK

Microsoft provides management SDKs for C#, Python, Java, and Node.js to create users and tokens. These SDKs run in trusted applications and need access keys created in the Azure portal. Microsoft earns bonus points for providing both a primary and a secondary access key, which makes key rotation possible.
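Two keys make zero-downtime rotation possible: applications move to one key while the other is regenerated. The sketch below models that with a generic HMAC-SHA256 signer and a verifier that accepts any currently valid key. The key values, the request string, and the `sign`/`verify` helpers are hypothetical; this is not the ACS SDK API or Microsoft's exact signing scheme.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 signature of a request (hypothetical scheme, for illustration)."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str, valid_keys) -> bool:
    """Accept the request if it was signed with any currently valid key."""
    return any(hmac.compare_digest(sign(k, message), signature) for k in valid_keys)


# Hypothetical key material and request, for illustration only.
primary, secondary = b"primary-v1", b"secondary-v1"
msg = b"POST /identities"

# Step 1: applications switch to signing with the secondary key.
sig = sign(secondary, msg)
# Step 2: the primary key is regenerated; requests signed with the secondary still pass.
primary = b"primary-v2"
print(verify(msg, sig, [primary, secondary]))  # True
```

Once every application has moved to the new primary key, the secondary key can be regenerated the same way.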

Other features

Other advanced features:

  1. PSTN calling: the private preview did not allow us to test this, but according to the documentation, it supports 1:1 and group calls.
  2. SMS: as above, we could not test this, but SMS and chat are also part of the Azure communications offering.
  3. Teams integration: this is also in private preview. Given how popular Teams is today, this is likely the use case that will draw the most initial attention to the new platform.

Neither the documentation nor the SDK mentions recording or broadcasting, and there is no integration with other Azure media processing services (such as text-to-speech or the vision APIs).

Signaling

Signaling is based on HTTP requests.

The signaling messages contain many references to Skype domains, which shows how the product is built on top of existing parts of the Microsoft ecosystem.

In fact, even the user identifier claim in the ACS JWT access token is called skypeid.
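Because a JWT payload is just base64url-encoded JSON, claims like this one can be inspected locally. A minimal sketch using only the Python standard library; the token is built inside the example with a made-up `skypeid` value, and the signature is not verified, so this is for inspection only.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the padding that JWT strips
    return json.loads(base64.urlsafe_b64decode(payload))


# Hypothetical token built locally; the claim value is made up.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
body = b64url_encode(json.dumps({"skypeid": "acs:example-user"}).encode())
token = f"{header}.{body}.signature-not-checked"

print(decode_jwt_payload(token)["skypeid"])  # acs:example-user
```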

The signaling uses a proprietary, custom JSON format over HTTP; for example, muting or unmuting the microphone sends one of these HTTP requests.

For 1:1 calls, the system uses a direct peer-to-peer WebRTC connection. In "room" mode, ACS uses SFUs to forward audio and video packets between participants. These SFUs are located in different regions; in my tests (from Europe), I was assigned an SFU in Dublin.

SDP and media

PeerConnection usage

The client SDK uses a single WebRTC PeerConnection to send and receive multiple streams. This is the most efficient and modern mechanism, although not all platforms use it. The downside is that it uses the legacy Plan B SDP semantics instead of the newer Unified Plan semantics. Given how long Plan B has been around, this is not unusual; to this day, many of the largest multiparty applications still use Plan B.
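The practical difference shows up in the SDP: Plan B puts several tracks in one m= section, while Unified Plan uses one m= section per track. A heuristic sketch that counts distinct track ids per m= section from `a=ssrc:... msid:` lines; the SDP fragments and helper names are hypothetical, and a Plan B offer with a single track cannot be told apart this way.

```python
def tracks_per_section(sdp: str):
    """For each m= section, count distinct track ids found in a=ssrc ... msid: lines."""
    counts, tracks = [], None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            if tracks is not None:
                counts.append(len(tracks))
            tracks = set()
        elif tracks is not None and line.startswith("a=ssrc:") and " msid:" in line:
            # Format: a=ssrc:<ssrc> msid:<stream-id> <track-id>
            # RTX SSRCs share the track id, so a set counts each track once.
            tracks.add(line.split(" msid:", 1)[1].split()[-1])
    if tracks is not None:
        counts.append(len(tracks))
    return counts


def looks_like_plan_b(sdp: str) -> bool:
    """Heuristic: more than one track inside a single m= section implies Plan B."""
    return any(n > 1 for n in tracks_per_section(sdp))


# Hypothetical Plan B fragment: one video m= section carrying two tracks.
PLAN_B = """m=video 9 RTP/SAVPF 107
a=ssrc:1501 msid:streamA trackA
a=ssrc:1551 msid:streamB trackB
"""
print(tracks_per_section(PLAN_B), looks_like_plan_b(PLAN_B))  # [2] True
```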

Interactive Connectivity Establishment (ICE)

For media connectivity, ACS uses STUN as well as TURN over UDP and TCP.

Surprisingly, TURN over TLS is not included, which may limit ACS's ability to connect in restrictive enterprise networks.

The PeerConnection configuration and constraints used by the web SDK were:

http://localhost:5000/,
{ iceServers: [turn:52.158.34.11:3478?transport=udp, turn:52.158.34.11:443?transport=tcp],
  iceTransportPolicy: all, bundlePolicy: max-bundle, rtcpMuxPolicy: require,
  iceCandidatePoolSize: 0, sdpSemantics: "plan-b" },
{ advanced: [{enableDtlsSrtp: {exact: false}}, {googCpuOveruseDetection: {exact: false}}] }
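The two URLs in this configuration show the gap: both use the `turn:` scheme, and the entry on port 443 is plain TCP rather than TLS (`turns:`), which TLS-inspecting proxies may reject. A small sketch that parses TURN URIs following the RFC 7065 defaults (UDP for `turn:`, TCP for `turns:`) and checks for a TLS entry; `parse_turn_url` is a hypothetical helper, not part of any WebRTC API.

```python
def parse_turn_url(url: str):
    """Split a turn:/turns: URI (RFC 7065) into scheme, host:port and transport."""
    scheme, rest = url.split(":", 1)
    hostport, _, query = rest.partition("?")
    params = dict(p.split("=", 1) for p in query.split("&") if p)
    # RFC 7065 defaults: UDP for turn:, TCP (carrying TLS) for turns:
    default = "udp" if scheme == "turn" else "tcp"
    return scheme, hostport, params.get("transport", default)


# The two ICE servers from the ACS configuration above.
ACS_ICE_SERVERS = [
    "turn:52.158.34.11:3478?transport=udp",
    "turn:52.158.34.11:443?transport=tcp",
]

for url in ACS_ICE_SERVERS:
    print(parse_turn_url(url))
print("TURN over TLS offered:", any(parse_turn_url(u)[0] == "turns" for u in ACS_ICE_SERVERS))
```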

To connect directly to the SFU, ACS uses the typical ICE UDP candidates, but also ICE-TCP candidates on port 3478. The SFU implements full ICE rather than ICE-lite. This is not very common for an SFU with a public IP address, because full ICE is harder to implement and offers few advantages in that setup, although it has no real downside either.
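Both properties are visible in the SFU's SDP: an ICE-lite endpoint declares `a=ice-lite`, and each candidate line carries its transport, port, and type. A minimal parsing sketch; the candidate lines are hypothetical (the address is borrowed from the SDP answer in the Codecs section, and the foundations and priorities are made up).

```python
def ice_summary(sdp: str):
    """Return (is_ice_lite, [(transport, port, type), ...]) from an SDP blob."""
    lite, candidates = False, []
    for line in sdp.splitlines():
        line = line.strip()
        if line == "a=ice-lite":
            lite = True
        elif line.startswith("a=candidate:"):
            # Fields: foundation component transport priority address port "typ" type [...]
            f = line[len("a=candidate:"):].split()
            candidates.append((f[2].lower(), int(f[5]), f[7]))
    return lite, candidates


# Hypothetical SFU answer fragment with one UDP and one ICE-TCP candidate.
SFU_ANSWER = """a=candidate:1 1 UDP 2130706431 40.113.83.182 3480 typ host
a=candidate:2 1 TCP 2105458943 40.113.83.182 3478 typ host tcptype passive
"""
print(ice_summary(SFU_ANSWER))
```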

Encryption

The encryption required by WebRTC is based on SRTP. However, for the SFU (room) connections, ACS exchanges keys with SDES instead of the standard DTLS. SDES is simpler and can make setup faster, but only Chrome supports it. Because the standard explicitly forbids SDES, as it is less secure than DTLS, support for it may be removed at some point.
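The key-exchange method can be read straight from the SDP: SDES puts the key material in `a=crypto:` lines (RFC 4568), while DTLS-SRTP advertises a certificate with `a=fingerprint:` (RFC 5763/5764). The `enableDtlsSrtp: {exact: false}` constraint in the PeerConnection configuration from the ICE section appears to be the legacy Chrome switch that turns DTLS-SRTP off. A small classification sketch; the SDP fragment and its key are hypothetical placeholders.

```python
def srtp_key_exchange(sdp: str) -> str:
    """Classify how an SDP negotiates SRTP keys."""
    lines = [line.strip() for line in sdp.splitlines()]
    sdes = any(line.startswith("a=crypto:") for line in lines)       # SDES, RFC 4568
    dtls = any(line.startswith("a=fingerprint:") for line in lines)  # DTLS-SRTP, RFC 5763/5764
    if sdes and dtls:
        return "both offered"
    if dtls:
        return "DTLS-SRTP"
    if sdes:
        return "SDES"
    return "none found"


# Hypothetical SDES answer; the inline key is a placeholder, not a real key.
SDES_ANSWER = """m=audio 3480 RTP/SAVPF 9 0 8 13 101
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:EXAMPLEKEYEXAMPLEKEYEXAMPLEKEYEXAMPLEKEY
"""
print(srtp_key_exchange(SDES_ANSWER))  # SDES
```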

Codecs

G.722 is used as the audio codec. This is uncommon for WebRTC platforms, but not surprising given the need for PSTN interoperability and the reuse of existing Microsoft infrastructure. This is the part of the SDP answer with the audio channel information:

m=audio 3480 RTP/SAVPF 9 0 8 13 101
c=IN IP4 40.113.83.182
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:13 CN/8000
a=rtpmap:101 telephone-event/8000
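The payload types on the m= line are listed in preference order, and the `a=rtpmap` lines map them to codec names, so G.722 (payload type 9) is the preferred codec and Opus is absent. The `G722/8000` clock rate is a known quirk: RFC 3551 fixes the RTP clock for G.722 at 8000 Hz even though the codec samples at 16 kHz. A small parsing sketch over the answer above; `audio_codecs` is a hypothetical helper.

```python
ACS_AUDIO_ANSWER = """m=audio 3480 RTP/SAVPF 9 0 8 13 101
c=IN IP4 40.113.83.182
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:13 CN/8000
a=rtpmap:101 telephone-event/8000
"""


def audio_codecs(sdp: str):
    """Return the codec names of the audio m= line, in preference order."""
    order, rtpmap = [], {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            order = line.split()[3:]  # payload types follow media, port and protocol
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            rtpmap[pt] = encoding.split("/")[0]
    return [rtpmap.get(pt, pt) for pt in order]


print(audio_codecs(ACS_AUDIO_ANSWER))
# ['G722', 'PCMU', 'PCMA', 'CN', 'telephone-event']
```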

The selected video codec is H.264, with RTX retransmissions for reliability. ACS does not support simulcast, which would adapt video quality to the needs of the different participants in a room. Also, at least in my tests, the bitrate was very low: a capture of the sender parameters showed H.264 configured at 200 kbps.
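With simulcast, the sender encodes the same video at several bitrates and the SFU forwards to each receiver the layer that fits that receiver's bandwidth; without it, every participant gets the same single encoding. The selection step an SFU performs could be sketched like this; the layer names (`rid` values), their bitrates, and `pick_layer` are hypothetical and not ACS or browser defaults.

```python
# Hypothetical simulcast layers (rid, target bitrate in kbps), lowest first.
LAYERS = [("q", 150), ("h", 500), ("f", 1500)]


def pick_layer(available_kbps: float, layers=LAYERS) -> str:
    """Pick the highest layer whose bitrate fits the receiver's estimated bandwidth.

    Falls back to the lowest layer when even that one does not fit.
    """
    chosen = layers[0][0]
    for rid, kbps in layers:
        if kbps <= available_kbps:
            chosen = rid
    return chosen


for receiver_kbps in (100, 300, 800, 3000):
    print(receiver_kbps, pick_layer(receiver_kbps))
```

A receiver on a fast link would get the "f" layer while a constrained one stays on "q", instead of everyone sharing one low-bitrate stream.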

RTCP

Other RTP/RTCP-level details: like most platforms, it uses BUNDLE, RTCP multiplexing (rtcp-mux), and reduced-size RTCP (rtcp-rsize). It also reserves 50 SSRCs per stream (1501, 1551, …) and, during initial call setup, pre-allocates eight remote streams in the remote SDP for participants who join later.
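Read literally, "50 SSRCs per stream, starting at 1501, 1551, …" combined with eight pre-allocated streams gives the layout below. This is an interpretation: the contiguous-block assumption and the `ssrc_blocks` helper are hypothetical, and the article does not say how the SSRCs inside each block are used.

```python
def ssrc_blocks(first: int = 1501, block_size: int = 50, streams: int = 8):
    """Return (first, last) SSRC of each reserved block, assuming contiguous blocks."""
    return [
        (first + i * block_size, first + (i + 1) * block_size - 1)
        for i in range(streams)
    ]


for start, end in ssrc_blocks():
    print(start, end)
```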

Bandwidth estimation (BWE)

For bandwidth estimation, it relies on receiver-side estimation (based on REMB) instead of the more modern and better-optimized sender-side estimation (based on transport-wide congestion control feedback).
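Which mechanism is in use can be seen in the negotiated RTCP feedback: `a=rtcp-fb:<pt> goog-remb` for REMB, or `a=rtcp-fb:<pt> transport-cc` plus the transport-wide sequence number header extension for sender-side estimation. A detection sketch; the video section (payload type 107, port 3482) is hypothetical.

```python
# Header extension URI that Chrome uses for transport-wide congestion control.
TWCC_URI = "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01"


def bwe_signals(sdp: str):
    """Report which bandwidth-estimation feedback mechanisms an SDP negotiates."""
    found = set()
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtcp-fb:"):
            parts = line.split(" ", 1)
            feedback = parts[1] if len(parts) > 1 else ""
            if feedback == "goog-remb":
                found.add("REMB (receiver-side)")
            elif feedback == "transport-cc":
                found.add("transport-cc (sender-side)")
        elif line.startswith("a=extmap:") and line.endswith(TWCC_URI):
            found.add("transport-wide sequence numbers")
    return found


# Hypothetical video section that only negotiates REMB.
VIDEO = """m=video 3482 RTP/SAVPF 107
a=rtcp-fb:107 nack
a=rtcp-fb:107 goog-remb
"""
print(bwe_signals(VIDEO))  # {'REMB (receiver-side)'}
```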

Other unexplained details

There are also nonstandard extensions in the SDP. I doubt they have any effect; they may be inherited from other applications. For example:

a=x-mediabw:applicationsharing-video send=8000;recv=8100
a=x-source-streamid:19
a=x-signaling-fb:* x-message app send:dsh
a=x-signaling-fb:* x-message app send:src,csrc,vc recv:src
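SDP receivers are required to ignore attributes they do not understand (RFC 4566), which is why extensions like these can travel through standard WebRTC stacks without breaking them. A quick sketch that lists the `x-` attributes in the lines above; `nonstandard_attributes` is a hypothetical helper.

```python
from collections import Counter

ACS_EXTRA = """a=x-mediabw:applicationsharing-video send=8000;recv=8100
a=x-source-streamid:19
a=x-signaling-fb:* x-message app send:dsh
a=x-signaling-fb:* x-message app send:src,csrc,vc recv:src
"""


def nonstandard_attributes(sdp: str) -> Counter:
    """Count SDP attributes whose names start with 'x-'."""
    names = Counter()
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=x-"):
            names[line[2:].split(":", 1)[0]] += 1
    return names


print(nonstandard_attributes(ACS_EXTRA))
```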

Conclusion

Azure Communication Services has a simple API. Everything worked as expected and was easy to use. The documentation is good, and the interactive examples are really helpful. It also has an easy-to-understand, competitive pricing model. On the other hand, it is still a beta product, and it is not as mature as the offerings competitors have provided for years. For ACS to be seriously considered, Microsoft must extend support to other browsers and address the current limitations of its web support.

In addition, the lack of some video quality techniques (mainly simulcast) and of support for newer codecs (especially Opus) is unexpected. I hope upcoming versions from Microsoft address these issues.

For many popular use cases, the lack of recording is also a big gap. In my opinion, the most promising part is the potential integration with the Azure ecosystem: push notifications, text-to-speech, compute, publish/subscribe, and so on. For example, publish/subscribe support for audio/video would be very useful, but at present it is only available for SMS.

I am also looking forward to seeing what people build with the Teams integration, but I could not evaluate it in these tests.
