A comprehensive solution to iOS audio problems in the online classroom



In online classroom scenarios, sound is one of the most important channels for delivering content, so ensuring stable and reliable audio is a key part of classroom quality. At the same time, many feature modules in an online classroom involve sound, so handling the audio conflicts between modules has become an important topic.


On iOS, any discussion of sound comes down to AVAudioSession. AVAudioSession manages access to the audio hardware, which is a single shared resource. By choosing an appropriate AVAudioSession configuration, we can match the app's audio behavior to its functional requirements; whenever the audio scene switches, the AVAudioSession must be switched accordingly.


The main audio scenes used in educational apps are as follows:


iOS provides AVAudioSessionMode [1] to be used together with AVAudioSessionCategory [2]. The audio modes mainly used in educational scenes include:


We can use options to fine-tune the behavior of a category.

Call volume and media volume

Generally speaking, call volume refers to the volume of voice and video calls, while media volume refers to the volume of music, video, game sound effects and background audio.

In practice, the difference between the two is that call volume provides better echo cancellation, while media volume provides better audio fidelity. Media volume can be turned down to 0, but call volume cannot.

At any given moment the device uses either call volume or media volume, never both, so the two must be distinguished. Adjusting the system volume during a call adjusts the call volume; otherwise it adjusts the media volume. Media volume and call volume belong to two independent systems: setting one does not affect the other.

After entering a call, sound-effect playback is controlled by the call volume; after leaving the call, it is controlled by the media volume. In educational scenes, students who are only pulling the stream as an audience generally use media volume, which makes the teacher's voice fuller and more dimensional. When a student connects to the microphone, call volume is used to guarantee call quality.

In short, media volume is used when the microphone is not connected, and call volume is used when it is; each has its own independent volume-control mechanism.
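The two volume systems map onto AVAudioSession configuration. As a minimal sketch (not from the original post): a playback-only category runs under media volume, while PlayAndRecord with the voiceChat mode enables echo cancellation and runs under call volume.

```objc
#import <AVFoundation/AVFoundation.h>

// Media volume: playback only, mixes with other audio
void useMediaVolume(void) {
    AVAudioSession *session = [AVAudioSession sharedInstance];
    [session setCategory:AVAudioSessionCategoryPlayback
             withOptions:AVAudioSessionCategoryOptionMixWithOthers
                   error:nil];
    [session setActive:YES error:nil];
}

// Call volume: record + play, voiceChat mode turns on voice processing (echo cancellation)
void useCallVolume(void) {
    AVAudioSession *session = [AVAudioSession sharedInstance];
    [session setCategory:AVAudioSessionCategoryPlayAndRecord
                    mode:AVAudioSessionModeVoiceChat
                 options:AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionMixWithOthers
                   error:nil];
    [session setActive:YES error:nil];
}
```

Switching between the two functions reproduces the mode switch described below: the system tears down and rebuilds the underlying Audio Unit.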

When playing media resources, a player (such as AVPlayer) is used to play the audio; the Audio Unit underlying the player is described as RemoteIO.

The RTC SDK maintains its own Audio Unit, whose description is RemoteIO under media volume and VoiceProcessingIO under call volume. When the mode switches, the original Audio Unit is destroyed and a new one is created, so one Audio Unit is always kept alive for audio playback.

Under call volume, the sound of AVPlayer's RemoteIO Audio Unit is suppressed. Similarly, under media volume the RTC SDK's Audio Unit is described as RemoteIO, so if another module switches to call volume by changing the AVAudioSession, the RTC voice will be suppressed in the same way.

Industry status

In the online classroom scene, many features need to play sound: in-class audio/video live streaming, after-class playback, sound embedded in WebView courseware (audio, video, sound effects), classroom audio, classroom video, classroom game sound, UI sound effects, and so on. The classroom also includes many features that need to record audio: microphone linking, follow-along reading, group speaking, voice input in chat, speech recognition, and so on.

These features appear in the classroom in many combinations, each with different AVAudioSession requirements, and AVAudioSession is a singleton. Without unified management logic, conflicting settings are easy to produce.

At present, the main problems encountered in the industry are RTC voice not being heard and media voice being suppressed.

No RTC sound

The main reason RTC sound cannot be heard is that when another feature sets the AVAudioSession, it does not include AVAudioSessionCategoryOptionMixWithOthers in the options (mixing mode), so the RTC sound is interrupted by a higher-priority process. For example, audio embedded in a WebView is played in non-mixing mode; because the WebView plays sound in a system process with the highest priority, the RTC sound of the app's process is suppressed and voice playback fails.

Such problems are usually well hidden. If a problem shows up in a simple scenario, it can be caught in testing before release; but when several feature scenarios are chained together, it is often hard to find during testing. Moreover, without a complete online log-query system, such problems are very hard to locate in production, and they often linger for a long time simply because they cannot be pinpointed.

Media voice suppressed

In call-volume mode, media sound is suppressed and becomes quieter. A common case is the small-class scenario: while a student is streaming on the microphone and plays classroom audio/video or other media resources, the media sound is quieter than the RTC sound, making the media hard to hear clearly.

The reason is that iOS enables echo cancellation to protect the human-voice experience, so the sound of the media channel and background effects is suppressed.

Some leading apps in the education industry have not fundamentally solved this problem; many sidestep it at the product level, compromising the product to work around a technical issue. For example, while classroom audio/video resources are playing, all students' microphones are forcibly turned off by default; with the microphone off, students are under media volume, so no suppression occurs. When the classroom media finishes, students are allowed to turn the microphone back on. Solving the problem by avoiding the problem scenario is not a reusable approach.

RTC sound becomes smaller

The main reason RTC sound becomes smaller is that the sound comes out of the earpiece rather than the loudspeaker, which creates the illusion of a lower volume. In addition, on iOS 14, after using the RTC call mode and switching back to media mode, calling setCategory:PlayAndRecord + DefaultToSpeaker again can produce a low-volume problem.
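When the route has fallen back to the earpiece, output can be forced to the loudspeaker via the system's port override. A minimal sketch (an assumption about the remedy, not code from the original post):

```objc
#import <AVFoundation/AVFoundation.h>

// If sound is unexpectedly routed to the earpiece under PlayAndRecord,
// override the output port back to the loudspeaker.
AVAudioSession *session = [AVAudioSession sharedInstance];
if ([session.category isEqualToString:AVAudioSessionCategoryPlayAndRecord]) {
    [session overrideOutputAudioPort:AVAudioSessionPortOverrideSpeaker error:nil];
}
```

Including AVAudioSessionCategoryOptionDefaultToSpeaker in the category options (as the conventions below require) makes the speaker the default route in the first place.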


In view of the industry pain points above, and based on analysis of the underlying principles plus hands-on project experience, we organized a workable set of solutions covering coding conventions, a fallback strategy, and a problem-alarm mechanism.

No RTC sound heard / RTC sound reduced

The root cause of RTC sound problems is that some other module changed the AVAudioSession and did not restore it to the settings RTC requires after its feature finished. Audio/video SDKs themselves (Agora, Zego, etc.) contain some logic to cover this case, but such covering is invasive and not always reasonable, so it has limitations.

AVAudioSession modification conventions

Because the system cannot tell which module within the same process changed the audio session, other modules must follow these principles when changing the AVAudioSession while RTC is in use, to avoid losing the RTC sound:

  1. Before calling setCategory, a module should check whether the current audio session already meets its needs; if so, do not set it again, to avoid triggering the iOS 14 system bug.
  2. When a module needs to record, the category should be PlayAndRecord (to avoid interrupting audio that is playing, do not use the record-only category Record), and setCategory should be called only when the current category is not already PlayAndRecord.
  3. When a module only needs to play, it does not need to call setCategory if the current category is already PlayAndRecord, Playback, or Ambient.
  4. If the current category does not meet the module's needs, save the current audio session state before calling setCategory; after the audio feature has finished, call setCategory again to restore the previous audio session state.
  5. When setting the audio session, the category options should include AVAudioSessionCategoryOptionDefaultToSpeaker and AVAudioSessionCategoryOptionMixWithOthers; on iOS 10 and above, also include AVAudioSessionCategoryOptionAllowBluetooth.

The core code is as follows:

// When recording is required, set the audio session as follows:
if (![[AVAudioSession sharedInstance].category isEqualToString:AVAudioSessionCategoryPlayAndRecord]) {
    [RTCAudioSessionCacheManager cacheCurrentAudioSession];
    AVAudioSessionCategoryOptions categoryOptions = AVAudioSessionCategoryOptionDefaultToSpeaker | AVAudioSessionCategoryOptionMixWithOthers;
    if (@available(iOS 10.0, *)) {
        categoryOptions |= AVAudioSessionCategoryOptionAllowBluetooth;
    }
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayAndRecord withOptions:categoryOptions error:nil];
    [[AVAudioSession sharedInstance] setActive:YES error:nil];
}

// When the feature ends, reset the audio session
[RTCAudioSessionCacheManager resetToCachedAudioSession];
static AVAudioSessionCategory cachedCategory = nil;
static AVAudioSessionCategoryOptions cachedCategoryOptions = 0;

@implementation RTCAudioSessionCacheManager

// Cache the current settings before RTC changes the audio session
+ (void)cacheCurrentAudioSession {
    if (![[AVAudioSession sharedInstance].category isEqualToString:AVAudioSessionCategoryPlayback] &&
        ![[AVAudioSession sharedInstance].category isEqualToString:AVAudioSessionCategoryPlayAndRecord]) {
        @synchronized (self) {
            cachedCategory = [AVAudioSession sharedInstance].category;
            cachedCategoryOptions = [AVAudioSession sharedInstance].categoryOptions;
        }
    }
}

// Reset to the cached audio session settings
+ (void)resetToCachedAudioSession {
    if (!cachedCategory) {
        return;
    }
    BOOL needResetAudioSession = ![[AVAudioSession sharedInstance].category isEqualToString:cachedCategory] ||
        [AVAudioSession sharedInstance].categoryOptions != cachedCategoryOptions;
    if (needResetAudioSession) {
        dispatch_async(dispatch_get_global_queue(0, 0), ^{
            [[AVAudioSession sharedInstance] setCategory:cachedCategory withOptions:cachedCategoryOptions error:nil];
            [[AVAudioSession sharedInstance] setActive:YES error:nil];
            @synchronized (self) {
                cachedCategory = nil;
                cachedCategoryOptions = 0;
            }
        });
    }
}

@end

Fallback strategy

Given the complexity of the online classroom, even though all classroom feature code follows the AVAudioSession conventions and goes through strict code review, human error remains a risk, and with continuous business iteration it is impossible to guarantee there will never be a problem online. A reliable fallback strategy is therefore essential.

The basic idea of the strategy is to hook every change made to the AVAudioSession: whenever a module's setting does not meet the conventions, we force-correct it without affecting the feature, for example by adding the mixing option to the options.

Through method swizzling we can hook changes to the AVAudioSession: for example, exchange our kk_setCategory:withOptions:error: with the system's setCategory:withOptions:error:. Inside the exchanged method we check whether the options include AVAudioSessionCategoryOptionMixWithOthers, and add it if they do not.
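The exchange itself can be performed once at load time with the Objective-C runtime. A minimal sketch (the category name KKAudioFix is illustrative, not from the original post):

```objc
#import <AVFoundation/AVFoundation.h>
#import <objc/runtime.h>

@implementation AVAudioSession (KKAudioFix)

+ (void)load {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // Swap the system implementation with our hook implementation
        Method original = class_getInstanceMethod(self, @selector(setCategory:withOptions:error:));
        Method swizzled = class_getInstanceMethod(self, @selector(kk_setCategory:withOptions:error:));
        method_exchangeImplementations(original, swizzled);
    });
}

@end
```

After the exchange, every call to setCategory:withOptions:error: anywhere in the process lands in kk_setCategory:withOptions:error: below, which can correct the options before forwarding to the original implementation.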

- (BOOL)kk_setCategory:(AVAudioSessionCategory)category withOptions:(AVAudioSessionCategoryOptions)options error:(NSError **)outError {
    // In scenarios where the audio session needs correcting (RTC live),
    // add MixWithOthers to the options if it is missing
    BOOL addMixWithOthersEnable = shouldFixAudioSession && !(options & AVAudioSessionCategoryOptionMixWithOthers);
    if (addMixWithOthersEnable) {
        return [self kk_setCategory:category withOptions:options | AVAudioSessionCategoryOptionMixWithOthers error:outError];
    }
    return [self kk_setCategory:category withOptions:options error:outError];
}

However, the hook above only takes effect when the audio session is set through setCategory:withOptions:error:. Hooking setCategory:error: as well would cause a call-loop problem: in the iOS implementation, setCategory:error: internally calls setCategory:withOptions:error:, so with the method exchange in place the calls would nest.

To solve this problem, we listen for the AVAudioSessionRouteChangeNotification to hook category changes made through setCategory:error: rather than setCategory:withOptions:error:. This approach complements the swizzling above.

// Listen for audio session route changes
[[NSNotificationCenter defaultCenter] addObserver:self
                                         selector:@selector(handleRouteChangeNotification:)
                                             name:AVAudioSessionRouteChangeNotification
                                           object:nil];

- (void)handleRouteChangeNotification:(NSNotification *)notification {
    NSNumber *reasonNumber = notification.userInfo[AVAudioSessionRouteChangeReasonKey];
    AVAudioSessionRouteChangeReason reason = (AVAudioSessionRouteChangeReason)reasonNumber.unsignedIntegerValue;
    if (reason == AVAudioSessionRouteChangeReasonCategoryChange) {
        AVAudioSessionCategoryOptions currentCategoryOptions = [AVAudioSession sharedInstance].categoryOptions;
        AVAudioSessionCategory currentCategory = [AVAudioSession sharedInstance].category;
        // In scenarios where the audio session needs correcting (RTC live),
        // add MixWithOthers if the changed category does not include it
        if (shouldFixAudioSession && !(currentCategoryOptions & AVAudioSessionCategoryOptionMixWithOthers)) {
            [[AVAudioSession sharedInstance] setCategory:currentCategory withOptions:currentCategoryOptions | AVAudioSessionCategoryOptionMixWithOthers error:nil];
        }
    }
}

Alarm mechanism

Even with the modification conventions and the fallback strategy in place, classroom business iteration and iOS system upgrades mean problems can still occur online. We therefore built a problem-alarm mechanism: when a problem occurs in production, an alert is delivered to the work chat group in time, and based on the alert information we can investigate further through the logs. The alarm mechanism lets us respond to online problems much faster, instead of passively relying on student complaints, and push fixes through as early as possible.

When RTC voice is interrupted, the underlying audio/video SDK fires a warning callback with an error code (for example, Agora's warning code is 1025). When the corresponding warning code appears, it is combined with Slardar's alarm feature and pushed as a message to the Feishu (Lark) group. At the same time, when hooking changes to the AVAudioSession we capture the call stack, which lets us locate the module that triggered the change; combined with the user information attached to the alert, problems become much easier to pin down.
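Capturing the stack fits naturally into the swizzled setter. A minimal sketch, where the reporter class AudioSessionAlarmReporter and its method are hypothetical placeholders for the actual alarm pipeline:

```objc
// Inside the swizzled setter: record who is changing the session.
// AudioSessionAlarmReporter is a hypothetical placeholder, not a real API.
- (BOOL)kk_setCategory:(AVAudioSessionCategory)category withOptions:(AVAudioSessionCategoryOptions)options error:(NSError **)outError {
    // callStackSymbols returns the symbolic stack of the calling module
    NSArray<NSString *> *stack = [NSThread callStackSymbols];
    [AudioSessionAlarmReporter reportCategoryChange:category
                                            options:options
                                              stack:stack];
    return [self kk_setCategory:category withOptions:options error:outError];
}
```

Uploading the stack alongside the warning code lets the alert message name the offending module directly.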

Media voice suppressed

When media sound is playing under media volume and the device switches to call volume because a student connects the microphone, the system, by design, suppresses the media volume under the call volume and the sound becomes quieter.

To solve this problem, we use the audio-mixing and mixed-stream features provided by the audio/video SDK. The basic principle: when playing a media resource, we obtain the resource's PCM audio data and feed it to the RTC Audio Unit for mixing, so the RTC audio playback unit plays everything uniformly. If RTC is using call volume at that moment, the media resource is also played at call volume, and vice versa. This keeps the media resources and RTC under a single volume-control mechanism at all times and avoids differences in loudness.

Audio mixing means handing the SDK the local file path or URL of the audio, which the SDK reads and plays itself. Mixed stream means pointing the player at the video file: the player decodes and renders only the video, and throws the audio data to the SDK in real time; the SDK mixes the incoming audio with the RTC audio and plays it. In this project, we used the VOD SDK TTVideoEngine to implement video playback with the audio thrown out to the RTC SDK.
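For the audio-mixing path, RTC SDKs expose this directly. A hedged sketch using Agora's Objective-C API (startAudioMixing:loopback:replace:cycle: from the 3.x SDK line; the signature differs in other versions, and the asset name here is illustrative):

```objc
#import <AgoraRtcKit/AgoraRtcEngineKit.h>

// Play a courseware audio file through the RTC audio pipeline, so it
// always follows RTC's volume mechanism (call or media volume).
void playCoursewareAudio(AgoraRtcEngineKit *engine) {
    NSString *path = [[NSBundle mainBundle] pathForResource:@"courseware" ofType:@"mp3"]; // example asset
    // loopback:NO -> remote users hear it too; replace:NO -> mix with the mic signal;
    // cycle:1 -> play once
    [engine startAudioMixing:path loopback:NO replace:NO cycle:1];
}
```

Because the file is decoded and mixed inside the RTC engine, its loudness stays consistent with the RTC voice regardless of which volume system is active.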


Through the comprehensive solutions above, the sound problems have been effectively resolved. At the same time, the approach copes well with rapidly iterating classroom requirements and noticeably improves the online classroom experience.

