Introduction to Using the Core Audio APIs on Windows


Starting with Windows Vista, the audio system was substantially redesigned compared with earlier versions, and a new set of low-level APIs, the Core Audio APIs, was introduced. These low-level APIs provide services for high-level APIs such as Media Foundation (which is intended to replace older high-level APIs such as DirectShow). The Core Audio APIs offer low latency, high reliability, and security.

This article introduces the use of these APIs from the perspective of real-time audio and video scenarios.

The Core Audio APIs consist of MMDevice, EndpointVolume, WASAPI, and others. For a real-time audio and video system, the main ones used are the MMDevice and EndpointVolume APIs. Their position in the system is as follows:

[Figure: position of the Core Audio APIs in the system]

The use of audio devices in real-time audio and video can be divided into:

1. Device list management

2. Device initialization

3. Device feature management

4. Data interaction

5. Volume management

6. Device endpoint monitoring

Next, we introduce the implementation of each of these functions.

1. Device list management

Audio devices are managed through the MMDevice API.

First, we create an IMMDeviceEnumerator object, through which the related functions are called.

IMMDeviceEnumerator* ptrEnumerator = nullptr;
// COM must already be initialized on the calling thread (CoInitializeEx).
HRESULT hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), reinterpret_cast<void**>(&ptrEnumerator));

Through IMMDeviceEnumerator we can get the system default device (GetDefaultAudioEndpoint), get the device collection (IMMDeviceCollection, via EnumAudioEndpoints), get a specified device (GetDevice), and register device notifications (IMMNotificationClient, via RegisterEndpointNotificationCallback), which report device plugging, unplugging, and state changes.

With these methods, we can get the system default device, traverse the device list, open a specified device, and monitor device changes. This covers the device management functions needed in real-time audio and video.

2. Device initialization

Starting the audio device is a critical point for the reliability of the whole audio module. By device type and data acquisition method, devices fall into three categories: microphone capture, speaker playback, and speaker capture (loopback).

First, we need an IMMDevice object, which can be obtained from the device management functions described above.

IMMDevice* pDevice = nullptr;

// Get the default device
ptrEnumerator->GetDefaultAudioEndpoint((EDataFlow)dir, (ERole)role /* e.g. eCommunications */, &pDevice);

// Get by path
ptrEnumerator->GetDevice(device_path, &pDevice);

// Get by index from a device collection
pCollection->Item(index, &pDevice);

Next, we obtain an IAudioClient through the IMMDevice. The format setup and initialization of the device are done through the IAudioClient object. Devices are generally opened in shared mode: microphone capture and speaker playback use event-driven mode to process data, while speaker capture uses loopback mode to drive data processing. A simple example is as follows:

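// The flags below follow the modes described above. The buffer duration (0)
// and the activation call are assumptions of this reconstruction.
pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL, (void**)&ptrClient);

// mic capturer: shared mode, event-driven
ptrClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_EVENTCALLBACK, 0, 0, (WAVEFORMATEX*)&Wfx, NULL);

// playout render: shared mode, event-driven
ptrClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_EVENTCALLBACK, 0, 0, (WAVEFORMATEX*)&Wfx, NULL);

// playout capturer: shared mode, loopback (client activated on the render endpoint)
ptrClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, 0, 0, (WAVEFORMATEX*)&Wfx, NULL);
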
Here, Wfx is the device format parameter. Generally, to make sure the device can be opened, the default format (obtained via IAudioClient::GetMixFormat) is used. If you need a custom format, you can use the IAudioClient::IsFormatSupported method to probe the formats the device supports.
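When building a custom format to probe with IAudioClient::IsFormatSupported, the derived fields of the format have to agree with the sampling rate, channel count, and bit depth. The following is a minimal portable sketch of that arithmetic for uncompressed PCM. `PcmFormat`, `make_pcm_format`, and `bytes_per_10ms` are hypothetical names that only mirror the WAVEFORMATEX fields of the same names; no Windows API is called.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that mirrors the WAVEFORMATEX fields of the same names.
struct PcmFormat {
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint16_t wBitsPerSample;
    uint16_t nBlockAlign;      // bytes per frame (one sample for every channel)
    uint32_t nAvgBytesPerSec;  // bytes per second of audio
};

// Fill in the derived fields for uncompressed PCM.
inline PcmFormat make_pcm_format(uint32_t sampleRate, uint16_t channels,
                                 uint16_t bitsPerSample) {
    assert(sampleRate > 0 && channels > 0);
    assert(bitsPerSample > 0 && bitsPerSample % 8 == 0);
    PcmFormat f{};
    f.nChannels = channels;
    f.nSamplesPerSec = sampleRate;
    f.wBitsPerSample = bitsPerSample;
    f.nBlockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    f.nAvgBytesPerSec = sampleRate * f.nBlockAlign;
    return f;
}

// Bytes in one 10 ms block (nSamplesPerSec / 100 frames).
inline uint32_t bytes_per_10ms(const PcmFormat& f) {
    return (f.nSamplesPerSec / 100) * f.nBlockAlign;
}
```

For example, 48 kHz stereo 16-bit PCM gives 4 bytes per frame and 1920 bytes per 10 ms block.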

3. Device feature management

For microphone devices, we usually need to process the captured data. Some hardware devices and systems support noise suppression, gain control, and echo cancellation. However, on Windows in general, devices are diverse and hard to control, so most applications rely on software algorithms instead. To check whether a device has such processing enabled, and with which parameters, we need the DeviceTopology API.

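IDeviceTopology* pTopo = nullptr;

// Activate expects a void** output parameter, so the pointer must be cast.
pDevice->Activate(__uuidof(IDeviceTopology), CLSCTX_INPROC_SERVER, NULL, reinterpret_cast<void**>(&pTopo));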

Through IDeviceTopology we can traverse the IConnector objects, obtain capability objects such as IAudioAutoGainControl and IAudioVolumeLevel, and operate on those capabilities.

Note: IConnector objects may be nested in loops during traversal, and when handling the IPart member objects of an IConnector, the IPart type must be distinguished.
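Because connectors can loop back on each other, the traversal needs to remember which parts it has already seen. The sketch below models this with a plain visited set. It is not the DeviceTopology API itself: `Part`, `PartKind`, and `collect_subunits` are hypothetical names, and the index-based links stand in for the connections that would be followed through IConnector and IPart in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a topology node. The two kinds correspond to the
// connector and subunit part types that the traversal has to tell apart.
enum class PartKind { Connector, Subunit };

struct Part {
    PartKind kind;
    std::string name;
    std::vector<std::size_t> links;  // indices of connected parts
};

// Depth-first walk that records each subunit once and never revisits a part,
// so nested or looping connectors cannot cause infinite recursion.
inline void collect_subunits(const std::vector<Part>& parts, std::size_t index,
                             std::unordered_set<std::size_t>& visited,
                             std::vector<std::string>& out) {
    assert(index < parts.size());
    if (!visited.insert(index).second) return;  // already seen
    const Part& p = parts[index];
    if (p.kind == PartKind::Subunit) out.push_back(p.name);
    for (std::size_t next : p.links) collect_subunits(parts, next, visited, out);
}
```

In real code, the subunit branch is where a capability interface such as IAudioAutoGainControl would be queried.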

4. Data interaction

When initializing a device, we choose a mode according to the device type. Each device type has its own data-driving flow in its mode:

Microphone capture:

[Figure: microphone capture data flow]

Speaker playback:

[Figure: speaker playback data flow]

Speaker capture:

[Figure: speaker capture data flow]

When exchanging data with the device, we need to obtain the service object that matches the data acquisition mode. The capture side uses the IAudioCaptureClient service to read device data, and the playback side uses the IAudioRenderClient service to get the pointer to the device's input buffer. Examples are as follows:


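// Capture side: audio in (microphone) or audio out (loopback)
IAudioCaptureClient* ptrCaptureClient = nullptr;
ptrClient->GetService(__uuidof(IAudioCaptureClient), (void**)&ptrCaptureClient);

{ // worker thread
    BYTE* pData = nullptr;
    UINT32 framesAvailable = 0;
    DWORD flags = 0;
    UINT64 recPos = 0;
    UINT64 recTime = 0;

    // Wait for the capture event

    ptrCaptureClient->GetBuffer(
        &pData,            // packet that is ready to be read
        &framesAvailable,  // number of frames in the captured packet (can be zero)
        &flags,            // buffer flags (must be checked)
        &recPos,           // device position of the first audio frame in the packet
        &recTime);         // performance counter value at the time of recording

    // Process pData

    ptrCaptureClient->ReleaseBuffer(framesAvailable);
}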
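The flags returned with each captured packet need to be checked before pData is used: when the silent flag is set, the packet content should be ignored and treated as silence. A minimal portable sketch of that step follows. `append_packet` and `kBufferFlagSilent` are hypothetical names; the constant mirrors the value of AUDCLNT_BUFFERFLAGS_SILENT and is redefined only so that the sketch builds without the Windows SDK. Writing zeros as silence assumes a signed PCM or float format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors the value of AUDCLNT_BUFFERFLAGS_SILENT from audioclient.h.
constexpr uint32_t kBufferFlagSilent = 0x2;

// Append one captured packet to `out`. When the silent flag is set, the raw
// bytes are ignored and zeros are appended instead.
inline void append_packet(const uint8_t* data, uint32_t frames, uint32_t flags,
                          uint32_t blockAlign, std::vector<uint8_t>& out) {
    const std::size_t bytes = static_cast<std::size_t>(frames) * blockAlign;
    if (bytes == 0) return;  // an empty packet is valid
    const std::size_t old = out.size();
    out.resize(old + bytes, 0);  // new bytes start out as silence
    if ((flags & kBufferFlagSilent) == 0) {
        assert(data != nullptr);
        std::memcpy(out.data() + old, data, bytes);
    }
}
```

In the capture loop, this would be called between GetBuffer and ReleaseBuffer with the frame count and flags returned by GetBuffer.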




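// Playback side: audio out
IAudioRenderClient* ptrRenderClient = nullptr;
ptrClient->GetService(__uuidof(IAudioRenderClient), (void**)&ptrRenderClient);

{ // worker thread
    BYTE* pData = nullptr;  // points into the device buffer
    UINT32 bufferLength = 0;
    ptrClient->GetBufferSize(&bufferLength);  // total buffer size, in frames

    UINT32 playBlockSize = nSamplesPerSec / 100;  // frames per 10 ms block

    // Wait for the render event

    UINT32 padding = 0;
    ptrClient->GetCurrentPadding(&padding);  // frames queued but not yet played

    if (bufferLength - padding > playBlockSize)
    {
        ptrRenderClient->GetBuffer(playBlockSize, &pData);

        // Request playout data and copy it into pData

        // Release through the render client, not the capture client.
        ptrRenderClient->ReleaseBuffer(playBlockSize, 0);
    }
}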
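The free-space check in the playback loop is plain unsigned arithmetic, and it is worth isolating because a padding value larger than the buffer size would wrap around silently. The sketch below is a portable model of that calculation with hypothetical function names. It treats a free space of exactly one block as writable, which is slightly more permissive than the strict comparison in the example above.

```cpp
#include <cassert>
#include <cstdint>

// Frames in one 10 ms block at the given sample rate.
inline uint32_t block_frames_10ms(uint32_t sampleRate) {
    return sampleRate / 100;
}

// Free space in the endpoint buffer: total size minus frames still queued.
inline uint32_t free_frames(uint32_t bufferLength, uint32_t padding) {
    assert(padding <= bufferLength);  // otherwise the subtraction would wrap
    return bufferLength - padding;
}

// Number of whole blocks that fit into the free space right now.
inline uint32_t writable_blocks(uint32_t bufferLength, uint32_t padding,
                                uint32_t blockFrames) {
    assert(blockFrames > 0);
    return free_frames(bufferLength, padding) / blockFrames;
}
```

With a 1056-frame buffer at 48 kHz and 480 frames still queued, one 480-frame block can be written; with 1000 frames queued, the thread should wait for the next event.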



In actual data interaction, a separate thread is needed to handle GetBuffer and ReleaseBuffer. Microphone capture and speaker playback are driven by device events; the event handle to be signaled can be set after device initialization (IAudioClient::SetEventHandle).

In a complete audio and video system, the device data thread also needs to measure data processing time, track the capture and playback buffer sizes, monitor and check the device status, and calculate the AEC delay.

5. Volume management

Volume management generally only handles the volume of the current device after the device has been selected, so IAudioEndpointVolume is typically used. It is obtained through the IMMDevice device object:

IAudioEndpointVolume* pVolume;

pDevice->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, NULL, reinterpret_cast<void**>(&pVolume));

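With the IAudioEndpointVolume object, we can control the volume of the current device:

float fLevel = 0.0f;  // scalar volume in the range 0.0 to 1.0
pVolume->GetMasterVolumeLevelScalar(&fLevel);
pVolume->SetMasterVolumeLevelScalar(fLevel, NULL);

Mute control:

BOOL mute = FALSE;
pVolume->GetMute(&mute);
pVolume->SetMute(mute, NULL);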
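SetMasterVolumeLevelScalar expects a value between 0.0 and 1.0, while an application usually keeps its volume as an integer step. A minimal sketch of the conversion, with clamping so that out-of-range input never reaches the device, could look like this. The function names and the integer level range are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Map a UI level in [0, maxLevel] to the 0.0..1.0 scalar range, clamping
// out-of-range input.
inline float level_to_scalar(int level, int maxLevel) {
    assert(maxLevel > 0);
    const int clamped = std::min(std::max(level, 0), maxLevel);
    return static_cast<float>(clamped) / static_cast<float>(maxLevel);
}

// Inverse mapping, rounding to the nearest UI step.
inline int scalar_to_level(float scalar, int maxLevel) {
    assert(maxLevel > 0);
    const float clamped = std::min(std::max(scalar, 0.0f), 1.0f);
    return static_cast<int>(clamped * static_cast<float>(maxLevel) + 0.5f);
}
```

The result of level_to_scalar can be passed as the first argument of SetMasterVolumeLevelScalar, and scalar_to_level can convert a value read back from the device for display.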

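We can also register an IAudioEndpointVolumeCallback to monitor the volume state:

IAudioEndpointVolumeCallback* cbSessionVolume;  // must be implemented by the application
pVolume->RegisterControlChangeNotify(cbSessionVolume);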


6. Device endpoint monitoring

During operation, besides the device being plugged in or unplugged, some of its attributes may also change. These changes can be monitored through IAudioSessionEvents:

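IAudioSessionControl* ptrSessionControl = nullptr;
ptrClient->GetService(__uuidof(IAudioSessionControl), (void**)&ptrSessionControl);

IAudioSessionEvents* notify;  // must be implemented by the application
ptrSessionControl->RegisterAudioSessionNotification(notify);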


The callback can be used to monitor the connection status and name changes of the device.

Some precautions:

1. Thread priority

In practical engineering, the audio thread needs special handling. The usual approach is to load the system module Avrt.dll, resolve its functions dynamically, and associate the calling thread with a specified task (Pro Audio). The code is as follows:

Function binding:

avrt_module_ = LoadLibrary(TEXT("Avrt.dll"));
if (avrt_module_)
{
    _PAvRevertMmThreadCharacteristics = (PAvRevertMmThreadCharacteristics)GetProcAddress(avrt_module_, "AvRevertMmThreadCharacteristics");
    _PAvSetMmThreadCharacteristicsA = (PAvSetMmThreadCharacteristicsA)GetProcAddress(avrt_module_, "AvSetMmThreadCharacteristicsA");
    _PAvSetMmThreadPriority = (PAvSetMmThreadPriority)GetProcAddress(avrt_module_, "AvSetMmThreadPriority");
}


Associating the data processing thread with the task:

DWORD taskIndex = 0;
hMmTask_ = _PAvSetMmThreadCharacteristicsA("Pro Audio", &taskIndex);
if (hMmTask_)
{
    _PAvSetMmThreadPriority(hMmTask_, AVRT_PRIORITY_CRITICAL);
}


Binding the thread to the task effectively improves the reliability of the audio data processing thread.

2. Worker thread

Device initialization and release operations need to be handled on one dedicated thread. Some system COM objects must be released on the thread that created them; otherwise the release may crash. Operations such as volume selection and monitoring can be handled on the user thread, but they require careful attention to thread safety.
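One common way to keep creation and release on the same thread is to route both through a single task queue. The following is a minimal sketch using only the C++ standard library; `DeviceThread` is a hypothetical helper, not part of any Windows API, and a real implementation would also need error handling and a way to return results to the caller.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: runs every posted task on one dedicated thread, so an
// object created by one task is released on the same thread by a later task.
class DeviceThread {
public:
    DeviceThread() : worker_([this] { Run(); }) {}

    ~DeviceThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // queued tasks are drained before the thread exits
    }

    void Post(std::function<void()> task) {
        assert(task != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so a task may post further tasks
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

In a Windows audio module, the first posted task would typically initialize COM and create the device objects, and the last one would release them and uninitialize COM, so both ends of the object lifetime run on the worker thread.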

3. Device format selection

When a custom format is used for the device's sampling rate, channel count, and other format parameters, format matching may fail, or device initialization may fail even after a matching format has been selected. In such cases, the usual approach is to start the device directly with the default format.
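The fallback described above can be sketched as a small selection function. This is a portable model under stated assumptions: `Format` is a simplified hypothetical format description, and `isSupported` and `tryInit` are assumed callbacks that would wrap IAudioClient::IsFormatSupported and IAudioClient::Initialize in real code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified format description.
struct Format {
    uint32_t sampleRate;
    uint16_t channels;
};

inline bool operator==(const Format& a, const Format& b) {
    return a.sampleRate == b.sampleRate && a.channels == b.channels;
}

// Return the first preferred format that passes both the support check and
// initialization; otherwise fall back to the default format.
inline Format select_format(const std::vector<Format>& preferred,
                            const Format& defaultFormat,
                            const std::function<bool(const Format&)>& isSupported,
                            const std::function<bool(const Format&)>& tryInit) {
    assert(isSupported != nullptr && tryInit != nullptr);
    for (const Format& f : preferred) {
        if (isSupported(f) && tryInit(f)) return f;
    }
    return defaultFormat;
}
```

Keeping the two checks separate reflects the case noted above where a format is reported as supported but initialization still fails.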

4. Data processing exceptions

When the data thread processes audio data, event wait timeouts and device object errors are the usual failures. The typical handling is to exit the data thread and stop the device, then check whether the current device still works, and then either restart the current device or switch to the default device.
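The decision between restarting the current device and switching to the default device can be kept in one small function. The sketch below is only an illustration: the names are hypothetical, and the restart limit (`maxRestarts`) is an assumption added here to avoid restarting a failing device forever, not something the handling described above prescribes.

```cpp
#include <cassert>

enum class Action { RestartCurrentDevice, SwitchToDefaultDevice };

// Tracks how often the current device has been restarted in a row.
struct RecoveryState {
    int restartAttempts = 0;
};

// Called after the data thread has exited and the device has been stopped.
// `currentDeviceUsable` is the result of the health check on the current device.
inline Action next_action(RecoveryState& state, bool currentDeviceUsable,
                          int maxRestarts) {
    assert(maxRestarts >= 0);
    if (currentDeviceUsable && state.restartAttempts < maxRestarts) {
        ++state.restartAttempts;
        return Action::RestartCurrentDevice;
    }
    state.restartAttempts = 0;  // start fresh on the new device
    return Action::SwitchToDefaultDevice;
}
```

A caller would reset the state once the device has been running normally for some time.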
