Baidu Brain far-field voice development kit review – quick start, excellent voice interaction experience


In recent years, artificial intelligence has become an increasingly hot topic and attracted more and more attention. Baidu began developing AI technology in 2010, more than eight years ago. Today, Baidu's AI patent portfolio ranks among the best in China and even worldwide.

I first came into contact with the Baidu AI Community at the end of 2018. Through using Baidu AI services such as character recognition and image recognition, I gradually came to appreciate both the power of AI and the progress Baidu has made: the technology covers a wider range, and recognition keeps getting faster and more accurate. This time I was honored to receive an invitation to review the Baidu far-field voice development kit. As someone who is not a professional tester, I will share the setup process and the problems I ran into along the way. If there are any mistakes, please point them out.

1、 Unboxing

The packaging is very simple: a white rectangular box with the "Baidu Brain" logo printed on top.

After opening the box, the first thing that catches the eye is a "Baidu Far-Field Voice Development Kit Manual", which covers information such as hardware purchase and development resources.

Under the manual is a rectangular box containing the power adapter and USB cable.

Beneath that box is the main body of the Baidu far-field voice development board, well protected by white foam.

Next, let's take a look at some details of the far-field development board:

Finally, a family portrait of the Baidu far-field voice development kit:

The Baidu Brain far-field voice development kit is based on the RK3308 development platform and provides a microphone configuration suited to smart speakers, smart home appliances, and in-vehicle scenarios. The whole kit includes a microphone array board, a development board, and a speaker and cavity that meet acoustic requirements. It supports signal processing algorithms such as sound source localization and noise cancellation for effective far-field pickup, along with far-field wake-up, far-field recognition, and speech synthesis, making voice development and evaluation simpler and more efficient.

The RK3308 development platform uses the quad-core 64-bit Arm Cortex-A35 RK3308 SoC, integrates a high-performance codec (8-channel ADC + 2-channel DAC), and directly supports up to an 8-channel digital microphone array, enabling high-precision sound capture and analysis. It is a multi-function AI+IoT development platform for audio applications, with rich operating system and service support that facilitates rapid AIoT development and product applications.

For more information about Baidu far field voice development kit, please refer to this link:

2、 Development testing

The test environment is an Ubuntu 16.04 64-bit virtual machine; the development platform is the RK3308.

After logging in to the development board, enter the /oem directory, which contains development instructions and some test examples.

(1) Connecting devices

1. Install the adb environment: sudo apt install adb

[email protected]:~$ sudo apt install adb
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  snapd-login-service xdg-desktop-portal xdg-desktop-portal-gtk
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  android-libadb android-libbase android-libcutils android-liblog
The following NEW packages will be installed:
  adb android-libadb android-libbase android-libcutils android-liblog
0 upgraded, 5 newly installed, 0 to remove and 6 not upgraded.
Need to get 141 kB of archives.
After this operation, 428 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 xenial/universe amd64 android-liblog amd64 1:6.0.1+r16-3 [16.6 kB]
Get:2 xenial/universe amd64 android-libbase amd64 1:6.0.1+r16-3 [9,014 B]
Get:3 xenial/universe amd64 android-libcutils amd64 1:6.0.1+r16-3 [18.7 kB]
Get:4 xenial/universe amd64 android-libadb amd64 1:6.0.1+r16-3 [53.2 kB]
Get:5 xenial/universe amd64 adb amd64 1:6.0.1+r16-3 [44.0 kB]
Fetched 141 kB in 2s (48.3 kB/s)
Selecting previously unselected package android-liblog.
(Reading database ... 215288 files and directories currently installed.)
Preparing to unpack .../android-liblog_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-liblog (1:6.0.1+r16-3) ...
Selecting previously unselected package android-libbase.
Preparing to unpack .../android-libbase_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-libbase (1:6.0.1+r16-3) ...
Selecting previously unselected package android-libcutils.
Preparing to unpack .../android-libcutils_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-libcutils (1:6.0.1+r16-3) ...
Selecting previously unselected package android-libadb.
Preparing to unpack .../android-libadb_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-libadb (1:6.0.1+r16-3) ...
Selecting previously unselected package adb.
Preparing to unpack .../adb_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking adb (1:6.0.1+r16-3) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up android-liblog (1:6.0.1+r16-3) ...
Setting up android-libbase (1:6.0.1+r16-3) ...
Setting up android-libcutils (1:6.0.1+r16-3) ...
Setting up android-libadb (1:6.0.1+r16-3) ...
Setting up adb (1:6.0.1+r16-3) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...

2. Check the installation result: adb version

[email protected]:~$ adb version
Android Debug Bridge version 1.0.32
Revision debian

3. Check whether the hardware is connected: adb devices

[email protected]:~$ adb devices
List of devices attached
e9901a0bf326eb31    device

4. Connect to the hardware: adb shell

[email protected]:~$ adb shell
/ # ls
bin       lib       mnt       root      sys       usr
data      lib32     oem       run       tmp       var
dev       linuxrc   opt       sbin      udisk
etc       media     proc      sdcard    userdata

(2) WiFi connection

1. Enter /userdata/cfg, the WiFi configuration directory: cd /userdata/cfg

Use vi to edit wpa_supplicant.conf: vi wpa_supplicant.conf
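
As a sketch, a minimal wpa_supplicant.conf typically looks like the fragment below; the SSID, passphrase, and control-socket path are placeholders to replace with your own values:

```
ctrl_interface=/var/run/wpa_supplicant
update_config=1

network={
    ssid="YourNetworkName"
    psk="YourPassword"
}
```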

After editing wpa_supplicant.conf, you can reconnect to the network with the following commands:

wpa_cli reconfigure
wpa_cli reconnect

Note: on the first attempt, the wpa_cli reconfigure command returned an error:

/userdata/cfg # wpa_cli reconfigure
Failed to connect to non-global ctrl_ifname: (nil)  error: No such file or directory

Start the supplicant manually with wpa_supplicant -B -i wlan0 -c and the path to wpa_supplicant.conf:

/userdata/cfg # wpa_supplicant -B -i wlan0 -c 
Successfully initialized wpa_supplicant

/userdata/cfg # wpa_cli reconfigure
Selected interface 'wlan0'

/userdata/cfg # wpa_cli reconnect
Selected interface 'wlan0'

Although the commands succeeded, checking the network connection shows it is still not up (wlan0 displays no IP address):

/userdata/cfg # ifconfig
lo        Link encap:Local Loopback  
          inet addr:  Mask:
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

p2p0      Link encap:Ethernet  HWaddr C6:60:34:AC:2C:AA  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr C4:60:34:AC:2C:AA  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:12 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:8555 (8.3 KiB)  TX bytes:7900 (7.7 KiB)

Restart the system (if one reboot doesn't work, try several times). Eventually it succeeds (wlan0 now shows an assigned IP address):

/userdata/cfg # reboot
/ # ifconfig
lo        Link encap:Local Loopback  
          inet addr:  Mask:
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

p2p0      Link encap:Ethernet  HWaddr C6:60:34:AC:2C:AA  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr C4:60:34:AC:2C:AA  
          inet addr:  Bcast:  Mask:
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2234 (2.1 KiB)  TX bytes:1481 (1.4 KiB)


(3) Run speech recognition example

Enter the /oem directory to view the voice-capability files: cd /oem

/oem # ls
BDSpeechSDK                            lost+found
alsa_audio_main_service                readme.txt
config_open_platfrom_rk3308_4_2.lst    version


View the documentation: cat readme.txt

Integration and use instructions:
1. Push library to device
   adb push lib/ /data
   adb push lib/ /data
   adb push lib/ /data
   adb push conf/config_open_platfrom_rk3308_4_2.lst /data
   adb push /data
   adb push bin/alsa_audio_main_service /data
   adb push bin/alsa_audio_client_sample /data
   adb shell sync
2. Create directory and modify permissions
   adb shell;cd /data
   chmod +x alsa_audio_*
   chmod +x
3. Run main service
   ./alsa_audio_main_service multi_4_2 &
   hw:0,0 is the sound card number and device number of the recording device; you can also configure asound.conf and use a logical PCM device name
4. To run an app such as duer_linux, add the /data directory to its dynamic library search path
   You can also run our sample program
   After signal processing, the recording file dump_pcm.pcm (dual-channel, 16 kHz, little-endian, 16-bit audio) is saved in the current directory.

How to save the original recording data:
    Run before starting recording:
    mkdir -p /data/local/aw.so_profile
    touch  /data/local/aw.so_profile/dump_switch
    touch  /data/local/aw.so_profile/dump_switch_wakets
    mkdir -p /data/local/aud_rec/
    chmod 777 /data/local/aud_rec/
    Check the configuration file for the dump directory settings:
AUDCAP_DBG_SWICH        "/tmp/aw.so_profile/"
AUDCAP_DBG_FLDER        "/tmp/aud_rec/"
AUDCAP_DBG_SAVED        "/tmp/aud_rec/last/"
Four channels of microphone data, two channels of reference data, one channel of recognition data, and one channel of wake-up data will be saved under /data/local/aud_rec.
The file data format is: 16 kHz, little-endian, 16-bit, mono
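
For the asound.conf option mentioned in the readme, a hypothetical logical PCM name could be defined as in the fragment below; the device name, card, and device numbers are assumptions, so check yours with arecord -l first:

```
# /etc/asound.conf - hypothetical logical capture device name
pcm.mic_array {
    type hw
    card 0      # assumption: replace with your sound card number
    device 0    # assumption: replace with your device number
}
```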


According to the documentation, the relevant files should be in the /data directory, but in practice they are under /oem, so enter the /oem directory and perform the steps there:

Mainly, run the following commands (they modify permissions and start the alsa_audio_main_service; according to the instructions, the ALSA service must be started before the speech recognition function can be used):

chmod +x alsa_audio_*
chmod +x	
./alsa_audio_main_service multi_4_2 &

After executing the above commands, use ps to check whether the ALSA service started correctly: ps -a | grep alsa


According to the manual, the sample directory in BDSpeechSDK contains speech recognition examples. Because the examples depend at runtime on libraries and resource files in the lib, resources, and extern directories, we need to point the loader at those libraries when starting.

The shared library can be found here:

Enter the directory and run the speech recognition example:

cd /oem/BDSpeechSDK/sample/wakeup
LD_LIBRARY_PATH=/oem ./e2e_wp_asr_test


The recognition result of “Xiaodu, how is the weather today”:

Notably, the kit uses streaming recognition with intermediate results!

(4) Speech recognition effect test

The kit was roughly tested by varying distance, speaking speed, and whether there was an obstruction between the speaker and the kit (the obstruction being a computer desk and monitor).

One meter:

With obstruction: "one meter wake-up test"

With obstruction, fast speech: "one meter second wake-up test"

With obstruction, fast speech: "what's the weather like today"

No obstruction, normal speech: "one meter third wake-up test"

No obstruction, fast speech: "what's the weather like today"

2 m (no obstruction, normal speech):

“Two meter wake-up test“

“Two meter secondary wake up test”

“It’s cooler”

3 m (no obstruction, normal speech):

“Three meter wake-up test”

“Three meter secondary wake-up test”

“I heard there will be a typhoon tomorrow”

5 m (no obstruction, normal speech):

“Wake up every five meters”


“Five meter second wake up”

“What to do if the typhoon comes”

6 m (no obstruction, normal speech):

Note: at this distance the wake-up phrase needs to be spoken a little louder; once awake, the kit can recognize normal-volume speech:

“Wake up once every six meters”

“Six meter second wake up”

“It’s sunny today”

Test results:

The speech recognition tests above show that the kit achieves good wake-up and recognition within 5 meters; beyond 5 meters, wake-up and recognition decline noticeably.

In addition, for common phrases, recognition is quite accurate (even at 6 meters), but for words with similar pronunciation ("one meter" vs. "corn", "two meters" vs. "Yang Mi", which are near-homophones in Chinese), accuracy drops a bit (this may also be related to my pronunciation).

Whether there is a (partial) obstruction between the sound source and the kit has little impact on recognition.

As long as the speech is not too fast, it is recognized normally.

Overall, apart from distance, similar-sounding words have the greatest influence on recognition results.

(5) Bluetooth connection

Enter the command bt_realtek_start to start Bluetooth:

Turn on the computer's Bluetooth; a device named Realtek_BT appears. Try pairing with it:

Pairing successful:

After the pairing is successful, you can use Bluetooth to play music and other operations.
Disconnect Bluetooth:

Problems found in the test:

1. When audio is first played over Bluetooth, the volume is very loud; yet after adjusting it, the maximum available volume turns out to be too low.

2. Although the kit's Bluetooth can connect to multiple devices at once (I connected two), after both paired successfully, playing music from one device, stopping, and then trying to play from the other fails; only the first device can play music.

(6) Recording and playing audio test

View the /tmp directory contents: cd /tmp

Record: arecord -D hw:2,0 -c 8 -r 16000 -f S16_LE /tmp/test.wav

After recording, view the catalog file information:

Use the aplay command to play the recording file: aplay test.wav

The recording is good.

(7) Speech synthesis test

The sample program sends the text "456hello, good weather today" to the server, which generates the corresponding audio and saves it as a PCM file for the user to play back.
Enter the speech synthesis sample directory: cd /oem/BDSpeechSDK/sample/tts

Run the speech synthesis test: LD_LIBRARY_PATH=/oem ./online_test

After running, an xxx.pcm file is generated in the current directory, where xxx is a timestamp. Execute the following at the terminal to hear the synthesized speech: aplay -t raw -c 1 -f S16_LE -r 16000 xxx.pcm
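
Since a raw PCM file carries no header, its duration can be worked out from the file size alone. A small sketch, assuming the 16 kHz / 16-bit / mono format stated above (the demo file here is generated locally, not produced by the kit):

```shell
# create a dummy 1-second raw PCM file for illustration (16000 samples * 2 bytes)
head -c 32000 /dev/zero > demo.pcm

# duration_ms = bytes * 1000 / (rate * bytes_per_sample * channels)
BYTES=$(wc -c < demo.pcm)
DURATION_MS=$(( BYTES * 1000 / (16000 * 2 * 1) ))
echo "${DURATION_MS} ms"
```

For the dummy file above this prints 1000 ms; pointing it at an actual xxx.pcm gives the length of the synthesized speech.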

(8) Cross compilation

1. Download SDK

Copy the BDSpeechSDK directory from /oem on the RK3308 board to the virtual machine; you can use adb pull /oem/BDSpeechSDK to download it and then place the SDK in the virtual machine.

Here I downloaded the entire /oem directory directly into my "download" directory.

2. Cross tool chain:


Extraction code: we2t

Code for the RK3308 cannot be compiled with a standard Linux compiler; we need a cross-compilation toolchain, a special compiler that runs on platform A but produces binaries for platform B.

I copied the cross-compilation toolchain to the virtual machine's desktop.

3. Build the project directory structure

mkdir myProject
cd myProject
touch Makefile
mkdir src
touch src/main.cpp

Create the following directory structure:

├── Makefile
└── src
     └── main.cpp

Write (copy) sample code

There is demo code in e2e_wp_asr_test.cpp under sample/wakeup/src. Here I directly copy that sample code to replace main.cpp.

I make no changes here, just a straight copy; the first goal is to get cross compilation working and run the binary on the board as soon as possible.

Write (copy) makefile code

A Makefile lets the project compile and link quickly, which saves a lot of effort. Since I am not a C++ development engineer by trade, I copied the Makefile code from sample/wakeup/:

#make src=src/***.cpp
TARGET=$(basename $(notdir $(FILE_NAME)))

INCLUDE=-I../../include -I../../include/ASR -I../../include/TTS -I../../extern/include -I../../extern/include/longconnect
CPPFLAGS=-Wall -fopenmp -O2 -fPIC -g -D__LINUX__ -Wl,-rpath=../../lib,--disable-new-dtags,--copy-dt-needed-entries -Wl,-rpath=../../extern/lib,--disable-new-dtags -L../../lib -lBDSpeechSDK -L../../extern/lib -lzlog -llongconnect -lnghttp2 -lcurl -lssl -lcrypto -lz -lAudioEncoder -liconv -lAudioDecoder -lhttpDNS -lbd_alsa_audio_client -lgomp -lrt -ldl -lpthread
ifneq ($(strip $(SYS_ROOT)),)
# reconstructed: apply the cross toolchain's sysroot when SYS_ROOT is set
CPPFLAGS+=--sysroot=$(SYS_ROOT)
endif

SRC_PATH=src                          # reconstructed: source directory
SRC_FILE=$(shell cd $(SRC_PATH)&&echo *.cpp)
SRC=$(foreach n,$(SRC_FILE),$(SRC_PATH)/$(n))

LIST_ALL_FILES=$(shell find . -maxdepth 1)
SOURCES=. ./Makefile ./src
RM_FILES=$(filter-out $(SOURCES),$(LIST_ALL_FILES))   # reconstructed: build artifacts to delete

# reconstructed build and clean rules (the original listing is truncated here)
CXX=arm-rockchip-linux-gnueabihf-g++
$(TARGET): $(SRC)
	$(CXX) $(SRC) $(INCLUDE) $(CPPFLAGS) -o $(TARGET)

clean:
	-rm -rf $(RM_FILES)

Try compiling:

After configuration, execute the following in the directory containing the Makefile:

export PATH=/home/snow/Desktop1/rk3308_arm_tool_chain/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=/home/snow/Desktop1/rk3308_arm_tool_chain/arm-rockchip-linux-gnueabihf/sysroot

In the above commands, /home/snow/Desktop1/rk3308_arm_tool_chain is meant to be the root directory of the rk3308_arm_tool_chain toolchain (with bin/ under it). If the path is filled in incorrectly, the following error appears:


make: arm-rockchip-linux-gnueabihf-g++: command not found

You can enter the rk3308_arm_tool_chain directory and use the pwd command to obtain its actual path:
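
Before recompiling, it may help to confirm that the cross compiler is actually reachable through PATH. A small sketch (the compiler name is taken from the "command not found" error above):

```shell
# check whether the cross g++ can be found on PATH
if command -v arm-rockchip-linux-gnueabihf-g++ >/dev/null 2>&1; then
    STATUS="found"
else
    STATUS="missing"
fi
echo "toolchain ${STATUS}"
```

If it prints "toolchain missing", fix the export PATH line before running make again.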


Using the correct path, recompile:

export PATH=/home/snow/Desktop/rk3308_arm_tool_chain/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=/home/snow/Desktop/rk3308_arm_tool_chain/arm-rockchip-linux-gnueabihf/sysroot


If the above error occurs, the ALSA .so library is missing. Pull the library file from the /oem directory on the board and copy it into BDSpeechSDK/lib, the directory dedicated to external dependency libraries.

Then compile again; this time there are no error messages and the build passes.

After a successful build, a new main file appears in the project directory; this is the compiled executable.

Copy the main executable to the /tmp directory (note that /tmp is cleared on power loss): adb push ./main /tmp

Run program:

LD_LIBRARY_PATH=/oem:/oem/BDSpeechSDK/lib:/oem/BDSpeechSDK/extern/lib ./main

An error occurred:

error:5, domain:38, desc:Wakeup: dat file invalid., sn:

This means that the dat file was not loaded successfully.

Looking at the code, the wakeup_config function sets the path of the dat file to ../resources/asr_resource/esis_resource.pkg.
Either change this to an absolute path, or shorten it to ./esis_resource.pkg and copy the pkg file alongside the binary.


Then recompile and adb push the binary to /tmp again.

The same error appears, because although the file path was changed, esis_resource.pkg has not been pushed to /tmp yet. Go to /home/snow/download/oem/BDSpeechSDK/resources/asr_resource and execute adb push ./esis_resource.pkg /tmp:

Running again gives another error: error:-1, domain:10, desc:alsa_audio_client_open failed, sn:

Because our main also depends on alsa services, we need to start alsa services:

cd /oem
chmod +x alsa_audio_*
chmod +x
./alsa_audio_main_service multi_4_2 &


Alternatively, write the above commands into the startup script under /oem, an execution file that runs after the board powers on. Everything that should start at boot can go in this file, so the board will start the ALSA service for us automatically next time.
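
As a sketch of that idea (the startup script's real name is truncated in this post, so boot_script_demo.sh below is a placeholder; on the board it would live under /oem):

```shell
BOOT_SCRIPT=./boot_script_demo.sh   # placeholder name - use the board's actual /oem startup script

# append the ALSA service startup so it runs on every power-up
cat >> "$BOOT_SCRIPT" <<'EOF'
cd /oem
./alsa_audio_main_service multi_4_2 &
EOF
chmod +x "$BOOT_SCRIPT"
```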

After successful execution, run the main program again:

cd /tmp
LD_LIBRARY_PATH=/oem:/oem/BDSpeechSDK/lib:/oem/BDSpeechSDK/extern/lib ./main

Execution succeeded.

(9) Cross compilation speech synthesis

Following the same method, we can compile a speech synthesis example (you can change the synthesized text to whatever you like; here I changed it to "Hello world, today is the Qixi Festival, a traditional Chinese holiday!" With more work, you could accept typed input and synthesize it to speech).


The above warning can be ignored.
Push the compiled executable to the development board and run it:

An error occurs. Checking the source code carefully, the main function references the configuration file speech_sdk_log.conf via the path ../resources/speech_sdk_log.conf:

So push the speech_sdk_log.conf file to /tmp as well:


LD_LIBRARY_PATH=/oem:/oem/BDSpeechSDK/lib:/oem/BDSpeechSDK/extern/lib ./main


After successful execution, there is a new 6832.pcm file in the /tmp directory. Run aplay -t raw -c 1 -f S16_LE -r 16000 6832.pcm to play the synthesized female voice.


3、 Product suggestions

After a week of testing, the Baidu far-field voice development kit proves excellent at voice wake-up and speech recognition, with a high overall recognition rate. With more training on similar-sounding words, it would be even better. In addition, based on personal experience, here are some suggestions for the product:

1. Improve sound quality and voice diversity

In the future, more voices with different styles could be offered for users to choose from, with different timbres suited to scenarios such as shopping, information queries, and audio playback, gradually making "Xiaodu" more human and personalized.

2. Improve the voice interaction capability

Provide higher-quality voice interaction by strengthening dialogue understanding and dialogue management. Through continuous practice, the speaker could "understand" the user's intent, return more accurate results, and make it easy to build professional, controllable, and stable end-to-end voice interaction.

3. Voiceprint recognition

During voice interaction, recognize different users by voiceprint and use that to interpret the interaction content. This could be applied to voiceprint unlocking and to understanding speech, including prioritizing command execution when multiple people speak.

There is still a lot of room for voice interaction to grow, and today's functionality is not yet that extensive. But with sustained development, continued data collection, and scenario optimization, deeper applications will emerge across many fields.

Author: let Tianya
