In recent years, artificial intelligence has become an increasingly hot topic and attracted more and more attention. Baidu began developing AI technology in 2010, more than eight years ago, and today its AI patent portfolio ranks among the best in China and even worldwide.
I first came into contact with the Baidu AI Community at the end of 2018. Through using Baidu AI features such as character recognition and image recognition, I gradually came to appreciate the power of AI technology and the progress Baidu has made: the technology keeps broadening in scope, recognition keeps getting faster, and accuracy keeps improving. This time I was honored to receive an invitation to evaluate the Baidu far-field voice development kit. As someone who is not a professional tester, I will share my experience using the kit and the problems I ran into along the way. If there are any mistakes, please point them out.
1、 Unboxing
The whole package is very simple: a white rectangular box with the “Baidu Brain” logo printed on top.
After opening the box, the first thing that catches the eye is the “Baidu far-field voice development kit manual”, which covers hardware purchase, development documentation and other information.
Take out the manual and you see a rectangular box holding the power adapter and USB cable.
Under that box is the main body of the Baidu far-field voice development board, well protected by white foam.
Next, a look at some details of the far-field development board:
Finally, the Baidu far-field voice development kit family portrait:
The Baidu Brain far-field voice development kit, based on the RK3308 development platform, offers microphone options suitable for smart speakers, smart home appliances and in-vehicle scenarios. The kit includes a microphone array board, a development board, and a speaker and cavity that meet acoustic requirements. It supports signal-processing algorithms such as sound-source localization and noise cancellation for effective pickup at far-field distances, and supports far-field wake-up, far-field recognition and speech synthesis, making voice development and evaluation simpler and more efficient.
The RK3308 development platform uses the RK3308 series 64-bit quad-core Arm Cortex-A35 processor and integrates a high-performance codec (8-channel ADC + 2-channel DAC), directly supporting up to an 8-channel digital microphone array plus loopback reference for high-precision sound capture and analysis. It is a multi-function AI + IoT development platform for audio applications, with rich operating system and service support that speeds up AIoT development and productization.
For more information about Baidu far field voice development kit, please refer to this link: https://aim.baidu.com/product/b226a947-4660-4e27-83b4-877bf63b8627
2、 Development testing
The test environment is an Ubuntu 16.04 64-bit virtual machine, and the development platform is the RK3308.
After logging in to the development board, enter the /oem directory, which contains development-related instructions and some test examples.
（1） Connecting devices
1. Install the adb environment: sudo apt install adb
[email protected]:~$ sudo apt install adb
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  snapd-login-service xdg-desktop-portal xdg-desktop-portal-gtk
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  android-libadb android-libbase android-libcutils android-liblog
The following NEW packages will be installed:
  adb android-libadb android-libbase android-libcutils android-liblog
0 upgraded, 5 newly installed, 0 to remove and 6 not upgraded.
Need to get 141 kB of archives.
After this operation, 428 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://cn.archive.ubuntu.com/ubuntu xenial/universe amd64 android-liblog amd64 1:6.0.1+r16-3 [16.6 kB]
Get:2 http://cn.archive.ubuntu.com/ubuntu xenial/universe amd64 android-libbase amd64 1:6.0.1+r16-3 [9,014 B]
Get:3 http://cn.archive.ubuntu.com/ubuntu xenial/universe amd64 android-libcutils amd64 1:6.0.1+r16-3 [18.7 kB]
Get:4 http://cn.archive.ubuntu.com/ubuntu xenial/universe amd64 android-libadb amd64 1:6.0.1+r16-3 [53.2 kB]
Get:5 http://cn.archive.ubuntu.com/ubuntu xenial/universe amd64 adb amd64 1:6.0.1+r16-3 [44.0 kB]
Fetched 141 kB in 2s (48.3 kB/s)
Selecting previously unselected package android-liblog.
(Reading database ... 215288 files and directories currently installed.)
Preparing to unpack .../android-liblog_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-liblog (1:6.0.1+r16-3) ...
Selecting previously unselected package android-libbase.
Preparing to unpack .../android-libbase_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-libbase (1:6.0.1+r16-3) ...
Selecting previously unselected package android-libcutils.
Preparing to unpack .../android-libcutils_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-libcutils (1:6.0.1+r16-3) ...
Selecting previously unselected package android-libadb.
Preparing to unpack .../android-libadb_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking android-libadb (1:6.0.1+r16-3) ...
Selecting previously unselected package adb.
Preparing to unpack .../adb_1%3a6.0.1+r16-3_amd64.deb ...
Unpacking adb (1:6.0.1+r16-3) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up android-liblog (1:6.0.1+r16-3) ...
Setting up android-libbase (1:6.0.1+r16-3) ...
Setting up android-libcutils (1:6.0.1+r16-3) ...
Setting up android-libadb (1:6.0.1+r16-3) ...
Setting up adb (1:6.0.1+r16-3) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
2. Check the installation result: adb version
[email protected]:~$ adb version Android Debug Bridge version 1.0.32 Revision debian
3. Check whether the hardware is connected: adb devices
[email protected]:~$ adb devices
List of devices attached
e9901a0bf326eb31	device
4. Connect to the hardware: adb shell
[email protected]:~$ adb shell
/ # ls
bin      lib      mnt      root     sys      usr
data     lib32    oem      run      tmp      var
dev      linuxrc  opt      sbin     udisk
etc      media    proc     sdcard   userdata
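As a quick host-side sanity check, the attached-device count can be parsed out of the adb devices output. This is a sketch with a helper name of my own (count_adb_devices is not part of adb); the sample output is the one shown above.

```shell
# Count attached devices in `adb devices` output.
# count_adb_devices is a hypothetical helper, not part of adb itself.
count_adb_devices() {
    # Skip the "List of devices attached" header; count lines whose
    # second field is "device" (i.e. fully connected devices).
    echo "$1" | awk 'NR > 1 && $2 == "device"' | wc -l | tr -d ' '
}

sample_output="List of devices attached
e9901a0bf326eb31	device"

count_adb_devices "$sample_output"   # prints 1
```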
（2） WiFi connection
1. Enter /data/cfg for the Wi-Fi configuration: cd /data/cfg
Use vi to edit wpa_supplicant.conf: vi wpa_supplicant.conf
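For reference, the network block one typically adds to wpa_supplicant.conf looks like the sketch below; the SSID and passphrase are placeholders, not values from my setup.

```shell
# Append a network block to wpa_supplicant.conf in the current directory.
# "MyHomeAP" and "MyPassphrase" are placeholder credentials.
cat >> wpa_supplicant.conf <<'EOF'
network={
    ssid="MyHomeAP"
    psk="MyPassphrase"
    key_mgmt=WPA-PSK
}
EOF
```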
After setting up wpa_supplicant.conf, you can reconnect to the network with the following commands:
wpa_cli reconfigure
wpa_cli reconnect
Note: on the first attempt, the wpa_cli reconfigure command returns an error:
/userdata/cfg # wpa_cli reconfigure
Failed to connect to non-global ctrl_ifname: (nil) error: No such file or directory
The fix is to start wpa_supplicant manually first: wpa_supplicant -B -i wlan0 -c /data/cfg/wpa_supplicant.conf
/userdata/cfg # wpa_supplicant -B -i wlan0 -c /data/cfg/wpa_supplicant.conf
Successfully initialized wpa_supplicant
/userdata/cfg # wpa_cli reconfigure
Selected interface 'wlan0'
OK
/userdata/cfg # wpa_cli reconnect
Selected interface 'wlan0'
OK
Although the commands succeed, checking the network shows the connection is still not up (wlan0 shows no IP address):
/userdata/cfg # ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

p2p0      Link encap:Ethernet  HWaddr C6:60:34:AC:2C:AA
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr C4:60:34:AC:2C:AA
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:1 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1 errors:0 dropped:12 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8555 (8.3 KiB)  TX bytes:7900 (7.7 KiB)
Restart the system (if one reboot does not work, reboot several times). It finally succeeds (wlan0 now shows the assigned IP address, 192.168.1.110):
/userdata/cfg # reboot
/ # ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

p2p0      Link encap:Ethernet  HWaddr C6:60:34:AC:2C:AA
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr C4:60:34:AC:2C:AA
          inet addr:192.168.1.110  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2234 (2.1 KiB)  TX bytes:1481 (1.4 KiB)
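Instead of eyeballing the whole ifconfig dump, the interface's IPv4 address can be extracted with sed. get_ip below is a helper of my own for this write-up; the sample text is shortened from the output above.

```shell
# Extract the IPv4 address from busybox-style ifconfig output.
# get_ip is a hypothetical helper, not a command on the board.
get_ip() {
    echo "$1" | sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}

sample="wlan0     Link encap:Ethernet  HWaddr C4:60:34:AC:2C:AA
          inet addr:192.168.1.110  Bcast:192.168.1.255  Mask:255.255.255.0"

get_ip "$sample"   # prints 192.168.1.110
```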
（3） Run the speech recognition example
Enter the /oem directory to view the voice-related files: cd /oem
/oem # ls
1K.wav                               libbd_alsa_audio_client.so
BDSpeechSDK                          libbd_audio_vdev.so
RkLunch.sh                           lost+found
alsa_audio_main_service              readme.txt
config_open_platfrom_rk3308_4_2.lst  setup.sh
environment.md                       version
libbdSPILAudioProc.so
View the documentation: cat readme.txt
libbdSPILAudioProc.so      md5:29669122675b50bb21f738014dc04fe5
libbd_audio_vdev.so        md5:8184b0a37c4037cc2264fee6518ed8a8
libbd_alsa_audio_client.so md5:ec46e6c27734a1c684b1ab8fab762fe6

Integration and usage instructions:
1. Push the libraries to the device:
adb push lib/libbdSPILAudioProc.so /data
adb push lib/libbd_audio_vdev.so /data
adb push lib/libbd_alsa_audio_client.so /data
adb push conf/config_open_platfrom_rk3308_4_2.lst /data
adb push setup.sh /data
adb push bin/alsa_audio_main_service /data
adb push bin/alsa_audio_client_sample /data
adb shell sync
2. Create the directory and modify permissions:
adb shell
cd /data
chmod +x alsa_audio_*
chmod +x setup.sh
3. Run the main service:
./setup.sh
./alsa_audio_main_service multi_4_2 &
hw:0,0 is the sound card number and device number of the recording device. You can also configure asound.conf to use a logical PCM device name.
4. To run an app such as duer_linux, add the /data directory to its dynamic-library search path. You can also run the sample program:
./alsa_audio_client_sample
After signal processing, a recording file dump_pcm.pcm (dual-channel, 16 kHz, little-endian, 16-bit audio) is saved in the current directory.
To save the original recording data, run the following before starting to record:
mkdir -p /data/local/aw.so_profile
touch /data/local/aw.so_profile/dump_switch
touch /data/local/aw.so_profile/dump_switch_wakets
mkdir -p /data/local/aud_rec/
chmod 777 /data/local/aud_rec/
Directories referenced in the configuration file:
AUDCAP_DBG_SWICH "/tmp/aw.so_profile/"
AUDCAP_DBG_FLDER "/tmp/aud_rec/"
AUDCAP_DBG_SAVED "/tmp/aud_rec/last/"
Under /data/local/aud_rec, four channels of microphone data and two channels of reference data are saved, plus one channel of recognition data and one channel of wake-up data. File format: 16 kHz, little-endian, 16-bit, mono.
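Since the readme publishes md5 values for the three libraries, it is worth verifying a library on the host before pushing it. check_md5 below is my own small wrapper around md5sum, not part of the SDK:

```shell
# Verify a file against an expected md5 (returns nonzero on mismatch).
# check_md5 is a hypothetical helper, not part of the SDK.
check_md5() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Usage (hash taken from readme.txt):
# check_md5 lib/libbdSPILAudioProc.so 29669122675b50bb21f738014dc04fe5
```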
According to the document the relevant files should be in the /data directory, but in practice they turn out to be in the /oem directory, so enter /oem and carry out the operations there:
Mainly, run the following four commands (they modify permissions and start the alsa_audio_main_service; according to the instructions, the ALSA service must be running before the speech recognition function can be used):
chmod +x alsa_audio_*
chmod +x setup.sh
./setup.sh
./alsa_audio_main_service multi_4_2 &
After executing the commands above, you can use ps to check whether the ALSA service started correctly: ps -a | grep alsa
According to the manual, the sample directory in BDSpeechSDK contains the speech recognition examples. Because the speech recognition runtime depends on libraries and resource files in the lib, resources and extern directories, we need to tell the dynamic linker where to find the shared libraries at startup.
For an introduction to shared libraries, see: https://www.cnblogs.com/mylinux/p/4955448.html
Enter the directory and run the speech recognition example:
cd /oem/BDSpeechSDK/sample/wakeup
LD_LIBRARY_PATH=/oem ./e2e_wp_asr_test
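Note that the LD_LIBRARY_PATH=/oem prefix applies only to that single command; the calling shell's environment is untouched. A quick demonstration of the scoping:

```shell
# An environment assignment before a command is visible only to that process:
LD_LIBRARY_PATH=/oem sh -c 'echo "$LD_LIBRARY_PATH"'   # prints /oem

# The calling shell's own LD_LIBRARY_PATH is unchanged afterwards.
# To make it persistent for the session instead, one would use:
#   export LD_LIBRARY_PATH=/oem:$LD_LIBRARY_PATH
```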
The recognition result of “Xiaodu, how is the weather today”:
It turns out this development kit uses streaming recognition, with intermediate results!
（4） Speech recognition effect test
The kit was roughly tested by varying the distance, the speech speed, and whether there was an obstruction (a computer desk and monitor) between the speaker and the kit.
1m:
With obstruction: “one-meter wake-up test”
With obstruction, fast speech: “one-meter second wake-up test”
With obstruction, fast speech: “what’s the weather like today”
No obstruction, normal speech: “one-meter third wake-up test”
No obstruction, fast speech: “what’s the weather like today”
2m: (no obstruction, normal speech speed)
“Two-meter wake-up test”
“Two-meter second wake-up test”
3m: (no obstruction, normal speech speed)
“Three-meter wake-up test”
“Three-meter second wake-up test”
“I heard there will be a typhoon tomorrow”
5m: (no obstruction, normal speech speed)
“Five-meter first wake-up test”
“Five-meter second wake-up test”
“What to do if the typhoon comes”
6m: (no obstruction, normal speech speed)
Note: at six meters you need to speak a little louder to wake the kit up; once awake, it can recognize speech at normal volume:
“Six-meter first wake-up test”
“Six-meter second wake-up test”
“It’s sunny today”
From these tests, the kit achieves good wake-up and recognition within 5 meters; beyond 5 meters, both wake-up and recognition decline noticeably.
In addition, recognition of general phrases is quite accurate (even at 6 meters), but for words that are near-homophones in Chinese (“one meter” vs. “corn”, “two meters” vs. the name “Yang Mi”, etc.), accuracy drops somewhat (this may also be related to my pronunciation).
Partial obstruction between the sound source and the kit has little impact on recognition.
As long as the speech is not too fast, it is recognized normally.
Overall, apart from distance, near-homophones have the greatest influence on recognition results.
（5） Bluetooth connection
Enter the command bt_realtek_start to start Bluetooth:
Turn on the computer’s Bluetooth; you can find a Bluetooth device named Realtek_BT and try to pair with it:
After pairing succeeds, you can play music and perform other operations over Bluetooth.
Problems found in the test:
1. When Bluetooth plays audio for the first time, the default volume is too loud; yet after adjusting it down, the maximum achievable volume turns out to be too low.
2. Although the kit’s Bluetooth can connect to multiple devices at once (I connected two), after both connections succeed, playing music from one device, stopping, and then trying to play from the other device fails; only the first device can play music.
（6） Recording and playing audio test
Enter the /tmp directory to view its contents: cd /tmp
Record audio: arecord -D hw:2,0 -c 8 -r 16000 -f S16_LE /tmp/test.wav
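A sanity check on these arecord parameters: 8 channels at 16000 Hz with 2 bytes per S16_LE sample means roughly 256000 bytes of audio data per second, so test.wav grows quickly.

```shell
# Raw data rate of the recording settings: channels * rate * bytes per sample.
channels=8
rate=16000
bytes_per_sample=2   # S16_LE = signed 16-bit little-endian
echo $((channels * rate * bytes_per_sample))   # prints 256000 (bytes per second)
```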
After recording, view the catalog file information:
Use the aplay command to play the recording file: aplay test.wav
The recording is good.
（7） Speech synthesis test
The example program sends the text “456hello, good weather today” to the server, which generates the corresponding speech and saves it as a PCM file for the user to play back.
Enter the speech synthesis example directory: cd /oem/BDSpeechSDK/sample/tts
Perform the speech synthesis: LD_LIBRARY_PATH=/oem ./online_test
After running, an xxx.pcm file is generated in the current directory, where xxx is a timestamp from the test run. Execute the following command at the terminal to hear the synthesized speech: aplay -t raw -c 1 -f S16_LE -r 16000 xxx.pcm
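Since the headerless PCM is mono, 16 kHz, 16-bit, its duration can be computed from the file size alone; the size used below is a hypothetical example, not the real file's.

```shell
# Duration in seconds of a mono 16 kHz S16_LE raw pcm = bytes / (16000 * 2).
size_bytes=160000    # hypothetical file size, for illustration
echo $((size_bytes / (16000 * 2)))   # prints 5 (seconds)
```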
（8） Cross compilation
1. Download SDK
Copy the BDSpeechSDK directory from /oem on the RK3308 board to the virtual machine. You can use the command adb pull /oem/BDSpeechSDK to download it and then place the SDK in the virtual machine.
Here I download the entire / OEM directory directly to the “download” directory.
2. Cross-compilation toolchain:
Extraction code: we2t
RK3308 binaries cannot be built with a standard Linux host compiler; we need a cross-compilation toolchain. This is a special compiler, which can be thought of as a toolkit that runs on platform A but builds for platform B.
I copied the cross-compilation toolchain to the virtual machine’s desktop.
3. Build the project directory structure
mkdir myProject
cd myProject
touch Makefile
mkdir src
touch src/main.cpp
Create the following directory structure:
myProject/
├── Makefile
└── src
    └── main.cpp
Write (copy) the sample code
There is corresponding demo code in e2e_wp_asr_test.cpp in the directory sample/wakeup/src. Here I directly copy the sample code from wakeup/src to replace main.cpp.
Don’t make any changes yet, just copy. The first goal is to get the cross-compilation working and the result running on the board.
Write (copy) the Makefile
A Makefile lets the project compile and link quickly, which saves a lot of effort. Since I am not a dedicated C++ development engineer, I copied the Makefile from sample/wakeup/:
#make src=src/***.cpp
FILE_NAME=$(src)
SYS_ROOT=$(sr)
TARGET=$(basename $(notdir $(FILE_NAME)))

#build
CXX=arm-rockchip-linux-gnueabihf-g++
INCLUDE=-I../../include -I../../include/ASR -I../../include/TTS -I../../extern/include -I../../extern/include/longconnect
CPPFLAGS=-Wall -fopenmp -O2 -fPIC -g -D__LINUX__ -Wl,-rpath=../../lib,--disable-new-dtags,--copy-dt-needed-entries -Wl,-rpath=../../extern/lib,--disable-new-dtags -L../../lib -lBDSpeechSDK -L../../extern/lib -lzlog -llongconnect -lnghttp2 -lcurl -lssl -lcrypto -lz -lAudioEncoder -liconv -lAudioDecoder -lhttpDNS -lbd_alsa_audio_client -lgomp -lrt -ldl -lpthread

ifneq ($(strip $(SYS_ROOT)),)
MY_SYS_ROOT=--sysroot=$(SYS_ROOT)
endif

SRC_PATH=./src
SRC_FILE=$(shell cd $(SRC_PATH)&&echo *.cpp)
SRC=$(foreach n,$(SRC_FILE),$(SRC_PATH)/$(n))

$(TARGET):$(SRC)
	$(CXX) -o $(TARGET) ./$(FILE_NAME) $(MY_SYS_ROOT) $(INCLUDE) $(CPPFLAGS)

#clean
LIST_ALL_FILES=$(shell find . -maxdepth 1)
SOURCES=. ./Makefile ./src
RM_FILES=$(filter-out $(SOURCES),$(LIST_ALL_FILES))

clean:
	-rm -rf $(RM_FILES)
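One detail worth noting in this Makefile: the output name TARGET is derived from FILE_NAME by $(basename $(notdir ...)), so building with FILE_NAME=src/main.cpp produces an executable called main. The shell equivalent of those two Make functions:

```shell
# How the Makefile turns FILE_NAME into TARGET:
FILE_NAME=src/main.cpp
name=${FILE_NAME##*/}    # like $(notdir ...): strips the directory -> main.cpp
target=${name%.*}        # like $(basename ...): strips the suffix  -> main
echo "$target"           # prints main
```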
After configuring, execute the following in the directory containing the Makefile:
export PATH=/home/snow/Desktop1/rk3308_arm_tool_chain/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=/home/snow/Desktop1/rk3308_arm_tool_chain/arm-rockchip-linux-gnueabihf/sysroot
In the commands above, /home/snow/Desktop1/rk3308_arm_tool_chain is the root directory of the rk3308_arm_tool_chain toolchain, with /bin appended. If the path is filled in incorrectly, the following error appears:
make: arm-rockchip-linux-gnueabihf-g++: command not found
You can cd into the rk3308_arm_tool_chain directory and use the pwd command to obtain its actual path:
Using the correct path, recompile:
export PATH=/home/snow/Desktop/rk3308_arm_tool_chain/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=/home/snow/Desktop/rk3308_arm_tool_chain/arm-rockchip-linux-gnueabihf/sysroot
If an error occurs at this point, it means the ALSA .so library is missing. We can pull the file from the /oem directory; it is named libbd_alsa_audio_client.so. Copy it into BDSpeechSDK/lib, the directory dedicated to external dependency libraries, and put this one there too.
Then compile again; this time there are no errors and the build passes.
After a successful build, there is a new main file in the project directory: the compiled executable.
Copy the main executable to the /tmp directory (note that /tmp is cleared on power-off): adb push ./main /tmp
An error occurred:
error:5, domain:38, desc:Wakeup: dat file invalid., sn:
This means that the dat file was not loaded successfully.
Looking at the code, in the wakeup_config function you can see the configured path of the dat file: ../resources/asr_resource/esis_resource.pkg
Either change this to an absolute path, or shorten it to ./esis_resource.pkg and copy the .pkg file alongside the binary.
Then recompile and adb push to /tmp again.
It is still the same error: although the file path was changed, we haven’t pushed esis_resource.pkg to the /tmp folder. Go to the /home/snow/download/oem/BDSpeechSDK/resources/asr_resource folder and execute: adb push ./esis_resource.pkg /tmp
Execute again; another error appears: error:-1, domain:10, desc:alsa_audio_client_open failed, sn:
Because our main also depends on the ALSA service, we need to start the ALSA service first:
cd /oem
chmod +x alsa_audio_*
chmod +x setup.sh
./setup.sh
./alsa_audio_main_service multi_4_2 &
Alternatively, you can write the commands above into the file /oem/RkLunch.sh, a script that runs after the board powers on. Everything that needs to start at boot can be put into this file, so the board starts the ALSA service for us automatically next time.
After successful execution, run the main program again:
（9） Cross-compiling speech synthesis
Following the same method, we can compile the speech synthesis example (you can change the synthesized text to whatever you like; here I changed it to “Hello world, today is Qixi, the traditional Chinese Valentine’s Day!”; if you are able, you can also try reading in text input and then synthesizing it).
The above warning can be ignored.
Push the compiled executable to the development board and run it:
An error occurs. Checking the source code carefully, it turns out the main function needs to read the configuration file speech_sdk_log.conf, but the path is ../resources/speech_sdk_log.conf:
So push the speech_sdk_log.conf file to the /tmp path as well:
After a successful run, you can see an additional 6832.pcm file in the /tmp directory. Run the command aplay -t raw -c 1 -f S16_LE -r 16000 6832.pcm to play the synthesized speech in the female voice.
3、 Product suggestions
After a week of testing, the Baidu far-field voice development kit proves excellent at voice wake-up and speech recognition, with high overall accuracy. With more training on words with similar pronunciation, the results would be even better. In addition, based on my personal experience, here are some suggestions for the product:
1. Improve sound quality and voice diversity
In the future, more voices with different styles could be provided for users to choose from freely, with more timbres available for different scenarios such as shopping, information queries and audio playback, gradually making “Xiaodu” more humanized and personalized.
2. Improve voice interaction
Provide higher-quality voice interaction by strengthening dialogue understanding and dialogue management. Through continuous practice, the speaker could come to “understand” what the user means, return more accurate results, and let developers easily customize professional, controllable and stable voice interaction capabilities.
3. Voiceprint recognition
During voice interaction, recognize different users by their voiceprints and interpret the interaction content accordingly. This could be used for voiceprint unlocking and for interaction understanding, including prioritizing which command to execute when multiple people speak.
There is large room for growth in voice interaction, even though its functionality is not yet that extensive. But as long as development continues, with ongoing data collection and scenario optimization, there will be deeper development across many fields in the future.
Author: let Tianya