In the past I ran into plenty of problems with Baidu's speech services, such as slow recognition and poor accuracy. Between my own limitations and the hardware's, progress was hard, and I think it is difficult for anyone who is not an audio professional to push performance and quality much further on their own.
But Baidu Speech keeps improving, and has launched several new offerings that suit me very well, for example:
the extreme-speed edition of Baidu speech recognition, launched just last month
Portal > http://ai.baidu.com/forum/topic/show/943032
In my own tests on specific samples, this edition improves recognition speed by roughly 3-9×. Across the samples in the portal above, the gap between the slowest run of the standard edition and the fastest run of the extreme-speed edition reached 24×. Clearly the extreme-speed edition is an excellent replacement for the current standard edition!
This time I am bringing you the new star of Baidu's development kits:
the far-field voice development kit!
Portal > https://aim.baidu.com/product/b226a947-4660-4e27-83b4-877bf63b8627
This is a very good product. Like the earlier face-recognition development kit, it can help enterprises and individual developers who want to ship speech recognition build their own products quickly.
The product comes in three configurations:
6+1 circular mic array
4-mic linear array
3-mic triangular array
Each has its own application scenarios, so before you design your future product, let me walk you through their strengths!
6+1 circular mic array
The 6+1 circular array consists of six microphones in a ring plus one in the center, which provides:
a 360° sound field with no blind spots
enhanced GSC sound-source localization and beamforming
AEC based on nonlinear echo cancellation
Recommended for smart home products such as smart speakers.
Mainstream smart speakers such as the Tmall Genie and Xiaomi's speakers all use a circular 6-mic array!
Amazon's Echo adopts a similar solution abroad.
It can recognize and localize a sound source from any direction, which makes it well worth playing with~
Linear 4-mic array
The 4-mic array arranges four microphones in a horizontal line.
The layout takes up little space and adapts to a wide range of hardware designs.
It is recommended for smart TVs, tablets, air conditioners, refrigerators, and other traditional white-goods products.
Triangular 3-mic array
The 3-mic array arranges three microphones in a triangle. It:
supports dual voice zones, meeting the voice-interaction needs of the driver and front passenger
enhances GSC sound-source localization and beamforming
uses AEC based on nonlinear echo cancellation
and, like the others, supports sound-source localization.
What I received this time is the 4-mic linear array kit, which also supports sound-source localization~
Enough talk. Let's open the box!
First, the kit's outer packaging is exquisite and compact. The square, upright box has an air of understated luxury and mystery; I wonder if it reminds anyone else of the Mini 4WD Brothers cartoon?
What a mysterious aura! Lift the lid and look inside.
The box contents are simple and clear: a trifold quick-start leaflet, the development kit, a data cable, and a power cable.
The leaflet briefly covers the package contents list, the development board's interface diagram, the hardware connection guide, the test procedure, and how to set up the software development environment. It is fairly minimal.
Now for the kit itself. The development board is an rk3308 board built jointly by Shenzhen BND Electronics Co., Ltd. and Baidu, with 128 MB of RAM and 128 MB of flash. The CPU is the rk3308, a quad-core ARM Cortex-A35.
Wi-Fi supports only the 2.4 GHz band, and Bluetooth is 4.0.
The kit includes an external Wi-Fi antenna, so reception should be solid.
The microphone board is compatible with all three array types described above.
The data cable is USB Micro, used mainly for ADB debugging.
The bundled power adapter is 12 V / 2 A.
Hello World with the far-field speech kit (macOS edition)
Turn on the device and plug in the USB cable. We are about to enter the environment configuration stage.
This article uses macOS 10.14.4 for the demonstration.
Before plugging in the device, make sure your system has a working ADB environment, and confirm it in the terminal.
For installation instructions, a quick search turns up plenty of ADB setup guides~
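As a quick sanity check before plugging the board in, you can verify that adb is on your PATH. This is just a small sketch; the has_cmd helper name is my own:

```shell
# has_cmd: succeed when the named command exists on PATH.
has_cmd() { command -v "$1" >/dev/null 2>&1; }

if has_cmd adb; then
  adb devices   # the board should appear in this list once connected
else
  echo "adb not found; install the Android platform-tools first" >&2
fi
```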
The first step is to connect the development kit to our Wi-Fi.
Following the Wi-Fi setup described in "Baidu Brain far-field voice development kit rk3308 development platform instructions.pdf" will get you online; I will walk through it again here.
With the device connected over the data cable, run
adb devices
to confirm the kit is visible, then enter
adb shell
to get a command line on the device for debugging.
cd /data/cfg
takes us to the directory holding the Wi-Fi configuration file; open the file with the vi command.
For a typical home router you only need to modify the ssid (Wi-Fi name) and psk (password key) entries.
If your Wi-Fi needs something special, add it yourself:
key_mgmt=WPA-PSK   # encryption method
#key_mgmt=NONE     # no encryption
After saving the changes, run
wpa_cli reconfigure
wpa_cli reconnect
to reload the configuration and reconnect to the network.
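For reference, the relevant section of the configuration file usually looks something like the fragment below. This is only a sketch of a typical WPA2 home network; MyHomeWiFi and MyPassword are placeholders, and the exact layout on your image may differ:

```text
network={
    ssid="MyHomeWiFi"       # Wi-Fi name
    psk="MyPassword"        # password key
    key_mgmt=WPA-PSK        # or key_mgmt=NONE for an open network
}
```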
At this point the device is on the network.
Speech recognition and synthesis both need network access, so getting online correctly is essential!
(Bluetooth is not usable for me at the moment, so I have not configured it yet.)
Next, let's start the speech-recognition demo program to see whether the environment is healthy.
All SDK resources and related files live under the /oem directory.
According to the product manual, we must first start the main alsa audio service.
Before launching it, modify the directory permissions and grant the service execute permission.
In the start command, the multi parameter selects sound card 2 of the 4-mic board, and the trailing & launches the service in the background.
Then use ps -A to check that the service started correctly in the background.
With the alsa service running, we can move on to the demo program.
The sample programs live in /oem/BDSpeechSDK/sample, and at runtime they depend on the libraries and resource files in the lib, resources, and extern directories.
So we need to point the loader at those shared libraries when launching, via LD_LIBRARY_PATH.
You can read more about shared libraries here:
Magic portal > https://www.cnblogs.com/mylinux/p/4955448.html
Run the commands:
cd /oem/BDSpeechSDK/sample/wakeup
LD_LIBRARY_PATH=/oem ./e2e_wp_asr_test
Try calling out "Xiaodu Xiaodu, what's the weather like today?"
You will find that the kit uses streaming recognition, complete with intermediate partial results!
And even a whisper of "Xiaodu Xiaodu" from three meters away wakes it up successfully!
(Maybe my English pronunciation just needs some work~)
On to speech synthesis.
The sample program sends the text "456hello, good weather today" to the server, which synthesizes the corresponding audio and saves it as a PCM file for playback.
Start the synthesis demo in the terminal:
cd /oem/BDSpeechSDK/sample/tts
LD_LIBRARY_PATH=/oem ./online_test
The test program does not let you type arbitrary text to synthesize; you can adapt the sample code to add that yourself.
After it runs, an xxx.pcm file appears in the current directory, where xxx is the timestamp of the run. Play it from the terminal with:
aplay -t raw -c 1 -f S16_LE -r 16000 xxx.pcm
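aplay works on the board, but if you would rather listen on a desktop machine, the raw PCM can be wrapped in a standard 44-byte WAV header. The sketch below is plain shell and assumes the format stated above (16 kHz, mono, signed 16-bit little-endian); the helper names are my own:

```shell
# Write a 16-bit little-endian integer as two raw bytes.
le16() { printf '%b' "$(printf '\\0%03o\\0%03o' $(( $1 & 255 )) $(( ($1 >> 8) & 255 )))"; }
# Write a 32-bit little-endian integer as four raw bytes.
le32() { le16 $(( $1 & 65535 )); le16 $(( ($1 >> 16) & 65535 )); }

# pcm2wav input.pcm output.wav
# Prepend a minimal WAV header for 16 kHz mono s16le PCM.
pcm2wav() {
  datasize=$(wc -c < "$1")
  {
    printf 'RIFF'; le32 $(( 36 + datasize )); printf 'WAVE'
    printf 'fmt '; le32 16                  # fmt chunk, 16 bytes
    le16 1; le16 1                          # PCM format, 1 channel
    le32 16000; le32 32000                  # sample rate, byte rate
    le16 2; le16 16                         # block align, bits per sample
    printf 'data'; le32 "$datasize"
    cat "$1"
  } > "$2"
}
```

After pcm2wav xxx.pcm xxx.wav, the result opens in any ordinary audio player; aplay on the board remains the quickest check.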
Cross-compiling the far-field speech kit's sample code
After nearly a week of effort, the sample program cross-compiles successfully.
This section covers only how to cross-compile successfully and the problems I hit along the way, focusing on the cross-compilation process.
I deployed an Ubuntu virtual machine on the Mac with Parallels. To double-check the steps, I set up a fresh machine and went through them all again, deliberately reproducing the errors (and fixes) from the past few days, in the hope that it helps you.
Here you need the following:
- The rk3308 cross-compilation toolchain
- A GCC environment
- Ubuntu version 16.04 LTS
Copy the BDSpeechSDK directory from /oem on the rk3308 board to the virtual machine; you can fetch it with
adb pull /oem/BDSpeechSDK
Then place the SDK somewhere on the VM. I put it directly in my home directory.
Download the cross toolchain:
Link: https://pan.baidu.com/s/1leflaqfxhasmqgmfjswta extraction code: we2t
An rk3308 binary cannot be produced by the VM's native compiler; we need a cross-compilation toolchain. This is a special compiler package that runs on platform A but produces binaries for platform B.
Copy the cross-compilation toolchain to the virtual machine as well.
Build the project directory structure as required.
Following what quick_start.md describes (as of April 23), go to the sample directory, create the project folder with its src directory, and create the specified files:
mkdir my_speech_project
cd my_speech_project
touch Makefile
mkdir src
touch src/main.cpp
This produces the required directory structure.
Write (copy) the sample code.
e2e_wp_asr_test.cpp lives in sample/wakeup/src, and quick_start.md contains the corresponding demo code. Here I simply copied the sample code from wakeup/src over main.cpp.
No changes at this point, just a straight copy: the first goal is to get something cross-compiled and running on the board as quickly as possible.
Write (copy) the Makefile.
A Makefile lets the project link and compile quickly and saves a great deal of effort. Since I am not a full-time C++ engineer, I copied the Makefile from quick_start.md:
# make src=src/***.cpp
FILE_NAME=$(src)
SYS_ROOT=$(sr)
TARGET=$(basename $(notdir $(FILE_NAME)))

# build
CXX=arm-rockchip-linux-gnueabihf-g++
INCLUDE=-I../../include -I../../include/ASR -I../../include/TTS -I../../extern/include -I../../extern/include/longconnect
CPPFLAGS=-Wall -fopenmp -O2 -fPIC -g -D__LINUX__ -Wl,-rpath=../../lib,--disable-new-dtags,--copy-dt-needed-entries -Wl,-rpath=../../extern/lib,--disable-new-dtags -L../../lib -lBDSpeechSDK -L../../extern/lib -lzlog -llongconnect -lnghttp2 -lcurl -lssl -lcrypto -lz -lAudioEncoder -liconv -lAudioDecoder -lhttpDNS -lbd_alsa_audio_client -lgomp -lrt -ldl -lpthread

ifneq ($(strip $(SYS_ROOT)),)
MY_SYS_ROOT=--sysroot=$(SYS_ROOT)
endif

SRC_PATH=./src
SRC_FILE=$(shell cd $(SRC_PATH)&&echo *.cpp)
SRC=$(foreach n,$(SRC_FILE),$(SRC_PATH)/$(n))

$(TARGET):$(SRC)
	$(CXX) -o $(TARGET) ./$(FILE_NAME) $(MY_SYS_ROOT) $(INCLUDE) $(CPPFLAGS)

# clean
LIST_ALL_FILES=$(shell find . -maxdepth 1)
SOURCES=. ./Makefile ./src
RM_FILES=$(filter-out $(SOURCES),$(LIST_ALL_FILES))
clean:
	-rm -rf $(RM_FILES)
Here comes the first catch:
when you copy and paste, you are very likely to bring the page's indentation along with the code, so strip the leading indentation and keep the file clean. I will show the compile error that the indentation causes below; for now I kept the original formatting.
The compile section of quick_start.md tells us to run the following from the directory containing the Makefile, once configuration is done:
export PATH=path-to-cross-compiler-root/host/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=path-to-cross-compiler-root/host/arm-rockchip-linux-gnueabihf/sysroot
Here path-to-cross-compiler-root must be replaced with the root directory of our toolchain: the PATH entry points at its bin directory, and SYS_ROOT at its sysroot.
(The host directory inside the download is effectively the toolchain root.)
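As an aside on what the export does: prepending a directory to PATH makes the tools inside it shadow same-named tools found later, which is how arm-rockchip-linux-gnueabihf-g++ becomes reachable by its bare name. A throwaway demo of the mechanism (demo_dir and mytool are invented names):

```shell
# Create a scratch directory with a tiny executable in it.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho from-demo-dir\n' > "$demo_dir/mytool"
chmod +x "$demo_dir/mytool"

# Prepend it to PATH, exactly as the toolchain export does.
PATH="$demo_dir:$PATH"
mytool    # prints: from-demo-dir
```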
In my environment, the equivalent command is:
export PATH=/home/parallels/rk3308_arm_tool_chain/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=/home/parallels/rk3308_arm_tool_chain/arm-rockchip-linux-gnueabihf/sysroot
You may hit several problems here. If you followed the same steps I did, you are likely to run into these:
Error 1: Makefile:18
A Makefile compilation error caused by stray indentation.
After repeated testing, the stray blank lines and copied indentation trigger the error at line 18. The fix turns out to be removing all of the indentation!
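One way to clean a pasted file, assuming the only damage is leading whitespace, is sketched below (Makefile.pasted is a stand-in name; this uses GNU sed):

```shell
# A pasted Makefile fragment whose lines carry leading spaces.
printf '  TARGET=main\n  $(CXX) -o $(TARGET)\n' > Makefile.pasted

# Strip the leading spaces the copy-paste added; a real tab at the start
# of a recipe line would survive, since only spaces are deleted.
sed 's/^ *//' Makefile.pasted > Makefile.clean

# If the paste also turned recipe tabs into spaces, give the recipe line
# back the literal leading tab that make requires (GNU sed expands \t).
sed -i '/^\$(CXX)/s/^/\t/' Makefile.clean
```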
Error 2: undefined reference
With the indentation removed, compiling again produces a new message.
This error is caused by the missing alsa .so library, and it is documented in quick_start.md:
if you encounter an error like "ld: cannot find -lbd_alsa_audio_client", download the alsa service package from the official website, or extract the relevant library from the /oem/ directory of the development kit and put it in the project so it takes part in linking.
So we pull the file from the board: it lives in /oem and is named libbd_alsa_audio_client.so. Copy it into BDSpeechSDK/lib, the directory dedicated to external dependency libraries, and this one belongs there too.
Now compile again: no error this time; the build passes.
An executable named main appears in the same directory as the Makefile. It runs in the rk3308 environment, so push it onto the board over ADB. A reminder: the /tmp directory is wiped when power is lost.
(The adb push ./main /tmp step is omitted here.)
Try launching main over ADB.
Our main also depends on the alsa service, so we might as well set alsa to start at boot.
/oem/rklink.sh is a script that the rk3308 board runs at power-on. Anything we want started at boot can be written into this file, and the board will then bring up alsa for us automatically.
I added a few lines here, mainly to change directory permissions and launch the alsa service.
The change does not take effect until the next boot, though, so this time start alsa manually using the same commands as before.
You could reboot the board with the reboot command to verify the autostart, but that would wipe the main we just pushed to /tmp. It is a trade-off~
With alsa up, start main:
cd /tmp/
LD_LIBRARY_PATH=/oem:/oem/BDSpeechSDK/lib:/oem/BDSpeechSDK/extern/lib ./main
If you see this output, success is close, but one line sabotages the whole program, and it is not a compilation problem:
Error 3: dat file invalid
error:5, domain:38, desc:Wakeup: dat file invalid., sn:
This means the dat file failed to load.
Looking at the code, the wakeup config function shows the configured dat-file path, ../resources/esis_resource.pkg.
Change this to an absolute path, or shorten it to ./esis_resource.pkg and copy the .pkg file next to the binary.
Then recompile and adb push to /tmp again (omitted here).
If you made the same change I did, remember to push the dat file to /tmp as well.
Then execute main again
The callbacks now show the engine loading and wake-up starting.
At this point we can try it out:
What's the weather like in Shanghai today?
With that, cross-compilation of the demo project is complete.
This is only the default sample program compiled; there are many more features waiting to be unlocked.
That was my seven days of effort. If this article helped you, please give it a like~
Author:Zhou Shi Le