Evaluation of the Far-Field Speech Recognition Kit

Time:2020-1-25

In the past I ran into many problems when working with Baidu speech, such as slow recognition and poor accuracy. Many of the causes lay with me and my equipment, so progress was hard. I think it is difficult for someone who is not an audio professional to push performance and quality much further in this area.

But Baidu speech keeps improving, and it has launched several new things that suit me well, such as:

The fast version of Baidu speech recognition, launched just last month

Portal > http://ai.baidu.com/forum/topic/show/943032

 

In my own tests on specific samples, this capability speeds up recognition by about 3 to 9 times. In the test samples from the link above, the slowest run of the standard version took 24 times as long as the fastest run of the fast version. Clearly the fast version is the best replacement for the current standard version!

This time I bring you the new star among Baidu's development kits:

the far-field voice development kit!

Portal > https://aim.baidu.com/product/b226a947-4660-4e27-83b4-877bf63b8627

This is a very good product. Like the earlier face development kit, it helps enterprises and individual developers who want to put speech recognition into production build their own products quickly.

The product comes in three configurations:

6+1 circular mic array
4-mic linear array
3-mic triangular array

Each has its own application scenarios. To help you choose the right one for your future products, hear me out on their advantages!

 

6+1 circular mic array

 

The 6+1 circular array consists of six microphones around the edge and one in the middle. It provides:

360° sound pickup with no dead angle
Enhanced sound source localization and GSC beamforming
AEC (acoustic echo cancellation) based on nonlinear cancellation

It is recommended for smart home products such as smart speakers.

Mainstream products such as the Tmall Genie and Xiaomi speakers use a circular 6-mic array!

Amazon Echo uses a similar solution abroad.

It can identify and locate the sound source with no dead angle! It is well worth playing with~

4-mic linear array

The 4-mic array consists of four microphones arranged in a horizontal line.
This layout takes up little space and adapts to a wide range of hardware designs.

It is recommended for smart TVs, tablets, air conditioners, refrigerators and other traditional white goods.

 

3-mic triangular array

 

The 3-mic array consists of three microphones arranged in a triangle. It provides:

Dual voice zones, meeting the voice interaction needs of the driver and the front passenger
Enhanced sound source localization and GSC beamforming
AEC based on nonlinear cancellation
Sound source localization, which the 3-mic array also supports

 

 

What I received this time is the 4-mic array kit. It supports sound source localization too~

 

Enough talk, let's open the box!

First of all, the outer packaging of the development kit is small and exquisite. The square, upright box has an air of low-key luxury and mystery. I wonder if you have seen the Four-Wheel Drive Brothers?

What a mysterious aura! Lift the lid and look inside.

The contents are simple and clear: a tri-fold quick guide, the development kit, a data cable and a power cable.

The guide briefly covers the packing list, the interface diagram of the development board, the hardware connection guide, the test method and how to set up the software development environment. It is fairly basic.

Now the development kit itself. The board in this kit is the rk3308 development board built jointly by Shenzhen Bnd Electronics Co., Ltd. and Baidu, with 128 MB of RAM and 128 MB of flash. The CPU is the rk3308, a quad-core ARM Cortex-A35.

 

WiFi supports only the 2.4 GHz band, and Bluetooth is version 4.0.

The kit includes an external WiFi antenna, so WiFi quality is excellent.

The board is compatible with all three microphone arrays above.

The data cable is micro-USB and is mainly used for ADB debugging.

The bundled power supply is 12 V / 2 A.

 

 

Hello World for the Far-Field Speech Recognition Kit (Mac)

Turn on the device and plug in the USB cable. We are about to enter the environment configuration stage.

 

This article uses macOS 10.14.4 for the demonstration.

Before plugging in the device, make sure ADB is installed on your system. You can confirm this in the terminal.

As for how to install it, there are plenty of ADB setup articles a Baidu search away~
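A quick way to run that check, as a minimal sketch. The helper name check_tool is made up for illustration, and it relies only on the shell's standard command -v builtin.

```shell
# check_tool is a hypothetical helper name, not part of adb or the kit.
# It reports whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool adb
```

If it reports "missing", install the Android platform tools first. Once it reports "found", adb version prints the installed version.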

 

The first step is to get the development kit onto our WiFi.

Following the WiFi configuration described in “Baidu brain far field voice development kit rk3308 development platform instructions.pdf”, network access can be completed successfully. I'll walk through it again here.

After connecting the device with the data cable, use the
adb devices command to check that the development kit is detected.

Then run adb shell to debug the device from the command line.

cd /data/cfg enters the directory that holds the WiFi configuration file.

Open the configuration file with the vi command.

 

For a typical home router you only need to change ssid (the WiFi name) and psk (the password).
If your WiFi has special requirements, add the relevant line yourself:

key_mgmt=WPA-PSK   # encryption method
#key_mgmt=NONE     # no encryption
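For orientation, a typical entry in a wpa_supplicant-style configuration looks like the output of the sketch below. The SSID and password are placeholders, make_network_block is a made-up helper name, and the actual contents of the file on the board may differ.

```shell
# make_network_block is a hypothetical helper: it prints a network entry
# in wpa_supplicant.conf syntax for the given SSID ($1) and password ($2).
make_network_block() {
    cat <<EOF
network={
    ssid="$1"
    psk="$2"
    key_mgmt=WPA-PSK
}
EOF
}

# Placeholder values, replace with your own WiFi name and password.
make_network_block "MyHomeWiFi" "my-wifi-password"
```

Only the ssid and psk values need to match your router. For an open network, key_mgmt=NONE replaces the WPA-PSK line and the psk line is dropped.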

Save the configuration after editing.

Then run

wpa_cli reconfigure
wpa_cli reconnect

to apply the configuration and reconnect to the network.

At this point the device is on the network.

Speech recognition and synthesis both need the network, so the device must be connected correctly!

(I have no use for Bluetooth at the moment, so I have not configured it yet.)

Next we start the speech recognition demo to check that the environment works.

All SDK resources and related files are in the /oem folder under the root directory.

According to the product manual, the alsa audio service has to be started first.
Before starting it, we need to change the directory permissions and make it executable.

In the start command, multi refers to sound card 2, which the 4-mic array board uses, and the trailing & runs the service in the background.

Then use ps -A to check that the service is running in the background.

Once the alsa service is running, we can go to the sample directory and start the demo.

The sample programs are in the /oem/BDSpeechSDK/sample directory, and at runtime they depend on the libraries and resource files in the lib, resources and extern directories.

So we need to tell the loader where the shared libraries are at startup.

For background on shared libraries, see:

Magic portal > https://www.cnblogs.com/mylinux/p/4955448.html
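As a rough sketch of what "sharing the library" means here: the dynamic loader searches the colon-separated directories in LD_LIBRARY_PATH before the system defaults, so the variable is set just for the command being started. The join_lib_path helper is hypothetical, and the directories are the ones this article uses on the board.

```shell
# join_lib_path is a hypothetical helper: it joins its arguments with ':'.
# The function body runs in a subshell so the IFS change does not leak.
join_lib_path() (
    IFS=:
    echo "$*"
)

LIBS=$(join_lib_path /oem /oem/BDSpeechSDK/lib /oem/BDSpeechSDK/extern/lib)

# Print the command line that would be typed on the board.
echo "LD_LIBRARY_PATH=$LIBS ./e2e_wp_asr_test"
```

Prefixing a single command with the variable keeps the setting local to that run instead of changing the whole shell session.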

We run the following commands:

 

cd /oem/BDSpeechSDK/sample/wakeup
LD_LIBRARY_PATH=/oem ./e2e_wp_asr_test

Try calling out “Xiaodu, what's the weather like today?”

You can see that this kit uses streaming recognition, with intermediate results!

Even from 3 meters away, whispering “Xiaodu Xiaodu” wakes it up successfully!

Perhaps its English recognition still needs some work~
(or maybe my English is just too sloppy)

Speech synthesis

The sample program sends the text “456hello, good weather today” to the server, which generates the corresponding speech and saves it as a PCM file for the user to play back.
Start speech synthesis in the terminal to generate the speech for that text:

cd /oem/BDSpeechSDK/sample/tts
LD_LIBRARY_PATH=/oem ./online_test

The test program does not accept arbitrary input text for synthesis. Users can build that themselves with the sample program as a reference.
After it runs, a file named xxx.pcm is created in the current directory, where xxx is the timestamp of the test. Run the following command in the terminal to hear the synthesized speech:

aplay -t raw -c 1 -f S16_LE -r 16000 xxx.pcm
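The aplay flags describe the raw format: one channel, signed 16-bit little-endian samples, 16000 samples per second. One second of audio is therefore 32000 bytes, so the clip length can be estimated from the file size. A small sketch, with pcm_seconds as a made-up helper name:

```shell
# pcm_seconds is a hypothetical helper: whole seconds of audio in a raw
# PCM file, assuming the format passed to aplay above.
pcm_seconds() {
    bytes=$(wc -c < "$1")
    # 16000 samples/s * 1 channel * 2 bytes per sample = 32000 bytes/s
    echo $(( bytes / (16000 * 1 * 2) ))
}

# Demo on a generated file of silence (64000 zero bytes), not real TTS output.
demo=$(mktemp)
head -c 64000 /dev/zero > "$demo"
pcm_seconds "$demo"
rm -f "$demo"
```

If the reported length looks wrong for the text that was synthesized, the sample rate or sample width passed to aplay probably does not match the file.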

 

Cross-Compiling the Sample Code of the Far-Field Speech Recognition Kit

After nearly a week of effort, I managed to cross-compile the sample program.

This article covers only how to cross-compile successfully and how to solve the problems I ran into. The focus is on the cross-compilation process.

I run an Ubuntu virtual machine on a Mac with Parallels. To double-check the steps, I set up a fresh machine and went through them again, deliberately reproducing the errors I hit over the past few days along with their fixes, which may help you.

You need the following:

  • The cross-compilation toolchain for the rk3308
  • A GCC environment
  • Ubuntu 16.04 LTS

Download the SDK

Copy the BDSpeechSDK directory under /oem on the rk3308 board to the virtual machine. You can fetch it with adb pull /oem/BDSpeechSDK.

Then place the SDK on the virtual machine. I put it directly in my home directory.

Download the cross-compilation toolchain

Link: https://pan.baidu.com/s/1leflaqfxhasmqgmfjswta extraction code: we2t

Code for the rk3308 cannot be built with the standard Linux compiler, so we need a cross-compilation toolchain. This is a special compiler that runs on platform A and produces programs for platform B.
Then copy the toolchain to the virtual machine.

Build the project directory structure as required

Following quick_start.md (the version available as of April 23):

mkdir my_speech_project
cd my_speech_project
touch Makefile
mkdir src
touch src/main.cpp

This creates the following directory structure:

my_speech_project/
├── Makefile
└── src
    └── main.cpp

We go to the sample directory, create the project folder and the src directory there, and create the files listed above.

 

Write (copy) the sample code

The sample code is e2e_wp_asr_test.cpp in sample/asr/wakeup/src, which matches the demo code in quick_start.md. Here I copy the sample code from wakeup/src directly over main.cpp.

Make no changes at this point, just copy. The first goal is to cross-compile and run on the board as soon as possible.

 

Write (copy) the Makefile

A Makefile lets the project be compiled and linked quickly and saves a lot of effort. Since I am not a full-time C++ developer, I copy the Makefile from quick_start.md here.

 

#make src=src/***.cpp
	FILE_NAME=$(src)
	SYS_ROOT=$(sr)
	TARGET=$(basename $(notdir $(FILE_NAME)))
	
	#build
	CXX=arm-rockchip-linux-gnueabihf-g++
	INCLUDE=-I../../include -I../../include/ASR -I../../include/TTS -I../../extern/include -I../../extern/include/longconnect
	CPPFLAGS=-Wall -fopenmp -O2 -fPIC -g -D__LINUX__ -Wl,-rpath=../../lib,--disable-new-dtags,--copy-dt-needed-entries -Wl,-rpath=../../extern/lib,--disable-new-dtags -L../../lib -lBDSpeechSDK -L../../extern/lib -lzlog -llongconnect -lnghttp2 -lcurl -lssl -lcrypto -lz -lAudioEncoder -liconv -lAudioDecoder -lhttpDNS -lbd_alsa_audio_client -lgomp -lrt -ldl -lpthread
	ifneq ($(strip $(SYS_ROOT)),)
	MY_SYS_ROOT=--sysroot=$(SYS_ROOT)
	endif
	
	SRC_PATH=./src
	SRC_FILE=$(shell cd $(SRC_PATH)&&echo *.cpp)
	SRC=$(foreach n,$(SRC_FILE),$(SRC_PATH)/$(n))
	
	$(TARGET):$(SRC)
		$(CXX) -o $(TARGET) ./$(FILE_NAME) $(MY_SYS_ROOT) $(INCLUDE) $(CPPFLAGS)
	
	#clean
	LIST_ALL_FILES=$(shell find . -maxdepth 1)
	SOURCES=. ./Makefile ./src
	RM_FILES=$(filter-out $(SOURCES),$(LIST_ALL_FILES))
	
	clean:
		-rm -rf $(RM_FILES)

Here is the first problem:
The Makefile is indented in the document, so copying and pasting it is very likely to carry the indentation along. That extra indentation has to be removed to keep the file clean. I have kept the original formatting here so that I can show the resulting compile error later.
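One way to undo the pasted indentation, sketched under the assumption that every pasted line carries exactly one extra leading tab, as in the listing above. Only that one tab is removed, so recipe lines that had two tabs keep the single tab that make requires. strip_one_tab is a made-up helper name.

```shell
# strip_one_tab is a hypothetical helper: it prints the file with exactly
# one leading tab removed from each line that starts with a tab.
strip_one_tab() {
    tab=$(printf '\t')
    sed "s/^${tab}//" "$1"
}

# Demo on a three-line stand-in for the pasted Makefile.
demo=$(mktemp)
printf '\tCXX=g++\n\tall:\n\t\techo built\n' > "$demo"
strip_one_tab "$demo"
rm -f "$demo"
```

On the real file, redirect the output to a new file (for example strip_one_tab Makefile > Makefile.clean), check it, and then replace the original.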

 

Attempt compilation

The compile section of quick_start.md asks us to run the following in the directory that contains the Makefile once configuration is done:

export PATH=path-to-cross-compiler-root/host/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=path-to-cross-compiler-root/host/arm-rockchip-linux-gnueabihf/sysroot

Here path-to-cross-compiler-root/host has to be replaced with the root directory of our toolchain. The host directory in the documentation is the toolchain root itself, so host/bin becomes the toolchain's bin directory.

In my environment, the equivalent commands are

export PATH=/home/parallels/rk3308_arm_tool_chain/bin:$PATH
make FILE_NAME=src/main.cpp SYS_ROOT=/home/parallels/rk3308_arm_tool_chain/arm-rockchip-linux-gnueabihf/sysroot
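Both values derive from a single toolchain root, so keeping that root in one variable avoids mismatched paths. The sketch below only prints the two commands. TOOLCHAIN and sysroot_for are names chosen for illustration, and the root path is the one from my environment above.

```shell
# Toolchain root from the environment used in this article; adjust to yours.
TOOLCHAIN=/home/parallels/rk3308_arm_tool_chain

# sysroot_for is a hypothetical helper: maps a toolchain root to its sysroot.
sysroot_for() {
    echo "$1/arm-rockchip-linux-gnueabihf/sysroot"
}

# Print the two commands instead of running them, so they can be reviewed first.
echo "export PATH=$TOOLCHAIN/bin:\$PATH"
echo "make FILE_NAME=src/main.cpp SYS_ROOT=$(sysroot_for "$TOOLCHAIN")"
```

After the export, command -v arm-rockchip-linux-gnueabihf-g++ should print a path inside the toolchain's bin directory. If it prints nothing, the PATH entry is wrong.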

Many problems can appear here. If you followed the same steps as I did, you are likely to run into them too!

 

Error 1. Makefile: 18

The Makefile fails to build because of the stray indentation.
The error message is as follows

 

After many tests, however many blank lines I inserted, the error was still reported at line 18. The fix was to remove the extra indentation from every line (recipe lines under a target must still begin with a single tab).

 

Error 2. Undefined reference

With the indentation removed, compile again and a new error appears:

This error is caused by the missing alsa .so library. It is also mentioned in quick_start.md:

If you encounter an error like ld: can't find -lbd_alsa_audio_client, download the alsa service package from the official website, or extract the library from the /oem/ directory of the development kit and place it in the project so that it takes part in linking.

Here we pull the file from the board. It is in the /oem directory and is named libbd_alsa_audio_client.so. Copy it to BDSpeechSDK/lib. That directory holds the library files the project depends on, so we put this one there as well.

Then compile again. No errors are reported and the build succeeds.

 

An executable named main now appears in the same directory as the Makefile. It runs on the rk3308, so push it to the board with adb. A reminder: the /tmp directory is cleared when the board loses power.

(The adb push ./main /tmp step is omitted here.)

 

Try running main under adb shell

 

 

main also depends on the alsa service, so we can set alsa to start at boot.

/oem/rklink.sh is a script that runs when the rk3308 board boots. Anything that needs to start with the board can be written into this file, so that next time the board starts alsa for us automatically.

A few lines have been added here, mainly to change the directory permissions and start the alsa service.

This time, though, it has not been started yet, so alsa has to be started manually with the same five commands as above.

You can also restart the board with the reboot command to check that startup works, but the main file just placed in /tmp will be wiped. It is a trade-off~

After starting alsa, let's start main

cd /tmp/
LD_LIBRARY_PATH=/oem:/oem/BDSpeechSDK/lib:/oem/BDSpeechSDK/extern/lib ./main

If we see this output, we are not far from success. But one line in it breaks the whole program, and it is not a compilation problem!

 

Error 3. Dat file invalid

error:5, domain:38, desc:Wakeup: dat file invalid., sn:

This means the dat file was not loaded successfully.

Let's look at the code. In the wakeup_config function you can see the path of the dat file it loads: ../resources/esis_resource.pkg
Either change this to an absolute path, or shorten it to ./esis_resource.pkg and copy the pkg file alongside the executable.

Then recompile and push to /tmp with adb again (omitted here).

If you made the same change as I did, remember to push the dat file to /tmp as well.

Then run main again

 

The callback output shows that the wake-up engine has been loaded and wake-up has started.

Now we can try it out:

What's the weather like in Shanghai today?

 

This completes the cross-compilation of the demo project.

This is only the result of compiling the default sample program. Many more features are waiting to be unlocked.

This is the result of 7 days of effort. If this article helped you, please give it a like~

Author:Zhou Shi Le
