The ultimate DIY: hand-building a talking robot dog


Want a Boston Dynamics robot dog? Come and build one with me.

Do you know the Boston Dynamics robot dog?

It is an AI-driven machine that can do backflips, open doors and climb stairs.

Recently, a batch of mini robot dogs arrived at our laboratory. Although they can't perform all kinds of high-level acrobatics, they are very good talkers.

Objective: a robot dog with voice interaction

These days, calling cloud APIs for functions such as speech recognition and speech synthesis can simplify or even replace complex local deployments and get those functions working quickly.

However, for voice-interaction research robots, running voice services locally often means a limited number of deployed devices, high deployment cost and troublesome maintenance. Finding a speech service module that is low-cost, easy to deploy and fast is a key bottleneck in designing a voice-interaction robot.

This time, we add speech recognition (ASR), natural language processing (NLP) and speech synthesis (TTS) services to our robot dog research, aiming for accurate and fast speech recognition, speech synthesis with multiple voices and emotions, voice-driven motion control, intelligent reminders and other functions.

To meet these requirements, we chose the ASR, TTS and NLP products of Huawei Cloud. The retrofit itself is quite simple and can be divided into three steps:

  1. A voice wake-up service is deployed on the local device. Once the device has been woken by voice, the recorded clip is sent to Huawei Cloud through its speech recognition interface for processing.
  2. The recognized text returned by Huawei Cloud is processed either by local natural language processing or by the natural language processing module of Huawei Cloud, yielding the corresponding semantic and control instruction information.
  3. The text that needs to be spoken is sent to Huawei Cloud through its speech synthesis interface, which returns the corresponding audio.
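
The three steps above can be sketched as one small function. This is only a minimal illustration, not the code running on our dogs: the callables `recognize`, `understand` and `synthesize` are hypothetical placeholders for the Huawei Cloud ASR, NLP and TTS calls (whose real SDK signatures are not shown in this post), and the `Reply` structure is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Reply:
    """Result of the language-understanding step (hypothetical structure)."""
    text: str                      # text the dog should speak
    command: Optional[str] = None  # motion command, if one was detected


def handle_utterance(
    recording: bytes,
    recognize: Callable[[bytes], str],
    understand: Callable[[str], Reply],
    synthesize: Callable[[str], bytes],
) -> Tuple[Optional[str], bytes]:
    """Run one wake-up-triggered recording through the three steps."""
    text = recognize(recording)      # step 1: cloud speech recognition
    reply = understand(text)         # step 2: local or cloud NLP
    audio = synthesize(reply.text)   # step 3: cloud speech synthesis
    return reply.command, audio


# Stand-in implementations so the sketch runs without any cloud account.
def fake_recognize(recording: bytes) -> str:
    return recording.decode("utf-8")


def fake_understand(text: str) -> Reply:
    if "sit" in text.lower():
        return Reply(text="Sitting down.", command="sit")
    return Reply(text="Sorry, I did not catch that.")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


command, audio = handle_utterance(
    b"Please sit", fake_recognize, fake_understand, fake_synthesize
)
print(command, audio)
```

Passing the three services in as plain callables keeps the robot-side logic independent of any particular cloud SDK, so a stand-in can be swapped for a real client later without touching `handle_utterance`.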

Figure: business architecture diagram / scheme screenshot:

It's alive! A talking robot dog

In the end, with the speech products of Huawei Cloud, this robot dog can not only understand what people say but also talk back. Specifically, it supports voice interaction in the following scenarios.

Control command identification: The text returned by the speech recognition service is checked with regular-expression matching, database lookups and similar operations to extract the control command in the utterance, which is then used for voice control of the robot.
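
As a rough illustration of the regular-matching part, the sketch below scans a recognized transcript for a few command phrases. The patterns and command names (`sit`, `stand`, `forward`, `turn_left`, ...) are made up for this example and are not the actual command set of our robot dogs.

```python
import re
from typing import Optional

# Hypothetical command table: pattern -> command name for the motion controller.
COMMAND_PATTERNS = [
    (re.compile(r"\b(sit|sit down)\b", re.IGNORECASE), "sit"),
    (re.compile(r"\b(stand|stand up|get up)\b", re.IGNORECASE), "stand"),
    (re.compile(r"\b(go|walk|move) forward\b", re.IGNORECASE), "forward"),
    (re.compile(r"\bturn (left|right)\b", re.IGNORECASE), "turn"),
]


def match_command(transcript: str) -> Optional[str]:
    """Return the first control command found in the recognized text, or None."""
    for pattern, name in COMMAND_PATTERNS:
        found = pattern.search(transcript)
        if found is None:
            continue
        if name == "turn":
            # Keep the direction captured by the pattern, e.g. "turn_left".
            return f"turn_{found.group(1).lower()}"
        return name
    return None


print(match_command("Please sit down now"))
print(match_command("Turn LEFT please"))
print(match_command("How is the weather today?"))
```

The word boundaries (`\b`) keep words such as "understand" or "visit" from triggering the `stand` and `sit` commands. When no pattern matches, the transcript can be handed on to the dialogue path instead.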

Transcription of dialogue speech: The speech recognition service converts speech into text, which serves as input to the natural language processing module or to the API of a dialogue robot.

Natural language processing: The natural language processing service of Huawei Cloud produces the reply text, which is used for intelligent dialogue, intelligent reminders and other functions.

Speech synthesis: The speech synthesis service of Huawei Cloud turns the answer text into speech.

Although it's not as smart as the Boston Dynamics robot dog, it may be better at voice conversation.

In our experience, the speech products of Huawei Cloud are quite good.

First, they simplify the configuration of the voice interaction module. Students can get speech recognition, speech synthesis and other services working through simple API calls.

Second, they improve the quality of voice interaction. Thanks to the low latency and high speed of Huawei Cloud, the online services are comparable to local ones, and speech recognition accuracy is high. Speech synthesis also offers a variety of voices for developers to choose from. Unfortunately, recognition of long speech is still slow and needs further optimization. In addition, speech synthesis could handle mixed Chinese and English text better, improving the expressiveness of mixed Chinese-English speech and the naturalness of the transitions between the two languages.

