[The Road to Advanced APIs] An oral English test in the college entrance examination? I used a multimodal API to build a campaign that drew 100,000+ plays and flooded social feeds


Abstract: I built an interactive English game that more than 100,000 people took part in.

Continuing the story from the last issue: I became vice chairman of the company's Technical Committee. The first thing I did after taking office was to set up a containerized, cloud-based R&D database so that the "good things" sitting on every developer's hard disk could be reused. (For details, see: Unimaginable! There are so many treasures on a veteran coder's hard disk.)

Since then, the boss has given me less and less coding work so that I can concentrate on the Technical Committee. Idle hands make trouble: with nothing else to do, I built an interactive game. A colleague from operations pushed a poster for it through the official account, and it turned into a feed-flooding campaign with 100,000 participants.

What happened? Around July 25th, I saw a piece of news: an oral English test will be added to the college entrance examination in Beijing in 2021. Wow, that was sudden. It was as unexpected as a product manager's last-minute requirement, or as a celebrity's surprise marriage or breakup announcement is for Weibo's operations team.

Soon, parents with children in high school were all forwarding the news on their WeChat Moments. That shows how much attention it was getting. I had always heard that operations people are good at riding hot topics. Could I do something to ride this one?

What do parents and students care about and need most under this policy? To prepare for an oral English test, you first have to know your current speaking level and where your weak spots are, so that you can work on them. In other words, what they need is an oral assessment!

I remembered seeing a "Multimodal evaluation" API on the Huawei Cloud official website. Given a video of someone reading aloud and the reference text, it returns an oral evaluation score for the reader. I could use this API to build an assessment-style H5 game.

As is my habit, I wrote up the implementation process as I went, so that those who come after me can learn from it.

Multimodal oral English Assessment

Content sources

Multimodal spoken-language assessment is still in public beta, so I applied for the beta on Huawei Cloud in advance, and the application was approved the same day. (Public beta link: https://activity.huaweicloud.com/AI_free0.html?ggw_hd)

Step 1: data preparation

Supported video types are AVI, MP4, WebM, MKV, and FLV. The resolution must be at least 240p, the frame rate at least 25 fps, and the file size at most 10 MB.

Supported language: British English

Evaluation modes: word evaluation, sentence evaluation

The video must be converted to Base64 and uploaded in that form.

Conversion example (Python)

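#!/usr/bin/env python
# encoding: utf-8
import base64


def ToBase64(file, txt):
    """Read a binary media file and write its Base64 text to txt."""
    with open(file, 'rb') as fileObj:
        raw_data = fileObj.read()
    base64_data = base64.b64encode(raw_data)
    # b64encode returns bytes; decode to str before writing to a text file
    with open(txt, 'w') as fout:
        fout.write(base64_data.decode('ascii'))


# Convert a media file to Base64; the output path is only an example name
ToBase64("./test.wav", "./test_base64.txt")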
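The limits listed in Step 1 can be checked before encoding. The following is a minimal sketch rather than part of the original walkthrough: the function name `check_and_encode`, the two constants, and the reading of the size limit as 10 * 1024 * 1024 bytes are all assumptions. It checks only the file extension and size; verifying resolution and frame rate would need a video library.

```python
import base64
import os

# Assumed from the Step 1 text: allowed containers and a 10 MB size cap.
ALLOWED_EXTENSIONS = {".avi", ".mp4", ".webm", ".mkv", ".flv"}
MAX_BYTES = 10 * 1024 * 1024


def check_and_encode(path):
    """Return the Base64 text of a video file, or raise ValueError."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported video type: %r" % ext)
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError("file is %d bytes, over the %d byte limit" % (size, MAX_BYTES))
    # Read the raw bytes and return them as ASCII Base64 text.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

Failing early on the client side avoids uploading a large payload that the service would presumably reject anyway.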

Step 2: build request

The general request pattern for Huawei Cloud's Speech Interaction Service is shown in the figure below.


At present, the API is in public beta in the CN North-Beijing4 region (cn-north-4), and the endpoint ends in "…-ext.cn-north-4.myhuaweicloud.com".

You need to confirm the ID of your own Huawei Cloud project and obtain a token for it.

See: Project ID acquisition method; Token query method.

Once you have the authentication details, you can fill in the request. Suppose I have a video in AVI format and want to use word mode to assess my spoken English, and the phrase I want to practise is "sit down".

An example of a request is:

POST https://{endpoint}/v1/{project_id}/assessment/video
Request Header:
Content-Type: application/json
Request Body:
{
  "video_format": "avi",
  "language": "en_gb",
  "mode": "word",
  "video_data": "/+MgxAAUeHpMAUkQAANhuRAC...",
  "ref_text": "sit down"
}

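The request above can also be assembled in Python. This is a minimal sketch using only the standard library, not the author's actual code. The function name `build_assessment_request` is hypothetical, the body mirrors the field names of the example as a flat object (adjust it if the live API expects some of them nested), and passing the token in an `X-Auth-Token` header is an assumption that the example itself does not show.

```python
import json
import urllib.request


def build_assessment_request(endpoint, project_id, token, video_b64,
                             ref_text, video_format="avi", mode="word"):
    """Build (but do not send) the POST request for a video assessment."""
    url = "https://%s/v1/%s/assessment/video" % (endpoint, project_id)
    body = {
        "video_format": video_format,
        "language": "en_gb",
        "mode": mode,
        "video_data": video_b64,
        "ref_text": ref_text,
    }
    headers = {
        "Content-Type": "application/json",
        # Assumption: token-based auth header, not shown in the example request.
        "X-Auth-Token": token,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping the build step separate makes the payload easy to inspect before anything goes over the network.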
Step 3: returned result

{
    "fluency": {
        "score": 75.02139,
        "rhythm": 50.042786,
        "cohesion": 100.0
    },
    "pronunciation": {
        "score": 36.817684,
        "gop": 36.817684
    },
    "score": 22.09061,
    "completeness": 0.0,
    "duration": 2.46,
    "words": [ ...per-word, phoneme, and phonetic-symbol results, shown below... ]
}

From the returned results, we can see that:

(1) Fluency: the fluency score is 75.02; cohesion gets full marks (100); rhythm is weak, at only 50.04.

(2) Pronunciation: my pronunciation score is 36.82, and the GOP (goodness of pronunciation) value is also 36.82.

(3) Final overall score: 22.09.

Well, that is pretty accurate. My spoken English has been poor since I was a child.
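Reading these numbers off the response can be automated. The sketch below is illustrative only: `summarise` is a hypothetical helper, the sample values are copied from the response above, and the assumption that the score fields sit at the top level of the JSON object comes from that fragment rather than from the API reference.

```python
import json

# Sample values copied from the response shown above (word details omitted).
SAMPLE = json.loads("""
{
  "fluency": {"score": 75.02139, "rhythm": 50.042786, "cohesion": 100.0},
  "pronunciation": {"score": 36.817684, "gop": 36.817684},
  "score": 22.09061,
  "completeness": 0.0,
  "duration": 2.46
}
""")


def summarise(result):
    """Flatten the nested scores into (label, value rounded to 2 places) pairs."""
    return [
        ("fluency", round(result["fluency"]["score"], 2)),
        ("rhythm", round(result["fluency"]["rhythm"], 2)),
        ("cohesion", round(result["fluency"]["cohesion"], 2)),
        ("pronunciation", round(result["pronunciation"]["score"], 2)),
        ("overall", round(result["score"], 2)),
    ]


# Print one aligned line per score.
for label, value in summarise(SAMPLE):
    print("%-14s %6.2f" % (label, value))
```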

The API can also be used for speaking practice, correcting spoken English based on both mouth shape and pronunciation. With scores for each individual word, phoneme, and phonetic symbol, you can tell which words and sounds need more practice.

Taking the pronunciation assessment of “sit” as an example, the feedback is as follows:

"words": [
    {
        "fluency": {
            "score": 68.29714,
            "rhythm": 68.29714
        },
        "pronunciation": {
            "score": 24.714167,
            "gop": 24.714167
        },
        "out_of_vocabulary": false,
        "text": "sit",
        "text_original": "sit",
        "text_normalised": [ ... ],
        "score": 46.505653,
        "start_time": 1.03,
        "end_time": 1.06,
        "phonemes": [
            {
                "fluency": {
                    "score": 31.643274,
                    "rhythm": 31.643274
                },
                "pronunciation": {
                    "score": 16.471563,
                    "gop": 16.471563
                },
                "start_time": 1.03,
                "end_time": 1.04,
                "arpa": "S",
                "ipa": "s"
            },
            {
                "fluency": {
                    "score": 87.00653,
                    "rhythm": 87.00653
                },
                "pronunciation": {
                    "score": 28.179922,
                    "gop": 28.179922
                },
                "start_time": 1.04,
                "end_time": 1.05,
                "arpa": "IH",
                "ipa": "i"
            },
            {
                "fluency": {
                    "score": 86.241615,
                    "rhythm": 86.241615
                },
                "pronunciation": {
                    "score": 29.491013,
                    "gop": 29.491013
                },
                "start_time": 1.05,
                "end_time": 1.06,
                "arpa": "T",
                "ipa": "t"
            }
        ]
    }
]

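To turn the phoneme-level scores into practice advice, the entries can be filtered and sorted. This is a rough sketch under stated assumptions: `weakest_phonemes` is a hypothetical helper, the cutoff of 60 is an arbitrary illustrative threshold rather than anything the API defines, and the sample data keeps only the pronunciation scores from the "sit" result above.

```python
# Phoneme entries for "sit", copied from the response above
# (only the pronunciation scores are kept for brevity).
PHONEMES = [
    {"arpa": "S", "ipa": "s", "pronunciation": {"score": 16.471563}},
    {"arpa": "IH", "ipa": "i", "pronunciation": {"score": 28.179922}},
    {"arpa": "T", "ipa": "t", "pronunciation": {"score": 29.491013}},
]


def weakest_phonemes(phonemes, threshold=60.0):
    """Return (arpa, score) pairs scoring below threshold, lowest first."""
    weak = [
        (p["arpa"], p["pronunciation"]["score"])
        for p in phonemes
        if p["pronunciation"]["score"] < threshold
    ]
    return sorted(weak, key=lambda pair: pair[1])


# List the sounds that most need practice, worst first.
for arpa, score in weakest_phonemes(PHONEMES):
    print("%-3s %.2f" % (arpa, score))
```

In a game front end, a list like this could decide which sounds to highlight for the player after each attempt.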
I soon finished the little game. Its core gameplay is the oral proficiency test, and I added a sharing mechanism that invites friends to play too. I never expected that, with just one push from the company's official account, it would flood my WeChat Moments. Within 3 days of going live, more than 100,000 people had played! A 100,000+ viral campaign led by R&D was born. Who says R&D doesn't understand operations?

It is understood that the API Explorer platform currently exposes more than 70 cloud services, including EI enterprise intelligence, computing, application services, networking, software development platform, and video, with 2,000+ APIs and 6,000+ error codes online. During the preliminary trial run, many enterprises have successfully called the APIs on the Huawei Cloud API Explorer platform.

