Cross-platform Android and iOS Baidu online speech recognition native plug-in

Time:2022-11-22

1. Recommended plug-ins

  • Cross-platform native plug-in for online preview of Office documents and pictures, and for video playback
  • Android and iOS native plug-in for online picture preview and video playback
  • Cross-platform Android and iOS native plug-in for Baidu OCR text recognition, card recognition, and bill recognition

2. Preparation

1. Preparing Android and iOS certificates

  • Android: generate the Android packaging certificate file (.keystore) and obtain its MD5 fingerprint (very important). Reference document: Android platform signature certificate (.keystore) generation guide – DCloud Q&A
  • iOS: apply for an iOS certificate on Windows, or apply for an iOS certificate on macOS

2. Baidu data preparation

  • Go to the Baidu AI Open Platform
  • Click Console in the upper right corner and log in; if you do not have an account, register one first
  • After logging in, complete real-name authentication if you have not already done so; enterprise real-name authentication is recommended (it unlocks more functions)
  • Select Voice Technology -> Create Application

  • Get API Key, Secret Key and License file

  • For details of Baidu speech recognition pricing, see Overview in the Baidu console, or go directly to the product pricing document

3. Get started quickly

  • Step 1: Download this plug-in's sample project, or download it from GitHub – silianpan/Seal-UniPlugin-Demo
  • Step 2: Open manifest.json -> Basic Configuration and reacquire the DCloud AppID
  • Step 3: Click to try the plug-in
  • Step 4: Open manifest.json -> App Native Plugin Configuration -> Select Cloud Plugin

  • Step 5: Make a custom debugging base: in the HBuilderX menu, click Run -> Run to mobile phone or emulator -> Make a custom debugging base, then fill in the form; the test keystore values are listed below

    Attachment: debug.keystore download link, for testing only

    Keystore name: “debug.keystore”
    Keystore password: “android”
    Key alias: “AndroidDebugKey”
    Key password: “android”
    MD5: A5:61:77:2E:AA:63:15:18:47:D6:5B:EC:6A:FA:F4:0A

  • Step 6: Select the custom debugging base: click Run -> Run to mobile phone or emulator -> Base operation selection -> Custom debug base
  • Step 7: Debug and run: click Run -> Run to mobile phone or emulator -> Run to Android App base

4. Interface Manual

  • Plug-in method one: recogOnlineStart, start online recognition
  • Method parameters

    | Parameter | Type | Default | Required | Description |
    | --- | --- | --- | --- | --- |
    | appId | string | null | no | AppID of the application in the Baidu AI Open Platform console |
    | appKey | string | null | no | API Key of the application in the Baidu AI Open Platform console |
    | appSecret | string | null | no | Secret Key of the application in the Baidu AI Open Platform console |
    | pid | int | null | no | PID (language); see the details below |
    | lmId | int | null | no | Self-training platform ID; select PID=8002 for it to take effect |
    | enableLongSpeech | bool | false | no | Long speech; takes priority over vad_endpoint_timeout |
    | vadEndpointTimeout | int | null | no | VAD duration setting; use 0 for long speech |
    | vad | string | dnn | no | VAD mode: dnn (default, the recommended model) or touch (disables silence-based sentence segmentation; the user stops recording manually) |
    | infile | string | null | no | External audio source: a resource path or a callback method name. Supported values: (a) a PCM file at a system path, such as /sdcard/test/test.pcm, no longer than 3 minutes; (b) a PCM file at a Java resource path, such as res:///com/baidu.test/16k_test.pcm, no longer than 3 minutes; (c) an InputStream data stream, given as “#” followed by the full method name, such as “#com.test.Factory.create16KInputStream()” (that is, the Factory class has a method create16KInputStream() that returns an InputStream). The value must start with a pound sign, and the method prototype must be: public static InputStream create16KInputStream(). For recordings longer than 3 minutes, sleep during each read so that the SDK's internal buffer does not run short. |
    | multiInvoke | bool | true | no | Whether to keep delivering multiple recognition result callbacks |
    | checkPermRecordAudio | bool | true | no | Whether to check the recording permission |
    | isFinish | bool | false | no | Whether to end recognition |
  • PID and language details

    • For online recognition, select the PID according to the language, the input method model, and whether online semantics is required.

      • Language: Mandarin Chinese, Sichuanese, Cantonese, and English are currently supported.
      • Input method model: suited to longer sentence input. Punctuation is on by default and online semantics is not supported; once punctuation is enabled, local semantics is not supported either.
      • Self-training platform model: based on the input method model; you can upload a thesaurus and a sentence database to generate your own trained model.
      • Online semantics: supports Mandarin only (local semantics also supports Mandarin only). Online semantics performs structural analysis on the recognized text to find the “keywords” of the sentence. For a detailed description, refer to the “Semantic Understanding Protocol” document.
      • Unit 2.0 semantics: similar to online semantics, but the parsing can be customized.
      • Supplement: PID=8001 is the self-training platform input method model; PID=8002 is the self-training platform search model.

  • Code example

    sealVoiceASRModule.recogOnlineStart(
        {
            // appId: '',
            // appKey: '',
            // appSecret: '',
            enableLongSpeech: true
        },
        ret => {
            const resultCode = ret.code;
            console.log('resultCode', resultCode);
            if (resultCode === 1000) {
                // 1000: recognition has started
                modal.toast({
                    message: `Online recognition started: ${resultCode}`,
                    duration: 3
                });
                this.recogOnlineBtn = 'Online recognition...';
            } else if (resultCode === 1001) {
                // 1001: ret.result is a JSON string; append the recognized text
                this.recogText += JSON.parse(ret.result).result + ' ';
                // Alternatives for showing the raw result:
                // uni.showModal({
                //     content: `Online recognition result (${resultCode}): ` + ret.result
                // });
                // modal.toast({
                //     message: 'Online recognition result: ' + ret.result,
                //     duration: 3
                // });
            }
        }
    );
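
    As a rough sketch, the parameter types from the table above could be checked before the options are handed to recogOnlineStart. The helper name buildRecogOptions and the PARAM_TYPES map are hypothetical and not part of the plug-in; the plug-in's own validation, if any, is unknown, so treat this as an illustration only.

```javascript
// Hypothetical helper, not part of the plug-in: validates option types
// against the parameter table ("int" and "bool" map to JS number/boolean).
const PARAM_TYPES = {
    appId: 'string',
    appKey: 'string',
    appSecret: 'string',
    pid: 'number',
    lmId: 'number',
    enableLongSpeech: 'boolean',
    vadEndpointTimeout: 'number',
    vad: 'string',
    infile: 'string',
    multiInvoke: 'boolean',
    checkPermRecordAudio: 'boolean',
    isFinish: 'boolean'
};
const INTEGER_PARAMS = ['pid', 'lmId', 'vadEndpointTimeout'];

function buildRecogOptions(options) {
    const result = {};
    for (const [name, value] of Object.entries(options)) {
        const expected = PARAM_TYPES[name];
        if (expected === undefined) {
            throw new Error(`Unknown parameter: ${name}`);
        }
        if (value === undefined || value === null) {
            continue; // omitted: leave the default to the plug-in
        }
        if (typeof value !== expected) {
            throw new Error(`Parameter ${name} must be a ${expected}`);
        }
        if (INTEGER_PARAMS.includes(name) && !Number.isInteger(value)) {
            throw new Error(`Parameter ${name} must be an integer`);
        }
        result[name] = value;
    }
    // The table lists only "dnn" and "touch" as vad values.
    if (result.vad !== undefined && !['dnn', 'touch'].includes(result.vad)) {
        throw new Error('Parameter vad must be "dnn" or "touch"');
    }
    return result;
}
```

    For example, buildRecogOptions({ enableLongSpeech: true, pid: null }) returns an object containing only enableLongSpeech, which can then be passed as the first argument of recogOnlineStart.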
  • Interface return format

    {
        code: 1001,
        result: 'recognition result'
    }
  • Callback status codes

    | Status code | Description |
    | --- | --- |
    | 1000 | Recognition started |
    | 1001 | Recognition succeeded; the recognition result is returned. For parsing the result format, refer to: https://cloud.baidu.com/doc/S… |
    | 1002 | Recognition ended |
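
    The three status codes can be handled in one place. The following is a minimal sketch, assuming a state object of the form { recognizing, text }; the function name applyRecogCallback and the fallback to the raw string when ret.result is not valid JSON are assumptions made for illustration, not documented plug-in behavior.

```javascript
// Hypothetical reducer, not part of the plug-in: maps one callback payload
// ({ code, result }) onto a new state object without mutating the old one.
function applyRecogCallback(state, ret) {
    switch (ret.code) {
        case 1000: // recognition started
            return { ...state, recognizing: true };
        case 1001: { // recognition result
            let piece;
            try {
                const parsed = JSON.parse(ret.result);
                piece = parsed && parsed.result != null ? String(parsed.result) : '';
            } catch (e) {
                // Assumption: fall back to the raw string if it is not JSON.
                piece = String(ret.result);
            }
            return { ...state, text: state.text + piece + ' ' };
        }
        case 1002: // recognition ended
            return { ...state, recognizing: false };
        default: // unknown code: leave the state unchanged
            return state;
    }
}
```

    Inside a page component, the recogOnlineStart callback could then be reduced to a single call that assigns the returned state.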
  • Plug-in method two: recogOnlineEnd, end recognition
  • Alternatively, call the recogOnlineStart method with the parameter { isFinish: true }
  • Code example

    // Alternative: call recogOnlineStart with isFinish set to true
    // sealVoiceASRModule.recogOnlineStart({ isFinish: true }, ret => {
    sealVoiceASRModule.recogOnlineEnd({}, ret => {
        const resultCode = ret.code;
        if (resultCode === 1002) {
            // 1002: recognition has ended
            modal.toast({
                message: `Recognition ended: ${resultCode}`,
                duration: 3
            });
            this.recogOnlineBtn = 'Start online recognition';
        }
    });
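
    The two methods can be paired in a small wrapper. This is only a sketch: createRecogSession is a hypothetical name, and it assumes that the module passed in exposes recogOnlineStart and recogOnlineEnd as used above and that status code 1002 arrives in the recogOnlineEnd callback.

```javascript
// Hypothetical wrapper, not part of the plug-in. asrModule is assumed to
// expose recogOnlineStart(options, callback) and recogOnlineEnd(options, callback).
function createRecogSession(asrModule, options) {
    let text = '';
    asrModule.recogOnlineStart(options, ret => {
        if (ret.code === 1001) {
            // Accumulate each recognized fragment, as in the example above.
            text += JSON.parse(ret.result).result + ' ';
        }
    });
    return {
        // Text recognized so far.
        getText: () => text.trim(),
        // Ends recognition; resolves with the full text once 1002 is reported.
        stop: () => new Promise(resolve => {
            asrModule.recogOnlineEnd({}, ret => {
                if (ret.code === 1002) {
                    resolve(text.trim());
                }
            });
        })
    };
}
```

    A page could keep the returned session object and call session.stop().then(text => ...) when the user taps the stop button. Note that the promise stays pending if the module never reports 1002.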

5. Follow-up plan

  • iOS support

6. System permissions required by this plug-in

  • Permissions required on Android

    • android.permission.READ_EXTERNAL_STORAGE: read the contents of the SD card
    • android.permission.WRITE_EXTERNAL_STORAGE: modify or delete the contents of the SD card
    • android.permission.INTERNET: access the network
    • android.permission.RECORD_AUDIO: record audio
  • Android: the plug-in uses the Baidu Open Platform Speech Recognition SDK; see its official website https://ai.baidu.com/tech/speech
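
    If the built-in check (checkPermRecordAudio) is turned off, the recording permission could be requested from the page before recognition starts. The sketch below assumes the HTML5+ plus.android.requestPermissions API of the uni-app App runtime; the function name requestRecordPermission is hypothetical, and the plus object is passed in as a parameter so the function does not depend on a global.

```javascript
// Hypothetical helper, not part of the plug-in. plusApi is assumed to be the
// HTML5+ `plus` object, whose android.requestPermissions(permissions,
// successCallback, errorCallback) reports a `granted` array on success.
const RECORD_AUDIO = 'android.permission.RECORD_AUDIO';

function requestRecordPermission(plusApi) {
    return new Promise((resolve, reject) => {
        plusApi.android.requestPermissions(
            [RECORD_AUDIO],
            // Resolve to true only if RECORD_AUDIO is among the granted permissions.
            result => resolve(result.granted.includes(RECORD_AUDIO)),
            error => reject(error)
        );
    });
}
```

    In an App page this might be used as requestRecordPermission(plus).then(granted => { ... }), starting recognition only when granted is true.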

If you still have questions about using the plug-in, you can join the QQ group (170683293) for consultation.