Turn your browser into a voice assistant like Siri


Recently, while browsing technical articles in my spare time, I came across one about voice reading, "Use JavaScript to make your browser speak". It mentioned that speechSynthesis lets a modern browser read specified content aloud. That piqued my curiosity, and the exploration below is the result.

Running the code snippets in this article requires hardware support: an audio output device (such as speakers or headphones) and an audio input device (such as a microphone).

Speech synthesis

Strictly speaking, reading text aloud requires two interfaces working together: speechSynthesis and SpeechSynthesisUtterance. SpeechSynthesisUtterance tells the browser what needs to be read aloud, and speechSynthesis synthesizes that content into audio and plays it through an audio output device such as speakers.

Languages that support reading aloud

speechSynthesis is implemented by the browser calling the relevant speech interfaces of the operating system. Language support may therefore vary with the browser and the operating system. Call speechSynthesis.getVoices() to get the voices, and with them the spoken languages, supported by the current device.

That said, most browsers that support speechSynthesis can read Chinese content. Relying on the operating system also brings a benefit: reading aloud can work offline. The SpeechSynthesisVoice.localService property tells you whether a given voice is provided locally or by a remote service.
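
To make the paragraph above concrete, here is a minimal sketch that groups the available voices by language and records whether each one is local. The helper name summarizeVoices is hypothetical, and the browser portion assumes that getVoices() can return an empty list until the voiceschanged event fires, as it does in Chrome.

```javascript
// Hypothetical helper: group voice-like objects ({ name, lang, localService }) by language.
function summarizeVoices(voices) {
  const byLang = {};
  for (const { name, lang, localService } of voices) {
    if (!byLang[lang]) byLang[lang] = [];
    byLang[lang].push({ name, local: localService === true });
  }
  return byLang;
}

// Browser-only part: skipped where speechSynthesis does not exist (for example in Node.js).
if (typeof speechSynthesis !== 'undefined') {
  const report = () => console.log(summarizeVoices(speechSynthesis.getVoices()));
  report();
  // Assumption: some browsers load voices asynchronously, so list them again once they arrive.
  if ('onvoiceschanged' in speechSynthesis) {
    speechSynthesis.onvoiceschanged = report;
  }
}
```

A page could then check whether an entry such as "zh-CN" exists before offering Chinese reading.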

Code example

Voice reading

    // Voice reading
    function speak(sentence) {
      // Create the content to be read aloud
      const utterance = new SpeechSynthesisUtterance(sentence)
      // Ask the browser to read it aloud
      window.speechSynthesis.speak(utterance)
    }


Apart from IE, which is no longer maintained, the mainstream desktop browsers and iOS already support speechSynthesis. Android support is inconsistent, so compatibility handling is needed there.
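
One way to handle the compatibility concern is to detect support before calling the APIs. The sketch below is only an illustration: detectSpeechSupport is a hypothetical helper, and it checks for the prefixed webkitSpeechRecognition as well as the unprefixed name.

```javascript
// Hypothetical helper: report which speech features a global object exposes.
function detectSpeechSupport(root) {
  const Recognition = root.SpeechRecognition || root.webkitSpeechRecognition;
  return {
    synthesis: 'speechSynthesis' in root && typeof root.SpeechSynthesisUtterance === 'function',
    recognition: typeof Recognition === 'function',
  };
}

// In a browser globalThis is window; elsewhere it simply reports no support.
const support = detectSpeechSupport(globalThis);
if (!support.synthesis) {
  console.log('Voice reading is not available here; fall back to showing the text.');
}
```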

M71 proposal

It is worth noting, however, that after Chrome shipped this feature, some websites abused voice reading. Chrome therefore proposed a change for M71: speech is read aloud only when it is triggered by a user event.

I tested Chrome and Edge myself. In Chrome, speech cannot be triggered either by calling the function directly or by creating a DOM node and firing its click event indirectly, whereas Edge allowed both approaches at the time of writing (2021-03-13). If your business requirements depend on this behavior, it is advisable to prepare for the differences between browsers.

Voice reading

    function speak(sentence) {
      const utterance = new SpeechSynthesisUtterance(sentence);
      window.speechSynthesis.speak(utterance);
    }

    // After the M71 proposal, Chrome blocks voice reading that is called automatically
    // Edge could still be called directly as of March 13, 2021; the behavior of other browsers is unknown
    speak('direct call');

    // Create a node and trigger its click event indirectly
    const button = document.createElement('button');
    button.onclick = () => speak('create node call');
    document.body.appendChild(button);
    button.click();
    setTimeout(() => document.body.removeChild(button), 50);
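
Since Chrome expects a user-triggered event, a more robust pattern is to start reading inside a handler for a real click. The following is a minimal sketch under that assumption. The helper speakOnClick and the element id "read-aloud" are hypothetical names, not part of the original demo.

```javascript
// Hypothetical helper: only start reading inside a real click handler,
// which is the kind of user-triggered event the M71 change asks for.
function speakOnClick(button, sentence, synth, Utterance) {
  button.addEventListener('click', () => {
    synth.cancel(); // drop anything still queued from an earlier click
    synth.speak(new Utterance(sentence));
  });
}

// Browser-only wiring; the element id "read-aloud" is an assumption for illustration.
if (typeof document !== 'undefined' && typeof speechSynthesis !== 'undefined') {
  const button = document.getElementById('read-aloud');
  if (button) {
    speakOnClick(button, 'Hello from a user click', speechSynthesis, SpeechSynthesisUtterance);
  }
}
```

Passing the synthesizer and the utterance constructor as parameters keeps the helper easy to exercise with stand-ins outside a browser.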

After testing this code, an idea occurred to me: since voice reading exists, is there also an API for voice recognition? I checked MDN and some related material and found that there really is one: SpeechRecognition.

Speech recognition

Unlike speechSynthesis, which reads aloud locally, SpeechRecognition can be server-based. The MDN documentation states this explicitly, which means recognition requires an Internet connection.

On some browsers, like Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won’t work offline.

If your browser is Chrome, the recognition service is provided by Google. If Google's servers are unreachable from your network and you are not using a proxy or VPN, recognition ends immediately. Fortunately, there is SpeechRecognition.serviceURI, a property for specifying a custom recognition service, which can serve as a workaround where the browser supports it.
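
Because recognition depends on a network service, a page may want to check connectivity before starting and to report a failed connection clearly. The sketch below is one possible approach, not the article's original code: startIfOnline and describeError are hypothetical helpers, and navigator.onLine only reflects the local connection, not whether the recognition service itself is reachable.

```javascript
// Hypothetical helper: start recognition only when the browser reports a connection.
// Returns true if start() was called, false otherwise.
function startIfOnline(recognition, nav, notify) {
  if (nav && nav.onLine === false) {
    notify('Speech recognition needs a network connection.');
    return false;
  }
  recognition.start();
  return true;
}

// Hypothetical helper: turn a recognition error code into a message.
// 'network' is the error code reported when required network communication fails.
function describeError(code) {
  return code === 'network'
    ? 'The recognition service could not be reached.'
    : 'Speech recognition error: ' + code;
}
```

In a browser, the first helper would be called with a recognition object and navigator, and the second from the recognition object's onerror handler.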

Code example

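Speech recognition

    // Page controls: "Click to recognize speech" and "End speech recognition";
    // the script also expects elements with the ids "output" and "status"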
    // At present only Chrome and Edge support this feature, and it requires the vendor prefix
    const SpeechRecognition = window.webkitSpeechRecognition
    const recognition = new SpeechRecognition()
    const output = document.getElementById("output")
    const status = document.getElementById("status")

    // Fired when speech recognition starts
    recognition.onstart = function() {
      output.innerText = ''
      status.innerText = 'speech recognition started'
    }
    // Fired when speech is no longer detected
    recognition.onspeechend = function() {
      status.innerText = 'speech recognition ended'
    }
    // Fired when recognition fails
    recognition.onerror = function({ error }) {
      const errorMessage = {
        'no-speech': 'no sound source detected',
        'not-allowed': 'microphone device not detected or browser not allowed to use microphone'
      }
      status.innerText = errorMessage[error] || 'speech recognition error'
    }
    // Fired when a recognition result is available.
    // interimResults controls whether results are returned in real time; maxAlternatives sets the number of returned alternatives
    recognition.onresult = function({ results }) {
      const { transcript, confidence } = results[0][0]
      output.innerText = `recognized content: ${transcript}, recognition rate: ${(confidence * 100).toFixed(2)}%`
    }
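
The interimResults and maxAlternatives options mentioned in the comments can be sketched as follows. This is an illustration under stated assumptions rather than part of the original demo: configureRecognition and splitResults are hypothetical helpers, and the default language 'en-US' is arbitrary.

```javascript
// Hypothetical helper: apply the options mentioned above to a recognition object.
function configureRecognition(recognition, { lang = 'en-US', interim = true, alternatives = 1 } = {}) {
  recognition.lang = lang;
  // Deliver partial results while the user is still speaking
  recognition.interimResults = interim;
  // Number of candidate transcripts returned per result
  recognition.maxAlternatives = alternatives;
  return recognition;
}

// Hypothetical helper: split a results list into confirmed and still-changing text.
// Each result is array-like (its alternatives) and carries an isFinal flag.
function splitResults(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const best = results[i][0].transcript;
    if (results[i].isFinal) finalText += best;
    else interimText += best;
  }
  return { finalText, interimText };
}
```

With interim results enabled, an onresult handler could pass its results to splitResults and show the interim text in a lighter style until it becomes final.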