Pure PyTorch speech toolkit SpeechBrain goes open source. Kaldi: “I’m under a bit of pressure”


From: Heart of the Machine

[Introduction]: More than a year after Mirco Ravanelli announced the creation of a new speech toolkit, SpeechBrain has arrived on schedule.


Advances in speech processing are an important part of how AI is changing people’s lives, and the rise of deep learning has driven great progress in this field in recent years. In the past, the dominant approach was to develop a different toolkit for each task. For users, learning the various toolkits took a lot of time, and it could also mean learning different programming languages and getting used to different code styles and standards. Today, most of these tasks can be handled with deep learning.

Until now, the speech tools commonly used by developers have included Kaldi, ESPnet, CMU Sphinx, and HTK, each of which has its own shortcomings. Kaldi, for example, relies on a large number of scripting languages, and its core algorithms are written in C++. Users may also need to modify the structure of various neural networks, and even experienced engineers find debugging it painful.

With the aim of making life easier for speech developers, Mirco Ravanelli, a member of Yoshua Bengio’s team, previously developed the open-source framework PyTorch-Kaldi, which tries to combine Kaldi’s efficiency with PyTorch’s flexibility. By the developers’ own account, however, “it is not perfect enough”.

So, more than a year ago, Mirco Ravanelli announced that he would build a new all-in-one speech toolkit, SpeechBrain. Given this background, the main goals behind SpeechBrain are to be simple enough, flexible enough, and user-friendly.


Project address:


As an open-source, all-in-one speech toolkit based on PyTorch, SpeechBrain can be used to develop state-of-the-art speech technologies, including speech recognition, speaker recognition, speech enhancement, and multi-microphone signal processing, with excellent performance. The team sums up its characteristics as “easy to use”, “easy to customize”, “flexible” and “modular”.

For machine learning researchers, SpeechBrain can easily be embedded into other models to advance speech technology research. For beginners, SpeechBrain is not hard to master: according to tests, an ordinary developer needs only a few hours to become familiar with the toolkit. The development team has also released many tutorials for reference (https://speechbrain.github.io…_basics.html).

In general, speechbrain has the following highlights:

  • The development team has published a number of pretrained models on Hugging Face, with interfaces for running inference. Where a Hugging Face model is not available, the team provides a Google Drive folder containing all the corresponding experimental results;
  • Multi-GPU training and inference using PyTorch DataParallel or DistributedDataParallel;
  • Mixed-precision training to speed up training;
  • A transparent and fully customizable data input and output pipeline. SpeechBrain follows the PyTorch DataLoader and Dataset style, so users can customize the I/O pipeline.
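To make the last point more concrete, here is a minimal sketch of what a "Dataset-style" customizable pipeline can look like. It is a toy illustration in plain Python, not SpeechBrain's actual API: the class `ToyUtteranceDataset`, the two pipeline functions, and the record fields are all hypothetical names invented for this example. The only thing borrowed from PyTorch is the map-style convention of implementing `__len__` and `__getitem__`.

```python
from typing import Callable, Dict, List, Sequence

Item = Dict[str, object]


class ToyUtteranceDataset:
    """Hypothetical map-style dataset: exposes __len__ and __getitem__,
    the same two methods PyTorch's map-style datasets implement."""

    def __init__(self, records: Sequence[Item], pipeline: Sequence[Callable[[Item], Item]]):
        self.records: List[Item] = list(records)
        self.pipeline = list(pipeline)

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Item:
        # Copy the record so the stored metadata is never modified.
        item = dict(self.records[index])
        # Apply the user-defined processing steps in order.
        for step in self.pipeline:
            item = step(item)
        return item


def normalize_text(item: Item) -> Item:
    """Upper-case the transcript and split it into a list of words."""
    item["words"] = str(item["words"]).upper().split()
    return item


def add_duration(item: Item) -> Item:
    """Derive the duration in seconds from sample count and sample rate."""
    item["duration_s"] = item["num_samples"] / item["sample_rate"]
    return item


# Made-up metadata for two utterances (no audio files are read here).
records = [
    {"id": "utt1", "words": "hello world", "num_samples": 32000, "sample_rate": 16000},
    {"id": "utt2", "words": "speech is fun", "num_samples": 8000, "sample_rate": 16000},
]

dataset = ToyUtteranceDataset(records, [normalize_text, add_duration])
print(len(dataset), dataset[0])
```

Swapping, removing, or adding functions in the pipeline list changes what each item looks like without touching the dataset class, which is the kind of customization the bullet above describes.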

Quick installation

Currently, developers can install SpeechBrain from PyPI. Alternatively, a local installation can be used to run experiments and to modify or customize the toolkit.

SpeechBrain supports Linux-based distributions and macOS (and provides a corresponding workaround for Windows users: https://github.com/speechbrai…).

SpeechBrain supports both CPU and GPU, but for most recipes a GPU is needed for training. Note that CUDA must be installed correctly in order to use a GPU.
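Before launching a recipe, it can help to check whether PyTorch actually sees a CUDA device. The following is a small sketch under the assumption that PyTorch may or may not be installed in the current environment; the helper name `pick_device` is made up for this example, while `torch.cuda.is_available()` is the standard PyTorch call.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    `pick_device` is a hypothetical helper for this article,
    not a SpeechBrain function.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed here, so no GPU can be used.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Training would run on: {pick_device()}")
```

If this prints `cpu` on a machine that has a GPU, the usual suspects are a CPU-only PyTorch build or a CUDA installation that does not match the driver.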

Installation tutorial address:


Installation via PyPI

After creating a Python environment, just enter the following:

pip install speechbrain

You can then import SpeechBrain in Python as follows:

import speechbrain as sb

Local installation

After creating a Python environment, just enter the following:

git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .

You can then import SpeechBrain in Python as follows:

import speechbrain as sb

Because the package is installed with the --editable flag, any changes you make to the speechbrain package are picked up automatically, with no need to reinstall.
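Whichever installation route is used, the result can be checked from Python. The sketch below uses only the standard library; the function name `describe_install` is hypothetical. For an editable install, the reported location would typically point into the cloned repository rather than into site-packages, though this depends on how pip and setuptools set up the environment.

```python
import importlib.util
from importlib import metadata


def describe_install(package: str = "speechbrain") -> dict:
    """Report whether `package` is importable, its version, and its location.

    Hypothetical helper for this article; it does not import the package.
    """
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"installed": False, "version": None, "location": None}
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found for this name.
        version = None
    return {"installed": True, "version": version, "location": spec.origin}


print(describe_install())
```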

SpeechBrain is not affiliated with any single organization; its team members come from Mila, Nuance, Dolby Laboratories, NVIDIA, Samsung, ViaDialog, and other laboratories and companies. The two initial leads are Mirco Ravanelli, a postdoctoral researcher at Mila, and Titouan Parcollet, a doctoral student at Avignon. The SpeechBrain project is still being improved, and more developers are welcome to join.

Seeing this, does Kaldi feel a little bit stressed?
