1000x speedup, prediction latency under 1 ms: Baidu PaddlePaddle releases an ERNIE-based semantic understanding development kit

Time: 2019-12-18

On November 5, at the “WAVE SUMMIT+” 2019 Deep Learning Developer Autumn Summit, Baidu released a semantic understanding development kit based on ERNIE. The kit aims to give enterprise developers more advanced, efficient, and easy-to-use ERNIE application services and to fully release ERNIE's industrial value. It includes a lightweight ERNIE solution with a speedup of 1000 times.

In July this year, Baidu released ERNIE 2.0, a continual-learning framework for semantic understanding, which surpassed BERT and XLNet on a total of 16 Chinese and English tasks and achieved state-of-the-art (SOTA) results.

Since the release of ERNIE 2.0, ERNIE's industrial adoption has accelerated, its usability has improved, and its supporting products have grown richer. ERNIE 2.0 is now widely used inside Baidu and across the industry, and has brought significant improvements in a variety of scenarios. These successful applications have built up rich experience for applying ERNIE in industry.

The picture above shows the ERNIE panorama. It presets a series of pre-trained models: the ERNIE general model, ERNIE task models, ERNIE domain models, and the ERNIE Tiny lightweight model released this time. On this basis, the PaddlePaddle semantic understanding development kit is built, comprising tools and a platform. It covers the full development process, including training, optimization, and deployment, and has five characteristics: a lightweight solution, comprehensive capabilities, fast prediction, flexible deployment, and platform enabling. We introduce them one by one below.


Feature 1: lightweight solution, 1000 times faster prediction

ERNIE 2.0 has strong semantic understanding capabilities, but exploiting them fully requires substantial computing power, which poses a great challenge for practical applications. To address this, Baidu released ERNIE Tiny, a lightweight pre-trained model, and ERNIE Slim, a one-click data distillation tool, which together increase prediction speed by 1000 times.

ERNIE Tiny technical principles

ERNIE Tiny compresses the ERNIE 2.0 base model mainly through model structure compression and model distillation. Its characteristics and advantages fall into four aspects:

• Shallow: the model uses a three-layer transformer structure, which gives a linear speedup of 4 times;
• Wide: the model widens the hidden layer from 768 in ERNIE 2.0 to 1024, and the added width improves results. Thanks to PaddlePaddle's optimization of general matrix operations, the widening does not cause a linear decrease in speed;
• Short: to shorten the input sequence and reduce computational complexity, the model uses Chinese subword-granularity input for the first time, reducing the average length by 40%;
• Distilled: ERNIE Tiny plays the student role in training and uses model distillation to learn the distributions and outputs of the corresponding layers of the teacher model, ERNIE 2.0, at both the transformer layers and the prediction layer.
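The article does not give the exact distillation objective. As a minimal sketch of the general idea at the prediction layer, assuming a standard soft-label formulation (KL divergence between temperature-softened teacher and student distributions), the loss could look like the following. The function names and the `temperature` parameter are illustrative and not taken from the ERNIE code base.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the softened output distributions.

    Assumption: a generic soft-label objective, not ERNIE Tiny's actual loss.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# The loss is zero when the student matches the teacher and grows as they diverge.
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Minimizing such a loss pushes the student's output distribution toward the teacher's; a layer-wise variant would apply a similar penalty to intermediate transformer outputs.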

With these four kinds of compression, ERNIE Tiny scores on average only 2.37% lower than ERNIE 2.0 base, yet 8.35% higher than the pre-BERT SOTA, and runs 4.3 times faster.

ERNIE Tiny's prediction speed is still not enough in some performance-critical scenarios, where response latency is often required to be under 1 ms. For these cases, the kit provides the ERNIE Slim tool for one-click data distillation. The tool uses data as a bridge to transfer ERNIE's knowledge to a small model, raising prediction speed a thousandfold with very little loss in quality.

The principle of ERNIE Slim differs slightly from traditional deep learning. First, the ERNIE 2.0 model is fine-tuned on the labeled input data to obtain the teacher model. The teacher model is then used to predict on unlabeled data. In this step, three strategies can be used to augment the data: adding noise words, replacing words with others of the same part of speech, and N-sampling. Finally, the results are used to train a model with low computational complexity, such as BoW or CNN.
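The three augmentation strategies can be sketched as below. This is an illustration under stated assumptions, not the ERNIE Slim implementation: the `[UNK]` noise token, the replacement probability, the toy part-of-speech lexicon, and the reading of N-sampling as "keep a random contiguous span of the sentence" are all assumptions, and every function name is hypothetical.

```python
import random

UNK = "[UNK]"  # hypothetical noise token


def add_noise(tokens, rng, prob=0.1):
    """Replace each token with the noise token with probability `prob`."""
    return [UNK if rng.random() < prob else tok for tok in tokens]


def replace_same_pos(tokens, pos_tags, lexicon, rng, prob=0.1):
    """Swap a token for another word that carries the same part-of-speech tag."""
    out = []
    for tok, tag in zip(tokens, pos_tags):
        candidates = lexicon.get(tag, [])
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def n_sampling(tokens, rng):
    """Keep a random contiguous span (assumed meaning of N-sampling)."""
    length = rng.randint(1, len(tokens))
    start = rng.randint(0, len(tokens) - length)
    return tokens[start:start + length]


def augment(tokens, pos_tags, lexicon, seed=0, copies=3):
    """Produce `copies` augmented variants, picking one strategy per variant."""
    rng = random.Random(seed)
    strategies = [
        lambda: add_noise(tokens, rng),
        lambda: replace_same_pos(tokens, pos_tags, lexicon, rng),
        lambda: n_sampling(tokens, rng),
    ]
    return [rng.choice(strategies)() for _ in range(copies)]
```

In a data distillation pipeline of this kind, each augmented sentence would be scored by the fine-tuned teacher, and the resulting (sentence, teacher prediction) pairs would serve as training data for the small student model.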

The following table shows the results of ERNIE Slim. Compared with the ERNIE 2.0 base model, the small model obtained through data distillation loses little in quality while predicting more than a thousand times faster. Compared with a simple model of similar speed, its results are significantly better.


Feature 2: one-click, high-performance fine-tuning tools for a full range of tasks

The ERNIE fine-tuning tools are designed to give developers a simple, easy-to-use fine-tuning framework. They currently cover four common NLP tasks: single-sentence classification, sentence-pair matching, named entity recognition, and reading comprehension. The toolset supports multi-machine, multi-GPU fine-tuning, and uses FP16 Tensor Core technology to obtain a 60% training speedup on Tesla V-series GPUs.

The fine-tuning tools include a PaddlePaddle-based training organization framework that handles model management, warm starting from saved parameters, automatic multi-GPU parallelism, and similar work, so that developers can focus on building the network structure and the input data pipeline.

Feature 3: fast prediction API

The ERNIE Fast-inference API is designed for latency-sensitive product scenarios. It gives enterprise developers a fast prediction C++ API that is easy to integrate. The tool also takes full advantage of the high-speed prediction in the latest version of PaddlePaddle: the OP aggregation algorithm in PaddlePaddle 1.6 effectively accelerates ERNIE prediction.

In latency-sensitive scenarios, compared with competing products, the ERNIE Fast-inference API reduces latency by 21% on GPU (P4) devices and by 60% on CPU (Intel Xeon Gold 6148) devices.


Feature 4: vector server, supporting flexible deployment across platforms

To further reduce the cost of use for developers, the kit provides ERNIE Service, a prediction service solution that makes it easy to obtain the ERNIE model's vector representations and prediction scores.

ERNIE Service architecture

ERNIE Service is a multi-GPU prediction service built on Python. Requests sent by a client are automatically distributed to the GPUs, which run the ERNIE Fast-inference API to obtain ERNIE vectors and scores. ERNIE Service currently supports flexible invocation across platforms, devices, and languages, and delivers high prediction performance: its QPS is 13% higher than that of the competing bert-as-service.
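The article does not say how requests are assigned to GPUs. As a minimal sketch of the dispatch idea, assuming a simple round-robin policy, the class and helper below are hypothetical stand-ins; a real worker would call the inference API on its own device instead of returning a placeholder.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Hand each incoming request to the next worker in turn (assumed policy)."""

    def __init__(self, workers):
        if not workers:
            raise ValueError("at least one worker is required")
        self._workers = cycle(workers)

    def dispatch(self, request):
        worker = next(self._workers)
        return worker(request)


def make_fake_gpu_worker(gpu_id):
    """Build a stand-in worker that only records which GPU handled the text."""
    def worker(text):
        return {"gpu": gpu_id, "text": text}
    return worker


dispatcher = RoundRobinDispatcher([make_fake_gpu_worker(0), make_fake_gpu_worker(1)])
results = [dispatcher.dispatch(t) for t in ["q1", "q2", "q3", "q4"]]
gpu_order = [r["gpu"] for r in results]
```

Round-robin spreads load evenly when requests cost about the same; a production service might instead batch requests or route by queue length.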

Feature 5: platform enabling

In addition, the kit includes a platform-based application solution for ERNIE. Through EasyDL Professional Edition, developers can complete the whole workflow of an NLP task, including data annotation, data processing, ERNIE fine-tuning, optimization, and deployment. The platform provides developers with rich algorithm and computing services, further lowering the threshold for applying ERNIE in industry. It also presets classic networks for NLP tasks such as text classification and text matching, which can quickly meet the needs of developers at different levels.

In short, the ERNIE semantic understanding development kit builds on Baidu's leading strengths in natural language processing, such as pre-trained models, and on the PaddlePaddle platform to advance the industrialization of artificial intelligence and empower all industries.

Related links:
• ERNIE industrial-grade open source tools:
https://github.com/PaddlePadd…
• ERNIE platform services:
https://ai.baidu.com/easydl/pro

Attention!

On November 23, an additional session of the ERNIE touring salon will be held in Shanghai, with practical content, top-tier industry mentors, and a group of like-minded peers. What are you waiting for? Interested developers can sign up through the link below.
Registration address: https://iwenjuan.baidu.com/? C
Scan the QR code to follow the official Baidu NLP public account and get first-hand information on Baidu NLP technology.
Join the official ERNIE technical exchange group (760439550), where Baidu engineers will answer your questions in real time.
Go to GitHub (github.com/PaddlePaddle/ERNIE) to give ERNIE a star. Learn and use it now!