Few-shot learning for named entity recognition in medical text

Time:2020-9-26

1. Summary

This paper studies named entity recognition (NER) on electronic health record datasets. The model builds on other related datasets, and only 10 samples from the target dataset are used for few-shot learning. Five methods to improve performance are proposed:
(1) Layer-wise initialization with pre-trained weights
(2) Hyperparameter tuning
(3) Combining pre-training data
(4) Custom word embeddings
(5) Optimizing out-of-vocabulary (OOV) words

2. Content

The main dataset used in this paper is CoNLL-2003.

The baseline model used in this paper is the BLSTM-CNNs architecture proposed by J. Chiu et al. The highlight of this model is its combination of character, word and casing embeddings. The casing embedding mainly covers the categories numeric, allLower, allUpper, mainly_numeric (more than 50% of a word's characters are numeric), initialUpper, contains_digit, padding and other.
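As a rough illustration, the casing categories above can be computed with a simple rule cascade. The sketch below is not the paper's code: the function name `casing_feature`, the `<PAD>` symbol and the order in which the rules are checked are assumptions made for this example; only the category names and the 50% threshold come from the description above.

```python
PAD_TOKEN = "<PAD>"  # hypothetical padding symbol, not taken from the paper


def casing_feature(token: str) -> str:
    """Map a token to one casing category (rule order is an assumption)."""
    if token == PAD_TOKEN:
        return "padding"
    num_digits = sum(ch.isdigit() for ch in token)
    if token and num_digits == len(token):
        return "numeric"
    # "mainly numeric": more than 50% of the characters are digits
    if token and num_digits / len(token) > 0.5:
        return "mainly_numeric"
    if token.islower():
        return "allLower"
    if token.isupper():
        return "allUpper"
    if token[:1].isupper():
        return "initialUpper"
    if num_digits > 0:
        return "contains_digit"
    return "other"


if __name__ == "__main__":
    for word in ["2020", "120mg", "aspirin", "MRI", "Aspirin", "iPhone4", "-"]:
        print(word, casing_feature(word))
```

In a model of this kind, each category would typically be mapped to an integer index and looked up in a small embedding table, alongside the word and character inputs.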


The five tricks to improve performance are as follows:
(1) Single pre-training: pre-train on a single other dataset, then compare four settings: initialize all layers with the pre-trained weights, only the BLSTM layer, all layers except the BLSTM layer, or no layers at all.
(2) Hyperparameter tuning: covers the optimizer, the pre-training dataset, the SGD learning rate, batch normalization (used or not), word embeddings (trainable or not) and the learning rate schedule (constant or time-scheduled).
(3) Combined pre-training: train the model on multiple datasets in sequence, then load the resulting weights when training on the target dataset.
(4) Customized word embeddings: use either GloVe embeddings or fastText embeddings retrained on a medical dataset.
(5) Optimizing OOV words: remove trailing ":", ";", "." and "-"; remove quotation marks; remove leading "+".
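The OOV clean-up rules in item (5) can be sketched as a small normalization step applied before the embedding lookup. This is a minimal example under stated assumptions: the names `normalize_oov` and `lookup_token`, the `<UNK>` fallback, the set of quote characters and the order of the three rules are hypothetical, and apostrophes are deliberately left alone so that words such as "patient's" are not altered.

```python
QUOTE_CHARS = '"“”'  # assumed set of quotation marks to strip


def normalize_oov(token: str) -> str:
    """Apply the three clean-up rules to a token."""
    cleaned = token
    for quote in QUOTE_CHARS:          # remove quotation marks
        cleaned = cleaned.replace(quote, "")
    cleaned = cleaned.lstrip("+")      # remove leading "+"
    cleaned = cleaned.rstrip(":;.-")   # remove trailing ":", ";", "." and "-"
    return cleaned


def lookup_token(token: str, vocab: set, unk: str = "<UNK>") -> str:
    """Return the vocabulary entry to use for a token.

    The original token is tried first; the cleaned form is only used
    when the original is out of vocabulary.
    """
    if token in vocab:
        return token
    cleaned = normalize_oov(token)
    if cleaned in vocab:
        return cleaned
    return unk


if __name__ == "__main__":
    vocab = {"aspirin", "mg", "fever"}
    for raw in ["+aspirin:", '"fever".', "mg.", "xyz"]:
        print(raw, "->", lookup_token(raw, vocab))
```

Trying the unmodified token first keeps in-vocabulary entries such as abbreviations ending in "." intact and only rewrites tokens that would otherwise fall back to the unknown-word embedding.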


The results of the five optimization methods are as follows:
(1) Single pre-training: the F1 score increased by 4.52%.
(2) Hyperparameter tuning: the choice of optimizer matters most (Nadam >> SGD), followed by the choice of pre-training dataset (+2.34%).
(3) Combined pre-training: a negative effect of -1.85%.
(4) Customized word embeddings: the self-trained word embeddings gave +3.78%.
(5) Optimizing OOV words: +0.87%.
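To make the single pre-training comparison in item (1) concrete, the four initialization settings can be sketched with plain dictionaries that map layer names to weights. Everything here is illustrative: the layer names, the mode strings and the helper names `select_layers` and `initialize_weights` are assumptions, and a real model would copy framework tensors rather than Python lists.

```python
import copy

BLSTM_LAYERS = {"blstm"}  # hypothetical name of the recurrent layer


def select_layers(layer_names, mode: str) -> set:
    """Pick which layers receive pre-trained weights in a given setting."""
    names = set(layer_names)
    if mode == "all":
        return names
    if mode == "blstm_only":
        return names & BLSTM_LAYERS
    if mode == "all_but_blstm":
        return names - BLSTM_LAYERS
    if mode == "none":
        return set()
    raise ValueError(f"unknown mode: {mode}")


def initialize_weights(fresh: dict, pretrained: dict, mode: str) -> dict:
    """Build target-model weights, copying pre-trained ones where selected."""
    chosen = select_layers(fresh, mode)
    result = {}
    for name, weights in fresh.items():
        if name in chosen and name in pretrained:
            result[name] = copy.deepcopy(pretrained[name])
        else:
            result[name] = copy.deepcopy(weights)  # keep random initialization
    return result


if __name__ == "__main__":
    fresh = {"char_cnn": [0.0], "blstm": [0.0], "output": [0.0]}
    pretrained = {"char_cnn": [1.0], "blstm": [2.0], "output": [3.0]}
    for mode in ["all", "blstm_only", "all_but_blstm", "none"]:
        print(mode, initialize_weights(fresh, pretrained, mode))
```

After initialization, the model would be fine-tuned on the 10 target samples in the same way for every setting, so that any difference in F1 score can be attributed to which layers were transferred.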
