Giving machines warmth: understanding two kinds of text sentiment analysis models

Time:2021-8-20

Abstract: From the perspective of models and algorithms, this blog post introduces two kinds of sentiment analysis models: those based on statistical methods and those based on deep learning.

Text sentiment analysis is the process of analyzing, processing, and extracting information from subjective, emotionally colored text using natural language processing and text mining techniques. Current research on text sentiment analysis spans many fields, including natural language processing, text mining, information retrieval, information extraction, machine learning, and ontology. It has attracted the attention of many scholars and research institutions, and in recent years it has remained one of the hot topics in natural language processing and text mining.

From the standpoint of human subjective cognition, the task of sentiment analysis is to answer the question “who expressed what sentiment, at what time, about which attribute of which entity?” A formal representation of sentiment analysis is therefore the five-tuple (entity, aspect, opinion, holder, time). For example, for the text “I think the power of 2.0T XX car is very surging.” the tuple is (XX car, power, positive sentiment, I, /), where “/” marks the unspecified time. Note that most current studies do not consider the opinion holder or the time among these five elements.
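
The five-tuple can be written down as a small data structure. The sketch below is only an illustration: the class name `Opinion` and the use of `None` for unspecified elements are assumptions, not part of any standard library for sentiment analysis.

```python
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    """One sentiment record: (entity, aspect, opinion, holder, time)."""
    entity: str
    aspect: str
    opinion: str                   # e.g. "positive", "negative", "neutral"
    holder: Optional[str] = None   # often ignored in current studies
    time: Optional[str] = None     # often ignored in current studies


# "I think the power of 2.0T XX car is very surging."
record = Opinion(entity="XX car", aspect="power", opinion="positive", holder="I")
print(record)
```

Leaving `holder` and `time` optional mirrors the observation that most studies work with only the first three elements.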

Sentiment analysis can be divided into many subtasks. The following mind map shows them:

[Figure: mind map of the sentiment analysis subtasks]

Word-level and sentence-level analysis determine the positive or negative polarity of a word and of a whole sentence, respectively. Neither distinguishes specific targets in the sentence, such as entities or attributes, which amounts to ignoring the entity and aspect elements of the five-tuple. Word-level sentiment analysis, that is, the construction of a sentiment dictionary, studies how to assign sentiment information to words. Sentence-level and document-level sentiment analysis studies how to assign a sentiment label to a whole sentence or document. Target-level sentiment analysis considers specific targets, which can be entities, attributes of an entity, or combinations of entities and attributes. It can be divided into three types: target-grounded aspect-based sentiment analysis (TG-ABSA), target-no aspect-based sentiment analysis (TN-ABSA), and target aspect-based sentiment analysis (T-ABSA). TG-ABSA analyzes the sentiment toward each attribute in a given attribute set of an entity; TN-ABSA analyzes the positive or negative sentiment toward the entities in the text; T-ABSA analyzes the sentiment toward the entity-attribute combinations that appear in the text. The following table illustrates the sentiment analysis tasks for different targets:
[Table: sentiment analysis tasks for different targets]

Sentiment analysis model based on statistical methods

Sentiment analysis based on statistical methods relies mainly on an established “sentiment dictionary”, whose construction is the premise and basis of sentiment classification. In practice, dictionary entries fall into four categories: general sentiment words, degree adverbs, negation words, and domain words. For English, dictionaries are mainly built by expanding the English lexical database WordNet [1]. Hu and Liu [2] manually built a seed vocabulary of adjectives and used the synonym and antonym relations between words in WordNet to determine the sentiment orientation of sentiment words, and thereby the polarity of opinions. For Chinese, dictionaries are mainly built by expanding HowNet [3]. Zhu Yanlan et al. [4] compute the semantic similarity between a word and a benchmark set of sentiment words to infer the word's sentiment orientation. In addition, a special domain dictionary can be built to improve classification accuracy, for example a dictionary of new internet vocabulary that captures the sentiment orientation of new words more accurately.
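
The seed-expansion idea described above (synonyms inherit a seed's orientation, antonyms receive the opposite one) can be sketched in a few lines. This is only a toy illustration: the `SYNONYMS` and `ANTONYMS` tables are hand-written, hypothetical stand-ins for WordNet lookups, and the function name `expand_seeds` is an assumption rather than anything from the cited work.

```python
from collections import deque

# Toy, hand-written relations standing in for WordNet lookups (hypothetical data).
SYNONYMS = {"good": ["great", "fine"], "great": ["superb"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "superb": ["awful"]}


def expand_seeds(seeds):
    """Propagate polarity (+1 / -1) from seed words through the relations."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        # Synonyms keep the polarity of the current word, antonyms flip it.
        neighbours = [(w, polarity[word]) for w in SYNONYMS.get(word, [])]
        neighbours += [(w, -polarity[word]) for w in ANTONYMS.get(word, [])]
        for other, sign in neighbours:
            if other not in polarity:  # first label wins; conflicts are not resolved
                polarity[other] = sign
                queue.append(other)
    return polarity


lexicon = expand_seeds({"good": 1})
print(lexicon)
```

Starting from a single positive seed, the breadth-first traversal labels every word reachable through the toy relations. A real dictionary would need many seeds and some way to resolve conflicting labels.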

In the dictionary-based method, the text is first preprocessed with word segmentation and stop-word removal, and the constructed sentiment dictionary is then matched against the text to mine positive and negative information. The general process is shown in the figure:

[Figure: general process of dictionary-based sentiment analysis]

In addition to the dictionaries above, reference [5] lists other existing Chinese dictionaries:

[Table: other existing Chinese sentiment dictionaries, from [5]]

Of course, you can also train your own sentiment dictionary from a corpus. Once the sentiment dictionary is loaded, a dictionary-based text matching algorithm performs the sentiment analysis. The algorithm is relatively simple: traverse the words of the segmented sentence one by one and, whenever a word hits the dictionary, apply the corresponding weight. The weight of a positive word is added, the weight of a negative word is subtracted, a negation word flips the sign, and a degree adverb multiplies the weight of the word it modifies. The final score determines whether the sentiment is positive, negative, or neutral. A typical algorithm flow for sentiment analysis with dictionary-based text matching is as follows [5]:

[Figure: typical algorithm flow of dictionary-based text matching, from [5]]
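
The matching rules described above can be sketched as follows. This is a minimal illustration under several assumptions: the tiny dictionaries and their weights are invented, whitespace splitting stands in for real word segmentation (Chinese text would need a segmenter), and a negation word or degree adverb is simply applied to the next sentiment word that follows it.

```python
# Hypothetical miniature dictionaries; real ones hold thousands of entries.
POSITIVE = {"good": 1.0, "surging": 1.0}
NEGATIVE = {"bad": 1.0, "noisy": 1.0}
NEGATIONS = {"not", "never"}
DEGREES = {"very": 2.0, "slightly": 0.5}
STOP_WORDS = {"the", "is", "a", "of", "i", "think"}


def score(text):
    """Sum dictionary weights over the tokens of `text`."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]
    total = 0.0
    sign, scale = 1.0, 1.0  # pending modifiers for the next sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign                      # negation flips the sign
        elif tok in DEGREES:
            scale *= DEGREES[tok]             # degree adverb multiplies the weight
        elif tok in POSITIVE:
            total += sign * scale * POSITIVE[tok]
            sign, scale = 1.0, 1.0
        elif tok in NEGATIVE:
            total -= sign * scale * NEGATIVE[tok]
            sign, scale = 1.0, 1.0
    return total


def label(value):
    """Map the final score to a sentiment label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


print(label(score("I think the power of 2.0T XX car is very surging.")))
```

Because the score is a plain sum of modified weights, this sketch also makes the limitation of linear superposition easy to see: word order and context beyond the immediately preceding modifiers play no role.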

The sentiment analysis model based on statistical methods is simple and general-purpose, but it has three main shortcomings:

1. Low accuracy

Language is highly complex, and simple linear superposition of word weights clearly loses a great deal of accuracy. Word weights are not fixed either, which makes them difficult to set accurately.

2. The dictionary needs continuous updating

Dictionaries do not always cover newly coined slang, such as “giving power” (a literal rendering of a Chinese internet expression meaning “awesome”) and similarly coarse superlatives. The dictionary therefore has to be refreshed constantly with new words. In an era when online vocabulary keeps emerging, if dictionary updates cannot keep pace with new words, sentiment analysis will fall far short of expectations in practice. For example, in Taobao product reviews or Ele.me takeout reviews, failing to capture new words makes the analyzed sentiment deviate from reality.

3. Dictionaries are difficult to build

The core of dictionary-based sentiment classification is the sentiment dictionary. Building one requires strong background knowledge and a deep understanding of the language, which severely limits the analysis of foreign-language text.

Sentiment analysis model based on deep learning

Having looked at the advantages and disadvantages of the statistical sentiment analysis model, let’s see how deep learning text classification models perform text sentiment analysis and classification. One advantage of deep learning is end-to-end learning, which removes the manual intervention otherwise needed at each intermediate step. Building on word vectors generated by a pretrained model, the first important problem deep learning solves is the construction of the sentiment dictionary. Next, we use typical text classification models as examples to show the evolution and application scenarios of deep text classification models.

2.1 FastText[6]

Model running steps:

[Figure: FastText model running steps]
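
Since the original figure is not available, the following is a minimal sketch of the steps as FastText-style classifiers are commonly described: look up an embedding for each token, average the embeddings, apply a linear layer, and take a softmax. All values here are assumptions for illustration: the two-dimensional embeddings and the classifier weights are chosen by hand rather than trained, and n-gram features and hierarchical softmax are left out.

```python
import math

# Hypothetical 2-dimensional embeddings and classifier weights, chosen by hand.
EMBEDDINGS = {
    "good": [1.0, 0.0],
    "bad": [0.0, 1.0],
    "car": [0.1, 0.1],
}
WEIGHTS = [[2.0, -2.0],   # row for class "positive"
           [-2.0, 2.0]]   # row for class "negative"
BIAS = [0.0, 0.0]
LABELS = ["positive", "negative"]


def average_embedding(tokens):
    """Step 1-2: look up known tokens and average their vectors."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(tokens):
    """Step 3-4: linear layer on the averaged vector, then softmax."""
    hidden = average_embedding(tokens)
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(WEIGHTS, BIAS)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


print(predict(["good", "car"]))
```

Averaging discards word order entirely, which is what makes this kind of model fast and also what the later models in this section try to improve on.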

2.2 TextCNN[7]

2.3 TextRNN[8]

2.4 TextRNN+Attention[9]

HAN stands for Hierarchical Attention Networks. It splits the text to be classified into a number of sentences and applies encoder and attention operations at the word level and the sentence level respectively, which makes it possible to classify longer texts. Compared with the models above, the structure of HAN is slightly more complex; it can be divided into the following steps.

[Figure: HAN processing steps]
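
The two-level attention idea can be sketched as one pooling function applied twice: first over the word vectors of each sentence, then over the resulting sentence vectors. This is a simplified illustration, not the full HAN model. The recurrent encoders are omitted (the input vectors stand in for encoder hidden states), the learned projection before `tanh` is replaced by the identity, and the context vectors are hand-picked instead of learned.

```python
import math


def attention_pool(vectors, context):
    """Weighted sum of `vectors`; weights are softmax(tanh(v) . context)."""
    scores = [sum(math.tanh(x) * c for x, c in zip(v, context)) for v in vectors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors))
              for d in range(len(context))]
    return pooled, weights


def encode_document(document, word_context, sentence_context):
    """Word-level attention per sentence, then sentence-level attention."""
    sentence_vectors = [attention_pool(sentence, word_context)[0]
                        for sentence in document]
    return attention_pool(sentence_vectors, sentence_context)


# A document of two sentences; each word is a hypothetical 2-dimensional vector.
document = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
doc_vector, sentence_weights = encode_document(document, [1.0, 0.0], [0.0, 1.0])
print(doc_vector, sentence_weights)
```

In a full model, the document vector would then be passed to a classification layer, and the attention weights indicate which words and sentences contributed most.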

2.5 TextRCNN[10]

RCNN algorithm process: first, a bidirectional LSTM is used to learn the context of each word; the forward and backward RNNs yield the forward and backward context representations of each word:

[Equation: forward and backward context representations of each word]

The representation of each word then becomes the concatenation of its word vector with its forward and backward context vectors:

[Equation: word representation as the concatenation of the word vector and the context vectors]

Next, the same convolution and pooling layers as in TextCNN are attached: max pooling is applied over the seq_length dimension, followed by a fully connected (FC) layer for classification. The network can be regarded as an improved version of FastText.
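
For reference, the steps above can be written out as equations. The notation below is an assumption, since the original equation images are not available: it follows the usual presentation of the RCNN model, where e(w_i) is the word vector of word w_i, c_l and c_r are its left and right context vectors, and f is a nonlinear activation. With a bidirectional LSTM, the first two lines would be replaced by the LSTM cell updates; in this formulation the layer after concatenation is a linear transform with tanh.

```latex
\begin{aligned}
c_l(w_i) &= f\left(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1})\right) \\
c_r(w_i) &= f\left(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1})\right) \\
x_i &= \left[\, c_l(w_i);\; e(w_i);\; c_r(w_i) \,\right] \\
y_i^{(2)} &= \tanh\left(W^{(2)} x_i + b^{(2)}\right) \\
y^{(3)} &= \max_{i=1}^{n} y_i^{(2)} \\
y^{(4)} &= W^{(4)} y^{(3)} + b^{(4)}
\end{aligned}
```

Here the maximum in the fifth line is taken element-wise over the n positions of the sequence, which corresponds to max pooling over the seq_length dimension, and the last line is the fully connected layer whose output is passed to a softmax.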

Summary

From the perspective of models and algorithms, this blog post has introduced sentiment analysis models based on statistical methods and on deep learning. The statistical model is simple and easy to use, but it has major weaknesses in accuracy, flexibility, and generalization. Deep learning models are evolving toward capturing context with deeper and more complex networks, and toward training the neural network on word vectors generated by powerful pretrained models. The open source repository [13] describes PyTorch implementations of each model in detail and compares them on the same Chinese baseline; the two blog posts [11] and [12] introduce other deep learning models for sentiment analysis in detail and can serve as a guide for further exploration.

References

[1]https://wordnet.princeton.edu/

[2]Hu M, Liu B. Mining and summarizing customer reviews[C]. Proceedings of Knowledge Discovery and Data Mining. NY, USA, 2004: 168-177.

[3]https://languageresources.git…%E9%87%91%E5%A4%A9%E5%8D%8E_Hownet/

[4]Zhu Yanlan, Min Jin, Zhou Yaqian, et al. Calculation of lexical semantic orientation based on HowNet[J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.

[5]https://blog.csdn.net/weixin_…details/93163519

[6]https://arxiv.org/abs/1612.03651

[7]https://arxiv.org/abs/1408.5882

[8]https://www.ijcai.org/Proceed…

[9]https://www.aclweb.org/anthol…

[10]http://zhengyima.com/my/pdfs/…

[11]https://zhuanlan.zhihu.com/p/…

[12]https://zhuanlan.zhihu.com/p/…

[13]https://github.com/649453932/…

This article is shared from “NLP column – Introduction to emotion analysis methods” in the Huawei Cloud community. Original author: it’s quite sudden.
