Let the machine have temperature: two models for text sentiment analysis

Time: 2021-4-26

Abstract: From the perspective of models and algorithms, this blog post introduces sentiment analysis models based on statistical methods and sentiment analysis models based on deep learning.

Text sentiment analysis uses natural language processing and text mining techniques to analyze, process, and extract information from subjective, emotionally colored text. Research on it currently spans many fields, including natural language processing, text mining, information retrieval, information extraction, machine learning, and ontology, and has attracted the attention of many scholars and research institutions. In recent years it has remained one of the hot topics in natural language processing and text mining.

Intuitively, the task of sentiment analysis is to answer the following question: who, at what time, about what entity and which of its attributes, expressed what kind of feeling? A formal representation of sentiment analysis is therefore the five-tuple (entity, aspect, opinion, holder, time). For example, for the text “I think the 2.0T XX car is very powerful”, the tuple is (XX car, power, positive emotion, I, /). Note that most current studies do not consider the opinion holder or the time among these five elements.
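
As a small illustration, the five-tuple can be written as a Python data class. This is only a sketch: the class name `SentimentTuple` and the helper `core_elements` are hypothetical names chosen here, and the string label "positive" is an assumed encoding of "positive emotion".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SentimentTuple:
    """One opinion as (entity, aspect, opinion, holder, time)."""
    entity: str
    aspect: str
    opinion: str                  # assumed labels: "positive", "negative", "neutral"
    holder: Optional[str] = None  # often ignored in current studies
    time: Optional[str] = None    # often ignored in current studies


def core_elements(t: SentimentTuple) -> tuple:
    """Return only the three elements most studies actually use."""
    return (t.entity, t.aspect, t.opinion)


# The example sentence from the text:
# "I think the 2.0T XX car is very powerful."
example = SentimentTuple(
    entity="XX car",
    aspect="power",
    opinion="positive",
    holder="I",
    time=None,  # the sentence gives no time, written "/" in the text
)
```

Dropping `holder` and `time` through `core_elements` mirrors the remark above that most studies leave those two elements out.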

Sentiment analysis can be divided into many subtasks. The following mind map shows them:

Word-level and sentence-level analysis target the positive or negative sentiment of a single word and of a whole sentence, respectively. Neither distinguishes specific targets in the sentence, such as entities or attributes, which amounts to ignoring the entity and aspect elements of the five-tuple. Word-level sentiment analysis, that is, the construction of a sentiment dictionary, studies how to assign sentiment information to words. Sentence-level and document-level sentiment analysis studies how to assign a sentiment label to a whole sentence or document. Target-level sentiment analysis considers a specific target, which can be an entity, an attribute of an entity, or a combination of entity and attribute. It can be divided into three types: target-grounded aspect-based sentiment analysis (TG-ABSA), target no aspect-based sentiment analysis (TN-ABSA), and target aspect-based sentiment analysis (T-ABSA). TN-ABSA analyzes the positive or negative sentiment toward the entities in the text; T-ABSA analyzes the combinations of entity and attribute in the text. The following table illustrates the sentiment analysis tasks for different targets:

Sentiment analysis model based on statistical methods

The statistical approach to sentiment analysis relies mainly on a pre-built “sentiment dictionary”, whose construction is the premise and foundation of sentiment classification. In practice, the words in such a dictionary fall into four categories: general sentiment words, degree adverbs, negation words, and domain words. For English, dictionaries are mainly built by expanding the English dictionary WordNet [1]. Hu and Liu [2] manually built a seed list of adjectives and used the synonym and near-synonym relations between words in WordNet to determine the sentiment orientation of sentiment words, and from that the polarity of opinions. For Chinese, dictionaries are mainly built by expanding HowNet [3]. Zhu Yanlan et al. [4] compute the semantic similarity between a word and a set of benchmark sentiment words to infer the sentiment orientation of that word. In addition, a dedicated domain dictionary can be built to improve classification accuracy, for example a dictionary of new internet vocabulary that captures the sentiment orientation of new words more accurately.
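
The seed-expansion idea can be sketched in a few lines of Python. This is a toy illustration rather than the method of [2]: the `SYNONYMS` table is a hypothetical stand-in for WordNet lookups, the seed list is invented, and the function name `expand_polarity` is an assumption of this sketch.

```python
from collections import deque

# Hypothetical synonym links; a real system would query WordNet instead.
SYNONYMS = {
    "good": ["great", "fine"],
    "great": ["good", "excellent"],
    "fine": ["good"],
    "excellent": ["great"],
    "bad": ["poor", "awful"],
    "poor": ["bad"],
    "awful": ["bad", "terrible"],
    "terrible": ["awful"],
}

# Hand-labelled seed adjectives: +1 is positive, -1 is negative.
SEEDS = {"good": 1, "bad": -1}


def expand_polarity(seeds, synonyms):
    """Propagate seed polarity to every word reachable through synonym links."""
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour in synonyms.get(word, []):
            if neighbour not in polarity:  # the first label found wins
                polarity[neighbour] = polarity[word]
                queue.append(neighbour)
    return polarity


lexicon = expand_polarity(SEEDS, SYNONYMS)
```

A breadth-first traversal is used so that each word takes the polarity of the nearest seed; words with no path to any seed simply stay out of the lexicon.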

In the dictionary-based method, the text is first preprocessed with word segmentation and stop-word removal, and the constructed sentiment dictionary is then matched against the text strings to mine positive and negative information. The general flow chart is as follows:

In addition to the dictionaries above, [5] lists other existing Chinese dictionaries for reference:

Of course, we can also train our own sentiment dictionary from a corpus. Once the sentiment dictionary is loaded, sentiment analysis is performed with a dictionary-based text matching algorithm, which is relatively simple. The words of the segmented sentence are traversed one by one, and whenever a word hits the dictionary its weight is processed: the weight of a positive word is added, the weight of a negative word is subtracted, a negation word flips the sign of the weight, and the weight of a degree adverb is multiplied by the weight of the word it modifies. The final weight determines whether the text is positive, negative, or neutral. A typical flow of sentiment analysis with the dictionary-based text matching algorithm is as follows [5]:

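
The weighting rules described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the algorithm from [5]: the four toy dictionaries and their weights are hypothetical, the input is assumed to be already segmented into tokens, and a negation word or degree adverb is assumed to apply to the next sentiment word that follows it.

```python
# Hypothetical toy dictionaries; real ones come from a sentiment lexicon.
POSITIVE = {"good": 1.0, "powerful": 1.0}
NEGATIVE = {"bad": 1.0, "noisy": 1.0}
NEGATIONS = {"not", "never"}
DEGREE = {"very": 2.0, "slightly": 0.5}


def score(tokens):
    """Sum signed, scaled weights over the tokens that hit a dictionary."""
    total = 0.0
    sign = 1.0   # flipped by each negation word
    scale = 1.0  # multiplied by each degree adverb
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREE:
            scale *= DEGREE[tok]
        elif tok in POSITIVE:
            total += sign * scale * POSITIVE[tok]  # positive words add
            sign, scale = 1.0, 1.0                 # modifiers are used up
        elif tok in NEGATIVE:
            total -= sign * scale * NEGATIVE[tok]  # negative words subtract
            sign, scale = 1.0, 1.0
    return total


def label(tokens):
    """Map the final weight to positive, negative, or neutral."""
    s = score(tokens)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

For instance, `label("the car is very powerful".split())` walks the tokens, doubles the weight of "powerful" because of "very", and classifies the sentence from the sign of the total. The purely linear accumulation is exactly what the accuracy shortcoming discussed next refers to.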

The sentiment analysis model based on statistical methods is simple and general-purpose, but it has three main shortcomings:

1. The accuracy is not high

Language is highly complex, and a simple linear superposition of word weights clearly loses a great deal of accuracy. The weight of a word is not fixed, so it is hard to assign accurately.

2. The dictionary needs to be updated continuously

New sentiment words, such as freshly coined internet slang for “awesome”, are not always covered by the dictionary, so the dictionary must be refreshed constantly to add them. With online vocabulary emerging continuously, if the dictionary cannot be refreshed as fast as new words appear, sentiment analysis in practice will fall far short of expectations. For example, in Taobao product reviews or Ele.me takeout reviews, failing to capture new words makes the analyzed sentiment deviate from reality.

3. It is difficult to build a dictionary

The core of dictionary-based sentiment classification is the sentiment dictionary. Building one requires strong background knowledge and a deep understanding of the language, which severely limits the approach when analyzing foreign languages.

Sentiment analysis model based on deep learning

Having seen the advantages and disadvantages of the statistical approach, let’s look at how deep learning text classification models perform sentiment analysis and classification. One advantage of deep learning is that it learns end to end, eliminating manual intervention at each intermediate step. By building on word vectors generated by a pre-trained model, the first important problem deep learning solves is the construction of the sentiment dictionary. Next, we take several typical text classification models as examples to show the evolution and application scenarios of deep text classification models.

2.1 FastText[6]

Model operation steps:

2.2 TextCNN[7]

2.3 TextRNN[8]

2.4 TextRNN+Attention[9]

HAN stands for Hierarchical Attention Networks. It splits the text to be classified into a number of sentences and applies an encoder and attention at the word level and at the sentence level respectively, which makes it possible to classify longer texts. Compared with the models above, the structure of HAN is slightly more complex and can be decomposed into the following steps.

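
The two-level attention idea can be illustrated with a dependency-free Python sketch. It is a deliberate simplification and not a full HAN: the recurrent encoders and the learned projection before scoring are omitted, word vectors are taken as given, and the context vectors `word_context` and `sent_context` (learned parameters in a real model) are passed in by hand. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weighted sum of vectors; weights are the softmax of their
    dot products with the context vector."""
    scores = [sum(v_d * c_d for v_d, c_d in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_document(doc, word_context, sent_context):
    """doc is a list of sentences, each a list of word vectors.

    Word-level attention turns each sentence into one vector, then
    sentence-level attention turns the sentence vectors into one
    document vector.
    """
    sentence_vectors = [attention_pool(sent, word_context)[0] for sent in doc]
    doc_vector, sent_weights = attention_pool(sentence_vectors, sent_context)
    return doc_vector, sent_weights


# Toy document: two sentences with 2-dimensional word vectors.
toy_doc = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
doc_vector, sent_weights = encode_document(toy_doc, [1.0, 0.0], [0.0, 1.0])
```

The resulting `doc_vector` would be fed to a classifier, and `sent_weights` shows how much each sentence contributed to it.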

2.5 TextRCNN[10]

RCNN algorithm process: first, a bidirectional LSTM learns the context of each word; the forward and backward RNNs produce the representations of each word’s preceding and following context.

The representation of a word then becomes the concatenation of its word vector with the forward and backward context vectors.

After that, the same convolution and pooling layers as in TextCNN can be attached. Max pooling is applied along the seq_length dimension, followed by an FC operation for classification. The network can be regarded as an improved version of FastText.
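
The data flow of these steps (context on both sides, concatenation, max pooling along the sequence, FC) can be sketched in plain Python. This is an assumption-laden illustration, not the model from [10]: a simple tanh recurrence with a hypothetical `decay` parameter stands in for each direction of the bidirectional LSTM, the convolution layer is omitted, and the FC weights are supplied by the caller instead of being learned.

```python
import math


def context_states(embeddings, decay=0.5):
    """Stand-in for one direction of the recurrent layer.

    The state returned for word i summarises words 0..i-1 only.
    """
    dim = len(embeddings[0])
    state = [0.0] * dim
    states = []
    for e in embeddings:
        states.append(state)
        state = [math.tanh(decay * s + x) for s, x in zip(state, e)]
    return states


def rcnn_features(embeddings):
    """Concatenate contexts with each word vector, then max pool."""
    left = context_states(embeddings)
    # Run the same recurrence right to left, then restore word order.
    right = context_states(embeddings[::-1])[::-1]
    # Each word becomes [left context ; word vector ; right context].
    words = [l + e + r for l, e, r in zip(left, embeddings, right)]
    # Max pooling along the seq_length dimension.
    return [max(column) for column in zip(*words)]


def fully_connected(features, weights, bias):
    """One linear layer: one row of weights per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


# Toy sentence of three words with 2-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = rcnn_features(toy_embeddings)  # length 3 * 2 = 6
```

Because pooling runs over the sequence dimension, the feature vector has a fixed length regardless of how many words the sentence contains, which is what lets a single FC layer follow it.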

Summary

From the perspective of models and algorithms, this blog post introduced sentiment analysis models based on statistical methods and on deep learning. The statistical model is simple and easy to use, but it has serious shortcomings in accuracy, flexibility, and generalization. Deep learning models evolve toward capturing ever more context information through deeper and more complex networks, and train neural networks with the help of word vectors generated by powerful pre-trained models to complete the task. The open source repository [13] gives PyTorch implementations of each model and compares them on the same Chinese baseline; the blog posts [11] and [12] also introduce other deep learning models for sentiment analysis in detail and can serve as a guide for further exploration.

References

[1] https://wordnet.princeton.edu/

[2] Hu M, Liu B. Mining and summarizing customer reviews [C]. Proceedings of Knowledge Discovery and Data Mining. NY, USA, 2004: 168-177.

[3] https://languageresources.git…%E9%87%91%E5%A4%A9%E5%8D%8E_Hownet/

[4] Zhu Yanlan, Min Jin, Zhou Yaqian, et al. Lexical semantic orientation calculation based on HowNet [J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.

[5] https://blog.csdn.net/weixin_…details/93163519

[6] https://arxiv.org/abs/1612.03651

[7] https://arxiv.org/abs/1408.5882

[8] https://www.ijcai.org/Proceed…

[9] https://www.aclweb.org/anthol…

[10] http://zhengyima.com/my/pdfs/…

[11] https://zhuanlan.zhihu.com/p/…

[12] https://zhuanlan.zhihu.com/p/…

[13] https://github.com/649453932/…

This article is from the Huawei cloud community “NLP column: introduction to emotion analysis method”, the original author: quite suddenly.
