Text sentiment analysis is the process of analyzing, processing, and extracting subjective, emotionally colored text using natural language processing and text mining techniques. Current research on text sentiment analysis spans many fields, including natural language processing, text mining, information retrieval, information extraction, machine learning, and ontology, and it has attracted the attention of many scholars and research institutions. In recent years it has remained one of the hot topics in natural language processing and text mining.
From a cognitive point of view, the task of sentiment analysis is to answer the following questions: who expressed, at what time, toward which entity and which attribute, what kind of feeling? A formal representation of sentiment analysis is therefore the five-tuple (entity, aspect, opinion, holder, time). For example, for the text “I think the 2.0T XX car is very powerful.”, the tuple is (XX car, power, positive emotion, I, /), where “/” marks an element that is not given. Note that most current studies do not consider the opinion holder or the time among these five elements.
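As a small illustration, the five-tuple can be written down as a plain data structure. This is only a sketch: the class name `Opinion`, the helper `core_triple`, and the use of `None` for a missing element are assumptions made for the example, not part of any standard library.

```python
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    """The (entity, aspect, opinion, holder, time) five-tuple."""
    entity: str
    aspect: str
    opinion: str
    holder: Optional[str] = None  # often ignored in current studies
    time: Optional[str] = None    # often ignored in current studies


def core_triple(op: Opinion) -> tuple:
    """Keep only the three elements that most studies actually model."""
    return (op.entity, op.aspect, op.opinion)


# "I think the 2.0T XX car is very powerful."
example = Opinion(entity="XX car", aspect="power",
                  opinion="positive", holder="I", time=None)
print(core_triple(example))
```

Dropping `holder` and `time` in `core_triple` mirrors the simplification that most current work makes.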
Sentiment analysis can be divided into many subtasks. The following mind map shows this breakdown:
Word-level and sentence-level analysis target the positive or negative polarity of a single word and of a whole sentence, respectively. Neither distinguishes a specific target within the sentence, such as an entity or an attribute, which amounts to ignoring the entity and aspect elements of the five-tuple. Word-level sentiment analysis, that is, the construction of a sentiment dictionary, studies how to assign sentiment information to words. Sentence-level and document-level sentiment analysis study how to assign a sentiment label to a whole sentence or document. Target-level sentiment analysis takes a specific target into account, which can be an entity, an attribute of an entity, or a combination of entity and attribute. It can be divided into three types: target-grounded aspect-based sentiment analysis (TG-ABSA), target no aspect-based sentiment analysis (TN-ABSA), and target aspect-based sentiment analysis (T-ABSA). TN-ABSA analyzes the polarity of the entities that appear in the text; T-ABSA analyzes the combinations of entity and attribute that appear in the text. The following table illustrates the sentiment analysis tasks for different targets:
Sentiment analysis model based on statistical methods
Sentiment analysis based on statistical methods relies mainly on a pre-built “sentiment dictionary”, whose construction is the premise and foundation of sentiment classification. In practice, dictionary entries fall into four categories: general sentiment words, degree adverbs, negation words, and domain words. For English, dictionaries are mainly built by expanding the English lexical database WordNet. Hu and Liu [2] built a seed vocabulary of adjectives by hand and used the synonym and near-synonym relations between words in WordNet to judge the sentiment orientation of words, and from that the polarity of opinions. For Chinese, dictionaries are mainly built by expanding HowNet. Zhu Yanlan et al. compute the semantic similarity between a word and a set of benchmark sentiment words in order to infer the word’s sentiment orientation. In addition, a domain-specific dictionary can be built to improve classification accuracy, for example a dictionary of new Internet vocabulary that captures the sentiment orientation of new words more accurately.
In the dictionary-based method, the text is first preprocessed with word segmentation and stop-word removal, and the constructed sentiment dictionary is then matched against the resulting tokens to find positive and negative information. The general flow chart is as follows:
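In code, the preprocessing and matching stages might look like the following minimal sketch. Everything in it is illustrative: the word lists are made up, the function names `preprocess` and `match_lexicon` are hypothetical, and a regular expression stands in for word segmentation, which works for English but would have to be replaced by a proper segmenter for Chinese.

```python
import re

# Tiny illustrative word lists (assumptions, not a real dictionary).
STOP_WORDS = {"the", "is", "a", "and", "this", "but"}
POSITIVE = {"powerful", "comfortable"}
NEGATIVE = {"noisy", "expensive"}


def preprocess(text):
    """Lowercase, split into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def match_lexicon(tokens):
    """Collect the tokens that hit the positive or negative word list."""
    return {
        "positive": [t for t in tokens if t in POSITIVE],
        "negative": [t for t in tokens if t in NEGATIVE],
    }


tokens = preprocess("The engine is powerful but the cabin is noisy.")
hits = match_lexicon(tokens)
print(tokens)
print(hits)
```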
In addition to the dictionaries above, the following are other existing Chinese dictionaries, listed for reference:
Of course, we can also train our own sentiment dictionary from a corpus. Once the sentiment dictionary has been loaded, sentiment analysis is performed with a dictionary-based text matching algorithm, which is relatively simple. The words of the segmented sentence are traversed one by one, and whenever a word hits the dictionary, its weight is processed accordingly: a positive word adds its weight, a negative word subtracts its weight, a negation word flips the sign of the weight, and a degree adverb multiplies the weight of the word it modifies. The final accumulated weight determines whether the text is positive, negative, or neutral. A typical flow of sentiment analysis with the dictionary-based text matching algorithm is as follows:
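The weighting rules described above can be sketched as follows. This is a simplified illustration under stated assumptions: the three word tables and their weights are invented, the names `score` and `label` are hypothetical, and a negation word or degree adverb is assumed to apply to the next sentiment word that follows it, which is cruder than a real implementation would be.

```python
# Invented example weights; a real dictionary would supply these.
SENTIMENT = {"good": 1.0, "powerful": 1.0, "bad": -1.0, "noisy": -1.0}
NEGATIONS = {"not", "never"}
DEGREE = {"very": 2.0, "slightly": 0.5}


def score(tokens):
    """Accumulate signed, scaled weights over the token sequence."""
    total = 0.0
    sign = 1.0   # flipped by each negation word
    scale = 1.0  # multiplied by each degree adverb
    for tok in tokens:
        if tok in NEGATIONS:
            sign = -sign
        elif tok in DEGREE:
            scale *= DEGREE[tok]
        elif tok in SENTIMENT:
            total += sign * scale * SENTIMENT[tok]
            sign, scale = 1.0, 1.0  # modifiers are consumed by this word
    return total


def label(total):
    """Map the final weight to a sentiment class."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(label(score("the engine is very powerful".split())))
print(label(score("not good".split())))
```

Because the score is a plain sum, word order and context beyond the nearest modifiers are ignored, which is exactly the limitation of linear superposition.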
The sentiment analysis model based on statistical methods is simple and general-purpose, but it still has three main shortcomings:
1. The accuracy is not high
Language is highly complex, and a simple linear superposition of word weights inevitably loses a great deal of accuracy. The weight of a word is not constant, so it is hard to assign accurately.
2. The dictionary needs to be updated continuously
Dictionaries do not always cover newly coined slang, such as the Chinese Internet expression rendered literally as “giving power” (meaning “awesome”) and similar coarse intensifiers. The dictionary therefore has to be refreshed constantly with new words. In an era when online vocabulary keeps emerging, if the dictionary cannot keep up with the pace of new words, sentiment analysis in practice will fall far short of expectations. For example, when analyzing Taobao product reviews or Ele.me takeout reviews, failing to capture new words makes the analyzed sentiment deviate from reality.
3. It is difficult to build a dictionary
The core of dictionary-based sentiment classification is the sentiment dictionary. Building one requires strong background knowledge and a deep understanding of the language, which greatly limits its use for analyzing foreign languages.
Sentiment analysis model based on deep learning
Having covered the advantages and disadvantages of the statistical approach, let’s look at how deep learning text classification models perform sentiment analysis and classification. One advantage of deep learning is that it can learn end to end, removing the manual intervention required at each intermediate step. Building on word vectors produced by a pre-trained model, the first important problem deep learning removes is the need to construct a sentiment dictionary. Next, we take typical text classification models as examples to show how deep text classification models have evolved and where they apply.
Model operation steps:
HAN stands for Hierarchical Attention Networks. It splits the text to be classified into a number of sentences and applies an encoder and an attention operation at the word level and at the sentence level in turn, which allows it to classify longer texts. Compared with the models above, the structure of HAN is slightly more complex, and it can be decomposed into the following steps.
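As a rough illustration of the two attention levels, the sketch below pools word vectors into sentence vectors and then sentence vectors into a document vector. It is deliberately simplified and rests on several assumptions: the recurrent encoders and the learned projection inside the attention layer are omitted, the context vectors are fixed numbers rather than trained parameters, and the names `attention_pool` and `han_document_vector` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by its similarity to a context vector, then sum."""
    scores = [sum(math.tanh(x) * c for x, c in zip(v, context))
              for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(dim)]
    return pooled, weights


def han_document_vector(document, word_context, sentence_context):
    """Word-level attention per sentence, then sentence-level attention."""
    sentence_vectors = [attention_pool(sentence, word_context)[0]
                        for sentence in document]
    return attention_pool(sentence_vectors, sentence_context)


# A document of two sentences; each word is a 2-dimensional toy vector.
doc = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
doc_vec, sentence_weights = han_document_vector(doc, [1.0, 0.0], [0.0, 1.0])
print(doc_vec, sentence_weights)
```

In a full model the document vector would be passed to a classification layer, and all context vectors would be learned during training.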
RCNN algorithm process: First, a bidirectional LSTM learns the context of each word; the forward and backward RNNs give the representations of the preceding and following context of each word.
The representation of a word then becomes the concatenation of its word vector with the forward and backward context vectors.
After that, the same convolution and pooling layers as in TextCNN can be attached. Max pooling is applied along the seq_length dimension, followed by a fully connected (FC) layer for classification. The network can be regarded as an improved version of FastText.
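The data flow of these three steps can be sketched without a deep learning framework. The example is a toy under explicit assumptions: a simple `tanh` running state stands in for the bidirectional LSTM, the convolution layer is left out, no parameters are trained, and the function names `context_vectors`, `rcnn_features`, and `fully_connected` are hypothetical. It only shows the concatenation, the max pooling over the sequence dimension, and the final FC step.

```python
import math


def context_vectors(embeddings):
    """For each position, summarise the words before it (LSTM stand-in)."""
    dim = len(embeddings[0])
    contexts = []
    state = [0.0] * dim
    for vec in embeddings:
        contexts.append(state)  # context excludes the current word
        state = [math.tanh(s + x) for s, x in zip(state, vec)]
    return contexts


def rcnn_features(embeddings):
    """Concatenate [left context; word; right context], then max-pool."""
    left = context_vectors(embeddings)
    right = context_vectors(embeddings[::-1])[::-1]
    combined = [l + e + r for l, e, r in zip(left, embeddings, right)]
    # Max pooling along the seq_length dimension, one value per feature.
    return [max(column) for column in zip(*combined)]


def fully_connected(features, weights, bias):
    """One linear layer that maps pooled features to class scores."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


emb = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]]  # three words, two dimensions
feats = rcnn_features(emb)
logits = fully_connected(feats, [[1.0] * 6, [0.0] * 6], [0.0, 0.5])
print(feats, logits)
```

Pooling over the sequence dimension yields a fixed-size feature vector regardless of sentence length, which is what lets the FC layer follow directly.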
From the perspective of models and algorithms, this blog has introduced sentiment analysis models based on statistical methods and on deep learning. The statistical models are simple and easy to use, but they have serious shortcomings in accuracy, flexibility, and generalization. Deep learning models evolve toward capturing more context with deeper and more complex networks, and they train the neural network on word vectors produced by powerful pre-trained models. The following open source repository details a PyTorch implementation of each model and compares them on the same Chinese baseline; the following two blog posts [11] also introduce other deep learning models for sentiment analysis in detail and can serve as a guide for further exploration.
[2] Hu M, Liu B. Mining and summarizing customer reviews [C]. Proceedings of Knowledge Discovery and Data Mining, NY, USA, 2004: 168-177.
Zhu Yanlan, Min Jin, Zhou Yaqian, et al. Lexical semantic tendency calculation based on HowNet [J]. Journal of Chinese Information Processing, 2006, 20(1): 14-20.
This article is from the Huawei cloud community “NLP column: introduction to emotion analysis method”, the original author: quite suddenly.