Sentiment analysis, or opinion mining, is an application of text analysis used to identify and extract subjective information from source data.
The basic task of sentiment analysis is to classify the opinions expressed in documents, sentences, or entity features as positive or negative. This tutorial introduces sentiment analysis in RapidMiner. The example uses a list of movies and their reviews, each labeled “Positive” or “Negative”. The process is evaluated with precision and recall. Precision is the probability that a (randomly selected) retrieved document is relevant. Recall is the probability that a (randomly selected) relevant document is retrieved in a search. High recall means that the algorithm returns most of the relevant results. High precision means that the algorithm returns more relevant results than irrelevant ones.
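To make the two measures concrete, here is a minimal Python sketch that computes precision and recall for the positive class. The function name and the six labels are made up for illustration. They are not taken from the tutorial's data or from RapidMiner.

```python
def precision_recall(actual, predicted, positive="positive"):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(actual, predicted))
    # True positives: predicted positive and really positive.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False positives: predicted positive but really something else.
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    # False negatives: really positive but predicted something else.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for six reviews (illustrative only).
actual = ["positive", "positive", "positive", "negative", "negative", "negative"]
predicted = ["positive", "positive", "negative", "positive", "negative", "negative"]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Here two of the three reviews predicted as positive are really positive, and two of the three truly positive reviews were found, so both values are 2/3.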
First, positive and negative reviews of films are collected. The words in them are then stored under different polarities (positive and negative), and both the vector word list and the model are created. Next, the desired list of movies is taken as input. The model compares each word in the given reviews with the previously stored words of each polarity. A review is classified according to the polarity under which most of its words appear. For example, when evaluating Django Unchained, its reviews are compared with the vector word list created at the beginning. Most of the words are positive, so the result is positive. Negative results are obtained in the same way.
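The idea of comparing words against stored polarities can be sketched in a few lines of Python. This is a simplified word-counting illustration, not the model RapidMiner builds. The function names and the four training reviews are hypothetical.

```python
from collections import Counter


def build_word_lists(labeled_reviews):
    """Count how often each word appears under each polarity."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in labeled_reviews:
        counts[label].update(text.lower().split())
    return counts


def classify(review, counts):
    """Label a review by the polarity under which most of its words appear."""
    score = {"positive": 0, "negative": 0}
    for word in review.lower().split():
        pos, neg = counts["positive"][word], counts["negative"][word]
        if pos > neg:
            score["positive"] += 1
        elif neg > pos:
            score["negative"] += 1
        # Words seen equally often (or never) under both polarities are ignored.
    # Assumption: a tie between the two scores falls back to "positive".
    return "positive" if score["positive"] >= score["negative"] else "negative"


# Hypothetical training reviews (illustrative only).
training = [
    ("a brilliant and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("a dull and boring film", "negative"),
    ("terrible plot and weak acting", "negative"),
]
counts = build_word_lists(training)

print(classify("brilliant story with great acting", counts))
print(classify("boring and terrible plot", counts))
```

The first review is labeled positive because "brilliant", "story", and "great" were seen only in positive reviews. The second is labeled negative for the same reason with "boring", "terrible", and "plot".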
The first step in this analysis is to process the documents from the data, that is, to extract the positive and negative film reviews and store them under different polarities. The model is shown in Figure 1.
In the Process Documents operator, click Edit List on the right. Load the positive and negative reviews under separate class names, “positive” and “negative”.
Inside the Process Documents operator, nested operations are performed, such as tokenizing the text and filtering stop words.
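The two nested steps, tokenizing and stop word filtering, can be illustrated outside RapidMiner with a short Python sketch. The stop word list below is a tiny made-up sample, and the tokenizer simply treats every non-letter as a separator. Real operators offer more options.

```python
import re

# A tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "was", "this"}


def tokenize(text):
    """Split text into lowercase runs of letters; anything else is a separator."""
    return re.findall(r"[a-z]+", text.lower())


def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]


tokens = filter_stop_words(tokenize("This film was a joy, and the cast is superb!"))
print(tokens)
```

Only the content-bearing words "film", "joy", "cast", and "superb" remain, and these are what would end up in the word vector.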
Then add two operators, Store and Validation, as shown in Figure 1. The Store operator writes the word vector to a file and directory of our choice. The Validation operator (cross-validation) is a standard way to evaluate the accuracy and validity of a statistical model. The data set is split into two parts, a training set and a test set. The model is trained only on the training set, and its accuracy is evaluated on the test set. This is repeated n times. Double-click the Validation operator to see two panels, “Training” and “Testing”. In the “Training” panel, a linear support vector machine (SVM) is used. It is a popular classifier, called linear because its decision function is a linear combination of all input variables. To test the model, the “Apply Model” operator applies the trained model to the test set. To measure the accuracy of the model, the “Performance” operator is used.
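The train, apply, and measure loop of cross-validation can be sketched in plain Python. Note the assumptions: the learner below is a simple stand-in (per-word weights summed into a linear score), not an SVM, and the fold assignment is a plain interleaved split that may differ from how RapidMiner samples. All names and the six reviews are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once.

    Assumes n_samples >= k so that no fold is empty.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def train(texts, labels):
    # Stand-in learner (NOT an SVM): weight = positive count minus negative count.
    weights = {}
    for text, label in zip(texts, labels):
        for word in text.lower().split():
            weights[word] = weights.get(word, 0) + (1 if label == "positive" else -1)
    return weights


def predict(weights, text):
    # Linear decision function: sum of the weights of the words in the text.
    score = sum(weights.get(w, 0) for w in text.lower().split())
    return "positive" if score >= 0 else "negative"


def cross_validate(samples, labels, k=3):
    """Average test-set accuracy over k train/test splits."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical reviews (illustrative only).
samples = ["great film", "great story", "great cast",
           "boring film", "boring story", "boring cast"]
labels = ["positive"] * 3 + ["negative"] * 3

accuracy = cross_validate(samples, labels, k=3)
print(f"mean accuracy over 3 folds: {accuracy:.2f}")
```

The key point is that the model in each round never sees the reviews it is tested on, so the averaged accuracy estimates how the model behaves on new reviews.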
Then run the process. The class recall and precision percentages are shown in Figure 5. The model and the vector word list are stored in the repository.
The model and the vector word list are then retrieved from the repository where they were stored. Connect the retrieved word list to the Process Documents operator, as shown in Figure 6.
Then click the Process Documents operator and click Edit List on the right. This time, I added a list of five movie reviews taken from a website and stored in a directory. Set the class name to “unlabeled”, as shown in Figure 7.
The Apply Model operator takes the model from the Retrieve operator and the unlabeled data from the Process Documents operator as input. It outputs the labeled data at its lab port, which is therefore connected to the res port. The results are as follows. For the review of Les Miserables, the prediction is 86.4% positive and 13.6% negative, because the review matched the positive word list more closely than the negative one.
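As a rough illustration of how a degree of match can turn into two percentages, the sketch below reports the share of matched words that fall under each polarity. This is only an assumption made for illustration. It is not how RapidMiner or an SVM computes confidences, and the word lists and review are invented.

```python
def polarity_confidence(review, positive_words, negative_words):
    """Return (confidence_positive, confidence_negative) as shares of matched words."""
    words = review.lower().split()
    pos = sum(w in positive_words for w in words)
    neg = sum(w in negative_words for w in words)
    total = pos + neg
    if total == 0:
        return 0.5, 0.5  # no matching words, so no evidence either way
    return pos / total, neg / total


# Hypothetical word lists and review (illustrative only).
positive_words = {"moving", "superb", "powerful"}
negative_words = {"slow", "dull"}
review = "a moving and powerful film with superb songs but a slow start"

conf_pos, conf_neg = polarity_confidence(review, positive_words, negative_words)
print(f"positive: {conf_pos:.1%}, negative: {conf_neg:.1%}")
```

Three of the four matched words are positive, so this toy review comes out 75% positive and 25% negative, and it would be labeled positive.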