Introduction: Intelligent customer service must not only solve customers’ high-frequency business problems, but also provide multi-dimensional, human-like assistant, shopping-guide, chat and entertainment services, so as to improve customers’ overall satisfaction with the intelligent customer service robot. In this process, emotion analysis technology plays an important role in building the robot’s human-like abilities. This paper surveys the application scenarios of emotion analysis technology in an intelligent customer service system along five dimensions, covering the principles of the underlying models as well as their practical deployment and effect analysis.
By Song Shuangyong, Wang Chao and Chen Haiqing
Human-computer dialogue has long been one of the important research directions in natural language processing. In recent years, with progress in human-computer interaction technology, dialogue systems have gradually moved toward practical application. Among them, intelligent customer service systems have attracted wide attention from enterprises, especially large and medium-sized ones. An intelligent customer service system aims to relieve the heavy manpower demands of the traditional customer service model. While saving manpower, it frees human agents to provide higher-quality service for special problems or special users, realizing an overall improvement of the “intelligent customer service + human customer service” combination in both service efficiency and service quality. In recent years, many large and medium-sized companies have built their own intelligent customer service systems, such as Fujitsu’s FRAP, JD.com’s JIMI and Alibaba’s AliMe.
Building an intelligent customer service system relies on industry data and on technologies such as massive-scale knowledge processing and natural language understanding. The first generation of intelligent customer service systems focused on business content and answered high-frequency business questions; this depended on business experts accurately curating the answers to those questions, and the main technical requirement was accurate text matching between user questions and knowledge points. The new generation of intelligent customer service systems defines its service scope as a pan-business scenario: in addition to solving core high-frequency business problems, capabilities such as intelligent shopping guidance, problem prediction, intelligent chat, life-assistant functions and entertainment interaction are also valued and covered. Among them, emotional ability, as an important embodiment of human-like ability, has been applied across multiple dimensions of the intelligent customer service system and plays a crucial role in improving the human-likeness of the system.
Technology architecture of emotion analysis in intelligent customer service system
Figure 1 shows the classic human-machine intelligent customer service model. Users receive service from either the robot or a human agent through dialogue, and while being served by the robot they can be transferred to a human agent either by an explicit command or by automatic recognition on the robot’s side. Within this complete customer service model, emotion analysis technology is applied to capabilities along multiple dimensions.
User sentiment detection
- Introduction to the user emotion detection model
User emotion detection is the foundation and core of many emotion-related applications. In this paper, we propose an emotion classification model that integrates word-level, phrase-level and sentence-level semantic features, and use it to identify the emotions “anxious”, “angry” and “thankful” in user utterances in the intelligent customer service system. Techniques for extracting semantic features at these different levels have been covered in related work; combining features from the different levels effectively improves the final emotion recognition result. Figure 2 shows the architecture of the emotion classification model.
- Sentence-level semantic feature extraction
Shen et al. proposed the SWEM model, which applies simple pooling strategies to word embedding vectors to extract sentence-level semantic features; classification and text matching models trained on these features achieve experimental results nearly on par with classical convolutional and recurrent neural network models.
In our model, we use SWEM’s feature extraction to obtain sentence-level semantic features of user questions and feed them into the emotion classification model.
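As an illustration, SWEM-style sentence features reduce to simple pooling over the word-embedding matrix. The following minimal sketch uses NumPy; the function name, dimensions and random vectors are our own and stand in for pretrained embeddings:

```python
import numpy as np

def swem_features(word_vectors):
    """SWEM-style sentence features: concatenate average-pooling and
    max-pooling over the word-embedding matrix (one row per token)."""
    avg_pool = word_vectors.mean(axis=0)   # SWEM-aver
    max_pool = word_vectors.max(axis=0)    # SWEM-max
    return np.concatenate([avg_pool, max_pool])

# toy example: a 4-token utterance with 5-dimensional embeddings
rng = np.random.default_rng(0)
sentence = rng.normal(size=(4, 5))
feats = swem_features(sentence)
print(feats.shape)  # (10,)
```

The pooled vector can then be concatenated with the phrase-level and word-level features before classification.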
- Phrase-level semantic feature extraction
The traditional CNN is commonly used to extract the semantic features of n-gram phrases, where n is the size of the convolution window. In this paper, we empirically set n to 2, 3 and 4, with 16 convolution kernels per window size, to extract rich n-gram semantic information from the original word vector matrix.
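The n-gram convolution described here can be sketched in plain NumPy. Kernel values are random for illustration, and the ReLU-plus-max-over-time pooling is the conventional choice rather than a detail confirmed by the paper:

```python
import numpy as np

def ngram_conv_features(X, kernels):
    """Slide each (n x d) kernel over the (T x d) word-vector matrix,
    apply ReLU, then max-pool over time to get one feature per kernel."""
    T, d = X.shape
    feats = []
    for W in kernels:                                   # one feature map per kernel
        n = W.shape[0]
        scores = [np.sum(X[t:t + n] * W) for t in range(T - n + 1)]
        feats.append(max(0.0, max(scores)))             # ReLU + max-over-time
    return np.array(feats)

# as in the text: window sizes 2, 3, 4 with 16 kernels each -> 48 features
rng = np.random.default_rng(0)
d = 8
kernels = [rng.normal(size=(n, d)) for n in (2, 3, 4) for _ in range(16)]
X = rng.normal(size=(10, d))                            # a 10-token utterance
print(ngram_conv_features(X, kernels).shape)  # (48,)
```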
- Word-level semantic feature extraction
We use the LEAM model to extract word-level semantic features. In LEAM, words and category labels are represented in the same semantic space, and text classification is carried out on top of this joint representation. Representing the category labels explicitly increases the semantic interaction between words and labels, allowing deeper use of word-level semantic information. Figure 3 (2) shows the semantic interaction between category labels and words, and the comparison between LEAM and the traditional model.
Finally, the semantic features from the different levels are concatenated and fed into the last layer of the model, where a logistic regression classifier performs the final classification training.
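The fusion layer amounts to concatenation followed by a softmax (multinomial logistic regression) over the emotion classes. A minimal sketch, with feature dimensions and a fourth “neutral” class assumed for illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(feature_blocks, W, b):
    """Concatenate sentence-, phrase- and word-level features, then apply
    a softmax (multinomial logistic regression) output layer."""
    x = np.concatenate(feature_blocks)
    return softmax(W @ x + b)

# toy dimensions: 10-d sentence, 48-d phrase, 6-d word features; 4 classes
rng = np.random.default_rng(0)
blocks = [rng.normal(size=k) for k in (10, 48, 6)]
W, b = rng.normal(size=(4, 64)), np.zeros(4)
probs = classify(blocks, W, b)   # a probability distribution over classes
```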
Table 1 compares online real-world evaluation results between our integrated model and three baseline models that each consider features at only a single level.
User emotional comfort
- Introduction to the overall framework of user emotional comfort
The proposed framework includes offline and online parts, as shown in Figure 4.
- Offline part
First of all, we need to identify the user’s emotion. Here we select seven common user emotions that call for an emotional response: fear, abuse, disappointment, grievance, anxiety, anger and gratitude.
Secondly, we identify the topic of the user’s question. Business experts summarized 35 common topic expressions, such as “complaining about service quality” and “feedback that logistics is too slow”. For the topic recognition model, we use the same classification model as for emotion recognition.
Knowledge building means collecting the frequent user questions that need comfort, organized around the more specific situations that users describe. These specific questions are not merged into the topic dimensions above because topic-level processing is relatively coarse-grained; for these high-frequency, more focused questions we want to give equally focused comfort replies, achieving a better response effect.
Along the dimensions of emotion, “emotion + topic” and high-frequency user questions, business experts compiled comfort reply scripts at the corresponding granularities. In particular, in the high-frequency question dimension, we call each “question-reply” pair a piece of knowledge.
- Online part
Knowledge-based comfort targets users whose emotional content has a specific expression. Here we use a text matching model to evaluate the degree of match between the user’s question and the questions in our curated knowledge. If a question in the knowledge base is very close in meaning to the user’s current input, the corresponding reply is returned to the user directly.
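This lookup step might be sketched as follows, assuming questions have already been encoded as vectors; the cosine measure, threshold value and knowledge format are illustrative assumptions, not the paper’s actual matching model:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def knowledge_match(query_vec, knowledge, threshold=0.9):
    """Return the comfort reply whose curated question best matches the
    user's question, if similarity clears the threshold; else None."""
    best_reply, best_score = None, threshold
    for question_vec, reply in knowledge:
        score = cosine(query_vec, question_vec)
        if score >= best_score:
            best_reply, best_score = reply, score
    return best_reply

# toy knowledge base of (question vector, canned reply) pairs
kb = [(np.array([1.0, 0.05]), "reply A"), (np.array([0.0, 1.0]), "reply B")]
print(knowledge_match(np.array([1.0, 0.0]), kb))  # reply A
```

When no knowledge item clears the threshold, the system falls back to the coarser reply levels described next.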
Emotion response based on emotion and topic means giving the user an appropriate emotional reply that considers both the emotion and the topic expressed in the user’s content. Compared with knowledge-based comfort, this kind of reply is more generalized.
Emotion response based on emotion category alone comforts the user considering only the emotional factor in the expressed content. It supplements and covers the two reply methods above, and its reply content is the most general.
Figure 5 shows three examples of online emotional comfort, corresponding to the three response levels above.
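The three-level fallback can be summarized as a simple dispatcher. All names and reply tables below are hypothetical placeholders for the production components:

```python
def comfort_reply(emotion, topic, knowledge_reply,
                  emotion_topic_replies, emotion_replies):
    """Three-level fallback: knowledge-based reply first, then the
    'emotion + topic' reply, then the emotion-only reply as the most
    general cover."""
    if knowledge_reply is not None:                  # most specific level
        return knowledge_reply
    if (emotion, topic) in emotion_topic_replies:    # more generalized level
        return emotion_topic_replies[(emotion, topic)]
    return emotion_replies.get(emotion)              # most general cover

# toy reply tables
et = {("angry", "logistics"): "Sorry about the delay, we are on it."}
eo = {"angry": "We understand your frustration."}
print(comfort_reply("angry", "service", None, et, eo))
```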
Table 2 compares classification models for the emotions that require comfort, including per-category results and the overall result. Table 3 gives the corresponding comparison for topics. Table 4 shows the improvement in user satisfaction after adding emotional comfort for several negative emotions, and Table 5 shows the improvement after adding emotional comfort for gratitude.
Emotional generative chatting
- Emotional generative chat model
Figure 6 shows the architecture of emotional generative chat in the intelligent customer service system. The source RNN acts as an encoder, mapping the source sequence s to an intermediate semantic vector c, while the target RNN acts as a decoder, generating the target sequence y from the semantic code c together with our specified emotion representation e and topic representation t. Here s and y correspond to the two sentences “I’m in a good mood today” and “I’m so happy!” respectively.
Generally, to let the decoder retain information from the encoder, the last state of the encoder is passed to the decoder as its initial state. At the same time, the encoder and decoder usually use separate RNNs, so as to capture the different expression patterns of questions and replies. In standard form:

$$h_i = f(h_{i-1}, s_i), \qquad c = h_{|s|}$$

$$d_0 = c, \qquad d_j = f(d_{j-1}, [y_{j-1}; e; t])$$

where $f$ is the RNN (GRU/LSTM) update, $s_i$ are the source word vectors, $h_i$ the encoder states, $d_j$ the decoder states, and $[\,;\,]$ denotes concatenation of the previous output word vector with the emotion representation $e$ and topic representation $t$.
Although seq2seq-based dialogue generation achieves good results, in practical applications it easily produces safe but meaningless replies. The reason is that the decoder receives only the encoder’s final state c, which handles long sequences poorly: as new words are generated, the decoder’s state memory gradually weakens and may even lose the information of the source sequence. An effective way to alleviate this problem is to introduce an attention mechanism.
In the seq2seq framework with attention, the probability that the decoder’s output layer predicts word $y_j$ given the input is:

$$p(y_j \mid y_{<j}, s) = \mathrm{softmax}(W_o d_j), \qquad d_j = f(d_{j-1}, [y_{j-1}; c_j; e; t])$$

where $h_i$ are the encoder states, $d_j$ the decoder states, and the context vector $c_j = \sum_i \alpha_{ji} h_i$ weights the encoder states by the attention weights

$$\alpha_{ji} = \frac{\exp(\eta(d_{j-1}, h_i))}{\sum_k \exp(\eta(d_{j-1}, h_k))},$$

with $\eta$ a learned alignment function.
The objective function used in training and the search strategy used in prediction are the same as for the traditional RNN model and are not repeated here.
- Results of the emotional generative chat model
After training, the model was tested on real user questions, with the results checked by business experts; the final qualified rate of the answers is about 72%. In addition, the average reply length is 8.8 words, which fits the reply-length requirement of the AliMe chat scenario well. Table 6 compares AET (the attention-based emotional & topical seq2seq model) with the traditional seq2seq model, focusing on the content qualified rate and reply length. After adding emotional information, replies are richer than those of the traditional seq2seq model, and the proportion of replies falling in the “5-20 word” range that user research identified as the best robot reply length also increases significantly, which ultimately raises the overall qualified rate significantly.
Figure 7 shows an application example of AliMe’s emotion-generating chat model in the Xiaomi space. Both answers in the figure are produced by the emotion generation model: for a user input insulting the robot as too stupid, the model can generate different answers according to the appropriate topic and emotion, enriching the diversity of the answers. The two answers are generated under the emotions “aggrieved” and “sorry” respectively.
Customer service quality inspection
- Definition of customer service quality problems
In this paper, customer service quality inspection means detecting potentially problematic service content in dialogues between human agents and customers, so as to surface problems in the agents’ service process and help them improve, thereby raising service quality and ultimately customer satisfaction. As far as the authors are aware, there is no publicly deployed artificial intelligence model for customer service quality detection in customer service systems.
Unlike human-computer dialogue, the dialogue between a human agent and a customer is not strictly one question followed by one answer: both sides can send multiple utterances in a row. Our goal is to detect whether each agent utterance exhibits either of two service quality problems, “negative” or “bad attitude”.
- Customer service quality inspection model
To inspect the service quality of an agent utterance, we need to consider its context, including the user’s utterances and the agent’s other utterances. The features we consider include text length, speaker role and text content. For text content, besides using the SWEM model to extract features of the current agent utterance under inspection, we also detect the emotion of every utterance in the context and use the user’s and the agent’s emotion categories as model features. The emotion recognition model is the same as described earlier and is not repeated here. In addition, we consider two structures (model 1 in Figure 8 and model 2 in Figure 9) for extracting sequence-level semantic features from the context.
In model 1, after encoding the current agent utterance and its context with a GRU or LSTM, forward and backward GRU/LSTM passes re-encode the context before and after the current utterance, with both passes ending at the current utterance. Because the two sequential encodings are anchored on the current utterance, they better reflect its semantic information. The model structure is shown in Figure 8.
Model 2, by contrast, takes the encodings of the current agent utterance and its context as the semantic features and then encodes them all with a forward GRU or LSTM in their original order; part of the model structure is shown in Figure 9. Compared with model 2, model 1 highlights the semantic information of the current utterance under inspection, while model 2 better captures the sequential semantics of the overall context.
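A rough sketch of the two context-encoding variants, assuming PyTorch GRU encoders; layer sizes and structural details are simplified guesses at the figures, not an exact reproduction:

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Model 1 re-encodes the context before and after the current
    utterance so both passes end on it; model 2 runs one forward GRU
    over all utterance encodings in dialogue order."""
    def __init__(self, dim=32):
        super().__init__()
        self.utt_gru = nn.GRU(dim, dim, batch_first=True)  # utterance encoder
        self.ctx_gru = nn.GRU(dim, dim, batch_first=True)  # context encoder

    def encode_utts(self, utts):
        # utts: (num_utts, num_tokens, dim) -> one vector per utterance
        _, h = self.utt_gru(utts)
        return h.squeeze(0)                                # (num_utts, dim)

    def model1(self, utts, cur):
        e = self.encode_utts(utts)
        # forward pass up to the current utterance, backward pass from the
        # end down to it: both encodings are anchored on the current turn
        _, before = self.ctx_gru(e[:cur + 1].unsqueeze(0))
        _, after = self.ctx_gru(e[cur:].flip(0).unsqueeze(0))
        return torch.cat([before.squeeze(0).squeeze(0),
                          after.squeeze(0).squeeze(0)])

    def model2(self, utts, cur):
        e = self.encode_utts(utts)
        _, h = self.ctx_gru(e.unsqueeze(0))  # whole dialogue in order
        return h.squeeze(0).squeeze(0)
```

In both cases the resulting context vector would be concatenated with the SWEM, length, role and emotion features before classification.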
- Customer service quality inspection results
We compare the two context semantic extraction models; the results are given in Table 7. Model 1 outperforms model 2, which suggests that more weight should be given to the semantics of the current utterance under inspection, with the context semantics playing an auxiliary role in recognition. In addition, GRU and LSTM show little difference during actual model training, but GRU trains faster than LSTM, so GRU is used in all model experiments.
Beyond model-level metrics, we also analyze system-level metrics along two dimensions: inspection efficiency and inspection recall. These two indicators are obtained by comparing the model’s results with those of the previous purely manual quality inspection. As shown in Table 8, both are greatly improved. The recall of manual inspection is relatively low because it is impossible to manually examine all customer service records.
Prediction of conversation satisfaction
- Conversation satisfaction
At present, one of the most important performance indicators of an intelligent customer service system is user session satisfaction. However, as far as the authors know, there is no prior research on automatically predicting user session satisfaction in intelligent customer service systems.
For the session satisfaction prediction scenario in the intelligent customer service system, we propose a session satisfaction analysis model that better reflects the current user’s satisfaction with the service. Because different users apply different evaluation criteria, many sessions whose content, answer source and emotional information are completely identical still receive different satisfaction labels. We therefore train two models: a classification model that fits the satisfaction categories (satisfied, neutral, dissatisfied), and a regression model that fits the distribution of session satisfaction. Finally, we compare the effects of the two models.
- Feature selection for conversation satisfaction
The model considers information along several dimensions: semantic information (the user’s utterances), emotional information (obtained from the emotion detection model) and answer source information (the source of the answer replying to the current utterance).
Semantic information is the content users express while communicating with the intelligent customer service, and it reflects the user’s current satisfaction through the user’s own words. The semantic information used in the model is the multi-turn utterance information in the session. To ensure every model processes the same number of turns, our experiments use only the last four user utterances of the session. We choose this because analysis of session data shows that the user’s semantics at the end of the session correlate most strongly with overall session satisfaction. For example, users who express gratitude at the end of a session are basically satisfied, while users who end with criticism are likely to be dissatisfied with the service.
Emotional information generally plays a very important role in user satisfaction: when users show extreme emotions such as anger or abuse, the probability of a dissatisfied rating is high. The emotion information here corresponds one-to-one with the utterances in the semantic information; each selected utterance is run through emotion recognition to obtain its emotion category.
Answer source information reflects what kind of problem the user encountered. Different answer sources represent different business scenarios, and the difference in user satisfaction across scenarios is obvious; for example, complaints and rights protection are more likely to lead to dissatisfaction than ordinary consultation.
- Conversation satisfaction model
In this paper, we propose a session satisfaction prediction model that combines semantic features, emotional features and answer source features. The model makes full use of the semantic information in the session, and compresses the emotional and answer source information into dense representations. The model structure is shown in Figure 10.
Semantic feature extraction: the first layer obtains a sentence representation for each utterance (the first GRU/LSTM layer in Figure 10), and the second layer builds a higher-order representation of the multi-turn user utterances from those sentence representations (the second GRU/LSTM layer in Figure 10), making full use of the sequence information in the user’s utterances. In addition, SWEM sentence features of the last utterance are extracted to strengthen the influence of the last utterance’s semantics.
Emotion feature extraction: the raw emotion feature is one-hot, which has obvious shortcomings; the data is sparse and cannot express the relationships between emotions. We therefore learn an emotion embedding to represent emotional characteristics better.
Answer source feature extraction: the initial answer source feature is also one-hot, and with more than 50 answer sources the data is very sparse, so feature compression is needed; an embedding is likewise learned to represent the answer source feature.
Model prediction layer: we try both satisfaction category prediction and satisfaction distribution prediction; the former is a classification model and the latter a regression model.
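Putting these pieces together, a minimal PyTorch sketch of the satisfaction predictor might look as follows. All sizes, the eight-way emotion vocabulary and the scalar regression head are illustrative assumptions, not the paper’s exact configuration:

```python
import torch
import torch.nn as nn

class SatisfactionModel(nn.Module):
    """Two-level GRU over the last four user utterances, plus learned
    embeddings that compress the one-hot emotion and answer-source
    features, with a classification head and a regression head."""
    def __init__(self, dim=32, n_emotions=8, n_sources=50):
        super().__init__()
        self.word_gru = nn.GRU(dim, dim, batch_first=True)  # words -> sentence
        self.sent_gru = nn.GRU(dim, dim, batch_first=True)  # sentences -> session
        self.emo_emb = nn.Embedding(n_emotions, 8)          # compress one-hot emotion
        self.src_emb = nn.Embedding(n_sources, 8)           # compress one-hot source
        fused = dim + 8 + 8
        self.cls_head = nn.Linear(fused, 3)  # satisfied / neutral / dissatisfied
        self.reg_head = nn.Linear(fused, 1)  # regression on satisfaction

    def forward(self, utts, emotion_id, source_id):
        # utts: (4, num_tokens, dim) -- the last four user utterances
        _, h = self.word_gru(utts)        # (1, 4, dim): one vector per utterance
        _, g = self.sent_gru(h)           # reuse as (batch=1, seq=4, dim)
        x = torch.cat([g.squeeze(),
                       self.emo_emb(emotion_id),
                       self.src_emb(source_id)])
        return self.cls_head(x), self.reg_head(x)
```

Either head can be trained on its own objective (cross-entropy for the categories, a squared-error loss for the distribution fit).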
- Experimental results of session satisfaction prediction
The experimental results are shown in Figure 11. The classification model predicts satisfaction poorly, deviating from actual user feedback by more than 4 percentage points on average, while the regression model fits the user feedback well and reduces the oscillation caused by small samples, in line with expectations. As shown in Table 9, the difference between the mean of the regression model’s predictions and the users’ real feedback is only 0.007, and the variance is reduced by one third, demonstrating the effectiveness of the regression model.
This paper has summarized several practical application scenarios of emotion analysis in the current intelligent customer service system, together with the corresponding models and their effects. Although emotion analysis has already penetrated many aspects of the system’s human-computer dialogue process, this is only a promising start, and emotion analysis still has a larger role to play in building the human-like abilities of intelligent customer service systems.