Secrets of the AI technology behind Jingdong Mall (1): automatic summary generation guided by keywords

Time:2020-4-8

Introduction
Over the past few decades, computing power has grown enormously; with the steady accumulation of data and increasingly sophisticated algorithms, we have entered the era of artificial intelligence. The concept of artificial intelligence can be hard to grasp, and the data and algorithms behind it are large and complex. Many people wonder what practical applications AI has now or will have in the future.
In fact, the practical applications of AI and the commercial value they bring are not so “mysterious”; they have long been all around us. By interpreting relevant AI papers, the [AI paper interpretation] column will show how AI technology empowers e-commerce and how it is deployed in practice. AI has a rich set of application scenarios in e-commerce. Application scenarios are the entry point for data; technology distills the data, and the data in turn improves the technology. The two complement each other.

Building on natural language understanding and knowledge graph technology, JD has developed an AI writing service for product marketing content. This technology is used in the “finding good goods” channel of Jingdong Mall.

Jingdong [find good] channel

Hundreds of thousands of product marketing posts (text and images) created by AI not only fill the large gap between how fast products are updated and how fast human experts can write content, but also enrich the content channel.

At the same time, AI-generated content actually outperforms human-written marketing content on metrics such as exposure click-through rate and conversion rate to product detail pages.

Next, let’s read a paper accepted at AAAI 2020 to see how AI can be used to apply different marketing strategies and styles to different groups and so improve the marketing conversion rate.


Automatic text summarization is a classic task in natural language processing. Its goal is to produce, for a given text, a shorter text that retains the most important information. Common approaches are extractive summarization and abstractive summarization. An extractive summary is composed of keywords, phrases, or sentences that already exist in the given text, while an abstractive summary is generated with natural language generation techniques from an abstract semantic representation of the given text.

This paper introduces a keyword-guided method for generating sentence summaries, which combines extractive and abstractive summarization. The method outperforms the baseline models on the Gigaword sentence summarization dataset.


Paper link: http://box.jd.com/sharedinfo/b2234bb08e365eec


The input of the abstractive sentence summarization task is a long sentence, and the output is a shorter, simplified version of that sentence.

We observe that some important words in the input sentence (that is, keywords) provide guidance for generating the summary. Moreover, when people write a summary of an input sentence, they often first pick out its keywords and then organize language to connect them. The resulting summary not only covers these keywords but is also fluent and grammatically correct. We believe that, compared with purely extractive or purely abstractive summarization, keyword-guided abstractive summarization is closer to how people actually write summaries.

Figure 1: The overlapping keywords (marked in red) between the input sentence and the reference summary cover the important information of the input sentence. The summary can be generated with guidance from the keywords extracted from the input sentence.

Consider a simple sentence summarization example. As shown in Figure 1, we can roughly take the words shared by the input sentence and the reference summary (excluding stop words) as the keywords; these cover the main points of the input sentence. For example, from the keywords “world leaders”, “close” and “Chernobyl”, we can recover the main information of the input sentence, namely “world leaders call for the closure of Chernobyl”, which is consistent with the actual reference summary “world leaders urge support for the closure plan of Chernobyl nuclear power plant”. This phenomenon is very common in sentence summarization: on the Gigaword sentence summarization dataset, more than half of the words in a reference summary also appear in the input sentence.
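The overlap heuristic described above can be sketched in a few lines of Python. This is only an illustration: the stop-word list, the function name overlap_keywords and the sentence pair are made up for the example (the pair is loosely modeled on Figure 1, not taken from Gigaword), and the article does not say how the paper tokenizes text or which stop-word list it uses.

```python
# Hypothetical stop-word list; the article does not say which list was used.
STOP_WORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "at"}

def overlap_keywords(source_tokens, summary_tokens, stop_words=STOP_WORDS):
    """Return source tokens that also occur in the reference summary,
    skipping stop words, in source order and without duplicates."""
    summary_vocab = {tok.lower() for tok in summary_tokens}
    seen = set()
    keywords = []
    for tok in source_tokens:
        key = tok.lower()
        if key in summary_vocab and key not in stop_words and key not in seen:
            seen.add(key)
            keywords.append(key)
    return keywords

# Invented sentence pair, for illustration only.
source = "world leaders called on friday for the closure of the chernobyl plant".split()
summary = "world leaders urge support for chernobyl closure plan".split()
print(overlap_keywords(source, summary))
# prints ['world', 'leaders', 'closure', 'chernobyl']
```

Note that “for” appears on both sides but is dropped as a stop word, and “plant” does not match “plan” because the comparison is on whole tokens.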

The input of the sentence summarization task is a long sentence, and the output is a short text summary. Our motivation is that the keywords in the text can provide important guidance to an automatic summarization system. First, we take the words shared by the input text and the reference summary (excluding stop words) as the ground-truth keywords. Through multi-task learning, we use one shared encoder to encode the input text and jointly train a keyword extraction model and a summary generation model. The keyword extraction model is a sequence labeling model built on the encoder's hidden states, and the summary generation model is an end-to-end model guided by keywords. Once both models have converged, we use the trained keyword extraction model to extract keywords from the training set and fine-tune the summary generation model on these extracted keywords. At test time, we first use the keyword extraction model to extract keywords from each test text, and then generate the summary from the extracted keywords together with the original test text.
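To make the sequence labeling view concrete, here is a minimal sketch of the two ends of the keyword extraction step: building 0/1 training targets from ground-truth keywords, and turning per-token predictions back into keywords at test time. The function names, the 0.5 threshold and the probabilities are assumptions for illustration; the article does not describe how the paper decodes the tagger's output.

```python
def tag_keywords(source_tokens, keywords):
    """Training targets: 1 if the token is a ground-truth keyword, else 0."""
    keyword_set = set(keywords)
    return [1 if tok in keyword_set else 0 for tok in source_tokens]

def extract_keywords(source_tokens, keyword_probs, threshold=0.5):
    """Test time: keep tokens whose predicted keyword probability
    reaches the (assumed) threshold."""
    return [tok for tok, p in zip(source_tokens, keyword_probs) if p >= threshold]

tokens = ["world", "leaders", "called", "for", "closure", "of", "chernobyl"]
labels = tag_keywords(tokens, ["world", "leaders", "closure", "chernobyl"])
print(labels)  # prints [1, 1, 0, 0, 1, 0, 1]

# Made-up probabilities standing in for the output of a trained tagger.
probs = [0.9, 0.8, 0.3, 0.1, 0.7, 0.05, 0.95]
print(extract_keywords(tokens, probs))
# prints ['world', 'leaders', 'closure', 'chernobyl']
```

The extracted keywords, rather than the ground-truth ones, are what the summary generation model sees during fine-tuning and at test time, which keeps training and inference consistent.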

1. Multi task learning

Text summarization and keyword extraction are similar in one sense: both extract the key information in the input text. They differ in output form: text summarization outputs a complete piece of text, while keyword extraction outputs a set of keywords. We believe both tasks require the encoder to recognize the important information in the input text. We therefore use a multi-task learning framework in which the two tasks share one encoder, which improves the encoder's performance.
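One common way to train two tasks on a shared encoder is to add their losses, so that gradients from both flow into the same encoder parameters. The sketch below assumes a negative log-likelihood loss for the summary, a binary cross-entropy loss for the keyword tags, and a weighting factor kw_weight; all three choices are assumptions for illustration, since the article does not give the paper's loss function.

```python
import math

EPS = 1e-12  # keeps log() away from zero

def summary_nll(step_probs):
    """Negative log-likelihood of the reference summary: step_probs[t] is the
    probability the decoder assigned to the t-th reference word."""
    return -sum(math.log(max(p, EPS)) for p in step_probs)

def keyword_bce(pred_probs, labels):
    """Mean binary cross-entropy of per-token keyword predictions
    against 0/1 labels."""
    total = 0.0
    for p, y in zip(pred_probs, labels):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def joint_loss(step_probs, pred_probs, labels, kw_weight=1.0):
    """Sum of the two task losses; kw_weight is a hypothetical knob."""
    return summary_nll(step_probs) + kw_weight * keyword_bce(pred_probs, labels)

# A perfect summary (loss 0) plus an undecided tagger (loss log 2).
print(joint_loss([1.0, 1.0], [0.5, 0.5], [1, 0]))
```

In a real system both terms would be computed from the same encoder states by an automatic differentiation framework; the plain-Python version only shows how the two objectives combine.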

2. Abstract generation model based on keyword guidance

Inspired by the work of Zhou et al. [1], we propose a selective encoding mechanism guided by keywords. Specifically, because the keywords carry the more important information, we use them to drive a selective gate network that performs a second encoding pass over the hidden semantic representation of the input text and produces a new hidden layer. Subsequent decoding is based on this new hidden layer.

Our decoder is based on the pointer-generator network [2], an end-to-end model that integrates a copy mechanism. For the generator module, we propose three ways to fuse the context information of the original input text and of the keywords: direct concatenation, gated fusion, and hierarchical fusion. For the pointer module, our model can selectively copy words from both the original input and the keywords into the output summary.
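The selective gate of Zhou et al. [1] scales each encoder hidden state element-wise by a sigmoid gate computed from that state and a summary vector of the sentence. A keyword-guided variant can be sketched by feeding a keyword representation into the gate instead. The sketch below assumes the form gate_i = sigmoid(W h_i + U k + b), where k is a single vector summarizing the keywords; the exact parameterization in the paper is not given in this article, so W, U, b, k and the function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def selective_gate(hidden_states, keyword_vec, W, U, b):
    """For each encoder state h_i compute gate_i = sigmoid(W h_i + U k + b)
    and return the gated states h_i * gate_i (element-wise)."""
    uk = matvec(U, keyword_vec)  # keyword term is shared by all positions
    gated = []
    for h in hidden_states:
        wh = matvec(W, h)
        gate = [sigmoid(a + c + d) for a, c, d in zip(wh, uk, b)]
        gated.append([x * g for x, g in zip(h, gate)])
    return gated

# With all-zero parameters every gate is 0.5, so each state is halved.
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(selective_gate([[2.0, -4.0]], [1.0, 1.0], zeros, zeros, [0.0, 0.0]))
# prints [[1.0, -2.0]]
```

After training, dimensions of h_i that agree with the keyword information receive gates close to 1 and pass through, while the rest are damped before decoding.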
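In the pointer-generator network [2], the final word distribution mixes a distribution over the vocabulary with the attention distribution over source positions, weighted by a generation probability. A pointer that can copy from both the input and the keywords can be sketched as a three-way mixture. The three-way split, the weights tuple and the function name below are assumptions for illustration; the article does not state how the paper combines the two copy distributions.

```python
def final_distribution(vocab_dist, src_tokens, src_attn, kw_tokens, kw_attn, weights):
    """Mix three sources of probability mass: generating from the vocabulary,
    copying from the source text, and copying from the keywords.
    weights = (p_gen, p_copy_src, p_copy_kw) and should sum to 1."""
    p_gen, p_src, p_kw = weights
    final = {word: p_gen * p for word, p in vocab_dist.items()}
    # Attention mass on a position is added to the word at that position.
    for tok, a in zip(src_tokens, src_attn):
        final[tok] = final.get(tok, 0.0) + p_src * a
    for tok, a in zip(kw_tokens, kw_attn):
        final[tok] = final.get(tok, 0.0) + p_kw * a
    return final

dist = final_distribution(
    vocab_dist={"closure": 0.5, "plan": 0.5},
    src_tokens=["chernobyl", "closure"], src_attn=[0.5, 0.5],
    kw_tokens=["chernobyl"], kw_attn=[1.0],
    weights=(0.5, 0.25, 0.25),
)
print(dist)
# prints {'closure': 0.375, 'plan': 0.25, 'chernobyl': 0.375}
```

Here “chernobyl” is not in the toy vocabulary at all, yet it receives probability mass from both copy paths, which is how a copy mechanism handles rare or out-of-vocabulary words.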


1. Dataset

In this experiment, we use the Gigaword dataset, which contains about 3.8 million sentence-summary pairs for training. We used 8,000 pairs as the validation set and 2,000 pairs as the test set.

2. Experimental results

Table 1 shows that our proposed model performs better than the model without keyword guidance. We tested different selective encoding mechanisms: self-selection by the input text, selection by the keywords, and mutual selection. Mutual selection gives the best results. For the generator module, hierarchical fusion works better than the other two fusion methods. Our dual pointer module also performs better than the original model, which can only copy from the input text.

Table 1

This paper addresses abstractive sentence summarization, that is, how to turn a long sentence into a short summary. Our model uses keywords as guidance to generate better summaries and achieves better results than the baseline models. It does so in four ways:

1) A multi-task learning framework extracts keywords and generates summaries jointly;

2) A keyword-based selective encoding strategy captures the important information during encoding;

3) A dual attention mechanism dynamically integrates the information of the original input sentence and of the keywords;

4) A dual copy mechanism copies words from the original input sentence and from the keywords into the output summary.

On a standard sentence summarization dataset, we verify that keywords are effective for the sentence summarization task.

Notes:

[1] Zhou, Q.; Yang, N.; Wei, F.; and Zhou, M. 2017. Selective encoding for abstractive sentence summarization. In Proceedings of ACL, 1095–1104.

[2] See, A.; Liu, P. J.; and Manning, C. D. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of ACL, 1073–1083.



JD AI Research Institute
JD AI Research Institute focuses on sustained algorithmic innovation, most of it driven by the needs of JD's actual business scenarios. Its research areas include computer vision, natural language understanding, dialogue, speech, semantics, and machine learning, and its laboratories have been set up in Beijing, Nanjing, Chengdu, Silicon Valley, and other locations around the world.
