Automatically Filling in the Blanks of Classical Chinese Poetry with Go (Example Code)

Time:2019-10-9

Preface

“The white sun sets behind the mountains, ______________”. The next line, “the Yellow River flows into the sea”, comes to mind naturally. But what about “the days and months hurry by without lingering, __________, and I fear the beauty will grow old”? How do you fill in the two lines in the middle?

A recent task at work involved 1500 fill-in-the-blank questions on classical Chinese poetry that came without answers, and each question had to be matched with its answer. Fortunately, the question metadata was complete: each question named the source poem and its author. The natural approach was to crawl the corresponding texts from the Internet and then find the answers by string matching. So far the results are good, and answers have been found for basically all of the questions. This article records the process and summarizes it.

1. Obtaining the Article Text

After a long search online, Baidu Chinese turned out to have a good collection of classical poems in a consistent format. The crawling process is fairly simple. Inspecting the site in a browser reveals its search interface, http://hanyu.baidu.com/hanyu/ajax/sugs, which takes a single parameter, mainkey: the search string, URL-encoded. The interface returns a list of matches, which is then filtered by the author's name. The code is as follows:

baseUrl := "http://hanyu.baidu.com/hanyu/ajax/sugs?"
client := &http.Client{}
u, _ := url.Parse(baseUrl)
q := u.Query()
q.Set("mainkey", name)
u.RawQuery = q.Encode() // Encode() URL-encodes the search string

// Add request headers
req, _ := http.NewRequest("GET", u.String(), nil)
req.Header.Add("User-Agent", `Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36`)
req.Header.Add("DNT", "1")
req.Header.Add("Host", "hanyu.baidu.com")
req.Header.Add("Accept-Language", "zh-CN,zh;q=0.8")
req.Header.Add("Referer", "http://hanyu.baidu.com/shici/detail?pid=be520db056da43238035dc18bb1e1798&tn=sug_click")

resp, errDo := client.Do(req)

Once the response has been received, the entry that matches the author is picked out.

// If there is more than one search result, check which one has the right author
respJson.ForEach(func(key, value gjson.Result) bool {
	// Skip entries that have no display_name.
	displayName := value.Get("display_name.0").String()
	sid := value.Get("sid.0").String()
	if len(displayName) == 0 {
		// Not the record we want; keep iterating.
		return true
	}

	// If the hit is a single line of a poem, use its source poem instead.
	typeStr := value.Get("type.0").String()
	if typeStr == "poemline" {
		displayName = value.Get("source_poem.0").String()
		sid = value.Get("source_poem_sid.0").String()
	}

	literatureAuthor := value.Get("literature_author.0").String()
	// Does the author match?
	if literatureAuthor == author {
		searchResult.Sid = sid
		searchResult.DisplayName = displayName
		searchResult.Author = literatureAuthor
		return false // found; stop iterating
	}
	return true // keep iterating
})
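
For readers who want to try the filtering step without the network or gjson, here is a minimal sketch using only the standard library. The payload shape (a flat JSON array whose fields are themselves arrays) is an assumption inferred from the ".0" paths above, and `suggestion`, `first`, `pickByAuthor` and the sample sids are hypothetical names and data, not part of the project.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// suggestion models one search hit. Every field is assumed to be a JSON
// array whose first element is the value of interest.
type suggestion struct {
	DisplayName      []string `json:"display_name"`
	Sid              []string `json:"sid"`
	Type             []string `json:"type"`
	SourcePoem       []string `json:"source_poem"`
	SourcePoemSid    []string `json:"source_poem_sid"`
	LiteratureAuthor []string `json:"literature_author"`
}

// SearchResult mirrors the fields set in the article's code.
type SearchResult struct {
	Sid, DisplayName, Author string
}

// first returns the first element of v, or "" if v is empty.
func first(v []string) string {
	if len(v) == 0 {
		return ""
	}
	return v[0]
}

// pickByAuthor returns the first hit whose author matches.
func pickByAuthor(raw []byte, author string) (SearchResult, bool, error) {
	var hits []suggestion
	if err := json.Unmarshal(raw, &hits); err != nil {
		return SearchResult{}, false, err
	}
	for _, h := range hits {
		name, sid := first(h.DisplayName), first(h.Sid)
		if name == "" {
			continue // no display name: not a usable record
		}
		if first(h.Type) == "poemline" {
			// A single line matched; point at the poem it comes from.
			name, sid = first(h.SourcePoem), first(h.SourcePoemSid)
		}
		if a := first(h.LiteratureAuthor); a == author {
			return SearchResult{Sid: sid, DisplayName: name, Author: a}, true, nil
		}
	}
	return SearchResult{}, false, nil
}

func main() {
	// Hypothetical sample payload.
	raw := []byte(`[
		{"display_name":["静夜思"],"sid":["s1"],"type":["poem"],"literature_author":["李白"]},
		{"display_name":["黄河入海流"],"sid":["s2"],"type":["poemline"],
		 "source_poem":["登鹳雀楼"],"source_poem_sid":["s3"],"literature_author":["王之涣"]}
	]`)
	result, found, err := pickByAuthor(raw, "王之涣")
	fmt.Println(result, found, err)
}
```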

searchResult holds the matching entry. The article page is then fetched by its sid and parsed:

func GetContent(sid string) (content string, err error) {
	baseUrl := "http://hanyu.baidu.com/shici/detail"

	u, err := url.Parse(baseUrl)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("pid", sid)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest("GET", u.String(), nil)
	if err != nil {
		return "", err
	}
	req.Header.Add("User-Agent", `Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36`)
	req.Header.Add("DNT", "1")
	req.Header.Add("Host", "hanyu.baidu.com")
	req.Header.Add("Accept-Language", "zh-CN,zh;q=0.8")
	req.Header.Add("Referer", "http://hanyu.baidu.com/shici/detail?pid=be520db056da43238035dc18bb1e1798&tn=sug_click")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		return "", errors.New("unable to connect to Baidu Chinese: " + err.Error())
	}
	defer resp.Body.Close()

	// Check the status separately: err is nil here, so calling err.Error() would panic.
	if resp.StatusCode != http.StatusOK {
		return "", errors.New("unexpected status from Baidu Chinese: " + resp.Status)
	}

	docm, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		return "", errors.New("error parsing document: " + err.Error())
	}

	// The poem text is stored in the element with id body_p and can be
	// extracted with the PuerkitoBio/goquery library.
	var result []string
	docm.Find("#body_p").Each(func(pos int, selection *goquery.Selection) {
		result = append(result, strings.TrimSpace(selection.Text()))
	})

	return strings.Join(result, ""), nil
}

At present, data can be crawled from Baidu Chinese and the Ancient Poetry Network (gushiwen). If a better data source turns up, it only needs to implement the Spider interface and be registered in the MapSpiderManifest() function.

type Spider interface {
	GetContent(SearchResult) (string, error)
	FindContent(string, string) (SearchResult, error)
}

func MapSpiderManifest() map[string]Spider {
	// Initialize and register all Spiders
	spiderMap := make(map[string]Spider)

	// Baidu
	baiduSpider := new(BaiduSpider)
	spiderMap["baiduSpider"] = baiduSpider

	// Ancient Poetry Network
	gushiwenSpider := new(GushiwenSpider)
	spiderMap["gushiwenSpider"] = gushiwenSpider
	return spiderMap
}
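
As a rough sketch of how another data source might plug in, the example below implements the Spider interface with an in-memory source and tries each registered spider in turn. Only the interface shape comes from the code above. `MemorySpider`, `fetchFirst`, the argument order (title, author) of FindContent, the fields of `SearchResult` and the sample data are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// SearchResult mirrors the fields used earlier in the article.
type SearchResult struct {
	Sid, DisplayName, Author string
}

// Spider has the same shape as the interface in the article.
type Spider interface {
	GetContent(SearchResult) (string, error)
	FindContent(string, string) (SearchResult, error)
}

// MemorySpider is a hypothetical source backed by a map keyed by "title|author".
type MemorySpider struct {
	poems map[string]string
}

// FindContent looks the poem up by title and author (assumed argument order).
func (m *MemorySpider) FindContent(title, author string) (SearchResult, error) {
	key := title + "|" + author
	if _, ok := m.poems[key]; !ok {
		return SearchResult{}, errors.New("not found: " + key)
	}
	return SearchResult{Sid: key, DisplayName: title, Author: author}, nil
}

// GetContent returns the stored text for a search result.
func (m *MemorySpider) GetContent(r SearchResult) (string, error) {
	content, ok := m.poems[r.Sid]
	if !ok {
		return "", errors.New("no content for sid " + r.Sid)
	}
	return content, nil
}

// fetchFirst tries the spiders in name order and returns the first text found.
func fetchFirst(spiders map[string]Spider, title, author string) (string, error) {
	names := make([]string, 0, len(spiders))
	for name := range spiders {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable order
	for _, name := range names {
		r, err := spiders[name].FindContent(title, author)
		if err != nil {
			continue // this source does not have the poem; try the next one
		}
		if content, err := spiders[name].GetContent(r); err == nil {
			return content, nil
		}
	}
	return "", errors.New("no spider could find " + title)
}

func main() {
	spiders := map[string]Spider{
		"emptySpider": &MemorySpider{poems: map[string]string{}},
		"memorySpider": &MemorySpider{poems: map[string]string{
			"登鹳雀楼|王之涣": "白日依山尽,黄河入海流。",
		}},
	}
	content, err := fetchFirst(spiders, "登鹳雀楼", "王之涣")
	fmt.Println(content, err)
}
```

Falling back from one source to the next is only one possible policy; the project may pick a source differently.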

2. Searching for Poetry and Sentences

Writing classical poems and prose out from memory is an exercise I did a lot of back in school: a passage is split into lines, and a few lines are removed at random for students to fill in by hand. The questions generally fall into the following patterns:

Blank at the beginning: ______, [______, …], who would not be stirred by thoughts of home?
Blank at the end: All of that is past, ______, [______, …].
Blank in the middle: the moon rose above the eastern mountain, ______, white dew lay across the river, ______

Whatever the pattern, the answer to a blank can be determined only if there is a prompt line directly before or after it. Such a blank can be resolved on its own, so let's call it an autonomous blank. A blank with no prompt line on either side has to wait until a neighboring autonomous blank has been resolved before its own answer can be found. A diagram illustrates this more clearly:

The gray blocks in the figure have prompt lines next to them, so their answers can be found by walking through the crawled text of the article line by line, and the blanks can then be filled in. The search algorithm is shown in the following code:

// Given newFind's known PreString, fill in its BlankString and PostString
func makeWithPreContent(contentsSplit []string, newFind *Find) {
	for l := range contentsSplit {
		if isEqual(contentsSplit[l], newFind.PreString) && l < len(contentsSplit)-1 {
			// The blank is the line right after the prompt.
			newFind.BlankString = contentsSplit[l+1]
			if l < len(contentsSplit)-2 {
				newFind.PostString = contentsSplit[l+2]
			}
			newFind.BlankFinish = true
		}
	}
}

// Given newFind's known PostString, fill in its BlankString and PreString
func makeWithPostContent(contentsSplit []string, newFind *Find) {
	for l := range contentsSplit {
		if isEqual(contentsSplit[l], newFind.PostString) && l > 0 {
			// The blank is the line right before the prompt.
			newFind.BlankString = contentsSplit[l-1]
			if l > 1 {
				newFind.PreString = contentsSplit[l-2]
			}
			newFind.BlankFinish = true
		}
	}
}

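// Punctuation marks to split on, in both half-width and full-width forms,
// and the quotation marks to strip from each segment.
var (
	regPunctuation = regexp.MustCompile(`[,，。.?？!！;；:：]`)
	regQuoting     = regexp.MustCompile(`[“”‘’']`)
)

// SplitByPunctuation splits the content at punctuation marks. It returns
// the text segments and the punctuation marks that separated them.
func SplitByPunctuation(s string) ([]string, []string) {
	// Save the punctuation marks that match, then split the string at them.
	toPun := regPunctuation.FindAllString(s, -1)
	result := regPunctuation.Split(s, -1)

	// Drop the empty trailing segment left when the text ends with punctuation.
	if len(result[len(result)-1]) == 0 {
		result = result[:len(result)-1]
	}

	// Trim leading and trailing spaces and remove quotation marks.
	for i := range result {
		result[i] = strings.TrimSpace(result[i])
		result[i] = regQuoting.ReplaceAllString(result[i], "")
	}
	return result, toPun
}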
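
To see the search in action, here is a small self-contained sketch that runs the same idea on one short poem. It is a simplified stand-in for the functions above, under stated assumptions: `splitLines`, `fillFromPre` and `fillFromPost` are hypothetical names, isEqual is assumed to be plain string equality, and the loops stop at the first match.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Find holds one blank and its neighbors; the field names follow the article.
type Find struct {
	PreString   string
	BlankString string
	PostString  string
	BlankFinish bool
}

// Half-width and full-width punctuation marks to split on.
var punctuation = regexp.MustCompile(`[,，。.?？!！;；:：]`)

// splitLines splits the text at punctuation and drops empty segments.
func splitLines(s string) []string {
	var lines []string
	for _, part := range punctuation.Split(s, -1) {
		if part = strings.TrimSpace(part); part != "" {
			lines = append(lines, part)
		}
	}
	return lines
}

// fillFromPre fills the blank that follows the known prompt PreString.
func fillFromPre(lines []string, f *Find) {
	for i, line := range lines {
		if line == f.PreString && i < len(lines)-1 {
			f.BlankString = lines[i+1]
			if i < len(lines)-2 {
				f.PostString = lines[i+2]
			}
			f.BlankFinish = true
			return
		}
	}
}

// fillFromPost fills the blank that precedes the known prompt PostString.
func fillFromPost(lines []string, f *Find) {
	for i, line := range lines {
		if line == f.PostString && i > 0 {
			f.BlankString = lines[i-1]
			if i > 1 {
				f.PreString = lines[i-2]
			}
			f.BlankFinish = true
			return
		}
	}
}

func main() {
	lines := splitLines("白日依山尽，黄河入海流。欲穷千里目，更上一层楼。")

	// Question: 白日依山尽，______
	after := &Find{PreString: "白日依山尽"}
	fillFromPre(lines, after)
	fmt.Println(after.BlankString, after.BlankFinish)

	// Question: ______，更上一层楼
	before := &Find{PostString: "更上一层楼"}
	fillFromPost(lines, before)
	fmt.Println(before.BlankString, before.BlankFinish)
}
```

Note that each call also records the line on the far side of the blank (PostString or PreString), which is what lets a neighboring blank be resolved next.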

Once all the autonomous blanks have their answers, each autonomous blank can be treated as the head of a doubly linked list. All that remains is to traverse each list and find the answer for each node with the search algorithm above. Traversal of a list stops when the next node is NULL or is itself an autonomous blank, and the next list is then processed. In this way, however complex the fill-in-the-blank question is, the work can be completed automatically.
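
The propagation from resolved blanks to their neighbors can be sketched as follows. This is a simplified stand-in, not the project's implementation: it keeps the lines of a question in a slice instead of a doubly linked list and makes one forward and one backward pass. `Slot`, `indexOf` and `fillBlanks` are hypothetical names, and the sketch assumes that the lines of a question are consecutive lines of the poem and that each prompt occurs only once in it.

```go
package main

import "fmt"

// Slot is one line of a question: a prompt (Text given) or a blank.
type Slot struct {
	Text  string
	Blank bool
	pos   int // index of this line in the poem, -1 while unknown
}

// indexOf returns the index of s in lines, or -1 if it is absent.
func indexOf(lines []string, s string) int {
	for i, l := range lines {
		if l == s {
			return i
		}
	}
	return -1
}

// fillBlanks resolves every blank that can be reached from a prompt.
func fillBlanks(poem []string, slots []Slot) {
	// Locate the prompts in the poem.
	for i := range slots {
		slots[i].pos = -1
		if !slots[i].Blank {
			slots[i].pos = indexOf(poem, slots[i].Text)
		}
	}
	// Forward pass: a blank right after a resolved line takes the next line.
	for i := 1; i < len(slots); i++ {
		prev := slots[i-1].pos
		if slots[i].Blank && slots[i].pos < 0 && prev >= 0 && prev+1 < len(poem) {
			slots[i].pos = prev + 1
			slots[i].Text = poem[prev+1]
		}
	}
	// Backward pass: a blank right before a resolved line takes the previous line.
	for i := len(slots) - 2; i >= 0; i-- {
		next := slots[i+1].pos
		if slots[i].Blank && slots[i].pos < 0 && next > 0 {
			slots[i].pos = next - 1
			slots[i].Text = poem[next-1]
		}
	}
}

func main() {
	poem := []string{"月出于东山之上", "徘徊于斗牛之间", "白露横江", "水光接天"}
	// Question: 月出于东山之上, ______, 白露横江, ______
	slots := []Slot{
		{Text: "月出于东山之上"},
		{Blank: true},
		{Text: "白露横江"},
		{Blank: true},
	}
	fillBlanks(poem, slots)
	for _, s := range slots {
		fmt.Println(s.Text)
	}
}
```

Because each pass moves outward from lines that are already resolved, a run of several blanks in a row is filled one step at a time, which is the same effect the linked-list traversal achieves.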

3. Results

Results for some well-known classical prose and poems:

1. Qian Chibi Fu

2. Lisao

Project address: Ancient Poetry Fill Blank (local download)

Summary

That is all for this article. I hope it is of some reference value for your study or work. If you have any questions, feel free to leave a comment.
