How to use Python to implement a paper duplication-reduction tool

Time:2022-1-15

Preface

It is graduation season, and many students are struggling with high duplication rates in their papers. So I decided to make a simple tool that rewrites text automatically to reduce duplication. Let's first look at the tool in use, then go through the principle and the code.

To start, enter your appid and key. You can get both by registering a free account on the Baidu translation open platform. Then paste the text you want to rewrite into the input box and click the start button. The tool outputs sentences that are worded differently but mean roughly the same, which lowers the duplication rate. Click the copy button to copy the new text to the clipboard. Click the clear button to enter new text and repeat the process.

De-duplication principle

Duplicate checking works at the granularity of sentences. The similarity of two sentences depends mainly on which words they contain and where those words appear in the sentence. It is a purely textual comparison and does not take semantic similarity into account.

The countermeasure, then, is to change the sentence structure and replace words with synonyms.

To automate this rewriting, an obvious idea is to translate the text through several languages and back, which yields new wording. In this tool I use the path Chinese → English → Korean → Chinese. You can use a longer path, but that seems to hurt the readability of the result considerably.
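To make "textual comparison" concrete, here is a minimal sketch using Python's standard difflib module. It is only an illustration of an order-sensitive, character-level measure. The algorithm that real duplicate checkers use is not described in this article, so treat text_similarity and the sample sentences as assumptions.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] based on matching character runs, in order."""
    return SequenceMatcher(None, a, b).ratio()


original = "The cat sat on the mat."
reordered = "On the mat sat the cat."        # same words, different positions
synonyms = "The feline rested on the rug."   # similar meaning, different words

print(text_similarity(original, original))   # identical text scores 1.0
print(text_similarity(original, reordered))  # lower: word positions changed
print(text_similarity(original, synonyms))   # lower: words were replaced
```

Both rewritten sentences score below 1.0 even though a human reader would call them close in meaning, which is why changing word order and vocabulary lowers a purely textual score.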
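The conversion path can be sketched as a small loop over consecutive language pairs. This is a sketch under assumptions: round_trip and fake_translate are hypothetical names, and the codes "zh", "en" and "kor" are assumed to be the ones the translation platform accepts, so check them against its documentation. The stand-in translator only records each hop so the example runs without network access. A real translation function with the same three-argument shape, such as the translate function in the next section, could be passed in instead.

```python
from typing import Callable, Sequence

# A translator takes (text, source_lang, target_lang) and returns new text.
TranslateFn = Callable[[str, str, str], str]


def round_trip(text: str, translate_fn: TranslateFn,
               path: Sequence[str] = ("zh", "en", "kor", "zh")) -> str:
    """Translate text through each consecutive pair of languages in path."""
    for src, dst in zip(path, path[1:]):
        text = translate_fn(text, src, dst)
    return text


def fake_translate(text: str, src: str, dst: str) -> str:
    """Stand-in translator: appends the hop instead of translating."""
    return f"{text}[{src}->{dst}]"


print(round_trip("x", fake_translate))
# x[zh->en][en->kor][kor->zh]
```

Keeping the path as a parameter makes it easy to try a longer chain and judge the readability of the result yourself.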

Using the open platform

I use the API of the Baidu translation open platform to translate sentences. After a simple application, you get a free quota of 2 million translated characters per month.

Calling this API is a little cumbersome: you need to generate a signature (the sign parameter) and assemble a complete URL with the request parameters.


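import hashlib
import random

import requests

# appid and key are the credentials from the Baidu translation open platform.
# They are assumed to be set as module-level variables before translate() is called.


def translate(q, lan_from, lan_to):
    """Translate q from lan_from to lan_to; return q unchanged on failure."""
    url = 'http://api.fanyi.baidu.com/api/trans/vip/translate'
    salt = random.randint(1, 65536)
    # The sign is the MD5 hex digest of appid + q + salt + key, in that order.
    sign = hashlib.md5((str(appid) + str(q) + str(salt) + str(key)).encode('utf-8')).hexdigest()
    params = {
        'from': lan_from,
        'to': lan_to,
        'salt': salt,
        'sign': sign,
        'appid': appid,
        'q': q
    }
    # requests URL-encodes the parameters and appends them to the URL.
    r = requests.get(url, params=params)
    txt = r.json()
    # A response without 'trans_result' is an error; print it and fall back to the input.
    if 'trans_result' not in txt:
        print('ERROR Code:{}'.format(txt))
        return q
    return txt['trans_result'][0]['dst']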
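To see what "generate a sign and splice a complete URL" amounts to, the two steps can be pulled out and run without network access. This is a sketch that mirrors the concatenation order used in translate above. make_sign and build_url are hypothetical helper names, and "demo_appid" and "demo_key" are placeholder credentials, not real ones.

```python
import hashlib
from urllib.parse import urlencode

API_URL = "http://api.fanyi.baidu.com/api/trans/vip/translate"


def make_sign(appid: str, q: str, salt: int, key: str) -> str:
    """MD5 hex digest of appid + q + salt + key, as in translate()."""
    raw = f"{appid}{q}{salt}{key}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_url(appid: str, key: str, q: str,
              lan_from: str, lan_to: str, salt: int) -> str:
    """Assemble the full request URL with URL-encoded query parameters."""
    params = {
        "q": q,
        "from": lan_from,
        "to": lan_to,
        "appid": appid,
        "salt": salt,
        "sign": make_sign(appid, q, salt, key),
    }
    return f"{API_URL}?{urlencode(params)}"


# Placeholder credentials and a fixed salt, for illustration only.
url = build_url("demo_appid", "demo_key", "hello world", "en", "zh", salt=12345)
print(url)
```

Note that the sign is computed from the raw text q, while the URL carries the encoded form. In translate, requests.get(url, params=params) performs the same encoding automatically.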

Summary

Once you understand the principle and how to call the API, it is easy to write a GUI around them, which gives this duplication-reduction tool. The tool is of course very rudimentary, and you can extend it to be more complete.
