Using Text Recognition to Shop Smarter: How Savvy Consumers “Recognize” Good Products

Time: 2021-10-20

Summary: Products on the market are becoming more and more diverse. As consumers, how should we choose among them, and how can we check whether a product is good or bad? In the intelligent era, recognition technology has become part of our daily lives, so how can we apply text recognition (OCR) to help us look up and choose products? In this blog, we collect the national standard numbers of products and organize them into a database, then build OCR on ModelArts to recognize the national standard number printed on the packaging of a brand of yogurt and look up its corresponding information.

1. Collecting Data with a Web Crawler

The main goal of this case is to look up a product's standard number and the detailed information behind it, so we first need to collect and organize product standard number data. Because the volume of standard number data is huge, this case covers only national standard numbers: 9,620 records were collected in total. In the future, we will add as much standard information as possible, including local standard numbers. If you find any information missing, you are welcome to contribute it.

Next, we walk through the case step by step.

The first step is to find standard number information on one of the many public standard number websites and collect it into a database. Here we pick one website, use a crawler to collect the national standard numbers first, and then organize them into a database.

(Figure: a public website of standard number data)

Open the website and press F12 to bring up the browser's developer tools. In the Elements panel, you can find the links to the information pages of many standard numbers. First, we use the crawler to scrape these standard numbers together with their page URLs, and then visit each page to collect the detailed information for that standard number.
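The link-collection step can be sketched with Python's standard library. This is a minimal sketch, not the crawler from the attachment: the list-page markup, the `https://standards.example.com` URL, the `fetch` helper, and the rule that a link counts as a standard number when its text starts with "GB" are all hypothetical assumptions to be adapted to what you actually see in the Elements panel.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


def fetch(url):
    """Download a page as text (for real use; the demo below uses a local sample)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class StandardLinkParser(HTMLParser):
    """Collect (standard number, absolute detail-page URL) pairs from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # collected (standard_number, url) pairs
        self._href = None  # href of the <a> tag we are currently inside
        self._text = []    # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Assumption: a link is a standard-number link if its text starts with "GB"
            if text.upper().startswith("GB"):
                self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


# Hypothetical list-page markup; the real site's structure will differ.
SAMPLE_HTML = """
<ul class="std-list">
  <li><a href="/std/detail?id=1">GB 12345-2020</a></li>
  <li><a href="/std/detail?id=2">GB/T 67890-2019</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

parser = StandardLinkParser("https://standards.example.com/list")
parser.feed(SAMPLE_HTML)  # on a live site: parser.feed(fetch(list_page_url))
for number, url in parser.links:
    print(number, url)
```

On a real site you would call `fetch` on each list page, feed the result to a fresh parser, and pause between requests so the crawl does not overload the server.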

Each standard number's information page contains a lot of detail. For every national standard number, we keep the classification, standard number, standard name, status, release and implementation dates, issuing department, and detailed content as its record. Finally, the crawler writes all records to an Excel file, which serves as our database.
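Turning one information page into a record with these fields can be sketched as follows. This is a hedged illustration: it assumes, hypothetically, that the page shows each field as a `<tr><th>label</th><td>value</td></tr>` row, and the English labels in `LABEL_TO_FIELD` are placeholders for whatever the real site uses. It writes CSV with the standard `csv` module instead of Excel so that it runs without extra packages.

```python
import csv
import io
from html.parser import HTMLParser

# Columns kept for each national standard (the fields listed above).
FIELDS = ["classification", "standard_number", "standard_name", "status",
          "release_date", "implementation_date", "issuing_department", "content"]

# Hypothetical mapping from label text on a detail page to our column names.
LABEL_TO_FIELD = {
    "Classification": "classification",
    "Standard No.": "standard_number",
    "Standard name": "standard_name",
    "Status": "status",
    "Release date": "release_date",
    "Implementation date": "implementation_date",
    "Issued by": "issuing_department",
    "Content": "content",
}


class DetailTableParser(HTMLParser):
    """Collect label/value pairs from rows shaped like <tr><th>label</th><td>value</td></tr>."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._buf = []
        self._label = None  # most recent <th> text waiting for its <td>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell, self._buf = tag, []

    def handle_data(self, data):
        if self._cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = "".join(self._buf).strip()
        if tag == "th":
            self._label = text
        elif self._label is not None:
            self.pairs[self._label] = text
            self._label = None
        self._cell = None


def to_record(html):
    """Turn one detail page into a dict with every FIELDS key (missing ones stay empty)."""
    parser = DetailTableParser()
    parser.feed(html)
    record = dict.fromkeys(FIELDS, "")
    for label, value in parser.pairs.items():
        field = LABEL_TO_FIELD.get(label)
        if field:
            record[field] = value
    return record


def write_csv(records, stream):
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Made-up sample page; the values are placeholders, not real standard data.
SAMPLE_DETAIL = """
<table>
  <tr><th>Standard No.</th><td>GB 12345-2020</td></tr>
  <tr><th>Standard name</th><td>Example standard</td></tr>
  <tr><th>Status</th><td>Current</td></tr>
</table>
"""

out = io.StringIO()
write_csv([to_record(SAMPLE_DETAIL)], out)
print(out.getvalue())
```

To write a real file, open it with `open("standards.csv", "w", newline="", encoding="utf-8-sig")`; the BOM from `utf-8-sig` helps Excel display Chinese text correctly. If you need an actual `.xlsx` file as in the original workflow, `pandas.DataFrame(records).to_excel("standards.xlsx", index=False)` does that, provided `openpyxl` is installed.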

Some screenshots of the database are as follows:

The crawler program and the generated database are included in the attachment; developers who need them are welcome to download and use them.

2. OCR Text Recognition of Yogurt Packaging on ModelArts

This article does not cover the OCR model and code; for details, please refer to this blog post: https://bbs.huaweicloud.com/blogs/195963

With the crawler described above, we now have a database of national standard numbers. Next, we walk through the whole process in practice: we use OCR to recognize the text on a yogurt package, extract its standard number from that text, and then look up the detailed information for that standard number.

First, we create a new notebook on ModelArts and upload the OCR model code to it.

After OCR recognition, we locate and extract the national standard number from the text on the yogurt packaging.

The full text recognized from the yogurt packaging is also printed in the command-line terminal.

After OCR recognition on ModelArts, we obtain the yogurt's standard number, GB 19302, and match it against the database built by the crawler. The matching entry gives the detailed information of this standard number, that is, the product standard the yogurt follows.
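Matching the recognized number against the crawled database is a keyed lookup. A minimal sketch, assuming the database has been exported to CSV with a `standard_number` column; the attached database is an Excel file whose column names may differ, and `pandas.read_excel` could load it instead. Because the recognized number may lack the year suffix, the hypothetical `key_of` helper drops spaces and the year, so every edition of a standard matches. The sample rows are placeholders, not real standards.

```python
import csv
import io
import re


def key_of(standard_number):
    """Normalise 'GB 19302-2010', 'gb19302' or 'GB 19302' to 'GB19302' (no spaces, no year)."""
    compact = re.sub(r"\s+", "", standard_number.upper())
    return compact.split("-")[0]


def load_index(csv_stream):
    """Index database rows by key_of(standard_number); several editions may share a key."""
    index = {}
    for row in csv.DictReader(csv_stream):
        index.setdefault(key_of(row["standard_number"]), []).append(row)
    return index


def lookup(index, standard_number):
    """Return all database rows whose key matches, or an empty list."""
    return index.get(key_of(standard_number), [])


# Stand-in for the crawled database; rows are placeholders, not real standards.
SAMPLE_DB = """standard_number,standard_name,status
GB 12345-2020,Example standard A,Current
GB 12345-2010,Example standard A (older edition),Superseded
GB/T 67890-2019,Example standard B,Current
"""

index = load_index(io.StringIO(SAMPLE_DB))
for row in lookup(index, "GB 12345"):
    print(row["standard_number"], row["standard_name"], row["status"])
```

Keeping "GB" and "GB/T" distinct in the key matters, since a mandatory standard and a recommended standard can share the same number.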

Attachment download: Ocr.zip (4.72 MB)