How to Use Baidu Brain Text Recognition to Quickly Build a Practical Tool

Time: 2020-2-26

1、 General overview

This article introduces the main functions, a performance evaluation, and an explanation of the core code of Cloud Cat OCR, a program I developed on top of Baidu AI. Because it combines several earlier posts, it is fairly long. I hope you will read it patiently and take what you need.

The article is divided into the following parts:
The first part introduces Cloud Cat OCR. I, the developer, describe the software's main functions. Compared with ABBYY and other OCR software, Cloud Cat OCR offers more comprehensive functions and is easier to use. Most importantly, Cloud Cat OCR is currently completely free for everyone. Because it is a trial version, it may still have bugs, so please avoid clicking around at random while using it. The download address of the Cloud Cat demo version is given in this post: https://ai.baidu.com/forum/topic/show/955975
The second part explains how Cloud Cat OCR is implemented on top of Baidu OCR. It also shows some of the software's core code, so that you can use it as a reference and build more creative products.
The third part describes how to use Cloud Cat OCR and evaluates its results. Cloud Cat OCR was developed around the end of 2017, so it does not use Baidu OCR's latest interfaces. If Cloud Cat gets your support, I may consider developing a new version that connects to more of Baidu's latest AI interfaces. I hope you enjoy it.
The last part is an appendix. It explains code I wrote against Baidu's latest handwriting recognition interface, which is also provided for your reference.

The first part: introduction to Cloud Cat OCR

1、 Introduction to Cloud Cat OCR

Cloud Cat OCR is built on Baidu Cloud's OCR algorithms and was developed by attacking fox (the author). It is written in C# and runs on Windows. The main interfaces it uses are general text recognition, general text recognition (high precision), and table recognition.

2、 The main functions of Cloud Cat OCR are as follows:

1. Batch recognition of text in images, with image preview, automatic line wrapping and indentation of the recognition results, and control of QPS concurrency (the QPS function is temporarily disabled because of timeout problems on Baidu Cloud's side);

2. Batch recognition of table images; the recognition results can be opened automatically, or you can choose to open the save directory directly;

3. PDF to image: on my laptop (i7 processor / 8 GB RAM / 128 GB SSD), the PDF-to-image module uses no more than 400 MB of memory and converts a PDF of more than 500 pages in about 2 minutes. The folder containing the conversion results can be opened with one click;

4. Switchable skins, with two skins currently available;

5. Configurable API Key and Secret Key;

6. Recognition can be stopped midway;

7. The same image can be recognized again after the settings are changed;

8. Support for multiple languages;

9. Other functions, such as recognition statistics, font size control, saving the recognition results as an RTF file from the right-click menu, and selecting all and copying the recognition results.
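The article does not show Cloud Cat's RTF export code. Here is a minimal sketch of what saving recognized text as RTF can involve, assuming plain-text input and a single font. SimSun is an arbitrary choice made so that Chinese text renders, and this is not the software's actual implementation.

```csharp
using System.Text;

public static class RtfExport
{
    // Wraps plain text in a minimal RTF document.
    // Assumption: one font (SimSun, chosen arbitrarily), no other styling.
    public static string ToRtf(string text)
    {
        var sb = new StringBuilder(@"{\rtf1\ansi\deff0{\fonttbl{\f0 SimSun;}}\f0 ");
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);            // escape RTF control characters
            else if (c == '\n')
                sb.Append(@"\par ");                  // newline becomes a paragraph break
            else if (c == '\r')
                continue;                             // drop CR from CRLF pairs
            else if (c > 127)
                sb.Append(@"\u").Append((short)c).Append('?'); // non-ASCII: signed 16-bit \uN plus '?' fallback
            else
                sb.Append(c);
        }
        return sb.Append('}').ToString();
    }
}
```

Every non-ASCII character is escaped, so the output is pure ASCII and can be written with `File.WriteAllText(path, RtfExport.ToRtf(text))`.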

3、 Demo post link

http://ai.baidu.com/forum/topic/show/492371

4、 Cloud Cat OCR demo video link

https://v.qq.com/x/page/r0564n4a87e.html

I suggest watching it at 1.2x or 1.5x speed, because I speak a little slowly.

The second part: implementation of Cloud Cat OCR based on Baidu OCR

1、 Overview

Cloud Cat OCR is Windows software built on Baidu AI. I developed it in C# in the Visual Studio 2017 IDE, using Baidu's SDK package, and consulted Baidu's technical documentation throughout development.

Baidu Cloud text recognition technical documentation:

https://cloud.baidu.com/doc/OCR/index.html

2、 Preparations

First, download the latest Baidu text recognition SDK package.

The C# SDK package can be downloaded here:

http://ai.baidu.com/sdk#ocr

After downloading, extract the package and use the library in its net45 folder.

Open Visual Studio 2017 and create a new project. Because I will explain with a console project, I choose New Project > C# Console App. Once the project is created, add a reference to the SDK downloaded above.

3、 Core code explanation

(1) Call Baidu OCR to recognize the text in an image; the result is returned as JSON

The code is as follows:

using System;
using System.IO;

namespace myOCRDemo
{
    class Program
    {
        public static void GeneralBasicDemo()
        {
            // Set the API Key / Secret Key (replace with your own)
            var API_KEY = "your API key";
            var SECRET_KEY = "your secret key";
            // Create the client object
            var client = new Baidu.Aip.Ocr.Ocr(API_KEY, SECRET_KEY);
            client.Timeout = 60000; // modify the timeout
            var image = File.ReadAllBytes("image file path");
            // Call general text recognition with a local image. The call may throw
            // exceptions (network errors, etc.), so catch them with try/catch
            try
            {
                var result = client.GeneralBasic(image);
                Console.WriteLine(result);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Recognition failed: " + ex.Message);
            }
        }

        static void Main(string[] args)
        {
            GeneralBasicDemo();
            Console.Read();
        }
    }
}

Note: in your own project, replace the API Key and Secret Key above with your own. For how to apply for and view these two keys, see the evaluation post I wrote:

http://ai.baidu.com/forum/topic/show/955989
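Hard-coding the keys in source code makes them easy to leak, and the keys are tied to your call quota. One alternative is to read them from environment variables at startup. This is a minimal sketch, and the variable names BAIDU_OCR_API_KEY and BAIDU_OCR_SECRET_KEY are invented for this example.

```csharp
using System;

public static class BaiduKeys
{
    // Hypothetical variable names; use whatever names you prefer.
    public const string ApiKeyVar = "BAIDU_OCR_API_KEY";
    public const string SecretKeyVar = "BAIDU_OCR_SECRET_KEY";

    // Reads a required setting from the environment and fails early if it is missing.
    public static string Require(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException("Environment variable " + name + " is not set.");
        return value;
    }
}
```

The client can then be created with `new Baidu.Aip.Ocr.Ocr(BaiduKeys.Require(BaiduKeys.ApiKeyVar), BaiduKeys.Require(BaiduKeys.SecretKeyVar))`.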

Also remember to change the image file path to your own. Below is an example of the recognition result:

The original image is as follows:

(2) Parse the JSON and convert the recognition result into more readable plain text

The code is as follows:

using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace myOCRDemo
{
    class Program
    {
        public static void GeneralBasicDemo()
        {
            // Set the API Key / Secret Key (replace with your own)
            var API_KEY = "your akey";
            var SECRET_KEY = "your skey";
            // Create the client object
            var client = new Baidu.Aip.Ocr.Ocr(API_KEY, SECRET_KEY);
            client.Timeout = 60000; // modify the timeout
            var image = File.ReadAllBytes(@"your picture path");
            // Call general text recognition with a local image. The call may throw
            // exceptions (network errors, etc.), so catch them with try/catch
            var result = client.GeneralBasic(image);
            // Parse the JSON: words_result_num is the number of lines,
            // and words_result[i].words is the text of line i
            JObject jo = (JObject)JsonConvert.DeserializeObject(result.ToString());
            int num = (int)jo["words_result_num"];
            string[] words = new string[num];
            for (int i = 0; i < num; i++)
                words[i] = jo["words_result"][i]["words"].ToString();
            // Build the return value, one recognized line per text line
            string txtOCR = string.Empty;
            for (int i = 0; i < num; i++)
                txtOCR += words[i] + "\n";
            // Show the results
            Console.WriteLine(txtOCR);
        }

        static void Main(string[] args)
        {
            GeneralBasicDemo();
            Console.Read();
        }
    }
}

The program's output is as follows:

This output fits human reading habits much better. The code above is the core basic code. It can be optimized further, for example with automatic line wrapping, automatic indentation, and automatic conversion of punctuation to suit each language's conventions.
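The article does not show how Cloud Cat converts punctuation. One possible rule for Chinese text is to turn half-width punctuation into full-width, but only when it directly follows a CJK character. That way numbers such as "3,000" or times such as "12:30" are left alone. This is a minimal sketch of that rule. The rule itself and the CJK range U+4E00 to U+9FFF are my assumptions.

```csharp
using System.Collections.Generic;
using System.Text;

public static class OcrTextFormatter
{
    // Half-width to full-width punctuation commonly used in Chinese text.
    private static readonly Dictionary<char, char> FullWidth = new Dictionary<char, char>
    {
        [','] = '，', [';'] = '；', [':'] = '：', ['?'] = '？', ['!'] = '！'
    };

    // Assumption: only the CJK Unified Ideographs block counts as "Chinese".
    private static bool IsCjk(char c) => c >= '\u4E00' && c <= '\u9FFF';

    // Converts punctuation only when it directly follows a CJK character.
    public static string ToChinesePunctuation(string line)
    {
        var sb = new StringBuilder(line.Length);
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (i > 0 && IsCjk(line[i - 1]) && FullWidth.TryGetValue(c, out char fw))
                sb.Append(fw);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

This check looks only at the preceding character, so a comma after a closing quote or a space is not converted. A real formatter would probably need more context than that.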

(3) Table recognition

Programming Baidu's table text recognition is more cumbersome and takes two main steps. First, submit a table recognition request and obtain its request_id. Second, use the request_id to fetch the recognition result. By default the result is an Excel file, and the JSON response contains its download address.

In addition to these two steps, my program also includes code that automatically downloads the Excel file to the local computer, for your reference. Note that the program must wait between submitting the recognition request and fetching the result; otherwise, the download URL cannot be obtained. In my tests, a delay of more than 3 seconds works well, while delays under 3 seconds may cause errors.
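The article handles the "no link yet" case with a fixed 3-second sleep and a note that you may need to try several times. One way to automate that retry is a small polling helper. This is a minimal sketch: the helper is not part of the Baidu SDK, and the attempt count and delay are only illustrative.

```csharp
using System;
using System.Threading;

public static class Polling
{
    // Calls fetch() up to maxAttempts times, sleeping `delay` before each attempt,
    // and returns the first result accepted by isReady, or default(T) if none is.
    public static T PollUntil<T>(Func<T> fetch, Func<T, bool> isReady, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            Thread.Sleep(delay);   // asking too early can return a response without a download link
            T result = fetch();
            if (isReady(result))
                return result;
        }
        return default(T);
    }
}

// Hypothetical, untested usage with the client and requestId from the table code in this section:
// string json = Polling.PollUntil(
//     () => client.TableRecognitionGetResult(requestId).ToString(),
//     s => !string.IsNullOrEmpty((string)JObject.Parse(s)["result"]?["result_data"]),
//     5, TimeSpan.FromSeconds(3));
```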

The code is as follows:

        // Add this method to the Program class above. It needs the same using directives
        // (System, System.IO, Newtonsoft.Json, Newtonsoft.Json.Linq).

        /// <summary>
        /// Table character recognition
        /// </summary>
        public static void myTableRecognitionRequestDemo()
        {
            // Set the API Key / Secret Key (replace with your own)
            var API_KEY = "your API key";
            var SECRET_KEY = "your secret key";
            // Create the client object
            var client = new Baidu.Aip.Ocr.Ocr(API_KEY, SECRET_KEY);
            client.Timeout = 60000; // modify the timeout
            var image = File.ReadAllBytes(@"F:\table picture 1.jpg"); // change this to the path of your table image
            // Call table text recognition; this may throw exceptions (network errors, etc.), so catch them with try/catch
            var result = client.TableRecognitionRequest(image);
            // Parse the JSON to get the request_id
            JObject jo = (JObject)JsonConvert.DeserializeObject(result.ToString());
            string requestId = jo["result"][0]["request_id"].ToString();
            Console.WriteLine("get requestId: " + requestId);
            // A delay of 3 seconds is necessary
            System.Threading.Thread.Sleep(3000);
            // Get the table recognition result
            // Sometimes no link is returned yet, and you need to try several times
            var resultExcel = client.TableRecognitionGetResult(requestId);
            Console.WriteLine("The table recognition result obtained is as follows:");
            Console.WriteLine(resultExcel);
            // Parse the JSON to get the download link
            JObject joResult = (JObject)JsonConvert.DeserializeObject(resultExcel.ToString());
            string excelURL = joResult["result"]["result_data"].ToString();
            Console.WriteLine("The Excel file download address is:\n" + excelURL);
            // Automatically download the Excel file to the computer
            using (var df = new System.Net.WebClient())
            {
                df.DownloadFile(excelURL, @"F:\recognition result.xls"); // change this to your own download path
            }
            Console.WriteLine("Download complete");
        }

Test image used by the author:

Screenshot of the table recognition result:

Final note: the sample code in this article is up to date and consistent with the code in Baidu's SDK documentation. Cloud Cat OCR itself was written by the end of 2017 and its code is somewhat dated, so its source code is not pasted here directly.

Original post for the code section:

http://ai.baidu.com/forum/topic/show/956037

The third part: usage and evaluation of Cloud Cat OCR

1、 Overview

I started using the Baidu Cloud service platform, which I also call Baidu AI here, in 2017. Using the interfaces Baidu AI provides, I programmed an OCR application: Cloud Cat OCR. Most of its code was finished before the end of 2017. It has stayed unreleased until now because of personal matters (the birth of my child, among others). I develop software in my spare time, so the project was interrupted for about a year. Now I have time to continue it.

Address of the original post:

http://ai.baidu.com/forum/topic/show/955989

2、 Evaluation details

(1) Preparations

Before using Cloud Cat OCR, you must register an account on the Baidu Cloud website. Once you have an account, apply for an API Key and a Secret Key under the specific cloud service. Each user should keep these two keys private and never disclose them to others. Baidu Cloud now charges for its services: the number of free calls per user per day is limited, and raising the limit requires payment. Because these two keys are what identify a user to the Baidu Cloud AI interfaces, keep them safe. Here is a simple illustration of the preparation steps:

(2) Using Cloud Cat OCR

Once you have a Baidu Cloud API Key and Secret Key, you can start using Cloud Cat OCR. The steps are as follows:

(3) Evaluation details

Cloud Cat OCR calls three main Baidu AI interfaces: general text recognition (with location), general text recognition (high precision, with location), and table text recognition. Each is introduced in turn below.

1. Combined use of general text recognition (with location) and general text recognition (high precision, with location)

As shown in the figure above, the user can choose from multiple languages (including German, French, Spanish, etc.) and then click Recognize. Baidu Cloud's high-precision text recognition interface supports only Chinese and English, while general text recognition also supports many other languages, so I wrote the software to combine the two interfaces. See the code section for details on how they are combined. High-precision recognition is generally more accurate than the general interface, but it is also slower.
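The article does not include Cloud Cat's code for combining the two interfaces. This is a minimal sketch of the routing decision only. It assumes the REST paths `general` and `accurate` under https://aip.baidubce.com/rest/2.0/ocr/v1/ and the language_type codes listed here, which reflect my understanding of Baidu's documentation and should be checked against the current docs. It also assumes language_type is sent only to the general interface.

```csharp
using System.Collections.Generic;

public enum OcrLanguage { ChineseEnglish, English, French, German, Spanish, Japanese }

public sealed class OcrRoute
{
    public string Endpoint { get; set; }      // path segment under .../rest/2.0/ocr/v1/ (assumed)
    public string LanguageType { get; set; }  // language_type parameter, or null when not sent
}

public static class OcrRouter
{
    // Assumed language_type codes for the general interface.
    private static readonly Dictionary<OcrLanguage, string> Codes = new Dictionary<OcrLanguage, string>
    {
        [OcrLanguage.ChineseEnglish] = "CHN_ENG",
        [OcrLanguage.English] = "ENG",
        [OcrLanguage.French] = "FRE",
        [OcrLanguage.German] = "GER",
        [OcrLanguage.Spanish] = "SPA",
        [OcrLanguage.Japanese] = "JAP",
    };

    // Chinese/English use the high-precision interface when accuracy is preferred;
    // every other language falls back to the general interface with language_type set.
    public static OcrRoute Route(OcrLanguage lang, bool preferAccuracy)
    {
        bool accurateSupported = lang == OcrLanguage.ChineseEnglish || lang == OcrLanguage.English;
        if (preferAccuracy && accurateSupported)
            return new OcrRoute { Endpoint = "accurate", LanguageType = null };
        return new OcrRoute { Endpoint = "general", LanguageType = Codes[lang] };
    }
}
```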

The software can save the recognized text to a local file, as shown in the following figure:

The saved file is in RTF format and can be opened with WPS or Microsoft Office Word. Next, the statistics for recognizing 20 images in one batch are shown below:

As the figure above shows, Baidu Cloud's text recognition is fast, taking about 2-3 seconds per image on average.
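Cloud Cat's statistics code is not shown. To measure a per-image average like the 2-3 seconds reported here, you can time a whole batch with Stopwatch and divide by the number of images. This is a minimal sketch; the `recognize` delegate is a placeholder for whatever OCR call you make.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class BatchTimer
{
    // Runs recognize() on each item in sequence and returns the average wall-clock time per item.
    public static TimeSpan AverageTime<T>(IReadOnlyList<T> items, Action<T> recognize)
    {
        if (items.Count == 0) throw new ArgumentException("No items to time.", nameof(items));
        var watch = Stopwatch.StartNew();
        foreach (T item in items)
            recognize(item);
        watch.Stop();
        return TimeSpan.FromTicks(watch.Elapsed.Ticks / items.Count);
    }
}
```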

2. Table text recognition

The main steps of table text recognition are as follows:

The software automatically saves the recognition result as an Excel file and opens it, as shown in the figure below:

As the figure above shows, table text recognition is slower than ordinary text recognition, taking about 5-6 seconds.

Evaluation summary: Baidu OCR performs well on printed text, and compared with earlier OCR software it is a revolutionary step forward. It does have weaknesses, of course. I have not yet formally evaluated handwriting recognition, but the Baidu Cloud general high-precision interface recognizes handwriting poorly. Another weakness is QPS concurrency. As I understand it, concurrency should speed up OCR, especially for large batches of images, and save a lot of time. Unfortunately, Baidu Cloud does not seem to handle concurrency well, so the program cannot reliably support QPS concurrency. I hope Baidu fixes this problem later.
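The article does not show how Cloud Cat issued concurrent requests, and that feature is currently disabled. Here is a rough sketch of one common way to cap concurrency in C#, assuming each recognition call can be wrapped as a Task: SemaphoreSlim limits how many calls are in flight at once. Note that this caps concurrent requests rather than requests per second, so it only approximates a QPS quota.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs work for every item with at most maxConcurrency calls in flight at once.
    // Results are returned in the same order as the input items.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, Func<TItem, Task<TResult>> work, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();            // wait for a free slot
                try { return await work(item); }
                finally { gate.Release(); }        // free the slot even if work throws
            }).ToList();                           // ToList starts all tasks
            return await Task.WhenAll(tasks);
        }
    }
}
```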

 

Appendix:

Implementing handwriting recognition in C#

1、 Overview

I used C# to call the Baidu API to implement handwriting recognition, following Baidu's product documentation.

Document address: https://cloud.baidu.com/doc/ocr/index.html

2、 Code and explanation

Most of my source code comes from Baidu's product documentation, but that code had some problems. One was the text encoding of the recognition result. Baidu's code uses the default encoding, which displayed garbled characters on my machine. After some research, I changed the encoding to UTF-8, and the garbled text went away.
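The fix described here comes down to decoding the response bytes as UTF-8. On .NET Framework, Encoding.Default is the system ANSI code page (for example GBK on Chinese Windows), which is likely why decoding with it garbled the text. This is a minimal sketch of the reading step, separated from the HTTP code so it can be tried on its own.

```csharp
using System.IO;
using System.Text;

public static class ResponseText
{
    // Reads an HTTP response body as UTF-8, the encoding the author found the result uses.
    // Decoding the same bytes with a different code page produces garbled characters.
    public static string ReadUtf8(Stream body)
    {
        using (var reader = new StreamReader(body, Encoding.UTF8))
            return reader.ReadToEnd();
    }
}
```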

All of my source code is as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Web;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace myHandwrite
{
    public static class FileUtils
    {
        /// <summary>
        /// Transcode a file to Base64
        /// </summary>
        /// <param name="fileName">Path of the image file</param>
        /// <returns>Base64 string of the file contents</returns>
        public static String getFileBase64(String fileName)
        {
            // ReadAllBytes reads the whole file and closes it even if an exception occurs
            byte[] arr = File.ReadAllBytes(fileName);
            return Convert.ToBase64String(arr);
        }
    }

    class Program
    {
        // Access token obtained by calling getAccessToken(); it is recommended to cache it according to expires_in
        // Example of a returned token
        public static String TOKEN = "24.adda70c11b9786206253ddb70affdc46.2592000.1493524354.282335-1234567";

        // API Key of the application created for the service on Baidu Cloud; selecting multiple services when creating the application is recommended
        private static string clientId = "change to your API key here";
        // Secret Key of the same application
        private static string clientSecret = "change to your secret key here";

        /// <summary>
        /// Function to get the access token
        /// </summary>
        /// <returns>The access token</returns>
        public static String getAccessToken()
        {
            String authHost = "https://aip.baidubce.com/oauth/2.0/token";
            HttpClient client = new HttpClient();
            List<KeyValuePair<string, string>> paraList = new List<KeyValuePair<string, string>>();
            paraList.Add(new KeyValuePair<string, string>("grant_type", "client_credentials"));
            paraList.Add(new KeyValuePair<string, string>("client_id", clientId));
            paraList.Add(new KeyValuePair<string, string>("client_secret", clientSecret));

            HttpResponseMessage response = client.PostAsync(authHost, new FormUrlEncodedContent(paraList)).Result;
            String result = response.Content.ReadAsStringAsync().Result;
            //Console.WriteLine(result);
            // Added code: parse the JSON and extract access_token
            JObject jo = (JObject)JsonConvert.DeserializeObject(result);
            string myToken = jo["access_token"].ToString();
            Console.WriteLine("The token obtained is: " + myToken);
            return myToken;
        }

        /// <summary>
        /// Handwriting recognition
        /// </summary>
        /// <param name="token">Access token</param>
        /// <param name="filename">Path of the image file</param>
        /// <returns>Recognized text, one line per recognized line</returns>
        public static string myHandwriting(string token, string filename)
        {
            // Base64 encoding of the image
            string strbaser64 = FileUtils.getFileBase64(filename);
            string host = "https://aip.baidubce.com/rest/2.0/ocr/v1/handwriting?access_token=" + token;
            Encoding encoding = Encoding.Default;
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(host);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.KeepAlive = true;
            // Request parameters
            String str = "recognize_granularity=big&image=" + HttpUtility.UrlEncode(strbaser64);
            byte[] buffer = encoding.GetBytes(str);
            request.ContentLength = buffer.Length;
            using (Stream body = request.GetRequestStream())
                body.Write(buffer, 0, buffer.Length);
            // The result was garbled with the default encoding; after testing, it must be read as UTF-8
            string result;
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                result = reader.ReadToEnd();
            }
            Console.WriteLine("Handwritten text recognition:");
            //Console.WriteLine(result);
            // Parse the JSON
            JObject jo = (JObject)JsonConvert.DeserializeObject(result);
            int num = (int)jo["words_result_num"];
            string[] words = new string[num];
            for (int i = 0; i < num; i++)
                words[i] = jo["words_result"][i]["words"].ToString();
            // Build the return value
            string txtOCR = string.Empty;
            for (int i = 0; i < num; i++)
                txtOCR += words[i] + "\n";
            // Show the results
            Console.WriteLine(txtOCR);
            return txtOCR;
        }

        static void Main(string[] args)
        {
            // Change to your image path here
            string filename = @"F:\script 5.jpg";
            string token = getAccessToken();
            myHandwriting(token, filename);
            Console.Read();
        }
    }
}

Note: in the code above, replace the API Key and Secret Key with your own and change the image path. If the returned text is garbled, change the encoding.
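The comment in the code recommends caching the access token according to expires_in instead of requesting a new one on every run. Here is a minimal, single-threaded sketch of such a cache. The AccessToken and TokenCache types are hypothetical helpers. Because getAccessToken() as written returns only the token string, the `fetch` delegate would need to read both access_token and expires_in from the token endpoint's JSON. The clock is injected so that expiry can be tested.

```csharp
using System;

public sealed class AccessToken
{
    public string Value { get; set; }
    public int ExpiresInSeconds { get; set; }   // the expires_in value of the token response
}

public sealed class TokenCache
{
    private readonly Func<AccessToken> fetch;
    private readonly Func<DateTime> now;
    private readonly TimeSpan safetyMargin;
    private string cached;
    private DateTime expiresAt = DateTime.MinValue;

    public TokenCache(Func<AccessToken> fetch, Func<DateTime> now, TimeSpan safetyMargin)
    {
        this.fetch = fetch;
        this.now = now;
        this.safetyMargin = safetyMargin;
    }

    // Returns the cached token, fetching a new one only when there is none yet
    // or the current one is within safetyMargin of expiring. Not thread-safe.
    public string Get()
    {
        DateTime t = now();
        if (cached == null || t >= expiresAt - safetyMargin)
        {
            AccessToken token = fetch();
            cached = token.Value;
            expiresAt = t.AddSeconds(token.ExpiresInSeconds);
        }
        return cached;
    }
}
```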

The results are as follows:

The image file used in the program is as follows:

Author: kohakuarc
