Installation and operation instructions of Tesseract OCR

Time: 2021-7-6

While learning OCR (optical character recognition), I found Tesseract useful for recognizing the text in a picture, so I am recording the steps here.

Download the installer and the language pack first, then follow the installation steps below.

Download address
https://digi.bib.uni-mannheim…

Installation
1. Run the installer tesseract-ocr-setup-4.00.00dev.exe and follow the prompts until the installation completes.

Copy your installation path. Mine is D:\Python\Tesseract-OCR.

Path
Open My Computer > Properties > Advanced > Environment Variables and add the installation path you copied to the Path variable.
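To confirm that the Path change took effect, a quick check like the following may help. It is a minimal sketch, assuming Python is available and that the executable is named tesseract; the helper name find_tesseract is hypothetical. Open a new CMD window first, because already-open windows keep the old environment.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("tesseract was not found on PATH; re-check the environment variable")
    else:
        print("tesseract found at", location)
```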

2. Put the downloaded language data files in the tessdata folder of the Tesseract OCR installation.
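To see which language data files ended up in the tessdata folder, something like the sketch below could be used. It assumes that each language is stored as a single file named <code>.traineddata; the function name installed_languages is hypothetical, and the folder you pass in is your own tessdata path.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

Calling installed_languages on your tessdata folder should include chi_sim once the Simplified Chinese file has been copied there.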
Recognition
Open CMD and change to the directory that contains the image to be recognized, for example E:\**\Tesseract image recognition (depending on where your image is). Then enter a command of the form:

tesseract <image name> <result file name> -l <language>

For example, my recognition command is:

tesseract test.png result -l chi_sim
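The same command can also be launched from a script. The following is a minimal sketch, assuming tesseract is on PATH; build_command and recognize are hypothetical helper names, and the arguments mirror the command above.

```python
import subprocess


def build_command(image, output_base, lang="chi_sim"):
    """Build the argument list: tesseract <image> <output base> -l <language>."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def recognize(image, output_base, lang="chi_sim"):
    """Run Tesseract, which writes the text to <output_base>.txt.

    Raises subprocess.CalledProcessError if Tesseract exits with an error.
    """
    subprocess.run(build_command(image, output_base, lang), check=True)
    return f"{output_base}.txt"
```

For the example above, recognize("test.png", "result") would run the same command and return "result.txt".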
Error 1
I changed the default path when installing Tesseract OCR, and executing the command then failed with an error.

To fix it, add an environment variable named TESSDATA_PREFIX whose value is the path of the language data folder. Mine is F:\Tesseract-OCR\tessdata.
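If you would rather not change the system-wide settings, the variable can also be set for a single run from a script. This is only a sketch under that assumption; env_with_tessdata is a hypothetical helper, and the path in the comment is my own folder from above.

```python
import os


def env_with_tessdata(tessdata_dir, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set to tessdata_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["TESSDATA_PREFIX"] = str(tessdata_dir)
    return env


# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(["tesseract", "test.png", "result", "-l", "chi_sim"],
#                  env=env_with_tessdata(r"F:\Tesseract-OCR\tessdata"), check=True)
```

Because the function returns a copy, the environment of the calling process is left unchanged.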

Error 2
When the tessdata folder does not contain the data file for the language you request, Tesseract reports an error.
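One way to catch this before running Tesseract is to check that the data file exists. The sketch below assumes the file naming <code>.traineddata and that several languages can be combined with "+" in the -l argument (for example chi_sim+eng); missing_languages is a hypothetical helper.

```python
from pathlib import Path


def missing_languages(tessdata_dir, lang_spec):
    """Return the codes in lang_spec (e.g. "chi_sim+eng") that lack a .traineddata file."""
    folder = Path(tessdata_dir)
    return [code for code in lang_spec.split("+")
            if not (folder / f"{code}.traineddata").is_file()]
```

An empty list means every requested language is present; otherwise the list names the files that still need to be downloaded.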

Recognition results
Example 1: start with a relatively simple picture.

Switch to the picture directory and execute the following command line in the CMD window:

tesseract test1.png result -l chi_sim
-l chi_sim means to use the Simplified Chinese language data. You need to download the Chinese language data file, unzip it, and store it in the tessdata directory. Language data files have the extension .traineddata, and the Simplified Chinese file is named chi_sim.traineddata.
 

Open result.txt in the picture directory
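The result file can also be read from a script instead of opened by hand. This is a minimal sketch, assuming the output is UTF-8 text (which matters for Chinese characters); read_result is a hypothetical helper and "result" is the output name used in the command above.

```python
from pathlib import Path


def read_result(output_base, folder="."):
    """Return the recognized text that Tesseract wrote to <output_base>.txt."""
    return (Path(folder) / f"{output_base}.txt").read_text(encoding="utf-8")
```

For the example above, print(read_result("result")) run from the picture directory would show the recognized text.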
