Installing tesserocr on Windows 10 and configuring Python to recognize alphanumeric captchas with tesserocr

Time:2020-10-27

Link: https://pan.baidu.com/s/1l2yiba7ZTPUTf41ZnJ4PYw
Extraction code: t3bq

Installing tesserocr on Windows 10

First, download Tesseract, which provides the underlying OCR engine for tesserocr. The Windows installers are listed at https://github.com/UB-Mannheim/tesseract/wiki ; pick the one that matches your system. A stable build without "dev" in its name is a reasonable choice, for example tesseract-ocr-setup-3.05.02-20180621.exe. Then run the installer and accept the defaults, with one exception: check "Additional language data (download)" and select the languages you may need, such as Simplified Chinese, Traditional Chinese and the math module. There is no need to select all of them, because the more tessdata files you select, the longer the download takes.


The download can take a while, so be patient. If you have access to a proxy or VPN, it will be much faster.
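To confirm the installation before moving on, a small check like the one below may help. It is only a sketch: the two default directories are assumptions about where the installer puts Tesseract, and the helper names find_tesseract and tesseract_version are made up for this example.

```python
import os
import shutil
import subprocess

# Assumed default install directories; adjust if you installed elsewhere.
DEFAULT_DIRS = (
    r"C:\Program Files\Tesseract-OCR",
    r"C:\Program Files (x86)\Tesseract-OCR",
)


def find_tesseract(extra_dirs=DEFAULT_DIRS):
    """Return the path of the tesseract executable, or None if not found."""
    found = shutil.which("tesseract")  # search PATH first
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "tesseract.exe")
        if os.path.isfile(candidate):
            return candidate
    return None


def tesseract_version(exe):
    """Return the text printed by `tesseract --version`."""
    # Some versions print the version to stderr, so merge it into stdout.
    return subprocess.check_output(
        [exe, "--version"], stderr=subprocess.STDOUT, universal_newlines=True
    )


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract not found; check the install directory")
    else:
        print(tesseract_version(exe))
```

If the executable is found, running it with --list-langs instead of --version should show which language data files were actually downloaded.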

Install the tesserocr library for Python

Install from a prebuilt .whl file. The tesserocr wheels for Windows are published at https://github.com/simonflueckiger/tesserocr-windows_build/releases . Download the .whl file that matches your local environment; in my case that is 64-bit Windows with Python 3.5. After downloading, use cd to change to the directory that contains the .whl file and run "pip install tesserocr-2.2.2-cp35-cp35m-win_amd64.whl". The installation itself is straightforward.
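If you are not sure which wheel matches your environment, the interpreter can tell you. The sketch below prints the two parts of the file name that matter: the Python tag (for example cp35) and the platform tag. The function name wheel_tags is made up for this example, and the platform tag logic assumes CPython on Windows.

```python
import struct
import sys


def wheel_tags():
    """Return (python_tag, platform_tag) for the running interpreter.

    Assumes CPython on Windows; other platforms use different tags.
    """
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    print(wheel_tags())
```

Note that the result describes the Python interpreter, not the operating system: a 32-bit Python on 64-bit Windows still needs the win32 wheel.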

If tesserocr fails with a UnicodeDecodeError, the following module can be used as a replacement:

pytesseract:

pip install pytesseract
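For reference, a minimal pytesseract call might look like the sketch below. The function name ocr_with_pytesseract and its tesseract_cmd parameter are made up for this example; pytesseract itself runs the tesseract executable, so that program must be on PATH or its location must be set explicitly.

```python
def ocr_with_pytesseract(image_path, tesseract_cmd=None):
    """Recognize the text in an image file using pytesseract."""
    # Imported inside the function so this file still loads
    # before pytesseract and Pillow are installed.
    import pytesseract
    from PIL import Image

    if tesseract_cmd:
        # Only needed when tesseract.exe is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


# Example call (the install path is an assumption, adjust it to your machine):
# print(ocr_with_pytesseract("image.jpg", r"C:\Program Files\Tesseract-OCR\tesseract.exe"))
```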

The first run rarely goes smoothly, and I expect most people will hit the same pitfall I did. The error usually looks like this:

Traceback (most recent call last):
  File "G:\pythonSources\my12306/obtain_message\test.py", line 4, in <module>
    print(tesserocr.image_to_text(image))
  File "tesserocr.pyx", line 2400, in tesserocr._tesserocr.image_to_text
RuntimeError: Failed to init API, possibly an invalid tessdata path: "a local path"

A simple, if crude, fix is to copy the tessdata folder from the Tesseract-OCR installation directory to the path shown in the error message. I have tested this myself and it works.
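An alternative to copying the folder is to tell tesserocr where the language data lives, since image_to_text accepts a path argument. The sketch below is untested against the exact versions used in this article. The helper names find_tessdata and image_to_text_with_path are made up for this example, and depending on the Tesseract version the path may need to be the tessdata folder itself or its parent directory with a trailing separator, so try both if the first one fails.

```python
import os


def find_tessdata(candidates):
    """Return the first directory that contains eng.traineddata, else None."""
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, "eng.traineddata")):
            return directory
    return None


def image_to_text_with_path(image_path, tessdata_dir):
    """Run tesserocr on an image file with an explicit tessdata location."""
    # Imported inside the function so this file still loads
    # before tesserocr and Pillow are installed.
    import tesserocr
    from PIL import Image

    with Image.open(image_path) as image:
        return tesserocr.image_to_text(image, path=tessdata_dir)


# Example call (the candidate directories are assumptions):
# tessdata = find_tessdata([
#     r"C:\Program Files\Tesseract-OCR\tessdata",
#     r"C:\Program Files (x86)\Tesseract-OCR\tessdata",
# ])
# print(image_to_text_with_path("image.jpg", tessdata))
```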

Test code


import tesserocr
from PIL import Image

# Open the captcha image and print the recognized text
image = Image.open('image.jpg')
print(tesserocr.image_to_text(image))

Summary

This article covered installing tesserocr on Windows 10 and configuring Python to recognize alphanumeric captchas with it. I hope it helps.