Python OCR recognition with pytesseract: a worked example


Python commonly uses pytesseract for optical character recognition (OCR), that is, extracting text from pictures. The complete code is short, just the few lines below, but the environment configuration is easy to get wrong in practice.

from PIL import Image
import pytesseract

# Open the picture with Pillow and run Tesseract OCR on it
text = pytesseract.image_to_string(Image.open('/Users/alice/Documents/Develop/PythonCode/textinphoto.PNG'))
print(text)

This code needs the Pillow and pytesseract packages, so install both before running it (for example, pip3 install pillow pytesseract).

However, running the code still raises an error:
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH
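The error above can also be caught in code, so that a script reports a readable hint instead of a traceback. The following is only a minimal sketch: the function name ocr_or_hint and the hint text are made up for illustration, and the imports are done lazily inside the function so the file loads even before the packages are installed.

```python
# Hypothetical hint text, not a message produced by pytesseract itself.
INSTALL_HINT = (
    "The Tesseract engine was not found. Install the tesseract program "
    "itself and make sure it is on your PATH."
)


def ocr_or_hint(image_path):
    """Return the recognized text, or INSTALL_HINT if the engine is missing."""
    # Lazy imports: Pillow and pytesseract are only needed when OCR is run.
    from PIL import Image
    import pytesseract

    try:
        return pytesseract.image_to_string(Image.open(image_path))
    except pytesseract.pytesseract.TesseractNotFoundError:
        # Raised when pytesseract cannot launch the tesseract executable.
        return INSTALL_HINT
```

Calling ocr_or_hint with the picture path from the example above returns either the recognized text or the hint string.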

The reason is that the Tesseract engine itself is not installed. Installing the tesseract package with pip3 does not help: the tesseract command is still not found, as shown below:

alicedembp:~ alice$ pip3 install tesseract
Requirement already satisfied: tesseract in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.1.3)
alicedembp:~ alice$ tesseract
-bash: tesseract: command not found
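The transcript above shows that a package installed with pip3 does not put a tesseract executable on the PATH. A small check, sketched below with the standard library only, can tell the two situations apart: whether the pytesseract Python module can be imported, and whether the tesseract program can be found. The function name diagnose_ocr_setup is an assumption made for this example.

```python
import importlib.util
import shutil


def diagnose_ocr_setup():
    """Report separately on the Python wrapper and the Tesseract program."""
    return {
        # True if "import pytesseract" would find the module.
        "pytesseract_module": importlib.util.find_spec("pytesseract") is not None,
        # True if a "tesseract" executable is reachable through PATH.
        "tesseract_binary": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in diagnose_ocr_setup().items():
        print(name, "found" if ok else "MISSING")
```

If tesseract_binary is reported as missing, pytesseract has nothing to call, whichever Python packages are installed.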

The command was still unavailable. Many tutorials recommend installing Tesseract with brew (Homebrew), and that solved the problem. The steps are as follows:

  • Install brew first

alicedembp:~ alice$ ruby -e "$(curl -fsSL"
  • Then use brew to install leptonica

alicedembp:~ alice$ brew install leptonica
  • Then use brew to install Tesseract

alicedembp:~ alice$ brew install tesseract
  • After the installation finishes, verify it from the command line with tesseract -v. If a version number appears, the installation succeeded

alicedembp:~ alice$ tesseract
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]
OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.
Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
alicedembp:~ alice$ tesseract -v
tesseract 4.0.0
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE
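The same version check can be run from Python. The sketch below uses only the standard library; the helper names parse_tesseract_version and installed_tesseract_version are assumptions for this example, and the parser simply expects a first line of the form "tesseract 4.0.0" as in the output above.

```python
import shutil
import subprocess


def parse_tesseract_version(output):
    """Return the version from a first line like 'tesseract 4.0.0', else None."""
    lines = output.strip().splitlines()
    first = lines[0].split() if lines else []
    if len(first) >= 2 and first[0] == "tesseract":
        return first[1]
    return None


def installed_tesseract_version():
    """Return the installed Tesseract version, or None if it is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--version"], capture_output=True, text=True
    )
    # The version text may arrive on either stream, so fall back to stderr.
    return parse_tesseract_version(result.stdout or result.stderr)
```

With the output shown above, parse_tesseract_version would return "4.0.0".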

Next, you can run Tesseract directly from the command line:

alicedembp:~ alice$ tesseract /Users/alice/Documents/Develop/PythonCode/textinphoto.png /Users/alice/Documents/Develop/PythonCode/output

This opens the picture textinphoto.png and writes the recognized text to output.txt. Tesseract appends the .txt extension to the output base name itself, so the base is given without it.

[Picture: textinphoto.png]

The command succeeds and output.txt is generated. The text in that file is the text recognized in the picture.
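The command-line call can also be driven from a Python script. The sketch below follows the usage line printed by tesseract itself (tesseract imagename outputbase [options...]) and the -l LANG option from its help text; the function names build_tesseract_command and run_tesseract are assumptions made for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang=None):
    """Build the argument list: tesseract imagename outputbase [-l LANG]."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        # For example "eng" or "eng+fra", as described by the -l option.
        cmd += ["-l", lang]
    return cmd


def run_tesseract(image_path, output_base, lang=None):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang), check=True
    )
    # Tesseract adds ".txt" to the output base name.
    return output_base + ".txt"
```

Keeping the command construction in its own function makes it easy to inspect the exact arguments before anything is executed.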
