Python implementation of OCR recognition: pytesseract
Python programs often use pytesseract to recognize text in images, that is, for OCR. The code itself is short, only the few lines below, but the environment setup is easy to get wrong in practice.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open('/Users/alice/Documents/Develop/PythonCode/textinphoto.PNG'))
print(text)
Before running it, you need to install the Pillow and pytesseract packages that it depends on (for example, with pip3 install Pillow pytesseract).
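A quick way to confirm that both packages are visible to the interpreter is sketched below. It uses only the standard library. The helper name `missing_packages` and the `REQUIRED` mapping are made up for this illustration; the one detail worth noting is that Pillow is imported under the module name `PIL`.

```python
import importlib.util

# pip package name -> module name used in "import ..." (Pillow is imported as PIL).
REQUIRED = {"Pillow": "PIL", "pytesseract": "pytesseract"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip3 install " + " ".join(missing))
    else:
        print("Pillow and pytesseract are importable.")
```

This only checks the Python packages. It says nothing about whether the Tesseract engine itself is installed.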
However, running the code still reports an error:
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH
The reason is that Tesseract itself is not installed. Installing the tesseract package with pip3 does not help: the tesseract command is still not found, as shown below:
alicedembp:~ alice$ pip3 install tesseract
Requirement already satisfied: tesseract in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.1.3)
alicedembp:~ alice$ tesseract
-bash: tesseract: command not found
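The same check can be made from Python before calling pytesseract. The sketch below assumes that what matters is whether an executable named `tesseract` can be found on the PATH, which is what the error message above suggests. The helper name `find_tesseract` is hypothetical.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_tesseract()
if path is None:
    # This is the situation in which pytesseract raises TesseractNotFoundError.
    print("tesseract executable not found on PATH")
else:
    print("tesseract found at", path)
```

If this prints "not found" even though pip3 reports the tesseract package as installed, the OCR engine still has to be installed separately.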
Since that did not work, I looked through a number of tutorials, most of which recommend installing Tesseract through Homebrew (brew). That solved the problem. The steps are as follows:
- Install brew first
alicedembp:~ alice$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Then use brew to install leptonica
alicedembp:~ alice$ brew install leptonica
- Then use brew to install Tesseract
alicedembp:~ alice$ brew install tesseract
- Once the installation finishes, run tesseract -v on the command line to check it. If a version number appears, the installation succeeded
alicedembp:~ alice$ tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
alicedembp:~ alice$ tesseract -v
tesseract 4.0.0
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found SSE
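The version check can also be scripted. The following is a minimal sketch that runs `tesseract --version` (one of the options listed in the usage text above) and pulls the version number out of a line shaped like `tesseract 4.0.0`. The function names are hypothetical, and the parsing assumes the output format shown in the transcript.

```python
import shutil
import subprocess


def parse_tesseract_version(output):
    """Return the version from the first 'tesseract X.Y.Z' line, or None."""
    for line in output.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] == "tesseract":
            return parts[1]
    return None


def installed_tesseract_version():
    """Return the installed Tesseract version, or None if the binary is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Check both streams, in case a build prints the version to stderr.
    return parse_tesseract_version(result.stdout + "\n" + result.stderr)


print(installed_tesseract_version())
```

A return value of None means the executable is still not on the PATH, in which case the Python snippet at the top of the article will keep failing.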
Next, you can run Tesseract directly from the command line:
alicedembp:~ alice$ tesseract /Users/alice/Documents/Develop/PythonCode/textinphoto.png /Users/alice/Documents/Develop/PythonCode/output
This command opens textinphoto.png and writes the recognized text to output.txt. The second argument is the output base name, and Tesseract appends the .txt extension itself. The picture is as follows:
The command runs successfully and generates output.txt, which contains the text recognized in the picture.
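For completeness, the same command-line call can be driven from Python with the standard library. This is only a sketch of one possible wrapper and is separate from pytesseract. The function names and the relative paths `textinphoto.png` and `output` are placeholders chosen for the example. It relies on the `tesseract imagename outputbase` form and the `-l LANG` option from the usage text, and on Tesseract appending `.txt` to the output base.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang=None):
    """Build the argument list for: tesseract imagename outputbase [-l LANG]."""
    cmd = ["tesseract", str(image_path), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_to_text_file(image_path, output_base, lang=None):
    """Run Tesseract and return the path of the text file it writes."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract adds the .txt extension to the output base name.
    return Path(str(output_base) + ".txt")


if __name__ == "__main__":
    image = Path("textinphoto.png")  # placeholder path, adjust to your image
    if shutil.which("tesseract") and image.exists():
        result_file = ocr_to_text_file(image, "output")
        print(result_file.read_text(encoding="utf-8"))
    else:
        print("tesseract or the input image is missing; nothing to do")
```

When the text is needed inside the program rather than in a file, the pytesseract.image_to_string call from the beginning of the article is the shorter route.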