1、 Scene description
Here are four invoices as an example (searched by chenge online). Put the invoice pictures under the pic folder.
Just open an invoice
Extraction target: amount, name, taxpayer identification number and drawer.
Finally, save the four contents of each invoice into Excel:
2、 Prepare the environment
The required libraries are as follows:
from PIL import Image as PI import pyocr import pyocr.builders from cnocr import CnOcr
The installation commands are as follows:
pip install pyocr pip install cnocr
The invoice contains Chinese content. We need to identify the Chinese in the picture, so cnocr is a good choice.
Tip: in addition to installing the above libraries, you also need to install additional exe files, otherwise the following errors will occur
Exe files to install:
The installation process of these two software will not be repeated. You can search and install the tutorial by yourself.
3、 Extract content
Next, take one of the pictures as an example to explain how to extract the target content: amount, name, taxpayer identification number and drawer.
Read picture: PIC / pic1 jpg
tool = pyocr.get_available_tools() img_url = "pic/pic1.jpg" with open(img_url, 'rb') as f: a = f.read() new_img = PI.open(io.BytesIO(a))
1. Withdrawal amount
The position where the amount in the invoice needs to be intercepted
##Amount left = 741 top = 420 right = 850 bottom = 445 image_text1 = new_img.crop((left, top, right, bottom)) #Show pictures image_text1.show()
The values of left, top, right and bottom here are obtained by modifying the positioning for many times. You can locate according to your invoice content.
Then extract the numbers in the picture
Similarly, continue to extract the following: name
2. Extraction name
left = 155 top = 450 right = 450 bottom = 470 image_obj2 = new_img.crop((left, top, right, bottom)) image_obj2.show()
The name here is in Chinese. We can no longer operate like withdrawing amount (number). You need to use cnocr to take out the Chinese in the picture.
image_obj2.save("tmp.jpg") ocr = CnOcr() res = ocr.ocr("tmp.jpg") print("".join(res))
3. Extract taxpayer identification number
#Taxpayer identification number left = 155 top = 470 right = 450 bottom = 490 image_text3 = new_img.crop((left, top, right, bottom)) #Show pictures image_text3.show()
txt3 = tool.image_to_string(image_text3) print(txt3)
Extract the taxpayer identification number in the picture, and the results are as follows:
left = 528 top = 550 right = 670 bottom = 600 image_obj4 = new_img.crop((left, top, right, bottom)) image_obj4.show()
image_obj4.save("tmp.jpg") ocr = CnOcr() res = ocr.ocr("tmp.jpg") print("".join(res))
Because there is Chinese, we use cnocr to extract the Chinese in the picture, just like extracting the name.
OK, in this way, we will extract the four target contents in the invoice, then identify all the invoices under the folder pic, and save the contents to excel.
4、 Batch identify invoices and save them to excel
Before reading the picture, first encapsulate the above four operations into functions to facilitate each invoice object to call.
Read all pictures in the folder.
filePath = 'pic' pic_name =  for i,j,name in os.walk(filePath): pic_name = name for i in pic_name: print(i)
Start recognition and write the results into excel.
for i in pic_name: img_url = filePath+"/"+i with open(img_url, 'rb') as f: a = f.read() new_img = PI.open(io.BytesIO(a)) ##Write CSV outws.cell(row=count, column=1, value=text2(new_img)) outws.cell(row=count, column=2, value=text3(new_img)) outws.cell(row=count, column=3, value=text1(new_img)) outws.cell(row=count, column=4, value=text4(new_img)) count = count + 1 outwb. Save ("invoice Summary - Li Yunchen. XLS") # save the result
Finally save as: Invoice Summary – Li Yunchen Xls, the results are as follows:
5、 Invoice verification
In brother Chen’s communication group, when chatting with a little partner about this content, the little partner suggested adding a function: invoice verification.
Before the above identification (the invoice of your company may not need to be checked), first call the third-party interface to identify the invoice, and then extract the target content in the invoice after passing the identification.
1. Apply for Baidu AI application
2. Get token
# client_ ID is the AK and client obtained on the official website_ Secret is the SK obtained on the official website host = ' https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id= [AK obtained on the official website] & Client_ Secret = [SK obtained on official website] ' response = requests.get(host) if response: print(response.json()['access_token']
Client here_ ID is the AK and client obtained on the official website_ Secret is the SK obtained on the official website. It can be obtained by applying for an application above
Let’s take this picture as an example to check
The corresponding invoice types are as follows:
The results are as follows:
I feel that the query result is not very good (not detailed). You can also go to the tax bureau to inquire
4. The tax bureau inquires the invoice
Also take this picture as an example for inspection
Fill in the information and click Check. The results are as follows:
Then the tax bureau will check it more clearly. Readers can choose their own way to check according to their own situation.
This article basically successfully achieved the objectives and requirements, and the effect is very good! The complete source code can be combined by the code in the article (all of which have been shared in the article). Interested readers can try it by themselves!
Be sure to try * * * *! Be sure to try * * * *! Be sure to try!
Finally, I want to say that in fact, the case of this paper can be applied to other aspects, such as
Batch calculation invoice amount summary
Batch classification by invoice type
This is the end of this article on Python office automation, which teaches you to batch identify invoices and enter them into excel tables with Python. For more information about Python identifying invoices and entering them into excel tables, please search for previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!