Python office automation teaches you to use Python to batch identify invoices and enter them into excel tables

Time:2022-5-27

1、 Scene description

Here are four invoices as an example (searched by chenge online). Put the invoice pictures under the pic folder.

Just open an invoice

Extraction target: amount, name, taxpayer identification number and drawer.

Finally, save the four contents of each invoice into Excel:

2、 Prepare the environment

The required libraries are as follows:


from PIL import Image as PI
import pyocr
import pyocr.builders
from cnocr import CnOcr

The installation commands are as follows:


pip install pyocr
pip install cnocr

The invoice contains Chinese content. We need to identify the Chinese in the picture, so cnocr is a good choice.

Tip: in addition to installing the above libraries, you also need to install additional exe files, otherwise the following errors will occur

Exe files to install:

1. ImageMagick

2. tesseract-OCR

The installation process of these two software will not be repeated. You can search and install the tutorial by yourself.

3、 Extract content

Next, take one of the pictures as an example to explain how to extract the target content: amount, name, taxpayer identification number and drawer.

Read picture: PIC / pic1 jpg


tool = pyocr.get_available_tools()[0]
img_url = "pic/pic1.jpg"
with open(img_url, 'rb') as f:
    a = f.read()
new_img = PI.open(io.BytesIO(a))

1. Withdrawal amount

The position where the amount in the invoice needs to be intercepted

##Amount
left = 741
top = 420
right = 850
bottom = 445
image_text1 = new_img.crop((left, top, right, bottom))
#Show pictures
image_text1.show()

The values of left, top, right and bottom here are obtained by modifying the positioning for many times. You can locate according to your invoice content.

Then extract the numbers in the picture

Similarly, continue to extract the following: name

2. Extraction name


left = 155
top = 450
right = 450
bottom = 470
image_obj2 = new_img.crop((left, top, right, bottom))
image_obj2.show()

The name here is in Chinese. We can no longer operate like withdrawing amount (number). You need to use cnocr to take out the Chinese in the picture.


image_obj2.save("tmp.jpg")
ocr = CnOcr()
res = ocr.ocr("tmp.jpg")
print("".join(res[0]))

3. Extract taxpayer identification number

#Taxpayer identification number
left = 155
top = 470
right = 450
bottom = 490
image_text3 = new_img.crop((left, top, right, bottom))
#Show pictures
image_text3.show()


txt3 = tool.image_to_string(image_text3)
print(txt3)

Extract the taxpayer identification number in the picture, and the results are as follows:

4. Drawer


left = 528
top = 550
right = 670
bottom = 600
image_obj4 = new_img.crop((left, top, right, bottom))
image_obj4.show()


image_obj4.save("tmp.jpg")
ocr = CnOcr()
res = ocr.ocr("tmp.jpg")
print("".join(res[0]))

Because there is Chinese, we use cnocr to extract the Chinese in the picture, just like extracting the name.

OK, in this way, we will extract the four target contents in the invoice, then identify all the invoices under the folder pic, and save the contents to excel.

4、 Batch identify invoices and save them to excel

Before reading the picture, first encapsulate the above four operations into functions to facilitate each invoice object to call.

Read all pictures in the folder.


filePath = 'pic'
pic_name = []
for i,j,name in os.walk(filePath):
    pic_name = name
for i in pic_name:
    print(i)

Start recognition and write the results into excel.

for i in pic_name:
    img_url = filePath+"/"+i
    with open(img_url, 'rb') as f:
        a = f.read()
    new_img = PI.open(io.BytesIO(a))
    ##Write CSV
    outws.cell(row=count, column=1, value=text2(new_img))
    outws.cell(row=count, column=2, value=text3(new_img))
    outws.cell(row=count, column=3, value=text1(new_img))
    outws.cell(row=count, column=4, value=text4(new_img))
    count = count + 1
outwb. Save ("invoice Summary - Li Yunchen. XLS") # save the result

Finally save as: Invoice Summary – Li Yunchen Xls, the results are as follows:

5、 Invoice verification

In brother Chen’s communication group, when chatting with a little partner about this content, the little partner suggested adding a function: invoice verification.

Before the above identification (the invoice of your company may not need to be checked), first call the third-party interface to identify the invoice, and then extract the target content in the invoice after passing the identification.

1. Apply for Baidu AI application

2. Get token

# client_ ID is the AK and client obtained on the official website_ Secret is the SK obtained on the official website
host = ' https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id= [AK obtained on the official website] & Client_ Secret = [SK obtained on official website] '
response = requests.get(host)
if response:
    print(response.json()['access_token']

Client here_ ID is the AK and client obtained on the official website_ Secret is the SK obtained on the official website. It can be obtained by applying for an application above

3. Inspection

Let’s take this picture as an example to check

The corresponding invoice types are as follows:

The results are as follows:

I feel that the query result is not very good (not detailed). You can also go to the tax bureau to inquire

4. The tax bureau inquires the invoice

Also take this picture as an example for inspection

Fill in the information and click Check. The results are as follows:

Then the tax bureau will check it more clearly. Readers can choose their own way to check according to their own situation.

6、 Summary

This article basically successfully achieved the objectives and requirements, and the effect is very good! The complete source code can be combined by the code in the article (all of which have been shared in the article). Interested readers can try it by themselves!

Be sure to try * * * *! Be sure to try * * * *! Be sure to try!

Finally, I want to say that in fact, the case of this paper can be applied to other aspects, such as

Batch calculation invoice amount summary

Batch classification by invoice type

……..

This is the end of this article on Python office automation, which teaches you to batch identify invoices and enter them into excel tables with Python. For more information about Python identifying invoices and entering them into excel tables, please search for previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!