Hello, I’m snowball.
Today's practical project is common verification code (CAPTCHA) labeling and recognition. The previous two articles, "The practical part of Python project – Common verification code labeling and recognition (demand analysis and implementation ideas)" and "The practical part of Python project – Common verification code labeling & recognition (data acquisition / preprocessing / character graph cutting)", covered the inspiration for the project, the demand analysis and implementation ideas, and data acquisition, preprocessing and character image cutting. This article explains efficient data annotation.
2. Efficient data annotation
As described in the first major step of the implementation ideas, the initial image verification code data was labeled by manually renaming each file. This works for a small amount of data, but it is very inefficient and error-prone when labeling hundreds of images. So I thought about how to make the labeling of general picture verification code data more efficient, and how to manage the files so that they can be queried, modified and downloaded at any time.
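To make the file-name approach concrete, here is a minimal sketch of how a label could be read back from a renamed file. The naming convention `<label>_<id>.png`, the function names and the expected label length of 4 are all assumptions for illustration; the earlier articles do not fix them here.

```python
from pathlib import Path


def label_from_filename(path):
    """Return the label stored in a file name such as 'a7kq_0001.png'.

    Assumes the hypothetical convention '<label>_<id>.<ext>'.
    """
    stem = Path(path).stem              # 'a7kq_0001'
    label, _, _ = stem.partition("_")   # text before the first underscore
    return label


def find_suspect_labels(paths, expected_length=4):
    """List files whose label length differs from the expected length.

    A wrong length is a cheap hint that a manual rename went wrong.
    """
    return [p for p in paths if len(label_from_filename(p)) != expected_length]
```

A length check like this only catches some mistakes (a mistyped character still passes), which is one reason a dedicated annotation system is more reliable for hundreds of images.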
At first, I wanted to write a simple tool with one of Python's GUI frameworks. After trying the Tkinter and PyQt APIs, I found that working with the list component was very troublesome and that customizing list items was also quite complex, so I gave up on this approach. In the end I adopted the current mainstream scheme of a web page plus a back-end application. I am fairly skilled with the back-end technology stack, but my front-end skills are only average and I mostly learn as I go, so I just needed to find a front-end admin template project that I could get started with quickly and adapt. Here I would like to thank a former front-end colleague, who recommended the open source project that I started from directly and also helped me solve most of the problems I ran into while modifying it. The following is the address of the Vue background management system template open source project:
After settling on the application scheme, the next step was to work out the requirements and functional modules of the general picture verification code annotation system. To keep the article short, I will not write out the detailed analysis, design and implementation of every function; only the main functions are listed here. The following are the functional modules of the general picture verification code data annotation system:
1. User module: login / registration, roles, permission control, etc.
2. File module: create, delete, update and query records in the user attachment table; files are stored on disk under a per-user directory
3. Verification code picture module: pulling / generating verification codes to label, submitting / modifying label data, paging query, batch download
4. Verification code picture model prediction: supports switching between multiple models for prediction (due to limited time, only a single model has been implemented)
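Two of the ideas in the list above, per-user storage directories and paging queries, can be sketched in a few lines. This is only an illustration in Python under my own assumptions: the real back end is a Spring Boot project, and the function names, the root directory and the 1-based page numbering here are hypothetical.

```python
import math
from pathlib import PurePosixPath


def user_file_path(root, user_id, filename):
    """Build '<root>/<user_id>/<filename>' for an uploaded file.

    Only the last path component of the upload name is kept, so a name
    like '../../x.png' cannot escape the user's directory.
    """
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    if safe_name in ("", ".", ".."):
        raise ValueError("invalid file name")
    return str(PurePosixPath(root) / str(user_id) / safe_name)


def paginate(items, page, size):
    """Return one page of items plus the totals a front-end table needs.

    Pages are numbered from 1 here (an assumption of this sketch).
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    start = (page - 1) * size
    return {
        "page": page,
        "size": size,
        "total": len(items),
        "total_pages": math.ceil(len(items) / size),
        "items": items[start:start + size],
    }
```

For example, `paginate(list(range(25)), page=3, size=10)` yields the last five items and reports three pages in total. In a real system the slicing would be done by the database query rather than in memory.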
The code of the front-end and back-end data annotation system is not reproduced here. Readers with the relevant background can pull the project from the link at the beginning of the article and read the module code. Below I only list the main technologies, middleware and open source frameworks used to implement the front end and back end of the project.
Front-end technical framework:
- Vue, Vuex, Vue Router, Axios, Element UI, etc.
Back-end technical framework:
- Spring Boot, Spring Security, Spring MVC, Spring Data JPA, Redis, MySQL, etc.
Note that the back end uses a traditional single-node web / session architecture. This is just about enough for a personal project; an enterprise application could switch to a distributed / microservice architecture.
That covers the general functions of the front end and back end of the picture verification code data annotation system. Readers with questions can leave a message or contact me to discuss. After 1-2 weeks of spare time spent on database design, setting up the front-end and back-end projects, and coding / testing the functions, a first version was working. Next, let's look at the project running on Windows.
Front-end operation effect:
- IDE: Visual Studio Code
- Run in the terminal for the test environment: npm run dev
Back-end operation effect:
- IDE: IntelliJ IDEA 2019
- Run: click the run button in the toolbar
The following demonstrates some of the functions of the front-end system in operation:
Pulling / generating picture verification codes:
Labeling a picture verification code and submitting it:
Labeled pictures: paging query / download / edit:
Data generated by model prediction (40 characters, 2 wrong):
The above demonstrates the core functions. One point to note is that the CNN model prediction and recognition function requires deploying a separate Python neural network model project. The deployment of this function will be described in detail in a later step; here it is only demonstrated.
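The "40 characters, 2 wrong" result from the prediction demo corresponds to a character-level accuracy of (40 - 2) / 40 = 95%. A minimal sketch of how such a figure could be computed from predicted and expected labels follows; the function name and the rule for labels of different lengths are my own assumptions, not the project's code.

```python
def char_accuracy(predicted, expected):
    """Fraction of characters predicted correctly across all captchas.

    Labels are compared position by position. If a prediction is shorter
    or longer than the expected label, the extra positions count as wrong.
    """
    total = 0
    correct = 0
    for pred, exp in zip(predicted, expected):
        total += max(len(pred), len(exp))
        correct += sum(a == b for a, b in zip(pred, exp))
    return correct / total if total else 0.0
```

Character-level accuracy is more forgiving than whole-captcha accuracy: a captcha with a single wrong character still contributes its correct characters to the score.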
I’m snowball. This article mainly explained efficient data annotation. Before that, "The practical part of Python project – Common verification code labeling & recognition (data acquisition / preprocessing / character graph cutting)" covered data acquisition, preprocessing and character image cutting. With these two functions in place, picture verification code annotation can be managed efficiently and character images can be cut in batches, so the basic data are now available. Next, we will move on to the core function of this series, its analysis and implementation: training a CNN neural network model for character feature extraction.