Woman, man, camera, TV: how to make a complete deep learning application

Date: 2021-04-08

Author: LeanCloud Jianghong

Some time ago, when Trump's interview became the focus of social media, I happened to be reviewing some materials on neural networks, and I thought I could use some new open-source tools to build a complete application that recognizes a woman, a man, a camera, and a TV. The example is small enough to finish in a short time, which makes it well suited for explaining how to build a complete deep learning application. The finished application is deployed at https://trump-sim.jishuq.com (one of the examples running on LeanCloud's Cloud Engine).

Building this application takes three steps: first train a model on some pictures, then export the model and build a back-end API that classifies pictures, and finally build a front-end website that uploads pictures and displays the results.

Prepare training data

Jupyter Notebook is a very popular interactive environment for data analysis and machine learning. It lets you mix Markdown documents and Python code in a single notebook, and it can display the results of running code in friendly forms such as charts and pictures. We'll also use fastai, an open-source library built on PyTorch that provides many convenient interfaces for working with networks and batches of files. This article was written in a Jupyter notebook, so you can clone the repo, install the dependencies, and start Jupyter Notebook directly:

git clone https://github.com/hjiang/trump-sim-notebook
pip install -r requirements.txt
jupyter notebook

We'll use the Bing Image Search API to get the training pictures; you need to register and apply for a free API key yourself. Also, because the images are hosted on many third-party websites, you need unobstructed access to websites outside China.

Put your Bing Image Search API key in a .env file in the project directory to avoid leaking it in the code:

BING_SEARCH_API_KEY=XXXXXXXX....

and read it in Python:

import os
from dotenv import load_dotenv
load_dotenv()
key = os.getenv('BING_SEARCH_API_KEY')

Write a function to search for images:

from azure.cognitiveservices.search.imagesearch import ImageSearchClient
from msrest.authentication import CognitiveServicesCredentials
from fastcore.foundation import L

def search_images_bing(key, term, min_sz=128):
    client = ImageSearchClient('https://api.cognitive.microsoft.com', CognitiveServicesCredentials(key))
    return L(client.images.search(query=term, count=150, min_height=min_sz, min_width=min_sz).value)

To verify that it works, search for an image of Artemis:

from torchvision.datasets.utils import download_url
from PIL import Image
import fastai2.vision.widgets

results = search_images_bing(key, 'Artemis')
urls = results.attrgot('content_url')
download_url(urls[0], 'images/', 'artemis.jpg')
image = Image.open('images/artemis.jpg')
image.to_thumb(128, 128)

(image: thumbnail of the downloaded Artemis picture)

After confirming that image downloading works, we download the four categories of pictures we care about into four directories under objects/.

from fastai2.vision.utils import download_images
from pathlib import Path

object_types = 'woman', 'man', 'camera', 'TV'
path = Path('objects')

if not path.exists():
    path.mkdir()
    for o in object_types:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        results = search_images_bing(key, o)
        download_images(dest, urls=results.attrgot('content_url'))

You may see messages about some downloads failing; as long as there aren't too many, they can be ignored. Some images on the web are corrupt or in a format the Python imaging library doesn't support, so they need to be deleted.

from fastai2.vision.utils import get_image_files
from fastai2.vision.utils import verify_images

fns = get_image_files(path)
failed = verify_images(fns)
failed.map(Path.unlink);

Preprocessing

Before training, you need to tell fastai how to label images and load them into its data structure. The following code does the following:

  • Use the parent directory name (parent_label) to label each image.
  • Reserve 20% of the images as the validation set and use the rest as the training set. The training set is the data the neural network is trained on; the validation set measures how accurate the trained model is on data it has never seen. The two sets must not overlap.
  • Resize the images to a smaller size to improve efficiency.

The last line of code shows the first three images of the validation set.

from fastai2.data.block import DataBlock, CategoryBlock
from fastai2.vision.data import ImageBlock
from fastai2.data.transforms import RandomSplitter, parent_label
from fastai2.vision.augment import Resize

objects = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

dls = objects.dataloaders(path)
dls.valid.show_batch(max_n=3, nrows=1)

(image: the first three images of the validation set)

When doing image recognition, we often apply random scaling, cropping, and other transformations to the images in order to generate enough varied data to improve training. The output of the following code shows the results of different transformations applied to the same image.

from fastai2.vision.augment import aug_transforms, RandomResizedCrop

objects = objects.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())
dls = objects.dataloaders(path)
dls.train.show_batch(max_n=6, nrows=2, unique=True)

(image: six augmented variants of the same training image)

Training data

Now we can finally start training. For application scenarios such as image recognition, a new model is rarely trained from scratch, because there are a large number of features that almost all such applications need to recognize: object edges, shadows, patterns formed by different colors, and so on. Instead, training usually starts from a pre-trained model (here, resnet18), and only the last layers are trained on your own new data (the term for this is fine-tuning). In a multi-layer neural network, the layers closer to the input recognize more concrete, low-level features, while the layers closer to the output recognize more abstract features that are closer to the final task. The last line below runs four epochs of fine-tuning.

If you have an NVIDIA graphics card, are running Linux, and have the appropriate drivers installed, the following code takes from a few seconds to tens of seconds; otherwise you will have to wait several minutes.

from fastai2.vision.learner import cnn_learner
from torchvision.models.resnet import resnet18
from fastai2.metrics import error_rate
import fastai2.vision.all as fa_vision

learner = cnn_learner(dls, resnet18, metrics=error_rate)
learner.fine_tune(4)
epoch  train_loss  valid_loss  error_rate  time
0      1.928001    0.602853    0.163793    01:16

epoch  train_loss  valid_loss  error_rate  time
0      0.550757    0.411835    0.120690    01:42
1      0.463925    0.363945    0.103448    01:46
2      0.372551    0.336122    0.094828    01:44
3      0.314597    0.321349    0.094828    01:44

The final output table shows, for each epoch, the loss on the training set, the loss on the validation set, and the error rate. The error rate is the metric we actually care about, while the loss is the metric that drives the training process (the goal of training is to push the loss ever closer to zero). We need two different metrics because the loss must satisfy conditions the error rate doesn't necessarily satisfy: for example, it must be differentiable with respect to all parameters, while the error rate is not even a continuous function. Lower loss generally means a lower error rate, but the relationship between them is not linear. The error rate here is about 10%, i.e. the accuracy is about 90%.
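As a minimal illustration of why the two metrics move together but are not the same, here is a toy calculation in plain Python (not fastai; the probabilities and the 0.5 decision threshold are made up for the example):

```python
import math

# Predicted probability assigned to the TRUE class for three samples
probs = [0.7, 0.2, 0.6]

# Cross-entropy loss: smooth and differentiable in the probabilities
loss = -sum(math.log(p) for p in probs) / len(probs)

# Error rate: a sample counts as wrong when the true class gets less
# than 0.5 probability; this jumps in steps, it is not differentiable
error_rate = sum(p < 0.5 for p in probs) / len(probs)

print(round(loss, 3), round(error_rate, 3))  # 0.826 0.333
```

Nudging the wrong prediction from 0.2 to 0.4 would lower the loss while leaving the error rate unchanged, which is why training optimizes the loss but we report the error rate.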

Next, let's look at which pictures in the validation set were misidentified. The following code plots the confusion matrix. In this matrix, the numbers on the diagonal are the counts of correctly identified pictures, and the numbers everywhere else are counts of misidentified ones.

from fastai2.interpret import ClassificationInterpretation

interp = ClassificationInterpretation.from_learner(learner)
interp.plot_confusion_matrix()

(image: confusion matrix plot)

The output matrix shows 11 errors in total, 4 of which are gender errors. There is also quite a bit of confusion between TV and the other categories.
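To make the diagonal/off-diagonal reading concrete, here is a hand-tallied toy version in plain Python (the sample results are invented for illustration):

```python
labels = ['woman', 'man', 'camera', 'TV']

# (actual, predicted) pairs for a few hypothetical validation images
results = [
    ('woman', 'woman'), ('woman', 'man'),    # one gender error
    ('man', 'man'), ('man', 'man'),
    ('camera', 'camera'), ('TV', 'camera'),  # one TV mistaken for a camera
]

# rows = actual class, columns = predicted class
matrix = [[sum(1 for a, p in results if a == ra and p == rp)
           for rp in labels] for ra in labels]

correct = sum(matrix[i][i] for i in range(len(labels)))  # diagonal
errors = len(results) - correct                          # everything else
print(correct, errors)  # 4 2
```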

Now let's show the pictures with the highest loss to see what went wrong.

interp.plot_top_losses(12, nrows=4)

(image: the twelve images with the highest loss)

The output reflects a typical problem with data scraped from the internet: too much noise. The search results for TV, for example, include TV remotes, TV boxes, and TV drama posters, along with some completely irrelevant results.

Fastai provides a cleaner that helps with manual cleaning of smaller datasets. It lists the images with the highest loss across the whole dataset so that you can manually fix their labels or delete them.

from fastai2.vision.widgets import ImageClassifierCleaner

cleaner = ImageClassifierCleaner(learner)
cleaner

Note that the cleaner only marks images; you need Python code to do the actual processing. I usually mark the problematic pictures for deletion and then delete them:

for idx in cleaner.delete(): cleaner.fns[idx].unlink()

After cleaning up, repeat the training process.

objects = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

objects = objects.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())
dls = objects.dataloaders(path)
learner = cnn_learner(dls, resnet18, metrics=error_rate)
learner.fine_tune(3)
epoch  train_loss  valid_loss  error_rate  time
0      1.663555    0.510397    0.201835    01:11

epoch  train_loss  valid_loss  error_rate  time
0      0.458212    0.226866    0.091743    01:32
1      0.358364    0.145286    0.036697    01:31
2      0.281517    0.146477    0.036697    01:32

If you notice error_rate increasing in the later epochs, you can reduce the number of epochs passed to fine_tune to get the best result. If there are too many training rounds, the model overfits the training set and the error rate on new data goes up. From the output above, the accuracy has improved to over 96%.

After reaching a satisfactory accuracy, the model can be exported for use online. The following line saves the model to export.pkl:

learner.export()

Back end API

The back-end API is the simplest part of this project, with only one endpoint: it loads the previously exported model and uses it to predict the classification of each newly received image.

from flask import Flask, request, jsonify
from fastai2.learner import load_learner

app = Flask(__name__)

# Load the model exported earlier (export.pkl, renamed to model.pkl here)
trump = load_learner('model.pkl')

@app.route('/api/1.0/classify-image', methods=['POST'])
def classify():
    image = request.files['image']
    res = trump.predict(image.read())
    response = jsonify({'result': str(res[0])})
    response.status_code = 200
    return response

The complete code is on GitHub; following the documentation there, you can deploy it to the LeanCloud Cloud Engine.

Front end website

The front-end is also relatively simple: it only needs a page that lets the user upload a photo, scales the photo down in the browser, and sends it to the back-end API. The complete React project is on GitHub, with the main code in App.js. For reasons of space I won't explain it in detail and will only attach a screenshot of it in action:

(image: screenshot of the front-end page)

Assignments for readers

You may have noticed that the back-end API service above is stateless and stores no data, so the recognition could be done entirely in the front end. If you are interested, investigate how to convert a PyTorch model to a JavaScript one and try recognizing photos directly in the browser. In a real application, this approach protects user privacy perfectly because no data needs to be transmitted to the server; it is also the direction of on-device machine learning promoted by Apple.

Image recognition is one of the easiest problems to solve with machine learning, because there are many ready-made results to reuse, and new applications can get good results even with a small amount of training data. Many other kinds of problems do not yield satisfactory results so easily. LeanCloud is currently developing new machine learning products to help developers explore the value of their data more easily. If you are interested, you can follow our Weibo, WeChat official account, or Twitter, or register as a LeanCloud user. In the near future we will publish more information and invite some users to try out the new product.

Title image: Charles Deluvio