Search what you see: learn to build a clothing search system in 3 minutes!

Time: 2021-4-28

Abstract: This clothing search system is built on the Fashion-MNIST dataset with MindSpore + Jina.

Introduction

Are you an algorithm newcomer who often trains models but doesn’t know how to deploy and apply them? Or can you tune parameters but not write front-end and back-end code, so you can’t show your boss what the model can do? What if there were a very simple way to build a search system powered by deep learning in three minutes and show it on a front end to all your bosses? Want to try? This article comes from Dr. Xiao Han, a member of the MindSpore community Technical Steering Committee (TSC) and the founder of Jina. It uses MindSpore + Jina to build a clothing search system based on the Fashion-MNIST dataset.

[Contents of this article]

  • How to do it with Jina?
  • How does Jina’s Hello world work?
  • How to use MindSpore + Jina to build a search system?
  • Create a MindSpore executor
  • Modify the encoder and network structure code of MindSpore
  • Write a unit test
  • Prepare the Dockerfile
  • The last step: finally, we can build!
  • Come and see the finished MindSpore product!
  • Summary

Programmers (or their girlfriends) who like to browse Taobao or other shopping websites: when you browse clothes, do you often see outfits on models that all look great, but you don’t know where to buy them or what the item number is? Even if you get the item number from a fashion blogger, you can’t be bothered to search for the items one by one. It no longer needs to be that troublesome. Spend 3 minutes setting up this clothing search system, and the next time your girlfriend sees clothes on a model, she can search for the most similar items. Isn’t that great!

Figure 1 shop the look

Before we start, let’s take a look at the two frameworks we need today: MindSpore and Jina.

  • MindSpore is a deep learning framework that Huawei open-sourced on March 28, 2020. It supports Huawei’s own Ascend chips, which greatly improves performance on that hardware.
  • Jina is a cloud-native neural search framework driven by state-of-the-art AI and deep learning. It supports large-scale indexing and querying of any kind of data across multiple platforms and architectures. Whether you search for pictures, videos, or audio clips, Jina can handle them.

The dataset used here is Fashion-MNIST. It contains 70,000 images, of which 60,000 form the training set and 10,000 the test set. Each image is a 28×28 grayscale image belonging to one of 10 categories. Now let’s get started!

How to do it with Jina?

First, make sure your machine meets the following requirements:

  • macOS or Linux
  • Python 3.7, 3.8
  • Docker

Then execute the following command:

pip install jina && jina hello-world

Or use docker directly:

docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html  # replace "open" with "xdg-open" on Linux

Once the program starts running, you can see the results:

Figure 3 running results of Jina Hello World

Isn’t it amazing? So how does Jina do it? You can spend one minute learning about Jina’s ten basic components. The three most important ones for this article are:

  • YAML config: lets users customize the properties that describe objects.
  • Executor: represents an algorithm unit in Jina. For example, algorithms that encode images into vectors or sort results are expressed as executors. A crafter splits and transforms the content to be searched, an encoder represents the content as vectors, an indexer stores and retrieves the vectors and key-value information, and a ranker sorts the search results.
  • Flow: represents a high-level task. Index, search, and train are each a flow.
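
To make the executor roles more concrete, here is a minimal NumPy sketch of the crafter, encoder, indexer, and ranker chain. It does not use Jina’s API: the function names (craft, encode, index, rank) and the toy flatten-and-normalize "encoder" are hypothetical stand-ins chosen only for illustration, and random arrays stand in for Fashion-MNIST images.

```python
import numpy as np

def craft(images):
    """Crafter: preprocess raw content (here: scale pixels to [0, 1])."""
    return images.astype("float32") / 255.0

def encode(images):
    """Encoder: map each image to a vector (toy version: flatten + L2-normalize)."""
    flat = images.reshape(len(images), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, 1e-12)

def index(vectors):
    """Indexer: store vectors under integer keys (toy version: a dict)."""
    return {key: vec for key, vec in enumerate(vectors)}

def rank(store, query_vec, top_k=3):
    """Ranker: sort stored keys by cosine similarity to the query, best first."""
    keys = list(store)
    scores = np.array([store[k] @ query_vec for k in keys])
    order = np.argsort(-scores)[:top_k]
    return [keys[i] for i in order]

rng = np.random.default_rng(0)
docs = rng.integers(0, 256, size=(100, 28, 28))  # stand-in for real images
store = index(encode(craft(docs)))               # "index" flow
query = encode(craft(docs[42:43]))[0]            # "search" flow with a known image
results = rank(store, query)                     # the known image ranks first
```

In the real system, the toy encoder is what gets replaced by a trained deep learning model, while the surrounding steps stay the same.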

How does Jina’s Hello world work?

Want to know the details of how Hello World works? In Hello World, we use YAML files to describe the index and search flows. We can load a YAML file and visualize it with the .plot() method:

from pkg_resources import resource_filename
from jina.flow import Flow

f = Flow.load_config(resource_filename('jina', '/'.join(('resources', 'helloworld.flow.index.yml')))).plot()

Figure 4 flow chart of Hello world yaml file

How is the information in the YAML file represented as a graph? Here’s an intuitive comparison:

Figure 5 yaml file information

This flow has two steps (called pods in Jina): the first step feeds the data to the encoder in parallel, and the second stores the output vectors and meta information in the indexer in shards. The query flow runs in the same way, with only minor changes in parameters.

Since the principle is so simple, can we swap in a model we trained ourselves? Now let’s see how to use MindSpore + Jina to build a clothing search system in just four steps.

How to use MindSpore + Jina to build a search system?

Create a MindSpore executor

MindSpore’s ModelZoo contains many deep learning models. This article uses the most classic CV network: LeNet. We can create a new MindSpore executor through Jina Hub. The version of Jina Hub used in this article is v0.7. You can install it with the following command:

pip install "jina[hub]"

After installation, if you want to create a new executor, you can directly enter:

jina hub new

After you run this command, an interactive wizard appears. Just answer the prompts as required. Some settings can keep their default values: simply press Enter.

The more important prompts are the following:

Figure 6 the more important prompts of jina hub new

After all the prompts are answered, you will see that the MindsporeLeNet folder has been created. Then download MindSpore’s LeNet code base and the Fashion-MNIST training data, and place them under the MindsporeLeNet module as follows:

Figure 7 code structure of MindsporeLeNet

Modify the encoder and network structure code of MindSpore

  • 1. Modify __init__.py

The original __init__.py contains a base class, BaseEncoder. We need to change the base class of our encoder to BaseMindsporeEncoder. The original code is:

from jina.executors.encoders import BaseEncoder

class MindsporeLeNet(BaseEncoder):
    """
    :class:`MindsporeLeNet` What does this executor do?.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # your customized __init__ below
        raise NotImplementedError

    def encode(self, data, *args, **kwargs):
        raise NotImplementedError

BaseMindsporeEncoder is an abstract class in Jina whose __init__ constructor loads the checkpoint of the MindSpore model. It also exposes the MindSpore model through the self.model attribute. The following table shows the classes that MindsporeLeNet inherits through the constructor.

Figure 8 inherited classes in MindsporeLeNet

After modification, it is as follows:

from jina.executors.encoders.frameworks import BaseMindsporeEncoder

class MindsporeLeNet(BaseMindsporeEncoder):
    """
    :class:`MindsporeLeNet` Encoding image into vectors using mindspore.
    """

    def encode(self, data, *args, **kwargs):
        # do something with `self.model`
        raise NotImplementedError

    def get_cell(self):
        raise NotImplementedError

  • 2. Implement the encode() method.

Given a batch of image data with batch size B (a NumPy ndarray of shape [B, H, W]), encode() converts the images into vector embeddings (shape [B, D]). Once the MindSpore LeNet model is available through self.model, we can do the conversion with self.model(Tensor(data)).asnumpy().

Be careful: the input shape of self.model is very error-prone. The LeNet model takes 32×32 images, and the code below feeds it a single channel, so the input must have shape [B, 1, 32, 32]. Fashion-MNIST images are single-channel grayscale images of size 28×28, so we can either resize the images or pad them with zeros. Here we use simple zero padding. The final encode() function is as follows:

import numpy
from mindspore import Tensor

def encode(self, data, *args, **kwargs):
    # LeNet only accepts BCHW format where H=W=32
    # hence we need to do some simple padding:
    # 4 zero rows at the bottom and 4 zero columns on the right
    data = numpy.pad(data.reshape([-1, 1, 28, 28]),
                     [(0, 0), (0, 0), (0, 4), (0, 4)]).astype('float32')
    return self.model(Tensor(data)).asnumpy()
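
The padding step can be checked on its own, without MindSpore. The following is a small sketch using an all-ones batch as made-up stand-in data: it shows that the reshape-and-pad call turns flattened 28×28 images into a [B, 1, 32, 32] array, with the original pixels in the top-left corner and zeros in the added rows and columns.

```python
import numpy as np

batch = np.ones((4, 28 * 28))  # 4 flattened 28x28 "images" (stand-in data)

# Same reshape + pad as in encode(): np.pad defaults to constant zero padding.
padded = np.pad(batch.reshape([-1, 1, 28, 28]),
                [(0, 0), (0, 0), (0, 4), (0, 4)]).astype("float32")

print(padded.shape)                 # (4, 1, 32, 32)
print(padded[0, 0, :28, :28].sum()) # 784.0, the original pixels are untouched
print(padded[0, 0, 28:, :].sum())   # 0.0, the 4 added bottom rows are zeros
```

Note that this places the image in the corner rather than the center. A centered variant would pad with (2, 2) on both spatial axes instead of (0, 4).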

  • 3. Implement the get_cell() method.

In MindSpore, a layer of a neural network is usually called a “cell”. A cell can be a single neural network layer (such as conv2d, relu, or batch_norm). To get vector embeddings, we just need to remove the classification head from LeNet (for example, the last softmax layer). This is easy to implement: inherit from the original LeNet5 class and override the construct() function.

def get_cell(self):
    from .lenet.src.lenet import LeNet5

    class LeNet5Embed(LeNet5):
        def construct(self, x):
            x = self.conv1(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.conv2(x)
            x = self.relu(x)
            x = self.max_pool2d(x)
            x = self.flatten(x)
            x = self.fc1(x)
            x = self.relu(x)
            x = self.fc2(x)
            x = self.relu(x)
            return x

    return LeNet5Embed()
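
The shape arithmetic behind this network also explains why the 28×28 images are padded to 32×32. The sketch below assumes the classic LeNet-5 layout (5×5 convolutions without padding, 2×2 max pooling, 16 feature maps after the second convolution, and an 84-unit second fully connected layer). These numbers are assumptions about the downloaded lenet code, and the helper names are made up for illustration.

```python
def conv_out(size, kernel=5):
    """Side length after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Side length after a non-overlapping max pool."""
    return size // window

def flattened_size(side, channels=16):
    """Number of features entering fc1 after two conv + pool stages."""
    for _ in range(2):
        side = pool_out(conv_out(side))
    return channels * side * side

EMBEDDING_DIM = 84  # assumed width of fc2, the layer LeNet5Embed returns

print(flattened_size(32))  # 400: 32 -> 28 -> 14 -> 10 -> 5, then 16 * 5 * 5
print(flattened_size(28))  # 256: 28 -> 24 -> 12 -> 8 -> 4, then 16 * 4 * 4
```

Under these assumptions, an unpadded 28×28 input would produce 256 features instead of the 400 that the first fully connected layer expects, which is why encode() pads the images first.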

Write a unit test

When you create a Jina executor, don’t forget to write unit tests. If the executor has no unit test, you cannot build it through the Jina Hub API.

In this example, a test template has already been generated. You can find the test_mindsporelenet.py file in the tests folder. The test first checks whether MindSpore runs at all and, if it does, whether the output shape is what we expect.

import numpy as np

from .. import MindsporeLeNet


def test_mindsporelenet():
    """here is my test code

    https://docs.pytest.org/en/stable/getting-started.html#create-your-first-test
    """
    mln = MindsporeLeNet(model_path='lenet/ckpt/checkpoint_lenet-1_1875.ckpt')
    tmp = np.random.random([4, 28 * 28])

    # The sixth layer is a fully connected layer (F6) with 84 units.
    # it is the last layer before the output
    assert mln.encode(tmp).shape == (4, 84)

Prepare the Dockerfile

The Python-level work is done. Now let’s prepare the Docker image. We can start from the existing Dockerfile and only need to add one command that runs train.py to generate the checkpoint file.

FROM mindspore/mindspore-cpu:1.0.0

# setup the workspace
COPY . /workspace
WORKDIR /workspace

# install the third-party requirements
RUN pip install --user -r requirements.txt

# added: train LeNet to generate the checkpoint file
RUN cd lenet && \
    python train.py --data_path data/fashion/ --ckpt_path ckpt --device_target="CPU" && \
    cd -

# for testing the image
RUN pip install --user pytest && pytest -s

ENTRYPOINT ["jina", "pod", "--uses", "config.yml"]

This command uses train.py from the MindSpore LeNet code base to generate the training checkpoint. We will use this checkpoint during testing and deployment. In the config.yml file, put the path of the checkpoint file in model_path. requests.on defines how MindsporeLeNet should behave under index and search requests. Don’t worry if you don’t understand all of this: it is essentially copied and adapted from helloworld.encoder.yml.

!MindsporeLeNet
with:
  model_path: lenet/ckpt/checkpoint_lenet-1_1875.ckpt
metas:
  py_modules: 
    - __init__.py
    # - You can put more dependencies here
requests:
  on:
    [IndexRequest, SearchRequest]:
      - !Blob2PngURI {}
      - !EncodeDriver {}
      - !ExcludeQL
        with:
          fields:
            - buffer
            - chunks

The last step: finally, we can build!

Finally, we can build MindsporeLeNet into a Docker image! Execute the following command:

jina hub build MindsporeLeNet/ --pull --test-uses

  • --pull: if the base image is not available locally, this flag tells the hub builder to download it.
  • --test-uses: adds an extra test that checks whether the built image can run successfully through the Jina Flow API.

The terminal now starts printing logs. If the build takes too long, you can reduce epoch_size in MindsporeLeNet/lenet/src/config.py.

The final success message:

[email protected][I]:Successfully built cfa38dcfc1f9
[email protected][I]:Successfully tagged jinahub/pod.encoder.mindsporelenet:0.0.1
[email protected][I]:building MindsporeLeNet/ takes 57 seconds (57.86s)
[email protected][S]: built jinahub/pod.encoder.mindsporelenet:0.0.1 (sha256:cfa38dcfc1) uncompressed size: 1.1 GB

Now you can use it as a pod with the following command:

jina pod --uses jinahub/pod.encoder.mindsporelenet:0.0.1

Compared with jina pod --uses abc.yml, the log lines of jina pod --uses jinahub/pod.encoder.mindsporelenet:0.0.1 begin with a Docker container marker. These logs are forwarded from the Docker container to the host. The figure below compares the two.

Figure 9 Comparison of differences

Of course, you can also push this image to a Docker registry:

jina hub build MindsporeLeNet/ --pull --test-uses --repository YOUR_NAMESPACE --push

Come and see the finished MindSpore product!

Finally, use the newly created MindSpore executor directly in the index and search flows. It’s very simple: just replace pods.encode.uses.

Figure 10 yaml file differences between index and query

The Jina Hello World demo can be customized with parameters. Just pass the index and query YAML files that we have just written:

jina hello-world --uses-index helloworld.flow.index.yml --uses-query helloworld.flow.query.yml

That’s it! In a few minutes, you will see the query results, as shown in the first animation!

Figure 11 final output

Summary

This article used MindSpore + Jina to build a clothing search system. The code is very simple. As long as you learn to modify the encode code, build the network layers you need, package everything into a Docker image, and modify the YAML files, you can use Jina to get the final display. With only a small amount of code changed to fit your own needs, you can build a MindSpore-based search system yourself. Isn’t it simple? Interested readers can click the following link to run it directly: https://gitee.com/mindspore/c…

This article is shared from the Huawei Cloud community: “3 minutes to teach you how to build a clothing search system with mindspore and Jina!”, author: Cheng Xiaoli.
