From 0 to 1: Using OpenPPL to Build an AI Inference Application

Time: 2022-6-1

The deep learning inference framework OpenPPL is now open source. This article walks through an image classification example to show, from 0 to 1, how to deploy a deep learning model and build a complete AI inference application.

Final result: upload a cat photo and have the program identify the animal in the picture (a dog photo also works).

Background

OpenPPL is an inference engine built on a self-developed, high-performance operator library. It can deploy AI models to multiple backends in cloud-native environments and supports efficient deployment of deep learning models such as those from OpenMMLab.

OpenPPL source code: https://github.com/openppl-public/ppl.nn

Installation

1. Download the pplnn source code

git clone https://github.com/openppl-public/ppl.nn.git

2. Install dependencies

Compiling pplnn requires the following:

  • GCC >= 4.9 or LLVM/Clang >= 6.0
  • CMake >= 3.14
  • Git >= 2.7.0

The image classification example classification described in this article additionally requires OpenCV:

  • For the apt package management system (e.g., Ubuntu/Debian):
    sudo apt install libopencv-dev
  • For the yum package management system (e.g., CentOS):
    sudo yum install opencv opencv-devel
  • Or build OpenCV from source

Note: the build script automatically detects whether OpenCV is installed. If it is not, the example in this article will not be built.

3. Compile

  • x86
    cd ppl.nn
    ./build.sh -DHPCC_USE_OPENMP=ON    # -DHPCC_USE_OPENMP can be omitted if multithreading is not needed
  • CUDA
    cd ppl.nn
    ./build.sh -DHPCC_USE_CUDA=ON

After compilation, the image classification example classification is generated in the pplnn-build/samples/cpp/run_model/ directory. It reads an image and a model file and outputs the classification result.

For more details on compilation, see: building-from-source.md


Walkthrough of the image classification example

The source code of the image classification example is in samples/cpp/run_model/classification.cpp. This section explains its main parts.

1. Image preprocessing

OpenCV reads images in BGR HWC uint8 format, while the ONNX model expects RGB NCHW fp32 input, so the image data must be converted:

int32_t ImagePreprocess(const Mat& src_img, float* in_data) {
    const int32_t height = src_img.rows;
    const int32_t width = src_img.cols;
    const int32_t channels = src_img.channels();

    // convert color space from BGR/GRAY to RGB
    Mat rgb_img;
    if (channels == 3) {
        cvtColor(src_img, rgb_img, COLOR_BGR2RGB);
    } else if (channels == 1) {
        cvtColor(src_img, rgb_img, COLOR_GRAY2RGB);
    } else {
        fprintf(stderr, "unsupported channel num: %d\n", channels);
        return -1;
    }

    // split the three channels of the HWC-format image
    vector<Mat> rgb_channels(3);
    split(rgb_img, rgb_channels);

    // construct the cv::Mat objects directly on top of in_data, so that
    // whatever is written to these cv::Mats goes straight into in_data
    Mat r_channel_fp32(height, width, CV_32FC1, in_data + 0 * height * width);
    Mat g_channel_fp32(height, width, CV_32FC1, in_data + 1 * height * width);
    Mat b_channel_fp32(height, width, CV_32FC1, in_data + 2 * height * width);
    vector<Mat> rgb_channels_fp32{r_channel_fp32, g_channel_fp32, b_channel_fp32};

    // convert uint8 data to fp32, subtract the mean and divide by the standard deviation: y = (x - mean) / std
    const float mean[3] = {0, 0, 0}; // adjust mean and std according to the dataset and training parameters
    const float std[3] = {255.0f, 255.0f, 255.0f};
    for (uint32_t i = 0; i < rgb_channels.size(); ++i) {
        rgb_channels[i].convertTo(rgb_channels_fp32[i], CV_32FC1, 1.0f / std[i], -mean[i] / std[i]);
    }

    return 0;
}
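For reference, here is a minimal sketch of how this function might be called (the image_file variable and the error handling are illustrative, not taken from the sample):

    // hypothetical caller: read an image and fill a 1 x 3 x H x W fp32 buffer
    Mat src_img = imread(image_file); // image_file: path to the input image (assumed)
    vector<float> in_data(1 * 3 * src_img.rows * src_img.cols);
    if (ImagePreprocess(src_img, in_data.data()) != 0) {
        fprintf(stderr, "image preprocess failed!\n");
        return -1;
    }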

2. Generate a runtime builder from the ONNX model

First, create and register the engines you want to use. Each engine corresponds to an inference backend; x86 and CUDA are currently supported.

Create x86 engine:

    auto x86_engine = X86EngineFactory::Create();

Or CUDA engine:

    auto cuda_engine = CudaEngineFactory::Create(CudaEngineOptions());

The following example uses only the x86 engine:

    // register all engines you want to use
    vector<unique_ptr<Engine>> engines;
    engines.emplace_back(unique_ptr<Engine>(x86_engine));

Then use the ONNXRuntimeBuilderFactory::Create() function to read the ONNX model and create a runtime builder from the registered engines:

    vector<Engine*> engine_ptrs;
    engine_ptrs.emplace_back(engines[0].get());
    auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));

Supplementary note: the pplnn framework supports mixed inference across multiple heterogeneous devices. Several different engines can be registered, and the framework automatically splits the computation graph into subgraphs and schedules them onto the different engines.
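As a rough sketch (assuming both backends were enabled at compile time), registering multiple engines only changes the registration step; the builder is created exactly as before:

    // sketch: register an x86 engine and a CUDA engine for mixed inference
    vector<unique_ptr<Engine>> engines;
    engines.emplace_back(unique_ptr<Engine>(X86EngineFactory::Create()));
    engines.emplace_back(unique_ptr<Engine>(CudaEngineFactory::Create(CudaEngineOptions())));

    vector<Engine*> engine_ptrs;
    for (auto& e : engines) {
        engine_ptrs.emplace_back(e.get());
    }
    auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));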

3. Create a runtime

Use RuntimeOptions to configure the runtime options; for example, set the mm_policy field to MM_LESS_MEMORY (memory-saving mode):

    RuntimeOptions runtime_options;
    runtime_options.mm_policy = MM_LESS_MEMORY; // use memory-saving mode

Create a runtime instance using the runtime builder generated in the previous step:

    unique_ptr<Runtime> runtime;
    runtime.reset(builder->CreateRuntime(runtime_options));

A runtime builder can create multiple runtime instances. These runtime instances share constant data (weights, etc.) and network topology, thereby saving memory overhead.
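For example, a minimal sketch using only the interfaces shown above (one runtime per worker thread is a typical use):

    // sketch: two runtime instances from the same builder; they share weights and
    // topology, but each holds its own intermediate buffers
    unique_ptr<Runtime> runtime_a(builder->CreateRuntime(runtime_options));
    unique_ptr<Runtime> runtime_b(builder->CreateRuntime(runtime_options));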

4. Set the network input data

First, get the input tensor of the runtime through the GetInputTensor() interface:

    auto input_tensor = runtime->GetInputTensor(0); // the classification network has only one input

Reshape the input tensor and reallocate its memory:

    const std::vector<int64_t> input_shape{1, channels, height, width};
    input_tensor->GetShape().Reshape(input_shape); // pplnn can adjust the input size even if it is fixed in the ONNX model
    auto status = input_tensor->ReallocBuffer();   // after Reshape is called, this interface must be called to reallocate memory

Unlike ONNX Runtime, pplnn can dynamically change the network's input size even when the input size is fixed in the ONNX model (as long as the new size is reasonable).
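For instance, a sketch of feeding a differently sized image between runs, using only the interfaces shown above (the 384 x 384 size is an arbitrary example):

    // sketch: reshape the same input tensor for a new image size, then re-run
    const std::vector<int64_t> new_shape{1, 3, 384, 384};
    input_tensor->GetShape().Reshape(new_shape);
    status = input_tensor->ReallocBuffer(); // reallocate after every Reshape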

The in_data obtained from the preprocessing step above is fp32 NDARRAY data (for 4-D data, NDARRAY is equivalent to NCHW). Define a format description for this user input data:

    TensorShape src_desc = input_tensor->GetShape();
    src_desc.SetDataType(DATATYPE_FLOAT32);
    src_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 4-D data, NDARRAY is equivalent to NCHW

Finally, call the ConvertFromHost() interface to convert in_data to the format required by input_tensor, completing the data filling:

    status = input_tensor->ConvertFromHost(in_data, src_desc);

5. Run inference

    status = runtime->Run(); // perform network inference
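The return status should be checked after each call; a minimal sketch, assuming pplnn's RC_SUCCESS return code and GetRetCodeStr() helper:

    if (status != RC_SUCCESS) { // RC_SUCCESS / GetRetCodeStr(): assumed pplnn error-handling API
        fprintf(stderr, "run network failed: %s\n", GetRetCodeStr(status));
        return -1;
    }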

6. Get the network output data

Get the output tensor of the runtime through the GetOutputTensor() interface:

    auto output_tensor = runtime->GetOutputTensor(0); // the classification network has only one output

Allocate data space to store network output:

    uint64_t output_size = output_tensor->GetShape().GetElementsExcludingPadding();
    std::vector<float> output_data_(output_size);
    float* output_data = output_data_.data();

As with the input data, first define the desired output format description:

    TensorShape dst_desc = output_tensor->GetShape();
    dst_desc.SetDataType(DATATYPE_FLOAT32);
    dst_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 1-D data, NDARRAY is equivalent to a plain vector

Call the ConvertToHost() interface to convert the data in output_tensor to the format described by dst_desc, obtaining the output data:

    status = output_tensor->ConvertToHost(output_data, dst_desc);

7. Parse the output

Parse the scores output by the network to obtain the classification result:

int32_t GetClassificationResult(const float* scores, const int32_t size) {
    vector<pair<float, int>> pairs(size);
    for (int32_t i = 0; i < size; i++) {
        pairs[i] = make_pair(scores[i], i);
    }

    auto cmp_func = [](const pair<float, int>& p0, const pair<float, int>& p1) -> bool {
        return p0.first > p1.first;
    };

    const int32_t top_k = 5;
    nth_element(pairs.begin(), pairs.begin() + top_k, pairs.end(), cmp_func); // get top K results & sort
    sort(pairs.begin(), pairs.begin() + top_k, cmp_func);

    printf("top %d results:\n", top_k);
    for (int32_t i = 0; i < top_k; ++i) {
        printf("%dth: %-10f %-10d %s\n", i + 1, pairs[i].first, pairs[i].second, imagenet_labels_tab[pairs[i].second]);
    }

    return 0;
}

Running the example

1. Prepare an ONNX model

A classification model, tests/testdata/mnasnet0_5.onnx, is provided for testing.

More ONNX models can be obtained as follows:

The models in the ONNX Model Zoo all use low opset versions. You can use convert_onnx_opset_version.py under tools to convert the opset to 11:

    python convert_onnx_opset_version.py --input_model input_model.onnx --output_model output_model.onnx --output_opset 11

For details on opset conversion, see: onnx-model-opset-convert-guide.md

2. Prepare test images

Test images can be in any format. Two are provided under tests/testdata: cat0.png (a big-head shot of our resident cat) and cat1.jpg (an ImageNet validation-set image):


Images of any size work. If you want the input resized to 224 x 224, modify the following variable in the program:

    const bool resize_input = false; // change to true to enable resizing
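For reference, the resize itself would be an ordinary OpenCV call; a minimal sketch (src_img is assumed to be the loaded image and is not tied to the sample's internals):

    // sketch: resize src_img to 224 x 224 before preprocessing
    Mat resized_img;
    resize(src_img, resized_img, Size(224, 224));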

3. Run

    pplnn-build/samples/cpp/run_model/classification <image_file> <onnx_model_file>

After inference, output like the following is produced:

image preprocess succeed!
[INFO][2021-07-23 17:29:31.341][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
successfully create runtime builder!
successfully build runtime!
successfully set input data to tensor [input]!
successfully run network!
successfully get outputs!
top 5 results:
1th: 3.416199   284        n02123597 Siamese cat, Siamese
2th: 3.049764   285        n02124075 Egyptian cat
3th: 2.989676   606        n03584829 iron, smoothing iron
4th: 2.812310   283        n02123394 Persian cat
5th: 2.796991   749        n04033901 quill, quill pen

As you can see, the program correctly determines that our resident cat is indeed a cat (>^ω^<)

This completes the installation of OpenPPL and inference with an image classification model.

In addition, the pplnn-build/tools directory contains an executable named pplnn, which can run inference with arbitrary models, dump output data, run benchmarks, and more.

Its specific usage can be viewed with the --help option. You can modify this example to become more familiar with how OpenPPL is used.

Discussion QQ group: 627853444 (join code: openppl)