TensorFlow Object Detection 1.0 and 2.0: Train, Export, Optimize (TensorRT), Infer (Jetson Nano)

Date: 2021-04-07

By Abhishek
Compiled by: Flin
Source: Analytics Vidhya

Part 1

Detailed steps from training a detector on a custom dataset to running inference on a Jetson Nano board or in the cloud using TensorFlow 1.15

The complete code is available on GitHub

Some common difficulties include

  • Finding TensorFlow (and matching CUDA) versions compatible with the Object Detection API library

  • Converting custom data to the TFRecord format

  • Confusion between the TF 1.0 and TF 2.0 workflows

  • Manually updating the model configuration file for training

  • Running the training process and resolving problems in the configuration file

  • Exporting the model from one format to another for inference

  • Dealing with different model format types: checkpoint, frozen graph, saved_model (".pb"), TensorRT inference graph, and so on

  • Running inference with the trained model

  • Converting the trained model into a quantized format for deployment on boards such as the Jetson Nano

  • Mismatched TensorRT versions and CUDA compute capabilities between the machine that builds the engine and the machine that deploys it

The list is endless

To overcome the above problems, we added a low-code Python wrapper around both versions of the TensorFlow Object Detection API in the Monk Object Detection Toolkit.

With it, developers and researchers can easily

  • Load custom training datasets for TF

  • Configure all model-file parameters through a Python API

  • Choose between TF 1.0 and TF 2.0 based on which networks are available and which CUDA version is in use

  • Train, export, optimize, and infer on their own datasets

  • Optimize models with TensorRT and export them to cloud servers or to embedded boards such as the Jetson Nano (a condensed sketch of this workflow follows the list)
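As a preview, here is a condensed sketch of that wrapper workflow; all directory names and parameter values are the illustrative ones used later in this article, and each call is covered in detail in the sections below.

# Condensed sketch of the Monk TF-OD wrapper flow (values are illustrative)
from train_detector import Detector

gtf = Detector();
gtf.set_train_dataset("ship/images/Train", "ship/voc/", "ship/classes.txt", batch_size=24)
gtf.create_tfrecord(data_output_dir="data_tfrecord")            # custom data -> TFRecords
gtf.set_model_params(model_name="ssd_mobilenet_v1")             # downloads weights, updates config
gtf.set_hyper_params(num_train_steps=10000, lr=0.004)           # plus the other arguments shown later
gtf.export_params(output_directory="export_dir")                # where the exported model will go
gtf.TensorRT_Optimization_Params(conversion_type="FP16", trt_dir="trt_fp16_dir")
# training, export and optimization are then run via the provided train.py, export.py and optimize.py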

Overview of the traditional process

The following describes the process of training and deploying a custom detector with TF. Along the way, it highlights the problems one faces in getting everything to work, and notes the differences between the TF 1.0 and TF 2.0 object detection libraries.

Process A: Compatibility between TensorFlow and the Object Detection API

  • To use Object Detection 2.0, use TensorFlow 2.3.0. Versions 2.0.0 and 2.1.0 usually raise "tensorflow_core.keras.utils" errors, and version 2.2.0 causes errors with the "CollectiveAllReduceExtended" module during training.

  • When using TensorFlow 2.3.0, CUDA 10.1 is required.

  • To use Object Detection 1.0, use TensorFlow version 1.15.0 or 1.15.2.

  • When TensorFlow 1.15 is used, CUDA 10.0 is required.

  • There are still some errors in TFLite conversion (to be discussed in a later blog post)

Process B: Setting up the dataset

  • TensorFlow provides dataset tools to convert data into its accepted TFRecord format

  • However, these examples only work for the most commonly used datasets, such as COCO, Pascal VOC, OpenImages, the Pets dataset, etc. Depending on the sample notebook chosen, users need to reformat and rearrange their dataset into the COCO, VOC, OID, or other expected layout

  • Another way is to update the sample code to ingest the custom dataset, which is a difficult process in itself

  • To make loading custom datasets easy, we modified the examples and added further parsers that support multiple annotation types and convert them directly to TFRecords (a rough sketch of one such record follows this list)
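For reference, the detection TFRecords produced by the standard TF OD API dataset tools store each image and its boxes as one tf.train.Example with a fixed set of feature keys. A minimal sketch with placeholder values (the image path and record file name here are just examples, not the wrapper's actual output):

# Rough sketch of a single detection record using the feature keys the
# TF OD API dataset tools expect; all values below are placeholders.
import tensorflow as tf

def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))
def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
def _ints(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))

with tf.io.gfile.GFile("ship/images/Train/img1.jpg", "rb") as f:
    encoded_jpg = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': _bytes([encoded_jpg]),        # raw JPEG bytes
    'image/format': _bytes([b'jpeg']),
    'image/height': _ints([480]),
    'image/width': _ints([640]),
    'image/object/bbox/xmin': _floats([0.10]),     # box coordinates, normalized to [0, 1]
    'image/object/bbox/ymin': _floats([0.20]),
    'image/object/bbox/xmax': _floats([0.50]),
    'image/object/bbox/ymax': _floats([0.60]),
    'image/object/class/text': _bytes([b'ship']),
    'image/object/class/label': _ints([1]),
}))

with tf.io.TFRecordWriter("data_tfrecord/train.record") as writer:
    writer.write(example.SerializeToString())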

Process C: Updating the configuration and starting the training process

  • Monk's Object Detection API 1.0 wrapper supports about 23 models, and the Object Detection API 2.0 wrapper supports about 26 models

  • Once a model is selected and the weights are downloaded, the configuration file must be updated manually.

  • API 1.0 and 2.0 have different configuration-file formats, and the manual changes differ in small ways between them

  • Some TF 1.0 configurations have issues with the base feature-extractor parameters.

  • After applying the updates to the configuration file, the entire workspace has to be arranged as specified in the tutorial on the TF OD API GitHub site.

  • After the rearrangement, training can begin. Again, training differs between the TF 1.0 and TF 2.0 models.

  • Through Monk Object Detection, we added Python functions to update the configuration file, so a strict workspace folder structure is no longer needed. With Monk, the training process is nearly the same for both TF versions.

Process D: Exporting trained models for inference

  • Both object detection APIs provide trained models in checkpoint (".ckpt") format.

  • For inference in TF 1.0, the frozen graph format is usually used.

  • For inference in TF 2.0, the saved_model format is usually used.

  • The process of converting models differs between the two APIs and is usually hard to follow, especially for beginners

  • To simplify the process, we added parsers that keep the external wrapper format the same, which means both the TF 1.0 API and the TF 2.0 API can be used through the same interface (a rough sketch of both loading styles follows this list)
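To make the difference concrete, loading the two export formats outside of any wrapper looks roughly like the sketch below. It assumes the default export file names ("frozen_inference_graph.pb" for TF 1.x and a "saved_model" directory for TF 2.x) and is only meant to show why the two paths differ.

# TF 1.x style: read a frozen graph into a GraphDef and import it
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("export_dir/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
with tf.compat.v1.Graph().as_default() as graph:
    tf.compat.v1.import_graph_def(graph_def, name="")

# TF 2.x style: load the SavedModel directory and grab its serving signature
model = tf.saved_model.load("export_dir/saved_model")
detect_fn = model.signatures["serving_default"]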

Process E: Model optimization for TensorRT inference

  • The exported model is finally converted into an optimized version with TensorRT.

  • Supported optimizations include 32-bit and 16-bit floating point (FP32, FP16) and 8-bit integer (INT8) quantization.

  • The quantization process differs completely between models exported from TF 1.0 and TF 2.0.

  • There are also TensorRT version issues: a model optimized with TensorRT 5.1.5 cannot run on a deployment machine that uses TensorRT 5.1.6. A very specific case is Object Detection 1.0 with TensorFlow 1.15.0, whose prebuilt binaries ship with TensorRT 5.1.5, a version that is not available in any JetPack release.

  • Another TensorRT issue is CUDA compute capability: unless appropriate measures are taken, a model optimized on a GPU with compute capability 7.0 (an NVIDIA V100) cannot run on a GPU with compute capability 5.3 (the Jetson Nano board). A small version-check sketch follows this list.

  • This blog post clarifies all of these questions while training and optimizing an object detection model
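A quick way to see what a given machine provides is to print its TensorRT version and GPU compute capability. A minimal sketch, assuming the TensorRT Python bindings (installed via python3-libnvinfer) and TensorFlow 1.15 are present:

# Print the TensorRT version and the GPU compute capability of this machine
import tensorrt as trt
print("TensorRT version:", trt.__version__)

from tensorflow.python.client import device_lib
for dev in device_lib.list_local_devices():
    if dev.device_type == "GPU":
        # physical_device_desc includes a "compute capability: X.Y" entry
        print(dev.physical_device_desc)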

Process F: Setting everything up on the Jetson Nano board

  • Because the two APIs require different TensorFlow versions, the installation processes differ; with TF 1.0 in particular, the JetPack version, CUDA version, and TensorFlow version all need extra attention.

Let’s start with version 1.0, one object detection API module at a time.

TF object detection API 1.0

Process A: Installing on the development machine

Libraries to be installed

  • Prerequisites: numpy, scipy, pandas, pillow, opencv-python

  • TensorFlow-GPU v1.15.0 with TensorRT 5.1.5, if the model will not be deployed on the Nano board

  • TensorFlow-GPU v1.15.2 with TensorRT 6.0.1, if the model will be deployed on the Nano board

  • TF Object Detection API 1.0, via the Monk Object Detection Toolkit

(make sure the NVIDIA driver, CUDA 10.0, and cuDNN 7 are installed)

When the model is to be deployed on the Jetson Nano board, follow the instructions below to configure your development (training) machine.

Install the required Python libraries

$ git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

$ cd Monk_Object_Detection/12_tf_obj_1/installation

$ chmod +x install_cuda10_tensorrt6_part1.sh && ./install_cuda10_tensorrt6_part1.sh

Install tensorrt 6.0.1

# Go to https://developer.nvidia.com/tensorrt
# Download 
# - nv-tensorrt-repo-ubuntu1804-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64.deb (For Ubuntu18.04)
# - nv-tensorrt-repo-ubuntu1604-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64.deb (For Ubuntu16.04)

# Run the following commands to install trt (in a terminal)

$ sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64.deb
$ sudo apt-key add /var/nv-tensorrt-repo-*/7fa2af80.pub   # use the exact key path printed by the dpkg step above
$ sudo apt-get update
$ sudo apt-get install tensorrt
$ sudo apt-get install uff-converter-tf
$ sudo apt-get install python3-libnvinfer-dev

Install bazel 0.26.1 and clone tensorflow from GitHub

# Install bazel version 0.26.1
# Download bazel deb package from https://github.com/bazelbuild/bazel/releases/tag/0.26.1

$ sudo dpkg -i bazel_0.26.1-linux-x86_64.deb

# Clone Tensorflow and switch to tensorflow 1.15.2

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout v1.15.2

Configure tensorflow

# Configure tensorflow

$ ./configure

    - Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y

    - Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N

    - Do you wish to build TensorFlow with ROCm support? [y/N]: N

    - Do you wish to build TensorFlow with CUDA support? [y/N]: Y      

    - Do you wish to build TensorFlow with TensorRT support? [y/N]: Y

    - And press enter (set default) for all other config questions asked by the setup

Build and install TensorFlow (about 5 hours on an AWS p3.2xlarge instance)

# Build tensorflow using bazel

$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package


# Once built create a wheel file for python installation and run pip installer

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg

$ cd tensorflow_pkg && pip install tensorflow*.whl

Finally, build object detection API 1.0

# Compile Object Detection API v1

$ cd Monk_Object_Detection/12_tf_obj_1/installation

$ chmod +x install_cuda10_tensorrt6_part2.sh && ./install_cuda10_tensorrt6_part2.sh

When you are not planning to deploy the model on the Jetson Nano board, follow the instructions below to configure your development (training) machine.

Install all required libraries and compile object detection API 1.0

$ git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

$ cd Monk_Object_Detection/12_tf_obj_1/installation

$ chmod +x install_cuda10.sh && ./install_cuda10.sh

Install TensorRT 5.1.5, which the prebuilt TensorFlow 1.15.0 wheels support

# Go to https://developer.nvidia.com/tensorrt
# Download 
# - nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.5.0-ga-20190427_1-1_amd64.deb (For Ubuntu18.04)
# - nv-tensorrt-repo-ubuntu1604-cuda10.0-trt5.1.5.0-ga-20190427_1-1_amd64.deb(For Ubuntu16.04)

# Run the following commands to install trt (in a terminal)

$ sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt5.1.5.0-ga-20190427_1-1_amd64.deb
$ sudo apt-key add /var/nv-tensorrt-repo-*/7fa2af80.pub   # use the exact key path printed by the dpkg step above
$ sudo apt-get update
$ sudo apt-get install tensorrt
$ sudo apt-get install uff-converter-tf
$ sudo apt-get install python3-libnvinfer-dev

When using Google Colab, follow the instructions below (TensorRT may not work properly on Colab)

# Switch to TF 1.0 version (Run the following line)
%tensorflow_version 1.x
# Now reset the runtime if prompted by Colab

# Run the following commands
$ git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git
$ cd Monk_Object_Detection/12_tf_obj_1/installation
$ chmod +x install_colab.sh && ./install_colab.sh

Process B: Building the dataset

The Monk Object Detection parsers require the dataset to be in COCO or Pascal VOC format. For this tutorial, let's stick with the Pascal VOC format.

To convert a dataset from any format to Pascal VOC, see the detailed tutorial linked in the original post.

In this case, the ship detection dataset is taken from an older blog post about object detection with RetinaNet.

The steps for using this data are covered in the accompanying Jupyter notebook.
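As a rough illustration of the target format, Pascal VOC uses one XML annotation file per image. A minimal sketch of writing such a file with the Python standard library (the file names, image size, and box coordinates below are placeholders):

# Minimal sketch: write one Pascal VOC style annotation file for one image.
# Paths, sizes, and box coordinates are placeholders.
import xml.etree.ElementTree as ET

def write_voc_annotation(out_path, filename, width, height, objects):
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        ET.SubElement(box, "xmin").text = str(xmin)
        ET.SubElement(box, "ymin").text = str(ymin)
        ET.SubElement(box, "xmax").text = str(xmax)
        ET.SubElement(box, "ymax").text = str(ymax)
    ET.ElementTree(ann).write(out_path)

write_voc_annotation("ship/voc/img1.xml", "img1.jpg", 640, 480,
                     [("ship", (100, 120, 300, 260))])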

Process C: Updating the configuration and starting the training process

Load training engine

from train_detector import Detector

gtf = Detector();

Load all available models in TF 1.15 model library

Currently, it supports 24 different SSD and Faster RCNN models

Load the training and validation datasets

Load the dataset after converting the annotation to VOC format

Set the batch size based on the available GPUs. In this tutorial, an AWS EC2 p3.2xlarge machine with a V100 GPU (16 GB VRAM) is used, and a batch size of 24 fits well.

train_img_dir = "ship/images/Train";
train_anno_dir = "ship/voc/";
class_list_file = "ship/classes.txt";

gtf.set_train_dataset(train_img_dir, train_anno_dir, class_list_file, batch_size=24)

Run the parser to convert the dataset to tfrecords

The TFRecord files will be stored in the data_tfrecord folder

gtf.create_tfrecord(data_output_dir="data_tfrecord")

Select and load the model

After downloading the model, monk will automatically update the configuration file based on the selected parameters

In this tutorial, we use SSD MobileNet V1, which accepts RGB input images of shape 320x320x3

gtf.set_model_params(model_name="ssd_mobilenet_v1")

Set other training and optimizer parameters

gtf.set_hyper_params(num_train_steps=10000,
                     lr=0.004,
                     lr_decay_rate=0.945,
                     output_dir="output_dir/",
                     sample_1_of_n_eval_examples=1,
                     sample_1_of_n_eval_on_train_examples=5,
                     checkpoint_dir=False,
                     run_once=False,
                     max_eval_retries=0,
                     num_workers=4,
                     checkpoint_after_every=500)

Set the directory where the exported model will be stored

gtf.export_params(output_directory="export_dir");

Setting tensorrt optimization parameters

The TensorRT optimizer creates a plan and then builds it. Building the plan optimizes the model for the GPU it is built on.

As mentioned earlier, a model optimized on a GPU with a different CUDA compute capability cannot run on the Jetson Nano, so the Monk library makes sure the plan is created on the development machine (cloud or Colab) while the plan is built on the deployment machine (Jetson Nano) at run time.

With INT8 optimization this split is not possible: plan creation and building must happen on the same machine, and the Jetson Nano board is not well suited to 8-bit integer operations.

gtf.TensorRT_Optimization_Params(conversion_type="FP16", trt_dir="trt_fp16_dir")

Training detector

The detector training code executes a sys.exit() call, so a wrapper running it directly would shut down the Python process.

To work around this, a script named train.py is provided; it can be run from a Jupyter notebook or from a terminal.

Based on the parameter settings, the trained model will be saved in the folder named "output_dir".

# Run in a terminal
$ python Monk_Object_Detection/12_tf_obj_1/lib/train.py

# or run this command on a jupyter notebook
%run Monk_Object_Detection/12_tf_obj_1/lib/train.py

Process D: Exporting the trained model for inference

Export a trained checkpoint model

The export function executes a sys.exit() call, so a wrapper running it directly would shut down the Python process.

To work around this, a script named export.py is provided; it can be run from a Jupyter notebook or from a terminal.

Based on the parameter settings, the exported model will be saved in the folder named "export_dir".

# Run in a terminal
$ python Monk_Object_Detection/12_tf_obj_1/lib/export.py

# or run this command on a jupyter notebook
%run Monk_Object_Detection/12_tf_obj_1/lib/export.py

Process E: Model optimization for TensorRT inference

Optimize export model

The optimization function executes a sys.exit() call, so a wrapper running it directly would shut down the Python process.

To work around this, a script named optimize.py is provided; it can be run from a Jupyter notebook or from a terminal.

Based on the parameter settings, the optimized model will be saved in the folder named "trt_fp16_dir".

# Run in a terminal
$ python Monk_Object_Detection/12_tf_obj_1/lib/optimize.py

# or run this command on a jupyter notebook
%run Monk_Object_Detection/12_tf_obj_1/lib/optimize.py

Process F-1: Running inference on the development machine

Load inference engine

from infer_detector import Infer

gtf = Infer();

Load model

First load the exported model and run the steps, then repeat the same steps by loading the optimized model (the steps remain unchanged)

# To load exported model
gtf.set_model_params('export_dir/frozen_inference_graph.pb', "ship/classes.txt")

# To load optimized model
gtf.set_model_params('trt_fp16_dir/trt_graph.pb', "ship/classes.txt")

Infer from a single image

scores, bboxes, labels = gtf.infer_on_image('ship/test/img1.jpg', thresh=0.1);
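The returned arrays can then be filtered and post-processed however you like. A hypothetical sketch (it assumes scores, bboxes, and labels are parallel sequences; the exact box format, pixel versus normalized coordinates, should be checked against the wrapper's output):

# Keep only confident detections from the values returned above
keep_thresh = 0.5
detections = [(label, float(score), box)
              for score, box, label in zip(scores, bboxes, labels)
              if score >= keep_thresh]
for label, score, box in detections:
    print("%s  %.2f  %s" % (label, score, box))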

Run a speed benchmark with both models

gtf.benchmark_for_speed('ship/test/img1.jpg')

The benchmark below uses the exported (not yet optimized) model on an AWS p3.2xlarge V100 GPU

Average Image loading time : 0.0091 sec
Average Inference time     : 0.0103 sec
Result extraction time     : 0.0801 sec
total_repetitions          : 100
total_time                 : 1.0321 sec
images_per_sec             : 96
latency_mean               : 10.3218 ms
latency_median             : 10.3234 ms
latency_min                : 9.4773 ms

Benchmark on the AWS p3.2xlarge V100 GPU using the optimized model

After optimization, the speed increases by roughly 2.5x

Average Image loading time : 0.0092 sec
Average Inference time     : 0.0042 sec
Result extraction time     : 0.0807 sec
total_repetitions          : 100
total_time                 : 0.4241 sec
images_per_sec             : 235
latency_mean               : 4.2412 ms
latency_median             : 4.2438 ms
latency_min                : 4.0156 ms

Process F-3: Installation steps on the Jetson Nano board

Step 1: update apt

$ sudo apt-get update
$ sudo apt-get upgrade

Step 2: install the system library

$ sudo apt-get install nano git cmake libatlas-base-dev gfortran libhdf5-serial-dev hdf5-tools nano locate libfreetype6-dev python3-setuptools protobuf-compiler libprotobuf-dev openssl libssl-dev libcurl4-openssl-dev cython3 libxml2-dev libxslt1-dev python3-pip

$ sudo apt-get install libopenblas-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev

$ sudo pip3 install virtualenv virtualenvwrapper

Step 3: update the bashrc file

Add these lines to the ~/.bashrc file

export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/bin/virtualenv
source /usr/local/bin/virtualenvwrapper.sh

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Run the following command

$ source ~/.bashrc

Step 4: create a virtual environment and install all the necessary Python libraries. Installing numpy takes about 15 minutes

$ mkvirtualenv -p /usr/bin/python3.6 tf2

$ pip install numpy==1.19.1

It takes about 40 minutes to install SciPy

$ pip install scipy==1.5.1

Installing the Jetson Nano build of TensorFlow 1.15 takes another 15 minutes

$ pip install scikit-build protobuf cython -vvvv

$ pip install grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta -vvvv

$ pip install https://developer.download.nvidia.com/compute/redist/jp/v43/tensorflow-gpu/tensorflow_gpu-1.15.0+nv19.12-cp36-cp36m-linux_aarch64.whl -vvvv

It takes 1.5 hours to install OpenCV

$ mkdir opencv && cd opencv
$ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.1.2.zip
$ unzip opencv.zip
$ mv opencv-4.1.2 opencv
$ cd opencv && mkdir build && cd build

$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D WITH_CUDA=OFF -D WITH_CUBLAS=OFF -D WITH_LIBV4L=ON -D BUILD_opencv_python3=ON -D BUILD_opencv_python2=OFF -D BUILD_opencv_java=OFF -D WITH_GSTREAMER=ON -D WITH_GTK=ON -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D OPENCV_ENABLE_NONFREE=OFF ..

$ make -j3
$ sudo make install

$ cd ~/.virtualenvs/tf2/lib/python3.6/site-packages
$ ln -s /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.cpython-36m-aarch64-linux-gnu.so cv2.so
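A quick sanity check that the virtualenv picks up the freshly built bindings (run inside a Python shell in the tf2 environment; it should report the 4.1.2 version built above):

# Verify the OpenCV Python bindings are importable from the virtualenv
import cv2
print(cv2.__version__)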

Finally, clone the Monk Object Detection library and install the TF Object Detection API

$ git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

$ cd Monk_Object_Detection/12_tf_obj_1/installation/

$ chmod +x install_nano.sh && ./install_nano.sh

Process F-4: Inference on the Jetson Nano

Copy or download the optimized weights folder into the Jetson Nano working directory (where the Monk library was cloned)

Copy the sample images from the Monk_Object_Detection library

$ cp -r Monk_Object_Detection/example_notebooks/sample_dataset/ship .

Load the inference engine and model (this step takes about 4 to 5 minutes)

from infer_detector import Infer

gtf = Infer();

gtf.set_model_params('trt_fp16_dir/trt_graph.pb', "ship/classes.txt")

Now, as mentioned earlier, TensorRT builds (optimizes) the plan at run time, so the first run takes about 3 to 4 minutes

scores, bboxes, labels = gtf.infer_on_image('ship/test/img5.jpg', thresh=0.5, img_size=300);

The highlighted area shows the Jetson Nano building (optimizing) the TensorRT plan (model). (Image owned by the author)

It won’t take long to run it again.
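If you want to pay that one-off engine-build cost up front rather than on the first real request, you can time a throwaway warm-up call; a small sketch using the same call as above:

# Warm-up so the TensorRT engine build happens before real traffic
import time

start = time.time()
gtf.infer_on_image('ship/test/img5.jpg', thresh=0.5, img_size=300)
print("first (engine-build) run : %.1f s" % (time.time() - start))

start = time.time()
gtf.infer_on_image('ship/test/img5.jpg', thresh=0.5, img_size=300)
print("subsequent run           : %.3f s" % (time.time() - start))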

Benchmark analysis on the Nano board

gtf.benchmark_for_speed('ship/test/img1.jpg')
# With Jetson Nano power mode - 5W Mode
Average Image loading time : 0.0275 sec
Average Inference time     : 0.0621 sec
total_repetitions          : 100
total_time                 : 6.2172 sec
images_per_sec             : 16
latency_mean               : 67.1722 ms
latency_median             : 60.7875 ms
latency_min                : 57.4391 ms
# With Jetson Nano power mode - MAXN Mode
Average Image loading time : 0.0173 sec
Average Inference time     : 0.0426 sec
total_repetitions          : 100
total_time                 : 4.2624 sec
images_per_sec             : 23
latency_mean               : 42.6243 ms
latency_median             : 41.9758 ms
latency_min                : 40.9001 ms

The complete code for TensorFlow Object Detection API 1.0 is available as a Jupyter notebook

Download all pretrained weights from Google Drive

Part 2

Detailed steps from training a detector on a custom dataset to running inference on a Jetson Nano board or in the cloud using TensorFlow 2.3

TF object detection API 2.0

Process A: Installing on the development machine

Libraries to install

Prerequisites: numpy, scipy, pandas, pillow, opencv-python

TensorFlow-GPU v2.3.0 with TensorRT 6.0.1

TF Object Detection API 2.0, via the Monk Object Detection Toolkit

TensorRT will be installed in a later part of this section. (Make sure the NVIDIA driver is installed along with CUDA 10.1 and cuDNN 7.)

Run the following steps on the development (training) machine

$ git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

#For Cuda 10 systems
$ cd Monk_Object_Detection/13_tf_obj_2/installation && chmod +x install_cuda10.sh && ./install_cuda10.sh

#For Google colab
$ cd Monk_Object_Detection/13_tf_obj_2/installation && chmod +x install_colab.sh && ./install_colab.sh

Process B: Building the dataset

This is the same as in Part 1. The Monk Object Detection parsers require the dataset to be in COCO or Pascal VOC format. For this tutorial, let's stick with the Pascal VOC format.

To convert your dataset from any format to Pascal VOC, see the detailed tutorial linked in the original post.

In this example, the ship detection dataset is obtained from an old blog about object detection

The steps to use data are mentioned in this jupyter notebook

Process C: Updating the configuration and starting the training process

Load training engine

from train_detector import Detector

gtf = Detector();

Load all available models in TF 2.0 model Zoo

Currently, it supports 26 SSD, Faster RCNN, and EfficientDet models

Support for CenterNet models will be added soon; training with the original pipeline currently has an error

Load the training and validation datasets

Load the dataset after converting the annotation to VOC format

Set the batch size based on the available GPUs. In this tutorial, an AWS EC2 p3.2xlarge machine with a V100 GPU (16 GB VRAM) is used, and a batch size of 24 fits well.

train_img_dir = "ship/images/Train";
train_anno_dir = "ship/voc/";
class_list_file = "ship/classes.txt";

gtf.set_train_dataset(train_img_dir, train_anno_dir, class_list_file, batch_size=24)

Run the parser to convert the dataset to tfrecords

The TFRecord files will be stored in the data_tfrecord folder

gtf.create_tfrecord(data_output_dir="data_tfrecord")

Select and load the model

After downloading the model, monk will automatically update the configuration file based on the selected parameters

In this tutorial, we use SSD MobileNet V2, which accepts RGB input images of shape 320x320x3

gtf.set_model_params(model_name="ssd_mobilenet_v2_320")

Set other training and optimizer parameters

gtf.set_hyper_params(num_train_steps=10000,
                     lr=0.004,
                     lr_decay_rate=0.945,
                     output_dir="output_dir/",
                     sample_1_of_n_eval_examples=1,
                     sample_1_of_n_eval_on_train_examples=5,
                     checkpoint_dir=False,
                     run_once=False,
                     max_eval_retries=0,
                     num_workers=4,
                     checkpoint_after_every=500)

Set the directory where the exported model will be stored

gtf.export_params(output_directory="export_dir");

Setting tensorrt optimization parameters

The TensorRT optimizer creates a plan and then builds it. Building the plan optimizes the model for the GPU it is built on.

As mentioned earlier, a model optimized on a GPU with a different CUDA compute capability cannot run on the Jetson Nano, so the Monk library makes sure the plan is created on the development machine (cloud or Colab) while the plan is built on the deployment machine (Jetson Nano) at run time.

With INT8 optimization this split is not possible: plan creation and building must happen on the same machine, and the Jetson Nano board is not well suited to 8-bit integer operations.

gtf.TensorRT_Optimization_Params(conversion_type="FP16", trt_dir="trt_fp16_dir")

Training detector

The detector training code executes a sys.exit() call, so a wrapper running it directly would shut down the Python process.

To work around this, a script named train.py is provided; it can be run from a Jupyter notebook or from a terminal.

Based on the parameter settings, the trained model will be saved in the folder named "output_dir".

# For terminal users
$ python Monk_Object_Detection/13_tf_obj_2/lib/train.py

# For jupyter notebook or colab users
%run Monk_Object_Detection/13_tf_obj_2/lib/train.py

Process D: Exporting the trained model for inference

Export a trained checkpoint model

The export function executes a sys.exit() call, so a wrapper running it directly would shut down the Python process.

To work around this, a script named export.py is provided; it can be run from a Jupyter notebook or from a terminal.

Based on the parameter settings, the exported model will be saved in the folder named "export_dir".

# For terminal users
$ python Monk_Object_Detection/13_tf_obj_2/lib/export.py

# For jupyter notebook and colab users
%run Monk_Object_Detection/13_tf_obj_2/lib/export.py

Process E: Model optimization for TensorRT inference

Install tensorrt version 6.0.1

Go to the NVIDIA TensorRT page and download the TRT 6 package matching your OS and CUDA version.

Here are the steps for Ubuntu OS and CUDA 10.1

# Optimizing For TensorRT - Feature Not tested on colab
# This requires TensorRT 6.0.1 to be installed
# Go to https://developer.nvidia.com/tensorrt

# Download 
# - nv-tensorrt-repo-ubuntu1804-cuda10.1-trt6.0.1.5-ga-20190913_1-1_amd64.deb (For Ubuntu18.04)
# - nv-tensorrt-repo-ubuntu1604-cuda10.1-trt6.0.1.5-ga-20190913_1-1_amd64.deb (For Ubuntu16.04)
# Run the following commands to install trt (in a terminal)

$ sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.1-trt6.0.1.5-ga-20190913_1-1_amd64.deb
$ sudo apt-key add /var/nv-tensorrt-repo-cuda10.1-trt6.0.1.5-ga-20190913/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install tensorrt
$ sudo apt-get install uff-converter-tf
$ sudo apt-get install python3-libnvinfer-dev

Optimize export model

The optimization function executes a sys.exit() call, so a wrapper running it directly would shut down the Python process.

To work around this, a script named optimize.py is provided; it can be run from a Jupyter notebook or from a terminal.

Based on the parameter settings, the optimized model will be saved in the folder named "trt_fp16_dir".

# For terminal users
$ python Monk_Object_Detection/13_tf_obj_2/lib/optimize.py

# For jupyter notebook and colab users
%run Monk_Object_Detection/13_tf_obj_2/lib/optimize.py

Process F-1: Running inference on the development machine

Load inference engine

from infer_detector import Infer

gtf = Infer();

Load model

First, load the exported model and run the steps; later, repeat the same steps by loading the optimized model (the steps remain unchanged)

# To load exported model
gtf.set_model_params(exported_model_dir = 'export_dir')

# To load optimized model
gtf.set_model_params(exported_model_dir = 'trt_fp16_dir')
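Before loading, it can help to confirm what the export and optimization steps actually wrote to disk; the TF 2 exporter typically produces a saved_model/ sub-directory alongside a checkpoint and pipeline.config. A small sketch, assuming the directory names used in this tutorial:

# List everything under the exported and optimized model directories
import os

for model_dir in ("export_dir", "trt_fp16_dir"):
    print("==", model_dir)
    for root, _, files in os.walk(model_dir):
        for name in files:
            print(os.path.join(root, name))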

Infer from a single image

scores, bboxes, labels = gtf.infer_on_image('ship/test/img1.jpg', thresh=0.1);

Sample inference results

Run a speed benchmark with both models

gtf.benchmark_for_speed('ship/test/img1.jpg')

The benchmark below uses the exported (not yet optimized) model on an AWS p3.2xlarge V100 GPU

Average Image loading time : 0.0110 sec
Average Inference time     : 0.0097 sec
Result extraction time     : 0.0352 sec
total_repetitions          : 100
total_time                 : 0.9794 sec
images_per_sec             : 102
latency_mean               : 9.7949 ms
latency_median             : 9.7095 ms
latency_min                : 9.1238 ms

Benchmark on the AWS p3.2xlarge V100 GPU using the optimized model

After optimization, the speed increases by roughly 1.5x

Average Image loading time : 0.0108 sec
Average Inference time     : 0.0062 sec
Result extraction time     : 0.0350 sec
total_repetitions          : 100
total_time                 : 0.6241 sec
images_per_sec             : 160
latency_mean               : 6.2422 ms
latency_median             : 6.2302 ms
latency_min                : 5.9401 ms

Process F-2: Setting everything up on the Jetson Nano board

Step 1: download the JetPack 4.3 SD card image from https://developer.nvidia.com/jetpack-43-archive

Step 2: flash this image to an SD card; you can use Etcher: https://www.balena.io/etcher/

Step 3: insert the SD card into the Nano board, boot the system, and complete the installation steps

More details are available on NVIDIA's "Getting Started with Jetson Nano" page

Process F-3: Installation steps on the Jetson Nano board

Step 1: update apt

$ sudo apt-get update
$ sudo apt-get upgrade

Step 2: install the system library

$ sudo apt-get install nano git cmake libatlas-base-dev gfortran libhdf5-serial-dev hdf5-tools nano locate libfreetype6-dev python3-setuptools protobuf-compiler libprotobuf-dev openssl libssl-dev libcurl4-openssl-dev cython3 libxml2-dev libxslt1-dev python3-pip

$ sudo apt-get install libopenblas-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev

$ sudo pip3 install virtualenv virtualenvwrapper

Step 3: update the bashrc file

Add these lines to the ~/.bashrc file

export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_VIRTUALENV=/usr/local/bin/virtualenv
source /usr/local/bin/virtualenvwrapper.sh

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Run the following command

$ source ~/.bashrc

Step 4: create a virtual environment and install all the necessary Python libraries

It takes about 15 minutes to install numpy

$ mkvirtualenv -p /usr/bin/python3.6 tf2

$ pip install numpy==1.19.1

It takes about 40 minutes to install SciPy

$ pip install scipy==1.5.1

Installing the Jetson Nano build of TensorFlow 2.0.0 takes another 15 minutes

$ pip install scikit-build protobuf cython -vvvv

$ pip install grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta -vvvv

$ pip install https://developer.download.nvidia.com/compute/redist/jp/v43/tensorflow-gpu/tensorflow_gpu-2.0.0+nv19.12-cp36-cp36m-linux_aarch64.whl -vvvv

It takes 1.5 hours to install OpenCV

$ mkdir opencv && cd opencv
$ wget -O opencv.zip https://github.com/opencv/opencv/archive/4.1.2.zip
$ unzip opencv.zip
$ mv opencv-4.1.2 opencv
$ cd opencv && mkdir build && cd build

$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D WITH_CUDA=OFF -D WITH_CUBLAS=OFF -D WITH_LIBV4L=ON -D BUILD_opencv_python3=ON -D BUILD_opencv_python2=OFF -D BUILD_opencv_java=OFF -D WITH_GSTREAMER=ON -D WITH_GTK=ON -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D OPENCV_ENABLE_NONFREE=OFF ..

$ make -j3
$ sudo make install

$ cd ~/.virtualenvs/tf2/lib/python3.6/site-packages
$ ln -s /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.cpython-36m-aarch64-linux-gnu.so cv2.so

Finally, clone the monk object detection library

Note: do not run the 13_tf_obj_2 installation the way you would on a development machine. There are issues installing the TF Object Detection API with TF 2.0 on the Nano, and the inference code does not need the Object Detection API tools.

$ git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

Process F-4: Inference on the Jetson Nano

Copy or download the optimized weights folder into the Jetson Nano working directory (where the Monk library was cloned)

Copy the sample images from the Monk_Object_Detection library

$ cp -r Monk_Object_Detection/example_notebooks/sample_dataset/ship .

Load the inference engine and model (this step takes about 4 to 5 minutes)

from infer_detector_nano import Infer
gtf = Infer();

gtf.set_model_params(exported_model_dir = 'trt_fp16_dir')

Now, as mentioned earlier, TensorRT takes the plan and builds (optimizes) it at run time, so the first run takes about 3 to 4 minutes

scores, bboxes, labels = gtf.infer_on_image('ship/test/img1.jpg', thresh=0.1);

# Output will be saved as output.jpg
gtf.draw_on_image(bbox_thickness=3, text_size=1, text_thickness=2)

The highlighted area shows the Jetson Nano building (optimizing) the TensorRT plan (model). (Image owned by the author)

It won’t take long to run it again.

Benchmark analysis on the Nano board

gtf.benchmark_for_speed('ship/test/img1.jpg')
# With Jetson Nano power mode - 5W Mode
Average Image loading time : 0.0486 sec
Average Inference time     : 0.1182 sec
total_repetitions          : 100
total_time                 : 11.8244 sec
images_per_sec             : 8
latency_mean               : 118.2443 ms
latency_median             : 117.8019 ms
latency_min                : 111.0002 ms
# With Jetson Nano power mode - MAXN Mode
Average Image loading time : 0.0319 sec
Average Inference time     : 0.0785 sec
total_repetitions          : 100
total_time                 : 7.853 sec
images_per_sec             : 12
latency_mean               : 78.5399 ms
latency_median             : 78.1973 ms
latency_min                : 76.2658 ms

The complete code for TensorFlow Object Detection API 2.0 is available as a Jupyter notebook

Download all pretrained weights from Google Drive

That wraps up the work with TensorFlow Object Detection API 2.0

Thank you for reading! Have a good time!!

Link to the original article: https://www.analyticsvidhya.com/blog/2020/09/tensorflow-object-detection-1-0-2-0-train-export-optimize-tensorrt-infer-jetson-nano/
