Object detection with Detectron2 in 6 steps


By akarsh yelitsty
Compiled by Flin
Source: Towards Data Science

Let’s see how to use Detectron2, from FAIR (Facebook AI Research), for instance detection on a custom dataset involving text recognition.

Have you ever tried to train an object detection model from scratch using a custom dataset of your own choice?

If so, you know how tedious the process is. If we go with a region-proposal-based method such as Faster R-CNN, we need to build the model from a feature pyramid network and a region proposal network; alternatively, we can use single-shot detectors such as SSD and YOLO.

Either approach is fairly complicated to implement from scratch. We need a framework that gives us access to state-of-the-art models such as Fast, Faster, and Mask R-CNN. That said, building a model from scratch is still valuable for understanding the mathematics behind it.

If we want to train an object detection model quickly on a custom dataset, Detectron2 can help. All models in Detectron2’s model zoo are pre-trained on the COCO dataset. We just need to fine-tune the pre-trained model on our custom dataset.

Detectron2 is a complete rewrite of the first Detectron, which was released in 2018. Its predecessor was written in Caffe2, a deep learning framework backed by Facebook. Both Caffe2 and Detectron are now deprecated. Caffe2 is now part of PyTorch, and Detectron’s successor, Detectron2, is written entirely in PyTorch.

Detectron2 aims to advance machine learning by providing fast training and by addressing the problems that arise when moving from research to production.

The following are the various types of object detection models that Detectron2 provides.

Let’s dive straight into instance detection.

Instance detection refers to classifying and localizing objects with bounding boxes. In this article, we will use the Faster R-CNN model from Detectron2’s model zoo to identify the language of text in images.

Please note that we limit the languages to two.

We recognize Hindi and English text and provide a class called “Others” for all other languages.

We will implement a model that outputs in this way.

Let’s get started!

With Detectron2, you can perform object detection on any custom dataset in six steps. All of these steps are available in a Google Colab notebook that you can run right away!

Google Colab makes this easy because we can train faster on a GPU.

Step 1: Install Detectron2

First install some dependencies, such as torchvision and the COCO API, and then check whether CUDA is available. PyTorch’s torch.cuda package keeps track of the currently selected GPU. Then install Detectron2.

# install dependencies: 
!pip install -U torch==1.5 torchvision==0.6 -f
!pip install cython pyyaml==5.1
!pip install -U 'git+'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version
# install detectron2:
!pip install detectron2==0.1.3 -f

Step 2: Prepare and register the dataset

Import some necessary packages.

# You may need to restart your runtime prior to this, to let your installation take effect
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()  # turn on detectron2's default logging

# import some common libraries
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog

Detectron2 has built-in support for a number of datasets. If you want to use a custom dataset while reusing Detectron2’s data loaders, you need to register the dataset (that is, tell Detectron2 how to obtain it).

We use a text detection dataset with three categories:

  1. English

  2. Hindi

  3. other

We will train the text detection model starting from an existing model pre-trained on the COCO dataset, available in Detectron2’s model zoo.

If you are interested in how to convert the original dataset format into the format Detectron2 accepts, please see:

How do we feed data to the model? The input data must follow a known format such as YOLO, Pascal VOC, or COCO. Detectron2 accepts datasets in COCO format. A COCO-format dataset consists of a JSON file that contains all the details of each image, such as its size, its annotations (i.e. the bounding box coordinates), and the label for each bounding box. For example,

This is what the JSON entry for one image looks like. Bounding boxes can be represented in several formats, and the format must be a member of detectron2.structures.BoxMode. There are five such formats, but currently only BoxMode.XYXY_ABS and BoxMode.XYWH_ABS are supported.

We use the second format. (X, Y) represents one corner of the bounding box (the top-left corner in XYWH_ABS), and W and H represent the width and height of the box. category_id refers to the category the bounding box belongs to.
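
To make the two box formats concrete, here is a minimal sketch in plain Python, without Detectron2. The record below is hypothetical: the file name, image size, and coordinates are made up for illustration, and only the keys file_name, annotations, bbox, and category_id mirror the fields discussed in this article. The helper xywh_to_xyxy is my own name, not a Detectron2 function.

```python
import json

def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x0, y0, x1, y1] in absolute pixels."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

# Hypothetical record: file name, sizes, and coordinates are made up.
record = {
    "file_name": "img_001.jpg",
    "height": 480,
    "width": 640,
    "annotations": [
        {"bbox": [100, 50, 200, 80], "category_id": 1},
        {"bbox": [320, 300, 150, 40], "category_id": 0},
    ],
}

# The dataset file is read as a list of such records, so round-trip one through JSON.
dataset_dicts = json.loads(json.dumps([record]))

for ann in dataset_dicts[0]["annotations"]:
    # Same box, expressed as two opposite corners instead of corner plus size.
    print(ann["category_id"], xywh_to_xyxy(ann["bbox"]))
```

The first box, [100, 50, 200, 80] in XYWH form, becomes [100, 50, 300, 130] in XYXY form: the corner stays put, and the width and height are added to it.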

Then we need to register our dataset.

import json
from detectron2.structures import BoxMode
def get_board_dicts(imgdir):
    json_file = imgdir+"/dataset.json" #Fetch the json file
    with open(json_file) as f:
        dataset_dicts = json.load(f)
    for i in dataset_dicts:
        filename = i["file_name"] 
        i["file_name"] = imgdir+"/"+filename 
        for j in i["annotations"]:
            j["bbox_mode"] = BoxMode.XYWH_ABS #Setting the required Box Mode
            j["category_id"] = int(j["category_id"])
    return dataset_dicts
from detectron2.data import DatasetCatalog, MetadataCatalog
#Registering the Dataset
for d in ["train", "val"]:
    DatasetCatalog.register("boardetect_" + d, lambda d=d: get_board_dicts("Text_Detection_Dataset_COCO_Format/" + d))
    MetadataCatalog.get("boardetect_" + d).set(thing_classes=["HINDI","ENGLISH","OTHER"])
board_metadata = MetadataCatalog.get("boardetect_train")
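
The registration loop passes a function, not the data itself, and it writes lambda d=d on purpose. The sketch below is a toy stand-in written in plain Python to show why; it is not Detectron2’s implementation, and the names register, get, and load_split are hypothetical. It also shows how I understand category_id to be used: as an index into the thing_classes list.

```python
# Toy stand-in for a dataset registry; NOT Detectron2's implementation.
_REGISTRY = {}

def register(name, loader):
    """Store a zero-argument function that returns the dataset when called."""
    _REGISTRY[name] = loader

def get(name):
    """Call the stored loader; the data is only built at this point."""
    return _REGISTRY[name]()

def load_split(split):
    # Hypothetical loader: a real one would read the JSON file for the split.
    return [{"file_name": split + "/img_001.jpg", "annotations": []}]

for d in ["train", "val"]:
    # d=d freezes the current value of d inside each lambda.
    register("toy_" + d, lambda d=d: load_split(d))

# Without d=d, every lambda looks up d when it is called, after the loop has ended.
late_bound = [lambda: d for d in ["train", "val"]]

# category_id is treated as an index into the class list.
thing_classes = ["HINDI", "ENGLISH", "OTHER"]

print(get("toy_train")[0]["file_name"])  # train/img_001.jpg
print([f() for f in late_bound])         # ['val', 'val']
print(thing_classes[1])                  # ENGLISH
```

Because the loader runs only when the dataset is requested, registering a dataset is cheap, and the default argument makes sure "toy_train" really loads the train split rather than whatever the loop variable held last.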

In order to verify whether the data loading is correct, let’s visualize the annotation of randomly selected samples in the training set.

Step 3: Visualize the training set

We’ll randomly select three images from the train folder of the dataset and see what the bounding boxes look like.

#Visualizing the Train Dataset
dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/train")
#Randomly choosing 3 images from the Set
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=board_metadata)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])

The output looks like this,

Step 4: Train the model

This is the big step: here we provide the configuration and set up the model for training. Technically, we are only fine-tuning the model on our dataset, because it has already been pre-trained on the COCO dataset.

Detectron2’s model zoo offers many models for object detection. Here, we use faster_rcnn_R_50_FPN_3x.

It has a backbone network (ResNet in this case) that extracts features from the image, followed by a region proposal network that proposes candidate regions, and a box head that refines the bounding boxes.

You can read more about how Faster R-CNN works in my previous article.

Let’s set the configuration for training.

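from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator
from detectron2.config import get_cfg
import os
# Assumption: CocoTrainer is a DefaultTrainer that evaluates with COCO metrics (minimal version).
class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        output_folder = output_folder or os.path.join(cfg.OUTPUT_DIR, "coco_eval")
        os.makedirs(output_folder, exist_ok=True)
        return COCOEvaluator(dataset_name, cfg, False, output_folder)
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")) #Get the basic model configuration from the model zoo 
#Passing the Train and Validation sets
cfg.DATASETS.TRAIN = ("boardetect_train",)
cfg.DATASETS.TEST = ("boardetect_val",)
# Number of data loading threads
cfg.DATALOADER.NUM_WORKERS = 4  # assumed value, not given in the article
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
# Number of images per batch across all machines.
cfg.SOLVER.IMS_PER_BATCH = 4  # assumed value, not given in the article
cfg.SOLVER.BASE_LR = 0.0125  # pick a good LearningRate
cfg.SOLVER.MAX_ITER = 1500  #No. of iterations   
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3  # HINDI, ENGLISH, OTHER
cfg.TEST.EVAL_PERIOD = 500 # No. of iterations after which the Validation Set is evaluated. 
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg) 
trainer.resume_or_load(resume=False)  # start from the model zoo weights, not a saved run
trainer.train()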

I wouldn’t claim this is the best configuration; other configurations may well give higher accuracy. In the end, it comes down to choosing the right hyperparameters.

Note that we also evaluate accuracy on the validation set every 500 iterations.
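
One thing worth keeping in mind is that MAX_ITER counts iterations (batches), not passes over the dataset. The rough arithmetic below is a sketch in plain Python: MAX_ITER and EVAL_PERIOD come from the configuration above, while the batch size and the number of training images are hypothetical numbers chosen only to make the calculation concrete.

```python
def eval_iterations(max_iter, eval_period):
    """Iterations at which a periodic evaluation would fall."""
    return list(range(eval_period, max_iter + 1, eval_period))

def approx_epochs(max_iter, images_per_batch, num_train_images):
    """Approximate number of passes over the training set."""
    return max_iter * images_per_batch / num_train_images

MAX_ITER = 1500          # from the configuration above
EVAL_PERIOD = 500        # from the configuration above
IMS_PER_BATCH = 4        # hypothetical batch size
NUM_TRAIN_IMAGES = 600   # hypothetical dataset size

print(eval_iterations(MAX_ITER, EVAL_PERIOD))                    # [500, 1000, 1500]
print(approx_epochs(MAX_ITER, IMS_PER_BATCH, NUM_TRAIN_IMAGES))  # 10.0
```

With these assumed numbers, 1500 iterations would correspond to about 10 passes over the training images, with three evaluation points along the way. A smaller dataset or a larger batch means more passes for the same MAX_ITER, which is one reason to watch for overfitting.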

Step 5: Run inference with the trained model

Now it’s time to run inference by testing the model on the validation set.

After training completes successfully, an output folder containing the final weights is saved to local storage. You can keep this folder to run inference with this model in the future.

from detectron2.utils.visualizer import ColorMode

#Use the final weights generated after successful training for inference  
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8  # set the testing threshold for this model
#Pass the validation dataset
cfg.DATASETS.TEST = ("boardetect_val", )

predictor = DefaultPredictor(cfg)

dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/val")
for d in random.sample(dataset_dicts, 3):    
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1], metadata=board_metadata, instance_mode=ColorMode.IMAGE)
    v = v.draw_instance_predictions(outputs["instances"].to("cpu")) #Passing the predictions to CPU from the GPU
    cv2_imshow(v.get_image()[:, :, ::-1])
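
To see what the testing threshold does, here is a small plain-Python sketch. The detections are hypothetical (category_id, score) pairs invented for illustration, and keep_confident is my own helper, not part of Detectron2. As far as I know the library keeps predictions whose score is strictly above the threshold, so the sketch does the same.

```python
SCORE_THRESH_TEST = 0.8  # same value as in the configuration above
thing_classes = ["HINDI", "ENGLISH", "OTHER"]

# Hypothetical raw detections: (category_id, confidence score).
raw_detections = [(1, 0.97), (0, 0.85), (2, 0.42), (1, 0.79)]

def keep_confident(detections, threshold):
    """Keep detections scoring above the threshold and name their class."""
    return [(thing_classes[c], s) for c, s in detections if s > threshold]

print(keep_confident(raw_detections, SCORE_THRESH_TEST))
# [('ENGLISH', 0.97), ('HINDI', 0.85)]
```

Raising the threshold gives fewer but more reliable boxes in the visualization; lowering it shows more boxes, including weaker guesses.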


Step 6: Evaluate the trained model

Model evaluation generally follows the COCO evaluation standard. Mean average precision (mAP) is used to measure the model’s performance.

This is an article about mAP:

#import the COCO Evaluator to use the COCO Metrics
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

#Call the COCO Evaluator function and pass the Validation Dataset
evaluator = COCOEvaluator("boardetect_val", cfg, False, output_dir="/output/")
val_loader = build_detection_test_loader(cfg, "boardetect_val")

#Use the created predicted model in the previous step
inference_on_dataset(predictor.model, val_loader, evaluator)

At an IoU threshold of 0.5, we get an AP of about 79.4%, which is not bad. This can likely be improved by tuning the parameters slightly and increasing the number of iterations. But keep a close eye on the training process, because the model may overfit.
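
The IoU threshold decides whether a predicted box counts as a match for a ground-truth box. Below is a minimal sketch of intersection over union for two boxes in [x0, y0, x1, y1] form, written in plain Python. The two boxes are hypothetical, and iou_xyxy is my own helper, not the function the COCO evaluator uses internally.

```python
def iou_xyxy(a, b):
    """Intersection over union of two [x0, y0, x1, y1] boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so boxes that do not overlap give an empty intersection.
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = [100, 50, 300, 130]  # hypothetical annotated box
prediction = [120, 60, 310, 140]    # hypothetical predicted box

iou = iou_xyxy(ground_truth, prediction)
print(round(iou, 3))  # 0.677
print(iou >= 0.5)     # True: counts as a match at the 0.5 threshold
```

Here the overlap is 12600 square pixels out of a union of 18600, so the prediction would count as correct at IoU 0.5 but not at a stricter threshold such as 0.75.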

If you need to infer from the saved model, please visit:


In this article, I focused on the process of object detection on a custom dataset with Detectron2, rather than on achieving higher accuracy.

Although the process looks very simple, there is still a lot to explore in the Detectron2 library. There are many parameters that can be tuned further to achieve higher accuracy, and the right values depend entirely on your custom dataset.

You can download the notebook from my GitHub repository and try running it on Google Colab or Jupyter notebooks.

I hope you’ve learned something new today.

Link to the original text:
