By Akarsh Yelitsty
Source: Towards Data Science
Let’s see how to use Detectron2 from FAIR (Facebook AI Research) for instance detection on a custom dataset involving text recognition.
Have you ever tried to train an object detection model from scratch using a custom dataset of your own choice?
If so, you will know how tedious the process is. Whether we choose a region-proposal-based method such as Faster R-CNN, or a single-stage detector such as SSD or YOLO, we need to build the model around components like a feature pyramid network and a region proposal network. Any of these is fairly complicated to implement from scratch. What we need is a framework in which we can use state-of-the-art models such as Fast, Faster, and Mask R-CNN out of the box. That said, building a model from scratch is still valuable for understanding the mathematics behind it.
If we want to quickly train an object detection model on a custom dataset, Detectron2 can help. All the models in the Detectron2 model zoo are pre-trained on the COCO dataset; we only need to fine-tune a pre-trained model on our custom dataset.
Detectron2 is a complete rewrite of the original Detectron, released in 2018, which was written in Caffe2, a deep learning framework backed by Facebook. Caffe2 and Detectron are now deprecated: Caffe2 has been merged into PyTorch, and its successor, Detectron2, is written entirely in PyTorch.
Detectron2 aims to advance machine learning by offering fast training and by addressing the issues that arise on the way from research to production.
Detectron2 provides various types of object detection models. Let’s look directly at instance detection.
Instance detection refers to classifying and localizing objects with bounding boxes. In this article, we will use the Faster R-CNN model from the Detectron2 model zoo to recognize the language of text in images.
Note that we limit the languages to two: we detect Hindi and English text, and provide a class called “others” for any other language.
We will implement a model that produces this kind of output.
Let’s get started!
With Detectron2, you can perform object detection on any custom dataset in six steps. All of these steps are available in a Google Colab notebook that you can run right away!
Doing this on Google Colab is easy because we can train faster with a GPU.
Step 1: Install Detectron2
First, install some dependencies such as torchvision and the COCO API, then check whether CUDA is available (torch.cuda keeps track of the currently selected GPU). Finally, install Detectron2.
# install dependencies:
!pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html
!pip install cython pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version

# install detectron2:
!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
Step 2: Prepare and Register the Dataset
Import the necessary packages.
# You may need to restart your runtime prior to this, to let your installation take effect
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
The built-in datasets page lists the datasets that Detectron2 supports out of the box. If you want to use a custom dataset while reusing Detectron2’s data loaders, you need to register the dataset (that is, tell Detectron2 how to obtain it).
- Built-in datasets: https://detectron2.readthedocs.io/tutorials/builtin_datasets.html
We use a text detection dataset with three classes: Hindi, English, and Other.
We will train the text detection model starting from a model pre-trained on the COCO dataset, available in the Detectron2 model zoo.
If you are interested in how the original dataset format is converted into the format accepted by Detectron2, please see:
How do we feed data into the model? The input data must be in a supported format, such as YOLO, Pascal VOC, or COCO format. Detectron2 accepts datasets in COCO format. A COCO-format dataset consists of a JSON file containing all the details of each image, such as its size, the annotations (i.e. the bounding box coordinates), the label corresponding to each bounding box, and so on.
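For illustration, a single image entry in such a JSON file might look like the following. This is a sketch with made-up values; the field names match those used by the registration code later in this article.

```python
import json

# A made-up example of one image entry in a COCO-style dataset.json,
# with the fields the registration code below expects.
entry = {
    "file_name": "img_0001.jpg",
    "height": 480,
    "width": 640,
    "image_id": 1,
    "annotations": [
        {
            "bbox": [120, 60, 200, 40],  # [x, y, width, height] in absolute pixels
            "bbox_mode": 1,              # corresponds to BoxMode.XYWH_ABS
            "category_id": "0",          # class index (cast to int when loading)
        }
    ],
}
print(json.dumps(entry, indent=2))
```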
Bounding boxes can be represented in several formats; in Detectron2 the format must be a member of structures.BoxMode. There are five such formats, but currently Detectron2 supports BoxMode.XYXY_ABS and BoxMode.XYWH_ABS. We use the second format: (X, Y) is the top-left coordinate of the bounding box, and W and H are its width and height. category_id refers to the class that the bounding box belongs to.
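As a quick sketch of the difference between the two supported box modes (plain Python, no Detectron2 required):

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] (XYWH_ABS) to [x0, y0, x1, y1] (XYXY_ABS)."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

# A 200x40 box whose top-left corner is at (120, 60):
print(xywh_to_xyxy([120, 60, 200, 40]))  # -> [120, 60, 320, 100]
```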
Then we need to register our dataset.
import json
from detectron2.structures import BoxMode

def get_board_dicts(imgdir):
    json_file = imgdir + "/dataset.json"  # Fetch the JSON file
    with open(json_file) as f:
        dataset_dicts = json.load(f)
    for i in dataset_dicts:
        filename = i["file_name"]
        i["file_name"] = imgdir + "/" + filename
        for j in i["annotations"]:
            j["bbox_mode"] = BoxMode.XYWH_ABS  # Setting the required box mode
            j["category_id"] = int(j["category_id"])
    return dataset_dicts

from detectron2.data import DatasetCatalog, MetadataCatalog

# Registering the dataset
for d in ["train", "val"]:
    DatasetCatalog.register("boardetect_" + d, lambda d=d: get_board_dicts("Text_Detection_Dataset_COCO_Format/" + d))
    MetadataCatalog.get("boardetect_" + d).set(thing_classes=["HINDI", "ENGLISH", "OTHER"])
board_metadata = MetadataCatalog.get("boardetect_train")
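One detail worth noting in the registration loop above is the `lambda d=d:` default argument. Python closures bind late: without the default argument, every registered lambda would see the final value of `d`. A minimal stand-alone illustration:

```python
# Late binding: each lambda captures the *variable* d, not its value
# at definition time, so all of them end up seeing the last value.
late = [lambda: d for d in ["train", "val"]]
print([f() for f in late])   # -> ['val', 'val']

# Binding d as a default argument freezes its current value per lambda.
bound = [lambda d=d: d for d in ["train", "val"]]
print([f() for f in bound])  # -> ['train', 'val']
```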
To verify that the data is loaded correctly, let’s visualize the annotations of randomly selected samples from the training set.
Step 3: Visualize the Training Set
We’ll randomly select three images from the train folder of the dataset and see what the bounding boxes look like.
# Visualizing the train dataset
dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/train")
# Randomly choosing 3 images from the set
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=board_metadata)
    vis = visualizer.draw_dataset_dict(d)
    cv2_imshow(vis.get_image()[:, :, ::-1])
The output looks like this,
Step 4: Train the Model
We’ve reached the major step: configuring and setting up the model for training. Technically, we are just fine-tuning the model on our dataset, since it has already been pre-trained on the COCO dataset.
Detectron2’s model zoo has plenty of models available for object detection. Here, we use faster_rcnn_R_50_FPN_3x.
The model has a backbone network (ResNet, in this case) that extracts features from the image, followed by a region proposal network that generates region proposals, and a box head that refines the bounding boxes.
You can read more about how Faster R-CNN works in my previous article.
Let’s set the configuration for training.
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator
from detectron2.config import get_cfg
import os

# A trainer that evaluates the validation set with COCO metrics every EVAL_PERIOD iterations
class CocoTrainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = "coco_eval"
            os.makedirs(output_folder, exist_ok=True)
        return COCOEvaluator(dataset_name, cfg, False, output_folder)

cfg = get_cfg()
# Get the basic model configuration from the model zoo
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
# Passing the train and validation sets
cfg.DATASETS.TRAIN = ("boardetect_train",)
cfg.DATASETS.TEST = ("boardetect_val",)
# Number of data loading threads
cfg.DATALOADER.NUM_WORKERS = 4
# Let training initialize from the model zoo checkpoint
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
# Number of images per batch across all machines
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.0125  # pick a good learning rate
cfg.SOLVER.MAX_ITER = 1500   # number of iterations
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3  # classes: [HINDI, ENGLISH, OTHER]
cfg.TEST.EVAL_PERIOD = 500  # number of iterations after which the validation set is evaluated

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = CocoTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
This is not necessarily the best configuration; other configurations may well improve accuracy. After all, it comes down to choosing the right hyperparameters.
Note that we also evaluate accuracy on the validation set every 500 iterations.
Step 5: Use the Trained Model for Inference
Now it’s time to run inference with the trained model on the validation set.
After training completes successfully, an output folder is saved to local storage, with the final weights stored inside. You can keep this folder to run inference from this model in the future.
from detectron2.utils.visualizer import ColorMode

# Use the final weights generated after successful training for inference
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.8  # set the testing threshold for this model
# Pass the validation dataset
cfg.DATASETS.TEST = ("boardetect_val",)
predictor = DefaultPredictor(cfg)

dataset_dicts = get_board_dicts("Text_Detection_Dataset_COCO_Format/val")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)
    v = Visualizer(im[:, :, ::-1],
                   metadata=board_metadata,
                   scale=0.8,
                   instance_mode=ColorMode.IMAGE)
    # Move the predictions from the GPU to the CPU before drawing
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(v.get_image()[:, :, ::-1])
Step 6: Evaluate the Trained Model
Model evaluation generally follows the COCO evaluation criteria: mean average precision (mAP) is used to measure the model’s performance.
# Import the COCO evaluator to use the COCO metrics
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# Call the COCO evaluator and pass the validation dataset
evaluator = COCOEvaluator("boardetect_val", cfg, False, output_dir="/output/")
val_loader = build_detection_test_loader(cfg, "boardetect_val")
# Use the predictor created in the previous step
inference_on_dataset(predictor.model, val_loader, evaluator)
At an IoU threshold of 0.5, we get an average precision of about 79.4%, which is not bad. It can be increased by tweaking the parameters and increasing the number of iterations, but keep a close eye on the training process, as the model may overfit.
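For reference, IoU (intersection over union) measures how much a predicted box overlaps a ground-truth box; a prediction counts as correct here when its IoU with the ground truth is at least 0.5. A minimal sketch in plain Python, using the same [x, y, width, height] box format as the dataset:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [x, y, width, height]."""
    ax0, ay0, aw, ah = box_a
    bx0, by0, bw, bh = box_b
    ax1, ay1 = ax0 + aw, ay0 + ah
    bx1, by1 = bx0 + bw, by0 + bh
    # Intersection rectangle (zero if the boxes do not overlap)
    iw = max(0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Two boxes shifted by half their width overlap with IoU 1/3:
print(iou([0, 0, 100, 100], [50, 0, 100, 100]))  # -> 0.333...
```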
If you need to run inference from the saved model, please visit: https://colab.research.google.com/drive/1d0kXs-TE7_3CXldJNs1WsEshXf8Gw_5n?usp=sharing
In this article, I focused on the process of object detection with Detectron2 on a custom dataset, rather than on achieving higher accuracy.
Although this seems like a very simple process, there is still a lot to explore in the Detectron2 library. It offers a large number of parameters that can be tuned further for higher accuracy, depending entirely on one’s custom dataset.
You can download the notebook from my GitHub repository and try running it on Google Colab or in Jupyter notebooks.
I hope you’ve learned something new today.
Link to the original article: https://towardsdatascience.com/object-detection-in-6-steps-using-detectron2-705b92575578