By Prateek Joshi
Translated by Flin
- Are you excited about the idea of smart cities? If so, you’ll love this tutorial on building your own vehicle detection system
- Before diving into the implementation, we will first learn how to detect moving objects in a video
- We will use OpenCV and Python to build an automatic vehicle detector
I love the idea of smart cities: automated intelligent energy systems, smart grids, one-touch access points, and so on. It’s a fascinating concept! To be honest, it’s a data scientist’s dream, and I’m glad that many cities around the world are moving toward becoming smarter.
One of the core components of a smart city is automated traffic management. And that got me thinking: could I use my data science skills to build a vehicle detection model that plays a part in smart traffic management?
Think about it: if you could integrate a vehicle detection system into a traffic light camera, you could easily track a number of useful things at the same time:
- How many vehicles pass through the intersection during the day?
- What time of day does traffic build up?
- Which intersections do heavy vehicles pass through?
- Is there a way to optimize traffic by distributing it across different streets?
And these are just a few examples; the applications are endless!
We humans can detect and recognize objects in complex scenes in the blink of an eye. Teaching a computer vision algorithm to do the same, however, takes some work.
Therefore, in this article, we will build an automatic vehicle detection and counting model. Here’s what you can look forward to:
Note: new to the concepts of deep learning and computer vision? Here are two popular courses to kick off your deep learning journey:
- Fundamentals of deep learning (https://courses.analyticsvidh…)
- Computer vision using deep learning (https://courses.analyticsvidh…)
- The idea behind detecting moving objects in videos
- Real-world use cases of object detection in videos
Basic concepts of object detection in videos
- Frame differencing
- Image thresholding
- Contour detection
- Image dilation
- Building a vehicle detection system with OpenCV
The idea behind detecting moving objects in videos
Object detection is a fascinating field in computer vision. It reaches a whole new level of complexity when we deal with video data, but the payoff is worth it!
We can use object detection algorithms for extremely useful, high-value tasks such as surveillance, traffic management, and crime fighting. The following GIF illustrates the idea:
In object detection, we can perform many subtasks, such as counting objects, finding their relative sizes, or measuring the relative distances between them. These subtasks help solve some of the toughest real-world problems.
If you want to learn object detection from scratch, I suggest the following tutorials:
- A step-by-step introduction to basic object detection algorithms (https://www.analyticsvidhya.c…)
- Real-time object detection using SlimYOLOv3 (https://www.analyticsvidhya.c…)
- Other object detection projects and resources (https://www.analyticsvidhya.c…)
Let’s look at some exciting real-world object detection use cases.
Real-world use cases of object detection in videos
Nowadays, video object detection is used across a wide range of industries, from video surveillance to sports broadcasting to robot navigation.
The good news is that the possibilities for future video object detection and tracking use cases are endless. Here are a few interesting applications:
- Crowd counting (https://www.analyticsvidhya.c…)
- License plate detection and recognition
- Ball tracking in sports (https://www.analyticsvidhya.c…)
- Traffic management (we’ll see this idea in this article)
Basic concepts of object detection in videos
Before you start building a video detection system, you should know a few key concepts. Once you are familiar with them, you can build a detection system for any use case you choose.
So, how do we detect moving objects in a video?
Our goal is to capture the coordinates of a moving object and highlight it in the video. Consider the following frame from a video:
We want our model to detect the moving object in the video, as shown in the figure above: a moving car is detected and a bounding box is drawn around it.
There are many ways to solve this problem. You can train a deep learning model for object detection, or you can pick a pre-trained model and fine-tune it on your data. However, these are supervised learning approaches and require labeled data to train the object detection model.
In this article, we will focus on unsupervised object detection in videos, that is, object detection without any labeled data. We will use the frame differencing technique. Let’s see how it works!
Frame differencing
A video is a set of frames stacked together in the correct order. So when we see an object moving in a video, it means that the object is at a different position in each consecutive frame.
If we assume that nothing except the target object moves in a pair of consecutive frames, then the pixel-wise difference between the first frame and the second frame highlights the pixels of the moving object. This gives us both the pixels and the coordinates of the moving object. That is how frame differencing works.
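The frame differencing idea can be sketched on toy data. Below is a minimal pure-Python illustration (the 4×4 "frames" and pixel values are made up for demonstration; real frames would come from cv2.imread, and the difference would be computed with cv2.absdiff):

```python
# Two tiny 4x4 grayscale "frames": a bright object (value 200)
# moves one pixel to the right between frame1 and frame2.
frame1 = [
    [10, 10, 10, 10],
    [10, 200, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame2 = [
    [10, 10, 10, 10],
    [10, 10, 200, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

# absolute pixel-wise difference, like cv2.absdiff
diff = [[abs(a - b) for a, b in zip(r1, r2)]
        for r1, r2 in zip(frame1, frame2)]

# only the pixels the object left and entered are non-zero
moving_pixels = [(y, x) for y, row in enumerate(diff)
                 for x, v in enumerate(row) if v > 0]
print(moving_pixels)  # [(1, 1), (1, 2)]
```

Every stationary pixel cancels out to zero; only the positions the object vacated and entered survive the difference.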
Let’s take an example. Consider the following two frames from a video:
Can you see the difference between the two frames?
The position of the hand holding the pen has changed between frame 1 and frame 2; nothing else has moved. So, as mentioned earlier, to locate the moving object we perform frame differencing. The result looks like this:
You can see the highlighted, white region where the hand was present. In addition, the edge of the notepad is also highlighted, probably because the hand’s movement changed the lighting. Such unwanted detections of stationary objects should be avoided, so we need to perform some image preprocessing steps on the frames.
Image thresholding
In this method, the pixel values of a grayscale image are mapped to one of two values, representing black and white, based on a threshold: if a pixel’s value is greater than the threshold, it is assigned one value; otherwise, it is assigned the other.
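A minimal sketch of binary thresholding on made-up pixel values (pure Python; OpenCV performs the same mapping via cv2.threshold with cv2.THRESH_BINARY):

```python
# Binary thresholding: values strictly above the threshold become
# 255 (white), everything else becomes 0 (black).
pixels = [5, 28, 30, 31, 190, 255]  # toy grayscale values
threshold = 30

binary = [255 if p > threshold else 0 for p in pixels]
print(binary)  # [0, 0, 0, 255, 255, 255]
```

Note that a value exactly equal to the threshold maps to black, which matches OpenCV’s `dst = maxval if src > thresh else 0` convention.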
In this example, we will apply image thresholding to the output of the frame differencing step:
As you can see, most of the unwanted highlights are gone; the highlighted notepad edge is no longer visible. The resulting image is also called a binary image, since it contains only two colors. In the next step, we will see how to capture these highlighted regions.
Contour detection
Contours are used to identify the shapes of regions in an image that share the same color or intensity. A contour is the boundary around a region of interest. So if we apply contour detection to the image from the thresholding step, we get the following result:
The white regions are surrounded by light gray boundaries, which are the contours. We can easily get the coordinates of these contours, which means we have the locations of the highlighted regions.
Notice that there are multiple highlighted regions, each surrounded by its own contour. In our case, the contour with the largest area is the one we want, so it is best to have as few contours as possible.
In the image above, there are still some unwanted fragments of white. There is room for improvement: if we merge nearby white regions, we end up with fewer contours. For that, we can use another technique called image dilation.
Image dilation
Dilation is a convolution-style operation on the image in which a kernel (a small matrix) is passed over the whole image. To build your intuition, the image on the right is a dilated version of the image on the left:
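The merging effect of dilation can be sketched on a toy binary image (pure Python; OpenCV does this with cv2.dilate and a kernel such as np.ones((3,3), np.uint8)). With a 3×3 all-ones kernel, a pixel turns white if any pixel in its 3×3 neighbourhood is white, so two nearby blobs grow into one region:

```python
# Toy 3x5 binary image with two separate white pixels
img = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

h, w = len(img), len(img[0])
dilated = [[0] * w for _ in range(h)]
for y in range(h):
    for x in range(w):
        # a pixel becomes white if ANY neighbour under the
        # 3x3 kernel (clipped at the borders) is white
        dilated[y][x] = max(
            img[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))
        )

print(dilated[1])  # [1, 1, 1, 1, 1] -- the gap between the blobs is filled
```

After dilation, the two white pixels have merged into a single connected region, so contour detection would find one contour instead of two.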
So, let’s dilate our image and then find the contours again:
Here, we have only four candidate contours, from which we can choose the one with the largest area. You can also draw them on the original frame to see how well the contours surround the moving object:
Building a vehicle detection system with OpenCV and Python
We are now going to build our vehicle detection system! In this implementation, we will use the computer vision library OpenCV (version 4.0.0) (https://www.analyticsvidhya.c…). Let’s first import the required libraries and modules.
```python
import os
import re
import cv2  # OpenCV library
import numpy as np
from os.path import isfile, join
import matplotlib.pyplot as plt
```
Import video frames
Please download the frames of the original video from this link.
Save the frames in a folder named frames in your working directory. From this folder, we will import the frames and store them in a list:
```python
# get file names of the frames
col_frames = os.listdir('frames/')

# sort file names numerically
col_frames.sort(key=lambda f: int(re.sub(r'\D', '', f)))

# empty list to store the frames
col_images = []

for i in col_frames:
    # read the frame
    img = cv2.imread('frames/' + i)
    # append the frame to the list
    col_images.append(img)
```
Let’s show two consecutive frames:
```python
# plot the 13th and 14th frames
i = 13

for frame in [i, i+1]:
    plt.imshow(cv2.cvtColor(col_images[frame], cv2.COLOR_BGR2RGB))
    plt.title("frame: " + str(frame))
    plt.show()
```
It’s hard to spot a difference between these two frames, isn’t it? As mentioned earlier, taking the difference between the pixel values of two consecutive frames helps us observe the moving object. So let’s apply the technique to the two frames above:
```python
# convert the frames to grayscale
grayA = cv2.cvtColor(col_images[i], cv2.COLOR_BGR2GRAY)
grayB = cv2.cvtColor(col_images[i+1], cv2.COLOR_BGR2GRAY)

# plot the image after frame differencing
plt.imshow(cv2.absdiff(grayB, grayA), cmap='gray')
plt.show()
```
Now we can clearly see the moving object between frames 13 and 14. Everything that is not moving is subtracted out.
Let’s see what happens when thresholds are applied to the above image:
```python
diff_image = cv2.absdiff(grayB, grayA)

# perform image thresholding
ret, thresh = cv2.threshold(diff_image, 30, 255, cv2.THRESH_BINARY)

# plot image after thresholding
plt.imshow(thresh, cmap='gray')
plt.show()
```
Now the moving objects (vehicles) look more like what we expect, and most of the noise (unwanted white regions) is gone. However, the highlighted regions are a bit fragmented, so we can apply image dilation to the image:
```python
# apply image dilation
kernel = np.ones((3,3), np.uint8)
dilated = cv2.dilate(thresh, kernel, iterations=1)

# plot dilated image
plt.imshow(dilated, cmap='gray')
plt.show()
```
The moving objects now have more solid highlighted regions; ideally, each object in a frame should produce no more than a few contours.
However, we will not use the entire frame to detect moving vehicles. Instead, we will first define a detection zone; a vehicle is detected only when it enters that zone.
So let me show you the zone we are going to use:
```python
# plot the vehicle detection zone
cv2.line(dilated, (0, 80), (256, 80), (100, 0, 0))
plt.imshow(dilated)
plt.show()
```
The area below the horizontal line y = 80 is our vehicle detection zone; we will only detect movement that happens in this area. You can also create a detection zone of your own choosing.
Now let’s find the contours in the detection zone of the frame above:
```python
# find contours
contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
```
The code above finds all the contours in the entire image and stores them in the variable “contours”. Since we only want the contours inside the detection zone, we will apply two checks to each detected contour.
The first check is that the y-coordinate of the contour’s top-left corner must be greater than or equal to 80 (I also include a check that the x-coordinate is less than or equal to 200). The second check is that the contour’s area must be greater than or equal to 25. You can find a contour’s area with the cv2.contourArea() function.
```python
valid_cntrs = []

for i, cntr in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cntr)
    if (x <= 200) & (y >= 80) & (cv2.contourArea(cntr) >= 25):
        valid_cntrs.append(cntr)

# count of discovered contours
len(valid_cntrs)
```
Next, let’s draw the valid contours on the original frame:
```python
dmy = col_images[13].copy()
cv2.drawContours(dmy, valid_cntrs, -1, (127, 200, 0), 2)
cv2.line(dmy, (0, 80), (256, 80), (100, 255, 255))
plt.imshow(dmy)
plt.show()
```
That’s cool! Only the contours of vehicles inside the detection zone are visible. This is how we detect vehicles in the whole image.
Vehicle detection in video
It’s time to apply the same image transformations and preprocessing to all frames and find the desired contours. To recap, we will follow these steps:
- Apply frame differencing to each pair of consecutive frames
- Apply image thresholding to the output of the previous step
- Apply image dilation to the output of the previous step
- Find the contours in the output of the previous step
- Shortlist the candidate contours inside the detection zone
- Save frame and final outline
```python
# kernel for image dilation
kernel = np.ones((4,4), np.uint8)

# font style
font = cv2.FONT_HERSHEY_SIMPLEX

# directory to save the output frames
pathIn = "contour_frames_3/"

for i in range(len(col_images)-1):

    # frame differencing
    grayA = cv2.cvtColor(col_images[i], cv2.COLOR_BGR2GRAY)
    grayB = cv2.cvtColor(col_images[i+1], cv2.COLOR_BGR2GRAY)
    diff_image = cv2.absdiff(grayB, grayA)

    # image thresholding
    ret, thresh = cv2.threshold(diff_image, 30, 255, cv2.THRESH_BINARY)

    # image dilation
    dilated = cv2.dilate(thresh, kernel, iterations=1)

    # find contours
    contours, hierarchy = cv2.findContours(dilated.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

    # shortlist contours appearing in the detection zone
    valid_cntrs = []
    for cntr in contours:
        x, y, w, h = cv2.boundingRect(cntr)
        if (x <= 200) & (y >= 80) & (cv2.contourArea(cntr) >= 25):
            if (y >= 90) & (cv2.contourArea(cntr) < 40):
                break
            valid_cntrs.append(cntr)

    # add contours to the original frame
    dmy = col_images[i].copy()
    cv2.drawContours(dmy, valid_cntrs, -1, (127, 200, 0), 2)

    cv2.putText(dmy, "vehicles detected: " + str(len(valid_cntrs)), (55, 15), font, 0.6, (0, 180, 0), 2)
    cv2.line(dmy, (0, 80), (256, 80), (100, 255, 255))
    cv2.imwrite(pathIn + str(i) + '.png', dmy)
```
Prepare the video
Here, we added contours for all the moving vehicles in every frame. Now it’s time to stack the frames together and create a video:
```python
# specify video name
pathOut = 'vehicle_detection_v3.mp4'

# specify frames per second
fps = 14.0
```
Next, we will read the saved frames, in order, into a list:
```python
frame_array = []
files = [f for f in os.listdir(pathIn) if isfile(join(pathIn, f))]
files.sort(key=lambda f: int(re.sub(r'\D', '', f)))

for i in range(len(files)):
    filename = pathIn + files[i]

    # read the frame
    img = cv2.imread(filename)
    height, width, layers = img.shape
    size = (width, height)

    # insert the frame into the image array
    frame_array.append(img)
```
Finally, we will use the following code to create a target detection video:
```python
out = cv2.VideoWriter(pathOut, cv2.VideoWriter_fourcc(*'DIVX'), fps, size)

for i in range(len(frame_array)):
    # write the frames to the video file
    out.write(frame_array[i])

out.release()
```
Congratulations on learning vehicle detection!
In this tutorial, we learned how to detect moving objects in a video using the frame differencing technique. We also covered several object detection and image processing concepts, and then built our own moving object detection system using OpenCV.
I am sure that, using the techniques and methods you learned in this article, you will build your own version of an object detection system.
Link to the original text: https://www.analyticsvidhya.c…