Introduction
Before diving into the implementation, let’s first understand object detection and object tracking.
Object detection: It is the process of locating and recognizing objects in an image or a video frame. The presence of multiple objects is determined in the input data, and bounding boxes are drawn around them to represent their locations.
For each object, a detector typically provides the bounding box coordinates that represent its position, a class label such as “cat”, “crosswalk”, or “bird”, and a confidence score that expresses the algorithm’s level of certainty about the detection.
It’s vital to keep in mind that object detection is usually performed on individual frames or images and does not consider the movement or trajectory of objects across successive frames.
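To make this concrete, a single detection is usually represented by box coordinates, a class label, and a confidence score. The snippet below is a hypothetical illustration of that structure (the `detections` list and all its values are made up purely for this example):

```python
# Each detection: bounding box (x1, y1, x2, y2), class label, confidence score.
# The values below are made up purely for illustration.
detections = [
    {"box": (34, 50, 120, 180), "label": "cat", "confidence": 0.92},
    {"box": (200, 40, 260, 95), "label": "bird", "confidence": 0.78},
]

# A typical post-processing step: keep only confident detections
confident = [d for d in detections if d["confidence"] > 0.8]
print(confident[0]["label"])  # only the "cat" detection is above 0.8
```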
Object tracking: It is defined as following a specific object’s movement across a sequence of video frames. The goal is to keep the tracked object’s identity constant even as it moves through the video. This technique is particularly useful in scenarios like surveillance, self-driving cars, and video analysis.
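As a rough sketch of the idea (a naive illustration, not the DeepSORT algorithm used later in this article), a tracker can match each new detection to the previous frame’s track whose box overlaps it the most, so the object keeps the same ID from frame to frame:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_tracks(prev_tracks, new_boxes, threshold=0.3):
    """Assign each new box the ID of the best-overlapping previous track."""
    assignments = {}
    for i, box in enumerate(new_boxes):
        best_id, best_iou = None, threshold
        for track_id, prev_box in prev_tracks.items():
            score = iou(box, prev_box)
            if score > best_iou:
                best_id, best_iou = track_id, score
        assignments[i] = best_id
    return assignments

# A person at (10, 10, 50, 90) moves slightly between frames:
prev = {1: (10, 10, 50, 90)}
print(match_tracks(prev, [(12, 11, 52, 92)]))  # box 0 keeps track ID 1
```

Real trackers such as DeepSORT add motion models and appearance features on top of this matching step to survive occlusions and crossings.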
What is YOLO?
YOLO stands for ‘You Only Look Once’. It is an object detection technique that locates numerous objects within a video or an image in a single pass. In contrast to typical detection algorithms that scan an image repeatedly, YOLO divides the image into a grid and simultaneously predicts bounding boxes, class probabilities, and confidence scores for each grid cell. A few example use cases of YOLOv5 are face mask detection, object recognition, speed calculation, vehicle tracking, and so on.
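To illustrate the grid idea, the sketch below shows in simplified form (anchor boxes and multi-scale details are omitted, and the grid size and offsets are made-up numbers) how a cell’s predicted in-cell offsets can be decoded into image-space box centers:

```python
# Simplified sketch of YOLO-style grid decoding; anchors and multi-scale
# details are omitted, and the numbers are made up for illustration.
GRID = 7          # the image is divided into a 7x7 grid
IMG_W, IMG_H = 448, 448

def decode_center(cell_col, cell_row, x_offset, y_offset):
    """Map a cell index plus in-cell offsets (each in [0, 1]) to pixel coords."""
    cx = (cell_col + x_offset) / GRID * IMG_W
    cy = (cell_row + y_offset) / GRID * IMG_H
    return cx, cy

# An object whose center the network places halfway into cell (3, 2):
print(decode_center(3, 2, 0.5, 0.5))  # (224.0, 160.0)
```

Because every cell makes its predictions at the same time, the whole image is processed in one forward pass, which is what makes YOLO fast.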
To know more about the YOLO Models refer to this link.
In this article, we will study how to use YOLOv5 for object tracking in videos.
The steps to be followed are:
Importing necessary libraries
Python3
import torch
from IPython.display import Image, clear_output
from IPython.display import HTML
from base64 import b64encode
Let’s now begin with cloning the required repository for this project.
Cloning Repository
Initially, we need to clone the GitHub repository of YOLOv5 using the command below.
Python3
!git clone --recurse-submodules https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch.git
After this step, we need to change the directory according to the cloned repository.
Python3
%cd Yolov5_DeepSort_Pytorch
Now, we will install the dependencies.
Python3
%pip install -qr requirements.txt
Now, we will get some system information to run this model efficiently.
Python3
# clear the output
clear_output()

# system information
print(f"Setup complete. Using torch {torch.__version__} "
      f"({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
Now, we will use a YOLOv5 model that is pre-trained on the CrowdHuman dataset.
Python3
# download the pre-trained model
!wget -nc https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch/releases/download/v.2.0/crowdhuman_yolov5m.pt -O /content/Yolov5_DeepSort_Pytorch/yolov5/weights/crowdhuman_yolov5m.pt
Now, let’s download a test video.
Python3
# getting the test video
!wget -nc https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch/releases/download/v.2.0/test.avi
After this step, we will extract just a few seconds of the starting portion of the video.
Python3
# extracting the first 2 seconds of the video
# (-y overwrites the output file without prompting)
!ffmpeg -y -ss 00:00:00 -i test.avi -t 00:00:02 -c copy out.avi
Now, let’s run the tracker on the source video.
Python3
!python track.py --yolo_model /content/Yolov5_DeepSort_Pytorch/yolov5/weights/yolov5n.pt --source out.avi --save-vid
In order to display the tracked video, we first need to convert it to the MP4 format. We are using ‘ffmpeg’ for this task.
Python3
!ffmpeg -i /content/Yolov5_DeepSort_Pytorch/runs/track/exp3/out.avi output.mp4
Now, to display the video, we use an HTML player. The code first reads the binary content of the MP4 file, encodes it with base64, and then embeds it as a data URL in an HTML video element.
Python3
mp4 = open('output.mp4', 'rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

# display with HTML
HTML("""
<video controls>
    <source src="%s" type="video/mp4">
</video>
""" % data_url)
Output:
Conclusion
YOLOv5 is one of the most efficient models for object detection and tracking, and it plays a significant role in real-world applications such as surveillance and security, autonomous vehicles like Tesla, sports analytics, etc. For further enhancement, we can also train this model on a custom dataset.