Object detection algorithms enable many advanced technologies and are a primary research focus in industries ranging from transportation to healthcare. For example, a common application of object detection is processing data from sensors, such as lidar, in autonomous cars to enable self-driving.
There are several object detection algorithms with different capabilities. These algorithms are mostly split into two groups according to how they perform their tasks.
The first group is composed of algorithms based on classification, which work in two stages. First, they select the regions of interest in the image, and then they classify the objects within those regions using Convolutional Neural Networks (CNNs). This group, which includes solutions such as R-CNN, is usually too slow for real-time applications.
The algorithms in the second group are based on regression: they scan the whole image in a single pass and make predictions to localize, identify, and classify objects within it. Algorithms in this group, such as You Only Look Once (YOLO), are faster and can be used for real-time object detection.
If you want to train a deep learning algorithm for object detection, you need to understand the different solutions available and know which one best suits your needs. Read this article to learn why YOLO is a better overall solution for real-time object detection, how it operates, and how you can use MissingLink to train it more efficiently.
You Only Look Once (YOLO) is a network that uses deep learning (DL) algorithms for object detection. YOLO performs object detection by classifying objects within the image and determining where they are located within it.
For example, if you input an image of a herd of sheep into a YOLO network, it will output a vector of bounding boxes, one for each individual sheep, each classified as a sheep.
How YOLO improves over previous object detection methods
Previous object detection methods such as the Region-based Convolutional Neural Network (R-CNN), along with variants like Fast R-CNN, performed object detection in a multi-step pipeline. R-CNN focuses on specific regions within the image and trains each component of the pipeline separately.
This process requires R-CNN to classify about 2,000 regions per image, which makes it very time-consuming (47 seconds per test image), so it cannot be used in real time. Additionally, R-CNN uses a fixed selective search algorithm for region proposals, which means no learning occurs during this stage, so the network may generate inferior region proposals.
This makes object detection networks such as R-CNN harder to optimize and slower than YOLO. YOLO is much faster (45 frames per second) and easier to optimize than previous algorithms because it uses a single neural network to perform the entire detection task.
To gain a better understanding of what YOLO is, we first have to explore its architecture and algorithm.
A YOLO network consists of three main parts: the algorithm (also known as the prediction vector), the network architecture, and the loss function.
Once you input an image into a YOLO algorithm, it splits the image into an S×S grid. Each grid cell is used to predict whether a bounding box contains an object (or part of it), and this information is then used to predict a class for the object.
Before we go into the details of how the algorithm functions, we need to understand how it builds and specifies each bounding box. The YOLO algorithm uses four components plus one additional value to predict each output.
The additional predicted value is the confidence score (pc), which represents the probability that an object exists within the bounding box.
The (x, y) coordinates represent the center of the bounding box, while the other two components, (w, h), represent its width and height.
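To make this encoding concrete, here is a minimal sketch in Python/NumPy of how a single grid cell's prediction could be decoded into pixel coordinates. The vector layout [pc, x, y, w, h, class scores...] and the normalization conventions (cell-relative center, image-relative width and height) are illustrative assumptions, not the exact Darknet implementation.

```python
import numpy as np

def decode_cell_prediction(pred, row, col, S, img_w, img_h):
    """Decode one grid cell's vector [pc, x, y, w, h, c1..cN] into a
    pixel-space box (assumed layout; real YOLO heads differ in detail)."""
    pc = pred[0]                      # confidence that the box contains an object
    x, y = pred[1], pred[2]           # box center, relative to the cell (0..1)
    w, h = pred[3], pred[4]           # box size, relative to the whole image (0..1)
    class_probs = pred[5:]            # one score per class

    # Shift the cell-relative center into absolute image coordinates.
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    bw, bh = w * img_w, h * img_h

    box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)  # x1, y1, x2, y2
    return pc, box, int(np.argmax(class_probs))

# Example: one cell of a 7x7 grid, raw prediction for a 448x448 image.
pred = np.array([0.9, 0.5, 0.5, 0.2, 0.3, 0.1, 0.8, 0.1])
print(decode_cell_prediction(pred, row=3, col=4, S=7, img_w=448, img_h=448))
```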
Typically, most of the bounding boxes will not contain an object, which is why we need the pc prediction. We can use a process called non-max suppression to remove boxes with a low probability of containing an object, as well as boxes that share large areas with other boxes.
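Below is a minimal non-max suppression sketch in NumPy. The score threshold (0.5) and IoU threshold (0.45) are illustrative defaults, not values prescribed by the YOLO papers.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over Union between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Keep only high-confidence boxes that do not heavily overlap a better box."""
    keep_mask = scores >= score_thresh          # drop low-confidence boxes first
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(scores)[::-1]            # best boxes first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # discard remaining boxes that overlap the selected box too much
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return boxes[keep], scores[keep]

boxes = np.array([[10, 10, 100, 100], [12, 12, 98, 102], [200, 200, 260, 260]], dtype=float)
scores = np.array([0.9, 0.75, 0.8])
print(non_max_suppression(boxes, scores))  # the second box is suppressed by the first
```

In practice, suppression is usually applied per class so that overlapping boxes of different classes do not suppress each other.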
A YOLO network is structured like a regular CNN: it contains convolutional and max-pooling layers followed by two fully connected layers.
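To illustrate that shape, here is a deliberately tiny PyTorch sketch. It is not the real Darknet backbone; it assumes a 448×448 input, an S=7 grid, B=2 boxes per cell, and 20 classes, so the head outputs an S×S×(B×5+C) prediction tensor as in YOLO V1.

```python
import torch
import torch.nn as nn

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (YOLO V1-style numbers)

class TinyYoloLike(nn.Module):
    """A drastically simplified stand-in for the Darknet backbone: a few
    convolution + max-pooling stages, then two fully connected layers that
    produce the S x S x (B*5 + C) prediction tensor."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2),    # 448 -> 224
            nn.Conv2d(16, 32, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(32, 64, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2),   # 112 -> 56
            nn.Conv2d(64, 128, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2),  # 56 -> 28
            nn.Conv2d(128, 256, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2), # 28 -> 14
            nn.Conv2d(256, 512, 3, padding=1), nn.LeakyReLU(0.1), nn.MaxPool2d(2), # 14 -> 7
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * S * S, 1024), nn.LeakyReLU(0.1),
            nn.Linear(1024, S * S * (B * 5 + C)),
        )

    def forward(self, x):
        return self.head(self.features(x)).view(-1, S, S, B * 5 + C)

model = TinyYoloLike()
out = model(torch.randn(1, 3, 448, 448))
print(out.shape)  # torch.Size([1, 7, 7, 30])
```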
Since the YOLO algorithm predicts multiple bounding boxes for each grid cell, we want only one of them to be responsible for the object within the image. To achieve this, the loss function computes the loss for each true positive using only the predicted bounding box with the highest Intersection over Union (IoU) with the ground truth. This causes each predictor to specialize, which improves predictions for certain aspect ratios and sizes.
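The sketch below, again in PyTorch (using torchvision's box_iou), illustrates this idea for a single grid cell that contains an object: pick the predicted box with the highest IoU against the ground truth and compute a simplified coordinate loss only for that responsible box. It is a conceptual illustration, not the full multi-part YOLO loss; lambda_coord only borrows the weighting idea from the paper.

```python
import torch
from torchvision.ops import box_iou

def responsible_box_loss(pred_boxes, gt_box, lambda_coord=5.0):
    """For one grid cell: select the predictor with the highest IoU against
    the ground-truth box and return a simplified coordinate loss for it.
    Boxes are (x1, y1, x2, y2); the real YOLO loss also includes confidence
    and classification terms and uses sqrt(w), sqrt(h) for the box size."""
    ious = box_iou(pred_boxes, gt_box.unsqueeze(0)).squeeze(1)  # IoU per predictor
    responsible = torch.argmax(ious)             # the box that "owns" this object
    coord_error = (pred_boxes[responsible] - gt_box) ** 2
    return lambda_coord * coord_error.sum(), responsible

# Two predicted boxes for one cell vs. a single ground-truth box.
preds = torch.tensor([[48., 52., 120., 150.], [40., 60., 100., 140.]])
gt = torch.tensor([50., 50., 118., 152.])
loss, idx = responsible_box_loss(preds, gt)
print(idx.item(), loss.item())
```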
The most current version of YOLO is the third iteration of the object detection network. The creators of YOLO designed each new version to improve on the previous one, mostly focusing on detection accuracy.
The first version of YOLO was introduced in 2015. It used a limited Darknet framework and was trained on the ImageNet-1000 dataset. This setup had many limitations that restricted the usability of YOLO V1. Namely, YOLO V1 struggled to identify small objects that appeared in clusters and was inefficient at generalizing to objects in images with dimensions different from the training images. This resulted in poor localization of objects within the input image.
YOLO V2 was released in 2016 under the name YOLO9000. YOLO V2 used Darknet-19, a 19-layer network augmented with 11 additional layers for object detection. YOLO V2 was designed to compete with Faster R-CNN and the Single Shot MultiBox Detector (SSD), which had shown better object detection scores.
YOLO V2 upgrades over YOLO V1 include:
YOLO V3 is an incremental upgrade over YOLO V2 and uses another variant of Darknet. The YOLO V3 architecture consists of 53 layers trained on ImageNet, plus another 53 layers tasked with object detection, for a total of 106 layers. While this dramatically improved the accuracy of the network, it also reduced the speed from 45 fps to 30 fps.
YOLO V3 upgrades over YOLO V2 include:
In this article, we explained the basics of You Only Look Once (YOLO), described what it is used for, and compared it against other object detection algorithms. Training and running a YOLO model for object detection requires multiple experiments, which can be highly demanding to run and track. These challenges, especially with unoptimized models, can lead to inflated storage and hardware costs and wasted time.
MissingLink is a deep learning platform that can help you optimize and automate operational elements of CNNs and computer vision, so you can concentrate on building winning image recognition experiments.
MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.
Request your personal demo to start training models faster