Building Faster R-CNN on TensorFlow: Introduction and Examples
The widespread adoption of Convolutional Neural Networks (CNNs) has driven progress in the computer vision and object detection field. Architectures such as Faster R-CNN, R-FCN, Multibox, SSD, and YOLO provide a framework for modern object detectors.
TensorFlow lets you build Faster R-CNN architectures to automatically recognize objects in images. TensorFlow has an official Object Detection API, which provides implementations of object detection pipelines, including Faster R-CNN, with pre-trained models.
In this article:
- Overview of R-CNN Algorithms for Object Detection
- Typical Steps in a Faster R-CNN Approach on TensorFlow
- Scaling Up Faster R-CNN on TensorFlow with MissingLink
- Faster R-CNN TensorFlow Tutorial
- Running Faster-RCNN on TensorFlow in the Real World
Overview of R-CNN Algorithms for Object Detection
There are four types of R-CNN: R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. Each type attempts to optimize, speed up, or enhance object detection results. Let’s compare these algorithms:
R-CNN
● Composed of 2 steps: selective search for region identification, then extraction of CNN features from each region independently for classification
● Training is expensive and slow; the process involves 3 separate models without much shared computation
● Extracts around 2,000 regions from each image
● Cannot be run in real time, as it takes around 47 seconds per test image
Fast R-CNN
● Each image is passed only once to the CNN, and feature maps are used to detect objects
● Uses a single R-CNN model and is much faster than R-CNN in both training and testing time
● Selective search is slow, so computation time is still high
● Region proposals are generated separately using a different model, which makes the process very expensive
Faster R-CNN
● Uses a unified model composed of an RPN (region proposal network) and Fast R-CNN with shared convolutional feature layers
● Object proposals with the RPN are time-consuming
● The performance of the previous system affects the performance of the current system
Mask R-CNN
● Applies Faster R-CNN to pixel-level image segmentation
● An additional branch is used in parallel with the existing branches to predict an object mask
● Improves the RoI pooling layer so that RoIs can be mapped more precisely to regions of the original image
Typical Steps in a Faster R-CNN Approach on TensorFlow
The following is a general process many practitioners use to run the Faster R-CNN algorithm on TensorFlow:
1. A ConvNet produces a feature map for the input image.
2. A region proposal network (RPN) is applied to these feature maps; it returns the object proposals along with their objectness scores.
3. A RoI pooling layer is applied to these proposals to produce a small feature map of fixed size.
4. The proposals are passed to a fully connected layer, which includes a softmax layer and a linear regression layer. This step classifies the objects and outputs their bounding boxes.
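The RoI pooling step can be sketched with NumPy. This is a simplified, single-channel max-pooling version for illustration only; the function name and grid handling here are my own, not the API’s implementation:

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool a region of interest into a fixed-size grid.

    feature_map: (H, W) array; roi: (y0, x0, y1, x1) in feature-map
    coordinates. Assumes the RoI is at least output_size in each dimension.
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1]
    rows, cols = output_size
    # Split the region into an approximately even rows x cols grid
    h_edges = np.linspace(0, region.shape[0], rows + 1, dtype=int)
    w_edges = np.linspace(0, region.shape[1], cols + 1, dtype=int)
    pooled = np.empty(output_size)
    for i in range(rows):
        for j in range(cols):
            cell = region[h_edges[i]:h_edges[i + 1],
                          w_edges[j]:w_edges[j + 1]]
            pooled[i, j] = cell.max()  # keep the strongest activation
    return pooled

fm = np.arange(16, dtype=float).reshape(4, 4)
print(roi_pool(fm, (0, 0, 4, 4)))  # [[5. 7.] [13. 15.]]
```

Whatever the size of the proposal, the output is always `output_size`, which is what lets proposals of varying shapes feed a fixed fully connected layer.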
Scaling Up Faster R-CNN on TensorFlow with MissingLink
If you’re working in the field of object detection, you probably have a large image dataset and need to run experiments on several machines. You might find yourself spending significant time setting up machines, copying data, and managing experiment execution.
MissingLink is a deep learning platform that lets you scale Faster R-CNN TensorFlow object detection models across hundreds of machines, either on-premise or in the cloud. It also helps you view hyperparameters and metrics across your team, manage large data sets, and manage experiments easily.
Faster R-CNN TensorFlow Tutorial
In this tutorial, we will create an object detector using our own dataset. The model will be trained with a small number of images. To improve performance, you can train the model with a larger dataset. This tutorial is inspired by Vijendra Singh’s project.
1. Create the dataset
- Choose an object you want to detect and take some photos of it. Use different backgrounds, angles, and distances.
- Transfer your images to a PC and resize them to a smaller size. That way training will go smoothly and you won’t run out of memory.
- Rename and separate the captured images into two folders. One folder for training (80%) and another for testing (20%).
- Label the training images using the labelImg library. LabelImg is a graphical image annotation tool written in Python.
- Labeling is done manually by drawing rectangles around objects and naming the labels.
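The rename-and-split step above can be scripted. Here is a minimal sketch of the 80/20 split (the file names are illustrative, and in practice you would also move each file into its train/ or test/ folder):

```python
import random

def split_dataset(filenames, train_ratio=0.8, seed=42):
    """Shuffle filenames deterministically and split into train/test lists."""
    files = sorted(filenames)          # stable starting order
    random.Random(seed).shuffle(files) # fixed seed -> reproducible split
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]

images = [f"img_{i:03d}.jpg" for i in range(50)]
train, test = split_dataset(images)
print(len(train), len(test))  # 40 10
```

Shuffling before splitting matters: photos taken in sequence tend to share backgrounds and lighting, and a sequential split would leak that structure into the test set.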
2. Set up a TensorFlow Object Detection API Environment
Clone the TensorFlow models repository, which contains the Object Detection API:
git clone https://github.com/tensorflow/models.git
Change your present working directory to models/research/ and add it to your Python path:
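Assuming the standard Object Detection API setup (the official instructions also compile the protobuf definitions with protoc before this step), the path can be added for the current shell session like this:

```shell
# Run from models/research; puts the API and its slim/ dependency
# on the Python path for this shell session only
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
```

Note that this export does not persist across sessions; add it to your shell profile if you want it to.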
3. Convert the data to TFRecord file format
The TensorFlow Object Detection API uses the TFRecord file format. TensorFlow provides a Python script to convert a Pascal VOC format dataset to TFRecord format. You have two options: either follow the Pascal VOC dataset format, or modify the TensorFlow script as needed.
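For context, labelImg saves annotations in Pascal VOC XML, one file per image. A minimal sketch of reading the fields the conversion relies on, using only the standard library (the annotation content below is illustrative):

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation, as produced by labelImg
VOC_XML = """
<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>mug</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>300</xmax><ymax>260</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):  # one <object> per labeled box
        box = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ))
    return root.findtext("filename"), boxes

print(parse_voc(VOC_XML))  # ('img_001.jpg', [('mug', 120, 80, 300, 260)])
```

Each box is an axis-aligned rectangle in pixel coordinates, which is exactly what the TFRecord conversion script expects to read from these files.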
4. Create a record file
With models/research as the present working directory, run the following command to create the TensorFlow record file:
python object_detection/dataset_tools/create_pascal_tf_record.py --data_dir=<path_to_your_dataset_directory> --annotations_dir=<name_of_annotations_directory> --output_path=<path_where_you_want_record_file_to_be_saved> --label_map_path=<path_of_label_map_file>
- Select a Faster R-CNN pre-trained model from the TensorFlow detection model zoo. TensorFlow provides a collection of detection models pre-trained on the COCO dataset.
- Extract all files to the pre-trained model folder.
- Copy the file models/research/object_detection/samples/configs/<your_model_name.config> into the project repo.
- Configure the 5 paths in this file: the fine-tune checkpoint, plus the record file and label map for each of the train and eval input readers.
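The five paths live in the pipeline config file. A trimmed sketch of the relevant fields, with placeholder values you would replace with your own paths and class count (the exact structure follows the sample config you copied):

```
model {
  faster_rcnn {
    num_classes: 1
  }
}
train_config: {
  fine_tune_checkpoint: "pre-trained-model/model.ckpt"
}
train_input_reader: {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_input_reader: {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
}
```

The same label map is used for training and evaluation; only the record files differ.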
- Run the following command with models/research as the present working directory:
python object_detection/legacy/train.py --train_dir=<path_to_the folder_for_saving_checkpoints> --pipeline_config_path=<path_to_config_file>
- Wait until the loss falls below 0.1, then interrupt training with Ctrl+C.
- Generate an inference graph from the saved checkpoints:
python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=<path_to_config_file> --trained_checkpoint_prefix=<path to saved checkpoint> --output_directory=<path_to_the_folder_for_saving_inference_graph>
Running Faster-RCNN on TensorFlow in the Real World
In this article, we explained how to create and run Faster-RCNN models to perform object detection with TensorFlow. When you start working with Faster-RCNN projects and running large numbers of experiments, you’ll encounter practical challenges:
Tracking experiment progress, source code, and hyperparameters across multiple experiments. To find the optimal model you will have to run hundreds or thousands of experiments. It can be challenging to manage so many experiments.
Running experiments across multiple machines—Faster-RCNN is computationally intensive, and real projects will require running experiments on multiple machines and GPUs. Provisioning these machines and distributing the work among them will consume valuable time.
Managing training data—Object detection in images and video can have very large datasets. Moving data between training machines will take time and slow you down, especially when you are running multiple experiments.