Deep Learning Frameworks

Building Faster R-CNN on TensorFlow: Introduction and Examples

The widespread adoption of Convolutional Neural Networks (CNNs) has driven progress in the computer vision and object detection field. Architectures such as Faster R-CNN, R-FCN, Multibox, SSD, and YOLO provide a framework for modern object detectors.


TensorFlow lets you build Faster R-CNN architectures to automatically recognize objects in images. TensorFlow has an official Object Detection API, which provides implementations of object detection pipelines, including Faster R-CNN, with pre-trained models.



Overview of R-CNN Algorithms for Object Detection

There are four types of R-CNN. Each type attempts to optimize, speed up, or enhance object detection results. Let’s compare these algorithms:






R-CNN

Composed of 2 steps:

●      Selective search for region identification

●      Extraction of CNN features from each region independently for classification

●      Training is expensive and slow

●      The process involves 3 separate models without much shared computation

●      Extracts around 2000 regions from each image

●      Cannot run in real time, since it takes around 47 seconds per test image

Fast R-CNN

●      Each image is passed only once to the CNN

●      Feature maps are used to detect objects

●      Uses a single R-CNN model

●      Much faster than R-CNN in both training and testing time

●      Selective search is slow and hence computation time is high

●      Region proposals are generated separately using a different model. This makes the process very expensive

Faster R-CNN

●      Uses a unified model composed of RPN (region proposal network) and fast R-CNN with shared convolutional feature layers

●      Object proposals with RPN are time-consuming

●      Because the stages run in sequence, the performance of each stage depends on the performance of the one before it

Mask R-CNN

●      Extends Faster R-CNN to pixel-level image segmentation

●      An additional branch is used in parallel with existing branches, to predict an object mask.

●      Improves on the RoI pooling layer (with RoIAlign) so that RoIs can be mapped more precisely to regions of the original image
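The RoI pooling step shared by these architectures can be illustrated with a simplified, pure-Python sketch (this is not the TensorFlow implementation): each region of interest on the feature map is split into a fixed grid of bins, and each bin is max-pooled, so every RoI yields an output of the same size regardless of its original dimensions.

```python
def roi_max_pool(feature_map, roi, output_size):
    """Max-pool one RoI on a 2-D feature map into a fixed-size grid.

    feature_map: list of lists (H x W) of numbers
    roi: (y0, x0, y1, x1) in feature-map coordinates, end-exclusive
    output_size: (rows, cols) of the pooled output
    """
    y0, x0, y1, x1 = roi
    out_h, out_w = output_size
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        row = []
        # Integer bin boundaries, as in the original RoI pooling
        ys, ye = y0 + i * h // out_h, y0 + (i + 1) * h // out_h
        for j in range(out_w):
            xs, xe = x0 + j * w // out_w, x0 + (j + 1) * w // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, max(ye, ys + 1))
                           for x in range(xs, max(xe, xs + 1))))
        pooled.append(row)
    return pooled
```

Because the output grid size is fixed, the downstream fully connected layers can accept RoIs of any shape.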



Running Faster R-CNN on TensorFlow: Typical Steps

The following is a general process many practitioners use to run the Faster R-CNN algorithm on TensorFlow:




A ConvNet takes the input image and produces a feature map.


A region proposal network (RPN) is applied to these feature maps and returns object proposals along with their objectness scores.


An RoI pooling layer is applied to these proposals to produce small feature maps of a fixed size.


The proposals are passed to a fully connected layer, which includes a softmax layer and a linear regression layer. This stage classifies each proposal and outputs bounding boxes for the detected objects.




[Image: typical steps of Faster R-CNN]
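The steps above can be summarized in a short illustrative sketch; all stage functions here are hypothetical stand-ins, shown only to make the hand-off between stages concrete:

```python
def faster_rcnn_sketch(image, backbone, rpn, roi_pool, head):
    """Illustrative data flow of Faster R-CNN; each stage is a callable."""
    feature_map = backbone(image)               # 1. ConvNet produces a feature map
    proposals = rpn(feature_map)                # 2. RPN returns proposals + objectness scores
    rois = [roi_pool(feature_map, p) for p in proposals]  # 3. fixed-size RoI features
    return [head(r) for r in rois]              # 4. classify + regress a box per RoI

# Dummy stages, just to show the shape of the hand-off between steps:
detections = faster_rcnn_sketch(
    image="raw pixels",
    backbone=lambda img: "feature map",
    rpn=lambda fm: ["proposal_1", "proposal_2"],
    roi_pool=lambda fm, p: (fm, p),
    head=lambda roi: {"class": "object", "box": roi[1]},
)
```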


Scaling Up Faster R-CNN on TensorFlow with MissingLink

If you’re working in the field of object detection, you probably have a large image dataset and need to run experiments on several machines. You might find yourself working hard to set up machines, copy data, and manage experiment execution.


MissingLink is a deep learning platform that lets you scale Faster R-CNN TensorFlow object detection models across hundreds of machines, either on-premise or in the cloud. It also helps you view hyperparameters and metrics across your team, manage large data sets, and manage experiments easily.


Learn more about MissingLink

Faster R-CNN TensorFlow Tutorial: Object Detection Using the TensorFlow Object Detection API

In this tutorial, we will create an object detector using our own dataset. The model will be trained with a small number of images. To improve performance, you can train the model with a larger dataset. This tutorial is inspired by Vijendra Singh’s project.


1. Creating the dataset


  • Choose an object you want to detect and take some photos of it. Use different backgrounds, angles, and distances.
  • Transfer your images to a PC and resize them to a smaller size. That way training will go smoothly and you won’t run out of memory.
  • Rename and separate the captured images into two folders. One folder for training (80%) and another for testing (20%).
  • Label the training images using the labelImg library. LabelImg is a graphical image annotation tool written in Python.
  • Labeling is done manually by drawing rectangles around objects and naming the labels.
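The rename-and-split step can be scripted. Below is a minimal sketch (the 80/20 ratio is the one suggested above; the function name and folder layout are our own):

```python
import os
import random
import shutil

def split_dataset(image_dir, train_dir, test_dir, train_fraction=0.8, seed=42):
    """Shuffle images and copy ~80% to train_dir, the rest to test_dir."""
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    images = sorted(f for f in os.listdir(image_dir)
                    if f.lower().endswith((".jpg", ".jpeg", ".png")))
    random.Random(seed).shuffle(images)  # fixed seed makes the split reproducible
    split = int(len(images) * train_fraction)
    for name in images[:split]:
        shutil.copy(os.path.join(image_dir, name), os.path.join(train_dir, name))
    for name in images[split:]:
        shutil.copy(os.path.join(image_dir, name), os.path.join(test_dir, name))
    return split, len(images) - split
```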

2. Set up a TensorFlow Object Detection API Environment


Clone the TensorFlow models repository, which contains the Object Detection API:


git clone https://github.com/tensorflow/models.git


Change your present working directory to models/research/ and add it to your Python path:


export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim


3. Convert the data to TFRecord file format


The TensorFlow Object Detection API uses the TFRecord file format. TensorFlow provides a Python script to convert a Pascal VOC format dataset to TFRecord format. You have two options: either follow the Pascal VOC dataset format, or modify the TensorFlow script as needed.
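For reference, labelImg saves annotations in Pascal VOC XML. A minimal sketch of reading such a file with Python's standard library shows what the conversion script consumes (the XML snippet is a made-up example):

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_text):
    """Extract (label, xmin, ymin, xmax, ymax) tuples from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        boxes.append((
            obj.find("name").text,
            int(box.find("xmin").text), int(box.find("ymin").text),
            int(box.find("xmax").text), int(box.find("ymax").text),
        ))
    return boxes

# A made-up annotation of the kind labelImg produces:
example = """<annotation>
  <filename>img0.jpg</filename>
  <object>
    <name>mug</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>120</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""
```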


4. Create a record file


With models/research as the present working directory, run the following command to create the TFRecord file:


python object_detection/dataset_tools/create_pascal_tf_record.py --data_dir=<path_to_your_dataset_directory> --annotations_dir=<name_of_annotations_directory> --output_path=<path_where_you_want_record_file_to_be_saved> --label_map_path=<path_of_label_map_file>


5. Training


  • Select a Faster R-CNN pre-trained model from Tensorflow detection model zoo. Tensorflow provides a collection of detection models pre-trained on the COCO dataset.
  • Extract all files to the pre-trained model folder.
  • Copy the file models/research/object_detection/samples/configs/<your_model_name.config> into the project repo.
  • Configure 5 paths in this file.
  • Run the following command with models/research as the present working directory:


python object_detection/legacy/train.py --train_dir=<path_to_the_folder_for_saving_checkpoints> --pipeline_config_path=<path_to_config_file>


  • Wait until the loss function drops below 0.1, then interrupt training from the keyboard (Ctrl+C).
  • Generate an inference graph from the saved checkpoints:


python object_detection/export_inference_graph.py --input_type=image_tensor --pipeline_config_path=<path_to_config_file> --trained_checkpoint_prefix=<path_to_saved_checkpoint> --output_directory=<path_to_the_folder_for_saving_inference_graph>
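The five paths to configure in the .config file (mentioned in step 5 above) typically look like the following fragment; all file paths here are illustrative placeholders, not real files:

```
fine_tune_checkpoint: "pre-trained-model/model.ckpt"
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader { input_path: "annotations/train.record" }
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader { input_path: "annotations/test.record" }
}
```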


Running Faster-RCNN on TensorFlow in the Real World

In this article, we explained how to create and run Faster-RCNN models to perform object detection with TensorFlow. When you start working with Faster-RCNN projects and running large numbers of experiments, you’ll encounter practical challenges:


tracking experiments

Tracking experiment progress, source code, and hyperparameters across multiple experiments. To find the optimal model you will have to run hundreds or thousands of experiments. It can be challenging to manage so many experiments.

running experiment across multiple machines

Running experiments across multiple machines—Faster-RCNN is computationally intensive. Real projects will require running experiments on multiple machines and GPUs. Provisioning these machines and distributing the work among them will consume valuable time.

manage training datasets

Managing training data—Object detection in images and video can have very large datasets. Moving data between training machines will take time and slow you down, especially when you are running multiple experiments.

MissingLink is a deep learning platform that can help you automate Faster-RCNN experiments on TensorFlow, so you can concentrate on building winning object detection experiments. Learn more to see how easy it is.

Learn More About Deep Learning Frameworks