TensorFlow Image Recognition with Object Detection API: Tutorials

TensorFlow can help you build neural network models that automatically recognize images. These are typically Convolutional Neural Networks (CNNs). There are two approaches to TensorFlow image recognition:

  • Classification: train the CNN to recognize categories like cats, dogs, cars, or anything else. The system classifies the image as a whole, based on these categories. See our in-depth guide on TensorFlow Image Classification.
  • Object Detection: more powerful than classification, it can detect multiple objects in the same image. It also tags the objects and shows their location within the image. In this article, we focus on the object detection approach in TensorFlow.


The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that helps you build, train, and deploy object detection models. Its pre-trained models include detectors with ResNet-50 and ResNet-101 feature extractors trained on the iNaturalist Species Detection Dataset for 4 million iterations.


On this page we provide two quick tutorials that can help you learn the Object Detection API:

  • Quick Tutorial #1: Image recognition on a small dataset using MobileNet
  • Quick Tutorial #2: Image recognition with transfer learning on a pre-trained COCO model

Quick Tutorial #1: Chess Set Image Recognition Using MobileNet

The following tutorial steps are summarized. See the full tutorial by Justin Francis.


1. Gather a dataset

The dataset in this tutorial consists of images of chess pieces, with only 75 images for each class. We hold out 15% of the images for testing, instead of the more typical 30%. The original tutorial provides a handy script to download the images, resize them to 300×300 pixels, and sort them into train and test folders.
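As a rough illustration of the 85/15 split described above, the sketch below shuffles a list of image filenames and holds out a fraction for testing. The function name split_dataset, the seed, and the file names are illustrative assumptions and are not taken from the original tutorial's script.

```python
import random


def split_dataset(filenames, test_fraction=0.15, seed=0):
    """Shuffle filenames and hold out test_fraction of them for testing."""
    shuffled = sorted(filenames)           # sort first so the result is reproducible
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for a given seed
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names for one class of 75 images.
images = ["pawn_%02d.jpg" % i for i in range(75)]
train_files, test_files = split_dataset(images)
print(len(train_files), len(test_files))
```

Splitting per class, as in this example, keeps every chess piece represented in both folders even though the dataset is small.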

2. Create bounding boxes

To train the object detection model, you need the width, height, and class of each image, along with the xmin, xmax, ymin, and ymax coordinates of each bounding box.

To help you create these labels, you can use software like LabelImg, an open source program that saves an XML label for each image. You can then convert them into a CSV table for training.


3. Install the Object Detection API

In this step, you clone the TensorFlow Models repository and add its directories to your Python path.

pip3 install -r requirements.txt
apt-get install -y protobuf-compiler 
git clone
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim


4. Convert labels to the TFRecord format

Use this code to convert the XML labels into CSV files, which you can then use to create TFRecord files:

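import glob
import os
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(path):
    # Collect one row per labeled object from every XML file in path.
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            # Lookups assume the Pascal VOC layout that LabelImg writes.
            box = member.find('bndbox')
            value = (root.find('filename').text,
                     int(root.find('size/width').text),
                     int(root.find('size/height').text),
                     member.find('name').text,
                     int(box.find('xmin').text),
                     int(box.find('ymin').text),
                     int(box.find('xmax').text),
                     int(box.find('ymax').text))
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df

def main():
    # trainPath and testPath are the train and test image folders from step 1.
    for i in [trainPath, testPath]:
        image_path = i
        folder = os.path.basename(os.path.normpath(i))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/'+folder+'.csv', index=None)
        print('Successfully converted xml to csv.')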
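Each row of the CSV describes one bounding box, so an image with several chess pieces spans several rows, while a TFRecord entry is normally built per image. One possible way to regroup the rows is sketched below with the standard csv module and made-up sample values. The helper name group_boxes_by_image and the file names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_boxes_by_image(csv_text):
    """Return {filename: [row dicts]} so each image's boxes can be handled together."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # csv gives strings; convert the numeric columns to integers.
        for key in ("width", "height", "xmin", "ymin", "xmax", "ymax"):
            row[key] = int(row[key])
        grouped[row["filename"]].append(row)
    return dict(grouped)


# Made-up rows in the same column order as the CSV produced above.
sample_csv = """filename,width,height,class,xmin,ymin,xmax,ymax
board1.jpg,300,300,king,10,20,110,220
board1.jpg,300,300,pawn,150,40,200,120
board2.jpg,300,300,queen,30,30,130,250
"""

grouped = group_boxes_by_image(sample_csv)
for name, rows in grouped.items():
    print(name, [r["class"] for r in rows])
```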



5. Select a model


The TensorFlow Object Detection API has various models: some continuously run the classifier, sliding filters of varying sizes across the image to detect objects, but these consume a lot of resources.
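To get a feel for why the sliding-window approach is expensive, the following back-of-the-envelope sketch counts how many crops a classifier would have to evaluate on a single 300×300 image. The window sizes and the stride are arbitrary assumptions chosen for illustration.

```python
def count_windows(image_size, window_size, stride):
    """Number of square windows of window_size that fit in a square image."""
    if window_size > image_size:
        return 0
    per_axis = (image_size - window_size) // stride + 1
    return per_axis * per_axis


# Three assumed window sizes, each slid across the image in steps of 16 pixels.
window_sizes = [64, 128, 256]
total = sum(count_windows(300, w, stride=16) for w in window_sizes)
print(total)
```

With these settings the classifier runs 355 times for one image, and the count grows quickly with smaller strides or more window sizes.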


A faster option is the single shot detector (SSD) network, which can process video feeds at high frame rates because it determines all the bounding box probabilities simultaneously. However, SSD sacrifices some accuracy for speed. SSD serves as the bounding box framework, and you pair it with a model like MobileNet as the neural network architecture.


6. Retrain the model

You don’t need a high-end GPU when retraining the last layer of your MobileNet model with your data, though it can speed up the process. To start training, run the training script in the object detection API directory. When the loss starts rising, or remains constant at around 1, you can press Ctrl+C to stop training.


python3 models/research/object_detection/ --logtostderr --train_dir=data/ --pipeline_config_path=data/ssd_mobilenet_v1_pets.config
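The stopping rule described in this step (stop when the loss rises or stays flat) can also be checked programmatically. The sketch below is one possible heuristic, not part of the Object Detection API: the helper name should_stop, the window size, the tolerance, and the loss values are all assumptions made for illustration.

```python
def should_stop(losses, window=5, tolerance=0.01):
    """Return True when the latest window of losses is no better than the one before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    # Flat or rising loss: the recent average has not dropped by more than tolerance.
    return recent > previous - tolerance


# Made-up loss histories: one still falling, one that has levelled off near 1.
falling = [5.0, 4.0, 3.2, 2.6, 2.1, 1.8, 1.6, 1.45, 1.3, 1.2]
flat = falling + [1.01, 1.0, 0.99, 1.0, 1.01, 1.0, 1.0, 1.01, 0.99, 1.0]
print(should_stop(falling), should_stop(flat))
```

Comparing window averages instead of single values makes the check less sensitive to the step-to-step noise that training loss usually shows.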


7. Implement the new model with TensorFlow

Export your graph for inference to start working with your newly trained model.

python3 models/research/object_detection/ \
    --input_type image_tensor \
    --pipeline_config_path data/ssd_mobilenet_v1_pets.config \
    --trained_checkpoint_prefix data/model.ckpt-997 \
    --output_directory object_detection_graph


You can now introduce new images to your model to test its performance. Assign about 10% of your new images as validation images.


Quick Tutorial #2: Transfer Learning Using COCO Pre-Trained Model

This tutorial outlines how to build a model that classifies traffic lights as green, yellow, or red. It starts from Faster-RCNN-Inception-V2, a model pre-trained on the Common Objects in Context (COCO) dataset.


Existing pre-trained models can identify traffic lights, but not the color. The tutorial also shows you how to extend the model to work on your own custom dataset.


The tutorial steps below are summarized – see the full tutorial by Daniel Stang.


1. Install model code


We assume you will run this model in a Jupyter notebook. Ensure you have TensorFlow installed, and clone the Git repository by running:

git clone


Follow the instructions under “Add Libraries to PYTHONPATH” regardless of whether you have already installed TensorFlow. Navigate to the models/research/ folder in your terminal so you can enter the command that modifies your .bashrc file.


2. Choose a model

The simplest TensorFlow pre-trained model is the default model. Other models provide descriptions of their strengths and weaknesses, and the differences between them can be subtle.

Try out the models on some sample images to get a feel for them. If you can’t find the object you want to detect among the 90 COCO classes, you can test the model on a similar class. For example, substitute a cat for a squirrel.

To test a model, go to the g3doc folder and search the detection_model_zoo.md file for the model file location, then assign that location to MODEL_NAME in your Jupyter notebook.


3. Define Labels

All labels should be in the TFRecord format. If the labels in your dataset are stored in .xml files, you can use the script to convert them into TFRecord files.


The Bosch dataset stores labels in a .yaml file, as follows:

- boxes:
  - {label: Green, occluded: false, x_max: 582.3417892052, x_min: 573.3726437481,
    y_max: 276.6271175345, y_min: 256.3114627642}
  - {label: Green, occluded: false, x_max: 517.6267821724, x_min: 510.0276868266,
    y_max: 273.164089267, y_min: 256.4279864221}
  path: ./rgb/train/2015-10-05-16-02-30_bag/720654.png
- boxes: []
  path: ./rgb/train/2015-10-05-16-02-30_bag/720932.png


There are two green lights in image 720654.png, and none in 720932.png. A single TFRecord file contains the whole dataset, including all the images and labels.
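Before converting the labels, it can help to check what the dataset contains. The sketch below assumes the .yaml file has already been parsed into Python lists and dictionaries (for example with PyYAML's yaml.safe_load) and reuses the two entries from the excerpt. The helper name summarize is hypothetical.

```python
from collections import Counter

# The excerpt above, as it might look after parsing the .yaml file.
entries = [
    {
        "boxes": [
            {"label": "Green", "occluded": False,
             "x_max": 582.3417892052, "x_min": 573.3726437481,
             "y_max": 276.6271175345, "y_min": 256.3114627642},
            {"label": "Green", "occluded": False,
             "x_max": 517.6267821724, "x_min": 510.0276868266,
             "y_max": 273.164089267, "y_min": 256.4279864221},
        ],
        "path": "./rgb/train/2015-10-05-16-02-30_bag/720654.png",
    },
    {"boxes": [], "path": "./rgb/train/2015-10-05-16-02-30_bag/720932.png"},
]


def summarize(entries):
    """Count boxes per label and list the images that have no boxes at all."""
    label_counts = Counter(box["label"] for e in entries for box in e["boxes"])
    unlabeled = [e["path"] for e in entries if not e["boxes"]]
    return label_counts, unlabeled


label_counts, unlabeled = summarize(entries)
print(dict(label_counts), unlabeled)
```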


4. Create a TFRecord file


Have a look at TensorFlow’s sample script. It uses the label and data from a single image, taken from a .yaml file, to create a TFRecord entry.


You can use the tensorflow.gfile.GFile() function to read the encoded image data, which you need in addition to the class and bounding box information. Use this information to populate all variables in the TFRecord entry.


After completing the create_tf_record() function, create a loop to call that function for every label in your dataset. Your training and evaluation datasets will likely be separate, so you have to make separate TFRecord files for them.
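The loop over labels can be sketched without TensorFlow by first collecting the values each TFRecord entry needs. In the sketch below, the feature key names follow the convention used in the Object Detection API's dataset conversion samples, and box coordinates are normalized by image size. The helper name entry_to_fields, the 1280×720 image size, and the label ids are assumptions to check against the script you actually use. Turning the resulting dictionary into a tf.train.Example, and adding the encoded image bytes, is left to that script.

```python
LABEL_IDS = {"Green": 1, "Red": 2, "Yellow": 3}  # assumed ids, starting at 1


def entry_to_fields(entry, width, height, label_ids=LABEL_IDS):
    """Collect the per-image values a TFRecord entry needs from one parsed label entry."""
    boxes = entry["boxes"]
    return {
        "image/filename": entry["path"],
        "image/width": width,
        "image/height": height,
        # Coordinates are divided by the image size so they fall in [0, 1].
        "image/object/bbox/xmin": [b["x_min"] / width for b in boxes],
        "image/object/bbox/xmax": [b["x_max"] / width for b in boxes],
        "image/object/bbox/ymin": [b["y_min"] / height for b in boxes],
        "image/object/bbox/ymax": [b["y_max"] / height for b in boxes],
        "image/object/class/text": [b["label"].encode("utf8") for b in boxes],
        "image/object/class/label": [label_ids[b["label"]] for b in boxes],
    }


# One entry in the same shape as the .yaml excerpt shown earlier.
entry = {
    "boxes": [{"label": "Green", "occluded": False,
               "x_max": 582.3417892052, "x_min": 573.3726437481,
               "y_max": 276.6271175345, "y_min": 256.3114627642}],
    "path": "./rgb/train/2015-10-05-16-02-30_bag/720654.png",
}

# Separate lists for training and evaluation, each written to its own TFRecord file.
train_entries, eval_entries = [entry], []
train_fields = [entry_to_fields(e, width=1280, height=720) for e in train_entries]
print(train_fields[0]["image/object/class/label"])
```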


5. Create bounding boxes

You can annotate images easily with LabelImg. Define bounding boxes for traffic lights in the images and save the results to a CSV for training.


6. Model configuration file

COCO pre-trained models work with 90 classes. To adapt a COCO model to a new dataset with a different number of classes, you need to replace the last layer of the network, the 90-class classification layer, with a new layer.


For example, assume fc_2nd_last is the second-to-last fully connected layer in your network and nb_classes is the number of classes in your new dataset. Use this to replace the relevant layers:

shape = (fc_2nd_last.get_shape().as_list()[-1], nb_classes)
fc_last_W = tf.Variable(tf.truncated_normal(shape, stddev=1e-2))
fc_last_b = tf.Variable(tf.zeros(nb_classes))
logits = tf.nn.xw_plus_b(fc_2nd_last, fc_last_W, fc_last_b)


To do this with the Object Detection API, you only need to modify one line in the model’s config file. Navigate to object_detection/samples/configs in your clone of the TensorFlow models repository. This folder contains config files for every pre-trained model.

Place a copy of the config file for the model you selected in a new folder, where you will run the training. Within the new folder, create a folder called “data” for your TFRecord file and a second folder called “models” for the three .ckpt files of your pre-trained model. The model zoo provides download links for each pre-trained model, and each download contains both a .pb file and a .ckpt file. Create a third folder called “train” under the “models” folder.
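The folder layout described above can be created in a few lines. This is a minimal sketch using pathlib. The helper name make_training_layout is hypothetical, and the example runs in a temporary directory so it does not touch your project.

```python
import tempfile
from pathlib import Path


def make_training_layout(root):
    """Create data/, models/ and models/train/ under root and return the folders found."""
    root = Path(root)
    for sub in ("data", "models", "models/train"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    layout = make_training_layout(tmp)
    print(layout)
```

The config file copy, the TFRecord files, and the checkpoint files then go into the root folder, data, and models respectively.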


7. Modify the configuration file to match your custom dataset

Open the config file in a text editor and change the number of classes, located at the top, to match your dataset. Then point the fine_tune_checkpoint path to the model.ckpt file. This should look like:

fine_tune_checkpoint: "models/model.ckpt"


The num_steps parameter determines the number of training steps you will run. When you’ve started training you should track how long it takes to complete each training step so you can adjust num_steps.
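Once you know roughly how long one training step takes, choosing num_steps is simple arithmetic. The sketch below shows the calculation. The helper name steps_for_budget and the timing figures are made-up assumptions, not measurements from the tutorial.

```python
def steps_for_budget(seconds_per_step, budget_hours):
    """Number of whole training steps that fit into the given time budget."""
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    return int(budget_hours * 3600 / seconds_per_step)


# Assumed figures: half a second per step and a two-hour training budget.
print(steps_for_budget(seconds_per_step=0.5, budget_hours=2))
```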

You then need to adjust the input_path and label_map_path for the training and evaluation datasets. The input_path points to the TFRecord file. The label_map_path points to a .pbtxt file, which you need to create. This file should contain the name and ID of each label in your dataset.


You can create it in any text editor using the following format:

item {
  id: 1
  name: 'Green'
}
item {
  id: 2
  name: 'Red'
}


Start with id: 1, not 0, and match num_examples to the number of evaluation samples in your dataset.
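For datasets with many labels, the label map text can be generated instead of typed by hand. The following sketch builds it from a list of names with ids starting at 1. The helper name make_label_map and the third label are assumptions for illustration, and saving the text to a .pbtxt file is left to you.

```python
def make_label_map(names):
    """Build label map text with one item block per name, numbering ids from 1."""
    blocks = []
    for idx, name in enumerate(names, start=1):
        blocks.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "".join(blocks)


label_map_text = make_label_map(["Green", "Red", "Yellow"])
print(label_map_text)
```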


8. Run training

Copy the training script from the object_detection folder to your training folder. Navigate to this folder and run the following command to start training:

>python --logtostderr --train_dir=./models/train --pipeline_config_path=rfcn_resnet101_coco.config

9. Save a checkpoint model as a .pb file

Copy the export script from the object_detection folder to the folder that contains your model config file, then run:

python --input_type image_tensor --pipeline_config_path ./rfcn_resnet101_coco.config --trained_checkpoint_prefix ./models/train/model.ckpt-5000 --output_directory ./fine_tuned_model

This creates a new directory fine_tuned_model containing your model, called frozen_inference_graph.pb.


10. Use the model for your project

The project outlined in this tutorial is a traffic light classifier. We implemented the classifier in Python as a class. We created a TensorFlow session in the initialization part of the class, so we don’t need to recreate it whenever we perform classification.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.GraphDef, tf.gfile, tf.Session)

class TrafficLightClassifier(object):
    def __init__(self):
        PATH_TO_MODEL = 'frozen_inference_graph.pb'
        self.detection_graph = tf.Graph()
        with self.detection_graph.as_default():
            od_graph_def = tf.GraphDef()
            # Read the frozen graph from disk and import it into detection_graph.
            with tf.gfile.GFile(PATH_TO_MODEL, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')
            # Input and output tensors of the exported detection model.
            self.image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
            self.d_boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
            self.d_scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
            self.d_classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
            self.num_d = self.detection_graph.get_tensor_by_name('num_detections:0')
        # Create the session once so it is reused for every classification.
        self.sess = tf.Session(graph=self.detection_graph)


We created a function that processes the image and identifies the bounding boxes, scores, and class of each object classified in the image:

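def get_classification(self, img):
    # Method of TrafficLightClassifier; img is a height x width x 3 image array.
    with self.detection_graph.as_default():
        # The model expects a batch dimension: [1, height, width, 3].
        img_expanded = np.expand_dims(img, axis=0)
        (boxes, scores, classes, num) = self.sess.run(
            [self.d_boxes, self.d_scores, self.d_classes, self.num_d],
            feed_dict={self.image_tensor: img_expanded})
    return boxes, scores, classes, num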


You may now wish to filter out results that fall below a chosen score threshold. This is fairly easy to do, as the model automatically sorts the results from highest to lowest score.
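Because the results arrive sorted by score, filtering can stop at the first detection that falls below the threshold. The sketch below shows one way to do this. The helper name filter_detections, the 0.5 threshold, and the sample values are assumptions, and the inputs use the same nesting as the model outputs (a batch containing one image).

```python
def filter_detections(boxes, scores, classes, threshold=0.5):
    """Keep (box, score, class id) for detections scoring at least threshold."""
    kept = []
    for box, score, cls in zip(boxes[0], scores[0], classes[0]):
        if score < threshold:
            break  # results are sorted by score, so nothing later can pass
        kept.append((list(box), float(score), int(cls)))
    return kept


# Made-up outputs for a single image with three candidate detections.
boxes = [[[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.6, 0.6], [0.0, 0.0, 1.0, 1.0]]]
scores = [[0.92, 0.61, 0.08]]
classes = [[1.0, 2.0, 1.0]]

detections = filter_detections(boxes, scores, classes)
print(detections)
```

The same function accepts the NumPy arrays returned by get_classification, since it only relies on indexing and iteration.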

TensorFlow Image Recognition and Object Detection in the Real World

In this article, we provided two tutorials that illustrate how image recognition works in the TensorFlow Object Detection API. When you start working on real-life image recognition projects, you’ll run into some practical challenges:


Tracking experiment source code, hyperparameters, and configuration. You’ll need to experiment with different model and CNN architectures to discover which one can deliver the best results. To do this you’ll need to run hundreds or thousands of experiments. You’ll find it is challenging to record, track and organize the results for all those experiments.

Scaling up CNN experiments—most image classification models require fast GPUs to run, and you’ll want to scale and distribute experiments across multiple GPUs and eventually multiple machines, either on-premises or in the cloud. Provisioning these machines and ensuring the right experiments run on each one is a major burden.

Manage training data—image recognition typically involves large datasets. Copying training images to each training machine, then re-copying them every time you fine tune your dataset, for example by changing image resolution or removing artifacts, will waste precious time you could spend optimizing your model.

MissingLink is a deep learning platform that can help you automate these operational aspects of image recognition and object detection on TensorFlow, so you can concentrate on building winning classification algorithms. Learn more.
