
TensorFlow Image Segmentation: Two Quick Tutorials

TensorFlow lets you use deep learning techniques to perform image segmentation, a crucial part of computer vision. There are many ways to perform image segmentation, including Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN), and frameworks like DeepLab and SegNet.


This page explains the basics of image segmentation and provides two quick tutorials from the community showing how to run two of these approaches in TensorFlow.


In this page:

  • Image segmentation in deep learning
  • Quick Tutorial #1: Fully Convolutional Neural Network (FCN) for Semantic Segmentation with Pre-Trained VGG16 Model
  • Quick Tutorial #2: Modifying the DeepLab code to train on your own dataset

Image Segmentation in Deep Learning: Concepts and Techniques

Image segmentation involves dividing a visual input into segments to simplify image analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Image segmentation sorts pixels into larger components. There are three levels of image analysis:

  • Classification – categorizing the image into a class such as “people”, “animals”
  • Object detection – detecting objects within an image and drawing a rectangle around them
  • Segmentation – identifying parts of the image and understanding what object they belong to




There are two types of segmentation: semantic segmentation, which classifies each pixel of an image into a meaningful class, and instance segmentation, which additionally separates individual objects, so that two objects of the same class receive distinct labels.
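To make the distinction concrete, here is a minimal NumPy sketch. The tiny 4x6 masks, the class id 1 for "person", and the instance ids are made-up illustration values, not the output of any model.

```python
import numpy as np

# Semantic mask: every pixel holds a class id (0 = background, 1 = person).
semantic_mask = np.array([
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance mask: every pixel holds an object id (0 = background),
# so the two people get different ids even though they share a class.
instance_mask = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

classes_present = np.unique(semantic_mask[semantic_mask > 0])
instances_present = np.unique(instance_mask[instance_mask > 0])

print("classes:", classes_present.tolist())      # a single class
print("instances:", instances_present.tolist())  # two separate objects
```

Both masks cover the same pixels; only the instance mask can tell the two objects apart.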



The following deep learning techniques are commonly used to power image segmentation tasks:

  • Convolutional Neural Networks (CNNs) – segments of an image can be fed as input to a CNN, which labels the pixels. The CNN cannot process the whole image at once; it scans the image, looking at a small “filter” of several pixels each time.
  • Fully Convolutional Networks (FCNs) – FCNs use convolutional layers to process varying input sizes. The final output layer has a large receptive field and corresponds to the height and width of the image, while the number of channels corresponds to the number of classes. FCNs classify every pixel to determine image context and the location of objects.
  • DeepLab – an image segmentation framework that helps control signal decimation (reducing the number of samples and data the network must process) and aggregates features from images at different scales. DeepLab uses a ResNet architecture pre-trained on ImageNet for feature extraction. It uses a technique called Atrous Spatial Pyramid Pooling (ASPP) to process multi-scale information.
  • SegNet neural network – an architecture based on deep encoders and decoders, used for semantic pixel-wise segmentation. It encodes an input image into a low-dimensional representation and then recovers it, leveraging orientation invariance in the decoder, which generates the segmented image.
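The FCN idea of "one output channel per class, for any input size" can be illustrated with a small NumPy sketch. It models only a final 1x1 convolution as a linear classifier shared by all pixels; the feature and class counts and the random weights are illustration values, and a real FCN would also include the downsampling and upsampling layers omitted here.

```python
import numpy as np

NUM_CLASSES = 3    # illustration value
NUM_FEATURES = 8   # channels produced by earlier layers (illustration value)

rng = np.random.default_rng(0)
# Weights of a 1x1 convolution: one linear classifier shared by every pixel.
weights = rng.normal(size=(NUM_FEATURES, NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)

def classify_pixels(feature_map):
    """Map (H, W, NUM_FEATURES) features to (H, W, NUM_CLASSES) class scores."""
    return feature_map @ weights + bias

# The same weights work for any spatial size, since nothing is tied to H or W.
small = classify_pixels(rng.normal(size=(4, 5, NUM_FEATURES)))
large = classify_pixels(rng.normal(size=(32, 48, NUM_FEATURES)))
print(small.shape, large.shape)
```

Because no weight depends on the image height or width, the same layer handles both inputs, and the last axis of the output always has one entry per class.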


Scaling Up Image Segmentation Tasks on TensorFlow with MissingLink

If you’re working on image segmentation, you probably have a large dataset and need to run experiments on several machines. This can become challenging, and you might find yourself working hard on setting up machines, copying data and troubleshooting.


MissingLink is a deep learning platform that lets you effortlessly scale TensorFlow image segmentation across many machines, either on-premise or in the cloud. It also helps manage large data sets, view hyperparameters and metrics across your entire team on a convenient dashboard, and manage thousands of experiments easily.




Quick Tutorial #1: FCN for Semantic Segmentation with Pre-Trained VGG16 Model

The images below show the output of a fully convolutional neural network (FCN). The input to the net is the RGB image on the right. The net produces a pixel-wise annotation as a matrix the size of the image, in which the value of each pixel corresponds to its class (see the image on the left).
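As a rough sketch of what "pixel-wise annotation as a matrix" means, the snippet below turns per-class scores into a matrix of class ids by taking the highest-scoring class at each pixel. The function name and the 2x2, three-class score values are hypothetical and not taken from the tutorial code.

```python
import numpy as np

def scores_to_annotation(class_scores):
    """Turn (H, W, num_classes) scores into an (H, W) matrix of class ids."""
    return np.argmax(class_scores, axis=-1).astype(np.uint8)

# Hypothetical scores for a 2x2 image with 3 classes.
class_scores = np.array([
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]],
])

annotation = scores_to_annotation(class_scores)
print(annotation)  # same height and width as the input, one class id per pixel
```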

fully convolutional neural network (FCN) implementation

Source: TensorFlow


Begin by downloading a pre-trained VGG16 model here or here, and place it in the /Model_Zoo subfolder of the primary code folder.


The steps below are a summary; see the full instructions by Sagieppel.


1. Training


  1. Set the folder of the training images in Train_Image_Dir
  2. Set the folder of the ground truth labels in Train_Label_DIR
  3. Download a pre-trained VGG16 model and put it in model_path
  4. Set the number of classes/labels in NUM_CLASSES
  5. Run the training script
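Before starting a long training run, it can help to check that every training image has a label map. The helper below is a hypothetical sanity check, not part of the tutorial code; it assumes label maps are PNG files that share the base name of their image.

```python
from pathlib import Path

def find_unlabeled_images(train_image_dir, train_label_dir):
    """Return names of training images that have no matching PNG label map."""
    label_stems = {p.stem for p in Path(train_label_dir).glob("*.png")}
    missing = []
    for image_path in sorted(Path(train_image_dir).iterdir()):
        if image_path.is_file() and image_path.stem not in label_stems:
            missing.append(image_path.name)
    return missing
```

Calling find_unlabeled_images(Train_Image_Dir, Train_Label_DIR) with the two folders set in steps 1 and 2 should return an empty list when the dataset is complete.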


2. Predicting pixelwise annotation using trained VGG network



  1. Set Image_Dir to the folder where the input images for prediction are located
  2. Set the number of classes in NUM_CLASSES
  3. Set Pred_Dir to the folder where the output annotated images should be saved
  4. Run the script


3. Evaluating network performance using Intersection over Union (IOU)



  1. Set Image_Dir to the folder where the input images for prediction are located
  2. Set the folder of the ground truth labels in Label_DIR. Each label map should be saved as a PNG image with the same name as the corresponding image and a .png extension
  3. Set the number of classes in NUM_CLASSES
  4. Run the script
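For reference, Intersection over Union for one class is the number of pixels labeled with that class in both the prediction and the ground truth, divided by the number of pixels labeled with it in either. A minimal NumPy sketch follows; the function name and the tiny example arrays are illustrative and not taken from the tutorial's evaluation script.

```python
import numpy as np

def per_class_iou(prediction, ground_truth, num_classes):
    """Return a list with the IoU of each class (nan if the class is absent)."""
    ious = []
    for class_id in range(num_classes):
        pred_mask = prediction == class_id
        true_mask = ground_truth == class_id
        intersection = np.logical_and(pred_mask, true_mask).sum()
        union = np.logical_or(pred_mask, true_mask).sum()
        ious.append(intersection / union if union > 0 else float("nan"))
    return ious

ground_truth = np.array([[0, 0, 1],
                         [0, 1, 1]])
prediction = np.array([[0, 1, 1],
                       [0, 1, 1]])

ious = per_class_iou(prediction, ground_truth, num_classes=2)
print(ious)  # class 0: 2 of 3 pixels overlap, class 1: 3 of 4
```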

Quick Tutorial #2: Modifying the DeepLab Code to Train on Your Own Dataset

DeepLab is a deep learning technique for semantic image segmentation, which uses an ImageNet pre-trained ResNet as its primary feature extractor network. The new ResNet block uses atrous convolutions, rather than regular convolutions.
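An atrous (dilated) convolution applies the same kernel as a regular convolution but samples the input with gaps, which widens the receptive field without adding weights. The one-dimensional NumPy sketch below only illustrates that idea; the function name is hypothetical, and DeepLab itself applies the operation in two dimensions inside the network.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D convolution (cross-correlation form) with dilation `rate`."""
    span = (len(kernel) - 1) * rate + 1   # inputs covered by one output value
    out_len = len(signal) - span + 1
    output = np.zeros(out_len)
    for i in range(out_len):
        # Sample the input every `rate` steps instead of every step.
        taps = signal[i : i + span : rate]
        output[i] = np.dot(taps, kernel)
    return output

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

regular = atrous_conv1d(signal, kernel, rate=1)  # each output covers 3 inputs
atrous = atrous_conv1d(signal, kernel, rate=2)   # each output covers 5 inputs
print(regular)
print(atrous)
```

With rate=2 the three kernel weights are spread over five input positions, so each output sees a wider context at the same cost.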


Prerequisites: Before you begin, install one of the DeepLab implementations in TensorFlow. See TensorFlow documentation for more details.


The following is a summary of the tutorial steps; for the full instructions and code, see Beeren Sahu.


1. Preparing Dataset

Define what your dataset will be used for. Name your new dataset “PQR”. Create a folder “PQR” as: tensorflow/models/research/deeplab/datasets/PQR

Start with the input images and their pre-segmented counterparts, which serve as ground truth for training. Segmented images should be color-indexed images and input images should be color images. See the PASCAL dataset for an example.

Create a folder named dataset inside PQR, with the following directory structure:

+ dataset
+ tfrecord
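The folders named above can be created by hand or with a few lines of Python. This is only a convenience sketch; the helper name is hypothetical, and it creates just the dataset and tfrecord folders listed here.

```python
from pathlib import Path

def create_pqr_layout(datasets_root):
    """Create PQR/dataset and PQR/tfrecord under `datasets_root`."""
    pqr_dir = Path(datasets_root) / "PQR"
    created = []
    for name in ("dataset", "tfrecord"):
        folder = pqr_dir / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created
```

For example, create_pqr_layout("tensorflow/models/research/deeplab/datasets") would create both folders under the PQR directory described above.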


2. Annotate input images
Use this folder for the semantic segmentation annotation images that correspond to the color input images. These are the ground truth for semantic segmentation. Color-index these images: every color index should correspond to one class with a unique color. This mapping is called a color map.


3. Define lists of images for training and validation
In the ImageSets folder, define:

  • Train.txt – list of image names for the training set
  • Val.txt – list of image names for the validation set
  • Trainval.txt – list of image names for training + validation set
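The three list files can be generated from the image names with a short script. The sketch below is one possible approach, under assumptions that are not from the tutorial: a random 80/20 split, a fixed seed, and one image name per line without a file extension.

```python
import random
from pathlib import Path

def write_split_lists(image_names, image_sets_dir, val_fraction=0.2, seed=0):
    """Write Train.txt, Val.txt and Trainval.txt, one image name per line."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)      # reproducible shuffle
    num_val = int(len(names) * val_fraction)
    splits = {
        "Val.txt": sorted(names[:num_val]),
        "Train.txt": sorted(names[num_val:]),
        "Trainval.txt": sorted(names),
    }
    out_dir = Path(image_sets_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for file_name, split_names in splits.items():
        (out_dir / file_name).write_text("\n".join(split_names) + "\n")
    return {file_name: len(items) for file_name, items in splits.items()}
```

The returned counts are useful later, because the dataset description needs the number of images in each split.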


4. Remove the color map in the ground truth annotations

If your segmentation annotation images are RGB images, you can use a Python script to do this:


import tensorflow as tf
from PIL import Image
from tqdm import tqdm
import numpy as np

import os, shutil

# palette (color map) describes the (R, G, B): Label pair
palette = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # foreground class
}


The palette specifies the “RGB:LABEL” pairs. In this sample code, (0,0,0):0 is the background and (255,0,0):1 is the foreground class. Note that new_label_dir is where the raw segmentation data is kept.
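The core of the color map removal can be sketched as follows. This is a minimal illustration rather than the tutorial's actual script: the function name is hypothetical, the palette repeats the two pairs described above, and pixels whose color is not in the palette are left as label 0.

```python
import numpy as np

# Palette from the text above: (R, G, B) -> label.
palette = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # foreground class
}

def remove_color_map(rgb_label, palette):
    """Convert an (H, W, 3) RGB annotation into an (H, W) array of labels."""
    labels = np.zeros(rgb_label.shape[:2], dtype=np.uint8)
    for color, label in palette.items():
        # True where all three channels match this palette color.
        matches = np.all(rgb_label == np.array(color, dtype=rgb_label.dtype), axis=-1)
        labels[matches] = label
    return labels

rgb_label = np.array([
    [[0, 0, 0], [255, 0, 0]],
    [[255, 0, 0], [0, 0, 0]],
], dtype=np.uint8)

raw_label = remove_color_map(rgb_label, palette)
print(raw_label)
```

In practice the RGB array would come from an annotation file, for example np.array(Image.open(path).convert("RGB")) with PIL, and the result would be saved as a single-channel PNG in new_label_dir.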

The script converts the image dataset to a TensorFlow record. Create a new copy of the script file ./dataset/ as ./dataset/
The converted dataset will be saved at ./deeplab/datasets/PQR/tfrecord


5. Defining the dataset description


Open the file present in the research/deeplab/datasets/ folder. Add the code segment defining your PQR dataset description.


6. Training

To train the model on your dataset, run the file in the research/deeplab/ folder. The script will do this automatically.

You can specify the number of training iterations in the variable NUM_ITERATIONS, and set --tf_initial_checkpoint to the location where you have downloaded or pre-trained the model and saved the *.ckpt files. The final trained model is saved in the TRAIN_LOGDIR directory.
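To show how these settings fit together, the sketch below assembles a training command as a Python argument list without running it. The flag names reflect my understanding of DeepLab's train.py and should be verified against your installed version; the helper name, the paths, the step count, and the "pqr" dataset name are placeholders.

```python
def build_train_command(num_iterations, initial_checkpoint, train_logdir, dataset_dir):
    """Return the argument list for a DeepLab training run (not executed here)."""
    return [
        "python", "train.py",
        "--logtostderr",
        "--train_split=train",
        f"--training_number_of_steps={num_iterations}",
        f"--tf_initial_checkpoint={initial_checkpoint}",
        f"--train_logdir={train_logdir}",
        f"--dataset_dir={dataset_dir}",
        "--dataset=pqr",
    ]

command = build_train_command(
    num_iterations=1000,                          # NUM_ITERATIONS (placeholder)
    initial_checkpoint="init_models/model.ckpt",  # placeholder path
    train_logdir="datasets/PQR/exp/train",        # TRAIN_LOGDIR (placeholder)
    dataset_dir="datasets/PQR/tfrecord",
)
print(" ".join(command))
```

The list could be passed to subprocess.run(command, check=True) from the research/deeplab directory once the paths point at real files.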

Lastly, run this script from the …/research/deeplab directory:

# sh ./


TensorFlow Image Segmentation in the Real World

In this article, we explained the basics of image segmentation with TensorFlow and provided two tutorials, which show how to perform segmentation using advanced models and frameworks like VGG16 and DeepLab. When you start working on real-life image segmentation projects, you’ll run into some practical challenges:



Tracking experiment source code, configuration, and hyperparameters. Image segmentation requires complex computer vision architectures and will often involve a lot of trial and error to discover the model that suits your needs. Organizing, tracking and sharing experiment data will become difficult over time.


Scaling up your experiments—image segmentation requires heavy CNN architectures like VGG and ResNet which might require days or weeks to run. The only way to run multiple experiments will be to scale up and out across multiple GPUs and machines. Setting up these machines and distributing the work between them is a serious challenge.


Manage training data—image segmentation involves large datasets. Copying these datasets to each training machine, then re-copying when you change project or fine tune the training examples, is time-consuming and error-prone. You need an automatic process that will prepare the required datasets on each training machine.

MissingLink is a deep learning platform that can help you automate these operational aspects of image segmentation on TensorFlow, so you can concentrate on building winning segmentation algorithms. Learn more to see how easy it is.
