TensorFlow Image Segmentation: Two Quick Tutorials
TensorFlow lets you use deep learning techniques to perform image segmentation, a crucial part of computer vision. There are many ways to perform image segmentation, including Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN), and frameworks like DeepLab and SegNet.
This page will explain the basics of image segmentation and provide two quick tutorials from the community showing how to run some of them in TensorFlow.
On this page:
- Image segmentation in deep learning
- Quick Tutorial #1: Fully Convolutional Neural Network (FCN) for Semantic Segmentation with Pre-Trained VGG16 Model
- Quick Tutorial #2: Modifying the DeepLab code to train on your own dataset
Image segmentation involves dividing a visual input into segments to simplify image analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Image segmentation sorts pixels into larger components. There are three levels of image analysis:
- Classification – categorizing the image into a class such as “people”, “animals”
- Object detection – detecting objects within an image and drawing a rectangle around them
- Segmentation – identifying parts of the image and understanding what object they belong to
There are two types of segmentation: semantic segmentation, which classifies every pixel of an image into a meaningful class, and instance segmentation, which additionally separates individual objects, so that two objects of the same class receive distinct labels.
The following deep learning techniques are commonly used to power image segmentation tasks:
- Convolutional Neural Networks (CNNs) – segments of an image can be fed as input to a CNN, which labels the pixels. The CNN cannot process the whole image at once. It scans the image, looking at a small “filter” of several pixels each time.
- Fully Convolutional Networks (FCNs) – FCNs use convolutional layers to process varying input sizes. The final output layer has a large receptive field and corresponds to the height and width of the image, while the number of channels corresponds to the number of classes. FCNs classify every pixel to determine image context and location of objects.
- DeepLab – an image segmentation framework that helps control signal decimation (reducing the number of samples and data the network must process), and aggregate features from images at different scales. DeepLab uses a ResNet architecture pre-trained on ImageNet for feature extraction. It uses a technique called Atrous Spatial Pyramid Pooling (ASPP) to process multi-scale information.
- SegNet neural network – an architecture based on deep encoders and decoders, designed for semantic pixel-wise segmentation. It involves encoding an input image into low dimensions and recovering it, leveraging orientation invariance in the decoder. This generates a segmented image at the decoder.
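To make the FCN output described above more concrete, here is a minimal sketch in plain Python of how a per-pixel class map can be read from a score volume whose channels correspond to classes. The score values and the helper name `to_label_map` are made up for illustration; a real network would produce a much larger tensor, and you would normally use an array library rather than nested lists.

```python
# Hypothetical FCN output for a 2x3 image and 3 classes:
# scores[row][col] holds one score per class (the channel axis).
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
]


def to_label_map(scores):
    """Pick the index of the highest-scoring class for every pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


label_map = to_label_map(scores)
print(label_map)  # [[0, 1, 2], [0, 1, 2]]
```

The result has the same height and width as the input image, with one class index per pixel.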
Scaling Up Image Segmentation Tasks on TensorFlow with MissingLink
If you’re working on image segmentation, you probably have a large dataset and need to run experiments on several machines. This can become challenging, and you might find yourself working hard on setting up machines, copying data and troubleshooting.
MissingLink is a deep learning platform that lets you effortlessly scale TensorFlow image segmentation across many machines, either on-premise or in the cloud. It also helps manage large data sets, view hyperparameters and metrics across your entire team on a convenient dashboard, and manage thousands of experiments easily.
The images below show the output of a fully convolutional neural network (FCN). The input to the net is the RGB image on the right. The net produces a pixel-wise annotation as a matrix the same size as the image, in which the value of each pixel corresponds to its class (see the image on the left).
The steps below are a summary. For the full instructions, see Sagieppel.
1. Training the network
- Set the folder of the training images in
- Set the folder for the ground truth labels in
- Download a pretrained VGG16 model and put it in
- Set the number of classes/labels in
- Run the training script
2. Predicting pixelwise annotation using trained VGG network
- Set the Image_Dir to the folder where the input images for prediction are located.
- Set the number of classes in
- Set Pred_Dir to the folder where you want the annotated output images to be saved
- Run script
3. Evaluating network performance using Intersection over Union (IOU)
- Set Image_Dir to the folder where the input images for prediction are located
- Set Label_DIR to the folder with the ground truth labels. The label maps should be saved as PNG images with the same name as the corresponding image and a .png extension
- Set the number of classes in
- Run script
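The Intersection over Union (IoU) metric used in step 3 can be sketched as follows. This is a plain-Python illustration on tiny hand-written label maps, not the evaluation script from the tutorial; the function name `iou_per_class` and the sample data are assumptions made for the example.

```python
def iou_per_class(pred, truth, num_classes):
    """Return the IoU of each class, or None for a class absent from both maps."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel counts once towards both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


# Hypothetical 2x3 label maps with classes 0 (background) and 1 (foreground).
pred = [[0, 0, 1], [0, 1, 1]]
truth = [[0, 0, 1], [0, 0, 1]]

ious = iou_per_class(pred, truth, num_classes=2)
print(ious)  # class 0: 3/4, class 1: 2/3
```

Averaging the per-class values gives the mean IoU that is commonly reported for segmentation models.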
DeepLab is a semantic image segmentation technique based on deep learning, which uses an ImageNet pre-trained ResNet as its primary feature extractor network. Its modified ResNet block uses atrous (dilated) convolutions rather than regular convolutions.
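To illustrate how an atrous convolution differs from a regular one, the following one-dimensional sketch applies the same kernel with different dilation rates. It is a simplified illustration in plain Python, not DeepLab's implementation; the function name `atrous_conv1d` and the sample signal are assumptions.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D convolution (cross-correlation, as used in deep learning)
    with rate - 1 skipped samples between neighboring kernel taps."""
    span = (len(kernel) - 1) * rate + 1  # effective receptive field
    return [
        sum(w * signal[start + k * rate] for k, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

regular = atrous_conv1d(signal, kernel, rate=1)  # sees 3 consecutive samples
atrous = atrous_conv1d(signal, kernel, rate=2)   # same 3 weights, spans 5 samples

print(regular)  # [-2, -2, -2, -2, -2]
print(atrous)   # [-4, -4, -4]
```

With rate=2 the kernel covers a wider stretch of the input without adding weights, which is why atrous convolutions enlarge the receptive field at no extra parameter cost.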
Prerequisites: Before you begin, install one of the DeepLab implementations in TensorFlow. See TensorFlow documentation for more details.
The following is a summary of the tutorial steps. For the full instructions and code, see Beeren Sahu.
1. Preparing Dataset
Define what your dataset will be used for. Name your new dataset “PQR”.
Start with the input images and their pre-segmented counterparts, which serve as ground truth for training. Segmented images should be color-indexed images, and input images should be color images. See the PASCAL dataset for an example.
Create a folder named PQR, with the following directory structure:
+ dataset
  - JPEGImages
  - SegmentationClass
  - ImageSets
+ tfrecord
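If you prefer to create this layout from Python, a minimal sketch could look like the following. It assumes that JPEGImages, SegmentationClass and ImageSets sit inside dataset, and that tfrecord sits next to dataset; the helper name `create_dataset_folders` is hypothetical. The demonstration writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# Assumed layout: three subfolders under dataset/, plus a sibling tfrecord/.
SUBFOLDERS = (
    "dataset/JPEGImages",
    "dataset/SegmentationClass",
    "dataset/ImageSets",
    "tfrecord",
)


def create_dataset_folders(root):
    """Create the folder tree under root and return root as a Path."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Demonstration in a throwaway directory; use a real path for your dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = create_dataset_folders(Path(tmp) / "PQR")
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )

print(created)
```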
2. Annotate input images
Use the SegmentationClass folder for the semantic segmentation annotation images that correspond to the color input images. These are the ground truth for the semantic segmentation. Color-index these images: every color index should correspond to one class with a unique color. This mapping is called a color map.
3. Define lists of images for training and validation
In the ImageSets folder, define:
- Train.txt – list of image names for the training set
- Val.txt – list of image names for the validation set
- Trainval.txt – list of image names for the training + validation set
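One possible way to produce these three lists is sketched below. The split ratio, the random seed, the sample image names and the helper names `split_image_names` and `write_list` are all assumptions; the sketch also assumes the lists contain one image name per line without a file extension, as in PASCAL-style datasets.

```python
import random
from pathlib import Path


def split_image_names(image_names, val_fraction=0.2, seed=0):
    """Shuffle reproducibly, then carve off a validation subset."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    n_val = max(1, round(len(names) * val_fraction))
    val, train = names[:n_val], names[n_val:]
    return {
        "train": sorted(train),
        "val": sorted(val),
        "trainval": sorted(names),
    }


def write_list(path, names):
    """Write one image name per line."""
    Path(path).write_text("\n".join(names) + "\n")


# Hypothetical image names, without file extensions.
splits = split_image_names([f"img_{i:03d}" for i in range(10)])
print({name: len(items) for name, items in splits.items()})
# {'train': 8, 'val': 2, 'trainval': 10}
```

Each list can then be saved into the ImageSets folder with `write_list`, using the file names listed above.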
4. Remove the color map in the ground truth annotations
If your segmentation annotation images are RGB images, you can use a Python script to do this:
import tensorflow as tf
from PIL import Image
from tqdm import tqdm
import numpy as np
import os, shutil
# palette (color map) describes the (R, G, B): Label pair
The palette specifies the “RGB:LABEL” pair. In this sample code, (0,0,0):0 is background and (255,0,0):1 is the foreground class. Note that new_label_dir is where the raw segmentation data is kept.
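The core of the color map removal is a lookup from each RGB value to its label index. The sketch below shows that lookup on a tiny hand-written image in plain Python. It is an illustration of the idea rather than the tutorial's script, which works on image files; the function name `rgb_to_labels` and the sample pixels are assumptions, while the two palette entries are the ones described above.

```python
# Palette from the text: (R, G, B) -> label index.
PALETTE = {
    (0, 0, 0): 0,    # background
    (255, 0, 0): 1,  # foreground class
}


def rgb_to_labels(rgb_image, palette=PALETTE):
    """Map every (R, G, B) pixel to its class index.

    Raises KeyError if a pixel color is missing from the palette, which
    usually signals an annotation error.
    """
    return [[palette[tuple(pixel)] for pixel in row] for row in rgb_image]


# Hypothetical 2x2 annotation image.
rgb = [
    [(0, 0, 0), (255, 0, 0)],
    [(255, 0, 0), (0, 0, 0)],
]

labels = rgb_to_labels(rgb)
print(labels)  # [[0, 1], [1, 0]]
```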
The script converts the image dataset to a TensorFlow record. Create a new copy of the script file
The converted dataset will be saved at
5. Defining the dataset description
Open the file segmentation_dataset.py present in the research/deeplab/datasets/ folder. Add the code segment defining your PQR dataset description.
To train the model on your dataset, run the train.py file in the research/deeplab/ folder. The script train-pqr.sh will do this automatically.
You can specify the number of training iterations in the variable NUM_ITERATIONS, and set --tf_initial_checkpoint to the location where you have downloaded or pre-trained the model and saved the *.ckpt files. The final trained model is in
Lastly, run this script from the
# sh ./train-pqr.sh
TensorFlow Image Segmentation in the Real World
In this article, we explained the basics of image segmentation with TensorFlow and provided two tutorials, which show how to perform segmentation using advanced models and frameworks like VGG16 and DeepLab. When you start working on real-life image segmentation projects, you’ll run into some practical challenges:
Tracking experiment source code, configuration, and hyperparameters. Image segmentation requires complex computer vision architectures and will often involve a lot of trial and error to discover the model that suits your needs. Organizing, tracking and sharing experiment data will become difficult over time.
Scaling up your experiments. Image segmentation requires heavy CNN architectures like VGG and ResNet, which might require days or weeks to run. To run multiple experiments in a reasonable time, you will need to scale up and out across multiple GPUs and machines. Setting up these machines and distributing the work between them is a serious challenge.
Managing training data. Image segmentation involves large datasets. Copying these datasets to each training machine, then re-copying them when you change projects or fine-tune the training examples, is time-consuming and error-prone. You need an automatic process that will prepare the required datasets on each training machine.