

Instance Segmentation with Deep Learning

Advances in image recognition are showing promise in fields ranging from self-driving vehicles to medical diagnosis. Many companies are competing to develop the most comprehensive deep learning algorithm for object detection, one capable of detecting and classifying objects in real time with high accuracy.


Many of these companies rely on image segmentation techniques powered by Convolutional Neural Networks (CNNs), which form the basis of deep learning for computer vision. Image segmentation involves drawing the boundaries of the objects in an input image at the pixel level. This supports object detection in real-world scenarios and makes it possible to differentiate between multiple similar objects in the same image.


Semantic segmentation can detect objects within the input image, isolate them from the background and group them based on their class. Instance segmentation takes this process a step further and can detect each individual object within a cluster of similar objects, drawing the boundaries for each of them.


In this article, you will learn what instance segmentation is, how it works as a subtype of image segmentation, and what makes it different from the other subtype, semantic segmentation. You will also learn about different instance segmentation algorithms and how they operate to achieve accurate object detection.

What Is Instance Segmentation?

Instance segmentation is a subtype of image segmentation that identifies each instance of each object in the image at the pixel level. Together with semantic segmentation, it is one of the two granularity levels of image segmentation.


What Is Image Segmentation?

Image segmentation is a computer vision process designed to simplify image analysis by splitting the visual input into segments that represent objects or parts of objects and form a collection of pixels or “super-pixels”. Image segmentation sorts pixels into larger components, which eliminates the need to consider each pixel as a unit of observation.


Object detection algorithms like YOLO use bounding boxes to indicate the parts of the image that contain an object, and then classify each object. This limits them, because a bounding box provides no information about the shape of the object.
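To make the difference concrete, here is a minimal sketch in plain Python. The 6x6 mask and the helper names (`bounding_box`, `box_area`, `mask_area`) are invented for illustration and do not come from any detection library. It derives the tightest bounding box around an L-shaped object and compares the box area with the number of pixels the object actually covers.

```python
# Toy 6x6 binary mask: 1 marks pixels that belong to the object (an L shape).
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, around the nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def box_area(box):
    """Number of pixels enclosed by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def mask_area(mask):
    """Number of pixels that actually belong to the object."""
    return sum(sum(row) for row in mask)


box = bounding_box(MASK)
print("bounding box:", box)
print("box area:", box_area(box), "mask area:", mask_area(MASK))
```

In this toy case the box encloses 16 pixels while the object covers only 7, so more than half of the box is background. The mask keeps the shape; the box discards it.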


For many computer vision tasks, it is not enough to simply identify the object class. These tasks require image segmentation, which indicates the shape of each object and, at the instance level, how many times a certain object appears in the image.


Image segmentation allows a granular understanding of the objects within the image. Instead of saying a certain area has sheep, for example, image segmentation can delineate where each individual sheep ends and the next one begins.


Instance Segmentation vs Semantic Segmentation

There are two levels of granularity within the segmentation process:


  • Semantic segmentation: assigns every pixel in the image to a meaningful class that corresponds to a real-world category, so objects are grouped by class.
  • Instance segmentation: identifies each instance of each object featured in the image, rather than only assigning each pixel a class as semantic segmentation does. For example, instead of labeling five sheep as a single "sheep" region, it identifies each individual sheep.
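The two granularity levels can be illustrated with a small sketch in plain Python. It starts from a toy semantic mask and assigns a separate id to each connected group of "sheep" pixels. Connected-component labeling is only a stand-in chosen for illustration: it works here because the toy objects do not touch, whereas instance segmentation networks learn to separate objects that touch or overlap. The grid and the function name `label_instances` are assumptions made for this example.

```python
from collections import deque

# Toy semantic mask: 0 = background, 1 = "sheep". Every sheep pixel carries
# the same class label, so the mask alone cannot say how many sheep there are.
SEMANTIC = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]


def label_instances(semantic, target_class=1):
    """Give each 4-connected region of target_class its own id (1, 2, ...)."""
    height, width = len(semantic), len(semantic[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic[r][c] != target_class or instances[r][c]:
                continue  # background, or already assigned to an instance
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over neighboring pixels
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic[ny][nx] == target_class
                            and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


instance_mask, count = label_instances(SEMANTIC)
for row in instance_mask:
    print(row)
print("instances found:", count)
```

The semantic mask only says "these pixels are sheep"; the instance mask additionally says which sheep each pixel belongs to, which is what makes counting possible.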

Instance Segmentation Deep Learning Networks

Instance segmentation is an important step toward comprehensive image recognition and object detection algorithms. Companies like Facebook are investing heavily in the development of deep learning networks for instance segmentation, both to improve their users' experience and to move the industry forward.


Mask R-CNN

Mask R-CNN (Mask Region-based Convolutional Neural Network) is an extension of the Faster R-CNN object detection algorithm. It adds an extra mask head, a branch that predicts a segmentation mask for each detected object, which gives it instance segmentation capabilities. This makes it possible to segment each object at the pixel level and to separate each object from its background.


The framework of Mask R-CNN is based on two stages. First, it scans the image to generate proposals, which are areas with a high likelihood of containing an object. Second, it classifies these proposals and creates bounding boxes and masks.
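The control flow of the two stages can be sketched in a few lines of plain Python. This is a toy illustration and not Mask R-CNN itself: the real model uses learned networks for both stages, while this sketch replaces them with simple counting rules. The image, the class names, the window size and the functions `propose_regions` and `classify_and_mask` are all hypothetical.

```python
CLASS_NAMES = {1: "sheep", 2: "dog"}  # hypothetical labels for the toy image

# Toy "image": each pixel already holds a class id (0 = background). A real
# network works on raw pixels and learned features instead.
IMAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 2, 2, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
WINDOW = 4  # side length of the square candidate regions (an assumption)


def propose_regions(image, min_objectness=0.1):
    """Stage 1: keep windows whose share of non-background pixels is high enough."""
    proposals = []
    for top in range(0, len(image) - WINDOW + 1, WINDOW):
        for left in range(0, len(image[0]) - WINDOW + 1, WINDOW):
            pixels = [image[top + dy][left + dx]
                      for dy in range(WINDOW) for dx in range(WINDOW)]
            objectness = sum(1 for p in pixels if p) / len(pixels)
            if objectness >= min_objectness:
                proposals.append((top, left))
    return proposals


def classify_and_mask(image, proposal):
    """Stage 2: label the proposal, then derive its mask and bounding box."""
    top, left = proposal
    coords = [(top + dy, left + dx)
              for dy in range(WINDOW) for dx in range(WINDOW)]
    values = [image[y][x] for y, x in coords if image[y][x]]
    class_id = max(set(values), key=values.count)  # majority vote
    mask = [(y, x) for y, x in coords if image[y][x] == class_id]
    ys = [y for y, _ in mask]
    xs = [x for _, x in mask]
    box = (min(ys), min(xs), max(ys), max(xs))  # top, left, bottom, right
    return {"label": CLASS_NAMES[class_id], "box": box, "mask": mask}


detections = [classify_and_mask(IMAGE, p) for p in propose_regions(IMAGE)]
for det in detections:
    print(det["label"], det["box"], len(det["mask"]), "mask pixels")
```

The empty middle window is discarded in the first stage, so the second stage only spends work on regions that are likely to contain an object. Each surviving proposal then yields a label, a bounding box and a mask.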


Facebook AI Research for Instance Segmentation

The Facebook Artificial Intelligence (AI) Research (FAIR) team has designed techniques to identify and segment each object in image inputs for use in numerous object detection deep learning applications.


These techniques are called DeepMask, SharpMask and MultiPathNet, and each serves a different purpose in the process. DeepMask and SharpMask serve as the “eyes” of the algorithm and MultiPathNet as the “brain”.


  • DeepMask: locates objects within input images, but does not classify them or trace their boundaries precisely.
  • SharpMask: refines the output of DeepMask, producing higher-fidelity masks that improve the accuracy of object detection and object boundaries.
  • MultiPathNet: takes the output of DeepMask and SharpMask and classifies it.


Think of these algorithms as a person looking at the sky and seeing an object. DeepMask is like that person looking with the naked eye: they can spot the object but cannot identify it. SharpMask is like a telescope they can use to make out that the object is a bird. Finally, MultiPathNet serves as a guide they can use to classify which bird they see. Thus, instead of saying “it’s an object in the sky”, they can give a much more definitive description: “it’s an albatross”.

How FAIR Algorithms Power Image Segmentation Methods

The FAIR algorithms, which build on deep learning convolutional neural networks, are designed for object detection tasks. They find patterns in pixels and perform object segmentation and classification:


  • Pattern identification: trains CNNs to automatically learn patterns in pixels (such as shape and color) from millions of inputs, so they can generalize and classify images.
  • Object segmentation: identifies objects within images using DeepMask and SharpMask to generate mask predictions that are highly accurate in terms of object presence and boundaries.
  • Object classification: classifies the output of DeepMask and SharpMask by using MultiPathNet as the “brain” that recognizes the objects the “eyes” detected.


FAIR Applications

The FAIR algorithms have a wide range of potential applications in computer vision. For example, they can allow computers to recognize objects in photos, which makes it easier to search for specific images without adding explicit tags to those photos. They can also help vision-impaired people interact with content on their computers.


One of the objectives of FAIR is to allow users with vision loss to understand the content of an image they were tagged in without relying on the image caption. These algorithms can also automatically provide caption suggestions for users who upload images, by identifying and classifying the scenery to produce a more detailed image description.

Instance Segmentation with Deep Learning in the Real World

In this article, we explained the basics of instance segmentation, a subtype of image segmentation, and what it is used for. When you start building models for complex computer vision tasks that require instance segmentation in real-world scenarios, you’ll run into some practical challenges:

  • Tracking experiments: When running experiments with convolutional networks, you need to be able to track details such as the experiment source code and hyperparameters. Tracking, organizing and sharing experiment data can be a challenge.

  • Scaling experiments across multiple machines: Deep learning experiments usually require a lot of computing power, especially if you need to run a large number of experiments across multiple machines. Purchasing the required equipment and setting it up for these experiments can be both time-consuming and expensive.

  • Managing training datasets: Deep learning computer vision projects involve training CNNs on multiple rich media datasets, such as videos and images. Each dataset can run to thousands of gigabytes or even petabytes. Managing these datasets and moving them between multiple machines can consume precious time and require you to allot many resources to these tasks.

MissingLink is a deep learning platform that can help you automate these operational elements of CNNs and computer vision, so you can concentrate on building winning image recognition experiments.

