
Convolutional Neural Networks

Convolutional Neural Network Architecture: Forging Pathways to the Future

Advances in AI and deep learning have driven rapid progress in computer vision and image analysis, much of it made possible by the emergence of Convolutional Neural Networks (CNNs). Read on to learn what a CNN is, which layers it is made of and what each layer does. We will also cover the ImageNet challenge and how it helped shape the most popular CNN architectures, and how MissingLink can help you train your own convolutional neural network more efficiently.

What Is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a type of deep learning model that can recognize and classify features in images, which makes it a core tool in computer vision. It is a multi-layer neural network designed to analyze visual inputs and perform tasks such as image classification, segmentation and object detection, all of which are useful for applications like autonomous vehicles. CNNs are also used in healthcare, for example to analyze medical images.

There are two main parts to a CNN:

  • A convolution tool that separates and identifies the various features of the image for analysis, in a process called feature extraction.
  • A fully connected layer that uses the output of the convolution process to predict the class that best describes the image.


Basic Convolutional Neural Network Architecture

CNN architecture is inspired by the organization and functionality of the visual cortex and designed to mimic the connectivity pattern of neurons within the human brain.

The neurons within a CNN are arranged in a three-dimensional structure (width, height and depth), and each neuron analyzes only a small region of its input, known as its receptive field. In other words, each group of neurons specializes in detecting one feature in one part of the image. The CNN combines the outputs of its layers to produce a final vector of probability scores, representing the likelihood that the image belongs to each class.
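To make the three-dimensional structure concrete, here is a small Python sketch (not taken from any particular framework) that computes the size of a convolutional layer's output volume with the standard formula (W - F + 2P) / S + 1, where W is the input width, F the filter size, P the padding and S the stride. The function names and example sizes are hypothetical. The output depth simply equals the number of filters, because each filter produces one feature map.

```python
def conv_output_shape(width, height, num_filters, kernel, stride=1, padding=0):
    """Return (width, height, depth) of a convolutional layer's output volume.

    Each filter slides over the whole input and produces one 2-D feature map,
    so the output depth equals `num_filters`.
    """
    out_w = (width - kernel + 2 * padding) // stride + 1
    out_h = (height - kernel + 2 * padding) // stride + 1
    return out_w, out_h, num_filters


def receptive_field_size(kernel, depth):
    """Number of input values a single output neuron is connected to.

    A filter spans kernel x kernel pixels across all `depth` input channels.
    """
    return kernel * kernel * depth


# Hypothetical example: a 32x32 RGB image, 16 filters of size 5x5, stride 1, padding 2.
print(conv_output_shape(32, 32, num_filters=16, kernel=5, stride=1, padding=2))  # (32, 32, 16)
print(receptive_field_size(5, 3))  # 75
```

Because every neuron in one feature map shares the same filter weights, those 75 weights are reused at every position of the image instead of being learned separately for each location.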

How a Convolutional Neural Network Works: The CNN Layers

A CNN is composed of several kinds of layers:

  • Convolutional layer: applies filters that scan the whole image a few pixels at a time, producing a feature map for each filter that shows where its feature appears in the image.
  • Pooling layer (downsampling): reduces the amount of information the convolutional layer generated for each feature while keeping the most essential information (the convolution and pooling steps usually repeat several times).
  • Fully connected input layer: “flattens” the outputs generated by previous layers into a single vector that can be used as the input for the next layer.
  • Fully connected layer: applies weights over the input generated by the feature analysis to predict an accurate label.
  • Fully connected output layer: generates the final probabilities that determine the class of the image.
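The layer sequence above can be sketched end to end in NumPy. This is a minimal, untrained illustration under assumed sizes (an 8×8 greyscale input, four 3×3 filters, 2×2 max pooling, a 16-unit hidden layer and 10 classes, all hypothetical), written with explicit loops for readability rather than speed. Real frameworks implement these layers far more efficiently and learn the weights by backpropagation.

```python
import numpy as np


def conv2d(image, kernels, bias):
    """Valid (unpadded), stride-1 convolution of an (H, W, C) image with
    (K, F, F, C) kernels; returns an (H-F+1, W-F+1, K) stack of feature maps."""
    h, w, _ = image.shape
    k, f, _, _ = kernels.shape
    out = np.zeros((h - f + 1, w - f + 1, k))
    for i in range(h - f + 1):
        for j in range(w - f + 1):
            patch = image[i:i + f, j:j + f, :]  # (F, F, C) window under the filters
            # Dot product of the patch with every kernel at once -> K values.
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Non-overlapping size x size max pooling on an (H, W, K) volume."""
    h, w, k = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size, :]  # drop edge rows/columns that don't fill a window
    return x.reshape(h2, size, w2, size, k).max(axis=(1, 3))


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def forward(image, params):
    maps = relu(conv2d(image, params["kernels"], params["conv_bias"]))  # convolutional layer
    pooled = max_pool(maps)                                             # pooling layer
    flat = pooled.reshape(-1)                                           # flatten ("fully connected input layer")
    hidden = relu(params["w1"] @ flat + params["b1"])                   # fully connected layer
    return softmax(params["w2"] @ hidden + params["b2"])                # output layer: class probabilities


rng = np.random.default_rng(0)
image = rng.random((8, 8, 1))  # hypothetical 8x8 greyscale input
params = {
    "kernels": rng.normal(size=(4, 3, 3, 1)) * 0.1,  # 4 filters of size 3x3
    "conv_bias": np.zeros(4),
    "w1": rng.normal(size=(16, 3 * 3 * 4)) * 0.1,    # 6x6x4 maps pool to 3x3x4 = 36 values
    "b1": np.zeros(16),
    "w2": rng.normal(size=(10, 16)) * 0.1,           # 10 hypothetical classes
    "b2": np.zeros(10),
}
probs = forward(image, params)
print(probs.shape, round(probs.sum(), 6))  # (10,) 1.0
```

Like most deep learning libraries, `conv2d` here actually computes a cross-correlation (the filter is not flipped), which is the operation CNNs conventionally call convolution.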

Popular Convolutional Neural Network Architectures

The architecture of a CNN is a key factor in determining its performance and efficiency. The way in which the layers are structured, which elements are used in each layer and how they are designed will often affect the speed and accuracy with which it can perform various tasks.

The ImageNet Challenge

The ImageNet project is a visual database designed for use in visual object recognition research. It contains more than 14 million hand-annotated images, and at least one million of them also include bounding boxes, which object detection networks such as YOLO can use for training.

Starting in 2010, the project hosted an annual contest called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Contestants build software that attempts to correctly detect and classify objects and scenes within the given images. The challenge uses a trimmed list of 1,000 separate classes.

When the ILSVRC began, a good top-5 error rate was around 25%. The first major leap in performance came in 2012, when a network called AlexNet cut the error rate by roughly 10 percentage points. Over the following years, error rates kept falling and eventually dropped below the estimated human error rate on the benchmark.
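ILSVRC classification results are usually reported as a top-5 error rate: a prediction counts as correct if the true class is among the model's five highest-scoring guesses. A minimal sketch of the metric in Python, using made-up scores for three images and six classes, might look like this:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes.

    scores: one list of per-class scores for each image.
    labels: the true class index for each image.
    """
    misses = 0
    for class_scores, label in zip(scores, labels):
        # Indices of the k classes with the highest scores.
        top_k = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for three images over six classes.
scores = [
    [0.10, 0.05, 0.40, 0.20, 0.15, 0.10],  # true class 2 ranks 1st
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 5 ranks last
    [0.05, 0.50, 0.10, 0.20, 0.10, 0.05],  # true class 3 ranks 2nd
]
labels = [2, 5, 3]
print(top_k_error(scores, labels, k=1))  # 0.6666666666666666
print(top_k_error(scores, labels, k=5))  # 0.3333333333333333
```

The same predictions look much better under top-5 than top-1 scoring, which is why it matters which of the two an error rate refers to.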

Popular Architectures: Their Differences and Advantages

There are many popular CNN architectures, and many of them gained recognition by achieving good results at the ILSVRC.

LeNet-5 (1998)

This 7-layer CNN classified digits from 32×32-pixel greyscale input images. It was used by several banks to recognize handwritten numbers on checks.
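The 32×32 input and 7-layer depth can be checked by tracing feature-map sizes through the network. The sketch below uses the C/S/F layer names from the original LeNet-5 paper but simplifies it: it treats every convolutional layer as connected to all maps of the previous layer (the original C3 used a partial connection table) and treats the subsampling layers as plain 2×2, stride-2 windows. The helper names are hypothetical.

```python
def window_output_size(size, window, stride=1):
    """Spatial size after an unpadded convolution or pooling window."""
    return (size - window) // stride + 1


# (name, kind, number of maps or units, window size, stride): a simplified LeNet-5.
LENET5 = [
    ("C1", "conv", 6, 5, 1),
    ("S2", "pool", 6, 2, 2),
    ("C3", "conv", 16, 5, 1),
    ("S4", "pool", 16, 2, 2),
    ("C5", "conv", 120, 5, 1),
    ("F6", "fc", 84, None, None),
    ("OUTPUT", "fc", 10, None, None),
]


def trace_shapes(input_size=32, layers=LENET5):
    size, shapes = input_size, []
    for name, kind, maps, window, stride in layers:
        if kind == "fc":
            shapes.append((name, maps))  # fully connected: just a vector of units
        else:
            size = window_output_size(size, window, stride)
            shapes.append((name, (size, size, maps)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The trace goes 28×28×6, 14×14×6, 10×10×16, 5×5×16 and 1×1×120 before the 84-unit and 10-unit fully connected layers, one output unit per digit.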

LeNet Convolutional Neural Network Architecture

AlexNet (2012)

AlexNet was designed by the SuperVision group. Its architecture is similar to LeNet's but deeper: it has more filters per layer as well as stacked convolutional layers. It is composed of five convolutional layers followed by three fully connected layers.

One of the most significant differences between AlexNet and earlier networks is its use of ReLU (rectified linear unit) for the non-linear part, instead of the sigmoid or tanh functions used in traditional neural networks. Because ReLU does not saturate for positive inputs, networks that use it train considerably faster, and AlexNet took advantage of this to speed up training.
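The speed-up comes from the shape of the gradients. The derivatives of sigmoid and tanh shrink toward zero as the input grows in magnitude (the functions saturate), so weight updates become tiny, while ReLU's derivative stays at 1 for every positive input. A small Python comparison at a few illustrative input values:

```python
import math


def sigmoid(x):
    # Split by sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x = 0


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2  # peaks at 1.0 when x = 0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # 1 for any positive input; 0 at x = 0 by convention


for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.2e}  tanh'={tanh_grad(x):.2e}  relu'={relu_grad(x):.1f}")
```

By x = 10 the sigmoid and tanh gradients have fallen by several orders of magnitude, while the ReLU gradient is unchanged.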

The creators of AlexNet split their network into two pipelines because they trained it on two Nvidia GeForce GTX 580 graphics processing units (GPUs), each with only 3 GB of memory.

AlexNet Convolutional Neural Network Architecture

GoogleNet (2014)

Inspired by LeNet, the GoogleNet network, also known as Inception V1, was built by a team at Google. GoogleNet won ILSVRC 2014 with a top-5 error rate of less than 7%, close to the level of human performance.

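GoogleNet is a 22-layer-deep CNN built from modules based on small convolutions, called “inception modules”. By using 1×1 convolutions to reduce the number of channels before the larger convolutions, and average pooling in place of most fully connected layers, it reduced the number of parameters from about 60 million in AlexNet to only about 4 million. Later Inception versions added batch normalization and were trained with RMSprop.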
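One way inception modules keep the parameter count low is a 1×1 "bottleneck" convolution that shrinks the number of channels before an expensive 5×5 convolution. The channel counts below are hypothetical, chosen only to show the arithmetic:

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a 2-D convolution with square kernel x kernel filters."""
    return kernel * kernel * in_channels * out_channels + out_channels


# Hypothetical branch: 192 input channels -> 32 output channels with 5x5 filters.
direct = conv_params(192, 32, 5)

# Same branch with a 1x1 bottleneck that first reduces 192 channels to 16.
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)

print(direct, bottleneck, round(direct / bottleneck, 1))  # 153632 15920 9.7
```

The bottleneck version produces the same 32 output channels with roughly a tenth of the parameters, and the saving compounds across the many inception modules in the network.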

GoogleNet Convolutional Neural Network Architecture

VGGNet (2014)

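VGGNet, the runner-up at ILSVRC 2014, consisted of 16 weight layers: 13 convolutional layers and 3 fully connected layers. Unlike AlexNet, which used large filters such as 11×11 and 5×5, it used only 3×3 convolutions, stacking many of them with many filters per layer. VGGNet was trained on 4 GPUs for more than two weeks to achieve its performance.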
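The reason for using only 3×3 filters is that a stack of them covers the same region of the input as one larger filter, with fewer weights and more non-linearities in between: two stacked 3×3 layers see a 5×5 region, and three see a 7×7 region. A Python sketch of the comparison, assuming a hypothetical 256 channels in and out and ignoring biases:

```python
def conv_weights(channels, kernel):
    """Weights (biases ignored) of one conv layer mapping `channels` to `channels`."""
    return kernel * kernel * channels * channels


def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1


C = 256  # hypothetical channel count
for layers in (2, 3):
    rf = stacked_receptive_field(layers)
    stacked = layers * conv_weights(C, 3)
    single = conv_weights(C, rf)
    print(f"{layers} x 3x3: receptive field {rf}x{rf}, {stacked} weights "
          f"vs {single} for one {rf}x{rf} layer")
```

For three layers this is 27·C² weights against 49·C² for a single 7×7 layer, a saving of about 45% at any channel count.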

The drawback of VGGNet is its 138 million parameters, about 34.5 times more than GoogleNet, which make it memory-intensive and expensive to train and run.
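The 138 million figure can be reproduced by counting weights and biases layer by layer. This sketch assumes the standard 16-layer configuration (224×224 RGB input, 1,000 output classes); the constant and function names are hypothetical:

```python
# VGG-16 convolutional configuration: numbers are output channels of 3x3 conv layers,
# "M" marks a 2x2 max-pooling layer (which has no parameters).
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
# Fully connected layers for 224x224 input: 7x7x512 -> 4096 -> 4096 -> 1000 classes.
VGG16_FC = [7 * 7 * 512, 4096, 4096, 1000]


def count_vgg16_params(conv_cfg=VGG16_CONV, fc_sizes=VGG16_FC, in_channels=3):
    total = 0
    channels = in_channels
    for layer in conv_cfg:
        if layer == "M":
            continue
        total += 3 * 3 * channels * layer + layer  # 3x3 weights plus one bias per filter
        channels = layer
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus biases
    return total


print(count_vgg16_params())  # 138357544
```

Most of the total comes from the first fully connected layer, which alone holds about 103 million parameters (7 × 7 × 512 × 4,096 weights plus biases).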

VGGNet Convolutional Neural Network Architecture


Running Convolutional Neural Networks with MissingLink

In this article, we explained the basics of Convolutional Neural Networks and examined a few popular CNN architectures and how the ImageNet challenge helped in shaping them.

If you plan to build and train your own network, you will likely run into a few challenges. For example, training a CNN is computationally intensive and may require multiple GPUs, which can cost hundreds or thousands of dollars.

Additionally, training a CNN typically involves running many experiments with different hyperparameter settings, and keeping track of them all can be challenging.

These challenges and many others can be far more manageable with the help of MissingLink. MissingLink is a deep learning platform that can help you automate these operational aspects of CNN training, so you can concentrate on building winning experiments.

