Neural Network Concepts

Understanding a 3D CNN and Its Uses

There are several types of Convolutional Neural Networks (CNNs) in development, and all have the potential to greatly improve the speed and accuracy of automatic image identification. In particular, 3D CNNs are being created to improve the identification of moving and 3D images, such as video from security cameras and medical scans of cancerous tissue, tasks that are time-consuming and currently require expert analysis.

Development of 3D CNNs is still at an early stage due to their complexity, but the benefits they can deliver are worth educating yourself on. Read on to learn more about how this new field takes deep learning for computer vision to a whole new level.

What Is a CNN?

A CNN is a class of deep neural network that can be used in conjunction with a deep learning platform. A CNN is a network of processing layers used to reduce an image to its key features so that it can be more easily classified. The advantage of CNNs over other classification algorithms is their ability to learn key characteristics on their own, reducing the need for hand-engineered filters. These algorithms are increasingly being used for tasks such as facial recognition, image classification, video analysis, and automatic caption generation.

CNN Architecture

To maximize efficiency, a CNN operates with three types of layers:

Convolution Layer

This layer is where images are translated into processable data by kernels, filters consisting of learned parameters. Each kernel filters for a different feature, and multiple kernels are used in each analysis. In a convolution, small areas of an image are scanned, and the probability that each belongs to a filter class is assigned and recorded in an activation map, a representation of the image's features. In a 3D CNN, the kernels move through three dimensions of data (height, width, and depth) and produce 3D activation maps.
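The sliding-kernel process above can be sketched in plain numpy. This is a minimal, unoptimized illustration for a single-channel volume and one kernel; real frameworks vectorize this and support padding, strides, and multiple channels.

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid 3D convolution of a single-channel volume with one kernel.

    The kernel slides through height, width, and depth; each position's
    filter response becomes one cell of the 3D activation map.
    """
    vh, vw, vd = volume.shape
    kh, kw, kd = kernel.shape
    out = np.zeros((vh - kh + 1, vw - kw + 1, vd - kd + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                patch = volume[i:i + kh, j:j + kw, k:k + kd]
                out[i, j, k] = np.sum(patch * kernel)
    return out

volume = np.random.rand(8, 8, 8)   # e.g. a small stack of 8 video frames
kernel = np.random.rand(3, 3, 3)   # one learned 3x3x3 filter
activation_map = conv3d(volume, kernel)
print(activation_map.shape)        # (6, 6, 6)
```

Note how each spatial dimension of the output shrinks by the kernel size minus one, which is why deeper networks often pad their inputs.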

Pooling Layer

Pooling, or downsampling, is done on the activation maps created during convolution. During pooling, a filter moves across an activation map, evaluating a small section at a time, similar to the convolution process. The filter takes either the average of the scanned area, a weighted average based on the central pixel, or the maximum value, and writes that value to a new, smaller map.


The max pooling method, in which the highest value from the scanned area is kept, is the most commonly used because it acts as a noise suppressant during compression. This abstraction reduces the processing power needed to evaluate each map by eliminating unimportant features, and it provides a degree of spatial invariance, the ability to detect features despite small shifts, rotation, or tilting.
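A minimal sketch of 3D max pooling over non-overlapping 2x2x2 blocks, using numpy's reshape trick; frameworks additionally support strides and overlapping windows.

```python
import numpy as np

def maxpool3d(amap, size=2):
    """Max pooling over non-overlapping size^3 blocks of a 3D activation map."""
    h, w, d = amap.shape
    # Trim edges that don't fill a whole block.
    trimmed = amap[:h - h % size, :w - w % size, :d - d % size]
    # Group each axis into (blocks, size) and keep each block's maximum.
    return trimmed.reshape(h // size, size,
                           w // size, size,
                           d // size, size).max(axis=(1, 3, 5))

amap = np.arange(64, dtype=float).reshape(4, 4, 4)
pooled = maxpool3d(amap)
print(pooled.shape)   # (2, 2, 2): an 8x reduction in values to process
```

Each output cell keeps only the strongest filter response in its block, which is what suppresses low-value noise during compression.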

Fully Connected (FC) Layer

After multiple rounds of convolution and pooling, the output layers are flattened, the probabilities identified are analyzed, and each class is assigned a value, called a logit. This analysis is done by the Fully Connected layer, in which each flattened output is processed by interconnected nodes, similar to a fully connected neural network (FCNN). The difference is that in a CNN the convolutional and pooling layers are independent of the FC layer. By isolating the features of an image before feeding the output to the FC layer, a CNN is able to restrict the need for higher processing power to the final steps.
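The flatten-then-classify step can be sketched as a single matrix multiply. The map sizes, class count, and random weights below are hypothetical stand-ins for values a trained network would have learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pooled 3D activation maps from 4 hypothetical kernels, each 3x3x3.
pooled_maps = rng.random((4, 3, 3, 3))
flattened = pooled_maps.reshape(-1)                # 4 * 27 = 108 values

n_classes = 5
weights = rng.random((n_classes, flattened.size))  # learned during training
bias = rng.random(n_classes)

logits = weights @ flattened + bias                # one logit per class
probs = np.exp(logits) / np.exp(logits).sum()      # softmax -> probabilities
predicted_class = int(probs.argmax())
```

Because the convolution and pooling stages have already shrunk the input to a short feature vector, this final dense step stays cheap even though every node connects to every input.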

Uses for 3D Convolutions

The 3D activation map produced during the convolution of a 3D CNN is necessary for analyzing data where temporal or volumetric context is important. This ability to analyze a series of frames or images in context has led to the use of 3D CNNs as tools for action recognition and evaluation of medical imaging.

Human action recognition

Action recognition is the process of analyzing the positions of objects in a sequence of 2D images, like the frames of a video, and classifying them in the context of the surrounding frames to either interpret or predict object movement. Action recognition is being used in the development of assistive technologies like smart homes, in the automation of surveillance and security systems, and in virtual reality applications such as decentralized meeting spaces.


This process is complicated by the need to account for unrelated movement, such as that of the camera or background objects, by the pace at which 3D convolution can be done, and by the lack of adequate datasets available for parameter modeling. Currently, a two-stream method, in which spatial and temporal data are analyzed independently in the convolutional and pooling layers and joined at the FC layer, shows the most promise.
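The late-fusion idea in the two-stream method can be sketched as follows. The feature sizes, action count, and random values are hypothetical placeholders for what each stream's convolution and pooling stages would actually produce.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the feature vectors each stream produces after its own
# convolution and pooling stages (sizes are hypothetical).
spatial_features = rng.random(64)    # from the RGB-frame (spatial) stream
temporal_features = rng.random(64)   # from the optical-flow (temporal) stream

# Late fusion: the two streams are joined only at the FC stage.
fused = np.concatenate([spatial_features, temporal_features])

n_actions = 10
fc_weights = rng.random((n_actions, fused.size))   # learned during training
action_logits = fc_weights @ fused                 # one logit per action class
```

Keeping the streams separate until this point lets each specialize: one in appearance, the other in motion, before a single classifier weighs both.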

Medical imaging

Similar to the way CNNs are being used to evaluate video, they can be used to analyze medical imaging, such as CT or MRI scans, for the purposes of detection, diagnosis, and development of patient-specific devices. Currently, medical imaging is done by capturing slices through the depth of the tissue to be evaluated, but because the body is made of 3D structures that move, all of the images must be viewed in context to be useful. By combining these static images with volumetric or spatial context, processes such as identification of cancerous cells, evaluation of arterial health, and structural mapping of brain tissue can be initially processed by a 3D CNN, reducing the time needed for human evaluation and allowing faster patient care.


A major hurdle for CNN use in medical practice is the difficulty of training due to the requirements for obtaining datasets: images must follow HIPAA guidelines for patient privacy and must be analyzed by experts rather than through crowdsourcing methods such as CAPTCHA. In an attempt to bypass these limitations, synthetic training data is being created through data augmentation and combined with small authentic datasets. This is feasible because medical images often contain a wider variety of actionable information per image, allowing a single image to be used in training multiple kernels.
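One simple form of the data augmentation mentioned above is generating flipped copies of each volume. This is a minimal sketch; real medical-imaging pipelines also use rotations, crops, and elastic deformations, and must take care that the transforms remain anatomically plausible.

```python
import numpy as np

def augment(volume):
    """Yield simple synthetic variants of one 3D scan via axis flips.

    Each flipped copy serves as a new training example, stretching a
    small authentic dataset.
    """
    yield volume                       # the original scan
    for axis in range(3):              # flip along each spatial axis
        yield np.flip(volume, axis=axis)

scan = np.random.rand(16, 16, 16)      # stand-in for one volumetric scan
variants = list(augment(scan))         # 4 training examples from 1 scan
```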

Managing 3D CNNs with MissingLink

As CNNs become more complex, so does the management of data and resources, but it can be made simpler through automation. MissingLink can help with this process through platform features that facilitate:

  • Tracking experiments

    Tracking experiment progress, hyperparameters, and source code across CNN experiments. Convolutional networks have numerous hyperparameters and require constant tweaking. Testing each of these requires running an experiment and tracking its results, and it's easy to lose track of thousands of experiments across multiple teams.

  • Running experiments across multiple machines

    Running experiments across multiple machines and GPUs. CNNs are computationally intensive, and running multiple experiments on different datasets can take hours or days per iteration. You'll need to run experiments on multiple machines or GPUs, and you'll find it difficult to provision these machines, configure them, and distribute the work among them.

  • Managing training datasets

    Managing training datasets. Convolutional networks typically use media-rich datasets like images and video, which can weigh gigabytes or more. Each time you run an experiment or tweak the dataset, changing image size, rotating images, etc., you'll need to re-copy the full dataset to the training machines. This is very time-consuming and error-prone.

With support for TensorFlow, Keras, Pycaffe, and PyTorch, MissingLink can simplify running your experiments on deep learning platforms, giving you the flexibility to select what works best for your design and allowing you to focus on what matters.

Train Deep Learning Models 20X Faster

Let us show you how you can:

  • Run experiments across hundreds of machines
  • Easily collaborate with your team on experiments
  • Reproduce experiments with one click
  • Save time and immediately understand what works and what doesn’t

MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.

Request your personal demo to start training models faster
