TensorFlow MaxPool: Working with CNN Max Pooling Layers in TensorFlow

TensorFlow provides powerful tools for building, customizing, and optimizing Convolutional Neural Networks (CNNs) used to classify and understand image data. An essential part of the CNN architecture is the pooling stage, in which the feature maps produced by the convolution layers are downsampled, or “pooled”, to extract their essential information.

 

It’s important to note that while pooling is commonly used in CNNs, some convolutional architectures, such as ResNet, do not have separate pooling layers and instead use convolutional layers to extract pertinent feature information and pass it forward.

 

On this page we explain how to use the MaxPool layer in TensorFlow, and how to automate and scale TensorFlow CNN experiments using the MissingLink deep learning platform.

Pooling Layers and their Role in CNN Image Classification

The purpose of pooling layers in a CNN is to reduce, or downsample, the spatial dimensions of their input. Pooling layers make feature detection more robust to noise and to small changes such as image rotation or tilting. This property is known as “spatial invariance.”

Pooling is based on a “sliding window” concept. It applies a statistical function over the values within a window of a specific size, known as the filter or kernel. There are three main types of pooling:

  • Max pooling
  • Mean pooling
  • Sum pooling

The most commonly used type is max pooling. Max pooling takes the maximum value within the filter window. The diagram below shows max pooling in action.

[Diagram: max pooling in action]

In the diagram above, the colored boxes represent a max pooling function with a sliding window (filter size) of 2×2. The maximum value in each window is passed to the output feature map. For example, the maximum value in the blue box is 3, so this single value represents the four nodes within the blue box. The same applies to the green and the red boxes.
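
To make this concrete, here is a minimal sketch of a non-overlapping 2×2 max pooling operation written in plain NumPy. The input values are illustrative and are not taken from the diagram above:

import numpy as np

# A 4x4 feature map (illustrative values only).
feature_map = np.array([[1, 3, 2, 1],
                        [0, 2, 1, 0],
                        [1, 1, 0, 2],
                        [2, 0, 0, 1]])

# Slide a non-overlapping 2x2 window over the map and keep the
# maximum of each window: the 4x4 input becomes a 2x2 output.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(pooled)
# [[3 2]
#  [2 2]]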

Pooling small images with a small number of features can help prevent overfitting. In large images, pooling helps avoid a huge number of dimensions. Optimization complexity grows rapidly with the number of dimensions, leading to extremely slow convergence and a greater risk of overfitting.

 

The following image provides an excellent demonstration of the value of max pooling. In each picture, the cheetah is shown at a different angle.

[Image: the value of max pooling]

Source: SuperDataScience

 

Max pooling helps the convolutional neural network to recognize the cheetah despite all of these changes. After all, this is the same cheetah. Let’s assume the cheetah’s tear line feature is represented by the value 4 in the feature map obtained from the convolution operation.

 

[Image: spatial invariance in a convolutional neural network]

 

It doesn’t matter whether the value 4 appears in cell (4, 2) or cell (3, 1); max pooling still extracts the same maximum value from that window. This process is what provides the convolutional neural network with its “spatial invariance” capability.
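
As a quick illustration (a sketch with made-up values, not the actual cheetah feature maps): wherever the high value lands within a given pooling window, the max pooled output is identical.

import numpy as np

def max_pool_2x2(feature_map):
    # Non-overlapping 2x2 max pooling over a 2D feature map.
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# The "tear line" feature (value 4) shifts to a different position,
# but stays within the same 2x2 window, so the pooled outputs match.
a = np.zeros((4, 4)); a[0, 0] = 4
b = np.zeros((4, 4)); b[1, 1] = 4

print(max_pool_2x2(a))  # [[4. 0.] [0. 0.]]
print(max_pool_2x2(b))  # [[4. 0.] [0. 0.]]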

 


Using the tf.layers.MaxPooling function

The tf.layers module provides a high-level API that makes it easy to construct a neural network. It provides three classes for the max pooling operation:

  • layers.MaxPooling1D for 1D inputs
  • layers.MaxPooling2D for 2D inputs (e.g. images)
  • layers.MaxPooling3D for 3D inputs (e.g. volumes)

To learn about tf.nn.max_pool(), which gives you full control over how the pooling layer is structured, see the following section.

 

Let’s review the arguments of the MaxPooling1D(), MaxPooling2D() and MaxPooling3D() functions (the descriptions below show the 2D case; the 1D and 3D versions take one and three integers per spatial argument, respectively):

  • pool_size – An integer or tuple/list of 2 integers, (pool_height, pool_width), specifying the size of the pooling window. Can be a single integer to use the same value for all spatial dimensions.
  • strides – An integer or tuple/list of 2 integers, specifying the strides of the pooling operation. Can be a single integer to use the same value for all spatial dimensions.
  • padding – A string: the padding method, either ‘valid’ or ‘same’. Case-insensitive.
  • data_format – A string: the ordering of the dimensions in the inputs. channels_last (default) and channels_first are supported. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to inputs with shape (batch, channels, height, width).
  • name – A string: the name of the layer.

 

For all information see the TensorFlow documentation.
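
As a minimal usage sketch (assuming TensorFlow 1.x, where the tf.layers API is available; the input shape and layer parameters below are illustrative):

import tensorflow as tf

# A batch of 28x28 grayscale images in channels_last format (illustrative shape).
inputs = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])

# A convolution layer producing 32 feature maps.
conv = tf.layers.conv2d(inputs, filters=32, kernel_size=3,
                        padding='same', activation=tf.nn.relu)

# Max pooling with a 2x2 window and stride 2 halves the spatial dimensions.
pool = tf.layers.MaxPooling2D(pool_size=2, strides=2, padding='valid')(conv)

print(pool.shape)  # (?, 14, 14, 32)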



Using tf.nn.max_pool for Full Control Over the Pooling Layer

tf.nn.max_pool() is a lower-level function that provides more control over the details of the max pooling operation. Here is the full signature of the function:

 

tf.nn.max_pool(
    value,
    ksize,
    strides,
    padding,
    data_format='NHWC',
    name=None
)

 

Let’s review the arguments of the tf.nn.max_pool() function:

  • value – A 4-D Tensor in the format specified by data_format.
  • ksize – A list or tuple of 4 integers: the size of the pooling window for each dimension of the input tensor.
  • strides – A list or tuple of 4 integers: the stride of the pooling window for each dimension of the input tensor.
  • padding – A string, either ‘VALID’ or ‘SAME’, selecting the padding algorithm. VALID adds no padding, so the pooling window stays within the input and the output is smaller; SAME pads the input with as many rows and columns as needed so that, with a stride of 1, the output has the same spatial dimensions as the input.
  • data_format – A string: ‘NHWC’, ‘NCHW’ and ‘NCHW_VECT_C’ are supported.
  • name – Optional name for the pooling operation.

 

For all information see TensorFlow documentation.
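
For comparison, here is a minimal sketch of the same kind of pooling operation with tf.nn.max_pool (again assuming TensorFlow 1.x graph mode; the shapes are illustrative). Note how VALID and SAME padding affect the output size:

import tensorflow as tf

# A batch of 5x5 feature maps with one channel, in NHWC format (illustrative shape).
value = tf.placeholder(tf.float32, shape=[None, 5, 5, 1])

# 2x2 pooling window with stride 2. With VALID the window never leaves
# the input, so the output shrinks; with SAME the input is padded as needed.
pool_valid = tf.nn.max_pool(value, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='VALID')
pool_same = tf.nn.max_pool(value, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME')

print(pool_valid.shape)  # (?, 2, 2, 1)
print(pool_same.shape)   # (?, 3, 3, 1)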


Running CNN on TensorFlow in the Real World

In this article, we explained how to create a max pooling layer in TensorFlow, which performs downsampling after convolutional layers in a CNN model. When you start working on CNN projects and running large numbers of experiments, you’ll run into some practical challenges:

  • Tracking hyperparameters, metrics and experiment stats

    Over time you will run hundreds or thousands of experiments to find the CNN architecture and parameters that provide the best results. You will need to track all these experiments, record their findings, and figure out what worked.

  • Running experiments across multiple machines

    Running CNN experiments, especially with large datasets, will require machines with multiple GPUs, or in many cases scaling across many machines. Provisioning these machines and distributing the work between them is not a trivial task.

  • Managing training datasets

    CNN projects with images, video or other rich media can have massive training datasets weighing gigabytes to terabytes and more. Copying the data to each training machine, and re-copying it every time you modify your datasets or run different experiments, can be very time-consuming.

 

MissingLink is a deep learning platform that does all of this for you, and lets you concentrate on building the most accurate model. Learn more to see how easy it is.

