Working with CNN Max Pooling Layers in TensorFlow
TensorFlow provides powerful tools for building, customizing and optimizing Convolutional Neural Networks (CNN) used to classify and understand image data. An essential part of the CNN architecture is the pooling stage, in which feature data collected in the convolution layers are downsampled or “pooled”, to extract their essential information.
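It’s important to note that while pooling is commonly used in CNNs, some convolutional architectures rely on it far less. ResNet, for example, uses a single max pooling layer near the input and a global average pooling layer before the classifier, and otherwise downsamples with strided convolutional layers that extract pertinent feature information and pass it forward.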
On this page we explain how to use the MaxPool layer in TensorFlow, and how to automate and scale TensorFlow CNN experiments using the MissingLink deep learning platform.
The purpose of pooling layers in a CNN is to downsample the feature maps, reducing their spatial dimensions. Pooling layers make feature detection less sensitive to noise and to small changes such as slight image rotation or tilting. This property is known as “spatial invariance.”
Pooling is based on a “sliding window” concept. It applies a statistical function over the values within a window of a fixed size, often called the pooling window or kernel. There are three main types of pooling: max pooling, average pooling and sum pooling.
The most commonly used type is max pooling. Max pooling takes the maximum value within the pooling window. The diagram below shows max pooling in action.
In the diagram above, the colored boxes represent a max pooling function with a sliding window (filter size) of 2×2. The maximum value in each window is copied to the output feature map. For example, the maximum value in the blue box is 3, so this value represents the four nodes within the blue box. The same applies to the green and the red box.
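The windowing described above can be sketched in plain Python. This is a minimal illustration, not TensorFlow's implementation: the function name `max_pool_2d` and the 4×4 values are hypothetical (only the top-left maximum of 3 echoes the blue box), and the window never extends past the edge of the input.

```python
def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Slide a pool_size x pool_size window over a 2D list and keep each maximum."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    # Only visit window positions that fit entirely inside the input.
    for top in range(0, rows - pool_size + 1, stride):
        pooled_row = []
        for left in range(0, cols - pool_size + 1, stride):
            window = [
                feature_map[r][c]
                for r in range(top, top + pool_size)
                for c in range(left, left + pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map (illustrative values, not those of the diagram).
feature_map = [
    [1, 3, 2, 1],
    [2, 0, 1, 4],
    [5, 1, 0, 2],
    [3, 2, 6, 1],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[3, 4], [5, 6]]
```

Each 2×2 block of four values collapses to one value, so the 4×4 input becomes a 2×2 output.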
Pooling in small images with a small number of features can help prevent overfitting. In large images, pooling keeps the number of dimensions manageable: a 2×2 window with a stride of 2 halves the height and width of a feature map, leaving a quarter of the values. Without that reduction, later layers need far more parameters and computation, which slows convergence and increases the risk of overfitting.
The following image provides an excellent demonstration of the value of max pooling. In each image, the cheetah is shown from a different angle.
Source: SuperDataScience
Max pooling helps the convolutional neural network to recognize the cheetah despite all of these changes. After all, this is the same cheetah. Let’s assume the cheetah’s tear line feature is represented by the value 4 in the feature map obtained from the convolution operation.
It doesn’t matter if the value 4 appears in a cell of 4 x 2 or a cell of 3 x 1: as long as it stays within the same pooling window, we still get the same maximum value from that window after a max pooling operation. This process is what provides the convolutional neural network with the “spatial invariance” capability.
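A small sketch can make both this effect and its limit concrete. The helper `max_pool_2x2` and the three feature maps are hypothetical, and the value 4 stands in for the tear line feature. Moving the 4 inside one 2×2 window leaves the pooled output unchanged, while moving it into a neighboring window does change the output.

```python
def max_pool_2x2(fm):
    """2x2 max pooling with stride 2 over a 2D list with even height and width."""
    return [
        [
            max(fm[r][c], fm[r][c + 1], fm[r + 1][c], fm[r + 1][c + 1])
            for c in range(0, len(fm[0]) - 1, 2)
        ]
        for r in range(0, len(fm) - 1, 2)
    ]


# The feature value 4 sits at row 1, column 1 (inside the top-left window).
original = [
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Same feature moved to row 0, column 0: still inside the top-left window.
shifted_within_window = [
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Same feature moved to row 1, column 2: now inside the top-right window.
shifted_across_windows = [
    [0, 0, 0, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(max_pool_2x2(original))                # [[4, 0], [0, 0]]
print(max_pool_2x2(shifted_within_window))   # [[4, 0], [0, 0]]
print(max_pool_2x2(shifted_across_windows))  # [[0, 4], [0, 0]]
```

The invariance is therefore local: it covers shifts smaller than the pooling window, and stacking several pooling stages extends the range of shifts the network tolerates.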
The tf.layers module provides a high-level API that makes it easy to construct a neural network. It provides three classes for the max pooling operation:
layers.MaxPooling1D for 1D inputs
layers.MaxPooling2D for 2D inputs (e.g. images)
layers.MaxPooling3D for 3D inputs (e.g. volumes)
In TensorFlow 2.x, the same classes are available under tf.keras.layers. To learn about tf.nn.max_pool(), which gives you full control over how the pooling layer is structured, see the section on it below.
Let’s review the arguments of the MaxPooling1D(), MaxPooling2D() and MaxPooling3D() constructors:
Argument | Usage |
pool_size | An integer or tuple/list of integers, one per spatial dimension, specifying the size of the pooling window. For MaxPooling2D this is 2 integers: (pool_height, pool_width). Can be a single integer to use the same value for all spatial dimensions. |
strides | An integer or tuple/list of integers, one per spatial dimension, specifying the strides of the pooling operation. Can be a single integer to use the same value for all spatial dimensions. |
padding | A string. The padding method, either ‘valid’ or ‘same’. Case-insensitive. |
data_format | A string. The ordering of the dimensions in the inputs. channels_last (default) and channels_first are supported. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width). |
name | A string, the name of the layer. |
For full details, see the TensorFlow documentation.
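As a rough usage sketch of the arguments above, the following applies MaxPooling2D to a single 4×4, one-channel input. It assumes TensorFlow 2.x with eager execution, where the layer is reached through tf.keras.layers rather than tf.layers. The input values are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical 4x4 feature map with illustrative values.
feature_map = np.array(
    [
        [1, 3, 2, 1],
        [2, 0, 1, 4],
        [5, 1, 0, 2],
        [3, 2, 6, 1],
    ],
    dtype=np.float32,
)

# channels_last layout: (batch, height, width, channels).
inputs = feature_map.reshape(1, 4, 4, 1)

# 2x2 window moved 2 cells at a time, with no padding.
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="valid")
outputs = pool(inputs)

print(outputs.shape)  # (1, 2, 2, 1)

# Drop the batch and channel axes to inspect the pooled values.
pooled = np.array(outputs).reshape(2, 2)
print(pooled)
```

The layer has no trainable weights, so it can be called directly on a tensor or placed after a convolutional layer in a model.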
tf.nn.max_pool() is a lower-level function that provides more control over the details of the max pooling operation. Here is the full signature of the function:
tf.nn.max_pool( value, ksize, strides, padding, data_format='NHWC', name=None )
Let’s review the arguments of the tf.nn.max_pool() function:
Argument | Usage |
value | A 4-D Tensor of the format specified by data_format. |
ksize | A list or tuple of 4 integers. The size of the pooling window for each dimension of the input tensor. |
strides | A list or tuple of 4 integers. The stride of the pooling window for each dimension of the input tensor. |
padding | A string, either ‘VALID’ or ‘SAME’. The padding algorithm. VALID applies no padding: the pooling window never goes outside the boundaries of the input, so the output is smaller than the input. SAME pads the edges of the input as needed, so the output size is the input size divided by the stride, rounded up (the same size as the input when the stride is 1). |
data_format | A string. ‘NHWC’, ‘NCHW’ and ‘NCHW_VECT_C’ are supported. |
name | Optional name for the pooling layer. |
For full details, see the TensorFlow documentation.
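The effect of the padding argument is easiest to see on an input whose size is not a multiple of the stride. The sketch below assumes TensorFlow 2.x with eager execution and passes the input tensor positionally. The 5×5 input is a hypothetical example.

```python
import numpy as np
import tensorflow as tf

# Hypothetical 5x5 single-channel image in NHWC layout, holding the values 0..24.
image = tf.constant(np.arange(25, dtype=np.float32).reshape(1, 5, 5, 1))

# ksize and strides have one entry per NHWC dimension; batch and channel stay at 1.
window = [1, 2, 2, 1]
step = [1, 2, 2, 1]

# VALID: no padding, so the last row and column (which do not fill a window) are dropped.
valid_out = tf.nn.max_pool(image, ksize=window, strides=step, padding="VALID")

# SAME: the input is padded at the edges so every input cell is covered.
same_out = tf.nn.max_pool(image, ksize=window, strides=step, padding="SAME")

print(valid_out.shape)  # (1, 2, 2, 1)
print(same_out.shape)   # (1, 3, 3, 1)
```

With a stride of 2, VALID gives floor((5 - 2) / 2) + 1 = 2 outputs per spatial dimension, while SAME gives ceil(5 / 2) = 3.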
In this article, we explained how to create a max pooling layer in TensorFlow, which performs downsampling after convolutional layers in a CNN model. When you start working on CNN projects and running large numbers of experiments, you’ll run into some practical challenges:
Over time you will run hundreds or thousands of experiments to find the CNN architecture and parameters that provide the best results. You will need to track all these experiments, record their findings and figure out what worked.
Running CNN experiments, especially with large datasets, will require machines with multiple GPUs, or in many cases scaling across many machines. Provisioning these machines and distributing the work between them is not a trivial task.
CNN projects with images, video or other rich media can have massive training datasets weighing Gigabytes to Terabytes and more. Copying data to each training machine, and re-copying it every time you modify your datasets or run different experiments, can be very time-consuming.
MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.