TensorFlow MaxPool: Working with CNN Max Pooling Layers in TensorFlow
TensorFlow provides powerful tools for building, customizing and optimizing Convolutional Neural Networks (CNNs) used to classify and understand image data. An essential part of the CNN architecture is the pooling stage, in which feature maps produced by the convolutional layers are downsampled, or “pooled”, to extract their essential information.
It’s important to note that while pooling is commonly used in CNNs, some convolutional architectures, such as ResNet, do not have separate pooling layers and instead use convolutional layers to extract pertinent feature information and pass it forward.
In this article you will learn:
- What are pooling layers and their role in CNN image classification
- How to use tf.layers.MaxPooling – code example and walkthrough
- Using tf.nn.max_pool to gain more control over CNN pooling
Pooling Layers and their Role in CNN Image Classification
The purpose of pooling layers in a CNN is to reduce, or downsample, the dimensionality of the input image. Pooling layers make feature detection robust to noise and to small changes such as image rotation or tilting. This property is known as “spatial invariance.”
Pooling is based on a “sliding window” concept: it applies a statistical function over the values within a window of a specific size, known as the pooling kernel. There are three main types of pooling:
- Max Pooling
- Mean (Average) Pooling
- Sum Pooling
The most commonly used type is max pooling, which takes the maximum value within each window. The diagram below shows max pooling in action.
In the diagram above, the colored boxes represent a max pooling function with a sliding window (window size) of 2×2. The maximum value in each window is passed to the output feature map. For example, the maximum value in the blue box is 3, so this single value represents the four nodes within the blue box. The same applies to the green and the red boxes.
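The sliding-window arithmetic described above can be sketched in a few lines of NumPy. This is a plain illustration of the concept rather than TensorFlow code, and the 4×4 feature map below is made up for the example:

```python
import numpy as np

# A toy 4x4 feature map. A 2x2 max pool with stride 2 keeps only
# the largest value in each non-overlapping 2x2 window.
feature_map = np.array([
    [1, 3, 2, 1],
    [0, 2, 1, 0],
    [4, 1, 0, 1],
    [2, 0, 5, 2],
])

def max_pool_2x2(x):
    h, w = x.shape
    # Split the array into 2x2 blocks, then take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[3 2]
               #  [4 5]]
```

Each of the four 2×2 windows is collapsed to its maximum, halving both spatial dimensions of the feature map.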
Pooling in small images with a small number of features can help prevent overfitting. In large images, pooling helps avoid a huge number of dimensions: optimization complexity grows rapidly with dimensionality, leading to extremely slow convergence, and the surplus of parameters makes the model more prone to overfitting.
The following image provides an excellent demonstration of the value of max pooling. In each image, the cheetah is presented from a different angle.
Max pooling helps the convolutional neural network to recognize the cheetah despite all of these changes. After all, this is the same cheetah. Let’s assume the cheetah’s tear line feature is represented by the value 4 in the feature map obtained from the convolution operation.
It doesn’t matter whether the value 4 appears in a cell at position (4, 2) or (3, 1) of the feature map; after the max pooling operation we still get the same maximum value from that region. This process is what provides the convolutional neural network with the “spatial invariance” capability.
Using the tf.layers.MaxPooling function
The tf.layers module provides a high-level API that makes it easy to construct a neural network. It provides three methods for the max pooling operation:
- layers.MaxPooling1D for 1D inputs
- layers.MaxPooling2D for 2D inputs (e.g. images)
- layers.MaxPooling3D for 3D inputs (e.g. volumes)
To learn about tf.nn.max_pool(), which gives you full control over how the pooling layer is structured, see the following section.
Let’s review the arguments of the MaxPooling1D(), MaxPooling2D() and MaxPooling3D() functions:
|pool_size||An integer or tuple/list of 2 integers: (pool_height, pool_width), specifying the size of the pooling window. Can be a single integer to use the same value for all spatial dimensions.|
|strides||An integer or tuple/list of 2 integers, specifying the strides of the pooling operation. Can be a single integer to use the same value for all spatial dimensions.|
|padding||A string. The padding method, either ‘valid’ or ‘same’. Case-insensitive.|
|data_format||A string. The ordering of the dimensions in the inputs. channels_last (default) and channels_first are supported. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to inputs with shape (batch, channels, height, width).|
|name||A string, the name of the layer.|
For full details, see the TensorFlow documentation.
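As a concrete sketch, here is a 2×2 max pooling layer applied to a batch of images. Note that tf.layers is the TensorFlow 1.x API; in TensorFlow 2.x it has been folded into tf.keras.layers, so this example uses tf.keras.layers.MaxPooling2D with the arguments described above:

```python
import tensorflow as tf

# A 2x2 max pooling layer, built with the Keras equivalent of
# tf.layers.MaxPooling2D (the tf.layers module was merged into
# tf.keras.layers in TensorFlow 2.x).
pool = tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),             # 2x2 sliding window
    strides=(2, 2),               # move the window 2 pixels at a time
    padding='valid',              # no padding; partial windows are dropped
    data_format='channels_last',  # inputs shaped (batch, height, width, channels)
)

# A batch containing one 4x4 single-channel "image" with values 0..15.
x = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))
y = pool(x)
print(y.shape)  # (1, 2, 2, 1): both spatial dimensions are halved
```

With pool_size and strides both 2, each non-overlapping 2×2 window contributes one value to the output, so a 4×4 input becomes a 2×2 output.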
Using tf.nn.max_pool for Full Control Over the Pooling Layer
tf.nn.max_pool() is a lower-level function that provides more control over the details of the max pooling operation. Here is the full signature of the function:
tf.nn.max_pool( value, ksize, strides, padding, data_format='NHWC', name=None )
Let’s review the arguments of the tf.nn.max_pool() function:
|value||A 4-D Tensor of the format specified by data_format.|
|ksize||A list or tuple of 4 integers. The size of the pooling window for each dimension of the input tensor.|
|strides||A list or tuple of 4 integers. The stride of the pooling window for each dimension of the input tensor.|
|padding||A string, either ‘VALID’ or ‘SAME’. The padding algorithm.|
|data_format||A string. ‘NHWC’ (default) and ‘NCHW’ are supported.|
|name||Optional name for the pooling layer.|
For full details, see the TensorFlow documentation.
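The same 2×2, stride-2 pooling can be expressed with the low-level API. One caveat: in TensorFlow 2.x the first argument of tf.nn.max_pool is named input rather than value, but the call is otherwise the same as the signature above:

```python
import tensorflow as tf

# A batch containing one 4x4 single-channel "image" with values 0..15,
# laid out in NHWC format (batch, height, width, channels).
x = tf.reshape(tf.range(16, dtype=tf.float32), (1, 4, 4, 1))

y = tf.nn.max_pool(
    x,
    ksize=[1, 2, 2, 1],    # window: 1 in batch/channel dims, 2x2 spatially
    strides=[1, 2, 2, 1],  # stride 2 in both spatial dimensions
    padding='VALID',       # note the upper case for tf.nn; no padding added
    data_format='NHWC',
)
print(tf.squeeze(y).numpy().tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```

Unlike the layers API, ksize and strides here explicitly cover all four dimensions of the input tensor, which is what gives you the extra control.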
Running CNN on TensorFlow in the Real World
In this article, we explained how to create a max pooling layer in TensorFlow, which performs downsampling after convolutional layers in a CNN model. When you start working on CNN projects and running large numbers of experiments, you’ll run into some practical challenges:
Tracking hyperparameters, metrics and experiment stats—over time you will run hundreds or even thousands of experiments to find the CNN architecture and parameters that provide the best results. You will need to track all these experiments, record their findings, and figure out what worked.
Running experiments across multiple machines—running CNN experiments, especially with large datasets, will require machines with multiple GPUs, or in many cases scaling across many machines. Provisioning these machines and distributing the work between them is not a trivial task.
Managing datasets—CNN projects with images, video or other rich media can have massive training datasets ranging from gigabytes to terabytes and beyond. Copying data to each training machine, and re-copying it every time you modify your datasets or run different experiments, can be very time-consuming.