TensorFlow MaxPool: Working with CNN Max Pooling Layers in TensorFlow

TensorFlow provides powerful tools for building, customizing and optimizing Convolutional Neural Networks (CNNs), which are used to classify and understand image data. An essential part of the CNN architecture is the pooling stage, in which the feature data collected in the convolution layers is downsampled, or “pooled”, to extract its essential information.

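It’s important to note that while pooling is commonly used in CNNs, some convolutional architectures rely on it far less. ResNet, for example, keeps only a max pooling layer near the input and a global average pooling layer before the classifier, and does the rest of its downsampling with strided convolutional layers that extract pertinent feature information and pass it forward.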

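In this article you will learn:

  • Pooling Layers and their Role in CNN Image Classification
  • Using the tf.layers.MaxPooling function
  • Using tf.nn.max_pool for Full Control Over the Pooling Layer
  • Running CNN on TensorFlow in the Real World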

Pooling Layers and their Role in CNN Image Classification

The purpose of pooling layers in a CNN is to reduce, or downsample, the spatial dimensions of their input (the image or, more commonly, the feature maps produced by convolution layers). Pooling layers also make feature detection less sensitive to noise and to small changes such as a slight shift, rotation or tilt of the image. This property is known as “spatial invariance.”

Pooling is based on a “sliding window” concept. It applies a statistical function over the values within a window of a specific size, known as the pooling window (also called the filter or kernel). There are three main types of pooling:

  • Max Pooling
  • Mean Pooling
  • Sum Pooling

The most commonly used type is max pooling. Max pooling takes the maximum value within each pooling window. The diagram below shows max pooling in action.

[Figure: max pooling in action]

In the diagram above, the colored boxes represent a max pooling function with a sliding window (filter size) of 2×2. The maximum value in each window is passed to the output feature map. For example, the maximum value in the blue box is 3, so this value represents the four nodes within the blue box. The same applies to the green and the red boxes.
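The sliding-window idea can be sketched in a few lines of plain Python. This is only an illustration, not how TensorFlow implements pooling: the pool2d helper and the feature map values are made up for this example, and the same helper is reused for mean and sum pooling by swapping the function applied to each window.

```python
# Minimal sketch of 2D pooling with a sliding window (pure Python, no TensorFlow).
# pool2d and the feature map values are hypothetical, chosen for illustration.

def pool2d(feature_map, size=2, stride=2, reducer=max):
    """Apply `reducer` to every size x size window, moving `stride` cells at a time."""
    rows, cols = len(feature_map), len(feature_map[0])
    output = []
    for top in range(0, rows - size + 1, stride):
        out_row = []
        for left in range(0, cols - size + 1, stride):
            # Collect the values covered by the current window.
            window = [
                feature_map[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            ]
            out_row.append(reducer(window))
        output.append(out_row)
    return output


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 1],
    [0, 2, 4, 6],
    [5, 1, 0, 2],
    [7, 8, 3, 1],
]

max_pooled = pool2d(feature_map, reducer=max)    # [[3, 6], [8, 3]]
mean_pooled = pool2d(feature_map, reducer=mean)  # [[1.5, 3.25], [5.25, 1.5]]
sum_pooled = pool2d(feature_map, reducer=sum)    # [[6, 13], [21, 6]]
```

Each 2×2 window of the 4×4 input collapses to a single value, so the output is 2×2 whichever function is used.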

Pooling in small images with a small number of features can help prevent overfitting. In large images, pooling keeps the number of dimensions manageable. Optimization becomes much harder as the number of dimensions grows, so without pooling you can end up with extremely slow convergence, and the extra parameters needed in later layers increase the risk of overfitting.
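To get a feel for the size reduction, the arithmetic below counts the values in a feature map before and after repeated 2×2 pooling with a stride of 2. The 224×224×64 starting size and the pooled_shape helper are assumptions picked for illustration; they do not come from a specific network.

```python
# Sketch: how repeated 2x2 pooling (stride 2, no padding) shrinks a feature map.
# The starting shape is an illustrative assumption.

def pooled_shape(height, width, channels, size=2, stride=2):
    """Output shape of pooling without padding; the channel count is unchanged."""
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    return out_h, out_w, channels


shape = (224, 224, 64)
shapes = [shape]
for _ in range(3):
    shape = pooled_shape(*shape)
    shapes.append(shape)

# Number of values in the feature map at each stage.
value_counts = [h * w * c for h, w, c in shapes]
# shapes:       (224, 224, 64), (112, 112, 64), (56, 56, 64), (28, 28, 64)
# value_counts: 3211264, 802816, 200704, 50176
```

Each pooling step halves the height and width, so the number of values passed to the next layer drops by a factor of four.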


The following images demonstrate the value of max pooling. In each image, the cheetah is shown at a different angle.

[Figure: value of max pooling]

Source: SuperDataScience


Max pooling helps the convolutional neural network to recognize the cheetah despite all of these changes. After all, this is the same cheetah. Let’s assume the cheetah’s tear line feature is represented by the value 4 in the feature map obtained from the convolution operation.


[Figure: max pooling providing the neural network with “spatial invariance”]


It doesn’t matter whether the value 4 appears in the cell at row 4, column 2 or in the cell at row 3, column 1: as long as both cells fall inside the same pooling window, we still get the same maximum value from that window after a max pooling operation. This is what provides the convolutional neural network with its “spatial invariance” capability.
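The effect can be checked with a small sketch. It assumes a 4×4 feature map, a 2×2 window with a stride of 2, and reads the two cells mentioned above as (row 4, column 2) and (row 3, column 1), counted from 1; the helper names are hypothetical.

```python
# Sketch: a small shift of a feature inside one pooling window leaves the output unchanged.
# Assumes a 4x4 feature map and 2x2 max pooling with stride 2.

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def map_with_feature(row, col, size=4, value=4):
    """Return a size x size map of zeros with `value` at the 1-based (row, col)."""
    grid = [[0] * size for _ in range(size)]
    grid[row - 1][col - 1] = value
    return grid


pooled_a = max_pool_2x2(map_with_feature(4, 2))  # feature at row 4, column 2
pooled_b = max_pool_2x2(map_with_feature(3, 1))  # feature at row 3, column 1
# Both cells lie in the bottom-left window, so both results are [[0, 0], [4, 0]].

pooled_c = max_pool_2x2(map_with_feature(2, 3))  # feature moved to another window
# The 4 now shows up in the top-right output cell: [[0, 4], [0, 0]].
```

The invariance is local: a shift that stays within one window is absorbed, while a larger shift moves the value to a different cell of the output.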


Using the tf.layers.MaxPooling function

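The tf.layers module provides a high-level API that makes it easy to construct a neural network. It provides three classes for the max pooling operation (in TensorFlow 2.x, tf.layers is no longer available at the top level, and the same layers are found under tf.keras.layers):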

  • layers.MaxPooling1D for 1D inputs
  • layers.MaxPooling2D for 2D inputs (e.g. images)
  • layers.MaxPooling3D for 3D inputs (e.g. volumes)

To learn about tf.nn.max_pool(), which gives you full control over how the pooling layer is structured, see the following section.


Let’s review the arguments of the MaxPooling1D(), MaxPooling2D() and MaxPooling3D() layers:


  • pool_size: For MaxPooling2D, an integer or tuple/list of 2 integers, (pool_height, pool_width), specifying the size of the pooling window. Can be a single integer to use the same value for all spatial dimensions.
  • strides: For MaxPooling2D, an integer or tuple/list of 2 integers, specifying the strides of the pooling operation. Can be a single integer to use the same value for all spatial dimensions.
  • padding: A string. The padding method, either ‘valid’ or ‘same’. Case-insensitive.
  • data_format: A string. The ordering of the dimensions in the inputs. channels_last (default) and channels_first are supported. channels_last corresponds to inputs with shape (batch, height, width, channels), while channels_first corresponds to inputs with shape (batch, channels, height, width).
  • name: A string, the name of the layer.
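As a rough illustration of how these arguments fit together, the sketch below applies a 2×2 max pooling layer to a dummy 4×4 input. It assumes TensorFlow 2.x with eager execution, where the layer is available as tf.keras.layers.MaxPooling2D; the input values and the layer name "pool1" are made up for the example.

```python
import tensorflow as tf

# A dummy batch: 1 image, 4 x 4 pixels, 1 channel (channels_last layout),
# filled with the values 0..15 so the result is easy to check by hand.
images = tf.reshape(tf.range(16, dtype=tf.float32), [1, 4, 4, 1])

# Assumption: TensorFlow 2.x, where the pooling layers live in tf.keras.layers.
pool = tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),             # 2 x 2 pooling window
    strides=2,                    # move the window 2 cells at a time
    padding="valid",              # no padding
    data_format="channels_last",  # (batch, height, width, channels)
    name="pool1",                 # hypothetical layer name
)

pooled = pool(images)
pooled_shape = tuple(pooled.shape)                           # (1, 2, 2, 1)
pooled_values = tf.reshape(pooled, [2, 2]).numpy().tolist()  # [[5.0, 7.0], [13.0, 15.0]]
```

With a 2×2 window and a stride of 2, the height and width are halved while the batch and channel dimensions are left untouched.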


For full details, see the TensorFlow documentation.


Using tf.nn.max_pool for Full Control Over the Pooling Layer

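tf.nn.max_pool() is a lower-level function that provides more control over the details of the max pooling operation. Here is the full signature of the function, as documented for TensorFlow 1.x:

tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None)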


Let’s review the arguments of the tf.nn.max_pool() function:


  • value: A 4-D Tensor of the format specified by data_format.
  • ksize: A list or tuple of 4 integers. The size of the pooling window for each dimension of the input tensor.
  • strides: A list or tuple of 4 integers. The stride of the pooling window for each dimension of the input tensor.
  • padding: A string, either ‘VALID’ or ‘SAME’. The padding algorithm. VALID applies no padding: the pooling window never goes outside the boundaries of the input, so the output is smaller than the input. SAME pads the edges of the input as needed, so that the output size equals the input size divided by the stride, rounded up.
  • data_format: A string. ‘NHWC’, ‘NCHW’ and ‘NCHW_VECT_C’ are supported.
  • name: Optional name for the pooling layer.
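The following sketch calls tf.nn.max_pool() with both padding modes and compares the resulting shapes with the usual output-size rules. It assumes TensorFlow 2.x with eager execution (where the first argument is named input rather than value, so it is passed positionally here); the expected_size helper and the 5×5×3 input are made up for illustration.

```python
import math

import tensorflow as tf


def expected_size(n, k, s, padding):
    """Output length along one spatial dimension for input n, window k, stride s."""
    if padding == "VALID":
        return (n - k) // s + 1  # window must fit entirely inside the input
    return math.ceil(n / s)      # SAME: input is padded as needed


# Dummy NHWC input: 1 image, 5 x 5 pixels, 3 channels.
x = tf.random.uniform([1, 5, 5, 3])

# ksize and strides have one entry per input dimension: [batch, height, width, channels].
valid = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
same = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

valid_shape = tuple(valid.shape)  # (1, 2, 2, 3): the last row and column are dropped
same_shape = tuple(same.shape)    # (1, 3, 3, 3): the input is padded to cover them
```

With an odd input size the two modes differ: VALID discards the row and column that do not fill a complete window, while SAME keeps them by padding.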


For full details, see the TensorFlow documentation.

Running CNN on TensorFlow in the Real World

In this article, we explained how to create a max pooling layer in TensorFlow, which performs downsampling after convolutional layers in a CNN model. When you start working on CNN projects and running large numbers of experiments, you’ll run into some practical challenges:

Tracking hyperparameters, metrics and experiment stats: over time you will run hundreds or thousands of experiments to find the CNN architecture and parameters that provide the best results. You will need to track all these experiments, record their findings and figure out what worked.

Running experiments across multiple machines: running CNN experiments, especially with large datasets, will require machines with multiple GPUs, or in many cases scaling across many machines. Provisioning these machines and distributing the work between them is not a trivial task.

Managing datasets: CNN projects with images, video or other rich media can have massive training datasets, ranging from gigabytes to terabytes and more. Copying data to each training machine, and re-copying it every time you modify your datasets or run different experiments, can be very time-consuming.

MissingLink is a deep learning platform that can help you automate these operational aspects of CNN on TensorFlow, so you can concentrate on building winning experiments. Learn more to see how easy it is.