
Keras Conv2D: Working with CNN 2D Convolutions in Keras

This article explains how to create 2D convolutional layers in Keras as part of a Convolutional Neural Network (CNN) architecture.


A CNN architecture has three main parts:

  • A convolutional layer that extracts features from a source image. This is the essential feature of a CNN: it processes small regions of the image at a time, instead of feeding the whole input to every neuron in each layer of the network.
  • A pooling layer that downsamples each feature map to reduce its dimensionality and focus on the most important elements. There are usually several rounds of convolution and pooling, and very deep CNN architectures may stack dozens or even hundreds of convolutional layers.
  • A fully connected layer that flattens the features identified in the previous layers into a vector, and applies a traditional neural network with all neurons in each layer connected to all neurons in the next layer, to make a prediction about the image.


CNN process in TensorFlow or Keras


In Keras, you build a CNN architecture using the following process:

1. Reshape the input data into a format suitable for the convolutional layers, using X_train.reshape() and X_test.reshape()

2. For multi-class classification, one-hot encode the class labels using the to_categorical() function.

3. Build the model using the Sequential.add() function. For a 2D convolutional layer, the command looks like the following.


model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))


>> You are here. In this article, we explain how to work with 2D convolutional layers in Keras.     


4. Add a pooling layer, for example with Sequential.add(MaxPooling2D()) (parameters omitted here).

5. Add a “flatten” layer which prepares a vector for the fully connected layers, using Sequential.add(Flatten()).

6. Add one or more fully connected layers using Sequential.add(Dense()). Typically you will follow each fully connected layer with a dropout layer (learn more about dropout in our guide to neural network hyperparameters), using Sequential.add(Dropout()).

7. Compile the model using model.compile()

8. Train the model using model.fit(), supplying X_train and X_test, which are the source images, and y_train and y_test, which are the known classification results.

9. Use model.predict() to generate a prediction.
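Steps 1 and 2 can be sketched with NumPy alone. The snippet below is a minimal illustration rather than the article's original code: the array sizes and sample labels are placeholder assumptions, and np.eye is used as a stand-in that produces the same one-hot layout as Keras's to_categorical().

```python
import numpy as np

# Hypothetical data: 6 grayscale images of 28x28 pixels with integer labels 0-9.
X_train = np.zeros((6, 28, 28), dtype="float32")
y_train = np.array([3, 0, 9, 1, 3, 7])

# Step 1: add a trailing channel axis so each image is (height, width, channels).
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)

# Step 2: one-hot encode the labels; row i has a single 1 in column y_train[i].
num_classes = 10
y_train_onehot = np.eye(num_classes, dtype="float32")[y_train]

print(X_train.shape)         # (6, 28, 28, 1)
print(y_train_onehot.shape)  # (6, 10)
```

The trailing axis of size 1 matches the input_shape=(28, 28, 1) used in the Conv2D call above.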




What is a 2D Convolution Layer, the Convolution Kernel and its Role in CNN Image Classification

Briefly, some background. A convolution layer “scans” a source image with a filter of, for example, 5×5 pixels, to extract features which may be important for classification. This filter is also called the convolution kernel. The kernel contains weights, which are tuned during training to achieve the most accurate predictions.


With a 5×5 kernel, for each 5×5 pixel region the model computes the dot product between the pixel values in that region and the weights defined in the filter, producing one value of the output feature map.
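The sliding dot product can be sketched in a few lines of NumPy. This is an illustrative, unoptimized version that assumes a stride of 1 and no padding; the function name conv2d_valid and the toy image are hypothetical, and Keras itself uses a far more efficient implementation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            # Dot product of the pixel region with the kernel weights.
            out[i, j] = np.sum(region * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)  # toy 7x7 "image"
kernel = np.ones((5, 5)) / 25.0                   # 5x5 averaging kernel
feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

A 7×7 image leaves only three valid positions per axis for a 5×5 kernel, which is why the feature map is 3×3.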


A 2D convolution layer means that the input of the convolution operation is three-dimensional, for example, a color image which has a value for each pixel across three channels: red, green and blue. However, it is called a “2D convolution” because the movement of the filter across the image happens in two dimensions. Each filter spans all three channels: it holds a separate set of weights for each channel, and the per-channel results are summed into a single 2D feature map.
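One way to picture this: a single filter holds one 2D slice of weights per color channel, and the products from all channels are summed into one 2D feature map. The NumPy sketch below illustrates the idea under simplifying assumptions (stride 1, no padding, a single filter); the function name conv2d_multichannel is hypothetical.

```python
import numpy as np

def conv2d_multichannel(image, kernel):
    """Apply one filter to an (H, W, C) image; kernel has shape (kh, kw, C)."""
    kh, kw, channels = kernel.shape
    if image.shape[2] != channels:
        raise ValueError("kernel depth must match the number of image channels")
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The window covers all channels; the sum collapses them to one value.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

rgb = np.ones((6, 6, 3))     # toy 6x6 image with 3 channels
kernel = np.ones((3, 3, 3))  # one 3x3 filter spanning all 3 channels
fmap = conv2d_multichannel(rgb, kernel)
print(fmap.shape)  # (4, 4)
print(fmap[0, 0])  # 27.0
```

The filter only moves along height and width, so the output has two spatial dimensions even though the input has three.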



After the convolution ends, the features are downsampled, and then the same convolutional structure repeats. Early layers typically respond to simple features such as edges and textures, and deeper layers combine them into progressively larger parts (for example, in a cat, the ears, whiskers and eyes, and then the head, body, legs and tail). Eventually, this process is meant to identify the essential features that can help classify the image. Learn more in our guide to Convolutional Neural Networks.
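The downsampling step is commonly done with max pooling. The following is a minimal NumPy sketch of 2×2 max pooling on one feature map, assuming non-overlapping blocks; the function name max_pool_2x2 is hypothetical and the input values are made up for illustration.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample an (H, W) map by taking the max of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop an odd trailing row or column
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]])
pooled = max_pool_2x2(fmap)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each output value keeps only the strongest response in its 2×2 block, halving the height and width of the feature map.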


Keras CNN example and Keras Conv2D

Here is a simple code example to show you the context of Conv2D in a complete Keras model. The example was created by Andy Thomas. This model has two 2D convolutional layers, highlighted in the code.


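# imports (assuming the Keras API bundled with TensorFlow)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# x_train, y_train, x_test, y_test and num_classes are assumed to be
# prepared as in steps 1-2 above (28x28x1 images, one-hot labels)

# building the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1),
                 activation='relu', input_shape=(28, 28, 1)))  # assumed arguments
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten the feature maps before the dense layers
model.add(Dense(1000, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

# training (loss, optimizer, batch_size and epochs are assumed values)
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=128, epochs=10,
          validation_data=(x_test, y_test))

# evaluating and printing results
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])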
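To see how the layers in this model shrink the image, it can help to trace the spatial size through each step. The sketch below assumes a 28×28 single-channel input, as in the earlier Conv2D snippet, and the default 'valid' padding; the helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution or pooling along one axis."""
    return (size - kernel) // stride + 1

size = 28                           # assumed 28x28 input
size = conv_out(size, 5)            # Conv2D(32, 5x5)             -> 24
size = conv_out(size, 2, stride=2)  # MaxPooling2D(2x2, stride 2) -> 12
size = conv_out(size, 5)            # Conv2D(64, 5x5)             -> 8
size = conv_out(size, 2, stride=2)  # MaxPooling2D(2x2)           -> 4
flattened = size * size * 64        # 64 feature maps of 4x4
print(size, flattened)              # 4 1024
```

Under these assumptions, the first Dense layer receives a vector of 1024 values once the 64 feature maps are flattened.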


Understanding and Tuning the Parameters of Keras Conv2D [3]

When adding a Conv2D layer using Sequential.add(), there are numerous parameters you can use, as defined in the underlying keras.layers.Conv2D class (see documentation).


Here is the full signature of the Keras Conv2D constructor:


keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


Below we explain each of these parameters, what it does, and some best practices for setting and tuning it. To get more background about tuning neural networks, see our guide on neural network hyperparameters.


| Keras Conv2D Parameter | What it Does | Best Practices and Tuning |
| --- | --- | --- |
| filters | Sets the number of filters used in the convolution operation. | Earlier 2D convolutional layers, closer to the input, learn fewer filters, while later convolutional layers, closer to the output, learn more filters. The number of filters you select should depend on the complexity of your dataset and the depth of your neural network. A common setting to start with is [32, 64, 128] for three layers, and if there are more layers, increasing to [256, 512, 1024], etc. |
| kernel_size | Specifies the size of the convolutional filter in pixels. Typically an odd integer (or a 2-tuple of odd integers), so that the kernel has a center pixel. | Filter size may be determined by the CNN architecture you are using; for example, VGGNet exclusively uses (3, 3) filters. If not, use a 5×5 or 7×7 filter to learn larger features and then quickly reduce to 3×3. If your images are smaller than 128×128, consider working with smaller filters of 1×1 and 3×3. |
| strides=(1, 1) | The strides parameter is a 2-tuple of integers, specifying how the convolutional filter should “step” along the x and y-axis of the source image. | In most cases, it’s okay to leave the strides parameter with the default (1, 1). However, you may increase it to (2, 2) to reduce the size of the output volume. |
| padding='valid' | The padding parameter has two values: valid or same. Valid means the input is not zero-padded, so the output of the convolution will be smaller than the dimensions of the original image. Same means the input will be zero-padded, so that with a stride of 1 the convolution output is the same size as the input. | The default Keras value is valid, but it is often effective to set it to same for most of the layers, then reduce spatial dimensions using max pooling or strided convolutions. |
| data_format=None | Specifies the order of data in the input received from the backend deep learning framework: channels_last or channels_first. | The TensorFlow backend to Keras uses channels_last ordering. Do not change this parameter unless you are using Theano as your backend. |
| dilation_rate=(1, 1) | A 2-tuple of integers, controlling the dilation rate for dilated convolution. Dilated convolution is a convolution applied to the input volume with defined gaps between the kernel elements, so each filter covers a wider area of the image while skipping some of the pixels inside it. | Dilated convolutions are useful when you work with higher-resolution images but still want to capture fine-grained details, or when constructing a network with fewer parameters. |
| activation=None | The activation parameter specifies the name of the activation function you want to apply after performing the convolution. | To learn more about activation functions and their impact on your neural network, see our guide to neural network activation functions. |
| use_bias=True | The use_bias parameter of the Conv2D class controls whether a bias vector is added to the convolutional layer. | Typically you’ll want to leave this value as True, although some implementations of ResNet will leave the bias parameter out. |
| kernel_initializer='glorot_uniform' | The initialization method used to set the kernel weights of the Conv2D layer prior to training. | The default is glorot_uniform, which is Xavier Glorot uniform initialization. This is suitable for most CNNs. For deeper networks, such as VGGNet, you may want to use he_normal, which uses the MSRA (He) initialization method. |
| bias_initializer='zeros' | Controls how the bias vector is initialized before training starts. | You should typically leave this as the default, zeros, meaning the bias will initially be filled with zeros. |
| kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None | These parameters control the type and amount of regularization. Regularization is a method which helps avoid overfitting and improve the ability of your model to generalize from training examples to a real population. | For large datasets and deep networks, kernel regularization is a must. You can use either L1 or L2 regularization. If you detect signs of overfitting, consider using L2 regularization. Tune the amount of regularization, starting with values of 0.0001-0.001. For bias and activity, we recommend leaving at the default values for most scenarios. |
| kernel_constraint=None, bias_constraint=None | Impose constraints on the Conv2D layer, such as unit normalization, non-negativity, min-max normalization. | These are advanced settings which should be left at defaults unless you have a special reason to use them in your model. |
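The effect of kernel_size, strides, padding and dilation_rate on the output size, and of filters and use_bias on the number of trainable weights, can be checked with simple arithmetic. The sketch below uses only the Python standard library; the function names are hypothetical helpers, and the formulas are the commonly used ones for 'valid' and 'same' padding along a single axis with a square kernel.

```python
import math

def conv2d_output_size(size, kernel, stride=1, padding="valid", dilation=1):
    """Spatial output length along one axis for 'valid' or 'same' padding."""
    if padding == "same":
        return math.ceil(size / stride)
    effective_kernel = dilation * (kernel - 1) + 1  # kernel footprint including gaps
    return (size - effective_kernel) // stride + 1

def conv2d_param_count(filters, kernel, in_channels, use_bias=True):
    """Trainable parameters of a Conv2D layer with a square kernel."""
    weights = kernel * kernel * in_channels * filters
    return weights + (filters if use_bias else 0)

print(conv2d_output_size(28, 3))                  # 26
print(conv2d_output_size(28, 3, padding="same"))  # 28
print(conv2d_output_size(28, 3, stride=2))        # 13
print(conv2d_output_size(28, 3, dilation=2))      # 24
print(conv2d_param_count(64, 3, in_channels=1))   # 640
```

The last line corresponds to the Conv2D(64, kernel_size=3, input_shape=(28, 28, 1)) layer shown earlier: 3 × 3 × 1 × 64 weights plus 64 biases.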


Running CNN at Scale on Keras with MissingLink

In this article, we explained how to create 2D convolutional layers in Keras. When you start working on Convolutional Neural Networks and running large numbers of experiments, you’ll run into some practical challenges:


Tracking experiments

Tracking experiment progress and hyperparameters can be challenging when you run a large number of experiments. You will have to scale up your experiments to tune your CNN and try all relevant variations of network architecture and hyperparameters.

Running experiments across multiple machines

CNNs can take a long time to run, especially with large datasets. You will want to run your CNNs on more machines and GPUs, either on-premise or in the cloud. It can be very time consuming to provision these machines, distribute experiments between them and monitor progress.

Managing training datasets

In computer vision projects that use images, video or other rich media, training sets can be very large. Copying the data to each training machine, replacing it for each new experiment and managing changes to datasets can be difficult. To scale up you must do this in an automated way.

MissingLink is a deep learning platform that can help you automate these operational aspects of CNN on Keras, so you can concentrate on building winning experiments. Learn more to see how easy it is.
