This article explains how to create 2D convolutional layers in Keras, as part of a Convolutional Neural Network (CNN) architecture.
2D convolutional layers take a three-dimensional input, typically an image with three color channels. They pass a filter, also called a convolution kernel, over the image, inspecting a small window of pixels at a time (for example, 3×3 or 5×5 pixels in size) and moving the window until they have scanned the entire image. The convolution operation calculates the dot product of the pixel values in the current filter window with the weights defined in the filter.
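To make the scanning and dot-product steps concrete, here is a minimal NumPy sketch that slides a 3×3 filter over a single-channel image with a stride of 1 and no padding. NumPy and the helper name conv2d_single_channel are our own choices for illustration, not part of Keras. Like deep learning frameworks, it computes a cross-correlation, meaning the kernel is not flipped.

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Slide `kernel` over a 2D `image` (stride 1, no padding) and return
    the dot product of each window with the kernel weights."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]      # current window of pixels
            output[i, j] = np.sum(window * kernel)  # dot product with the filter weights
    return output

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # hypothetical 3x3 vertical-edge filter
result = conv2d_single_channel(image, kernel)
print(result.shape)  # (3, 3)
print(result)        # every entry is -6.0, because pixel values rise by 1 per column
```

In a real Conv2D layer the kernel weights are not hand-picked like this; they are learned during training.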
In Keras, you create 2D convolutional layers using the keras.layers.Conv2D class. Unlike the low-level TensorFlow tf.nn.conv2d operation, you don’t have to create the weight variables yourself, and you can apply an activation function directly through the layer’s activation argument. Pooling is added as a separate layer.
The Conv2D constructor, shown here with its default argument values, creates a 2D convolutional layer in Keras:
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
To understand the parameters in detail, see Understanding and Tuning Conv2D Parameters below.
We will also show how to run CNNs at scale across dozens of machines, both on and off the cloud, using the MissingLink deep learning platform.
Briefly, some background. A convolution layer “scans” a source image with a filter of, for example, 5×5 pixels, to extract features which may be important for classification. This filter is also called the convolution kernel. The kernel also contains weights, which are tuned during training to make the model’s predictions as accurate as possible.
With a 5×5 kernel, for each 5×5 pixel region, the model computes the dot product between the image pixel values and the weights defined in the filter.
A 2D convolution layer means that the input of the convolution operation is three-dimensional, for example, a color image which has a value for each pixel across three channels: red, green and blue. However, it is called a “2D convolution” because the movement of the filter across the image happens in two dimensions. The filter itself spans all three channels: a 5×5 filter applied to an RGB image actually holds 5×5×3 weights, and at each position the products across all three channels are summed into a single output value.
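The following sketch shows the three-channel case with plain NumPy. The helper conv2d_multichannel is hypothetical and is not how Keras implements convolution internally. A 5×5×3 filter produces one value per position, and that value equals the sum of three separate per-channel convolutions. A Conv2D layer with N filters repeats this N times to produce N feature maps.

```python
import numpy as np

def conv2d_multichannel(image, kernel):
    """image: (H, W, C); kernel: (kh, kw, C). Stride 1, no padding.
    Returns a single (H - kh + 1, W - kw + 1) feature map."""
    kh, kw, c = kernel.shape
    assert image.shape[2] == c, "kernel depth must match the number of input channels"
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # One dot product over the window in all channels at once -> one output value
            output[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return output

rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))     # hypothetical 8x8 RGB image
kernel = rng.random((5, 5, 3))  # one 5x5 filter spanning all 3 channels
fmap = conv2d_multichannel(rgb, kernel)
print(fmap.shape)  # (4, 4)

# Equivalent view: convolve each channel with its slice of the filter, then add the results
per_channel = sum(
    conv2d_multichannel(rgb[:, :, [ch]], kernel[:, :, [ch]]) for ch in range(3)
)
print(np.allclose(fmap, per_channel))  # True
```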
After the convolution, the features are downsampled, and then the same convolutional structure repeats. At first, the convolution identifies features in the original image (for example in a cat, the body, legs, tail, head), then it identifies subfeatures within smaller parts of the image (for example, within the head, the ears, whiskers, eyes). Eventually, this process is meant to identify the essential features that can help classify the image. Learn more in our guide to Convolutional Neural Networks (CNN).
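In Keras, the downsampling step is usually a MaxPooling2D layer. Here is a minimal NumPy sketch of 2×2 max pooling with a stride of 2 on a single feature map. The helper name max_pool_2x2 is hypothetical, and like pooling with the default 'valid' padding it drops an odd trailing row or column.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a (H, W) feature map by taking the max of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]  # drop an odd last row/column
    # Split into (h2, 2, w2, 2) blocks and take the max inside each 2x2 block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [3, 2, 4, 6]])
print(max_pool_2x2(fmap))
# [[4 5]
#  [3 7]]
```

Each output value keeps only the strongest response in its region, so the next convolutional layer sees a smaller map in which each position summarizes a larger area of the original image.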
To help you understand the Conv2D operation, here is a quick primer on how to build Convolutional Neural Networks in Keras.
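A CNN architecture has three main parts: convolutional layers that extract features from the input image, pooling layers that downsample the resulting feature maps, and fully connected layers that perform the final classification.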
In Keras, you build a CNN architecture using the following process:
1. Reshape the input data into a format suitable for the convolutional layers, using X_train.reshape() and X_test.reshape().
2. For multi-class classification, one-hot encode the categories using the to_categorical() function.
3. Build the model using the Sequential.add() function. For a 2D convolutional layer, the command looks like the following.
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
4. Add a pooling layer, for example using Sequential.add(MaxPooling2D()) (pool size and other parameters omitted here).
5. Add a “flatten” layer which prepares a vector for the fully connected layers, using Sequential.add(Flatten()).
6. Add one or more fully connected layers using Sequential.add(Dense()). You will often follow each fully connected layer with a dropout layer (learn more about dropout in our guide to neural network hyperparameters), using Sequential.add(Dropout()).
7. Compile the model using model.compile().
8. Train the model using model.fit(), supplying the training images X_train and their known classification labels y_train. You can also pass the test images and labels, (X_test, y_test), as validation data to monitor how the model performs on data it was not trained on.
9. Use model.predict() to generate a prediction.
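Steps 1 and 2 can be sketched with NumPy alone. This assumes MNIST-style data (28×28 grayscale images with integer labels 0 to 9); the random arrays below are placeholders, not a real dataset. Scaling the pixels to [0, 1] is a common extra step rather than a Conv2D requirement. Indexing an identity matrix with the labels gives the same one-hot matrix that keras.utils.to_categorical returns for integer labels.

```python
import numpy as np

# Hypothetical stand-in for MNIST-style data: 100 grayscale 28x28 images, labels 0-9
rng = np.random.default_rng(42)
X_train = rng.integers(0, 256, size=(100, 28, 28))
y_train = rng.integers(0, 10, size=100)
num_classes = 10

# Step 1: add a trailing channels axis (channels_last) and scale pixels to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32') / 255.0

# Step 2: one-hot encode the labels (row i of the identity matrix encodes class i)
y_train_onehot = np.eye(num_classes, dtype='float32')[y_train]

print(X_train.shape)         # (100, 28, 28, 1)
print(y_train_onehot.shape)  # (100, 10)
```

The resulting X_train matches the input_shape=(28, 28, 1) used in step 3, and y_train_onehot matches a final Dense layer with num_classes softmax outputs.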
Here is a simple code example that shows Conv2D in the context of a complete Keras model. The example was created by Andy Thomas. This model has two 2D convolutional layers, the two Conv2D calls in the code.
# imports (x_train, y_train, x_test, y_test, input_shape, num_classes, batch_size and epochs are assumed to be defined earlier)
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# building the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1000, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

# training
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# evaluating and printing results
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
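The Keras example above leaves input_shape and num_classes undefined. Assuming MNIST-style values, input_shape=(28, 28, 1) and num_classes=10 (our assumption, not part of the original example), the following pure-Python sketch traces how each layer changes the tensor shape and how many trainable parameters it has. The helper names conv_out and trace_model are hypothetical. A Conv2D layer has kernel_height × kernel_width × input_channels × filters weights plus one bias per filter, and these totals should match what model.summary() reports for that configuration.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis with 'valid' padding (the Conv2D/MaxPooling2D default)."""
    return (size - kernel) // stride + 1

def trace_model(h, w, c, num_classes):
    """Return (layer name, output shape, trainable parameters) for each layer of the example model."""
    layers = []
    # Conv2D(32, (5, 5), strides=(1, 1)): kh * kw * in_channels * filters weights + 1 bias per filter
    h, w = conv_out(h, 5), conv_out(w, 5)
    layers.append(("conv2d_1", (h, w, 32), 5 * 5 * c * 32 + 32))
    c = 32
    # MaxPooling2D(pool_size=(2, 2), strides=(2, 2)): no trainable weights
    h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)
    layers.append(("max_pool_1", (h, w, c), 0))
    # Conv2D(64, (5, 5)): its input now has 32 channels
    h, w = conv_out(h, 5), conv_out(w, 5)
    layers.append(("conv2d_2", (h, w, 64), 5 * 5 * c * 64 + 64))
    c = 64
    # MaxPooling2D(pool_size=(2, 2)): strides default to pool_size
    h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)
    layers.append(("max_pool_2", (h, w, c), 0))
    # Flatten turns the h x w x c volume into one vector
    flat = h * w * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense_1000", (1000,), flat * 1000 + 1000))
    layers.append(("dense_out", (num_classes,), 1000 * num_classes + num_classes))
    return layers

for name, shape, params in trace_model(28, 28, 1, num_classes=10):
    print(f"{name:<12} {str(shape):<14} {params:>9,}")
# Shapes: 24x24x32 -> 12x12x32 -> 8x8x64 -> 4x4x64 -> 1024 -> 1000 -> 10
```

Most of the roughly one million parameters sit in the first Dense layer rather than in the convolutional layers, which is typical of this kind of small CNN.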
When adding a Conv2D layer with model.add(), there are numerous parameters you can use, as defined by the underlying keras.layers.Conv2D class (see the Keras documentation).
Here is the full signature of the Keras Conv2D layer:
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
Below we explain each of these parameters, what it does, and some best practices for setting and tuning it. To get more background about tuning neural networks, see our guide on neural network hyperparameters.
| Keras Conv2D parameter | What it does | Best practices and tuning |
|---|---|---|
| filters | Sets the number of filters used in the convolution operation, which is also the number of output channels (feature maps) the layer produces. | Earlier 2D convolutional layers, closer to the input, learn fewer filters, while later convolutional layers, closer to the output, learn more filters. The number of filters you select should depend on the complexity of your dataset and the depth of your neural network. A common setting to start with is [32, 64, 128] for three layers, increasing to [256, 512, 1024], etc. if there are more layers. |
| kernel_size | Specifies the height and width of the convolutional filter in pixels, as a single integer (for a square filter) or a tuple of two integers. Odd sizes such as 3 or 5 are conventional. | Filter size may be determined by the CNN architecture you are using; for example, VGGNet exclusively uses (3, 3) filters. If not, use a 5×5 or 7×7 filter to learn larger features and then quickly reduce to 3×3. If your images are smaller than 128×128, consider working with smaller filters of 1×1 and 3×3. |
| strides=(1, 1) | A 2-tuple of integers specifying how far the convolutional filter “steps” along the x- and y-axis of the source image. | In most cases, it’s okay to leave strides at the default (1, 1). However, you may increase it to (2, 2) to reduce the size of the output volume. |
| padding='valid' | Takes one of two values: 'valid' or 'same'. 'valid' means the input is not zero-padded, so for kernels larger than 1×1 the output of the convolution will be smaller than the original image. 'same' means the input is zero-padded so that, with strides of (1, 1), the output is the same size as the input. | The Keras default is 'valid', but it is often effective to set it to 'same' for most of the layers, then reduce spatial dimensions using max pooling or strided convolutions. |
| data_format=None | Specifies the order of the dimensions in the input: 'channels_last' (batch, height, width, channels) or 'channels_first' (batch, channels, height, width). None uses the image_data_format setting in your Keras configuration, which defaults to 'channels_last'. | The TensorFlow backend to Keras uses channels-last ordering by default. Do not change this parameter unless your data or backend (for example, Theano) uses channels-first ordering. |
| dilation_rate=(1, 1) | A 2-tuple of integers controlling the dilation rate for dilated (atrous) convolution. Dilated convolution inserts gaps between the kernel elements, so the filter covers a larger area of the input without adding weights. | Dilated convolutions are useful when working with higher-resolution images where you still want to capture fine-grained details, or when building a network with fewer parameters. |
| activation=None | The name of the activation function to apply after performing the convolution, for example 'relu'. With None, no activation is applied. | To learn more about activation functions and their impact on your neural network, see our guide to neural network activation functions. |
| use_bias=True | Controls whether a bias vector is added to the output of the convolutional layer. | Typically you’ll want to leave this value as True, although some implementations of ResNet leave the bias out. |
| kernel_initializer='glorot_uniform' | The method used to initialize the kernel weights before training. | The default is 'glorot_uniform', which is Xavier Glorot uniform initialization. This is suitable for most CNNs. For deeper networks, such as VGGNet, you may want to use 'he_normal', which uses the MSRA (He) initialization method. |
| bias_initializer='zeros' | Controls how the bias vector is initialized before training starts. | You should typically leave this at the default, 'zeros', meaning the bias is initially filled with zeros. |
| kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None | These parameters control the type and amount of regularization applied to the kernel weights, the bias vector and the layer’s output, respectively. Regularization is a method which helps avoid overfitting and improves the ability of your model to generalize from training examples to a real population. | For large datasets and deep networks, kernel regularization is a must. You can use either L1 or L2 regularization. If you detect signs of overfitting, consider using L2 regularization. Tune the amount of regularization, starting with values of 0.0001 to 0.001. For bias and activity, we recommend leaving the default values in most scenarios. |
| kernel_constraint=None, bias_constraint=None | Impose constraints on the layer’s kernel and bias weights, such as unit norm, non-negativity or min-max norm. | These are advanced settings which should be left at defaults unless you have a special reason to use them in your model. |
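To see how kernel_size, strides, padding and dilation_rate interact, the following sketch computes the spatial output size of a Conv2D layer along one axis. It follows standard convolution arithmetic; the helper name conv2d_output_length is hypothetical. With 'valid' padding the (dilated) window must fit inside the input, while 'same' padding keeps ceil(n / stride) positions.

```python
def conv2d_output_length(n, kernel_size, stride=1, padding="valid", dilation=1):
    """Spatial output length along one axis (height or width) of a Conv2D layer.

    n: input length along that axis; kernel_size, stride, dilation: ints for that axis.
    """
    # Dilation inserts (dilation - 1) gaps between kernel taps, enlarging the filter's footprint
    effective_k = kernel_size + (kernel_size - 1) * (dilation - 1)
    if padding == "valid":
        # No zero-padding: count the positions where the whole window fits inside the input
        return (n - effective_k) // stride + 1
    if padding == "same":
        # Zero-padding keeps ceil(n / stride) positions (ceiling division on integers)
        return (n + stride - 1) // stride
    raise ValueError(f"unsupported padding: {padding!r}")

print(conv2d_output_length(28, 5))                            # 24
print(conv2d_output_length(28, 5, padding="same"))            # 28
print(conv2d_output_length(28, 3, stride=2, padding="same"))  # 14
print(conv2d_output_length(28, 3, dilation=2))                # 24 (a dilated 3x3 kernel spans 5x5)
```

The number of output channels is set separately by filters, so a layer such as Conv2D(64, 3, strides=2, padding='same') applied to a 28×28 input would produce a 14×14×64 output under these rules.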
In this article, we explained how to create 2D convolutional layers in Keras. When you start working on Convolutional Neural Networks and running large numbers of experiments, you’ll run into some practical challenges:
Tracking experiment progress and hyperparameters can be challenging when you run a large number of experiments. You will have to scale up your experiments to tune your CNN and try all relevant variations of network architecture and hyperparameters.
CNNs can take a long time to run, especially with large datasets. You will want to run your CNNs on more machines and GPUs, either on-premise or in the cloud. It can be very time-consuming to provision these machines, distribute experiments between them and monitor progress.
In computer vision projects involving images, video or other rich media, training datasets can be very large. Copying the data to each training machine, replacing it for each new experiment and managing changes to datasets can be difficult. To scale up, you must do this in an automated way.
MissingLink is a deep learning platform that does all of this for you, and lets you concentrate on building the most accurate model. Learn more to see how easy it is.
The most comprehensive platform to manage experiments, data and resources more frequently, at scale and with greater confidence.