Keras Conv2D: Working with CNN 2D Convolutions in Keras
This article explains how to create 2D convolutional layers in Keras, as part of a Convolutional Neural Network (CNN) architecture.
A CNN architecture has three main parts:
- A convolutional layer that extracts features from a source image. This is the essential feature of a CNN, which works on parts of the image each time, instead of feeding all the input to each layer of the network.
- A pooling layer that downsamples each feature map to reduce its dimensionality and focus on the most important elements. There are typically several rounds of convolution and pooling; in some CNN architectures, there may be hundreds or thousands of these layers.
- A fully connected layer that flattens the features identified in the previous layers into a vector, and applies a traditional neural network with all neurons in each layer connected to all neurons in the next layer, to make a prediction about the image.
In Keras, you build a CNN architecture using the following process:
1. Reshape the input data into a format suitable for the convolutional layers, using X_train.reshape() and X_test.reshape()
2. For class-based classification, one-hot encode the categories using the to_categorical() function.
3. Build the model using the Sequential.add() function. For a 2D convolutional layer, the command looks like the following.
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
>> You are here. In this article, we explain how to work with 2D convolutional layers in Keras.
4. Add a pooling layer, for example using Sequential.add(MaxPooling2D()) (not all parameters are shown here).
5. Add a “flatten” layer which prepares a vector for the fully connected layers, using Sequential.add(Flatten()).
6. Add one or more fully connected layers using Sequential.add(Dense()). Typically you will follow each fully connected layer with a dropout layer (learn more about dropout in our guide to neural network hyperparameters), using Sequential.add(Dropout()).
7. Compile the model using model.compile()
8. Train the model using model.fit(), supplying X_train, the source images, and y_train, their known classification results. X_test and y_test, the held-out images and labels, can be passed as validation data.
9. Use model.predict() to generate a prediction.
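Steps 1 and 2 of the process above can be sketched with NumPy alone. The snippet uses random stand-in data instead of a real dataset; the 28×28 image size, the 10 classes and the variable names are assumptions for illustration. Indexing an identity matrix with the labels is used here as a stand-in that produces the same one-hot layout as to_categorical().

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 100 grayscale images of 28x28 pixels with integer labels 0-9.
num_samples, height, width, num_classes = 100, 28, 28, 10
X_train = rng.integers(0, 256, size=(num_samples, height, width)).astype("float32")
y_train = rng.integers(0, num_classes, size=num_samples)

# Step 1: add a trailing channel axis so each image is (height, width, channels).
X_train = X_train.reshape(num_samples, height, width, 1)

# Step 2: one-hot encode the labels (one row per sample, one column per class).
y_train_onehot = np.eye(num_classes, dtype="float32")[y_train]

print(X_train.shape)         # (100, 28, 28, 1)
print(y_train_onehot.shape)  # (100, 10)
```

The trailing axis of size 1 is what lets a grayscale image match an input_shape such as (28, 28, 1).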
In this article you will learn:
- What is a 2D convolutional layer and its role in CNN image classification
- Keras CNN example showing the Conv2D function
- Keras Conv2D parameters – what they do and how to tune them
- Running CNN on Keras in the real world
Briefly, some background. A convolution layer “scans” a source image with a filter of, for example, 5×5 pixels, to extract features which may be important for classification. This filter is also called the convolution kernel. The kernel contains weights, which are tuned during training of the model to achieve the most accurate predictions.
In a 5×5 kernel, for each 5×5 pixel region, the model computes the dot products between the image pixel values and the weights defined in the filter.
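To make the sliding dot product concrete, here is a minimal NumPy sketch for a single-channel image with no padding and a stride of 1. The function name and the toy averaging kernel are illustrative assumptions; in a real layer the kernel weights are learned, and Keras uses a far more efficient implementation.

```python
import numpy as np

def conv2d_single_channel(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            # Element-wise multiply and sum: the dot product of region and kernel.
            output[i, j] = np.sum(region * kernel)
    return output

image = np.arange(49, dtype=float).reshape(7, 7)  # toy 7x7 "image"
kernel = np.ones((5, 5)) / 25.0                   # fixed 5x5 averaging kernel
feature_map = conv2d_single_channel(image, kernel)
print(feature_map.shape)  # (3, 3)
```

A 7×7 input leaves only 3×3 positions where a 5×5 kernel fits entirely inside the image, which is why the output is smaller than the input.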
A 2D convolution layer means that the input of the convolution operation is three-dimensional, for example, a color image which has a value for each pixel across three channels: red, green and blue. However, it is called a “2D convolution” because the movement of the filter across the image happens in two dimensions. Each filter has a separate set of weights for each of the three channels; the per-channel results are summed to produce a single 2D feature map.
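The following NumPy sketch illustrates how one filter handles a three-channel image: the filter is as deep as the image, it moves only along height and width, and the products from all channels are summed into one 2D output. The function name and the random data are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def conv2d_multi_channel(image, kernel):
    """Apply one filter of shape (kh, kw, channels) to an image of shape (h, w, channels)."""
    kh, kw, channels = kernel.shape
    if image.shape[2] != channels:
        raise ValueError("filter depth must match the number of image channels")
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):          # movement along the height
        for j in range(out_w):      # movement along the width
            region = image[i:i + kh, j:j + kw, :]
            # Sum over height, width and all channels at once.
            output[i, j] = np.sum(region * kernel)
    return output

rng = np.random.default_rng(0)
rgb_image = rng.random((8, 8, 3))   # toy 8x8 RGB image
kernel = rng.random((3, 3, 3))      # one 3x3 filter with a slice per channel
feature_map = conv2d_multi_channel(rgb_image, kernel)
print(feature_map.shape)  # (6, 6)
```

A layer with 64 filters repeats this with 64 different kernels, giving an output with 64 channels.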
After the convolution ends, the features are downsampled, and then the same convolutional structure repeats again. At first, the convolution identifies simple, local features in the original image (for example edges, corners and textures); later layers combine these into progressively larger features (for example, in a cat, the ears, whiskers and eyes, and then the head, body, legs and tail). Eventually, this process is meant to identify the essential features that can help classify the image. Learn more in our guide to Convolutional Neural Networks.
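The downsampling step is commonly done with max pooling. As a rough illustration, the NumPy sketch below keeps the largest value in each non-overlapping 2×2 block of a feature map; the function name and the sample values are assumptions chosen for the example.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block of a 2D feature map."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]  # drop an odd trailing row/column
    # Split each axis into (block index, position inside block), then reduce.
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 9, 8],
                        [2, 6, 7, 3]], dtype=float)
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4. 5.]
#  [6. 9.]]
```

Each pooling round halves the height and width, so the next convolution covers a larger part of the original image with the same kernel size.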
Here is a simple code example to show you the context of Conv2D in a complete Keras model. The example was created by Andy Thomas. This model has two 2D convolutional layers, highlighted in the code.
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# input_shape, num_classes, batch_size, epochs, the x/y train and test arrays
# and the `history` callback are assumed to be defined earlier in the script

# building the model
model = Sequential()
# first 2D convolutional layer
model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
# second 2D convolutional layer
model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1000, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

# training
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=[history])

# evaluating and printing results
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
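It can help to trace how the spatial size of the data changes through this model. The sketch below assumes a 28×28 input (the example itself only refers to input_shape) and uses the standard output-size formula for a layer without padding; the helper name is illustrative.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1

size = 28                                   # assumed input height and width
size = conv_output_size(size, 5, stride=1)  # Conv2D(32, 5x5)   -> 24
size = conv_output_size(size, 2, stride=2)  # MaxPooling2D(2x2) -> 12
size = conv_output_size(size, 5, stride=1)  # Conv2D(64, 5x5)   -> 8
size = conv_output_size(size, 2, stride=2)  # MaxPooling2D(2x2) -> 4
flattened = size * size * 64                # Flatten over the 64 filters
print(size, flattened)  # 4 1024
```

Under that assumption, the Flatten layer hands a vector of 1024 values to the first Dense layer.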
When adding a Conv2D layer using Sequential.add(), there are numerous parameters you can use, as defined in the underlying keras.layers.Conv2D class (see documentation).
Here is the full signature of the Keras Conv2D function:
keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
Below we explain each of these parameters, what it does, and some best practices for setting and tuning it. To get more background about tuning neural networks, see our guide on neural network hyperparameters.
|Keras Conv2D Parameter||What it Does||Best Practices and Tuning|
|filters||Sets the number of filters used in the convolution operation.||Earlier 2D convolutional layers, closer to the input, learn fewer filters, while later convolutional layers, closer to the output, learn more filters. The number of filters you select should depend on the complexity of your dataset and the depth of your neural network. A common setting to start with is [32, 64, 128] for three layers, and if there are more layers, increasing to [256, 512, 1024], etc.|
|kernel_size||Specifies the height and width of the convolutional filter in pixels, as an integer or a 2-tuple of integers. Odd sizes are conventional but not required.||Filter size may be determined by the CNN architecture you are using – for example VGGNet exclusively uses (3, 3) filters. If not, use a 5×5 or 7×7 filter to learn larger features and then quickly reduce to 3×3. If your images are smaller than 128×128, consider working with smaller filters of 1×1 and 3×3.|
|strides||The strides parameter is a 2-tuple of integers, specifying how the convolutional filter should “step” along the x and y-axis of the source image.||In most cases, it’s okay to leave the strides parameter with the default (1, 1). However, you may increase it to (2, 2) to reduce the size of the output volume.|
|padding||The padding parameter has two values: valid (no padding, so the output is smaller than the input) and same (the input is zero-padded so that, with a stride of 1, the output keeps the input's height and width).||The default Keras value is valid, but it is often effective to set it to same for most of the layers, then reduce spatial dimensions using max pooling or strided convolutions.|
|data_format||Specifies the order of data in the input received from the backend deep learning framework: channels_last or channels_first.||The TensorFlow backend to Keras uses channels last ordering. Do not change this parameter unless you are using Theano as your backend.|
|dilation_rate||A 2-tuple of integers, controlling the dilation rate for dilated convolution. Dilated convolution is a convolution applied to the input volume with defined gaps between the kernel elements, so the filter covers a wider area while skipping the pixels in between.||Dilated convolutions are useful when working with higher-resolution images where you still want to capture fine-grained details, or when constructing a network with fewer parameters.|
|activation||The activation parameter specifies the name of the activation function you want to apply after performing the convolution.||To learn more about activation functions and their impact on your neural network, see our guide to neural network activation functions.|
|use_bias||The use_bias parameter of the Conv2D class controls whether a bias vector is added to the convolutional layer.||Typically you’ll want to leave this value as True, although some implementations of ResNet will leave the bias parameter out.|
|kernel_initializer||The initialization method used to set the kernel weights of the Conv2D layer prior to training.||The default is glorot_uniform.|
|bias_initializer||Controls how the bias vector is initialized before training starts.||You should typically leave this as the default, zeros, meaning the bias will initially be filled with zeros.|
|kernel_regularizer, bias_regularizer, activity_regularizer||These parameters control the type and amount of regularization. Regularization is a method which helps avoid overfitting and improve the ability of your model to generalize from training examples to a real population.||For large datasets and deep networks, kernel regularization is strongly recommended. You can use either L1 or L2 regularization. If you detect signs of overfitting, consider using L2 regularization. Tune the amount of regularization, starting with values of 0.0001-0.001. For bias and activity, we recommend leaving at the default values for most scenarios.|
|kernel_constraint, bias_constraint||Impose constraints on the Conv2D layer, such as unit normalization, non-negativity, min-max normalization.||These are advanced settings which should be left at defaults unless you have a special reason to use them in your model.|
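To see how padding, strides and dilation_rate change the output volume, the sketch below passes one dummy image through a few Conv2D configurations and records the resulting shapes. It assumes TensorFlow 2.x with its bundled Keras; the 28×28×1 input, the 32 filters and the L2 factor of 0.0005 are illustrative choices, not recommendations from a specific architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# One dummy batch holding a single 28x28 grayscale image (assumed shape).
x = np.zeros((1, 28, 28, 1), dtype="float32")

configs = {
    "valid, stride 1": dict(padding="valid", strides=(1, 1)),
    "same, stride 1": dict(padding="same", strides=(1, 1)),
    "same, stride 2": dict(padding="same", strides=(2, 2)),
    "valid, dilation 2": dict(padding="valid", dilation_rate=(2, 2)),
}

shapes = {}
for name, kwargs in configs.items():
    layer = layers.Conv2D(
        filters=32,
        kernel_size=(3, 3),
        activation="relu",
        kernel_regularizer=keras.regularizers.L2(0.0005),
        **kwargs,
    )
    shapes[name] = tuple(layer(x).shape)
    print(name, shapes[name])

# valid, stride 1   (1, 26, 26, 32)
# same, stride 1    (1, 28, 28, 32)
# same, stride 2    (1, 14, 14, 32)
# valid, dilation 2 (1, 24, 24, 32)
```

With a dilation rate of 2, a 3×3 kernel spans a 5×5 area of the input, which is why the last configuration shrinks the output as much as a 5×5 kernel would.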
In this article, we explained how to create 2D convolutional layers in Keras. When you start working on Convolutional Neural Networks and running large numbers of experiments, you’ll run into some practical challenges:
- Tracking experiment progress and hyperparameters can be challenging when you run a large number of experiments. You will have to scale up your experiments to tune your CNN and try all relevant variations of network architecture and hyperparameters.
- Running experiments on multiple machines: CNNs can take a long time to train, especially with large datasets. You will want to run your CNNs on more machines and GPUs, either on-premise or in the cloud. It can be very time consuming to provision these machines, distribute experiments between them and monitor progress.
- Managing training data: computer vision projects that use images, video or other rich media can have very large training sets. Copying the data to each training machine, replacing it for each new experiment and managing changes to datasets can be difficult. To scale up you must do this in an automated way.