A Convolutional Neural Network (CNN) has three important building blocks:
● A convolutional layer that extracts features from the image or parts of an image
● A subsampling or pooling layer that reduces the dimensionality of each feature to focus on the most important elements (typically there are several rounds of convolution and pooling)
● A fully connected layer that takes a flattened form of the features identified in the previous layers, and uses them to make a prediction about the image.
In TensorFlow, you build a CNN architecture using the following process:
1. Reshape input if necessary using tf.reshape() to match the convolutional layer you intend to build (for example, if using a 2D convolution, reshape the input into the four-dimensional format [batch, height, width, channels])
2. Create a convolutional layer using tf.nn.conv1d(), tf.nn.conv2d(), or tf.nn.conv3d(), depending on the dimensionality of the input. >> You are here – in this article we explain tf.nn.conv2d() in more detail
3. Create a pooling layer using tf.nn.max_pool()
4. Repeat steps 2 and 3 for additional convolution and pooling layers
5. Reshape output of convolution and pooling layers, flattening it to prepare for the fully connected layer
6. Create a fully connected layer using tf.matmul() function, add an activation using, for example, tf.nn.relu() (see all TensorFlow activations, or learn more in our guide to neural network activation functions), and apply a dropout using tf.nn.dropout() (learn more about dropout in our guide to neural network hyperparameters)
7. Create a final layer for class prediction, again using tf.matmul()
8. Store weights and biases using TensorFlow variables
These are just the basic steps to create the CNN model. There are additional steps to define training and evaluation, execute the model and tune it – see our full guide to TensorFlow CNN.
To speed up the process, you can use MissingLink’s deep learning platform. MissingLink automates experiments, resources and data management in deep learning frameworks like TensorFlow.
A convolution layer extracts features from a source image by “scanning” the image with a filter of, for example, 5×5 pixels. For each 5×5 pixel region within the image, the convolution operation computes the dot products between the values of the image pixels and the weights defined in the filter.
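The dot product described above can be sketched in a few lines of plain Python (not TensorFlow). This is purely illustrative: the toy 8×8 image, the all-ones filter and the variable names are made up for this example.

```python
# A minimal pure-Python sketch of what one step of the convolution "scan"
# computes: the dot product between one 5x5 image region and the 5x5
# filter weights.
image = [[float(8 * r + c) for c in range(8)] for r in range(8)]  # toy 8x8 grayscale image
filt = [[1.0] * 5 for _ in range(5)]                              # toy 5x5 filter of ones

# Dot product of the top-left 5x5 patch with the filter weights:
# multiply each pixel by the matching weight, then sum the results.
activation = sum(image[r][c] * filt[r][c] for r in range(5) for c in range(5))
print(activation)  # 450.0 -- the sum of the top-left 5x5 patch
```

The real convolution repeats this computation for every 5×5 region in the image, producing one output value per position.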
A 2D convolution layer means that the input of the convolution operation is three-dimensional. This is a bit confusing, as you’d expect the input to be two-dimensional. But the “2D” in “2D convolution” refers to the movement of the filter, which traverses the image in two dimensions. For example, consider a color image, which has a value for each pixel across three channels: red, green and blue. The filter then covers all three channels as it traverses the image, one channel per layer of its weights.
The same convolutional structure is used successively, at first to identify features in the original image, and then to identify sub-features within smaller parts of the image, after downsampling or “pooling” the result of previous convolutions. Eventually, this process is meant to identify the essential features that can help classify the image. Learn more in our guide to Convolutional Neural Networks.
tf.nn.conv2d() is the TensorFlow function you can use to build a 2D convolutional layer as part of your CNN architecture. tf.nn.conv2d() is a low-level API which gives you full control over how the convolution is structured. To learn about a simpler functional interface called tf.layers.conv2d(), which abstracts these steps, see the following section. We’ll illustrate how the tf.nn.conv2d() function works using the TensorFlow conv2d example by Aymeric Damien, which generates predictions for MNIST handwritten digits. The tf.nn.conv2d() related code is shown below in the full context of the TensorFlow CNN model (omitting the code for executing model training).
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Training Parameters
learning_rate = 0.001
num_steps = 200
batch_size = 128
display_step = 10

# Network Parameters
num_input = 784    # MNIST data input (img shape: 28*28)
num_classes = 10   # MNIST total classes (0-9 digits)
dropout = 0.75     # Dropout, probability to keep units

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)  # dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')

# Create model
def conv_net(x, weights, biases, dropout):
    # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, num_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}

# Construct model
logits = conv_net(X, weights, biases, keep_prob)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
In the implementation above, a wrapper function is defined which creates the 2D convolutional layer, then adds the biases and applies a ReLU activation.
Let’s review the arguments of the TensorFlow conv2d() function:
x is the input – pixel values from the image.
W are the weights defined in the filter. The weights are defined as a four-dimensional tensor: [filter_height, filter_width, input_depth, output_depth]. input_depth represents the number of channels in the image, for example three channels for RGB. output_depth represents the number of filters that should be applied to the image. Each filter is run through all the input channels, using a filter size defined by filter_height and filter_width, multiplies each input pixel by a weight, and sums up the results.
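The layout of the weight tensor can be sketched in plain Python. This is an illustration of shapes only; the `patch` name and the random values are invented for the example, and real TensorFlow code would of course use tensors, not nested lists.

```python
import random

# Sketch of the [filter_height, filter_width, input_depth, output_depth]
# weight layout described above.
filter_height, filter_width = 5, 5
input_depth, output_depth = 3, 8  # e.g. an RGB input and 8 filters

# W[h][w][d][k]: one weight per pixel position, input channel and filter
W = [[[[random.random() for _ in range(output_depth)]
       for _ in range(input_depth)]
      for _ in range(filter_width)]
     for _ in range(filter_height)]

# One 5x5 region of a 3-channel image
patch = [[[random.random() for _ in range(input_depth)]
          for _ in range(filter_width)]
         for _ in range(filter_height)]

# Each of the 8 filters multiplies every input pixel (across all 3 input
# channels) by its weight and sums the results, yielding one value per filter:
out = [sum(patch[h][w][d] * W[h][w][d][k]
           for h in range(filter_height)
           for w in range(filter_width)
           for d in range(input_depth))
       for k in range(output_depth)]
print(len(out))  # 8 -- one activation per filter, so the output depth is 8
```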
stride is the speed by which the filter moves across the image, or the number of pixels it shifts each time. The stride is defined as a 4D tensor, because the input has four dimensions: [number_of_samples, height, width, colour_channels]. Setting the strides tensor to [1, strides, strides, 1] applies the filter to every image, every color channel, and every image patch in the height and width dimensions. The 1 at the beginning and end specifies that you won’t skip any image in the batch or any color channel. For example, strides=[1, 2, 2, 1] applies the filter to every other position in each spatial dimension, skipping half the image patches.
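The effect of the stride value along one axis can be sketched in plain Python. This is illustrative only; TensorFlow performs this bookkeeping internally.

```python
# Which top-left patch positions a stride visits along one 8-pixel axis
width = 8
positions_stride_1 = list(range(0, width, 1))
positions_stride_2 = list(range(0, width, 2))
print(positions_stride_1)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(positions_stride_2)  # [0, 2, 4, 6] -- half the patches are visited
```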
“SAME” padding specifies that the output size should be the same as the input size (when the stride is 1). To achieve this, the image is zero-padded around its border as needed for the filter size, and the filter slides outside the image into this padding area. Alternatively, you can use “VALID” padding, in which the filter stays inside the pixel area of the image, resulting in an output size smaller than the input.
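The two padding modes can be compared with the output-size formulas from the TensorFlow documentation, sketched here in plain Python:

```python
import math

# Output spatial size of a convolution under 'SAME' vs 'VALID' padding
def conv_output_size(input_size, filter_size, stride, padding):
    if padding == 'SAME':
        # Zero-padding is added so only the stride shrinks the output
        return math.ceil(input_size / stride)
    else:  # 'VALID'
        # The filter must stay inside the image, so the output shrinks
        return math.ceil((input_size - filter_size + 1) / stride)

print(conv_output_size(28, 5, 1, 'SAME'))   # 28 -- same size as the input
print(conv_output_size(28, 5, 1, 'VALID'))  # 24 -- smaller than the input
```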
tf.layers.conv2d() creates a convolution filter that produces a tensor of outputs, and takes care of all aspects of the convolutional layer, including bias and activation. This is unlike the low-level tf.nn.conv2d() function, which only performs the convolution operation and requires that you define bias and activation separately.
Here are some of the important arguments of the tf.layers.conv2d() abstraction:
● inputs – a Tensor input, representing image pixels, which should have been reshaped into a four-dimensional [batch, height, width, channels] format
● filters – the number of filters in the convolution (dimensionality of the output space).
● kernel_size – the filter size, an integer or tuple of 2 integers, specifying the height and width of the convolution window. Set a single integer to use a filter with identical height and width.
● strides – an integer or tuple of 2 integers, specifying how the filter should move along the height and width. Set a single integer to use the same stride value for both dimensions.
● padding – "VALID", meaning no padding is added (the filter stays within the image, generating a smaller output), or "SAME", meaning the image is zero-padded so that the output has the same spatial size as the input (when the stride is 1).
● data_format – specifies ordering of dimensions in the inputs, can be either channels_last (default, inputs with shape [batch, height, width, channels]) or channels_first (inputs with shape [batch, channels, height, width]).
● dilation_rate – enables advanced convolutional structures with dilated (expanded) convolutions. Use an integer or tuple of 2 integers to specify the dilation rate.
● activation – the activation function you’d like to use. Set to None for linear activation.
● use_bias – boolean, specifies whether a bias should be added to the layer. For all parameters see TensorFlow documentation.
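A sketch tying these arguments together: given the parameters of a tf.layers.conv2d call, the shape of the tensor it returns can be computed as below. This is a pure-Python illustration of the shape rules in the TensorFlow documentation, assuming the default channels_last data format; the function name is made up for this example.

```python
import math

# Output shape of a 2D convolutional layer (channels_last ordering)
def conv2d_output_shape(batch, height, width, filters,
                        kernel_size=(5, 5), strides=(1, 1), padding='same'):
    if padding == 'same':
        out_h = math.ceil(height / strides[0])
        out_w = math.ceil(width / strides[1])
    else:  # 'valid'
        out_h = math.ceil((height - kernel_size[0] + 1) / strides[0])
        out_w = math.ceil((width - kernel_size[1] + 1) / strides[1])
    return (batch, out_h, out_w, filters)

print(conv2d_output_shape(128, 28, 28, 32))                  # (128, 28, 28, 32)
print(conv2d_output_shape(128, 28, 28, 32, strides=(2, 2)))  # (128, 14, 14, 32)
```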
In this article, we explained how to create a 2D convolutional layer in TensorFlow. When you start working on CNN projects and running large numbers of experiments, you’ll run into some practical challenges:
Tracking experiment progress and hyperparameters across multiple experiments – CNNs can have a large number of possible variations which may impact your results. To test each of these, you will need to run and track numerous experiments.
Running multiple CNN experiments, especially with large datasets, will require multiple machines or GPUs. Provisioning machines, distributing experiments between them and monitoring progress can become a burden.
If you work on CNN projects with images, video or other rich media, training sets can get very large. Copying this data to each training machine and replacing it for different experiments can be time-consuming. An automated way is needed to manage the data and copy it efficiently to deep learning machines.
MissingLink is a deep learning platform that does all of this for you and lets you concentrate on building the most accurate model. Learn more to see how easy it is.
The most comprehensive platform to manage experiments, data and resources more frequently, at scale and with greater confidence.
Request your personal demo to start training models faster