- Convolutional Neural Network Architecture
Forging Pathways to the Future

The convolutional neural network architecture is central to deep learning, and it is what makes possible a range of applications for computer vision, from analyzing security footage and medical imaging to enabling the automation of vehicles and machines for industry and agriculture.

This article provides a basic description of the CNN architecture and its uses. We also provide two brief tutorials to help you build and train a CNN using Keras and TensorFlow, respectively.

A Convolutional Neural Network (CNN) is the foundation of most computer vision technologies. Unlike traditional multilayer perceptron architectures, it uses two operations called ‘convolution’ and ‘pooling’ to reduce an image to its essential features, and uses those features to understand and classify the image.

The basic building blocks of a CNN are:

- **Convolution layer**: a “filter”, sometimes called a “kernel”, is passed over the image, viewing a few pixels at a time (for example, 3×3 or 5×5). The convolution operation is a dot product of the original pixel values with the weights defined in the filter: the products are summed into one number that represents all the pixels the filter observed.
- **Activation layer**: the convolution layer generates a matrix that is usually smaller than the original image. This matrix is run through an activation function, which introduces the non-linearity the network needs to learn complex patterns when it is trained via backpropagation. The activation function is typically ReLU.
- **Pooling layer**: “pooling” is the process of further downsampling and reducing the size of the matrix. A filter is passed over the results of the previous layer and selects one number out of each group of values (typically the maximum, which is called max pooling). This allows the network to train much faster, focusing on the most important information in each feature of the image.
- **Fully connected layer**: a traditional multilayer perceptron structure. Its input is a one-dimensional vector representing the output of the previous layers. Its output is a list of probabilities for the different possible labels attached to the image (e.g. dog, cat, bird). The label that receives the highest probability is the classification decision.

There may be multiple convolution, activation and pooling layers, depending on the CNN architecture.
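The first three building blocks can be sketched in plain Python. This is a minimal illustration, not how a framework implements them: the function names, the toy 5×5 image and the hand-picked 2×2 kernel are all assumptions made for the example, and in a real CNN the kernel weights are learned during training rather than chosen by hand.

```python
# Minimal pure-Python sketch of convolution -> ReLU -> max pooling.
# All names and values here are illustrative, not part of any library.

def conv2d(image, kernel):
    """Slide a square kernel over the image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each pixel by the matching weight and sum to one number.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        out.append(row)
    return out


# A 5x5 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# A hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

features = conv2d(image, kernel)  # 4x4 feature map, non-zero only at the edge
activated = relu(features)        # negatives (none here) would become 0
pooled = max_pool(activated)      # 2x2 summary: [[2, 0], [2, 0]]
print(pooled)
```

The pooled output keeps the strong response at the edge while discarding three quarters of the values, which is the downsampling effect described above.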

People are doing cool things with CNNs. Here are some common applications of computer vision powered by Convolutional Neural Networks:

- **Agriculture**: farmers use hyperspectral or multispectral sensors to take pictures of crops, and analyze the images with computer vision to determine the health of the crops or the viability of seeds to be sown.
- **Self-driving cars**: CNNs are used for object detection and classification, performed in real time against live video footage from car cameras. Today’s self-driving cars are able to identify other vehicles, people and obstacles and navigate around them with surprising accuracy.
- **Surveillance**: modern security systems with computer vision capabilities can identify crime, violence or theft in video footage in real time and alert security personnel. Again, this leverages CNN-based object detection and classification in video frames.
- **Healthcare**: computer vision in healthcare helps diagnose diseases like pneumonia, diabetes and breast cancer. In many cases, CNN-based analysis and diagnosis of medical images can be as accurate as, or even more accurate than, a human technician or physician.

In this tutorial we show how to build a simple CNN using Keras, with a TensorFlow backend. The network can process the standard MNIST dataset, containing images of handwritten digits, and predict which digit each image represents.

The steps below are a summary; for the full tutorial, see __Adventures in Machine Learning__.

**Defining the model**

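We’ll use the Sequential() model, which is probably the easiest way to define a deep learning model in Keras. It lets you add layers one by one.

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential()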

Use the Keras Conv2D layer to create a 2-dimensional convolutional layer with 32 filters, a kernel (filter) size of 5×5 pixels and a stride of 1 in the x and y directions. Conv2D applies the activation function for you through its activation argument; here we use ReLU. The input_shape argument describes a single input image, which for MNIST with channels-last ordering is (28, 28, 1).

model.add(Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))

Then use the MaxPooling2D layer to add a 2D max pooling layer, with a 2×2 pooling window and a stride of 2 in the x and y directions.

model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

Add one more pair of convolution and pooling layers; this time the convolution has 64 filters:

model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

Finally, flatten the output and define the fully connected layers that generate probabilities for the ten prediction labels (0 to 9, the possible values of a handwritten digit), so num_classes is 10:

model.add(Flatten())
model.add(Dense(1000, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
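To see how many values reach the first Dense layer, it can help to trace the image size through the layers. The sketch below assumes 28×28 MNIST input and relies on the fact that Keras convolution and pooling layers use 'valid' (no) padding when no padding argument is passed, as in the code above. The helper name valid_output_size is made up for this example.

```python
def valid_output_size(size, window, stride):
    """Output length along one axis when no padding is applied."""
    return (size - window) // stride + 1


size = 28                                    # assumed MNIST image width/height
after_conv1 = valid_output_size(size, 5, 1)          # 5x5 conv, stride 1 -> 24
after_pool1 = valid_output_size(after_conv1, 2, 2)   # 2x2 pool, stride 2 -> 12
after_conv2 = valid_output_size(after_pool1, 5, 1)   # 5x5 conv, stride 1 -> 8
after_pool2 = valid_output_size(after_conv2, 2, 2)   # 2x2 pool, stride 2 -> 4

# The second convolution has 64 filters, so Flatten() yields 4 * 4 * 64 values.
flattened_length = after_pool2 * after_pool2 * 64
print(after_conv1, after_pool1, after_conv2, after_pool2, flattened_length)
# prints: 24 12 8 4 1024
```

Under these assumptions, each image is reduced to a vector of 1,024 values before the fully connected layers.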

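**Compiling and training the CNN**

Compile the network using the model.compile() command. Select the categorical cross-entropy loss function, the stochastic gradient descent (SGD) optimizer with a learning rate of 0.01, and accuracy as the metric to evaluate performance. Note that newer Keras versions name the learning rate argument learning_rate instead of lr.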

model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.SGD(lr=0.01), metrics=['accuracy'])
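The categorical cross-entropy loss passed to compile() compares the predicted probabilities with a one-hot label. The following is a minimal pure-Python sketch of that idea for a single example. The function names and the three-class logits are illustrative assumptions, and Keras computes the same quantity over batches with additional numerical safeguards.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_label, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_label, probabilities)
        if t > 0
    )


logits = [2.0, 1.0, 0.1]          # hypothetical raw scores for 3 classes
probs = softmax(logits)

loss_if_class_0 = categorical_crossentropy([1, 0, 0], probs)
loss_if_class_2 = categorical_crossentropy([0, 0, 1], probs)

# The loss is small when the true class received a high probability
# and larger when it received a low one.
print(loss_if_class_0 < loss_if_class_2)  # prints: True
```

Training lowers this loss by adjusting the weights so that the true class receives a higher probability.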

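Now train it using model.fit(), passing the training data, using the test data for validation, and specifying your batch size and number of epochs. The history object passed in callbacks is assumed to be a Keras callback created beforehand; its definition is not part of this summary.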

model.fit(x_train, y_train, batch_size=128, epochs=10, verbose=1, validation_data=(x_test, y_test), callbacks=[history])

**Evaluating performance**

The training output looks like this:

3328/60000 [>.............................] - ETA: 87s - loss: 0.2180 - acc: 0.9336
3456/60000 [>.............................] - ETA: 87s - loss: 0.2158 - acc: 0.9349
...

Use the evaluate() function to evaluate the performance of the model, using accuracy as we defined previously:

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

This tutorial does the same thing as the previous one, processing MNIST dataset images and predicting which digit each represents. However, unlike the previous tutorial, which used high-level Keras commands to run the network, here we create the network from its primitives directly in TensorFlow. This can help you understand how a CNN works behind the scenes.

The steps below are a summary; see the full code in this GitHub repo by __Adventures in Machine Learning__.

**Input data**

Load the MNIST dataset into TensorFlow and set the basic parameters: learning rate, epochs and batch size.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
learning_rate = 0.0001
epochs = 10
batch_size = 50

Define a placeholder for the training data. MNIST images are 28×28 pixels, resulting in a 784-pixel flattened input. Reshape the input into a format appropriate for the CNN, and define a placeholder for the output: ten possible labels from 0 to 9.

x = tf.placeholder(tf.float32, [None, 784])
x_shaped = tf.reshape(x, [-1, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, 10])
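The reshape and the one-hot labels can be illustrated without TensorFlow. This is only a sketch for a single image: the helper names reshape_to_image and one_hot are made up for the example, and the pixel values are placeholders rather than real MNIST data.

```python
def reshape_to_image(flat_pixels, height=28, width=28):
    """Arrange a flat, row-major pixel list as height x width x 1."""
    if len(flat_pixels) != height * width:
        raise ValueError("expected %d pixels" % (height * width))
    return [
        [[flat_pixels[r * width + c]] for c in range(width)]
        for r in range(height)
    ]


def one_hot(digit, num_classes=10):
    """Encode a digit as a vector with a single 1.0 at its index."""
    vector = [0.0] * num_classes
    vector[digit] = 1.0
    return vector


flat = [float(i) for i in range(784)]   # placeholder pixel values
image = reshape_to_image(flat)

print(len(image), len(image[0]), len(image[0][0]))  # prints: 28 28 1
print(one_hot(3))
# prints: [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The trailing dimension of size 1 is the channel axis: MNIST images are grayscale, so each pixel carries a single value.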

**Convolution layer**

Create a function that defines a convolutional layer. In the function, we set up the shape of the filter, initialize the weights and bias, create the convolutional layer using the tf.nn.conv2d function, apply a ReLU activation function, and finish with a max pooling step.

def create_new_conv_layer(input_data, num_input_channels, num_filters, filter_shape, pool_shape, name):
    # filter shape expected by tf.nn.conv2d: [height, width, input channels, output filters]
    conv_filt_shape = [filter_shape[0], filter_shape[1], num_input_channels, num_filters]
    # initialise the weights and bias for the filter
    weights = tf.Variable(tf.truncated_normal(conv_filt_shape, stddev=0.03), name=name+'_W')
    bias = tf.Variable(tf.truncated_normal([num_filters]), name=name+'_b')
    # convolution with a stride of 1; 'SAME' padding keeps the spatial size
    out_layer = tf.nn.conv2d(input_data, weights, [1, 1, 1, 1], padding='SAME')
    # add the bias, then apply the ReLU activation
    out_layer += bias
    out_layer = tf.nn.relu(out_layer)
    # max pooling with a stride of 2 halves the spatial size
    ksize = [1, pool_shape[0], pool_shape[1], 1]
    strides = [1, 2, 2, 1]
    out_layer = tf.nn.max_pool(out_layer, ksize=ksize, strides=strides, padding='SAME')
    return out_layer

Now we’ll use this function to create two convolutional layers, the first with 32 filters and the second with 64 filters.

layer1 = create_new_conv_layer(x_shaped, 1, 32, [5, 5], [2, 2], name='layer1')
layer2 = create_new_conv_layer(layer1, 32, 64, [5, 5], [2, 2], name='layer2')

**Fully connected layers**

Flatten the output of the convolutional layers, as follows:

flattened = tf.reshape(layer2, [-1, 7 * 7 * 64])
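The 7 * 7 * 64 in the reshape follows from the padding and strides used in the convolutional layers. With 'SAME' padding, TensorFlow produces an output whose size along each axis is the input size divided by the stride, rounded up. The short sketch below traces a 28×28 image through the two layers; the helper name same_output_size is an assumption for this example.

```python
import math


def same_output_size(size, stride):
    """Output length along one axis with 'SAME' padding."""
    return math.ceil(size / stride)


size = 28
size = same_output_size(size, 1)   # layer1 convolution, stride 1 -> 28
size = same_output_size(size, 2)   # layer1 max pooling, stride 2 -> 14
size = same_output_size(size, 1)   # layer2 convolution, stride 1 -> 14
size = same_output_size(size, 2)   # layer2 max pooling, stride 2 -> 7

num_filters = 64                   # filters in the second convolutional layer
flattened_length = size * size * num_filters
print(size, flattened_length)      # prints: 7 3136
```

Each image therefore enters the fully connected layers as a vector of 3,136 values.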

Set up the weights and biases, and create two densely connected layers. The first uses a ReLU activation; the second uses a softmax activation, which is appropriate for an output layer that generates probabilities for predictive labels. We’ll use a cross-entropy loss function, built into TensorFlow.

wd1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1000], stddev=0.03), name='wd1')
bd1 = tf.Variable(tf.truncated_normal([1000], stddev=0.01), name='bd1')
dense_layer1 = tf.matmul(flattened, wd1) + bd1
dense_layer1 = tf.nn.relu(dense_layer1)
wd2 = tf.Variable(tf.truncated_normal([1000, 10], stddev=0.03), name='wd2')
bd2 = tf.Variable(tf.truncated_normal([10], stddev=0.01), name='bd2')
dense_layer2 = tf.matmul(dense_layer1, wd2) + bd2
y_ = tf.nn.softmax(dense_layer2)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=dense_layer2, labels=y))

**Train the network**

Define an optimiser that minimizes the cross-entropy loss and an operation that measures the accuracy of the network, then initialize the variables and train the network. The summarized code does not show the optimiser definition; the Adam optimizer below is an assumption that uses the learning rate set earlier.

# assumed optimiser (its definition is missing from this summary)
optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    # initialise the variables
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        test_acc = sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost), " test accuracy: {:.3f}".format(test_acc))
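The accuracy operation compares the index of the largest value in each one-hot label with the index of the largest predicted probability, then averages the matches. The following is a plain-Python sketch of that calculation; the function names and the four hand-written three-class predictions are illustrative assumptions, not output from the network above.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(labels, predictions):
    """Fraction of examples whose predicted class matches the label."""
    matches = [
        argmax(label) == argmax(pred)
        for label, pred in zip(labels, predictions)
    ]
    return sum(matches) / len(matches)


# Hypothetical one-hot labels and predicted probabilities for 3 classes.
labels = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
predictions = [
    [0.7, 0.2, 0.1],   # correct: class 0
    [0.1, 0.8, 0.1],   # correct: class 1
    [0.3, 0.4, 0.3],   # wrong: predicts class 1, label is class 2
    [0.2, 0.5, 0.3],   # correct: class 1
]

print(accuracy(labels, predictions))  # prints: 0.75
```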

Running deep learning experiments is a complicated matter. You need to select a framework, such as Keras or TensorFlow, train your CNN and track its progress. All this has to be supported by a powerful computing infrastructure, and if it is not properly optimized, your deep learning model could incur inflated storage and hardware costs, not to mention wasted time. Running multiple experiments is even more demanding.

In this article, we’ve seen how you can configure and train a convolutional neural network using images from the MNIST dataset, with two examples: one in Keras and one in TensorFlow. However, we’ve just scratched the surface of CNNs and what they can do. To help you set up and manage your deep learning project, you will need the support of an experienced platform to help you optimize your performance and costs.

With the MissingLink __deep learning platform__, you can build your CNN more easily and automate the operational elements for computer vision tasks. MissingLink can help you run and manage your projects so you can focus your efforts on building advanced convolutional neural networks.
