Important announcement: MissingLink has shut down.
Creating a CNN in Keras, TensorFlow and Plain Python
Today, Python is the most common language used to build and train neural networks, and convolutional neural networks (CNNs) in particular. In this article, we’ll look at why Python is so popular and how all major deep learning frameworks, including TensorFlow, Keras, and PyTorch, support it.
We’ll also go through two tutorials to help you create your own convolutional neural networks in Python: 1. building a convolutional neural network in Keras, and 2. creating a CNN from scratch using NumPy. Finally, we’ll discuss convolutional neural networks in the real world.
Python is the language most commonly used today to build and train neural networks and, in particular, convolutional neural networks.
Here are a few reasons for its popularity:
All major deep learning frameworks support Python. Of these, the most popular and powerful platforms are TensorFlow, Keras (which is typically used as a frontend wrapper for TensorFlow), and PyTorch.
Below is a quick description of each of the frameworks, and installation instructions to get you started.



* Remove “-gpu” to install the non-GPU version for beginners


In this tutorial you will use Keras to build a CNN that can identify handwritten digits. We’ll use the MNIST dataset of 70,000 handwritten digits (from 0 to 9).
The tutorial steps below are summarized – for full details and code see the full tutorial by Eijaz Allibhai.
Load the training and testing MNIST images into the variables X_train and X_test, with y_train and y_test used to hold the matching digits. Keep in mind that the shape of every image in the MNIST dataset is 28 x 28 pixels.
from keras.datasets import mnist

# Download MNIST and split it into training and test sets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
Reshape the inputs (X_train and X_test) to a shape that the CNN model accepts as input. The NumPy reshape method takes as arguments the number of images (60,000 for X_train and 10,000 for X_test), the shape of each image (28×28), and the number of color channels, which is 1 in this case because the images are greyscale.
Then, one-hot encode the target variable, creating a binary column for each target label (in our case ‘0’, ‘1’, ‘2’, and so on, because we are recognizing digits).
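from keras.utils import to_categorical

# Add a channel dimension: (samples, height, width, channels).
X_train = X_train.reshape(60000, 28, 28, 1)
X_test = X_test.reshape(10000, 28, 28, 1)

# One-hot encode the digit labels.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Inspect the first encoded label.
y_train[0]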
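As a rough illustration of what one-hot encoding does to the labels, here is a minimal NumPy sketch. It is not the Keras implementation, and the helper name one_hot is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in the column that matches each label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample = one_hot([5, 0, 4])
print(sample[0])  # the row for label 5 has its 1 at index 5
```

Each row sums to 1, and the position of the 1 identifies the digit.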
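Use the code below to build a CNN model with the convenient Sequential object in Keras. The model includes two convolutional layers (64 and 32 filters, each with a 3×3 kernel and ReLU activation), a Flatten layer, and a Dense output layer with 10 units and softmax activation.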
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten

model = Sequential()
# Two convolutional layers with 3x3 kernels and ReLU activation.
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
# Flatten the feature maps and classify into 10 digit classes.
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
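To see how the image size and parameter count change through this model, the arithmetic can be sketched in plain Python. This assumes the Conv2D defaults of stride 1 and no padding; the helper names are hypothetical and are not part of Keras.

```python
def conv2d_output(height, width, kernel_size):
    """Spatial size after a stride-1 convolution with no padding."""
    return height - kernel_size + 1, width - kernel_size + 1

def conv2d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

h, w = conv2d_output(28, 28, 3)        # first Conv2D: 28x28 -> 26x26
params_conv1 = conv2d_params(3, 1, 64)
h, w = conv2d_output(h, w, 3)          # second Conv2D: 26x26 -> 24x24
params_conv2 = conv2d_params(3, 64, 32)
flat_size = h * w * 32                 # Flatten: 24 * 24 * 32 values
params_dense = flat_size * 10 + 10     # Dense(10): weights plus biases
total_params = params_conv1 + params_conv2 + params_dense
print((h, w), flat_size, total_params)
```

Most of the parameters sit in the final Dense layer, because it connects every flattened value to each of the 10 outputs.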
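Compile the model, providing three parameters: the optimizer ('adam'), the loss function ('categorical_crossentropy'), and the metrics to report during training ('accuracy').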
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
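For intuition about the loss named above, here is a minimal NumPy sketch of categorical cross-entropy on one-hot labels. It is an illustration under simplifying assumptions (a mean over samples and a small clipping constant eps chosen here), not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an assumed constant for this sketch.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0.0, 1.0, 0.0]])          # the correct class is index 1
confident = np.array([[0.05, 0.90, 0.05]])    # illustrative predictions
unsure = np.array([[0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is small when the model assigns high probability to the correct class and grows as that probability drops.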
Train the model using the Keras fit() function, providing the training data, target data, and the number of epochs the experiment should run (the number of times training should be repeated on the data).
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3)
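As a rough guide to how much work this call does, the number of weight updates can be estimated as follows. The sketch assumes the default batch size of 32 used by fit() when none is given; the helper name is hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

steps = steps_per_epoch(60000)   # training set size from this tutorial
total_updates = steps * 3        # epochs=3 in the fit() call
print(steps, total_updates)
```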
The predict() function returns an array of 10 numbers for each image: the probabilities that the image contains each possible digit from 0 to 9. Run a prediction for the first four images in the test set, and display the first four values in y_test to compare against the actual labels.
# Predicted probabilities for the first four test images.
model.predict(X_test[:4])

# Actual (one-hot encoded) labels for the same images.
y_test[:4]
You’ll see that the model was correct – it predicted 7, 2, 1 and 0 for the first four images, which are the correct values in y_test.
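To turn the probability arrays into digits, you can take the index of the largest value in each row. The sketch below uses hypothetical probabilities for two images rather than real model output.

```python
import numpy as np

# Hypothetical probabilities for two images (illustrative values only).
probs = np.array([
    [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01],
    [0.02, 0.03, 0.85, 0.02, 0.02, 0.01, 0.01, 0.02, 0.01, 0.01],
])

# The index of the highest probability in each row is the predicted digit.
predicted_digits = np.argmax(probs, axis=1)
print(predicted_digits)
```

The same argmax call applied to the one-hot rows of y_test recovers the true digits for comparison.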
In this tutorial you’ll see how to build a CNN from scratch using the NumPy library. This is considered more difficult than using a deep learning framework, but it will give you a much better understanding of what is happening behind the scenes of the deep learning process.
The following tutorial steps are summarized – see the full tutorial and code by Ahmed Gad.
Use this code to read an input image and convert it to grayscale:
import skimage.color
import skimage.data

# Load a sample RGB image and convert it to grayscale.
img = skimage.data.chelsea()
img = skimage.color.rgb2gray(img)
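Grayscale conversion is essentially a weighted sum of the three color channels. The sketch below assumes the luminance weights documented for scikit-image's rgb2gray and a float image with values in [0, 1]; the function name to_grayscale is hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted sum of the R, G and B channels of an (H, W, 3) float array."""
    weights = np.array([0.2125, 0.7154, 0.0721])  # assumed luminance weights
    return rgb[..., :3] @ weights

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [1.0, 1.0, 1.0]   # white pixel
rgb[1, 1] = [0.0, 1.0, 0.0]   # pure green pixel
gray = to_grayscale(rgb)
print(gray.shape)  # the channel dimension is gone
```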
Prepare the filter bank for the first convolutional layer. Create a zero array of shape (2=num_filters, 3=num_rows_filter, 3=num_columns_filter) to hold two filters of size 3×3. Each filter is a 2D array because the input image is grayscale and has only one color channel.
import numpy

l1_filter = numpy.zeros((2, 3, 3))
# Filter 0 responds to vertical edges, filter 1 to horizontal edges.
l1_filter[0, :, :] = numpy.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
l1_filter[1, :, :] = numpy.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])
Convolve the image by passing the filters over it, using the conv() function.
l1_feature_map = conv(img, l1_filter)
Here is how conv() is implemented. It first checks that the number of image channels matches the filter depth, that the filter is square, and that the filter size is odd.
import sys

def conv(img, conv_filter):
    # Check that the number of image channels matches the filter depth.
    if len(img.shape) > 2 or len(conv_filter.shape) > 3:
        if img.shape[-1] != conv_filter.shape[-1]:
            print("Error: Number of channels in both image and filter must match.")
            sys.exit()
    # Check that the filter is square.
    if conv_filter.shape[1] != conv_filter.shape[2]:
        print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
        sys.exit()
    # Check that the filter size is odd.
    if conv_filter.shape[1] % 2 == 0:
        print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
        sys.exit()
Next, an empty array of feature maps is created. For each filter, the image is convolved channel by channel, and the per-channel results are summed into a single map, conv_map, which is stored as that filter's feature map.
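    # Continuation of conv(): one output map per filter, sized for stride 1 and no padding.
    feature_maps = numpy.zeros((img.shape[0] - conv_filter.shape[1] + 1,
                                img.shape[1] - conv_filter.shape[1] + 1,
                                conv_filter.shape[0]))
    for filter_num in range(conv_filter.shape[0]):
        print("Filter ", filter_num + 1)
        curr_filter = conv_filter[filter_num, :]
        # conv_ is the single-channel convolution helper (not shown in this summary).
        if len(curr_filter.shape) > 2:
            # Multi-channel input: convolve each channel and sum the results.
            conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0])
            for ch_num in range(1, curr_filter.shape[-1]):
                conv_map = conv_map + conv_(img[:, :, ch_num], curr_filter[:, :, ch_num])
        else:
            # Single-channel input: one convolution is enough.
            conv_map = conv_(img, curr_filter)
        feature_maps[:, :, filter_num] = conv_map
    return feature_maps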
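The conv_ helper called above is not included in this summary. As a minimal sketch of what such a single-channel helper could look like (a stand-in written for illustration, not the code from the full tutorial), the function below slides a square filter over a 2D image with stride 1 and no padding. Like most CNN code, it does not flip the kernel. The name conv_single_channel and the test image are assumptions.

```python
import numpy as np

def conv_single_channel(img, kernel):
    """Slide a square kernel over a 2D image with stride 1 and no padding."""
    k = kernel.shape[0]
    out_rows = img.shape[0] - k + 1
    out_cols = img.shape[1] - k + 1
    result = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            # Element-wise product of the kernel and the current region, then sum.
            result[r, c] = np.sum(img[r:r + k, c:c + k] * kernel)
    return result

# A tiny image with a vertical edge: dark on the left, bright on the right.
img = np.array([[0, 0, 1, 1]] * 4, dtype=float)
vertical_edge = np.array([[-1, 0, 1]] * 3, dtype=float)
edge_map = conv_single_channel(img, vertical_edge)
print(edge_map)
```

The output is smaller than the input by filter size minus one in each dimension, which matches the feature map size allocated in conv().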
Here is how you apply a ReLU activation after the convolution operation:
l1_feature_map_relu = relu(l1_feature_map)
The relu function is implemented as follows. It loops through every element in the feature map and keeps the value if it is larger than 0; otherwise it stores 0.
def relu(feature_map):
    # Prepare the output of the ReLU activation function.
    relu_out = numpy.zeros(feature_map.shape)
    for map_num in range(feature_map.shape[-1]):
        for r in numpy.arange(0, feature_map.shape[0]):
            for c in numpy.arange(0, feature_map.shape[1]):
                # Keep positive values, replace negative values with 0.
                relu_out[r, c, map_num] = numpy.max([feature_map[r, c, map_num], 0])
    return relu_out
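The explicit loops make the operation easy to follow, but the same result can be computed in one call. This is a minimal vectorized sketch using numpy.maximum; the name relu_vectorized is hypothetical.

```python
import numpy as np

def relu_vectorized(feature_map):
    """Element-wise max(x, 0) over an array of any shape."""
    return np.maximum(feature_map, 0)

sample = np.array([[-1.5, 2.0], [0.0, -0.3]])
activated = relu_vectorized(sample)
print(activated)
```

On large feature maps this is typically much faster than looping in Python, because the work happens inside NumPy.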
You apply max pooling on the results of the first convolution as follows:
l1_feature_map_relu_pool = pooling(l1_feature_map_relu, 2, 2)
Pooling is implemented as follows. The pooling function we define accepts the output of the ReLU layer, the pooling mask size, and the stride. It loops through the input channel by channel and applies the max pooling operation to each channel. The pool_out array stores the maximum of each region, selected according to the size and stride used.
def pooling(feature_map, size=2, stride=2):
    # Prepare the output of the pooling operation.
    out_rows = (feature_map.shape[0] - size) // stride + 1
    out_cols = (feature_map.shape[1] - size) // stride + 1
    pool_out = numpy.zeros((out_rows, out_cols, feature_map.shape[-1]))
    for map_num in range(feature_map.shape[-1]):
        r2 = 0
        for r in numpy.arange(0, feature_map.shape[0] - size + 1, stride):
            c2 = 0
            for c in numpy.arange(0, feature_map.shape[1] - size + 1, stride):
                # Maximum of the current region in the current channel.
                pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r + size, c:c + size, map_num])
                c2 = c2 + 1
            r2 = r2 + 1
    return pool_out
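A small worked example may help here. The sketch below applies 2×2 max pooling with stride 2 to a single 4×4 channel, using a simplified single-channel helper written for illustration (max_pool_2d is a hypothetical name).

```python
import numpy as np

def max_pool_2d(channel, size=2, stride=2):
    """Max pooling over one 2D channel, with no padding."""
    out_rows = (channel.shape[0] - size) // stride + 1
    out_cols = (channel.shape[1] - size) // stride + 1
    pooled = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            r0, c0 = r * stride, c * stride
            # Keep only the largest value in each window.
            pooled[r, c] = channel[r0:r0 + size, c0:c0 + size].max()
    return pooled

channel = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2d(channel)
print(pooled)
```

Each output value is the maximum of one non-overlapping 2×2 window, so the 4×4 input shrinks to 2×2.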
Here is how to stack the remaining layers to build a full CNN model. We define a second and a third convolution, each followed by ReLU and pooling steps.
# Second layer: 3 filters of size 5x5, with depth matching the previous output.
l2_filter = numpy.random.rand(3, 5, 5, l1_feature_map_relu_pool.shape[-1])
l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
l2_feature_map_relu = relu(l2_feature_map)
l2_feature_map_relu_pool = pooling(l2_feature_map_relu, 2, 2)

# Third layer: 1 filter of size 7x7, with depth matching the previous output.
l3_filter = numpy.random.rand(1, 7, 7, l2_feature_map_relu_pool.shape[-1])
l3_feature_map = conv(l2_feature_map_relu_pool, l3_filter)
l3_feature_map_relu = relu(l3_feature_map)
l3_feature_map_relu_pool = pooling(l3_feature_map_relu, 2, 2)
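To check that the layers fit together, you can trace the array shapes through the stack without running any convolutions. This is a sketch under stated assumptions: a 300×451 grayscale input (believed to be the size of the chelsea sample image), stride-1 convolutions without padding, and 2×2 pooling with stride 2 that drops incomplete windows. The helper names are hypothetical.

```python
def conv_shape(shape, filter_size, num_filters):
    """Output shape of a stride-1, no-padding convolution."""
    rows, cols = shape[0], shape[1]
    return (rows - filter_size + 1, cols - filter_size + 1, num_filters)

def pool_shape(shape, size=2, stride=2):
    """Output shape of max pooling; ReLU leaves the shape unchanged."""
    rows, cols, channels = shape
    return ((rows - size) // stride + 1, (cols - size) // stride + 1, channels)

shape = (300, 451)  # assumed size of the grayscale input image
trace = []
# (filter size, number of filters) for layers 1, 2 and 3.
for filter_size, num_filters in [(3, 2), (5, 3), (7, 1)]:
    conv_out = conv_shape(shape, filter_size, num_filters)
    shape = pool_shape(conv_out)
    trace.append((conv_out, shape))
    print(conv_out, shape)
```

The number of channels after each layer equals the number of filters in that layer, which is why each new filter bank takes its depth from the last dimension of the previous output.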
And that’s it! You just built a full CNN architecture from scratch in NumPy.
In this article we explained the basics of Python for deep learning and provided two tutorials to create your own Convolutional Neural Networks in Python. When you start working on CNN projects, processing and generating predictions for real images, audio and video, you’ll run into some practical challenges:
Tracking experiment progress, source code, and hyperparameters across multiple CNN experiments. CNNs can have many variations and hyperparameter tweaks, and testing each will require running multiple experiments and tracking their results.
Running experiments across multiple machines. CNNs are computationally intensive, and you will probably need to run on multiple machines or specialized GPU hardware. Provisioning these machines, configuring them, and distributing the work among them can be difficult.
Managing training data. CNN projects often involve images or other rich media, and training sets can weigh anywhere from gigabytes upwards. Copying data to training machines and recopying it for every new experiment is time-consuming and error-prone.
MissingLink is a deep learning platform that can help you automate these operational aspects of CNN projects, so you can concentrate on building winning experiments.