Keras ResNet: Building, Training & Scaling Residual Nets on Keras
ResNet took the deep learning world by storm in 2015, as the first architecture that made it practical to train networks hundreds or even thousands of layers deep without succumbing to the “vanishing gradient” problem. Keras makes it easy to build ResNet models: you can run built-in ResNet variants pre-trained on ImageNet with just one line of code, or build your own custom ResNet implementation.
In this article you will learn:
- What is ResNet
- ResNet variants built into Keras: V1, V2, and ResNeXt
- Why it’s difficult to run ResNet yourself – a very big network
- Options for running ResNet on Keras: built-in Applications or do-it-yourself
- Scaling ResNet on Keras with MissingLink
Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture which was designed to enable hundreds or thousands of convolutional layers. While previous CNN architectures had a drop off in the effectiveness of additional layers, ResNet can add a large number of layers with strong performance.
ResNet was an innovative solution to the “vanishing gradient” problem. Neural networks train via the backpropagation process (see our guide on backpropagation), which relies on gradient descent, moving down the loss function to find the weights that minimize it. If there are too many layers, repeated multiplication makes the gradient smaller and smaller, until it “disappears”, causing performance to saturate or even degrade with each additional layer.
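The effect of that repeated multiplication is easy to see with a toy calculation. The numbers below are hypothetical, purely for illustration: if each layer scales the gradient by an average factor below 1 during backpropagation, the gradient reaching the earliest layers shrinks exponentially with depth.

```python
# Hypothetical illustration of the vanishing gradient problem: assume
# each layer multiplies the backpropagated gradient by an average
# factor below 1. The product shrinks exponentially with depth.
per_layer_factor = 0.9  # assumed average gradient scale per layer

for depth in (10, 50, 150):
    grad = per_layer_factor ** depth
    print(f"{depth} layers -> gradient scaled by {grad:.2e}")
```

At 150 layers the gradient has effectively vanished, which is why very deep plain networks stop learning in their early layers.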
The ResNet solution is “identity shortcut connections”. ResNet stacks up identity mappings, layers that initially don’t do anything, and skips over them, reusing the activations from previous layers. Skipping initially compresses the network into only a few layers, which enables faster learning. Then, when the network trains again, all layers are expanded and the “residual” parts of the network explore more and more of the feature space of the source image.
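The shortcut idea can be sketched in a few lines of Keras. This is a minimal, simplified residual block (not the exact blocks used in the published architecture): the block's input is added back to the output of its convolutional path, so the layers only have to learn a residual on top of the identity mapping.

```python
from tensorflow.keras import Input, Model, layers

# Minimal sketch of an identity shortcut connection: the input tensor
# skips over the convolutional path and is added back to its output.
inputs = Input(shape=(32, 32, 64))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, padding="same")(x)
x = layers.Add()([x, inputs])          # the shortcut connection
outputs = layers.Activation("relu")(x)

model = Model(inputs, outputs)
print(model.output_shape)  # the block preserves the input shape
```

Because the shortcut is parameter-free, gradients can flow straight through the addition to earlier layers, sidestepping the repeated multiplications that cause vanishing gradients.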
The creators of ResNet demonstrated they can train a ResNet with hundreds or thousands of layers that outperforms shallower networks, and ResNet has become one of the most popular architectures for computer vision tasks.
ResNet has inspired several similar architectures, two of which come built into Keras:
The primary difference between ResNetV2 and the original (V1) is that V2 uses batch normalization before each weight layer.
ResNeXt uses a different building block, which has several parallel paths of stacked layers, with their outputs merged via addition. ResNeXt introduces a new hyperparameter called “cardinality”, which defines how many paths exist in each block.
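The V1/V2 difference above is a reordering inside the residual branch. The sketch below (simplified, not the complete blocks) contrasts the two orderings: V1 applies batch normalization and the activation after each convolution, while V2 is "pre-activation" and applies them before the weight layer.

```python
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(32, 32, 64))

# V1-style ordering: convolution first, then batch normalization
# and the ReLU activation.
v1 = layers.Conv2D(64, 3, padding="same")(inputs)
v1 = layers.BatchNormalization()(v1)
v1 = layers.Activation("relu")(v1)

# V2-style "pre-activation" ordering: batch normalization and the
# activation come before the weight (convolution) layer.
v2 = layers.BatchNormalization()(inputs)
v2 = layers.Activation("relu")(v2)
v2 = layers.Conv2D(64, 3, padding="same")(v2)

model = Model(inputs, [v1, v2])
```

Both orderings produce tensors of the same shape; the pre-activation form improves gradient flow through very deep stacks of blocks.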
ResNet can have anywhere from dozens to thousands of convolutional layers and can take a long time to train and execute – from hours to several weeks in extreme cases. You may need to distribute a ResNet model across multiple GPUs, and if performance is still insufficient, scale out to multiple machines.
However, you’ll find that running a deep learning model on multiple machines is difficult:
- On-premises, you need to set up multiple machines for deep learning, manually run experiments and carefully watch resource utilization
- In the cloud, you can spin up machines quickly, but need to build and test machine images, and manually run experiments on each machine. You’ll need to “babysit” your machines to ensure an experiment is always running, and avoid wasting money with expensive GPU machines.
MissingLink solves all that. It’s a deep learning platform that lets you scale out ResNet and other computer vision models automatically across numerous machines.
Just set up jobs in the MissingLink dashboard, define your cluster of on-premise or cloud machines, and the jobs will automatically run on your cluster of machines. You can train a ResNet model in minutes – not hours or days.
To avoid idle time, MissingLink immediately runs another experiment when the previous one ends, and cleanly shuts down cloud machines when all jobs complete.
Get a free MissingLink account and see how easy it is!
Built-In Keras ResNet Implementation: Applications Packages
Keras provides the Applications modules, which include multiple deep learning models, pre-trained on the industry standard ImageNet dataset and ready to use.
ImageNet training is extremely valuable because training ResNet on the huge ImageNet dataset is a formidable task, which Keras has done for you and packaged into its application modules. You can thus leverage transfer learning to apply this trained model to your own problems.
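Here is one way that transfer learning can look in practice. This is a sketch, not a full recipe: `weights=None` is used so the example runs without downloading anything, but in practice you would pass `weights="imagenet"` to reuse the pre-trained features; the head sizes (256 units, 10 classes) are arbitrary placeholders for your own problem.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import ResNet50

# Transfer-learning sketch: reuse ResNet50 as a frozen feature
# extractor and train only a small classification head on top.
# weights=None avoids a download here; use weights="imagenet" in practice.
base = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)           # placeholder head size
outputs = layers.Dense(10, activation="softmax")(x)   # placeholder class count

model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

With the base frozen, only the small head's weights are updated during training, which is orders of magnitude cheaper than training ResNet from scratch.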
Keras Applications includes the following ResNet implementations: ResNet V1 and V2 with 50, 101, or 152 layers, and ResNeXt with 50 or 101 layers.
```
keras.applications.resnet.ResNet50
keras.applications.resnet.ResNet101
keras.applications.resnet.ResNet152
keras.applications.resnet_v2.ResNet50V2
keras.applications.resnet_v2.ResNet101V2
keras.applications.resnet_v2.ResNet152V2
keras.applications.resnext.ResNeXt50
keras.applications.resnext.ResNeXt101
```
Each of these is a function that takes the following arguments, allowing you to configure your ResNet model:
|Parameter|Data Type|What it Does|
|---|---|---|
|`include_top`|Boolean|Whether to include a fully-connected layer at the output end of the architecture. If true, the input shape must be (224, 224, 3).|
|`weights`|String: None, ‘imagenet’|Whether to train with randomized weights (None) or weights pre-trained on the ImageNet dataset (‘imagenet’).|
|`input_tensor`|Tensor|Optional – a tensor to use as image input for the model.|
|`input_shape`|Tuple|Optional – a shape tuple, which you need to specify if `include_top` is False.|
|`pooling`|String: None, ‘avg’, ‘max’|Optional – specifies pooling mode for feature extraction when `include_top` is False.|
|`classes`|Integer|Optional – number of classes to classify images into, only to be specified if `include_top` is True and no `weights` argument is given.|
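Putting the table's arguments together, a configured variant can be instantiated like this. The example uses `weights=None` so it builds without downloading pre-trained weights; swap in `"imagenet"` to get the pre-trained model.

```python
from tensorflow.keras.applications import ResNet50V2

# Instantiate a built-in variant using the arguments from the table:
# include_top=True keeps the fully-connected classifier head, and
# classes may be set because include_top is True and weights is None.
model = ResNet50V2(
    include_top=True,
    weights=None,              # or "imagenet" for pre-trained weights
    input_shape=(224, 224, 3),
    classes=1000,
)
print(model.output_shape)
```

The resulting model outputs one probability per class, so its output shape is `(None, 1000)` for the 1000 classes configured above.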
Coding a ResNet Architecture Yourself in Keras
What if you want to create a different ResNet architecture than the ones built into Keras? For example, you might want to use more layers or a different variant of ResNet. Priya Dwivedi created an extensive tutorial that shows, step by step, how to implement all the building blocks of ResNet in Keras, so you can build your own architectures from scratch.
For example, here is Dwivedi’s Keras code that builds the identity block:
```python
from keras.initializers import glorot_uniform
from keras.layers import Activation, Add, BatchNormalization, Conv2D

def identity_block(X, f, filters, stage, block):
    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Retrieve Filters
    F1, F2, F3 = filters

    # Save the input value. You'll need this later to add back to the main path.
    X_shortcut = X

    # First component of main path
    X = Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2a', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    # Second component of main path
    X = Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same',
               name=conv_name_base + '2b', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path
    X = Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid',
               name=conv_name_base + '2c', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    return X
```
[Illustration: the identity block created by the above code, showing the shortcut path.]
See the full tutorial to see how to create all ResNet components yourself in Keras.
In this article, we learned the basics of ResNet and saw two ways to run ResNet on Keras: Using a pre-trained model in the Keras Applications modules, or by building ResNet components yourself by directly creating their layers in Keras.
As we mentioned above, training ResNet, especially with larger numbers of layers, is extremely computationally intensive. Don’t wait hours or days for ResNet to train! Use the MissingLink deep learning platform to:
- Scale out ResNet automatically across numerous machines, either on-premise or in the cloud
- Define a cluster of machines and automatically run deep learning jobs, with optimal resource utilization
- Avoid idle time by immediately running experiments one after the other, and shutting down cloud machines cleanly when jobs complete.
MissingLink can also help you manage large numbers of experiments, track and share results, and manage large datasets and sync them easily to training machines.
Get a free MissingLink account and see how easy it is!