Deep Learning Frameworks

Keras ResNet: Building, Training & Scaling Residual Nets on Keras

ResNet took the deep learning world by storm in 2015, as the first architecture that could successfully train networks with hundreds or thousands of layers without succumbing to the “vanishing gradient” problem. Keras makes it easy to build ResNet models: you can run built-in ResNet variants pre-trained on ImageNet with just one line of code, or build your own custom ResNet implementation.

 

In this article you will learn:

  • What is ResNet
  • ResNet variants built into Keras: V1, V2, and ResNeXt
  • Why it’s difficult to run ResNet yourself – a very big network
  • Options for running ResNet on Keras: built-in Applications or do-it-yourself
  • Scaling ResNet on Keras with MissingLink

What is a ResNet Neural Network?

Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture designed to support hundreds or thousands of convolutional layers. While previous CNN architectures saw diminishing returns from additional layers, ResNet can stack a large number of layers while maintaining strong performance.

 

ResNet was an innovative solution to the “vanishing gradient” problem. Neural networks train via backpropagation (see our guide on backpropagation), which relies on gradient descent, moving down the loss function to find the weights that minimize it. If there are too many layers, repeated multiplication makes the gradient smaller and smaller as it propagates backwards, until it effectively “disappears”, causing performance to saturate or even degrade with each additional layer.

 

The ResNet solution is the “identity shortcut connection”. ResNet stacks blocks of identity mappings, layers that initially do nothing, and skips over them, reusing the activations from previous layers. Skipping initially compresses the network into only a few effective layers, which speeds up learning. As training continues, the skipped layers come back into play and the “residual” parts of the network explore more and more of the feature space of the source image.
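
To make the skip connection concrete, here is a minimal sketch of a single residual block written with the Keras functional API. The layer sizes and the two-convolution main path are illustrative choices, not the exact configuration used in the original paper:

from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

# Illustrative input: a 32x32 feature map with 64 channels
inputs = Input(shape=(32, 32, 64))

# Main path: the layers that learn the residual F(x)
x = Conv2D(64, (3, 3), padding='same')(inputs)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = Conv2D(64, (3, 3), padding='same')(x)
x = BatchNormalization()(x)

# Shortcut: add the unchanged input back before the final activation,
# so the block only has to learn the residual rather than the full mapping
x = Add()([x, inputs])
x = Activation('relu')(x)

model = Model(inputs, x)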

 

[Figure: VGG-19 and ResNet-152 architectures compared]

Source: ResearchGate

 

The creators of ResNet demonstrated that they could train ResNets with hundreds or thousands of layers that outperform shallower networks, and ResNet has become one of the most popular architectures for computer vision tasks.

 

ResNet Variations You Can Use on Keras

ResNet has inspired several similar architectures, two of which come built into Keras:

 

ResNetV2

The primary difference between ResNetV2 and the original (V1) is that V2 uses batch normalization before each weight layer.
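
In code, the difference shows up in the ordering of operations inside each residual unit. The fragment below is only an illustration of the two orderings (with made-up layer sizes), not the actual Keras implementation:

from keras.layers import Conv2D, BatchNormalization, Activation

def v1_unit(x, filters=64):
    # Original ResNet (V1): convolution first, then batch norm and ReLU
    x = Conv2D(filters, (3, 3), padding='same')(x)
    x = BatchNormalization()(x)
    return Activation('relu')(x)

def v2_unit(x, filters=64):
    # ResNetV2 ("pre-activation"): batch norm and ReLU come before the convolution
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return Conv2D(filters, (3, 3), padding='same')(x)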

ResNeXt

Uses a different identity mappings building block, which has several different paths of stacked identity layers, with their outputs merged via addition. ResNeXt introduces a new hyperparameter called “cardinality”, which defines how many paths exist in each block.
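
As a rough sketch of the idea (real ResNeXt implementations typically use grouped convolutions for efficiency, and the layer sizes here are illustrative), a ResNeXt-style block can be written as a sum over several identical bottleneck paths plus the identity shortcut:

from keras.layers import Conv2D, BatchNormalization, Activation, Add

def resnext_block(x, cardinality=8, path_width=4, out_filters=64):
    # Each path is a small bottleneck: 1x1 reduce, 3x3 transform, 1x1 expand
    paths = []
    for _ in range(cardinality):
        p = Conv2D(path_width, (1, 1), padding='same', activation='relu')(x)
        p = Conv2D(path_width, (3, 3), padding='same', activation='relu')(p)
        p = Conv2D(out_filters, (1, 1), padding='same')(p)
        paths.append(p)

    # Merge the paths by addition, then add the identity shortcut
    # (this assumes x already has out_filters channels)
    merged = BatchNormalization()(Add()(paths))
    out = Add()([merged, x])
    return Activation('relu')(out)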

 


 


Why it’s Difficult to Run ResNet Yourself and How MissingLink Can Help

ResNet can have anywhere from dozens to thousands of convolutional layers, and can take a long time to train and run – from hours to several weeks in extreme cases. You will need to distribute a ResNet model across multiple GPUs, and if performance is still insufficient, scale out to multiple machines.

 

However, you’ll find that running a deep learning model on multiple machines is difficult:

  • On-premises, you need to set up multiple machines for deep learning, manually run experiments, and carefully watch resource utilization.
  • In the cloud, you can spin up machines quickly, but you need to build and test machine images and manually run experiments on each machine. You’ll need to “babysit” your machines to make sure an experiment is always running, and to avoid wasting money on expensive GPU machines.

 

MissingLink solves all that. It’s a deep learning platform that lets you scale out ResNet and other computer vision models automatically across numerous machines.

 

[Screenshot: the MissingLink deep learning platform]

 

Just set up jobs in the MissingLink dashboard, define your cluster of on-premises or cloud machines, and the jobs will automatically run across that cluster. You can train a ResNet model in minutes – not hours or days.

 

To avoid idle time, MissingLink immediately runs another experiment when the previous one ends, and cleanly shuts down cloud machines when all jobs complete.

 

Get a free MissingLink account and see how easy it is!


Options for Running ResNet on Keras

Built-In Keras ResNet Implementation: Applications Packages

Keras provides the Applications modules, which include multiple deep learning models, pre-trained on the industry standard ImageNet dataset and ready to use.

 

ImageNet pre-training is extremely valuable, because training ResNet on the huge ImageNet dataset from scratch is a formidable task; Keras has already done this for you and packaged the result into its Applications modules. You can therefore use transfer learning to apply these trained models to your own problems.

 

Keras Applications includes the following ResNet implementations: ResNet V1 and ResNet V2 with 50, 101, or 152 layers, and ResNeXt with 50 or 101 layers.

 

keras.applications.resnet.ResNet50
keras.applications.resnet.ResNet101
keras.applications.resnet.ResNet152
keras.applications.resnet_v2.ResNet50V2
keras.applications.resnet_v2.ResNet101V2
keras.applications.resnet_v2.ResNet152V2
keras.applications.resnext.ResNeXt50
keras.applications.resnext.ResNeXt101
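
For example, loading one of these pre-trained models and classifying an image takes only a few lines. Note that the exact module path varies slightly between Keras versions (older releases expose ResNet50 under keras.applications.resnet50), and the image path below is just a placeholder:

import numpy as np
from keras.applications.resnet import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image

# Instantiate ResNet-50 with weights pre-trained on ImageNet
model = ResNet50(weights='imagenet')

# Load and preprocess a single image, then predict its ImageNet class
img = image.load_img('elephant.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])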

 

Each of these is a function that takes the following arguments, allowing you to configure your ResNet model:

 

Parameter | Data Type | What it Does
include_top | Boolean | Whether to include a fully-connected layer at the output end of the architecture. If True, the input shape must be (224, 224, 3).
weights | String: None or ‘imagenet’ | Whether to start from randomly initialized weights or from weights pre-trained on the ImageNet dataset.
input_tensor | Tensor | Optional – a tensor to use as the image input for the model.
input_shape | Tuple | Optional – a shape tuple, which you must specify if include_top is False. It must have exactly 3 input channels, and width and height no smaller than 32.
pooling | String: None, ‘avg’ or ‘max’ | Optional – specifies the pooling mode for feature extraction when include_top is False. None means the network outputs the 4D tensor of the last convolutional layer; ‘avg’ applies global average pooling, so the network outputs a 2D tensor; ‘max’ applies global max pooling.
classes | Integer | Optional – the number of classes to classify images into. Specify it only when include_top is True and no weights argument is given.
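
Putting these arguments together, a common transfer-learning pattern is to load the convolutional base without the fully-connected top, freeze its pre-trained layers, and attach your own classifier. The class count and optimizer below are placeholders:

from keras.applications.resnet import ResNet50
from keras.layers import Dense
from keras.models import Model

# Convolutional base: no fully-connected top, ImageNet weights,
# and global average pooling so the base outputs a 2D tensor
base = ResNet50(include_top=False, weights='imagenet',
                input_shape=(224, 224, 3), pooling='avg')

# Freeze the pre-trained layers so only the new head is trained
for layer in base.layers:
    layer.trainable = False

# New classification head for your own dataset (10 classes is a placeholder)
outputs = Dense(10, activation='softmax')(base.output)
model = Model(base.input, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])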

Coding a ResNet Architecture Yourself in Keras

What if you want to create a different ResNet architecture than the ones built into Keras? For example, you might want to use more layers or a different variant of ResNet. Priya Dwivedi created an extensive tutorial that shows, step by step, how to implement all the building blocks of ResNet in Keras, so you can build your own architectures from scratch.

 

For example, here is Dwivedi’s Keras code that builds the identity block:

 

from keras.layers import Conv2D, BatchNormalization, Activation, Add
from keras.initializers import glorot_uniform

def identity_block(X, f, filters, stage, block):
    """Identity block: a main path of three conv layers plus a shortcut that adds
    the input X back before the final ReLU. f is the kernel size of the middle
    conv; filters holds the three filter counts."""
    # defining name basis
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    
    # Retrieve Filters
    F1, F2, F3 = filters
    
    # Save the input value. You'll need this later to add back to the main path. 
    X_shortcut = X
    
    # First component of main path
    X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
    X = Activation('relu')(X)

    
    # Second component of main path (≈3 lines)
    X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
    X = Activation('relu')(X)

    # Third component of main path (≈2 lines)
    X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)

    # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)
        
    return X
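
As a usage sketch (the tensor size here is illustrative), the block can be applied to any tensor whose channel count matches the last filter value, since the shortcut is added back unchanged:

from keras.layers import Input

# 56x56 feature map with 256 channels, matching F3 = 256 in the filters list
X_input = Input(shape=(56, 56, 256))
X_out = identity_block(X_input, f=3, filters=[64, 64, 256], stage=2, block='b')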

 

Here is an illustration of the identity block created by the above code, showing the shortcut path:

 

[Figure: the Keras ResNet identity block]

Source: Github

 

See the full tutorial to learn how to create all the ResNet components yourself in Keras.

Scaling ResNet on Keras

In this article, we learned the basics of ResNet and saw two ways to run ResNet on Keras: Using a pre-trained model in the Keras Applications modules, or by building ResNet components yourself by directly creating their layers in Keras.

 

As we mentioned above, training ResNet, especially with larger numbers of layers, is extremely computationally intensive. Don’t wait hours or days for ResNet to train! Use the MissingLink deep learning platform to:

  • Scale out ResNet automatically across numerous machines, either on-premises or in the cloud
  • Define a cluster of machines and automatically run deep learning jobs on it, with optimal resource utilization
  • Avoid idle time by immediately running experiments one after the other, and by shutting down cloud machines cleanly when jobs complete
  • Manage large numbers of experiments, track and share results, and manage large datasets and sync them easily to training machines

 

Get a free MissingLink account and see how easy it is!

MissingLink is a deep learning platform that streamlines and automates the entire deep learning lifecycle and lets you concentrate on building the most accurate model. Learn more to see how easy it is.

Learn More About Deep Learning Frameworks