


PyTorch ResNet: Building, Training and Scaling Residual Networks on PyTorch

ResNet was state of the art in computer vision in 2015 and remains hugely popular. It can support networks with hundreds or thousands of layers without suffering from the “vanishing gradient” problem. PyTorch lets you easily build ResNet models; it provides several pre-trained ResNet architectures and lets you build your own. MissingLink’s deep learning platform provides automation capabilities for tracking models, logging data, managing the distribution of resources, and running experiments. You can use the MissingLink platform to scale your ResNet projects.


This article offers a conceptual review of the subject, as well as practical tips. We’ll explain how ResNet works, and provide you with a few PyTorch ResNet examples.

What is a ResNet Neural Network?

Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture that can support hundreds or more convolutional layers. ResNet can add many layers while maintaining strong performance, whereas previous architectures suffered a drop-off in effectiveness with each additional layer.

ResNet proposed a solution to the “vanishing gradient” problem. Neural networks train via backpropagation, which relies on gradient descent to find the weights that minimize the loss function. As more layers are added, repeated multiplication of their derivatives makes the gradient vanishingly small, so additional layers stop improving performance and can even degrade it.
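To see why depth alone causes trouble, here is a toy, hypothetical illustration in plain Python (not actual backpropagation): if each layer contributes a derivative factor smaller than one, the chained product that backpropagation computes shrinks exponentially with depth.

```python
# Toy illustration of the vanishing gradient: backpropagation multiplies
# per-layer derivative factors together, so if each factor is below 1,
# the gradient reaching the early layers shrinks exponentially with depth.
per_layer_factor = 0.25  # hypothetical derivative magnitude per layer
grad = 1.0
for depth in range(30):  # a 30-layer chain
    grad *= per_layer_factor

print(grad)  # on the order of 1e-19: effectively no learning signal
```

With 30 layers the surviving gradient is around 10⁻¹⁹, far too small to drive meaningful weight updates in the early layers.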

ResNet solves this using “identity shortcut connections” – skip connections that add a block’s input directly to its output, so each block only needs to learn a residual on top of the identity. Early in training, blocks whose weights are still near zero act as identity mappings and are effectively skipped, which reduces the network to a few layers and speeds learning. As training progresses, the residual blocks start contributing, and the network explores more of the feature space.
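The effect of the shortcut can be sketched in a few lines of PyTorch. This is a minimal sketch, not the full ResNet block: the block’s output is its transformation F(x) plus the untouched input x, so a block whose weights are zero reduces exactly to the identity function.

```python
import torch
import torch.nn as nn

# Minimal sketch of an identity shortcut connection: out = F(x) + x.
layer = nn.Linear(8, 8)           # stands in for a block's weight layers
nn.init.zeros_(layer.weight)      # simulate a block that "does nothing" yet
nn.init.zeros_(layer.bias)

x = torch.randn(4, 8)
out = layer(x) + x                # the shortcut adds the input back in

# With F(x) == 0 the block is exactly the identity, so stacking more such
# blocks cannot degrade the signal the way plain stacked layers can.
print(torch.allclose(out, x))  # True
```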


[Figure: a ResNet identity shortcut connection]

Image source: ResearchGate


ResNet was the first network demonstrated to support hundreds or thousands of layers while outperforming shallower networks. Although newer architectures have surpassed its performance since its introduction in 2015, ResNet is still a very popular choice for computer vision tasks.

A primary strength of the ResNet architecture is its ability to generalize well to different datasets and problems.

ResNet Variations You Can Use on PyTorch

PyTorch supports one notable variation that you can use instead of the traditional ResNet architecture: DenseNet. DenseNet uses shortcut connections to connect all layers directly with each other. The input of each layer is the feature maps of all earlier layers, joined using depth-concatenation. Concatenating feature maps preserves them all and increases the variance of the outputs, encouraging feature reuse.


Options for Running ResNet on PyTorch

Built-In PyTorch ResNet Implementation: torchvision.models

PyTorch provides torchvision.models, which includes multiple deep learning models, pre-trained on the ImageNet dataset and ready to use.

Pre-training lets you leverage transfer learning – once the model has learned many objects, features, and textures on the huge ImageNet dataset, you can apply this learning to your own images and recognition problems.

torchvision.models includes the following ResNet implementations: ResNet-18, 34, 50, 101 and 152 (the number indicates the number of layers in the model), as well as DenseNet-121, 161, 169, and 201.

The following classes allow you to access ResNet models in PyTorch:

torchvision.models.resnet18(pretrained=False, **kwargs)
torchvision.models.resnet34(pretrained=False, **kwargs)
torchvision.models.resnet50(pretrained=False, **kwargs)
torchvision.models.resnet101(pretrained=False, **kwargs)
torchvision.models.resnet152(pretrained=False, **kwargs)
torchvision.models.densenet121(pretrained=False, **kwargs)
torchvision.models.densenet161(pretrained=False, **kwargs)
torchvision.models.densenet169(pretrained=False, **kwargs)
torchvision.models.densenet201(pretrained=False, **kwargs)


The pretrained parameter specifies whether the model weights should be randomly initialized (False) or pre-trained on ImageNet (True). You can supply your own Python keyword arguments.


Coding a ResNet Architecture Yourself Using PyTorch

If you want to create a different ResNet architecture than the ones built into PyTorch, you can create your own custom implementation of ResNet. Liu Kuang created an extensive code example that shows how to implement the building blocks of ResNet in PyTorch.

Here is a PyTorch ResNet example of how to create the basic identity block:

import torch.nn as nn

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        # Identity shortcut; becomes a 1x1 projection when dimensions change
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )
And here is how the entire ResNet network is structured using this and other building blocks:

class ResNet(nn.Module):
   def __init__(self, block, num_blocks, num_classes=10):
       super(ResNet, self).__init__()
       self.in_planes = 64

       self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
       self.bn1 = nn.BatchNorm2d(64)
       self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
       self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
       self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
       self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
       self.linear = nn.Linear(512*block.expansion, num_classes)

   def _make_layer(self, block, planes, num_blocks, stride):
       strides = [stride] + [1]*(num_blocks-1)
       layers = []
       for stride in strides:
           layers.append(block(self.in_planes, planes, stride))
           self.in_planes = planes * block.expansion
       return nn.Sequential(*layers)

See the complete code example.
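Putting the pieces together, here is a compact, self-contained sketch in the same style (based on the same design, with the forward passes the excerpts above omit), plus a ResNet18 factory that stacks two BasicBlocks per stage:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()  # identity by default
        if stride != 1 or in_planes != self.expansion * planes:
            # 1x1 projection when spatial size or channel count changes
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes, 1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # the identity shortcut connection
        return F.relu(out)

class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super().__init__()
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, 3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for s in strides:
            layers.append(block(self.in_planes, planes, s))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer4(self.layer3(self.layer2(self.layer1(out))))
        out = F.avg_pool2d(out, 4)       # 32x32 input -> 4x4 feature map
        out = out.view(out.size(0), -1)  # flatten to (batch, 512)
        return self.linear(out)

def ResNet18():
    # Two BasicBlocks per stage: 2*(2+2+2+2) conv layers + input conv + fc = 18
    return ResNet(BasicBlock, [2, 2, 2, 2])

logits = ResNet18()(torch.randn(2, 3, 32, 32))  # a CIFAR-sized dummy batch
print(logits.shape)  # torch.Size([2, 10])
```

Note that this skeleton assumes CIFAR-style 32x32 inputs; ImageNet-style ResNets use a larger stem (7x7 convolution plus max-pooling).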

Using ResNet50 with Transfer Learning

A simple way to perform transfer learning with PyTorch’s pre-trained ResNets is to switch the last layer of the network with one that suits your requirements. Here is how to do this, with code examples by Prakash Jain.

The process is to freeze the ResNet layers you don’t want to train and pass the remaining parameters to your custom optimizer.

Loading the model:

model_conv = torchvision.models.resnet50(pretrained=True)

Replace the final fully connected layer:

num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, n_class)  # n_class: number of classes in your dataset


The model_conv object has child containers, each with its own children representing the layers. Here is how to list them for ResNet50:

for name, child in model_conv.named_children():
    for name2, params in child.named_parameters():
        print(name, name2)


This displays a long list of parameters, including these:

conv1 weight
bn1 weight
bn1 bias
fc weight
fc bias


Here is how to freeze one or more layers. For example, to freeze every child module before the seventh:

ct = 0
for name, child in model_conv.named_children():
    ct += 1
    if ct < 7:
        for name2, params in child.named_parameters():
            params.requires_grad = False


When deciding how many layers to freeze, consider how the underlying layers are represented and which features you want to retain.

Scaling ResNet on PyTorch

In this article, we explained the basics of ResNet and provided two ways to run ResNet on PyTorch: using the pre-trained models in torchvision.models, or coding ResNet components yourself directly in PyTorch. We also reviewed a simple application of transfer learning with ResNet-50.

Training ResNet is extremely computationally intensive and becomes more difficult the more layers you add. Instead of waiting for hours or days for training to complete, use the MissingLink deep learning framework to automate the process:

  • Define a group of machines either on-premises or in the cloud, and automatically run deep learning jobs on them
  • Scale out ResNet automatically across numerous machines
  • Avoid idle time by scheduling a queue of experiments and utilizing machines to the max

Train Deep Learning Models 20X Faster

Let us show you how you can:

  • Run experiments across hundreds of machines
  • Easily collaborate with your team on experiments
  • Reproduce experiments with one click
  • Save time and immediately understand what works and what doesn’t

MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.

Request your personal demo to start training models faster
