PyTorch ResNet: Building, Training and Scaling Residual Networks on PyTorch
ResNet was the state of the art in computer vision in 2015 and is still hugely popular. It makes it possible to train networks with hundreds or even thousands of layers without running into the “vanishing gradient” problem. PyTorch makes ResNet easy to work with: it provides several pre-trained ResNet architectures and also lets you build your own.
In this article you will learn:
- What is a ResNet neural network
- Why it’s difficult to run ResNet yourself with limited resources
- Using the built-in PyTorch implementations of ResNet
- Building a custom ResNet architecture yourself in PyTorch
- A simple transfer learning example with ResNet-50
What is a ResNet Neural Network
Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture that can support hundreds or more convolutional layers. ResNet can keep adding layers while maintaining strong performance, whereas in earlier architectures each additional layer beyond a certain depth made the network less effective.
ResNet proposed a solution to the “vanishing gradient” problem. Neural networks train via backpropagation, which computes the gradient of the loss function with respect to every weight; gradient descent then uses these gradients to move the weights toward values that minimize the loss. By the chain rule, the gradient that reaches an early layer is a product of the derivatives of all the layers after it. As more layers are added, this repeated multiplication can make the gradient vanishingly small, so the early layers barely learn and additional layers don’t improve performance, or even reduce it.
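To make the repeated multiplication concrete, consider a toy chain of n scalar layers x_k = σ(w_k x_{k-1}) with a sigmoid activation σ (a simplified model chosen for illustration, not ResNet itself). The chain rule turns the gradient into a product of per-layer derivatives, and because σ′ ≤ 1/4, if every |w_k| ≤ 1 the product shrinks at least geometrically with depth:

```latex
\frac{\partial \mathcal{L}}{\partial x_0}
  = \frac{\partial \mathcal{L}}{\partial x_n} \prod_{k=1}^{n} \frac{\partial x_k}{\partial x_{k-1}}
  = \frac{\partial \mathcal{L}}{\partial x_n} \prod_{k=1}^{n} w_k \, \sigma'(w_k x_{k-1}),
\qquad
\left| \prod_{k=1}^{n} w_k \, \sigma'(w_k x_{k-1}) \right| \le \left(\tfrac{1}{4}\right)^{n}
```

With n = 20 layers, the bound is (1/4)^20 = 2^-40 ≈ 9.1 × 10^-13, so almost no gradient signal reaches the first layer.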
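ResNet solves this using “identity shortcut connections” (also called skip connections), which carry a block’s input past one or more layers and add it to the block’s output. Instead of learning a desired mapping H(x) directly, the stacked layers learn the residual F(x) = H(x) − x, and the block outputs F(x) + x. If extra layers aren’t useful, the network can drive F(x) toward zero so that the block behaves like an identity mapping; in effect, a very deep network can act like a shallower one. And because the shortcut adds an identity term to each block’s derivative, the gradient has a direct path back to the early layers and does not vanish even when many blocks are stacked.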
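The effect of the shortcut can be checked numerically. The sketch below is an illustrative toy, not ResNet itself: it stacks 50 fully connected layers (the depth, width, and sigmoid activation are arbitrary assumptions, chosen because sigmoid makes the vanishing gradient easy to see) and compares the gradient norm at the first layer with and without an identity shortcut around each layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the comparison reproducible

DEPTH, WIDTH = 50, 32  # arbitrary toy sizes, not ResNet's


class PlainNet(nn.Module):
    """A deep stack of Linear + sigmoid layers with no shortcuts."""

    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(WIDTH, WIDTH) for _ in range(DEPTH)])

    def forward(self, x):
        for layer in self.layers:
            x = torch.sigmoid(layer(x))
        return x


class ResidualNet(PlainNet):
    """Same layers, but each layer's output is added to its input: x + f(x)."""

    def forward(self, x):
        for layer in self.layers:
            x = x + torch.sigmoid(layer(x))  # identity shortcut
        return x


def first_layer_grad_norm(model):
    """Backpropagate a simple loss and return the gradient norm of the first layer's weights."""
    x = torch.randn(8, WIDTH)
    model(x).sum().backward()
    return model.layers[0].weight.grad.norm().item()


plain = first_layer_grad_norm(PlainNet())
residual = first_layer_grad_norm(ResidualNet())
print(f"plain:    {plain:.3e}")
print(f"residual: {residual:.3e}")
```

Without shortcuts, the first layer’s gradient comes out many orders of magnitude smaller than with them; the exact numbers depend on the seed and PyTorch version.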
Image source: ResearchGate
ResNet was the first network demonstrated to add hundreds or thousands of layers while outperforming shallower networks. Although newer architectures that beat ResNet’s performance have been invented since its introduction in 2015, it is still a very popular choice for computer vision tasks.
A primary strength of ResNet is its ability to generalize well to different datasets and problems.
ResNet Variations You Can Use on PyTorch
Alongside the standard ResNets, PyTorch (through torchvision) supports DenseNet, a related architecture you can use instead of a traditional ResNet. Within each dense block, DenseNet connects every layer directly to every subsequent layer: the input of each layer is the feature maps of all earlier layers in the block, joined using depth-concatenation (concatenation along the channel dimension) rather than ResNet’s addition. Concatenating the feature maps preserves them all and increases the variance of the outputs, encouraging feature reuse.
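To illustrate depth-concatenation, here is a minimal dense block sketch. The class name TinyDenseBlock and its parameters are hypothetical, and the design is deliberately simplified: torchvision’s own DenseNet additionally uses 1×1 bottleneck convolutions and transition layers between blocks. Each layer adds growth_rate new channels, so the block outputs in_channels + num_layers * growth_rate channels.

```python
import torch
import torch.nn as nn


class TinyDenseBlock(nn.Module):
    """Simplified dense block: each layer sees all earlier feature maps, concatenated."""

    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate  # input grows with every layer
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Depth-concatenation: stack all previous feature maps along the channel dim
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


block = TinyDenseBlock(in_channels=16, growth_rate=12, num_layers=4)
out = block(torch.randn(2, 16, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Here 16 + 4 × 12 = 64 output channels; a ResNet block would instead keep the channel count fixed and add its input to its output.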
Why it’s Difficult to Run ResNet Yourself and How MissingLink Can Help
ResNet can have between dozens and thousands of layers and can take a long time to train, up to weeks in some extreme cases. To train ResNet efficiently, you will need to distribute the work across multiple GPUs, and probably scale out to multiple machines.
However, scaling deep learning models across multiple machines is difficult:
- On-premises, you need to buy machines, set them up for deep learning, and manually run experiments on them. Scaling will be naturally limited.
- In the cloud, spinning up machines is easy, but the cost of GPU machines is high. You’ll need to create machine images with deep learning tools, and you’ll still need to manually run experiments on each machine, carefully watching resource utilization.
MissingLink solves all that. It’s a deep learning platform that lets you scale out ResNet and other computer vision models automatically across numerous machines.
MissingLink lets you set up experiments as jobs, define a group of on-premise or cloud machines to run on, and your queue of jobs will run automatically on the machines. It’s that easy to scale up your ResNet model across as many machines as you need.
Get a free MissingLink account and see how easy it is!
Options for Running ResNet on PyTorch
Built-In PyTorch ResNet Implementation: torchvision.models
PyTorch provides torchvision.models, which includes multiple deep learning models, pre-trained on the ImageNet dataset and ready to use.
Pre-training lets you leverage transfer learning – once the model has learned many objects, features, and textures on the huge ImageNet dataset, you can apply this learning to your own images and recognition problems.
torchvision.models includes the following ResNet implementations: ResNet-18, 34, 50, 101 and 152 (the numbers indicate the number of layers in the model), as well as DenseNet-121, 161, 169 and 201.
The following functions construct these models in PyTorch:
```python
torchvision.models.resnet18(pretrained=False, **kwargs)
torchvision.models.resnet34(pretrained=False, **kwargs)
torchvision.models.resnet50(pretrained=False, **kwargs)
torchvision.models.resnet101(pretrained=False, **kwargs)
torchvision.models.resnet152(pretrained=False, **kwargs)
torchvision.models.densenet121(pretrained=False, **kwargs)
torchvision.models.densenet161(pretrained=False, **kwargs)
torchvision.models.densenet169(pretrained=False, **kwargs)
torchvision.models.densenet201(pretrained=False, **kwargs)
```
The pretrained parameter specifies whether the model weights should be randomly initialized (False) or pre-trained on ImageNet (True). Any additional keyword arguments, such as num_classes, are passed on to the model constructor. In torchvision 0.13 and later, pretrained is deprecated in favor of the weights parameter (for example, weights=ResNet50_Weights.DEFAULT).
Coding a ResNet Architecture Yourself Using PyTorch
If you want to create a different ResNet architecture than the ones built into PyTorch, you can create your own custom implementation of ResNet. Liu Kuang created an extensive code example that shows how to implement the building blocks of ResNet in PyTorch.
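For example, here is how to create the basic residual block, which adds an identity shortcut around two 3×3 convolutions (or a 1×1 projection shortcut when the output shape changes):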
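```python
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    expansion = 1  # output channels = planes * expansion

    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        # Identity shortcut by default; a 1x1 projection when the shape changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # residual connection: F(x) + x
        return F.relu(out)
```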
And here is how the entire ResNet network is structured using this and other building blocks:
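```python
import torch.nn as nn
import torch.nn.functional as F


class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        # num_blocks has one entry per stage, e.g. ResNet(BasicBlock, [2, 2, 2, 2]) is ResNet-18
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, num_blocks, stride):
        # Only the first block of a stage downsamples; the rest use stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer4(self.layer3(self.layer2(self.layer1(out))))
        out = F.adaptive_avg_pool2d(out, 1)  # global average pooling, works for any input size
        out = out.view(out.size(0), -1)
        return self.linear(out)
```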
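The ResNet class works with any block that exposes an expansion attribute. Deeper variants such as ResNet-50, 101 and 152 use a bottleneck block instead of BasicBlock: a 1×1 convolution that reduces the channels, a 3×3 convolution, and a 1×1 convolution that expands them by a factor of 4. The following is our own minimal sketch of such a block in the same style, not code copied from Liu Kuang’s repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Bottleneck(nn.Module):
    expansion = 4  # output channels = planes * 4

    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)  # reduce channels
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion * planes, kernel_size=1, bias=False)  # expand
        self.bn3 = nn.BatchNorm2d(self.expansion * planes)

        # Projection shortcut when the spatial size or channel count changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + self.shortcut(x)  # residual connection
        return F.relu(out)


block = Bottleneck(in_planes=64, planes=64, stride=2)
y = block(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 256, 16, 16])
```

Combined with the ResNet class and [3, 4, 6, 3] blocks per stage, this gives the ResNet-50 stage layout, although that class’s 3×3 stem is sized for small CIFAR-style images rather than ImageNet.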
See the complete code example.
Using ResNet50 with Transfer Learning
A simple way to perform transfer learning with PyTorch’s pre-trained ResNets is to switch the last layer of the network with one that suits your requirements. Here is how to do this, with code examples by Prakash Jain.
The process is to freeze the ResNet layers you don’t want to train, and pass only the remaining trainable parameters to your optimizer.
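Load the pre-trained model, then replace its last layer (the fully connected fc layer) with one whose output size matches your number of classes:
```python
import torch.nn as nn
import torchvision

n_class = 10  # set to the number of classes in your dataset

# Load ResNet-50 with ImageNet weights
# (in torchvision 0.13+, weights="IMAGENET1K_V1" is the non-deprecated equivalent)
model_conv = torchvision.models.resnet50(pretrained=True)

# Replace the last layer: keep its input size, change its output size
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, n_class)
```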
The model_conv object has child containers (conv1, bn1, layer1 to layer4, fc and so on), each with its own children which represent the layers. To decide which layers to freeze, you can list the parameters of each child:
```python
for name, child in model_conv.named_children():
    for name2, params in child.named_parameters():
        print(name, name2)
```
This displays a long list of parameters, including these:
```
conv1 weight
bn1 weight
bn1 bias
...
fc weight
fc bias
```
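Here is how to freeze one or more layers. For example, the following loop freezes the first six child modules (conv1, bn1, relu, maxpool, layer1 and layer2) and leaves layer3, layer4 and fc trainable: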
```python
ct = 0
for name, child in model_conv.named_children():
    ct += 1
    if ct < 7:  # children 1 to 6 are frozen
        for name2, params in child.named_parameters():
            params.requires_grad = False
```
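How many layers to freeze depends on your data. Early layers learn generic features such as edges and textures and are usually safe to freeze, while later layers are more specific to ImageNet, so consider retraining more of them when your images differ substantially from ImageNet’s.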
Scaling ResNet on PyTorch
In this article, we saw the basics of ResNet and two ways to run ResNet in PyTorch: pre-trained models in the torchvision.models module, or coding ResNet components yourself directly in PyTorch. We also saw a simple application of transfer learning with ResNet-50.
Training ResNet is extremely computationally intensive and becomes more difficult the more layers you add. Instead of waiting for hours or days for training to complete, use the MissingLink deep learning framework to automate the process:
- Define a group of machines either on-premises or in the cloud, and automatically run deep learning jobs on them
- Scale out ResNet automatically across numerous machines
- Avoid idle time by scheduling a queue of experiments and utilizing machines to the max