Building, Training and Scaling Residual Networks on TensorFlow
ResNet won first place in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2015, and was a landmark architecture for overcoming the "vanishing gradient" problem. TensorFlow makes it easy to build ResNet models: you can run a pre-trained ResNet-50 model, or build your own custom ResNet implementation. We'll show you how to do this with the ImageNet and CIFAR-10 datasets, and how to automate the process through MissingLink's deep learning platform.
Residual Network (ResNet) is a Convolutional Neural Network (CNN) architecture designed to train very deep neural networks. In theory, a deeper network should perform better on the training set, because its additional layers can process smaller and smaller features. In practice, the training error increases once a plain network becomes too deep. With ResNet, the training error keeps decreasing as the network gets deeper.
ResNet provides a breakthrough solution to the “vanishing gradient” problem. Vanishing gradient is a difficulty encountered when you train artificial neural networks with gradient-based methods like backpropagation. With these methods, the gradients of the loss function approach zero as you add more layers to the network. This makes it hard to learn and tune the parameters of the earlier layers in the network.
ResNet is based on "shortcut connections": connections that let the signal skip one or more layers, forming a residual block. Instead of learning a full mapping, each block only has to learn the residual relative to its input.
Residual blocks allow you to train much deeper neural networks. A ResNet is structured by stacking many of these blocks together to form a deep network.
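To make the idea concrete, here is a minimal sketch of a basic residual block using TensorFlow's Keras API. This illustrates the pattern only; the filter sizes and the helper name residual_block are our own, not part of the official implementation.

import tensorflow as tf

def residual_block(x, filters, stride=1):
    """A basic residual block: two 3x3 convolutions plus a shortcut connection."""
    shortcut = x

    # Main path: conv -> batch norm -> ReLU -> conv -> batch norm
    y = tf.keras.layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)

    # If the block changes the spatial size or channel count, project the
    # shortcut with a 1x1 convolution so the two branches can be added.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, strides=stride, use_bias=False)(x)
        shortcut = tf.keras.layers.BatchNormalization()(shortcut)

    # The shortcut addition is what lets gradients flow past the convolutions.
    return tf.keras.layers.ReLU()(tf.keras.layers.Add()([y, shortcut]))

# Example: stack a few blocks on a CIFAR-10-sized input
inputs = tf.keras.Input(shape=(32, 32, 3))
x = residual_block(inputs, filters=16)
x = residual_block(x, filters=32, stride=2)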
ResNet can have between dozens and thousands of convolutional layers, and can take a long time to train and execute, from hours to several weeks in extreme cases. You will need to distribute a ResNet model across multiple GPUs, and if performance is insufficient, scale out to multiple machines.
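For the single-machine, multi-GPU case, TensorFlow's tf.distribute.MirroredStrategy handles the replication for you. Here is a minimal sketch; the model, loss, and dataset are placeholder assumptions, not part of any official training script.

import tensorflow as tf

# Replicate the model across all local GPUs; gradients are aggregated
# automatically at each training step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder model: an untrained ResNet-50 with 10 output classes
    model = tf.keras.applications.ResNet50(weights=None, classes=10)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(train_dataset, epochs=...)  # supply your own tf.data pipeline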
However, you’ll find that running a deep learning model on multiple machines is difficult:
MissingLink solves all that. It’s a deep learning platform that lets you scale out ResNet and other computer vision models automatically across numerous machines.
Just set up jobs in the MissingLink dashboard, define your cluster of on-premise or cloud machines, and the jobs will automatically run on your cluster of machines. You can train a ResNet model in minutes – not hours or days.
To avoid idle time, MissingLink immediately runs another experiment when the previous one ends, and cleanly shuts down cloud machines when all jobs complete.
Learn more about the MissingLink platform.
The TensorFlow official models are a collection of example models that use TensorFlow's high-level APIs. The official TensorFlow ResNet model contains an implementation of ResNet for the ImageNet and CIFAR-10 datasets, written in TensorFlow.
You can download pre-trained versions of ResNet-50; four versions are available, with different precision/accuracy trade-offs. You can use transfer learning to speed up training: the pre-trained model serves as the starting point for training on a new task. In addition, you can freeze all of the layers except the final fully connected layers when fine-tuning your model.
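As an example, here is a minimal transfer-learning sketch using tf.keras: load ResNet-50 with ImageNet weights, freeze the convolutional base, and train only a new fully connected head. The input size and class count below are illustrative assumptions.

import tensorflow as tf

# Load ResNet-50 pre-trained on ImageNet, without its final classifier
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze all pre-trained layers

# Add a new head for the target task (10 classes is an assumption)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])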
While transfer learning is a powerful technique, you'll find it valuable to learn how to train ResNet from scratch and become familiar with the full training process: launching TensorFlow, downloading and preparing ImageNet, and documenting and reporting training results.
To illustrate the process, here is Exxact’s TensorFlow code on how to train a ResNet model from scratch in TensorFlow:
1. Launch TensorFlow environment with Docker
nvidia-docker run -it -v /data:/datasets tensorflow/tensorflow:nightly-gpu bash
2. Download ImageNet
2.1 Clone the TPU repository
git clone https://github.com/tensorflow/tpu.git
2.2 Install the GCS dependencies
pip install gcloud google-cloud-storage
2.3 Download the files from Image-Net.org (the imagenet_to_gcs.py script used below ships in the tpu repository cloned in step 2.1)
python imagenet_to_gcs.py --local_scratch_dir=/data/imagenet --nogcs_upload
3. Download Official TensorFlow models
git clone https://github.com/tensorflow/models.git
4. Export PYTHONPATH
Export PYTHONPATH to the models folder on your machine. Be sure to replace /datasets/models with your own folder path.
export PYTHONPATH="$PYTHONPATH:/datasets/models"
5. Install Dependencies
pip install --user -r official/requirements.txt
6. Run the training script imagenet_main.py
python imagenet_main.py --data_dir=/data/imagenet/train --num_gpus=2 --batch_size=64 --resnet_size=50 --model_dir=/data/imagenet/trained_model/Resnet50_bs64 --train_epochs=120
In this article, we covered the basics of ResNet and saw two ways to run ResNet on TensorFlow: fine-tuning a pre-trained ResNet-50 model, and training ResNet from scratch on ImageNet.
Training ResNet is extremely computationally intensive, especially when working with a large number of layers. Don't wait hours or days for ResNet to train. Use the MissingLink deep learning platform to scale training out across multiple machines automatically and keep those machines busy between experiments.
MissingLink can also help you manage large numbers of experiments, track and share results, and manage large datasets and sync them easily to training machines.
Learn more about the MissingLink deep learning platform.
MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.
Request your personal demo to start training models faster