AWS Deep Learning Solutions
AWS provides several tools that you can use for Deep Learning (DL), including AMIs and containers. We review these tools and help you make an informed decision about how to build your deep learning infrastructure on the cloud.
In this article:
- Cloud machine learning
- Deep learning on AWS
- AWS Deep Learning AMIs
- AWS containers for deep learning
- Scaling up AWS deep learning with MissingLink
Cloud machine learning
Public cloud providers offer resources to help you manage Machine Learning (ML) projects. A common solution is Machine Learning as a Service (MLaaS), including automated and semi-automated cloud platforms that provide infrastructure for tasks like data pre-processing, model evaluation, and model training. MLaaS providers typically offer additional tools and services, such as data visualization, face recognition, predictive analytics, and deep learning. You can use ML for predictive marketing, advanced analytics for customer data, fraud detection, and back-end security threat detection.
Deep learning on AWS
AWS offers several Graphics Processing Unit (GPU) instance types, with between 8 GB and 256 GB of memory, priced at an hourly rate. GPUs are specialized processors designed for complex image processing, but they are also commonly used to accelerate deep learning computations.
There are significant benefits to deep learning in the cloud, including cost, speed, scalability, and flexibility. You can scale your deep learning model more efficiently on the cloud by dynamically spinning up machine instances and using powerful GPUs. You can design, develop, and train your deep learning applications faster using distributed networks.
AWS Deep Learning AMIs
What is an AMI?
An Amazon Machine Image (AMI) is a read-only filesystem image used to create virtual machines in the Amazon Elastic Compute Cloud (EC2). The AMI provides the information required to launch EC2 instances, which you can use to train complex custom models or to experiment with new algorithms. It includes a root volume template and a block device mapping that specifies which volumes to attach to the instance at launch.
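As a concrete sketch, launching a GPU instance from a Deep Learning AMI might look like the AWS CLI call below. The AMI ID, key pair name, and volume size are hypothetical placeholders (look up real values in the EC2 console or AWS Marketplace), and the command is wrapped in a dry-run helper so it only prints unless you opt in:

```shell
# Dry-run helper: prints the command instead of running it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

AMI_ID="ami-0123456789abcdef0"   # placeholder Deep Learning AMI ID

# Launch one GPU instance; the block device mapping attaches a 100 GiB
# root volume based on the AMI's root volume template.
LAUNCH_OUT=$(run aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type p3.2xlarge \
  --count 1 \
  --key-name my-key-pair \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}}]')
echo "$LAUNCH_OUT"
```

With `DRY_RUN=0` and configured AWS credentials, the same script issues the real `run-instances` call.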
Deep learning AMIs work with Linux and come with some of the latest GPU acceleration libraries, like CUDA and cuDNN.
A major benefit of AWS Deep Learning AMIs is their support for deep learning frameworks. The AMIs come pre-installed with Apache MXNet and Gluon, Caffe, Caffe2, Keras, Microsoft Cognitive Toolkit, PyTorch, TensorFlow, Theano, and Torch, so you can quickly launch an instance and start training models at scale.
You can choose from several AWS deep learning AMIs, or you can create your own AMI and share it. Some of the popular AMI types include:
- Conda AMI—lets you switch easily between deep learning environments. The frameworks come pre-installed as Conda-managed pip packages, so you can quickly install, run, and update them along with their dependencies.
- Base AMI—provides a clean slate for setting up a private deep learning engine repository or a custom build of a deep learning engine. It supplies a foundation of GPU drivers and acceleration libraries for deploying customized deep learning environments. The AMI is configured with NVIDIA CUDA 9 by default, but you can switch to a CUDA 8 or CUDA 10 environment.
- AMI with source code—provides pre-installed deep learning frameworks along with their source code. It runs in a shared Python environment and is available for P3 instances in CUDA 9 or P2 instances in CUDA 8.
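The environment switching on a Conda AMI can be sketched as the shell session below. The environment names (`tensorflow_p36`, `pytorch_p36`) follow the Conda AMI naming convention but vary by AMI release (run `conda env list` on the instance to see yours), and the dry-run wrapper keeps the sketch from executing anything:

```shell
# Dry-run helper: prints each command instead of running it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

SWITCH_OUT=$(
  run source activate tensorflow_p36   # work in the TensorFlow environment
  run python -c "import tensorflow"    # sanity-check the pre-installed framework
  run source deactivate
  run source activate pytorch_p36      # switch to PyTorch, no reinstall needed
)
echo "$SWITCH_OUT"
```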
AWS containers for deep learning
Setting up deep learning projects in the cloud can be time-consuming, and testing container images requires special expertise. Builds are further complicated by the need to handle software dependencies and version compatibility issues.
AWS launched a library of preconfigured Docker images called AWS deep learning containers, which help increase performance while reducing training time. The deep learning frameworks preinstalled in the containers allow you to deploy your ML environment quickly without having to build and optimize it from scratch.
AWS containers help you set up and customize machine learning environments in the cloud, and you can use them to train on single nodes or multi-node GPU clusters. You can use AWS deep learning containers for training or for inference, which involves applying a learned capability to new data.
AWS deep learning containers support machine learning frameworks like Apache MXNet and Google’s TensorFlow, and you can also use them for the Horovod distributed training framework. The containers work on all AWS services and offer the flexibility to build custom workflows to train, validate, and deploy machine learning projects.
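A minimal sketch of pulling and running one of these containers with Docker is shown below. The image URI is a placeholder (real URIs are listed per framework, version, and region in AWS's deep learning containers documentation, and pulling them requires logging in to Amazon ECR first), and `train.py` is a hypothetical training script:

```shell
# Dry-run helper: prints each command instead of running it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

IMAGE_URI="<registry>/tensorflow-training:latest"   # placeholder image URI

CONTAINER_OUT=$(
  run docker pull "$IMAGE_URI"
  # --gpus all exposes the host GPUs to the container (requires the
  # NVIDIA container toolkit on the host).
  run docker run --gpus all "$IMAGE_URI" python train.py
)
echo "$CONTAINER_OUT"
```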
Scaling up AWS deep learning with MissingLink
MissingLink can help you manage your deep learning projects on AWS. With MissingLink, you can easily set up or scale up a project, load datasets to the cloud, and run experiments on a cluster of AWS machines. The MissingLink dashboard lets you monitor running experiments and analyze their metrics.
With MissingLink you can:
- Schedule and automate experiments and utilize multi-GPU machines; you only need to set up the environment for running experiments at scale once.
- Keep track of large-scale experiments, share results across your team, and adjust resource usage to your needs to avoid unnecessary expenses.
- Run multiple experiments simultaneously on multi-GPU machines to increase productivity and avoid repetitive, time-consuming work.