Does TensorFlow Support OpenCL?
OpenCL is an open standard that is designed to utilize the computing power provided by GPUs for general computing applications.
While both AMD and NVIDIA are major GPU vendors, NVIDIA is currently the most common GPU vendor for deep learning and cloud computing. NVIDIA’s CUDA toolkit works with all major deep learning frameworks, including TensorFlow, and has large community support.
TensorFlow has limited support for OpenCL and AMD GPUs. You can build TensorFlow with SYCL (single-source OpenCL) support, but performance might not match that of NVIDIA GPUs.
In this article, you will learn:
- Picking a GPU for Deep Learning: CUDA vs. OpenCL
- CUDA vs. OpenCL for Deep Learning
- Does TensorFlow Support OpenCL?
- Set Up and Run TensorFlow with OpenCL Using SYCL
What Is CUDA?
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications. GPU-accelerated CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning and graph analytics.
What Is OpenCL?
OpenCL, or Open Computing Language, is a framework designed for building applications that you can run across diverse computer systems. It is an open standard for developing cross-platform, parallel programming applications and has a number of open-source implementations.
OpenCL is designed for developers. Developers can use OpenCL to create applications that can be run on any device, regardless of manufacturer, processor specifications, graphics unit, or other hardware components.
A developer can, for example, build an application on their Windows PC, and the application will run equally well on an Android phone, a Mac OS X computer, or any other parallel processing device — provided, of course, that all of these devices support OpenCL and that the appropriate compiler and runtime library have been implemented.
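To see this portability in practice, the sketch below enumerates the OpenCL platforms and devices visible on a machine using pyopencl, a third-party Python binding (the binding and function names are pyopencl's, but treat this as an illustrative sketch; it degrades gracefully when no OpenCL runtime or binding is installed).

```python
# Sketch: enumerate OpenCL platforms and devices via pyopencl.
# Falls back to an empty list if pyopencl or an OpenCL runtime
# is not available on this machine.
try:
    import pyopencl as cl
except ImportError:
    cl = None


def list_opencl_devices():
    """Return (platform name, device name) pairs, or [] if no
    OpenCL runtime/binding is available."""
    if cl is None:
        return []
    try:
        return [(p.name, d.name)
                for p in cl.get_platforms()
                for d in p.get_devices()]
    except Exception:
        # No platforms installed, or the ICD loader failed.
        return []


if __name__ == "__main__":
    for platform, device in list_opencl_devices():
        print(platform, "->", device)
```

The same script runs unchanged on any machine with an OpenCL driver, whatever the vendor — that is the portability the standard promises.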
CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty.
An NVIDIA GPU is the hardware that enables parallel computations, while CUDA is a software layer that provides an API for developers. The CUDA toolkit works with all major DL frameworks such as TensorFlow, PyTorch, Caffe, and CNTK. If you use NVIDIA GPUs, you will find support is widely available. If you program CUDA yourself, you will have access to support and advice if things go wrong. You will also find that most deep learning libraries have the best support for NVIDIA GPUs.
OpenCL runs on AMD GPUs and provides partial support for TensorFlow and PyTorch. If you want to develop new networks some details might be missing, which could prevent you from implementing the features you need.
Tensor Cores are specialized processing units introduced in NVIDIA’s Volta architecture. Tensor Cores provide superior compute performance for neural network workloads, particularly convolutional networks; however, their performance advantage is smaller for word-level recurrent networks.
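For intuition: each Volta Tensor Core performs a fused matrix multiply-accumulate, D = A×B + C, on small 4×4 matrix tiles (FP16 inputs with FP32 accumulation in hardware). The plain-Python emulation below shows the arithmetic of that single operation — matmul-heavy workloads like convolutions decompose into many such tiles, which is why they benefit most.

```python
def tensor_core_fma(a, b, c):
    """Emulate one Volta Tensor Core operation D = A x B + C on
    4x4 matrices (hardware does this fused, with FP16 inputs and
    FP32 accumulation; this is just the math, not the speed)."""
    n = 4
    return [[sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j]
             for j in range(n)]
            for i in range(n)]
```

A convolution lowered to matrix multiplication issues thousands of these tile operations, while recurrent word-level models tend to be dominated by smaller, less regular matrix shapes that map less efficiently onto the tiles.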
To get OpenCL support in TensorFlow, you will have to set up an OpenCL build of TensorFlow using ComputeCpp. Codeplay has begun the process of adding OpenCL support to TensorFlow, which is achieved using SYCL. TensorFlow is built on top of the Eigen C++ library for linear algebra. Because Eigen uses C++ extensively, Codeplay has used SYCL (which enables Eigen-style C++ metaprogramming) to offload parts of Eigen to OpenCL devices.
Some of the GPU acceleration of TensorFlow could use OpenCL C libraries directly, such as for the BLAS components, or convolutions. SYCL is being used for the C++ tensor operations only which enables complex programmability of those tensor operations.
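The "Eigen-style C++ metaprogramming" mentioned above refers to expression templates: operations on tensors build a lazy expression tree instead of computing immediately, so a whole chain of elementwise operations can later be evaluated (and, on a device backend, fused into a single kernel) in one pass. The toy Python class below illustrates the idea only — it is not Eigen's or SYCL's actual API.

```python
class Expr:
    """Toy lazy expression node, mimicking the expression-template
    style Eigen uses in C++ (illustration only, not Eigen's API)."""

    def __init__(self, op, left, right=None):
        self.op, self.left, self.right = op, left, right

    def __add__(self, other):
        return Expr('+', self, other)   # build a node, do no math yet

    def __mul__(self, other):
        return Expr('*', self, other)   # build a node, do no math yet

    def evaluate(self):
        # A device backend would walk this tree once and emit a single
        # fused kernel; here we simply evaluate elementwise in Python.
        if self.op == 'leaf':
            return self.left
        l = self.left.evaluate()
        r = self.right.evaluate()
        if self.op == '+':
            return [x + y for x, y in zip(l, r)]
        return [x * y for x, y in zip(l, r)]


def tensor(values):
    return Expr('leaf', values)


# "a + b * c" only builds a tree; arithmetic happens at evaluate().
a, b, c = tensor([1, 2]), tensor([3, 4]), tensor([5, 6])
result = (a + b * c).evaluate()  # [1 + 3*5, 2 + 4*6] = [16, 26]
```

Because SYCL lets this same template machinery compile for OpenCL devices, Codeplay could reuse Eigen's tensor expressions rather than hand-writing an OpenCL kernel per operation.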
This tutorial will explain how to set up your machine to run the OpenCL version of TensorFlow using ComputeCpp, a SYCL implementation. The guide is based on the code from Codeplay.
1. Install AMDGPU open source unified graphics driver for Linux
wget --referer http://support.amd.com/ https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
tar xf amdgpu-pro-17.50-511655.tar.xz
./amdgpu-pro-17.50-511655/amdgpu-pro-install --opencl=legacy --headless
2. Install the Intel NEO OpenCL GPU driver
wget https://github.com/intel/compute-runtime/releases/download/18.38.11535/intel-opencl_18.38.11535_amd64.deb
sudo dpkg -i intel-opencl_18.38.11535_amd64.deb
3. Verify OpenCL installation
sudo apt-get update
sudo apt-get install clinfo
clinfo
The output should list at least one platform and one device. The “Extensions” field of the device properties should include cl_khr_spir and/or cl_khr_il_program.
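If you want to automate this check, the small sketch below scans a clinfo-style “Extensions” line for the required SPIR/SPIR-V support. The sample string is illustrative, not real clinfo output from a specific device.

```python
# Extensions that indicate the device can consume SPIR / SPIR-V,
# which the ComputeCpp SYCL implementation relies on.
REQUIRED = ("cl_khr_spir", "cl_khr_il_program")


def supports_spir(extensions_line):
    """Return True if a clinfo-style Extensions field advertises
    cl_khr_spir and/or cl_khr_il_program."""
    exts = extensions_line.split()
    return any(ext in exts for ext in REQUIRED)


# Illustrative sample of an "Extensions" field from clinfo output:
sample = "cl_khr_global_int32_base_atomics cl_khr_spir cl_khr_fp64"
print(supports_spir(sample))  # True
```

If neither extension appears for any device, the SYCL build of TensorFlow will not be able to target that device.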
4. Build TensorFlow with SYCL
Install dependency packages:
sudo apt-get update
sudo apt-get install git cmake gcc build-essential libpython-all-dev opencl-headers openjdk-8-jdk python python-dev python-pip zlib1g-dev
pip install --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6
5. Install ComputeCpp
- Register for an account on Codeplay’s developer website
- Download the following version: Ubuntu 16.04 > 64bit > computecpp-ce-1.1.1-ubuntu.16.04-64bit.tar.gz
tar -xf ComputeCpp-CE-1.1.1-Ubuntu.16.04-64bit.tar.gz
sudo mv ComputeCpp-CE-1.1.1-Ubuntu-16.04-x86_64 /usr/local/computecpp
export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
export LD_LIBRARY_PATH+=:/usr/local/computecpp/lib
/usr/local/computecpp/bin/computecpp_info
6. Install Bazel
wget https://github.com/bazelbuild/bazel/releases/download/0.16.0/bazel_0.16.0-linux-x86_64.deb
sudo dpkg -i bazel_0.16.0-linux-x86_64.deb
bazel version
7. Build TensorFlow
git clone http://github.com/codeplaysoftware/tensorflow
cd tensorflow
./configure
bazel build --config=sycl //tensorflow/tools/pip_package:build_pip_package
8. Bundle and install the wheel
bazel-bin/tensorflow/tools/pip_package/build_pip_package <path/to/output/folder>
pip install --user <path/to/output/folder>/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
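Before running a full benchmark, a quick sanity check is to list the devices this TensorFlow build can see. The sketch below uses `device_lib`, an internal but commonly used TensorFlow module, and degrades gracefully when TensorFlow is not importable in the current environment.

```python
# Sanity check: does this TensorFlow build expose a SYCL device?
try:
    from tensorflow.python.client import device_lib
except ImportError:
    device_lib = None


def available_sycl_devices():
    """Names of SYCL devices visible to TensorFlow, or [] if
    TensorFlow is not installed in this environment."""
    if device_lib is None:
        return []
    return [d.name for d in device_lib.list_local_devices()
            if 'SYCL' in d.name]


if __name__ == "__main__":
    print(available_sycl_devices())
```

A successful SYCL build on a machine with a working OpenCL driver should list at least one device whose name contains `SYCL`.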
9. Run a TensorFlow Benchmark
To verify the installation, you can execute some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:
git clone http://github.com/tensorflow/benchmarks
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC
Improving GPU Efficiency for Deep Learning with MissingLink
When working with OpenCL GPUs, you need to make sure you manage experiments effectively to avoid wasting expensive GPU resources. TensorFlow does not provide the ability to schedule experiments, and it makes it difficult to track experiment results on multiple machines and share those results across the team.