
Does TensorFlow Support OpenCL?

OpenCL is an open standard that is designed to utilize the computing power provided by GPUs for general computing applications.


While both AMD and NVIDIA are major vendors of GPUs, NVIDIA is currently the most common GPU vendor for deep learning and cloud computing. NVIDIA’s CUDA toolkit works with all major deep learning frameworks, including TensorFlow, and has strong community support.


TensorFlow has limited support for OpenCL and AMD GPUs. You can build TensorFlow with SYCL (single-source OpenCL) support, but performance might not match what you get with NVIDIA GPUs.


In this article, you will learn:

  • Picking a GPU for Deep Learning: CUDA vs. OpenCL
  • CUDA vs. OpenCL for Deep Learning
  • Does TensorFlow Support OpenCL?
  • Set Up and Run the TensorFlow OpenCL using SYCL

Picking a GPU for Deep Learning: CUDA vs. OpenCL

What Is CUDA?

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications. GPU-accelerated CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning and graph analytics.


What Is OpenCL?

OpenCL, or Open Computing Language, is a framework designed for building applications that you can run across diverse computer systems. It is an open standard for developing cross-platform, parallel programming applications and has a number of open-source implementations.


OpenCL is designed for developers. Developers can use OpenCL to create applications that can be run on any device, regardless of manufacturer, processor specifications, graphics unit, or other hardware components.


A developer can, for example, build an application on a Windows PC and run the same application on an Android phone, a Mac OS X computer, or any other parallel processing device, provided that all of these devices support OpenCL and that the appropriate compiler and runtime library have been implemented.


CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty.


CUDA vs. OpenCL for Deep Learning

An NVIDIA GPU is the hardware that enables parallel computations, while CUDA is a software layer that provides an API for developers. The CUDA toolkit works with all major DL frameworks, such as TensorFlow, PyTorch, Caffe, and CNTK. If you use NVIDIA GPUs, you will find that support is widely available. If you program CUDA yourself, you will have access to support and advice if things go wrong. You will also find that most deep learning libraries have the best support for NVIDIA GPUs.


OpenCL runs on AMD GPUs, but TensorFlow and PyTorch support it only partially. If you want to develop new networks, some functionality might be missing, which could prevent you from implementing the features you need.


Tensor Cores are specialized processing units introduced in NVIDIA’s Volta architecture. They provide superior compute performance for neural networks, including convolutional networks. However, their compute performance is not as high when it comes to word-level recurrent networks.

Does TensorFlow Support OpenCL?

To add OpenCL support to TensorFlow, you need to set up an OpenCL version of TensorFlow using ComputeCpp. Codeplay has begun adding OpenCL support to TensorFlow through SYCL. TensorFlow is built on top of the Eigen C++ library for linear algebra. Because Eigen uses C++ extensively, Codeplay has used SYCL (which enables Eigen-style C++ metaprogramming) to offload parts of Eigen to OpenCL devices.


Some of the GPU acceleration in TensorFlow could use OpenCL C libraries directly, such as for the BLAS components or convolutions. SYCL is used only for the C++ tensor operations, where it enables complex programmability of those operations.

Quick Tutorial #1: Set Up and Run the TensorFlow OpenCL using SYCL

This tutorial will explain how to set up your machine to run the OpenCL version of TensorFlow using ComputeCpp, a SYCL implementation. The guide is based on the code from Codeplay.


1. Install the AMDGPU open source unified graphics driver for Linux


wget --referer
tar xf amdgpu-pro-17.50-511655.tar.xz
./amdgpu-pro-17.50-511655/amdgpu-pro-install --opencl=legacy --headless


2. Install the Intel NEO OpenCL GPU driver


sudo dpkg -i intel-opencl_18.38.11535_amd64.deb


3. Verify OpenCL installation


sudo apt-get update
sudo apt-get install clinfo
clinfo


The output should list at least one platform and one device. The “Extensions” field of the device properties should include cl_khr_spir and/or cl_khr_il_program.
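The extension check described above can be scripted. The following is a minimal sketch, not part of the Codeplay guide: `check_sycl_extensions` is a hypothetical helper name, and it assumes that the `clinfo` output is piped to it as plain text.

```shell
# Hypothetical helper (not from the original guide): reads clinfo-style text
# on stdin and reports whether one of the required extensions is listed.
check_sycl_extensions() {
    # -w matches whole words only, so longer extension names that merely
    # start with cl_khr_spir are not counted as a match.
    if grep -Ewq 'cl_khr_spir|cl_khr_il_program'; then
        echo "ok: cl_khr_spir or cl_khr_il_program is listed"
    else
        echo "missing: neither cl_khr_spir nor cl_khr_il_program is listed"
        return 1
    fi
}

# Usage on a machine where clinfo is installed:
#   clinfo | check_sycl_extensions
```

If the helper reports that both extensions are missing, the installed driver is probably not suitable for the SYCL build described in the following steps.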


4. Install the dependency packages needed to build TensorFlow with SYCL


sudo apt-get update
sudo apt-get install git cmake gcc build-essential libpython-all-dev opencl-headers openjdk-8-jdk python python-dev python-pip zlib1g-dev
pip install --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6


5. Install ComputeCpp

  • Register for an account on Codeplay’s developer website
  • Download the following version: Ubuntu 16.04 > 64bit > computecpp-ce-1.1.1-ubuntu.16.04-64bit.tar.gz


tar -xf ComputeCpp-CE-1.1.1-Ubuntu.16.04-64bit.tar.gz
sudo mv ComputeCpp-CE-1.1.1-Ubuntu-16.04-x86_64 /usr/local/computecpp
export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
export LD_LIBRARY_PATH+=:/usr/local/computecpp/lib
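Before moving on, it can help to confirm that the toolkit path really points at an unpacked ComputeCpp release. The sketch below uses a hypothetical helper, `check_computecpp_toolkit`, and assumes the toolkit directory contains `bin/compute++` (the device compiler) and a `lib` directory. That layout is an assumption here, so adjust the entries if your release differs.

```shell
# Hypothetical helper (not from the original guide): checks that a directory
# looks like an unpacked ComputeCpp toolkit.
check_computecpp_toolkit() {
    # Use the first argument, or fall back to COMPUTECPP_TOOLKIT_PATH.
    local toolkit_dir="${1:-${COMPUTECPP_TOOLKIT_PATH:-}}"
    local entry
    if [ -z "$toolkit_dir" ]; then
        echo "COMPUTECPP_TOOLKIT_PATH is not set"
        return 1
    fi
    # Assumed layout: bin/compute++ and lib/ inside the toolkit directory.
    for entry in bin/compute++ lib; do
        if [ ! -e "$toolkit_dir/$entry" ]; then
            echo "missing: $toolkit_dir/$entry"
            return 1
        fi
    done
    echo "ok: $toolkit_dir"
}

# Usage after the export lines above:
#   check_computecpp_toolkit
```

Note that the two `export` lines only affect the current shell session. Adding them to a shell startup file such as `~/.bashrc` keeps them available in new terminals.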


6. Install Bazel


# Assumes bazel_0.16.0-linux-x86_64.deb is already in the current directory
sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version
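TensorFlow builds tend to be sensitive to the Bazel version, so it is worth checking that the installed version is the one this tutorial uses. The following is a minimal sketch with a hypothetical helper, `bazel_label_matches`, which assumes that `bazel version` prints a line starting with `Build label:`.

```shell
# Hypothetical helper (not from the original guide): compares the
# "Build label" line from `bazel version` with an expected version.
# $1 = expected version, stdin = output of `bazel version`.
bazel_label_matches() {
    local expected="$1"
    local label
    # Keep only the value after "Build label:" from the first matching line.
    label=$(sed -n 's/^Build label: *//p' | head -n 1)
    if [ "$label" = "$expected" ]; then
        echo "ok: Bazel $label"
    else
        echo "unexpected Bazel version: '$label' (wanted $expected)"
        return 1
    fi
}

# Usage:
#   bazel version | bazel_label_matches 0.16.0
```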


7. Build TensorFlow


git clone
cd tensorflow


8. Bundle and install the wheel


bazel-bin/tensorflow/tools/pip_package/build_pip_package <path/to/output/folder>
pip install --user <path/to/output/folder>/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
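The exact wheel file name depends on the TensorFlow version, the Python version, and the platform, so the name shown above may not match your build. As a sketch, a small hypothetical helper, `find_tensorflow_wheel`, can locate the wheel in the output folder instead of hard-coding the name.

```shell
# Hypothetical helper (not from the original guide): prints the path of the
# first tensorflow-*.whl file found in the given output folder.
find_tensorflow_wheel() {
    local out_dir="$1"
    local wheel
    for wheel in "$out_dir"/tensorflow-*.whl; do
        # If nothing matches, the pattern stays unexpanded and fails this test.
        if [ -f "$wheel" ]; then
            echo "$wheel"
            return 0
        fi
    done
    echo "no tensorflow wheel found in $out_dir" >&2
    return 1
}

# Usage:
#   pip install --user "$(find_tensorflow_wheel <path/to/output/folder>)"
```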


9. Run a TensorFlow Benchmark


To verify the installation, you can run some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:


git clone
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/ --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC


Improving GPU Efficiency for Deep Learning with MissingLink

When working with OpenCL GPUs, you need to manage experiments effectively to avoid wasting expensive GPU resources. TensorFlow does not provide a way to schedule experiments, and it makes it difficult to track experiment results across multiple machines and share them with the team.



  • Resource management: Schedule multiple experiments and run them automatically to utilize GPU machines to the max.
  • Experiment management: Record and remember what happened with every TensorFlow GPU experiment.
  • Data management: Understand which dataset was used on multi-GPU experiments. Easily debug models.

Start using MissingLink’s platform to manage your deep learning experiments and create a more efficient multi-GPU cluster setup with little to no idle time.
