Does TensorFlow Support OpenCL?

OpenCL is an open standard designed to harness the computing power of GPUs and other parallel processors for general-purpose computing.


While both AMD and NVIDIA are major GPU vendors, NVIDIA is currently the most common choice for deep learning and cloud computing. NVIDIA’s CUDA toolkit works with all major deep learning frameworks, including TensorFlow, and has broad community support.


TensorFlow has limited support for OpenCL and AMD GPUs. You can build TensorFlow with SYCL (single-source OpenCL) support. We’ll show you how to set this up, but performance might not match what you get with NVIDIA GPUs. We’ll also show you how to get the most out of your GPUs by scheduling and running experiments automatically, using the MissingLink deep learning platform.

Picking a GPU for Deep Learning: CUDA vs. OpenCL

What Is CUDA?

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce. The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications. GPU-accelerated CUDA libraries enable acceleration across multiple domains such as linear algebra, image and video processing, deep learning and graph analytics.


What Is OpenCL?

OpenCL, or Open Computing Language, is a framework designed for building applications that you can run across diverse computer systems. It is an open standard for developing cross-platform, parallel programming applications and has a number of open-source implementations.


OpenCL is designed for developers. Developers can use OpenCL to create applications that can be run on any device, regardless of manufacturer, processor specifications, graphics unit, or other hardware components.


A developer can, for example, build an application on a Windows PC and run the same application on an Android phone, a Mac OS X computer, or any other parallel processing device, provided that each device supports OpenCL and that an appropriate compiler and runtime library have been implemented for it.


CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty.

CUDA vs. OpenCL for Deep Learning

An NVIDIA GPU is the hardware that enables parallel computation, while CUDA is the software layer that provides an API for developers. The CUDA toolkit works with all major deep learning frameworks, such as TensorFlow, PyTorch, Caffe, and CNTK. If you use NVIDIA GPUs, you will find that support is widely available, and if you program CUDA yourself, you will have access to help and advice when things go wrong. Most deep learning libraries also have their best support for NVIDIA GPUs.


OpenCL runs on AMD GPUs, but TensorFlow and PyTorch support it only partially. If you want to develop new networks, some pieces might be missing, which could prevent you from implementing the features you need.


Tensor Cores are specialized processing units introduced in NVIDIA’s Volta architecture. They provide superior compute performance for neural networks, particularly convolutional networks. Their performance is not as high, however, for word-level recurrent networks.

Does TensorFlow Support OpenCL?

To add OpenCL support to TensorFlow, you have to build an OpenCL version of TensorFlow using ComputeCpp. Codeplay has begun adding OpenCL support to TensorFlow by way of SYCL. TensorFlow is built on top of the Eigen C++ library for linear algebra. Because Eigen relies heavily on C++ metaprogramming, Codeplay has used SYCL (which enables Eigen-style C++ metaprogramming) to offload parts of Eigen to OpenCL devices.


Some parts of TensorFlow’s GPU acceleration, such as the BLAS components or convolutions, could use OpenCL C libraries directly. SYCL is used only for the C++ tensor operations, where it enables complex programmability.

Quick Tutorial #1: Set Up and Run TensorFlow with OpenCL Using SYCL

This tutorial will explain how to set up your machine to run the OpenCL version of TensorFlow using ComputeCpp, a SYCL implementation. The guide is based on the code from Codeplay.


1. Install the AMDGPU open source unified graphics driver for Linux (for AMD GPUs)

wget --referer
tar xf amdgpu-pro-17.50-511655.tar.xz
./amdgpu-pro-17.50-511655/amdgpu-pro-install --opencl=legacy --headless


2. Install the Intel NEO OpenCL GPU driver (for Intel GPUs)

sudo dpkg -i intel-opencl_18.38.11535_amd64.deb


3. Verify OpenCL installation

sudo apt-get update
sudo apt-get install clinfo
clinfo

The output of clinfo should list at least one platform and one device. The “Extensions” field of the device properties should include cl_khr_spir and/or cl_khr_il_program.
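The extension check described above can also be scripted. The following is a minimal sketch, assuming clinfo prints extension names as plain whitespace-separated tokens. The function names and the SAMPLE text are made up for illustration and are not part of Codeplay's guide.

```python
import re
import subprocess

# Extensions the tutorial says should appear in the device "Extensions" field.
REQUIRED_ANY = ("cl_khr_spir", "cl_khr_il_program")


def found_il_extensions(clinfo_text):
    """Return the required extensions that occur as whole tokens in the text."""
    tokens = set(re.findall(r"cl_[A-Za-z0-9_]+", clinfo_text))
    return [ext for ext in REQUIRED_ANY if ext in tokens]


def read_clinfo():
    """Run clinfo and return its output as text (requires clinfo on PATH)."""
    return subprocess.check_output(["clinfo"]).decode("utf-8", "replace")


# Hypothetical excerpt for illustration only, not real driver output.
SAMPLE = """
Number of platforms    1
  Device Name          Example GPU
  Device Extensions    cl_khr_fp64 cl_khr_spir cl_khr_icd
"""

print(found_il_extensions(SAMPLE))
```

On a machine with a driver installed, found_il_extensions(read_clinfo()) would perform the same check on the real output. An empty list suggests that the device exposes neither extension.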


4. Install dependency packages for building TensorFlow with SYCL

sudo apt-get update
sudo apt-get install git cmake gcc build-essential libpython-all-dev opencl-headers openjdk-8-jdk python python-dev python-pip zlib1g-dev
pip install --user numpy==1.14.5 wheel==0.31.1 six==1.11.0 mock==2.0.0 enum34==1.1.6


5. Install ComputeCpp

  • Register for an account on Codeplay’s developer website
  • Download the following version: Ubuntu 16.04 > 64bit > computecpp-ce-1.1.1-ubuntu.16.04-64bit.tar.gz
tar -xf ComputeCpp-CE-1.1.1-Ubuntu.16.04-64bit.tar.gz
sudo mv ComputeCpp-CE-1.1.1-Ubuntu-16.04-x86_64 /usr/local/computecpp
export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
export LD_LIBRARY_PATH+=:/usr/local/computecpp/lib
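Because the two export commands only affect the current shell session, it can help to confirm that they took effect before starting a long build. The sketch below checks only the two variables shown above. The helper name check_computecpp_env is hypothetical.

```python
import os


def check_computecpp_env(env):
    """Return a list of problems with the ComputeCpp variables in env."""
    problems = []
    toolkit = env.get("COMPUTECPP_TOOLKIT_PATH", "")
    if not toolkit:
        problems.append("COMPUTECPP_TOOLKIT_PATH is not set")
        return problems
    lib_dir = os.path.normpath(os.path.join(toolkit, "lib"))
    # LD_LIBRARY_PATH is colon-separated; skip empty entries such as the
    # leading one produced by "+=:" when the variable was previously unset.
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in [os.path.normpath(p) for p in entries]:
        problems.append("LD_LIBRARY_PATH does not contain " + lib_dir)
    return problems


# Example with the values used in this tutorial; an empty list means no problems.
example_env = {
    "COMPUTECPP_TOOLKIT_PATH": "/usr/local/computecpp",
    "LD_LIBRARY_PATH": ":/usr/local/computecpp/lib",
}
print(check_computecpp_env(example_env))
```

Passing os.environ instead of example_env checks the shell the script was started from.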


6. Install Bazel

sudo apt install -y ./bazel_0.16.0-linux-x86_64.deb
bazel version


7. Build TensorFlow

git clone
cd tensorflow


8. Bundle and install the wheel

bazel-bin/tensorflow/tools/pip_package/build_pip_package <path/to/output/folder>
pip install --user <path/to/output/folder>/tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl
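The wheel filename in the pip command encodes the interpreter it was built for: cp27 and cp27mu are the Python and ABI tags, so a build made with a different Python produces a different filename. The sketch below splits a name according to the standard wheel naming convention (PEP 427). The helper names are illustrative, and cpython_tag assumes a CPython interpreter.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel filename into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # name-version-python-abi-platform, with an optional build tag after version
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def cpython_tag(version_info=None):
    """Return the CPython tag (for example cp27) for an interpreter version."""
    major, minor = (version_info or sys.version_info)[:2]
    return "cp%d%d" % (major, minor)


wheel = parse_wheel_name("tensorflow-1.9.0-cp27-cp27mu-linux_x86_64.whl")
print(wheel)
print(wheel["python"] == cpython_tag())
```

If the last line prints False, the wheel was built for a different Python version than the one running the script, and pip would refuse to install it into that interpreter.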


9. Run a TensorFlow Benchmark

To verify the installation, you can run some of the standard TensorFlow benchmarks. The example below shows how to run AlexNet:

git clone
cd benchmarks
git checkout f5d85aef2851881001130b28385795bc4c59fa38
python scripts/tf_cnn_benchmarks/ --num_batches=10 --local_parameter_device=sycl --device=sycl --batch_size=1 --forward_only=true --model=alexnet --data_format=NHWC
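To repeat the benchmark with different settings, the flag list can be generated rather than edited by hand. This is a small sketch that reproduces only the flags from the command above. The function name sycl_benchmark_flags is hypothetical, and the script path is deliberately left out.

```python
def sycl_benchmark_flags(model="alexnet", batch_size=1, num_batches=10):
    """Build the benchmark flags from the command above as a list of strings."""
    flags = [
        ("num_batches", num_batches),
        ("local_parameter_device", "sycl"),
        ("device", "sycl"),
        ("batch_size", batch_size),
        ("forward_only", "true"),
        ("model", model),
        ("data_format", "NHWC"),
    ]
    return ["--%s=%s" % (name, value) for name, value in flags]


# Print one flag string per batch size; append each to the python command above.
for size in (1, 8, 32):
    print(" ".join(sycl_benchmark_flags(batch_size=size)))
```

The batch sizes in the loop are arbitrary examples; larger values may exceed the memory of some devices.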


Improving GPU Efficiency for Deep Learning with MissingLink

  • Resource management: Schedule multiple experiments and run them automatically to utilize GPU machines to the max.
  • Experiment management: Record and remember what happened with every TensorFlow GPU experiment.
  • Data management: Understand which dataset was used in multi-GPU experiments. Easily debug models.


MissingLink is a deep learning platform that does all of this for you, and lets you concentrate on building the most accurate model.
