Important announcement: MissingLink has shut down.



TensorFlow Reinforcement Learning: Introduction and Hands-On Tutorial

Reinforcement learning is a computational approach used to understand and automate goal-directed learning and decision-making.

This article explains the fundamentals of reinforcement learning, how to use TensorFlow’s libraries and extensions to create reinforcement learning models and methods, and how to manage your TensorFlow experiments through MissingLink’s deep learning platform.

What is Reinforcement Learning?

Reinforcement learning is a high-level framework for solving sequential decision-making problems. An agent learns from direct interaction with its environment, without relying on a predefined labeled dataset. Learning is goal oriented: the agent learns sequences of actions that maximize its cumulative reward.


A few fundamental concepts form the basis of reinforcement learning:


  • Agent: An agent performs actions in a given environment. An algorithm that controls a game character is an example of an agent.
  • Environment: The environment takes the agent’s current state and action as input, and returns the agent’s reward and its next state as output. In a physical task, for example, the laws of physics act as the environment.
  • Action: An action is one of the possible moves the agent can make. The agent chooses from the set of all possible actions. In video games, for example, the set might include running right or left, jumping high or low, crouching, or standing still.
  • State: A state is the situation in which the agent finds itself, such as a specific place and moment. The environment returns the next state after each action.
  • Reward: A reward is the feedback used to evaluate the agent’s action.


This interaction can be seen in the diagram below:


[Figure: the reinforcement learning process]
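The interaction loop can also be sketched in a few lines of plain Python. This is a minimal illustration, not part of any library: the Corridor environment, its reward values, and the two example agents are hypothetical and chosen only to keep the loop easy to follow.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, the goal is cell 4."""

    def reset(self):
        self.position = 0
        return self.position  # initial state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the move.
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=1000):
    """Run one episode and return (total_reward, number_of_steps)."""
    state = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = choose_action(state)           # the agent acts
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        steps += 1
    return total_reward, steps


rng = random.Random(0)


def random_agent(state):
    return rng.choice([-1, 1])  # ignores the state entirely


def always_right(state):
    return 1


print(run_episode(Corridor(), always_right))
print(run_episode(Corridor(), random_agent))
```

The always_right agent reaches the goal in four steps, while the random agent usually needs many more and collects a lower total reward. Closing that gap is what learning is for.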


The agent learns through repeated interaction with the environment. To be successful, the agent needs to:

  • Learn the interaction between states, actions, and subsequent rewards.
  • Determine which action will provide the optimal outcome.
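One classical way to meet both requirements is tabular Q-learning, sketched below in plain Python. The five-cell corridor, the reward values, and the hyperparameters (alpha, gamma, epsilon) are assumptions made for illustration. The tutorial later in this article uses TF-Agents instead of a hand-written table.

```python
import random

N_STATES, GOAL = 5, 4   # corridor cells 0..4, goal at cell 4
ACTIONS = (-1, +1)      # move left, move right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.1), done


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[(state, action)] estimates the long-term reward of taking action in state.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q_table = train_q_table()
# Best known action in every non-goal cell.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
print(greedy_policy)
```

The table captures the relationship between states, actions, and rewards. Reading off the highest-valued action per state gives the policy.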

Reinforcement Learning Use Cases

Reinforcement learning algorithms can be used to solve problems that arise in business settings where task automation is required:


  • Video games: In video games, the agent’s goal is to maximize the score. Every action the agent takes during the game affects how close it gets to that goal.
  • Delivery drones: Reinforcement learning can be used in drone autopilots, as it provides path tracking and navigation capabilities.
  • Manufacturing robots: Reinforcement learning lets a robot autonomously discover optimal behavior through trial-and-error interactions with its environment. For example, a robot can learn to pick specific objects out of a box with the help of annotated images and sensor data.
  • Computational resource optimization: Finding solutions for resource management tasks, such as allocating computers to pending jobs, can be challenging. Reinforcement learning algorithms can be used to learn about vacancies and optimally allocate resources to waiting jobs.
  • Personalized recommendations: It can be challenging to create personalized news or advertisement recommendations, because of unpredictable user preferences. The reinforcement learning approach uses feedback from the user to model a recommendation framework with accurate predictions of future rewards.
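As a rough illustration of the recommendation case, the simplest setting is a multi-armed bandit: each recommendation is an action, and a click is the reward. The sketch below uses an epsilon-greedy strategy. The click rates are made-up numbers that simulate user feedback, and the function name run_bandit is hypothetical.

```python
import random


def run_bandit(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit. click_rates are hidden from the agent."""
    rng = random.Random(seed)
    n_items = len(click_rates)
    shows = [0] * n_items        # how often each item was recommended
    estimates = [0.0] * n_items  # running mean reward (click rate) per item
    for _ in range(rounds):
        if rng.random() < epsilon:
            item = rng.randrange(n_items)                           # explore
        else:
            item = max(range(n_items), key=lambda i: estimates[i])  # exploit
        # Simulated user feedback: 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < click_rates[item] else 0.0
        shows[item] += 1
        # Incremental update of the mean reward for this item.
        estimates[item] += (reward - estimates[item]) / shows[item]
    return shows, estimates


shows, estimates = run_bandit(click_rates=[0.1, 0.3, 0.6])
print(shows)
print([round(e, 2) for e in estimates])
```

Over time the agent recommends the item with the highest click rate most often, while the occasional random choice keeps its estimates for the other items up to date.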

Reinforcement Learning in TensorFlow: Libraries and Extensions

Several open-source libraries build on TensorFlow to help you implement advanced reinforcement learning models and methods.


TF-Agents: A Flexible Reinforcement Learning Library for TensorFlow


TF-Agents is a modular, well-tested open-source library for deep reinforcement learning with TensorFlow. In TF-Agents, the core elements of reinforcement learning algorithms are implemented as Agents.


Algorithms available under TF-Agents include:

  • DQN
  • DDPG
  • TD3
  • REINFORCE
  • PPO
  • SAC



Dopamine: TensorFlow-Based Research Framework


Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. Dopamine provides the following features for reinforcement learning researchers:


  • Flexibility—new researchers can easily try out new ideas and run benchmark experiments.
  • Stability—provides a few implemented and tested algorithms.
  • Reproducibility—Dopamine code has full test coverage. These tests also serve as an additional form of documentation.


TRFL: A Library of Reinforcement Learning Building Blocks


TRFL (pronounced “truffle”) is a collection of key algorithmic components for DeepMind agents such as DQN, DDPG, and IMPALA. The TRFL library includes functions to implement both classical reinforcement learning algorithms and more cutting-edge techniques.


TRFL can be installed from pip with the following command: pip install trfl


Install TensorFlow and TensorFlow Probability separately, so that TRFL can work with both the GPU and CPU versions of TensorFlow.

TensorFlow Reinforcement Learning Example using TF-Agents

In this reinforcement learning tutorial, we will train an agent on the CartPole environment. CartPole is a game available through OpenAI Gym, an open source toolkit for developing and comparing reinforcement learning algorithms. Following is a screen capture from the game:

[Figure: screen capture of the CartPole game. Source: OpenAI]
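To give a feel for what the environment computes, here is a minimal plain-Python sketch of one CartPole time step. The constants and equations follow the classic cart-pole formulation as commonly configured in OpenAI Gym. Treat the exact values as assumptions rather than a reference implementation. The function name cartpole_step is hypothetical.

```python
import math

# Assumed constants, modeled on the usual Gym CartPole settings.
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
POLE_HALF_LENGTH, FORCE, DT = 0.5, 10.0, 0.02
X_LIMIT, THETA_LIMIT = 2.4, 12 * math.pi / 180  # episode ends beyond these


def cartpole_step(state, action):
    """state = (x, x_dot, theta, theta_dot); action 0 pushes left, 1 pushes right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass))
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    # Euler integration over one time step of length DT.
    x, x_dot = x + DT * x_dot, x_dot + DT * x_acc
    theta, theta_dot = theta + DT * theta_dot, theta_dot + DT * theta_acc
    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, done  # reward of 1 per step


# A poor policy: always push right, starting upright and at rest.
state, done, steps = (0.0, 0.0, 0.0, 0.0), False, 0
while not done and steps < 500:
    state, reward, done = cartpole_step(state, action=1)
    steps += 1
print(steps, state)
```

Because every step earns a reward of 1, the episode return equals the number of steps the pole stays up. A policy that always pushes in one direction loses the pole within a handful of steps.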



1. Set up reinforcement learning environments: TF-Agents provides suites for loading environments from sources such as OpenAI Gym, Atari, and DM Control, given the environment name as a string.

2. Set up the reinforcement learning agent: Create a standard TF-Agents agent such as DQN, DDPG, TD3, PPO, or SAC.

actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(), train_env.action_spec())  # default hidden layers
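An actor distribution network maps an observation to a probability distribution over actions, and the agent samples its action from that distribution. The plain-Python sketch below shows the idea with a single linear layer followed by a softmax. The weights and the observation are hypothetical values, and a real network would learn its weights during training.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - highest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(observation, weights):
    """Linear 'actor': one row of weights per action."""
    logits = [sum(w * o for w, o in zip(row, observation)) for row in weights]
    return softmax(logits)


def sample_action(probabilities, rng):
    """Draw an action index according to the given probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Hypothetical weights over (x, x_dot, theta, theta_dot).
weights = [[0.0, 0.0, -2.0, -1.0],  # score for action 0 (push left)
           [0.0, 0.0, 2.0, 1.0]]    # score for action 1 (push right)
observation = [0.0, 0.0, 0.1, 0.5]  # positive pole angle and angular velocity
probs = action_probabilities(observation, weights)
action = sample_action(probs, random.Random(0))
print(probs, action)
```

Sampling rather than always taking the most likely action is what lets the agent keep exploring while it collects data.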

3. Define standard reinforcement learning policies

eval_policy = tf_agent.policy
collect_policy = tf_agent.collect_policy

4. Define metrics for evaluation of policies.

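def compute_avg_return(environment, policy, num_episodes=10):
  """Average the undiscounted episode return of `policy` over `num_episodes`."""
  total_return = 0.0
  for _ in range(num_episodes):
    time_step = environment.reset()
    episode_return = 0.0

    # Step until the environment signals the end of the episode.
    while not time_step.is_last():
      action_step = policy.action(time_step)
      time_step = environment.step(action_step.action)
      episode_return += time_step.reward
    total_return += episode_return

  avg_return = total_return / num_episodes
  # Rewards are batched tensors; return the scalar for the single environment.
  return avg_return.numpy()[0]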


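5. Collect data: Define a function to collect an episode using the given data collection policy and save the data.

from tf_agents.trajectories import trajectory


def collect_episode(environment, policy, num_episodes):
  """Run `policy` for `num_episodes` episodes and store every transition."""
  episode_counter = 0
  environment.reset()

  while episode_counter < num_episodes:
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)

    # Add trajectory to the replay buffer (replay_buffer is defined elsewhere)
    replay_buffer.add_batch(traj)

    # A boundary trajectory marks the end of an episode.
    if traj.is_boundary():
      episode_counter += 1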
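The same collect-and-store pattern can be sketched without TF-Agents. In the plain-Python example below, EpisodeBuffer, Transition, toy_step, and collect_episodes are hypothetical stand-ins. They mirror only the add, gather, and clear idea and make no attempt to reproduce the TF-Agents replay buffer API.

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class EpisodeBuffer:
    """Hypothetical in-memory buffer for collected transitions."""

    def __init__(self):
        self._items = []

    def add(self, transition):
        self._items.append(transition)

    def gather_all(self):
        return list(self._items)

    def clear(self):
        self._items = []


def toy_step(state, action):
    """Toy environment: walk from 0 to 3; the episode ends on reaching 3."""
    next_state = max(0, state + action)
    return next_state, (1.0 if next_state == 3 else 0.0), next_state == 3


def collect_episodes(buffer, policy, num_episodes):
    """Store every transition until num_episodes episodes have finished."""
    episode_counter, state = 0, 0
    while episode_counter < num_episodes:
        action = policy(state)
        next_state, reward, done = toy_step(state, action)
        buffer.add(Transition(state, action, reward, next_state, done))
        if done:
            episode_counter += 1
            state = 0  # start the next episode
        else:
            state = next_state


buffer = EpisodeBuffer()
rng = random.Random(0)
# The collection policy moves right twice as often as left.
collect_episodes(buffer, lambda s: rng.choice([-1, 1, 1]), num_episodes=2)
experience = buffer.gather_all()
print(len(experience), "transitions collected")
```

Counting finished episodes instead of steps guarantees that the buffer holds only complete episodes, which matters for agents that learn from full episode returns.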

6. Train the agent

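from tf_agents.utils import common

# Optional: wrap train in a graph function for speed.
tf_agent.train = common.function(tf_agent.train)

# Evaluate the agent's policy once before training.
avg_return = compute_avg_return(eval_env, tf_agent.policy, num_eval_episodes)
returns = [avg_return]

for _ in range(num_iterations):

  # Collect a few episodes using collect_policy and save them to the replay buffer.
  collect_episode(
      train_env, tf_agent.collect_policy, collect_episodes_per_iteration)

  # Use the collected data to update the agent's network, then discard it.
  experience = replay_buffer.gather_all()
  train_loss = tf_agent.train(experience)
  replay_buffer.clear()

  step = tf_agent.train_step_counter.numpy()

  if step % log_interval == 0:
    print('step = {0}: loss = {1}'.format(step, train_loss.loss))

  if step % eval_interval == 0:
    avg_return = compute_avg_return(eval_env, tf_agent.policy, num_eval_episodes)
    print('step = {0}: Average Return = {1}'.format(step, avg_return))
    returns.append(avg_return)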
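The training call hides how the collected episodes become a learning signal. Assuming a Monte Carlo policy-gradient agent such as REINFORCE (an assumption, since the text above does not name the agent), each step of an episode is weighted by its discounted return. The helper below is a plain-Python sketch of that computation. The function name and the gamma value are illustrative.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


episode_rewards = [1.0, 1.0, 1.0]  # CartPole-style reward of 1 per step
print(discounted_returns(episode_rewards, gamma=0.5))  # [1.75, 1.5, 1.0]
```

Early actions receive the largest returns because they are credited with everything that follows, which encourages the agent to favor actions that keep the episode going.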

7. Visualize the performance of the agent

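import matplotlib.pyplot as plt

# One data point per evaluation, starting with the evaluation before training.
steps = range(0, num_iterations + 1, eval_interval)
plt.plot(steps, returns)
plt.ylabel('Average Return')
plt.xlabel('Step')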



Reinforcement Learning in the Real World

In this article, we explained the basics of reinforcement learning and presented a tutorial on how to train an agent on the CartPole environment using TF-Agents.


Building a successful reinforcement learning model requires large-scale experimentation and trial and error. It can be challenging to manage multiple experiments simultaneously, especially across a team: you need to record the results of experiments, compare current and past results, and share your results with your team.


MissingLink provides a platform for managing deep learning and machine learning experiments. With MissingLink you can schedule, automate, and record your experiments. The platform lets you track all your experiments, code, machines, and results in a single pane of glass.

