Running a Python script for a few hours only to find out its results are useless feels bad. Many software domains have long-running processes that need to be monitored, but machine learning’s requirements are a bit different. Machine learning requires monitoring metrics live, and also comparing them after the process has ended. On top of that, users need to see the metrics in charts to identify trends. This post expands on these differences and explores the most popular strategies for monitoring a model training session using Python. Follow along with the code from this post on GitHub.
(title image by www_slon_pics from Pixabay)
Machine Learning Monitoring Needs
Machine learning’s goal is to build a model that can do something useful, and the obstacles to getting there are many: from working with expensive hardware like GPUs to handling massive amounts of data that take a long time to label, validate and truck around. To attain an optimal model, you have to understand the trends of the metrics during training and intervene before overfitting sets in. The best way to identify these trends is visually, as in the following chart.
You have to be able to visualize your metrics to spot bugs that show up as spikes and dips, to stop an experiment early when it’s flat-lining, and to see early results. The workflow is filled with other corners a model developer needs to work around, such as:
- Short runs to test ideas
- Long runs to fully train a model
- Running the script locally
- Running the script remotely (to use expensive hardware elsewhere)
- The training code is already complex enough that you don’t want to add yet another layer like managing a server just for managing logs
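The idea of stopping a run once its metric flat-lines can be sketched in a few lines. This is only an illustration, not something from the original code: the function name `is_flat_lining`, the `window` size and the `min_improvement` threshold are all assumptions chosen for the example.

```python
def is_flat_lining(losses, window=5, min_improvement=1e-3):
    """Return True when the best loss in the last `window` steps is not
    at least `min_improvement` better than the best loss seen before.

    Hypothetical helper: the name and default thresholds are illustrative.
    """
    if len(losses) <= window:
        return False  # not enough history to judge yet
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    return best_before - best_recent < min_improvement


if __name__ == "__main__":
    improving = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    stalled = [1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
    print(is_flat_lining(improving))
    print(is_flat_lining(stalled))
```

A check like this only automates what you would otherwise read off a chart, so it is best treated as a complement to visual monitoring rather than a replacement.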
With these challenges in mind, let’s inspect a few strategies to monitor machine learning training scripts.
Strategy #1 – Command Line
The “hello world” of machine learning monitoring is to just print results to stdout. This works great until your terminal scrolls out of control and you can’t see anything. One level above that is printing a status on the same line by starting the output with \r, like so:
for i in range(10):
    print(f"\rprogress: {i + 1}/10", end="", flush=True)
Printing only one line at a time after a \r solves the scrolling problem. A popular package for showing a progress bar while executing Python code is tqdm, and it relies on the same \r technique.
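As a rough sketch of the same-line approach with a couple of metrics squeezed in, the snippet below pads each status to a fixed width so a shorter line fully overwrites a longer one. The helper name `render_status`, the field layout and the placeholder metric values are assumptions made for illustration.

```python
import sys
import time


def render_status(step, total, loss, accuracy, width=60):
    """Build a status string that starts with a carriage return.

    Hypothetical helper: "\r" moves the cursor to the start of the line,
    and padding to `width` erases leftovers from a longer previous line.
    """
    text = f"step {step}/{total} | loss {loss:.4f} | acc {accuracy:.2%}"
    return "\r" + text.ljust(width)


if __name__ == "__main__":
    total = 20
    for step in range(1, total + 1):
        fake_loss = 1.0 / step            # placeholder, not real training
        fake_accuracy = 1.0 - fake_loss   # placeholder, not real training
        sys.stdout.write(render_status(step, total, fake_loss, fake_accuracy))
        sys.stdout.flush()
        time.sleep(0.05)
    print()  # move to a fresh line once the loop is done
```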
But that’s just tracking progress. Machine learning needs richer information like accuracy, loss and other domain specific metrics. Using just one line for output constrains the space so much that there’s barely any room to show vital information. A solution for showing multi-line output that doesn’t scroll like crazy is:
import os
os.system('cls' if os.name == 'nt' else 'clear')  # clear the terminal before redrawing
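To show how the screen-clearing call might fit into a training loop, here is a minimal sketch that redraws a small multi-line report each epoch. The helper names `render_report` and `clear_screen`, and the fake metric values, are assumptions for the example rather than code from the original post.

```python
import os
import time


def render_report(epoch, total_epochs, metrics):
    """Return a multi-line status block; `metrics` maps name -> value.

    Hypothetical helper used only for this illustration.
    """
    lines = [f"epoch {epoch}/{total_epochs}"]
    for name, value in metrics.items():
        lines.append(f"{name:<10} {value:.4f}")
    return "\n".join(lines)


def clear_screen():
    # Same call as above: `cls` on Windows, `clear` elsewhere.
    os.system('cls' if os.name == 'nt' else 'clear')


if __name__ == "__main__":
    for epoch in range(1, 6):
        # Placeholder numbers standing in for real training metrics.
        metrics = {"loss": 1.0 / epoch, "accuracy": 1.0 - 0.5 / epoch}
        clear_screen()
        print(render_report(epoch, 5, metrics))
        time.sleep(0.2)
```

Keeping the formatting separate from the clearing makes it easy to reuse the same report text elsewhere, for example by appending it to a log file.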
You can see in the above gif that the output shows multiple parameters at once: progress, loss and accuracy. This is pretty good, except that clearing the screen hides the history. To see trends you need a graph; that’s just how our brains work. While terminals aren’t very good for visuals, some people attempt drawing ASCII charts. From my tests, the existing terminal charting solutions are all poor: they’re low resolution to the point of being unreadable, and they often get completely scrambled when the window is resized or the output gets too wide.
Strategy #2 – Matplotlib
The most popular Python visualization library is matplotlib. Usually, showing a matplotlib chart blocks execution, which would pause training until the window is closed. Luckily, there’s a trick to make matplotlib more dynamic: it’s called interactive mode, and we turn it on by calling plt.ion(). The simplest example would be:
import matplotlib.pyplot as plt
plt.ion()  # interactive mode: showing a figure no longer blocks execution
But that would just let you keep running Python while the window is open. To handle clearing the figure, and dynamically adding data points like the following gif example, check out the snippet that follows. Notice the parameters in the terminal, and the chart visualized in a window next to it.
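# Required for jupyter: %matplotlib notebook
import matplotlib.pyplot as plt

plt.ion()  # interactive mode keeps drawing from blocking the loop
fig = plt.figure()
ax = fig.add_subplot(111)

def plot_losses(losses):  # hypothetical helper name
    ax.clear()  # drop the previous line before redrawing
    ax.plot(losses)
    fig.canvas.draw()
    # `start_event_loop` is required for console, not jupyter notebooks.
    # Don't use `plt.pause` because it steals focus and makes it hard
    # to stop the app.
    fig.canvas.start_event_loop(0.001)

if __name__ == "__main__":
    losses = []
    for i in range(0, 1000):
        losses.append(i ** 2 / (i + 1) ** 2.5)
        plot_losses(losses)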
Showing a chart alongside the terminal is pretty cool and useful. Though it’s not all roses. The downsides you can expect are:
- When execution completes – your chart is gone and you can never see it again. Though you could manually save a picture of it, or save the data points. This issue is exacerbated in long running scripts because you want the process to die to free up resources, but you might lose the chart and data when it ends
- Collaborating on these charts is impossible unless you build another layer around it.
- If you want to reslice the data, or focus on a different metric – you may be out of luck. Not to mention comparing different experiment runs is going to be an engineering project.
- Matplotlib on remote machines isn’t easy to set up.
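One low-tech way to soften the first and third downsides is to save the data points as you go, so they outlive both the chart window and the process. The sketch below appends metrics to a CSV file using only the standard library; the helper names `append_metrics` and `load_metric` and the file name are assumptions for illustration.

```python
import csv
import os
import tempfile


def append_metrics(path, step, metrics):
    """Append one row of metrics to a CSV file, writing a header first
    if the file does not exist yet. `metrics` maps name -> value.

    Hypothetical helper; assumes every call passes the same metric names.
    """
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["step", *metrics])
        if is_new:
            writer.writeheader()
        writer.writerow({"step": step, **metrics})


def load_metric(path, name):
    """Read one metric column back as a list of floats."""
    with open(path, newline="") as f:
        return [float(row[name]) for row in csv.DictReader(f)]


if __name__ == "__main__":
    run_dir = tempfile.mkdtemp()
    log_path = os.path.join(run_dir, "run_001.csv")  # hypothetical file name
    for i in range(5):
        append_metrics(log_path, i, {"loss": i ** 2 / (i + 1) ** 2.5})
    print(load_metric(log_path, "loss"))
```

With one file per run, comparing experiments later comes down to loading the same column from several files and plotting them on one set of axes, though collaboration and remote access still need something more.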
Tactics – Jupyter Notebooks
There are a few minor adaptations for using the above approaches in notebooks. The same constraints from the terminal apply, but at the cell level. For example, the matplotlib visualization requires %matplotlib notebook, and you can easily flood a cell’s output with print-outs. Using the livelossplot package isn’t too bad. A cool added benefit of Jupyter Notebooks is that you can more easily store and share a visual state of the notebook.
Strategy #3 – Metric Monitoring Tools and Services
TensorBoard. It’s so good you practically have to use TensorBoard if you’re using TensorFlow. Sadly, if you’re using anything else, like PyTorch, XGBoost, or scikit-learn – you’re out of luck. Even with TensorBoard, things can get a bit more complicated when you’re monitoring multiple experiments at the same time and sharing TensorBoard files. A better metrics solution would integrate easily with the framework du jour.
There is obviously a need for this kind of service, and while many companies are trying to build tools around this problem, MissingLink has created an SDK that helps you monitor live machine learning experiments. MissingLink makes it easy to directly monitor an experiment, compare multiple live experiments at the same time, identify and stop the ones that aren’t performing well, and analyze the final results of completed experiments.
MissingLink supports many popular Python frameworks. For example, here are a few lines of code needed to monitor a Keras experiment in MissingLink’s dashboard. Check out the complete tutorial on GitHub.
missinglink_callback = missinglink.KerasCallback()
Now when you run a Python script, another row will show up in the experiments dashboard like so:
From the table you can tell who ran which experiment, when it started, how long it’s been running, the hyperparameters, and additional metrics. You can sort and sift through this table to find the interesting runs. But this is just a high-level view of your team’s current activity. You can learn more about any experiment by clicking on it and diving into the experiment dashboard. This view shows the charts and a more spaced-out view of the hyperparameters and metrics.
There’s also a confusion matrix, a generic charts section, and a section for all associated artifacts such as the weights, TensorBoard file and outlier data. If you’re using MissingLink’s data management system – you’ll find a link to the exact data slice that was in use.
For me the most critical piece is the link to the code that ran in that experiment. When configured, every run’s exact code is sent to my private tracking repository. This prevents changes from going unnoticed and allows me to launch experiments at full speed without having to keep notes on which is which. Code tracking is huge when trying to figure out why a specific model was so much better or worse than another. It lets you experiment wildly without losing your mind.
Now that we’ve seen how hard monitoring one experiment is – imagine tackling multiple at the same time. Selecting a few experiments from the dashboard to compare lets you analyze past experiments, and you can even monitor live experiments in parallel to decide which ones to abort.
All these results show up on the dashboard for your colleagues and future self to analyze.
Working on machine learning models requires a new set of tools. This is especially true when working on a team, under regulatory requirements for documentation, or if you need to make progress faster. I can run an order of magnitude more experiments when I know they’re automatically tracked and visualized efficiently. Experiment management is a core focus for MissingLink. We build the AI infrastructure so you don’t have to.
While there is no one size fits all solution for monitoring experiments, we’re happy to chat and better understand your needs and how we can help. Make sure to drop us a note and request a demo.