Training a deep neural network can take minutes, hours, days, or even weeks, especially if your dataset comes with latency or preprocessing requirements. Slow training is expensive in compute time, but it’s also wasteful in human brain time. Faster training lets you test out more ideas. With speed in mind, it’s good to be aware of as many tools as possible to accelerate training, especially the easy ones. In this blog post I’ll explain and evaluate Keras workers, a minor argument change that can cut training time by a factor of 6.
When to use fit_generator(workers=8)
The Keras methods fit_generator, evaluate_generator, and predict_generator have an argument called workers. If you set workers to multiprocessing.cpu_count() instead of the default 1, Keras spawns that many threads (or processes, with the use_multiprocessing argument) to ingest data batches. Keras with TensorFlow parallelizes the forward and backward passes by default, but data loading does not receive that treatment because Keras can’t assume whether it’s safe to do so. The performance implications are dramatic if you’re fetching data from a database, a cloud bucket, or a network share, or if you have non-negligible preprocessing time. That’s the “tl;dr”; the rest of this post describes how I evaluated this feature on the following setup:
- MNIST Dataset (surprise!) using this code.
- MacBook Pro (15-inch, 2017), macOS 10.13.6 (17G65)
- 3.1 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3
- Python 3.6.5, Keras 2.2.4, TensorFlow 1.12.0
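To make the idea behind workers concrete, here is a minimal, stdlib-only sketch of ordered, parallel batch prefetching. It is not Keras’ own implementation; it only mirrors the roles of the real fit_generator arguments workers, use_multiprocessing and max_queue_size. The names prefetch_batches and slow_batch are hypothetical, and slow_batch simply sleeps to stand in for network or preprocessing latency.

```python
import multiprocessing
import time
from collections import deque
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def prefetch_batches(get_batch, num_batches, workers=1,
                     use_multiprocessing=False, max_queue_size=10):
    """Yield get_batch(0), get_batch(1), ... in order while loading ahead.

    A simplified stand-in for the idea behind Keras' workers,
    use_multiprocessing and max_queue_size arguments, not Keras' own code.
    """
    pool_cls = ProcessPoolExecutor if use_multiprocessing else ThreadPoolExecutor
    with pool_cls(max_workers=workers) as pool:
        pending = deque()
        next_index = 0
        while next_index < num_batches or pending:
            # Keep up to max_queue_size batch requests in flight.
            while next_index < num_batches and len(pending) < max_queue_size:
                pending.append(pool.submit(get_batch, next_index))
                next_index += 1
            # Wait on the oldest request first so batches come out in order.
            yield pending.popleft().result()


def slow_batch(index):
    """Hypothetical loader with 0.05 s of simulated network/preprocessing latency."""
    time.sleep(0.05)
    return index


if __name__ == "__main__":
    for workers in (1, multiprocessing.cpu_count()):
        start = time.perf_counter()
        batches = list(prefetch_batches(slow_batch, 32, workers=workers))
        elapsed = time.perf_counter() - start
        print(f"workers={workers}: {len(batches)} batches in {elapsed:.2f}s")
```

Because the oldest pending request is always consumed first, the batch order stays deterministic even though several batches load at once. With use_multiprocessing=True the loader must be a picklable, module-level function, which is one reason Keras leaves this choice to you.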
Measuring training performance from the command line
To test the performance gains, I used the time command (Windows CMD users, sorry, install bash please). To measure any process, add the word time before it like so:
$ time python3.6 mnist_cnn.py
469/469 [==============================] - 236s 504ms/step - loss: 0.3396 - acc: 0.8978 - val_loss: 0.0982 - val_acc: 0.9707
Test loss: 0.09823579416424036
Test accuracy: 0.9707
The time command prints its results when execution completes:
- “Real” refers to elapsed wall-clock time, like a stopwatch.
- “User” refers to total CPU time spent in user space. Surprisingly, this number can be either smaller or greater than “real” time: an app that sleeps a lot will have less CPU time than clock time, and an app that intensely utilizes multiple cores will have more CPU time than clock time.
- “Sys” refers to the total CPU time spent by the operating system in system calls on behalf of the process.
For our case we’ll only care about “real” time. We can be impressed by parallelism if the “user” time is high, but it’s a means, not an end. With that in mind, let’s get to measuring.
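The real/user/sys split can also be observed from inside Python, which helps build intuition for why a latency-bound loader shows little CPU time. This is a small sketch, assuming the standard os.times() and time.perf_counter() functions; the helper names measure, sleepy and busy are hypothetical.

```python
import os
import time


def measure(fn):
    """Return (real, user, sys) seconds spent running fn(), like the shell's time."""
    start_wall = time.perf_counter()
    start = os.times()
    fn()
    end = os.times()
    real = time.perf_counter() - start_wall
    return real, end.user - start.user, end.system - start.system


def sleepy():
    # Mostly waiting: plenty of wall-clock time, almost no CPU time.
    time.sleep(0.3)


def busy():
    # Pure computation on one core: user time roughly tracks real time.
    sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    for fn in (sleepy, busy):
        real, user, sys_time = measure(fn)
        print(f"{fn.__name__}: real={real:.2f}s user={user:.2f}s sys={sys_time:.2f}s")
```

A data loader that mostly waits on a network looks like sleepy, which is exactly the case where extra workers can overlap the waiting.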
Adding Workers Doesn’t Always Help
My first test for workers yielded a slowdown, not a speedup. What gives?! It turns out that parallelism has a runtime cost. The MNIST dataset is small and quick to read from disk, so it is faster to access sequentially than through parallel ingestion.
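Here is a toy illustration of that runtime cost, using a hypothetical cheap_batch loader that does almost no work, much like reading a small dataset that is already in memory. Exact timings depend on the machine, but the thread-pool version typically loses here because scheduling each task costs more than the task itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cheap_batch(index):
    """Hypothetical loader for a small in-memory dataset: almost no work per batch."""
    return index * 2


def load_sequential(num_batches):
    # A plain loop: no scheduling overhead at all.
    return [cheap_batch(i) for i in range(num_batches)]


def load_parallel(num_batches, workers=8):
    # Every batch becomes a task handed to the pool, which adds queueing,
    # locking and thread hand-offs on top of the trivial work itself.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cheap_batch, range(num_batches)))


if __name__ == "__main__":
    for name, load in (("sequential", load_sequential), ("8 threads", load_parallel)):
        start = time.perf_counter()
        load(20_000)
        print(f"{name}: {time.perf_counter() - start:.4f}s")
```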
6x Speedup Code Sample
I used the classic MNIST Keras training example with the following changes:
- We use a keras.utils.Sequence to manually feed batches to the model.
- We include a time.sleep(0.5) to simulate latency on a network or preprocessing. Note that this makes the 1-worker baseline slower (242 seconds instead of 143), but that doesn’t affect the validity of the speedup.
With these changes, we get a 6x speedup from using 8 workers vs the default 1.
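The modified example’s own code is not reproduced here, so the following is only a sketch of the same shape, assuming no Keras import: a hypothetical SlowBatchSequence class that mimics the keras.utils.Sequence interface (__len__ and __getitem__) and sleeps inside __getitem__. With 60,000 training samples and a batch size of 128, __len__ gives the 469 steps seen in the log above. The sleep is scaled down to 0.005 s so the demo runs quickly, and it times batch loading only, with no training, so its speedup is not the post’s 6x figure.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


class SlowBatchSequence:
    """Mimics the keras.utils.Sequence interface (__len__ / __getitem__)
    without importing Keras. Each batch sleeps to simulate latency."""

    def __init__(self, x, y, batch_size=128, latency=0.5):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.latency = latency

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        time.sleep(self.latency)  # stand-in for network or preprocessing delay
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return self.x[lo:hi], self.y[lo:hi]


def epoch_time(seq, workers):
    """Wall-clock seconds to fetch every batch once using `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(seq.__getitem__, range(len(seq))))
    return time.perf_counter() - start


if __name__ == "__main__":
    xs = list(range(60_000))  # dummy stand-in for the 60,000 MNIST training images
    seq = SlowBatchSequence(xs, xs, batch_size=128, latency=0.005)
    t1, t8 = epoch_time(seq, 1), epoch_time(seq, 8)
    print(f"{len(seq)} batches: 1 worker {t1:.2f}s, 8 workers {t8:.2f}s, "
          f"speedup {t1 / t8:.1f}x")
```

Indexing by batch number, rather than pulling from a plain Python generator, is what lets several workers fetch different batches at the same time without duplicating work.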
To test the workers feature, I had to modify the code multiple times, run it, and write down the results in a spreadsheet so I could chart them. This is what it looks like:
There is another way. I’ve integrated MissingLink into the snippet, and instead of working with a Google Sheet I get this dashboard:
So I don’t need to do any of the bookkeeping myself. I just launch experiments and come back to analyze results later. A team member can see exactly what I did without needing to ask me. These are the four lines involved in getting this auto-tracking for my experiments:
This makes experimenting much easier. I don’t have to work hard to remember, write down and communicate exactly what happened when. The extra mental cycles I get back add up fast.
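Even without a tracking service, the core idea of auto-tracking (every run records its own settings and results, so nothing has to be copied into a spreadsheet by hand) can be sketched with the standard csv module. The helpers log_run and read_runs below are hypothetical and are not the MissingLink SDK; the two logged timings are the 1-worker figures quoted earlier in this post.

```python
import csv
import os
import tempfile
import time


def log_run(path, **fields):
    """Append one experiment's settings and results to a CSV file.

    Hypothetical helper, not the MissingLink SDK. Assumes every call passes
    the same field names in the same order, so the header stays valid.
    """
    row = {"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"), **fields}
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)


def read_runs(path):
    """Load all logged runs back as a list of dicts (values come back as strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "runs.csv")
    # The two 1-worker timings quoted in this post.
    log_run(path, workers=1, sleep_s=0.0, seconds=143)
    log_run(path, workers=1, sleep_s=0.5, seconds=242)
    for run in read_runs(path):
        print(run)
```

A hosted tracker adds dashboards and team sharing on top, but the habit is the same: the training script writes the record, not the person running it.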
Keras workers can make your experiments much faster in a snap. Auto-tracking experiments keeps a much better record of my work, which is valuable to me and an even bigger deal for teams. If you’re looking to try out workers or auto-tracking, make sure to: