Building TensorFlow OCR Systems: Key Approaches and Tutorials

Text recognition capabilities have advanced as more organizations adopt deep learning and Convolutional Neural Network (CNN) frameworks. You can adapt deep learning frameworks like TensorFlow to the special case of OCR by using object detection and recognition methods.

 

This article explains how to use TensorFlow to build OCR systems for handwritten text and number plate recognition using convolutional neural networks (CNN).

 

What is OCR?

Optical Character Recognition (OCR) technology recognizes text inside images, such as scanned documents and photos. OCR converts any kind of image containing written text (typed, handwritten, or printed) into a digital, machine-readable format.

 

OCR software has improved over the past few years. Today, OCR can recognize characters, words, and sentences with a very low error rate. However, generic solutions are still limited in accuracy for many non-traditional OCR use cases, such as partially written text, non-uniform font styles, and blurring caused by camera motion.

OCR Strategies

Text recognition is a two-step task. First, you detect the text in the image; second, you identify the characters. There are three main approaches:

 

Computer Vision OCR Techniques

Optical character recognition is one of the earliest computer vision tasks. This computer vision approach applies filters to make the characters stand out from the background and then uses edge detection to help it recognize the individual characters. Finally, the characters are identified using image classification.
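
As a rough illustration of this pipeline, here is a minimal sketch in plain Python. It is a simplification under stated assumptions: a fixed threshold stands in for the filtering step, and a column projection profile (finding where ink starts and stops along the horizontal axis) stands in for full 2-D edge detection. The function names and the toy image are hypothetical, and the final classification step is left out.

```python
def binarize(image, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (background) as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment_columns(binary):
    """Return (start, end) column ranges that contain ink, one per character."""
    width = len(binary[0])
    # Column projection: how many ink pixels each column holds.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                   # rising edge: a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))    # falling edge: the character ends
            start = None
    if start is not None:               # character touching the right border
        spans.append((start, width))
    return spans


# Toy 3x7 grayscale image: two dark strokes on a light background.
image = [
    [255,  20,  30, 255, 255,  10, 255],
    [255,  25, 255, 255, 255,  15, 255],
    [255,  20,  30, 255, 255,  10, 255],
]
binary = binarize(image)
spans = segment_columns(binary)
print(spans)  # [(1, 3), (5, 6)]
```

Each span could then be cropped and passed to an image classifier to identify the character.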

 

A Standard Deep Learning Approach

Deep learning detection approaches, such as SSD, YOLO, and Mask R-CNN, are used to detect characters and words. However, deep learning models can find digits and letters harder to recognize than objects such as dogs, cats, or humans. They often don’t reach the desired accuracy, so specialized approaches are needed.

 

Convolutional-Recurrent Neural Network

This approach uses a hybrid architecture that identifies words in three stages.

 

The first stage uses a standard fully convolutional network. The last layer of the network is treated as a feature layer and is divided into “feature columns”. As the image below shows, every feature column represents a section of the text.

 

Figure: Convolutional neural network feature columns

 

In the second stage, the feature columns are fed into a deep bidirectional LSTM, which outputs a sequence. This sequence is used to identify the relationships between the characters.

 

Figure: Deep bidirectional LSTM

 

Finally, the third stage is the transcription layer. Its goal is to take the character sequence, which includes redundant and blank characters, and use a probabilistic method to unify and improve it.
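
To make the transcription step more concrete, here is a minimal sketch of best-path (greedy) CTC decoding in plain Python. This is only the simplest decoding strategy and not necessarily the one a given model uses: it picks the most likely class at every time step, merges consecutive repeats, and drops the blank symbol. The two-letter alphabet, the blank index, and the probability values are made up for illustration. TensorFlow ships its own decoders (for example `tf.nn.ctc_greedy_decoder` and `tf.nn.ctc_beam_search_decoder`), which you would normally use instead.

```python
def ctc_greedy_decode(prob_matrix, alphabet, blank_index):
    """Pick the top class per time step, merge repeats, then drop blanks."""
    best_path = [max(range(len(step)), key=step.__getitem__) for step in prob_matrix]
    decoded, previous = [], None
    for index in best_path:
        # Emit a character only when it differs from the previous step
        # and is not the blank symbol.
        if index != previous and index != blank_index:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


alphabet = "ab"   # hypothetical alphabet: index 0 = "a", index 1 = "b"
BLANK = 2         # assumed position of the blank class
probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, merged with the previous step)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
    [0.2, 0.1, 0.7],  # blank
]
text = ctc_greedy_decode(probs, alphabet, BLANK)
print(text)  # aab
```

Note how the blank between the second and third time steps is what lets the decoder output a genuine double letter instead of merging it away.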

TensorFlow OCR Tutorial #1 – Handwritten Text Recognition System

This tutorial presents a neural network that recognizes text in images. The network consists of 5 CNN layers and 2 RNN layers and outputs a character-probability matrix. This matrix is used either to calculate the CTC loss during training or for CTC decoding when recognizing text.

 

Architecture:

1. CNN layers

  • Convolutional layer with 5×5 filter kernels in the first 2 layers and 3×3 in the last 3 layers
  • Non-linear ReLU function
  • Pooling layer
  • Output feature map of size 32×256

2. Recurrent neural network layers

  • Long Short-Term Memory (LSTM) implementation of RNN
  • Output matrix of size 32×80

3. Connectionist Temporal Classification (CTC)

4. Data

  • Input: A gray-value image of size 128×32
  • Output: Character-probability matrix
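
The sizes listed above can be sanity-checked with a few lines of arithmetic. The sketch below traces how a 128×32 input could shrink to a 32-step sequence with 256 features. The pooling windows and channel counts are assumptions chosen to reproduce the stated 32×256 feature map, not values given in this article, and the convolutions are assumed to use "same" padding so that only pooling changes the spatial size.

```python
# Assumed pooling windows (width, height) for the five CNN layers.
POOLS = [(2, 2), (2, 2), (1, 2), (1, 2), (1, 2)]
# Assumed number of feature maps produced by each layer.
CHANNELS = [32, 64, 128, 128, 256]


def trace_shapes(width, height, pools, channels):
    """Return the (width, height, channels) shape after each conv + pool layer."""
    shapes = []
    for (pool_w, pool_h), out_channels in zip(pools, channels):
        width, height = width // pool_w, height // pool_h
        shapes.append((width, height, out_channels))
    return shapes


shapes = trace_shapes(128, 32, POOLS, CHANNELS)
for layer, shape in enumerate(shapes, start=1):
    print(f"layer {layer}: {shape}")

time_steps, final_height, features = shapes[-1]
print(time_steps, final_height, features)  # 32 1 256
```

Under these assumptions the height collapses to 1, so the feature map can be read as a sequence of 32 time steps with 256 features each, which is the form the RNN layers consume.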

 

Process:

1. Create CNN layers and return an output of these layers

  • For each layer, create a convolution kernel of size k×k
  • Feed the convolution result into the ReLU operation and then into the pooling layer with size px×py and step size sx×sy
  • Repeat the above steps for all layers

2. Create RNN layers and return an output of these layers

  • Create and stack two RNN layers with 256 units each
  • Create a bidirectional RNN from the stacked layers
  • Get 2 output sequences, forward and backward, each of size 32×256

3. Train the Neural Network

  • Use the mean of the loss values of the batch
  • Feed the values into an optimizer such as RMSProp
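
The training step can be illustrated without any framework. The following sketch averages per-sample losses over a batch and applies a hand-written RMSProp update. It is a toy under stated assumptions: a squared-error loss on a single weight replaces the CTC loss, and the learning rate, decay, batch values, and function names are all hypothetical. In a real TensorFlow project you would rely on the framework's built-in optimizer rather than code like this.

```python
import math


def mean_loss(batch_losses):
    """Average the per-sample losses of one batch."""
    return sum(batch_losses) / len(batch_losses)


def rmsprop_step(weights, grads, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: scale each gradient by a running RMS of past gradients."""
    new_weights, new_cache = [], []
    for w, g, c in zip(weights, grads, cache):
        c = decay * c + (1.0 - decay) * g * g
        new_cache.append(c)
        new_weights.append(w - lr * g / (math.sqrt(c) + eps))
    return new_weights, new_cache


# Toy problem: find the weight w that minimizes the batch mean of (w - t)^2.
targets = [1.0, 2.0, 3.0]
weights, cache = [0.0], [0.0]
initial_loss = mean_loss([(weights[0] - t) ** 2 for t in targets])

for _ in range(1000):
    # Gradient of the mean loss with respect to w.
    grad = mean_loss([2.0 * (weights[0] - t) for t in targets])
    weights, cache = rmsprop_step(weights, [grad], cache)

final_loss = mean_loss([(weights[0] - t) ** 2 for t in targets])
print(initial_loss, final_loss, weights[0])
```

The weight settles close to 2.0, the mean of the targets, which is where the averaged loss is smallest.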

 

See the source code in the tutorial by Harald Scheidl.

 

TensorFlow OCR Tutorial #2 – Number Plate Recognition

This tutorial shows how to build an automatic number plate recognition system using a single CNN and only 800 lines of code. The network architecture assumes that exactly 7 characters are visible in the output, and it works only on specific number plate fonts.
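
Because the output length is fixed, decoding is simpler than in the CTC case: each of the 7 positions gets its own score vector, and the most likely character is picked independently per position. The sketch below illustrates that idea only. The 36-symbol alphabet, the helper names, and the example plate are assumptions made for illustration, not details taken from the tutorial.

```python
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumed 36-symbol alphabet
PLATE_LENGTH = 7                                    # fixed output length


def decode_plate(scores):
    """Pick the highest-scoring character for each of the 7 plate positions."""
    if len(scores) != PLATE_LENGTH:
        raise ValueError(f"expected {PLATE_LENGTH} positions, got {len(scores)}")
    chars = []
    for position_scores in scores:
        best = max(range(len(ALPHABET)), key=position_scores.__getitem__)
        chars.append(ALPHABET[best])
    return "".join(chars)


def one_hot_scores(text):
    """Build a perfect score matrix for `text` (stand-in for real network output)."""
    return [[1.0 if symbol == ch else 0.0 for symbol in ALPHABET] for ch in text]


plate = decode_plate(one_hot_scores("AB12CDE"))  # hypothetical plate
print(plate)  # AB12CDE
```

The trade-off of this design is visible in the length check: a plate with more or fewer than 7 characters cannot be represented at all.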

 

Architecture:

  • Convolutional layer with 5×5 filter
  • Max pooling layer with 2×2 filter
  • Convolutional layer with 5×5 filter
  • Max pooling layer with 1×2 filter
  • Convolutional layer with 5×5 filter
  • Max pooling layer with 2×2 filter
  • 2 fully connected layers

Figure: TensorFlow OCR tutorial (source: Matt’s ramblings)

 

 

Process:

  1. Extract ~3GB of background images from the SUN database
  2. Generate 1,000 test set images of size 128×64
  3. Train the model
  4. Use a GPU for training. It will take around 100,000 batches to converge
  5. Detect number plates in an image

 

Full source code by Matthew Earl

 

TensorFlow OCR at Scale with MissingLink

In this article, we explained how to run OCR experiments on TensorFlow. When you start working on TensorFlow OCR projects and running large numbers of experiments, you’ll run into some practical challenges:

 

Tracking experiments: you need to keep track of progress, source code, and hyperparameters across multiple experiments. To find the optimal model you will have to run hundreds or thousands of experiments over time, and managing them will become a hassle.

Running experiments across multiple machines: TensorFlow OCR experiments, especially those with large datasets, require machines with multiple GPUs and, in many cases, scaling across multiple machines. Provisioning these machines and distributing the work between them is not a trivial task.

Managing training data: TensorFlow OCR projects usually involve images, and training sets can get huge, ranging from gigabytes to petabytes of data. Moving data between training machines takes time and slows you down, especially if you are trying to run multiple experiments.

MissingLink is a deep learning platform that can help you automate OCR experiments on TensorFlow, so you can concentrate on building winning text recognition experiments. Learn more to see how easy it is.
