
Super-Resolution Deep Learning: Making the Future Clearer

Super-Resolution (SR) is an image remastering technique with applications in many fields, including medical imaging, autonomous driving systems, and security footage analysis.

SR technologies have advanced rapidly since the introduction of Deep Learning (DL) techniques such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). Deep learning-based SR approaches generate high-resolution images with significantly more detail than earlier techniques.

For more in-depth information about the various aspects of super-resolution, see our blog series.

Figure: Comparison of ZSSR vs. EDSR performance

What Is Super-Resolution?

Super-Resolution refers to a class of techniques designed to create a high-resolution image from a low-resolution one. The main task of Super-Resolution is to increase the size of an image with as little loss of quality as possible. To enable this upscaling, an algorithm fills in the missing details to create the larger output image.
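The simplest way to fill in missing pixels is classical interpolation, which deep learning methods aim to improve on. As a minimal baseline sketch (the file names and the 4x factor are placeholder assumptions), bicubic upscaling with Pillow looks like this:

```python
from PIL import Image

# Load a low-resolution image (placeholder path) and upscale it 4x with
# bicubic interpolation -- the classical, non-learned baseline that
# deep-learning SR methods are measured against.
lr = Image.open("input_lr.png")
scale = 4
hr = lr.resize((lr.width * scale, lr.height * scale), Image.BICUBIC)
hr.save("output_bicubic_x4.png")
```

Interpolation can only average existing pixels, which is why its outputs look blurry; the deep learning methods below instead learn to generate plausible detail.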

Some of the challenges in this process include:

  • The extra details needed to create the high-resolution output must be estimated from the low-resolution input; the true details are unknown.
  • Since we are filling in missing details, many different high-resolution outputs are consistent with the same low-resolution input.

Deep Learning Super-Resolution Methods

We can use deep learning convolutional neural networks to train a model for super-resolution. To do so, we need to overcome the fact that convolution layers do not increase the spatial size of their inputs (and, without padding, actually shrink it). Thus, we need a transposed convolution ("deconvolution") or a similar upsampling layer to multiply the input size by a certain factor and achieve the result we desire.
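To make this concrete, here is a minimal sketch of such a network in PyTorch. The architecture, layer sizes, and 4x factor are illustrative assumptions, not a published SR model:

```python
import torch
import torch.nn as nn

class TinySRNet(nn.Module):
    """Toy SR network: padded convolutions extract features at the input
    resolution, then a transposed convolution multiplies the spatial
    size by the upscale factor."""

    def __init__(self, scale=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),  # padding keeps H x W
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Transposed convolution ("deconvolution") upscales by `scale`.
        self.upscale = nn.ConvTranspose2d(64, 3, kernel_size=scale, stride=scale)

    def forward(self, x):
        return self.upscale(self.features(x))

# A 32x32 low-resolution batch becomes 128x128.
x = torch.randn(1, 3, 32, 32)
print(TinySRNet()(x).shape)  # torch.Size([1, 3, 128, 128])
```

Sub-pixel convolution (nn.PixelShuffle) is a common alternative to the transposed convolution for the upscaling step.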

To train such a model for SR, we can download high-resolution images from the Internet and scale them down to low resolution. We then feed these low-resolution inputs to the network and train it to create high-resolution versions, which we compare to the original high-resolution images.
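A sketch of this data preparation step follows; the directory names, file format, and scale factor are placeholder assumptions:

```python
from pathlib import Path
from PIL import Image

def make_lr_hr_pairs(hr_dir, lr_dir, scale=4):
    """Downscale every high-resolution image by `scale` (bicubic) to build
    the low-resolution training inputs; the originals stay as targets."""
    lr_dir = Path(lr_dir)
    lr_dir.mkdir(parents=True, exist_ok=True)
    for path in Path(hr_dir).glob("*.png"):
        hr = Image.open(path).convert("RGB")
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        lr.save(lr_dir / path.name)

make_lr_hr_pairs("data/hr", "data/lr", scale=4)
```

Real pipelines typically also crop fixed-size patches and apply augmentation, but the core idea is simply this pairing of downscaled inputs with their originals.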

We can measure the success of the network by how well it reduces the Mean-Squared Error (MSE) between the pixels of the output and the original version.

The MSE equation is:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\big(f(i,j) - g(i,j)\big)^2$$

where:

f = the original image matrix

g = the matrix of the generated (output) high-resolution image

M = the number of pixel rows, with row index i

N = the number of pixel columns, with column index j

Theoretically, the best result is an MSE of 0. In that case, the original high-resolution image and the high-resolution version generated by the network are identical.

Peak Signal-to-Noise Ratio (PSNR)

This metric is used to determine the quality of the generated image. PSNR is measured on a logarithmic scale and compares the highest possible pixel value (the peak signal) with the MSE between the generated image and the original. The PSNR equation is:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathit{maxvalue}^2}{\mathrm{MSE}}\right)$$

where maxvalue is the maximum possible pixel value in the image (255 for 8-bit images).

This equation expresses the ratio between the peak signal and the noise measured by the MSE: the higher the PSNR, the closer the generated image is to the original.
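Both metrics are straightforward to compute; here is a short NumPy sketch (the random test images are placeholders):

```python
import numpy as np

def mse(f, g):
    """Mean-squared error between original image f and generated image g."""
    return np.mean((f.astype(np.float64) - g.astype(np.float64)) ** 2)

def psnr(f, g, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; max_value is the maximum
    possible pixel value (255 for 8-bit images)."""
    err = mse(f, g)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_value ** 2) / err)

original = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
generated = original.copy()
generated[0, 0, 0] ^= 1  # perturb a single pixel
print(psnr(original, generated))  # very high PSNR: nearly identical images
```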

However, since we are trying to mimic real-world scenarios, a high PSNR doesn't guarantee the best result: optimizing for it can produce overly smooth images that look unreal. To make images more perceptually pleasing, we can use various CNN architectures, such as ResNet and GoogLeNet, as feature extractors for content loss, which then serves as the loss function of the network. Content loss is defined as the difference between the representations (feature maps) of the original (ground-truth) image and the generated image.
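A minimal sketch of such a content loss in PyTorch follows. The choice of ResNet-18 and the cut-off layer are assumptions for illustration (the SRGAN paper, for instance, uses VGG features), and in practice the inputs should be normalized with the statistics the pretrained backbone expects:

```python
import torch
import torch.nn as nn
from torchvision import models

class ContentLoss(nn.Module):
    """MSE between feature maps of a frozen pretrained CNN, computed for
    the ground-truth and generated images."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Keep the layers up to (and including) the second residual stage.
        self.features = nn.Sequential(*list(backbone.children())[:6]).eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the extractor stays frozen

    def forward(self, generated, target):
        return nn.functional.mse_loss(self.features(generated),
                                      self.features(target))

loss_fn = ContentLoss()
sr = torch.randn(1, 3, 128, 128)  # stand-in for a generated image
hr = torch.randn(1, 3, 128, 128)  # stand-in for the ground truth
print(loss_fn(sr, hr))
```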

Figure: The perception-distortion tradeoff


Generative Adversarial Networks

This is the most commonly used architecture for training deep learning-based SR models. The GAN architecture is based on unsupervised learning. A GAN consists of two neural networks that reside in a single framework and compete in a zero-sum game.

Figure: Example of the GAN process

GANs are capable of artificially generating audio, video, and images that mimic their human-made counterparts. The goal of this process is to take a simple input and use it to generate a complex output with a high level of accuracy.

Learn more about Generative Adversarial Networks.


Super-Resolution with Generative Adversarial Networks

Nowadays, most Single-Image Super-Resolution (SISR) techniques are quick and accurate. However, most of them don't fare so well when it comes to recovering fine-grained texture and detail from low-resolution images without distortion. This is mainly because most work, up to this point, has focused on minimizing the MSE, which is equivalent to maximizing PSNR; that yields good overall image quality but smooths away high-frequency detail.


While the images generated by SISR are of higher resolution, they are often blurry, lack high-frequency, fine-grained details and look dull in comparison to true high-resolution images.

To build a model capable of creating more perceptually satisfying and less blurry images, we need to use a model that can capture the perceptual differences between the original image and the generated one.

To achieve this, we can use Super-Resolution with Generative Adversarial Networks (SRGAN), which produces high-resolution images by combining a deep generator network with an adversarial discriminator network.

The steps to train an SRGAN are as follows:

  • Take a set of high-resolution images and down-scale them to low resolution.
  • Input the images into the generator and let it produce an output of SR images.
  • Use the discriminator to distinguish between the original high-resolution images and the generated SR images, then use back-propagation to train both the generator and the discriminator.
  • Repeat these steps until you reach satisfactory results (a minimal training-loop sketch follows this list).
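The sketch below shows one adversarial training step in PyTorch. The tiny generator and discriminator, the loss weighting, and the plain MSE standing in for a feature-based content loss are all toy assumptions; a real SRGAN uses much deeper residual architectures:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the SRGAN generator (4x upscaling) and discriminator.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=4),
)
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 1, 3, stride=2, padding=1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # one real/fake logit per image
)

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
content = nn.MSELoss()  # stand-in for a feature-based content loss

def train_step(lr_batch, hr_batch):
    # 1) Discriminator step: real HR images -> label 1, generated SR -> 0.
    sr = generator(lr_batch)
    d_real = discriminator(hr_batch)
    d_fake = discriminator(sr.detach())  # detach: don't update the generator
    d_loss = (bce(d_real, torch.ones_like(d_real))
              + bce(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Generator step: stay close to the HR target (content term) while
    #    pushing the discriminator to call the output real (adversarial term).
    d_fake = discriminator(sr)
    g_loss = content(sr, hr_batch) + 1e-3 * bce(d_fake, torch.ones_like(d_fake))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

# One step on random tensors standing in for a real LR/HR batch.
print(train_step(torch.randn(4, 3, 32, 32), torch.randn(4, 3, 128, 128)))
```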

Running and Scaling Up GAN with MissingLink

Running GAN models, such as training an SRGAN, typically involves intensive and complex workloads. These models require powerful hardware to run efficiently, and training usually takes many hours or even days.

You can run these models on systems with multiple Graphics Processing Units (GPUs), which are often used to run DL models more efficiently. However, this solution is highly expensive. Another option is to run these models on cloud-based systems, but then you need to constantly keep track of your experiments to avoid wasting time and money.

To run GAN models efficiently and avoid unnecessary expenditure, you can use a deep learning platform like MissingLink. With the MissingLink platform, you can run GAN and other deep learning experiments across multiple machines.

MissingLink lets you automate, track and record your experiments, so you can scale your GAN model in minutes and let MissingLink run experiments simultaneously to avoid idle time.

Request a demo and see how easy it is to train GAN models with MissingLink.

Train Deep Learning Models 20X Faster

Let us show you how you can:

  • Run experiments across hundreds of machines
  • Easily collaborate with your team on experiments
  • Reproduce experiments with one click
  • Save time and immediately understand what works and what doesn’t

MissingLink is the most comprehensive deep learning platform to manage experiments, data, and resources more frequently, at scale and with greater confidence.

Request your personal demo to start training models faster
