In part 1, we reviewed Single Image Super Resolution (SISR) methods and Zero-Shot Super Resolution in particular. In part 2, we discussed how to run the code example with Keras (TensorFlow backend), OpenCV and a free account on MissingLink.ai. Applications for this technique include medical imaging, working with compressed images, agriculture analysis, autonomous driving, satellite imagery, reconnaissance and more. Now, it’s time for us to go through the code, discuss how it works and highlight ways to modify it for future experiments.
By this point you should have already been able to run the project. If you have not set it up, please review part 2 or the ReadMe file. We’ll be highlighting specific blocks of code from the main.py file to better explain what is going on under the hood. As a quick recap, you’ll need to clone the project, install the dependencies, connect the MissingLink.ai account to the project and preview the experiment in real-time on the dashboard.
In part 2 we saw how to run the code and how to tweak the parameters to get different results in both quality and type.
In this part we will take a deep dive into the code implementation and explain the main functions and evaluation metrics. Let’s start!
Building the Model
We create the model using the Keras functional model API. This allows us to create more complex architectures than the common Sequential model. Let’s take a look at how this function works.
The build_model() function is where we define and compile the model.
We defined the model parameters so that the input and output sizes will remain equal. We then create a skip connection from the input layer to the output layer. This can be thought of as a Residual block. Some works combine a few of these (with trickier skip connections) to create an even more complex model. Most of the parameters are configured as variables so the model can be easily changed in myriad ways.
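To make the idea concrete, here is a minimal sketch of a model of this kind. It is not the project's build_model(): the function name build_model_sketch, the layer count, the filter count, the optimizer and the loss are all assumptions chosen for illustration. What it does show is the two points above: "same" padding keeps the output the same size as the input, and an Add layer provides the skip connection from input to output.

```python
import numpy as np
from tensorflow.keras.layers import Add, Conv2D, Input
from tensorflow.keras.models import Model


def build_model_sketch(channels=3, filters=64, depth=8, kernel_size=3):
    """Fully convolutional net with one input-to-output skip connection.

    All default values here are illustrative assumptions.
    """
    # None for height and width lets the model accept images of any size.
    inputs = Input(shape=(None, None, channels))
    x = inputs
    for _ in range(depth - 1):
        # padding="same" keeps the spatial size unchanged after each layer.
        x = Conv2D(filters, kernel_size, padding="same", activation="relu")(x)
    # Last layer maps back to the image channels; it predicts a residual.
    residual = Conv2D(channels, kernel_size, padding="same")(x)
    # Skip connection: output = input + predicted residual.
    outputs = Add()([inputs, residual])
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mae")
    return model


model = build_model_sketch(filters=8, depth=3)
dummy = np.zeros((1, 16, 24, 3), dtype="float32")
print(model.predict(dummy, verbose=0).shape)  # (1, 16, 24, 3)
```

Because the network only has to learn the difference between the interpolated input and the sharp target, the skip connection usually makes training easier than learning the full image from scratch.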
PSNR and SSIM
PSNR and SSIM are common super-resolution and reconstruction (compression) metrics. We use PSNR and SSIM for evaluation when a ground-truth image is available.
PSNR (Peak Signal-to-Noise Ratio) measures the quality of our reconstructed image versus the ground-truth image. The higher the outcome, the better. This is handled in the psnr_calc(img1, img2, scaling_fact) function.
SSIM (Structural Similarity Index) compares the structural information in the two images. According to Wikipedia, “structural information is the idea that the pixels have strong inter-dependencies, especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is a significant activity or ‘texture’ in the image.”
You can learn more about peak signal-to-noise ratio on Wikipedia.
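For reference, PSNR is defined as 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the two images. The sketch below implements just that definition with NumPy. The name psnr_sketch and the max_value parameter are ours, and it does not reproduce whatever the project's psnr_calc() does with its scaling_fact argument.

```python
import numpy as np


def psnr_sketch(reference, reconstructed, max_value=255.0):
    """PSNR in dB between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((reference - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * np.log10(max_value ** 2 / mse)


ground_truth = np.zeros((4, 4))
off_by_one = ground_truth + 1.0  # every pixel wrong by 1, so MSE = 1
print(round(psnr_sketch(ground_truth, off_by_one), 2))  # 48.13
```

Since the score is logarithmic, halving the root-mean-square error raises PSNR by about 6 dB.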
The metric_results function simply checks if we have a ground truth image for comparison before calling the metric calculation function.
In predict_func() we get the super-resolution image by running the trained model on the original image. The main trick here is that we first need to interpolate the original image to the target size, that is, the original size times the SR factor. For example, for a 200×100 image and an SR factor of 2, we get an interpolated image of size 400×200. We then call model.predict() on this interpolated image to get the super-resolution result. In effect, the neural network only improves the quality of the interpolated image.
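The interpolate-then-refine flow can be sketched as follows. This is an illustration, not the project's predict_func(): predict_sketch and upscale_nearest are hypothetical names, nearest-neighbour upscaling stands in for the smoother interpolation an OpenCV resize would give, and model_fn is a placeholder for a call such as model.predict.

```python
import numpy as np


def upscale_nearest(image, factor):
    """Nearest-neighbour upscaling; a stand-in for real interpolation."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def predict_sketch(image, model_fn, sr_factor=2):
    """Interpolate to the target size, then let the model refine the result."""
    interpolated = upscale_nearest(image, sr_factor)
    # Models expect a batch dimension: (1, height, width, channels).
    batch = interpolated[np.newaxis, ...].astype(np.float32)
    refined = model_fn(batch)
    return refined[0]


# A 200x100 (width x height) image is an array of shape (100, 200, 3).
lr_image = np.zeros((100, 200, 3), dtype=np.uint8)
sr_image = predict_sketch(lr_image, model_fn=lambda batch: batch)
print(sr_image.shape)  # (200, 400, 3)
```

With the identity function as model_fn, the output is just the interpolated image, which makes it easy to see that the network's job is limited to sharpening an image that is already at the target size.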
Another method used in the paper is to run model.predict() on all eight possible orientations of the interpolated image (4 rotations, each with and without a flip) and then take the per-pixel median of the accumulated results.
An important point to note is that every super-resolution output must be converted back to its original state, both rotation-wise and flip-wise, before the median is taken. We also use a few extra helper functions.
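Here is a minimal NumPy sketch of that eight-orientation ensemble. The name ensemble_predict_sketch is ours, and model_fn is a placeholder for whatever maps one interpolated image to one super-resolved image. The part to pay attention to is the inverse step: the flip is undone first and the rotation second, the reverse of the order in which they were applied.

```python
import numpy as np


def ensemble_predict_sketch(image, model_fn):
    """Median over predictions from 4 rotations x 2 flip states."""
    outputs = []
    for k in range(4):  # rotate by k * 90 degrees
        for flip in (False, True):
            transformed = np.rot90(image, k)
            if flip:
                transformed = np.fliplr(transformed)
            result = model_fn(transformed)
            # Undo the transforms in reverse order: flip first, then rotation.
            if flip:
                result = np.fliplr(result)
            result = np.rot90(result, -k)
            outputs.append(result)
    # Per-pixel median across the eight aligned predictions.
    return np.median(np.stack(outputs, axis=0), axis=0)


image = np.random.default_rng(0).random((6, 10, 3))
restored = ensemble_predict_sketch(image, model_fn=lambda x: x)
print(np.allclose(restored, image))  # True
```

The identity model is a useful sanity check: if any orientation were restored incorrectly, the stacked outputs would not line up and the median would no longer match the input.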
The next helper function allows us to add Gaussian noise to the LR image before training. This helps in getting better results on low-quality images.
When adding a lot of noise, we can even achieve de-noising on low SNR images.
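A noise helper of this kind can be sketched in a few lines. This is an illustration rather than the project's function: the name add_gaussian_noise_sketch, the std and seed parameters, and the assumption of 8-bit images in the 0 to 255 range are all ours.

```python
import numpy as np


def add_gaussian_noise_sketch(image, std=5.0, seed=None):
    """Add zero-mean Gaussian noise to an 8-bit image."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=std, size=image.shape)
    # Work in float so the noise is not truncated, then clip to the valid range.
    noisy = image.astype(np.float64) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


gray = np.full((32, 32, 3), 128, dtype=np.uint8)
noisy = add_gaussian_noise_sketch(gray, std=10.0, seed=0)
print(noisy.shape, noisy.dtype)  # (32, 32, 3) uint8
```

The standard deviation is the knob to experiment with: a small value acts as mild regularization, while a larger one pushes the model toward de-noising behavior.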
preprocess(image, scale_fact, scale_fact_int, i)
The preprocess() function is where we control the different types of manipulations used on the HR father to create our augmentation pairs. It takes four inputs:
- image — the original image,
- scale_fact — the factor by which we downscale from image to LR,
- scale_fact_inter — the scaling factor by which we downscale from the HR fathers to the LR sons,
- i — an iteration index to help us keep track of the augmentation number.
Let’s walk through what the function does.
The first thing it needs to do is create a scale_down version of the original image by downsampling it by a random factor. We then use rotations and flips to create more variance in the data. From this manipulated HR father, we create an LR son by copying the father. We then blur the son by downsampling it (using SR_FACTOR) and then upsampling it (using SR_FACTOR) back to the same size as the HR father. This blurring mechanism emulates a difference in resolution by some factor SR, which we choose before training.
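The father/son construction can be sketched with NumPy alone. This is a simplified illustration, not the project's preprocess(): make_pair_sketch, box_downscale and nearest_upscale are hypothetical names, the initial random scale-down is left out, an integer SR factor is assumed, and box averaging plus nearest-neighbour repetition stand in for proper image interpolation. Images are assumed to have shape (height, width, channels).

```python
import numpy as np


def box_downscale(image, factor):
    """Downscale by averaging factor x factor blocks."""
    h = image.shape[0] // factor * factor
    w = image.shape[1] // factor * factor
    cropped = image[:h, :w].astype(np.float64)
    blocks = cropped.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))


def nearest_upscale(image, factor):
    """Upscale by repeating each pixel factor times along both axes."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def make_pair_sketch(image, sr_factor=2, rng=None):
    """Return (HR father, blurred LR son), both with the same shape."""
    if rng is None:
        rng = np.random.default_rng()
    # Random rotation (multiple of 90 degrees) and optional flip.
    father = np.rot90(image, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        father = np.fliplr(father)
    # Crop so both dimensions are divisible by the SR factor.
    h = father.shape[0] // sr_factor * sr_factor
    w = father.shape[1] // sr_factor * sr_factor
    father = father[:h, :w].astype(np.float64)
    # Down then up by the same factor: same size, less detail.
    son = nearest_upscale(box_downscale(father, sr_factor), sr_factor)
    return father, son


image = np.arange(7 * 9 * 3).reshape(7, 9, 3).astype(np.uint8)
father, son = make_pair_sketch(image, sr_factor=2, rng=np.random.default_rng(0))
print(father.shape == son.shape)  # True
```

The model is then trained to map each son back to its father, which is the same relationship we later ask it to apply between the interpolated input image and the unknown higher-resolution result.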
The remaining helper functions get us the paths for the files we need for the different parts of the learning and testing processes. They work both when running with a local data set and with MissingLink’s Data Management.
Super-resolution is still an open problem. While many great papers with realistic results have been published in the last few years, creating super-resolved images or videos without adding unreal artifacts is still an issue.