In this chapter, background concepts on the convolutional neural network (CNN) have been presented. These concepts are used to design the learning-based image and video restoration frameworks presented in the later chapters of this thesis. In addition, the evaluation metrics and datasets used to evaluate the methods are presented in this chapter.
With this background, this thesis’s first contribution will be discussed in the next chapter, where the task of single image de-raining will be addressed in the spatial domain.
Exploiting Efficient Spatial Upscaling for Single Image De-Raining
It has been observed in Section 1.3 that low-level vision tasks, e.g. image and video de-noising, can be beneficial for high-level tasks, especially in bad weather conditions. Bad weather can take several forms, including heavy rainfall, haze or fog, and snowfall. The efficiency of outdoor vision tasks, such as autonomous vehicle navigation systems, depends on the visual quality of the images and videos, which can be severely degraded by rainfall or haze in bad weather conditions. Therefore, it becomes necessary to propose efficient and real-time-friendly learning-based methods for improving the visual quality of degraded images and videos in bad weather scenarios. In this dissertation, firstly, the task of rain-streak removal in images has been studied.
Rain-streaks exhibit a pseudo-periodic, additive nature in an image (Eq. (1.1)).
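As an illustration only, the additive model of Eq. (1.1) can be sketched with a toy NumPy example; the image size and the crude vertical-streak pattern below are made up for demonstration and are not part of the proposed method:

```python
import numpy as np

# Toy illustration of the additive rain model O = B + R (Eq. (1.1)):
# the observed rainy image O is the clean background B plus a rain-streak layer R.
rng = np.random.default_rng(0)

B = rng.uniform(0.0, 1.0, size=(64, 64))   # clean background image in [0, 1]
R = np.zeros_like(B)
R[:, ::8] = 0.4                            # crude pseudo-periodic vertical streaks
O = np.clip(B + R, 0.0, 1.0)               # rainy observation

# De-raining aims to recover B from O; given the true R, the recovery is
# exact wherever the clipping above did not saturate the sum.
B_hat = O - R
```

The learning task, of course, is the hard direction: estimating B (or equivalently R) from O alone.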
It has been observed from the limitations of the existing works mentioned in Chapter 1 that a majority of existing methods suffer from over-coloring and white-dot artifacts in the de-rained images (see Figure 1.7). In other words, most state-of-the-art works fail to reconstruct the original perceptual quality of the clean image. This may be due to the improper usage of deconvolution layers during the reconstruction of the de-rained images in network engineering. A deconvolution layer with stride > 1 may induce blocky visual artifacts. Furthermore, processing rainy images using deep CNNs in a highly correlated color space, such as RGB, may not be very beneficial. This may be one of the reasons why the de-rained images of existing works that operate in RGB color space suffer from the over-coloring problem. Also, the high-level features from a deep CNN inherently capture white round particles, so the perceptual loss may enhance the white-dot artifacts in the de-rained images.
These limitations motivate us to enhance the perceptual quality of the rain-free image, and therefore the contributions made in this chapter can be summarised as follows:
• U-Net based architectures have been very successful in image de-noising and reconstruction tasks, due to their ability to preserve the features important for reconstructing the image while discarding irrelevant and noisy components. Therefore, an image de-raining model, namely HRID-GAN, has been proposed based on the U-Net framework.
• It has also been proposed to use efficient sub-pixel convolution instead of conventional deconvolution layers to avoid blocky visual artifacts in the generated de-rained images.
• To further improve the quality of the de-rained image generated by the encoder-decoder network, a deep residual network has been used.
• cGAN-based adversarial training has been incorporated in order to achieve better de-rained images.
• An ablation study has been given at the end of this chapter to demonstrate the effects of certain modules in the network with detailed comparisons.
3.1 Sub-pixel Convolution
During the downsampling of a noisy image, the most prominent features remain in a compressed form, whereas the high-frequency details such as noise are discarded. To reconstruct the de-noised image, the most common method used is transposed convolution, which is also popularly known as deconvolution.

Figure 3.1: Schematic diagram of sub-pixel upscaling.
Even though bicubic interpolation is a special case of deconvolution, it has been observed that transposed convolution (stride > 1) often induces blocky visual artifacts in the generated images.
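The root cause of these artifacts can be sketched with a small, hypothetical 1-D counting experiment; the `overlap_counts` helper below is purely illustrative and not part of the proposed network. It counts how many kernel taps contribute to each output pixel of a transposed convolution:

```python
import numpy as np

# Sketch of why stride > 1 transposed convolution causes blocky
# ("checkerboard") artifacts: when the kernel size is not divisible by the
# stride, output pixels receive unequal numbers of kernel contributions.
def overlap_counts(n_in: int, kernel: int, stride: int) -> np.ndarray:
    """Count kernel contributions per output pixel of a 1-D transposed conv."""
    n_out = (n_in - 1) * stride + kernel
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):                 # each input pixel "stamps" the kernel
        counts[i * stride : i * stride + kernel] += 1
    return counts

uneven = overlap_counts(n_in=6, kernel=3, stride=2)   # 3 % 2 != 0 -> alternating 1/2 overlap
even = overlap_counts(n_in=6, kernel=4, stride=2)     # 4 % 2 == 0 -> uniform interior overlap
```

Choosing a kernel size divisible by the stride makes the interior overlap uniform and is one common mitigation; sub-pixel upscaling, described next, avoids the issue altogether by never using a stride greater than 1.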
The other way to upscale an image is to perform the convolution operation with a fractional stride of 1/r and then perform the pixel-shuffle operation PS based on the following equation.
PS(T)_{x,y,c} = T_{⌊x/r⌋, ⌊y/r⌋, C·r·mod(y,r) + C·mod(x,r) + c}    (3.1)
It can be explained in detail using the schematic diagram presented in Figure 3.1, where the input Low Resolution (LR) features are upscaled to a High Resolution (HR) feature map without utilizing stride > 1. The LR features of shape H × W × C are upscaled to HR features of shape H·r × W·r × C, where r denotes the upscaling factor. For this, first, r² different convolution filters are used to generate the r² feature maps. Later, these generated r² feature maps are periodically shuffled using the PS operation to get the desired HR features. In this way, the drawback associated with deconvolution when stride > 1 can be avoided.
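A minimal NumPy transcription of the PS operation in Eq. (3.1) might look as follows; the `pixel_shuffle` helper is illustrative only, and deep learning frameworks ship an equivalent built-in (e.g. PyTorch's `nn.PixelShuffle`, which uses a channel-first layout and a different channel ordering):

```python
import numpy as np

# Sketch of the pixel-shuffle (PS) operation of Eq. (3.1): C*r^2 low-resolution
# channels of shape H x W are rearranged into a high-resolution map of shape
# (H*r) x (W*r) x C, without any strided (de)convolution.
def pixel_shuffle(T: np.ndarray, r: int) -> np.ndarray:
    """Rearrange T of shape (H, W, C*r^2) into (H*r, W*r, C)."""
    H, W, Cr2 = T.shape
    C = Cr2 // (r * r)
    out = np.empty((H * r, W * r, C), dtype=T.dtype)
    for x in range(H * r):
        for y in range(W * r):
            for c in range(C):
                # Direct transcription of Eq. (3.1):
                # PS(T)_{x,y,c} = T_{floor(x/r), floor(y/r), C*r*mod(y,r) + C*mod(x,r) + c}
                out[x, y, c] = T[x // r, y // r, C * r * (y % r) + C * (x % r) + c]
    return out
```

For instance, with r = 2 and C = 1, a single 1 × 1 × 4 input is spread over a 2 × 2 output, each LR channel filling one sub-pixel position of the HR grid.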
In this work, instead of deconvolution, we have utilized sub-pixel upscaling to generate artifact-free de-rained images.