Figure 6.15: Sample results to show the comparison between Temporal and G-M-EP-FA configurations. V# denotes the video number. Magnify the figure to see the rain-streaks that remain visible in G-M-EP-FA.
should be retained to make the de-rained image look realistic.
Figure 6.16: Sample results to show the comparison between Temporal and G-M-FP-FA configurations. V# denotes the video number. Quantitative results are given in Tables 6.15, 6.16, 6.17, and 6.18 of this chapter.
image/video de-raining. This may help in avoiding color distortions in the de-rained frames. (b) Most deep-learning-based video noise removal methods treat the objectives of spatial and temporal enhancement separately. In this work, however, we attempt to unify these objectives and rely entirely on the proposed model to inherently estimate the optical flow and, subsequently, the de-rained frames. We thus present a light-weight deep CNN for video de-raining that is not only resource-efficient but also overcomes the heavy motion blur caused by rapid motion between frames, owing to its inherent optical-flow estimation and its frame-recurrent nature. (c) While the encoder-decoder model may have improved the spatial resolution of the de-rained frames, the frame-recurrent methodology and the temporal loss from the adversary may have further enhanced the performance of the proposed model by eliminating imprints of previous frames and the disappearance of objects. This may be due to a good choice of temporal width in the input, which is 3 in our case. However, the effect of increasing or decreasing the temporal width on the performance of the network is left as future work.
Figure 6.17: Comparison between the Temporal and G-M-EP-EA-N configurations. V# denotes the video number.
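The frame-recurrent inference described above, with a temporal width of 3, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the generator is a hypothetical callable `generator(window, prev)`, the window is assumed to be centred on the current frame with border frames repeated at the ends of the video, the first rainy frame is assumed to stand in for the missing previous output at t = 0, and additional inputs such as a rain-streak map are left out.

```python
import numpy as np


def temporal_windows(frames, width=3):
    """Yield (t, window) pairs, where window stacks `width` consecutive frames
    centred on frame t (assumption); indices past either end of the video are
    clamped, so the border frame is repeated."""
    if width % 2 == 0:
        raise ValueError("width must be odd for a centred window")
    n = len(frames)
    half = width // 2
    for t in range(n):
        idx = [min(max(t + k, 0), n - 1) for k in range(-half, half + 1)]
        yield t, np.stack([frames[i] for i in idx], axis=0)


def derain_video(frames, generator, width=3):
    """Frame-recurrent loop: the de-rained estimate for frame t is fed back
    as an input when predicting frame t + 1."""
    prev = frames[0]  # hypothetical initialisation for the first frame
    outputs = []
    for _, window in temporal_windows(frames, width):
        current = generator(window, prev)
        outputs.append(current)
        prev = current  # recurrence: reuse the latest prediction
    return outputs


# Usage with a stand-in "generator" that blends the window mean with the
# previous estimate (a placeholder for the trained CNN).
frames = [np.full((4, 4), float(i)) for i in range(5)]
restored = derain_video(frames, lambda w, p: 0.5 * w.mean(axis=0) + 0.5 * p)
print(len(restored), restored[0].shape)
```

Feeding the previous estimate back in is what distinguishes a frame-recurrent model from one that only sees a sliding window of rainy frames.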
6.6 Summary
In this contributory chapter, we have presented a light-weight, unified, deep-learning-based frame-recurrent method for video rain-streak removal, built upon the Conditional GAN framework. The proposed generator takes the previously estimated de-rained frame and a rain-streak map to predict the current rain-free frame of a rainy video, whereas the adversary is a multi-contextual 3D convolution-based CNN that classifies a set of de-rained frames as real or fake. In addition to the traditional L2 loss, we have also adopted a perceptual cost function to optimize the proposed model. Instead of the traditional cross-entropy loss from the adversary, we use the Euclidean distance between the feature maps returned by the adversary to optimize the generator for video de-raining. To demonstrate the efficacy of the proposed method, we have given an extensive comparison with ten state-of-the-art methods for video and image de-raining, using fourteen image quality metrics on eleven test-sets. We have also shown the applicability of the proposed model to real-world rainy videos. In terms of computation, the proposed model requires less time than the other existing methods compared, at ∼1.5 seconds per frame, to estimate the rain-free video.
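The generator objective summarized above (an L2 term, a perceptual term, and a feature-matching term computed on the adversary's feature maps instead of its real/fake output) can be sketched as below. This is a minimal NumPy sketch, not the thesis code: `perceptual_features` and `disc_features` are hypothetical callables returning lists of feature maps (for example, from a pretrained network and from intermediate discriminator layers), the squared Euclidean distance is averaged per feature map, and the weights `w_l2`, `w_perc` and `w_adv` are assumed placeholders rather than the values used in the thesis.

```python
import numpy as np


def l2_loss(pred, target):
    """Pixel-wise mean squared error between prediction and ground truth."""
    return float(np.mean((pred - target) ** 2))


def feature_distance(feats_a, feats_b):
    """Sum over layers of the mean squared Euclidean distance between
    corresponding feature maps."""
    return float(sum(np.mean((a - b) ** 2) for a, b in zip(feats_a, feats_b)))


def generator_loss(pred, target, perceptual_features, disc_features,
                   w_l2=1.0, w_perc=1.0, w_adv=1.0):
    """Weighted sum of the L2, perceptual and feature-matching terms."""
    loss_l2 = l2_loss(pred, target)
    loss_perc = feature_distance(perceptual_features(pred),
                                 perceptual_features(target))
    # Feature matching: compare the adversary's internal representations of
    # the de-rained and clean frames rather than its real/fake decision.
    loss_adv = feature_distance(disc_features(pred), disc_features(target))
    return w_l2 * loss_l2 + w_perc * loss_perc + w_adv * loss_adv


# Usage with toy feature extractors standing in for trained networks.
pred = np.zeros((3, 8, 8))    # e.g. a stack of 3 de-rained frames
target = np.ones((3, 8, 8))   # the corresponding clean frames
perc = lambda x: [x]
disc = lambda x: [x, 2.0 * x]
print(generator_loss(pred, target, perc, disc))  # → 7.0
```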
The next chapter concludes the thesis by briefly summarizing the work presented in it and discussing directions for future research.
Chapter 7
Conclusion and Future Works
The main objective of this dissertation is to propose image and video restoration algorithms that obtain noise-free images and videos without compromising visual quality. Two major tasks have been achieved in this research work: first, analyzing the noise characteristics of a noisy image or video; and second, devising deeper models to remove such noise based on those characteristics.
In this chapter, we summarize the major contributions of this thesis and highlight directions for future research.
7.1 Summary of the Contributions
The following subsections summarize the contributions of this thesis.
7.1.1 Exploiting Efficient Spatial Upscaling for Single Image De-Raining
In the first contributory chapter, a learning-based approach empowered with efficient sub-pixel upscaling and adversarial training has been presented to avoid over-coloring and white-dot artifacts in the de-rained images. The proposed approach utilizes only the luminance channel of the rainy images to bypass the visual artifacts arising from the correlated RGB domain. It has been shown that efficient sub-pixel upscaling is more beneficial than traditional deconvolution for single image de-raining.
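Two ingredients of this contribution, working on the luminance channel and sub-pixel upscaling, can be illustrated with the minimal NumPy sketch below. It assumes the ITU-R BT.601 luma weights for the luminance channel and a channel-first `(C·r², H, W)` layout for the sub-pixel (pixel-shuffle) rearrangement, following the common convention of, e.g., PyTorch's `PixelShuffle`; both are assumptions, not details confirmed by the thesis. Unlike a transposed convolution, the shuffle itself has no weights: the preceding convolution predicts r² sub-pixel values per low-resolution location, which are then rearranged into an r-times larger grid.

```python
import numpy as np


def luminance(rgb):
    """Y channel of an (H, W, 3) RGB image using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Output pixel (c, h*r + i, w*r + j) takes input channel c*r*r + i*r + j
    at position (h, w)."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)    # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


# Four feature channels at one low-resolution pixel become one 2x2 patch.
feats = np.arange(4).reshape(4, 1, 1)
print(pixel_shuffle(feats, 2))
```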
7.1.2 Exploiting Transformed Domain Features for Single Image De-Raining
The second contribution introduces transformed domain coefficients of rain-streaks into deep learning. In the first part of this contribution, an uncorrelated transformed domain has been exploited by processing DFT coefficients with a deep CNN: the proposed approach takes the DFT coefficients of the rainy image as input and outputs those of the de-rained image. In the second part of the chapter, a correlated transformed domain has been exploited, in terms of DWT coefficients, for the same task. It has been shown that a significant improvement can be achieved when correlated transformed domain cues are given as input to a deep CNN in addition to spatial domain features.
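The two transformed-domain representations can be illustrated with the minimal NumPy sketch below. It assumes that the DFT coefficients of a single-channel image are presented to the CNN as two real-valued channels (real and imaginary parts), and it uses a one-level orthonormal Haar transform as a stand-in for the DWT; the choice of wavelet, the sub-band naming, and the channel layout are assumptions, not details confirmed by the thesis.

```python
import numpy as np


def to_dft_channels(img):
    """2-D DFT of an (H, W) image as a (2, H, W) real array:
    channel 0 holds the real part, channel 1 the imaginary part."""
    spec = np.fft.fft2(img)
    return np.stack([spec.real, spec.imag], axis=0)


def from_dft_channels(coeffs):
    """Inverse of to_dft_channels: rebuild the complex spectrum and
    return the real part of its inverse DFT."""
    spec = coeffs[0] + 1j * coeffs[1]
    return np.real(np.fft.ifft2(spec))


def haar_dwt2(img):
    """One-level orthonormal 2-D Haar DWT of an (H, W) image with even H, W.

    Returns four (H/2, W/2) sub-bands: the approximation and the details
    across rows, across columns, and along the diagonal (LH/HL naming
    conventions differ between libraries)."""
    a = img[0::2, 0::2]   # top-left of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    approx = (a + b + c + d) / 2
    row_detail = (a + b - c - d) / 2    # top row minus bottom row
    col_detail = (a - b + c - d) / 2    # left column minus right column
    diag_detail = (a - b - c + d) / 2
    return approx, row_detail, col_detail, diag_detail


# Round trip through the DFT representation a CNN would see.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
print(np.allclose(from_dft_channels(to_dft_channels(img)), img))  # → True
```

Because the Haar transform above is orthonormal, the total energy of the four sub-bands equals that of the input image, so no information is lost before the coefficients reach the network.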
7.1.3 A Probe Towards Scale-Space Invariant Conditional GAN for Image De-Hazing
The third contribution explores scale-space invariance in deep CNNs for single image de-hazing by utilizing the Laplacian of Gaussian (LoG) of the images. The LoG preserves a variety of edge structures, which can be utilized to remove halo artifacts in the de-hazed images. The proposed model incorporates the Euclidean distance between the LoG features of the de-hazed and clean ground-truth images as a supervised cost function to optimize the conditional GAN-based framework.
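A multi-scale LoG cost of the kind summarized above can be sketched in NumPy as follows. This is a minimal sketch, not the thesis implementation: the set of scales `sigmas`, the σ²-normalisation of the LoG responses, edge padding at the image borders, and the use of a mean squared (rather than un-squared) Euclidean distance are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_kernel(sigma):
    """Sampled Laplacian-of-Gaussian kernel, shifted to sum to zero so that
    flat regions produce no response."""
    radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1, dtype=float)
    xx, yy = np.meshgrid(ax, ax)
    r2 = (xx ** 2 + yy ** 2) / (2 * sigma ** 2)
    k = -(1.0 / (np.pi * sigma ** 4)) * (1.0 - r2) * np.exp(-r2)
    return k - k.mean()


def log_response(img, sigma):
    """Scale-normalised LoG response (sigma^2 * LoG * img), same size as img."""
    k = log_kernel(sigma)
    pad = k.shape[0] // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, k.shape)   # (H, W, kh, kw)
    # The kernel is symmetric, so correlation equals convolution here.
    return sigma ** 2 * np.einsum("ijkl,kl->ij", windows, k)


def log_loss(dehazed, clean, sigmas=(1.0, 2.0, 4.0)):
    """Sum over scales of the mean squared difference of LoG responses."""
    return float(sum(
        np.mean((log_response(dehazed, s) - log_response(clean, s)) ** 2)
        for s in sigmas))


# A de-hazed output that misses an edge is penalised; a perfect one is not.
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0
print(log_loss(clean, clean), log_loss(np.zeros((32, 32)), clean) > 0)
```

Comparing responses at several σ is what ties the cost to scale space: edges of different widths are matched at the scale where they respond most strongly.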
7.1.4 Frame-Recurrent Multi-Contextual Adversarial Network for Video De-Raining
In the final contribution, a unified multi-contextual deep CNN has been proposed for video de-raining. It has been experimentally shown that the proposed multi-contextual 3D convolution-based design is highly beneficial for efficient video de-raining. The method is further empowered with adversarial and perceptual cost functions.
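The idea of a multi-contextual 3D convolutional design can be illustrated with the minimal NumPy sketch below. How the thesis realises multiple contexts is not specified in this summary; the sketch assumes, purely for illustration, parallel 3D branches that span the same temporal extent but different spatial extents, each followed by a ReLU, with their outputs stacked as channels. The box-filter kernels in the usage example are hypothetical stand-ins for learned weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(volume, kernel):
    """'Same'-size 3-D correlation of a (T, H, W) volume with an odd-sized
    (kt, kh, kw) kernel, using edge padding."""
    pads = [(k // 2, k // 2) for k in kernel.shape]
    padded = np.pad(volume, pads, mode="edge")
    windows = sliding_window_view(padded, kernel.shape)  # (T, H, W, kt, kh, kw)
    return np.einsum("thwabc,abc->thw", windows, kernel)


def multi_context_block(volume, kernels):
    """Run parallel 3-D branches with different receptive fields and stack
    their ReLU-activated outputs along a new channel axis."""
    return np.stack([np.maximum(conv3d_same(volume, k), 0.0) for k in kernels],
                    axis=0)


# Three frames (temporal width 3) and two hypothetical branches: a 3x3x3 and
# a 3x5x5 averaging kernel, i.e. a small and a larger spatial context.
frames = np.random.default_rng(0).random((3, 16, 16))
kernels = [np.full((3, 3, 3), 1 / 27), np.full((3, 5, 5), 1 / 75)]
features = multi_context_block(frames, kernels)
print(features.shape)  # → (2, 3, 16, 16)
```

Each branch mixes information across all three frames at once, which is the property that lets a 3D design exploit temporal context directly rather than through a separate optical-flow module.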
7.2 Future Works
The work presented in this dissertation can be extended in several directions, as listed below:
• The works proposed in Chapters 3, 4, and 5 can be extended to the respective video restoration tasks. In particular, it would be interesting to see how learning-based methods perform when given transformed domain coefficients of temporally connected noisy frames for video de-noising.
• The work proposed in Chapter 5 can be re-engineered to build scale-space invariance into the architecture itself, instead of utilizing it only as a supervised cost function.
• The approach presented in the last contributory chapter can be extended to other video restoration tasks, such as video de-snowing and inpainting.
• The presented ideas may also be extended, using deep learning techniques, to image or video de-noising in entirely different domains, such as underwater or satellite optical imagery.
