Deep Learning-based Techniques for Image and Video Restoration

Academic year: 2023

Share "Deep Learning-based Techniques for Image and Video Restoration"

Copied!
206
0
0

Loading.... (view fulltext now)

Full text

In the second contributing chapter, the transformed-domain properties of the rain streaks in an image are exploited to reduce the rain-induced noise. I have complied with the norms and guidelines given in the Code of Ethical Conduct of the Institute.

Image Restoration

Given the rainy image xn, the goal of the single-image de-raining problem is to estimate the rain-free image yc. Similarly, given the hazy image xn, the single-image de-hazing task aims to recover the clean image yc.
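Most single-image de-raining works assume an additive composition of the rainy image; a minimal sketch of that common assumption, using the notation of the text (r denotes the rain-streak layer), is:

```latex
% Additive observation model commonly assumed in single-image de-raining:
x_n = y_c + r, \qquad \hat{y}_c = x_n - \hat{r}
% x_n: observed rainy image, y_c: latent clean image, r: rain-streak layer
```

Under this model, estimating the rain residual is equivalent to estimating the clean image itself.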

Figure 1.2: Graphical demonstration of the single-image rain-streak removal problem.

Video Restoration

Applications

Outdoor surveillance and tracking: Improved visibility can also be useful for outdoor surveillance and tracking [29].

Literature Survey

Image De-Raining

  • Layer Separation Methods
  • Deep Learning-based Methods

To remove the rain streaks from the high-frequency part, the Canny edge detection algorithm [41] was used. The authors of [42] observed that the high-frequency component of a rainy image contains the major portion of the rain streaks.

Figure 1.6: A qualitative demonstration of the synthetic images proposed in [3] to show the different rain-streak densities in the images.

Image De-Hazing

  • Handcrafted Features-based Arts
  • Deep Learning-based Methods

[61] proposed a method, namely AOD-Net, which does not predict the transmission map and atmospheric light separately. Instead, it generates the haze-free image directly using a lightweight CNN, unifying the transmission map and atmospheric light estimation steps within a single unit known as the K-estimation module.
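For context, the reformulation published in the AOD-Net paper folds the transmission map t(x) and the atmospheric light A of the atmospheric scattering model into a single variable K(x):

```latex
J(x) = K(x)\,I(x) - K(x) + b, \qquad
K(x) = \frac{\frac{1}{t(x)}\,\big(I(x) - A\big) + (A - b)}{I(x) - 1}
```

Here I is the hazy input, J the recovered haze-free image, and b a constant bias; since the K-estimation module learns K(x) directly, a single unit replaces the two separate estimation steps.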

Video De-Raining

  • Layer Separation Methods
  • Deep Learning-based Methods

Also, the proposed work could not generalize to the extent and direction of the rain streaks. Jiang et al. [75] took into account the intrinsic characteristics of rain streaks and used the alternating direction method of multipliers (ADMM) algorithm for an efficient solution.

Motivation and Objectives

A graphical demonstration of the excess-color and white-spot artifacts in the de-rained results of existing works. Propose an end-to-end deep learning-based model that uses the scale-space information of the hazy image for de-hazing a single image.

Figure 1.8: A graphical demonstration of the color degradation, halo, and checkerboard artifacts in the de-hazed results of existing works.

Contribution of the Thesis

Exploiting Efficient Spatial Upscaling for Single Image De-Raining

Exploiting Transformed Domain Features for Single Image De-Raining

In addition to the spatial-domain features of the rainy image, the proposed method uses these subbands to generate artifact-free clean images.

A Probe Towards Scale-Space Invariant Conditional GAN

Organization of the Thesis

Summary

Based on the shortcomings of the existing literature, the objectives of the research are defined. In particular, it first gives a brief introduction to some of the image transforms, such as the Discrete Fourier Transform (DFT) and the Discrete Wavelet Transform (DWT), and a filter called the Laplacian of Gaussian (LoG).

Figure 2.1: Sample demonstration of the magnitude and phase spectra in the discrete Fourier domain of an image.

Discrete Haar-Wavelet Transformation

HL/LH: The lower-left and upper-right subbands are estimated along the height and width using the low-pass and high-pass filters alternately. HH: The lower-right subband is estimated analogously to the upper-left quadrant, but using the high-pass filter belonging to the given wavelet.
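As a concrete illustration of this decomposition, the sketch below computes a single-level Haar DWT with PyWavelets; the image array is a stand-in, and note that PyWavelets labels the detail subbands by orientation rather than by quadrant:

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)             # stand-in for a grayscale image
LL, (LH, HL, HH) = pywt.dwt2(img, 'haar')  # single-level Haar decomposition

# LL: low-pass along both axes (approximation)
# LH/HL: low-pass along one axis and high-pass along the other (edges)
# HH: high-pass along both axes (fine, diagonal detail)
print(LL.shape, LH.shape, HL.shape, HH.shape)  # each is (128, 128)
```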

Laplacians of Gaussian

Convolutional Neural Networks

  • VGG-16
  • ResNet
  • U-Net
  • Generative Adversarial Networks
  • Perceptual Loss

The pooling layer reduces the spatial dimension of the feature representation, which also helps to avoid overfitting during training. VGG-16 was the runner-up of the ILSVRC 2014 challenge; its architecture is shown in Figure 2.5.
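For reference, the layer stack summarized in Figure 2.5 can be inspected directly from torchvision's implementation; this is only a convenience for the reader, not part of the proposed method:

```python
import torchvision.models as models

vgg16 = models.vgg16(weights=None)  # pass weights="IMAGENET1K_V1" for ImageNet weights
print(vgg16.features)               # 13 convolution layers interleaved with max-pooling
```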

Figure 2.4: An overview of the architecture of a generic deep CNN.

Image Quality Metrics

Full-reference Metrics

These metrics are calculated between the original clean ground-truth images and the corresponding generated noise-free images, while the latter considers the mutual information between the input of the distortion block and the output of the visual block.
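A minimal sketch of how the two most frequently used full-reference metrics in this thesis, PSNR and SSIM, can be computed with scikit-image; the `clean` and `restored` arrays below are synthetic placeholders:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

clean = np.random.randint(0, 256, (256, 256), dtype=np.uint8)            # placeholder ground truth
noise = np.random.randint(-10, 11, (256, 256))
restored = np.clip(clean.astype(int) + noise, 0, 255).astype(np.uint8)   # placeholder restored image

psnr = peak_signal_noise_ratio(clean, restored, data_range=255)  # higher is better
ssim = structural_similarity(clean, restored, data_range=255)    # 1.0 means identical
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```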

Figure 2.9: MS-SSIM evaluation system. L: low-pass filtering; 2↓: downsampling by a factor of 2.

NIQE: The Naturalness Image Quality Evaluator (NIQE) [123] compares the restored image with a standard model computed from statistics of natural-scene images. PIQE: The Perception-based Image Quality Evaluator (PIQE) [124] estimates the perceptual quality of the image using block-wise distortion analysis.

Datasets

The proposed model was tested on the synthetic dataset (SOTS) provided by Li et al. [129], consisting of 500 outdoor and indoor images, in addition to the benchmark test set provided by Fattal et al. [17] and real hazy images.

Summary

Perhaps this is due to improper use of deconvolution layers during the reconstruction of de-rained images in the network architecture. A deep residual network [6] was used to further improve the quality of the de-rained image produced by the encoder-decoder network.

Figure 3.1: Schematic diagram of sub-pixel upscaling.

Proposed Approach

  • Baseline Generator Model
  • Generator with Efficient Sub-Pixel Convolution
  • Discriminator
  • Cost Function

An overview of the architecture of the proposed discriminator model, which consists of 5 convolutional layers, is shown in Figure 3.4. The goal of the proposed method is to generate a rain-free image given a rainy image as input.

Figure 3.3: An overview of a single decoder unit (D-Unit), which consists of two convolution layers followed by an efficient sub-pixel re-arrangement (S.P.C.) block.
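A hedged PyTorch sketch of such a D-Unit follows; the channel width, activation, and upscaling factor are illustrative assumptions rather than the thesis's exact configuration:

```python
import torch
import torch.nn as nn

class DUnit(nn.Module):
    """Two convolution layers followed by sub-pixel re-arrangement (S.P.C.)."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            # expand channels by scale^2 so PixelShuffle can trade depth for resolution
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # (B, C*s^2, H, W) -> (B, C, H*s, W*s)
        )

    def forward(self, x):
        return self.body(x)

x = torch.randn(1, 64, 32, 32)
print(DUnit()(x).shape)  # torch.Size([1, 64, 64, 64])
```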

Experiments and Results

  • Quality Measures
  • Model Parameters
  • Comparison configurations
  • Quantitative Results
  • Qualitative Results

However, the de-rained images generated by the proposed scheme do not suffer from these artifacts and degradations. The visual quality of the rain-removed images is further improved by including adversarial training in the proposed model, unlike [45] and [16].

Figure 3.6: Qualitative comparison with ID-CGAN [9] on synthesized test images in terms of SSIM/PSNR.

Discussion

To avoid these issues, unlike existing methods, this work incorporates sub-pixel convolution [130] to improve the spatial resolution of the image. This may be because the efficient sub-pixel convolution improved the spatial resolution of the rain-removed images produced by the proposed architecture, which in turn resulted in better PSNR values and a slight improvement in SSIM and the other evaluation metrics.

Summary

Rain Streaks in DFT

When the DC component is shifted to the center, as shown in Figures 4.1j and 4.1c, the difference is slightly more visible. However, there may be some high-intensity points in the magnitude spectrum that contain information about the rain but are not very perceptible to the human eye.

Figure 4.1: A flow of visualizations. (a) Clean image, (b) unscaled magnitude image of (a) with unshifted DC component, (c) unscaled magnitude image of (a) with DC component shifted to center, (d) scaled magnitude image of (a) with unshifted DC component.

Fourier Domain Input to Deep CNNs

As shown in Figure 4.2, let θRc denote the rain-streak direction in the spatial domain and θRf the corresponding direction in the S∗ space. When the RGB rain image is converted to the YCbCr color space, it is observed that most of the rain-streak information exists only in the Y channel.
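A short sketch of this preprocessing step, using OpenCV for the color conversion (the file name is hypothetical, and the log scaling is for visualization only):

```python
import cv2
import numpy as np

bgr = cv2.imread("rainy.png")                  # hypothetical input; OpenCV loads BGR
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
y = ycrcb[:, :, 0].astype(np.float32)          # luminance channel carries the streaks

F = np.fft.fftshift(np.fft.fft2(y))            # shift the DC component to the center
magnitude, phase = np.abs(F), np.angle(F)
log_mag = np.log1p(magnitude)                  # compress dynamic range for display
```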

Figure 4.3: The CNN framework of the proposed single-image rain streak removal.
Figure 4.3: The CNN framework of the proposed single-image rain streak removal.

Noise residual in Fourier domain

The predicted real and imaginary coefficients of the rain residual are subtracted pixel-wise from the real and imaginary coefficients of the rainy image to obtain those of the rain-free image. The rain-streak-free image in the spatial domain can then be reconstructed by applying the inverse DFT to the calculated real and imaginary parts.
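A sketch of that reconstruction, assuming the network outputs real and imaginary residual maps `res_real` and `res_imag` for the rain component (the names are illustrative):

```python
import numpy as np

def reconstruct_clean(y_rain, res_real, res_imag):
    """Subtract predicted Fourier-domain rain residuals and invert the DFT."""
    F = np.fft.fft2(y_rain)
    clean_real = F.real - res_real   # pixel-wise subtraction, real part
    clean_imag = F.imag - res_imag   # pixel-wise subtraction, imaginary part
    y_clean = np.fft.ifft2(clean_real + 1j * clean_imag)
    return np.real(y_clean)          # residual imaginary part is numerical noise
```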

Figure 4.5: Reconstruction of images using different phase and magnitude spectra in terms of SSIM and PSNR.

Proposed Networks

D-Net

N-Net

Loss functions

Results

The training data was randomly selected from the dataset made available by Fu et al. It can be observed that the proposed DFT-based rain-streak removal approach achieves results comparable to state-of-the-art approaches on the test datasets.

Table 4.1: Quantitative results evaluated in terms of average SSIM [15] and PSNR (dB) on the test datasets.

Discussion

Normalization of input data

However, a visual improvement along with a reduction of the rain streaks can be observed in the de-rained image compared to the original rainy image. The loss in the reconstructed image is due to the normalization techniques and also due to the non-linearity described in subsection 4.4.3.

Figure 4.7: Qualitative results on real-world rainy images. Top row shows rainy images, whereas bottom row shows our results.

Single layer Vs. Multilayer

Non-linearity

Therefore, it can be concluded that the activation functions in convolutional neural networks can remove some frequencies that are useful for reconstructing the image back in the spatial domain. To summarize, although a small improvement is achieved using the DFT domain, one may need a different transform domain that can retain relatively more information than the DFT domain.

Figure 4.9: Qualitative results on the TD-Zhang et al. [3] dataset using the model D-Net + N-Net.

Image De-Raining in Correlated Transformed Domain

Proposed Scheme

Generator Network (G)

Each of the P2, P3, and P4 units consists of a proposed subnetwork called F-Net, which comprises four convolution layers, each with 3 × 3 filters and a spatial stride of 1 × 1; the numbers of filters in the four layers are 4, 8, 6, and 1, respectively. These intermediate rain maps are therefore merged and fed into a 10-layer ResNet to further refine them and output a merged rain map referred to as Rmerged.
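A hedged sketch of one F-Net unit under these stated hyper-parameters (the input channel count and the use of ReLU between layers are assumptions):

```python
import torch.nn as nn

class FNet(nn.Module):
    """Four 3x3 convolutions, stride 1, with 4, 8, 6, and 1 filters respectively."""
    def __init__(self, in_channels=1):
        super().__init__()
        widths = [4, 8, 6, 1]          # filters per layer, as stated above
        layers, prev = [], in_channels
        for w in widths:
            layers += [nn.Conv2d(prev, w, kernel_size=3, stride=1, padding=1),
                       nn.ReLU(inplace=True)]
            prev = w
        self.body = nn.Sequential(*layers[:-1])  # drop the activation after the last layer

    def forward(self, x):
        return self.body(x)  # outputs a single-channel intermediate rain map
```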

Discriminator Network (D)

The purpose of units P2, P3, and P4 is to use the cues in the wavelet subbands LH, HL, and HH, which are more suitable for generating the rain maps, and to output the intermediate rain maps denoted Rh, Rv, and Rd, respectively. The clean image candidate C2 is obtained by pixel-wise subtracting Rmerged from INY, and the intermediate rain maps Rh, Rv, and Rd are pixel-wise subtracted from INY to get the clean image candidates C3, C4, and C5, respectively.

Cost Function

The MSE is used in the majority of the de-raining algorithms and can be defined as L_E = (1/N) Σ_{i=1}^{N} ||ŷ_i − y_i||_2^2, where ŷ_i and y_i denote the generated and ground-truth images. Therefore, the perceptual loss function [8] is used to avoid these artifacts by preserving the contextual and high-level features of the image.
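A minimal sketch of such a perceptual loss in the spirit of [8]: distances are measured between frozen VGG-16 feature maps instead of raw pixels. The choice of the relu3_3 layer is a common convention and an assumption here, not necessarily the thesis's exact setting:

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen loss network: VGG-16 features up to relu3_3 (layer index 15)
vgg_feats = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def perceptual_loss(generated, target):
    """MSE between high-level feature maps of generated and ground-truth images."""
    return F.mse_loss(vgg_feats(generated), vgg_feats(target))
```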

Table 4.5: Quantitative results compared with recent methods on synthesized test images.

Experiments and Results

Performance Evaluation

It is trained on LG with cost weights similar to those used for the proposed method. The proposed method shows an impressive improvement over recent methods and baseline configurations.

Summary

Loss function

The conventional per-pixel loss (LE) between the de-hazed and ground-truth (C) images can be written as shown below. Furthermore, the final objective function of the proposed model for the single-image de-hazing task is defined as a weighted combination of the individual losses.
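A reconstruction of the per-pixel loss in its standard MSE form, together with an assumed form of the combined objective; the excerpt does not give the exact terms or weights, so λp and λa below are placeholders for perceptual and adversarial loss weights:

```latex
L_E = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{C}_i - C_i \right\|_2^2,
\qquad
L = L_E + \lambda_p L_p + \lambda_a L_a
```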

Experiments and results

  • Datasets and training details
  • Evaluation metrics
  • Ablation study
  • Comparison with State-of-the-Art Methods

The runtime comparison of the proposed scheme with existing methods is shown in Table 5.6. However, the perceptual quality of the de-hazed images restored by the proposed scheme is better than that of the existing methods.

Table 5.1: Quantitative comparison on the SOTS (Outdoor) dataset. Best and second-best results are shown in blue and red colors, respectively.

Summary

Extending GAN for Video De-Raining

The proposed method is built on the conditional GAN framework [165], which consists of two main sub-modules, namely (a) the Generator and (b) the Discriminator, denoted ϕG and ϕD, respectively, in our case.

Network Architecture

The proposed ϕD takes either three estimated consecutive de-rained frames (f̂c,i−2 ⊙ f̂c,i−1 ⊙ f̂c,i) or the three corresponding ground-truth clean frames. The output (Y) of the first MCB block of the proposed ϕD can then be written with ⊕ denoting summation.

Figure 6.4: An overview of the architecture of the proposed generator model ϕG for video rain-streak removal.

Cost Function

The goal of the 3D convolution in the proposed ϕD is to learn the temporal consistency between the three successive frames i−2, i−1, and i. Using only the LMSE loss to optimize the proposed model may result in the loss of high-frequency detail from the frames while removing the rain streaks.
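As a sketch of how a 3D convolution sees three consecutive frames at once (channel counts here are illustrative assumptions):

```python
import torch
import torch.nn as nn

# kernel depth 3 with no temporal padding collapses the three input frames
conv3d = nn.Conv3d(in_channels=3, out_channels=32,
                   kernel_size=(3, 3, 3), padding=(0, 1, 1))

frames = torch.randn(1, 3, 3, 128, 128)  # (batch, channels, frames i-2..i, H, W)
out = conv3d(frames)
print(out.shape)  # torch.Size([1, 32, 1, 128, 128]); temporal dimension collapsed
```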

Experiments & Training Details

Dataset

Training Parameters

Evaluation Metrics

Visual Information Fidelity (VIF) [117], Universal Image Quality Index (UQI) [115], Multi-Scale Structural Similarity Index (MS-SSIM) [114], Naturalness Image Quality Evaluator (NIQE) [123], Perception-based Image Quality Evaluator (PIQE) [124], Feature Similarity Index (FSIM) [119], Haar Wavelet-Based Perceptual Similarity Index (HaarPSI), Gradient Magnitude Similarity Deviation (GMSD) [122], Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [126], and Total Variation (TV) error. For a fair comparison, a figure of merit (fom) based on the performance over all adopted test sets is defined.

Table 6.1: Image quality metrics behavior.

Results

Baseline Configurations

Quantitative Results

The quantitative comparison of the proposed scheme with existing methods on test set a1 is shown in Table 6.5. The quantitative comparison of the proposed model with existing schemes on the test sets b1, b2, b3, and b4 is shown in the subsequent tables.

Table 6.5: Quantitative comparison of the proposed model with existing schemes using the incorporated evaluation metrics on the a1 test set.

Qualitative Results

Figure 6.9: Qualitative comparison of the proposed model with existing schemes (TCL [4], SPAC-CNN [14], SE [12], JORDER [51], FastDerain [11]) on synthetic rain video frames. Figure 6.7 shows a visual comparison of the proposed model with existing schemes on a synthetic rain video.

Table 6.14: Run-time comparison of the proposed model with existing schemes over the Test Set Light.

Ablation Study

Quantitative Results

Qualitative comparison

  • Improvement from the perspective of the input color-space
  • Improvement from the perspective of the model architecture
  • Exponential Perceptual Loss + MSE vs. MSE
  • Exponential Adversarial Loss vs. Fixed Constant
  • MSE-based Adversarial Loss vs. Entropy-based Adversarial Loss

It can be observed from Figure 6.13 that the results obtained using only the MSE loss still contain visible rain streaks compared to the proposed baseline. It can be observed from Figure 6.17 that the results obtained using G-M-EP-EA-N contain visible rain streaks and reconstruction errors compared to the model trained with the proposed loss function.

Figure 6.12: Comparison between the Temporal and G-M-EP-EA-D configurations.

Justification


Figure 6.16: Sample results to show the comparison between the Temporal and G-M-FP-FA configurations.
