Image Forensics through Illumination Cues and Deep Learning
In the first example, (d) shows a forged image, while (e) and (f) are the DPHs of the two people; the correlation between the two DPHs is 0.73. In the second example, (d) shows a forged image from the same dataset, while (e) and (f) are the DPHs of the two people; the correlation between the two DPHs is 0.39.

Table 2.6: Pair-wise distances between the faces present in the authentic image shown in Figure 2.13a.

Image Forgeries

  • Splicing Forgery
  • Copy-Move Forgery
  • Retouching Forgery
  • Content-Removal Forgery

A famous example of retouching forgery is shown in Figure 1.3(a), where the photo of former American football star O. J. Simpson was darkened and the lighting was modified by Time magazine to give him a more menacing appearance. Yezhov, who is seen standing to the left of Stalin in the authentic image shown in Figure 1.4(a), is removed in the forged image shown in Figure 1.4(b), and the missing pixels are filled in from the surrounding pixels using image inpainting techniques.

Figure 1.2: An example of copy-move forgery [2], where a rocket is cloned to create the forged image.

Image Forensics

  • JPEG Compression-based Methods
  • Noise-based Methods
  • Lighting Environment-based Methods
  • Illumination Colour-based Methods
  • Resampling-based Methods
  • Camera sensor-based Methods
  • Deep learning-based data-driven methods

In [33], the misalignment of the 8×8 JPEG block grid is used to locate splicing and copy-move forgeries. In [43], resampled images are detected by computing the Radon transform [44] of the derivative of the image pixels.
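To make the resampling cue concrete, the following is a minimal sketch, not the exact procedure of [43]: it takes the horizontal derivative of a grayscale image, computes its Radon transform, and inspects the frequency content of the projections, where interpolation tends to leave periodic peaks. The function name and the normalization are illustrative assumptions.

```python
import numpy as np
from skimage.transform import radon

def resampling_spectrum(gray):
    """Radon transform of the image derivative; resampling tends to produce
    periodic peaks in the FFT of the projections (illustrative sketch)."""
    dx = np.gradient(gray.astype(np.float64), axis=1)         # horizontal derivative
    sinogram = radon(dx, theta=np.linspace(0.0, 180.0, 90), circle=False)
    spectrum = np.abs(np.fft.rfft(sinogram, axis=0))          # FFT along each projection
    return spectrum / (spectrum.max() + 1e-12)                # normalized for inspection
```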

Image Anti-Forensics

Research Motivation and Problem Statement

Therefore, methods based on the lighting environment and the illumination colour are more reliable in detecting carefully crafted, realistic-looking splicing forgeries. Hence, there is scope for research in image forensics based on the lighting environment and the illumination colour.

Research Contributions and Thesis Organization

Experimental results on two standard splicing datasets, DSO-1 and DSI-1, demonstrate the effectiveness of the proposed method in comparison to the prior art.

Related Work and Research Gap

Low-dimensional lighting subspace

More importantly, they proved that the Lambertian reflection acts as a low-pass filter, and the first 9 SH coefficients are sufficient to capture 99% of the irradiance. Later, Ramamoorthi [71] provided a theoretical connection between the SH subspace and the eigensubspace through the analytic PCA construction, thus proving that the first 5–6 eigenvectors are sufficient to capture 98% of the illumination variations in the face.

Lighting Model

Theoretical Analysis of the Low-dimensional Lighting Model

First, an observation matrix Q, containing all the observations, is created by uniformly sampling the light source positions (αp, βp) and the surface normal coordinates (θi, φi) of the boundary image Bα,β. Because of this mixing of SHs, fewer eigenvectors will now capture most of the irradiance.
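As a numerical illustration of this construction (a sketch assuming a convex Lambertian surface under distant point sources, with arbitrarily chosen sampling grids), one can build Q from intensities max(0, n·l) and inspect its eigenspectrum:

```python
import numpy as np

def sphere_dirs(n_theta, n_phi):
    """Unit vectors sampled uniformly in the spherical angle coordinates."""
    t = np.linspace(0.0, np.pi, n_theta)
    p = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    T, P = np.meshgrid(t, p, indexing="ij")
    return np.stack([np.sin(T) * np.cos(P), np.sin(T) * np.sin(P), np.cos(T)], -1).reshape(-1, 3)

normals = sphere_dirs(30, 60)            # sampled surface normals (theta_i, phi_i)
lights = sphere_dirs(20, 40)             # sampled light source positions (alpha_p, beta_p)
Q = np.maximum(normals @ lights.T, 0.0)  # Lambertian intensity max(0, n.l), one column per light
s = np.linalg.svd(Q, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
print("eigenvectors capturing 99% of the energy:", int(np.searchsorted(energy, 0.99)) + 1)
```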

Figure 2.1: Examples of images of a single subject under different light sources from (a) Extended Yale B database [4] and (b) Multi-PIE [5] dataset.

Computation of Low-dimensional Lighting Model

When the image pixels correspond uniformly to the entire sphere of surface normals, the SHs will be orthogonal to each other. Thus, the eigenvectors will simply be the SHs themselves, and the first 9 eigenvectors will capture 99% of the irradiance.

Proposed method

Estimation of Lighting Environment

Using these L eigenfaces, we create the low-dimensional illumination model WL as shown in equation (2.16). Input: a face image F of dimension M×N. Output: LC vector Ω. i) Retrieve the low-dimensional lighting model WL from Algorithm 2.1. ii) Convert F to a grayscale image of dimension M×N and rearrange it as an MN-dimensional vector I. iii) Project I onto the L-dimensional subspace using equation (2.17) to obtain the LC vector Ω.
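A minimal sketch of steps ii)-iii), under the assumption that equation (2.17) is a least-squares projection onto the columns of WL (the grayscale weights and function name are illustrative):

```python
import numpy as np

def lc_vector(face_rgb, W_L):
    """Convert an M x N face to grayscale, flatten it to an MN-vector, and
    project it onto the L-dimensional lighting subspace spanned by W_L."""
    gray = face_rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])  # M x N
    I = gray.reshape(-1)                                                   # MN-vector
    Omega, *_ = np.linalg.lstsq(W_L, I, rcond=None)                        # L-dimensional LC vector
    return Omega
```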

Splicing Detection

If the two faces Fi and Fj come from two different LEs, the distance between Ωi and Ωj will be large. Therefore, the image is considered spliced if the maximum distance over all face pairs exceeds a predefined threshold.
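This decision rule is easy to state in code; the sketch below assumes the Euclidean distance between LC vectors and treats the threshold as a tunable parameter:

```python
from itertools import combinations

import numpy as np

def detect_splicing(lc_vectors, threshold):
    """Flag the image as spliced if any pair of per-face LC vectors is far apart."""
    d_max = max(np.linalg.norm(a - b) for a, b in combinations(lc_vectors, 2))
    return d_max > threshold, d_max
```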

Experimental Results and Analysis

  • Lighting Model Computation
  • Lighting Environment Estimation
  • Classification of consistent and inconsistent LEs
  • Performance on non-frontal face images
  • Performance in Splicing Detection

The ROC curves of the three methods on the Yale B and Multi-PIE datasets are shown in Figures 2.8a and 2.8b, respectively. We use the illumination model created from the frontal-pose face images (i.e., the six eigenfaces shown in Figure 2.2a).

Figure 2.2 shows the first six eigenfaces computed from the image sets of two individuals.

Discussions

On the other hand, all the pairwise distances in the authentic image (Figure 2.13(b)) are below the threshold value, as can be seen in Table 2.6. The maximum distance d for the authentic image was found to be 0.10, which is less than rth.

Summary

This method estimates the illumination colour from different parts of an image using the DRM [78]. Riess and Angelopoulou [79] proposed to create a new image, called the illumination map (IM), using the illumination colours estimated from the input image.

Background

Dichromatic Reflection Model (DRM)

Similar to [9], the features here are classified in a pairwise manner by concatenating the corresponding features of the two face IMs, computed from the same type of IM and converted to the same colour space. According to the neutral interface reflection assumption [92], the spectral power distribution of the interface reflection is the same as that of the illumination source.

Figure 3.1: Surface and body reflections from a non-homogeneous surface according to the DRM.

Dichromatic Plane Histogram (DPH)

Under uniform illumination, the dichromatic planes of two differently coloured surfaces in the same scene intersect along the direction of the illumination colour. This is because the illumination colour is common to both the dichromatic planes estimated from the two surfaces.
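The geometry can be sketched directly in RGB space; the following assumes each surface's pixels are fit with a plane through the origin, with the intersection direction taken as the illuminant estimate (function names are illustrative):

```python
import numpy as np

def dichromatic_plane_normal(pixels_rgb):
    """Fit a plane through the RGB origin to one surface's pixels: the normal
    is the right singular vector with the smallest singular value."""
    _, _, Vt = np.linalg.svd(pixels_rgb.astype(np.float64), full_matrices=False)
    return Vt[-1]

def illuminant_direction(pixels_a, pixels_b):
    """Two dichromatic planes intersect in a line; its direction (the estimated
    illumination colour) is the cross product of the two plane normals."""
    d = np.cross(dichromatic_plane_normal(pixels_a), dichromatic_plane_normal(pixels_b))
    d = np.abs(d)  # keep the direction in the positive RGB octant
    return d / (np.linalg.norm(d) + 1e-12)
```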

Proposed Method

To calculate the DPH from all the faces present in the image, the faces are manually cropped, as in Chapter 2. In the case of an authentic image, the DPHs of all the faces will be similar since they come from the same lighting conditions.

Experimental Results

Analysis of Some Famous Forged Images

The DPHs calculated from the two individuals in the authentic image are almost identical, as shown in Figure 3.9. On the other hand, the DPHs calculated from the two persons in the forged image differ from each other, as shown in Figure 3.10.
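The similarity scores quoted for these examples are correlations between DPHs; a minimal sketch of such a measure (normalized cross-correlation between the two histograms, which is an assumption about the exact formula) is:

```python
import numpy as np

def dph_correlation(h1, h2):
    """Normalized correlation between two dichromatic plane histograms."""
    a = (h1 - h1.mean()).ravel()
    b = (h2 - h2.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```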

Figure 3.9: (a) An authentic image of Nelson Mandela (left) with Muhammad Ali (right), (b) DPH of Mandela’s face, and (c) DPH of Ali’s face

Summary

The method proposed in Chapter 3 extracts an illumination colour-related, histogram-based feature from the facial regions to detect splicing forgery. Once trained, the CNN part of the siamese network is used to extract features from the face IMs.

Related Work and Research Gap

Illumination Colour Estimation

Under the generalized grey-edge (GGE) hypothesis, the illumination colour is estimated as the pth Minkowski norm of the derivative of the image pixels. The body reflection component contains the object colour information, while the interface reflection contains the illumination colour information.
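A minimal sketch of this estimator (first-order derivatives only, without the Gaussian pre-smoothing used in full GGE implementations; the norm order p=6 is an arbitrary choice):

```python
import numpy as np

def gge_illuminant(img_rgb, p=6, eps=1e-12):
    """Estimate the illumination colour as the Minkowski p-norm of the
    per-channel derivative magnitudes (grey-edge sketch)."""
    img = img_rgb.astype(np.float64)
    gx = np.gradient(img, axis=1)                 # horizontal derivative per channel
    gy = np.gradient(img, axis=0)                 # vertical derivative per channel
    mag = np.sqrt(gx**2 + gy**2)
    e = (mag**p).mean(axis=(0, 1)) ** (1.0 / p)   # Minkowski norm per channel
    return e / (np.linalg.norm(e) + eps)          # unit-norm illuminant estimate
```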

Illumination Map (IM)-based Image Forensics

Recently, Pomari et al. [42] proposed a deep learning-based method that removes the need to extract handcrafted features from the IMs. Based on the above literature survey, the following open problem is identified: the existing IM-based methods extract either handcrafted features or deep features from the IMs using a pre-trained network.

Figure 4.2: A spliced image (a) and its corresponding IM (b).

Proposed Method

Overview of the Method

On the other hand, if they come from two different images, their IMs have different characteristics. We consider a pair of face IMs to be a genuine pair if they come from the same lighting source (i.e., the same image) and a forged pair if they come from two different lighting sources.

IM Computation and Face Extraction

Also, automated methods often include non-face areas in the detected face bounding box. For these reasons, in the proposed method, we manually select the bounding boxes around the faces in the images.

Face-IM Pair Classification for Splicing Detection

  • Siamese Network for Feature Learning
  • Network Architecture
  • Feature Extraction
  • Splicing Detection using an SVM

The number of superpixels in the face regions also decreases with reduced input size. Although the siamese network learns to distinguish between authentic and artificially created spliced face-IM pairs present in the training set, the difference within artificially created face-IM pairs is larger than within the face pairs present in real-life spliced images.
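To make the pairwise setup concrete, here is a minimal PyTorch sketch of a siamese CNN over face-IM pairs; the layer sizes and the absolute-difference distance head are illustrative assumptions, not the architecture of the thesis:

```python
import torch
import torch.nn as nn

class SiameseIMNet(nn.Module):
    """Shared CNN embeds each face IM; a distance head scores the pair."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 16, 128),
        )
        self.head = nn.Linear(128, 1)  # genuine vs. forged pair

    def forward(self, im1, im2):
        f1, f2 = self.cnn(im1), self.cnn(im2)
        return torch.sigmoid(self.head(torch.abs(f1 - f2)))
```

Once trained, the shared `cnn` alone serves as the feature extractor whose outputs are passed to the SVM stage listed above.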

Figure 4.4: The convolutional network architecture used in the siamese network.

Experiments and Results

Dataset

Training the siamese network

The network trained on GGE face IMs achieved a classification accuracy of 95.31% on the GGE face IM pairs of the test set. On the IIC face-IM pairs of the test set, the network trained on IIC face IMs achieved an accuracy of 97.44%.

Splicing Detection

  • Performance on DSO-1 dataset
  • Performance on DSI-1 dataset
  • Comparison to the state-of-the-arts
  • Robustness to JPEG compression

We then test the performance of the features extracted from GGE-IMs, IIC-IMs, and raw face images using CNNs trained on IIC-IMs. We have also performed a series of experiments to evaluate the robustness of the proposed method to JPEG compression.
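A typical way to run such a robustness test is to recompress each test image at several JPEG quality factors before re-running detection; the sketch below uses Pillow, and the quality factors are illustrative assumptions:

```python
from io import BytesIO

from PIL import Image

def jpeg_versions(path, qualities=(90, 70, 50)):
    """Yield re-compressed copies of an image at several JPEG quality factors."""
    img = Image.open(path).convert("RGB")
    for q in qualities:
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        yield q, Image.open(buf).convert("RGB")
```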

Figure 4.6: ROC curves of the proposed method for DSO-1 and DSI-1 datasets.

Summary

Chen et al. [49] proposed the first DL-based method for detecting image processing operations. Image editing software such as Adobe Photoshop and GIMP contains a large number of image editing tools.

Proposed Manipulation Detection Method

  • Siamese Convolutional Neural Network for Image Editing Operation
  • Network Architecture
    • CNN
    • Distance Layer
  • Learning
  • Manipulation detection using One-shot Classification

If the image is classified as manipulated, we check whether it has been manipulated using one of the editing operations present in the test gallery, i.e., whether any of the editing operations in the gallery was applied to the test image patch.
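A minimal sketch of this one-shot check, assuming the siamese network outputs a similarity score for a pair of patches and the gallery holds one exemplar patch per editing operation (the dictionary layout and the argmax rule are illustrative):

```python
import torch

def one_shot_classify(model, test_patch, gallery):
    """gallery: dict mapping operation name -> one exemplar patch edited with
    that operation. Returns the operation whose exemplar the siamese network
    judges most similar to the test patch, with all pair scores."""
    with torch.no_grad():
        scores = {op: model(test_patch, exemplar).item() for op, exemplar in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores
```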

Figure 5.1: Framework of the siamese network that takes a pair of input image patches and produces a prediction p indicating whether the pair is SP or DP.

Forgery Localization and Detection

Therefore, we first establish the class of the reference patch, i.e., whether it is authentic or forged, and then detect forged patches accordingly. Once the class of the reference patch is determined, forged patches are detected according to the following two cases:

Experimental Results

Manipulation Detection Results

  • Discrimination of Different Image Editing Operations
  • Detection of different Image Editing Operations
  • Generalization to Unseen Manipulations
  • Dependence of Generalization Accuracy on Number of Train-
  • Detection of Unknown Manipulations

In the first experiment, we test the ability of the proposed method to detect/distinguish different image processing operations. This shows the generalization ability of the proposed method to unknown types of image editing operations.

Table 5.2: SP/DP pair classification accuracies achieved by the proposed and the FS methods when considering two manipulations at a time

Selection of Hyper-parameters of the Proposed CNN

The method achieves the average accuracy of 98.12% and 99.40% using the fixed SRM filters and the constrained convolution filters, respectively. In the case of dual tamper detection, the method achieves the average accuracy of 92.01% and 93.64% using the fixed SRM filters and the constrained convolution filters, respectively.

Forgery Localization and Detection Results

We have calculated the F1 score and the Matthews correlation coefficient (MCC) [133] to quantitatively assess the forgery localization ability of the proposed method. We have also conducted an experiment to show the effectiveness of the proposed method in the image-level forgery detection task.
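For per-pixel masks, both scores can be computed by flattening the predicted and ground-truth masks, as in the sketch below (scikit-learn is assumed, as is the 0.5 binarization threshold):

```python
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef

def mask_scores(pred_mask, gt_mask, thresh=0.5):
    """F1 and MCC between a predicted forgery mask and its ground truth."""
    y_pred = (np.asarray(pred_mask) > thresh).astype(int).ravel()
    y_true = (np.asarray(gt_mask) > thresh).astype(int).ravel()
    return f1_score(y_true, y_pred), matthews_corrcoef(y_true, y_pred)
```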

Figure 5.5: Performance of the proposed method in localizing forgeries in NIST-16 dataset.

Summary

One stream of the network learns the high-level manipulation-related features, while the other stream learns the low-level noise-related features. For example, Zhou et al. [16] showed the effectiveness of fusing high-level and low-level features, learned from the training examples, in an R-CNN framework.

Proposed Method

  • Image-Stream Encoder-Decoder (ISED)
  • Noise-Stream Encoder-Decoder (NSED)
  • Feature Concatenation and Prediction Layer
  • Learning

We employ the ResNet [98] architecture instead of the VGGNet [127] architecture used in the SegNet encoder. The low-level artifacts present in forged images, i.e., the inconsistencies in the noise level and the traces left by different image processing operations, are affected more by image modification operations than the high-level artifacts.
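To illustrate the overall layout, here is a minimal PyTorch sketch of a two-stream encoder-decoder whose stream outputs are concatenated before a per-pixel prediction layer; the channel sizes, the plain convolutional blocks standing in for ResNet stages, and the noise-residual input are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class TwoStreamEnDec(nn.Module):
    """Image stream + noise stream, concatenated features, per-pixel prediction."""
    def __init__(self):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),  # encoder
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),   # decoder
                nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),
            )
        self.image_stream = stream()        # high-level manipulation traces
        self.noise_stream = stream()        # low-level noise-residual traces
        self.predict = nn.Conv2d(16, 1, 1)  # per-pixel forgery probability

    def forward(self, rgb, noise_residual):
        f = torch.cat([self.image_stream(rgb), self.noise_stream(noise_residual)], dim=1)
        return torch.sigmoid(self.predict(f))
```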

Figure 6.1: Block diagram of the proposed two-stream encoder-decoder network. The encoder in the image-stream learns high-level manipulation traces, such as artificial contrast

Experimental Results

Pre-training on Synthetic Dataset

In this work, we used the spliced images of the synthetic dataset created by Bappy et al. [6] from the DRESDEN [129], COCO [153], and NIST16 datasets. We trained the proposed network on this synthetic dataset, using 90% of the images for training and 10% for validation.

Fine-tuning and Evaluation on Standard Forgery Datasets

These results quantitatively demonstrate the superior performance of the proposed method over LSTM-EnDec on these datasets. We have also conducted an experiment to show the performance of the proposed method on pristine images.

Figure 6.2: Forgery localization results of the proposed method for splicing, copy-move, and removal forgeries present in NIST16 dataset

Ablation Study

Finally, we have checked the performance of the network when trained with the weighted cross-entropy loss instead of the cube loss. The images in the last two columns are the predictions of the proposed network using the weighted cross-entropy and the cube losses on the IFS dataset, respectively.

Figure 6.5: Prediction results on two pristine images from the DSO-1 dataset. As can be seen, except for a few small regions, there are hardly any false positives in the predicted masks.

Summary


