
3.3 Heartbeat Synthesis using Generative Models

3.3.2 Proposed Deep Convolution Conditional GAN

The heartbeat generation framework generates three classes of beats using a convolution-based conditional GAN. A GAN consists of a generative (probabilistic) model that learns the input probability distribution and a discriminative model that discriminates between real and generated samples. The generator G(z) maps a latent input z to the probability distribution of the real data, Pr.

The discriminator D(x) classifies an input sample as real or fake, where x represents the input beats provided to the discriminator. Mathematically, GAN training resembles a two-player minimax game: the generator brings the generated data distribution as close as possible to the real data distribution, while the discriminator aims to become better at differentiating between real and generated beats. GAN optimisation is performed using the loss function defined in Equation 3.13. Here, z is randomly sampled from the latent noise distribution (Pz) and x is randomly sampled from the real data distribution (Pr). The first term in Equation 3.13, log D(x), rewards the discriminator for predicting that real data is real, and the second term, log(1 − D(G(z))), rewards it for predicting that synthetically generated data is fake. Training alternates between the generator and the discriminator: one model is held constant while the other is updated, so that V(D, G) is alternately maximized over D and minimized over G until synthetically generated data is indistinguishable from real data.

TH-2764_156201001


\[
\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim P_r(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))] \tag{3.13}
\]

Broadly, GANs can be classified into Vanilla GAN [90], Conditional GAN [222], and Deep Convolutional GAN (DCGAN) [223]. Vanilla GAN is the basic GAN built using a multi-layer perceptron (MLP) that optimizes Equation 3.13 using stochastic gradient descent. The Conditional GAN encodes conditional dependencies in the generator and discriminator along with the conventional input, allowing for the generation of specific beats. DCGAN is a popular and successful GAN variant that replaces the traditional MLP with strided convolution layers and no max-pooling layers, making GAN training fast and stable.
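As a numerical illustration of Equation 3.13, the following minimal sketch estimates V(D, G) from batches of discriminator outputs. The function name `value_function` and the clipping constant `eps` are assumptions introduced here for illustration; they are not part of the proposed framework.

```python
import numpy as np

def value_function(d_real, d_fake, eps=1e-12):
    """Batch estimate of V(D, G) in Equation 3.13.

    d_real: discriminator outputs D(x) on real beats, values in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated beats, values in (0, 1)
    eps:    small constant (an assumption) that keeps log() finite
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A confident and correct discriminator pushes V towards 0, its upper bound.
v_confident = value_function([0.99, 0.98], [0.01, 0.02])

# A discriminator that cannot separate the two sets outputs 0.5 everywhere,
# which gives V = 2 * log(0.5) = -log(4).
v_confused = value_function([0.5, 0.5], [0.5, 0.5])
```

The discriminator update raises this quantity, whereas the generator update lowers the second term by making D(G(z)) larger.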

Conditional GAN: A class-encoded convolutional GAN is used to generate class-specific heartbeats (N, SVEB, and VEB). The class information (c) is incorporated alongside the conventional input, transforming G(z) to G(z|c) and D(x) to D(x|c); this leads to the deep convolution conditional generative adversarial network (DCCGAN). Figure 3.18 illustrates the architecture of the proposed DCCGAN, and the architectural details of the generator and discriminator are illustrated in Figure 3.19.

Modified DCCGAN Loss Function: The proposed DCCGAN takes two inputs: a class label (c) and either the real beats corresponding to that class (x|c) or the generated beats corresponding to that class (G(z|c)). This modifies the basic GAN loss function in Equation 3.13 to the DCCGAN loss function in Equation 3.14. Here, Pr(x) is the real beat distribution and Pz(z) is the latent noise distribution from which the generated beats are produced.

\[
\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim P_r(x)}[\log D(x|c)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z|c)))] \tag{3.14}
\]

Figure 3.18: Basic Architecture of Conditional GAN. (Diagram labels: Latent Representation, Class Label, Generated Beat, Real Beat, Real (1) or Fake (0), Is it Correct?, Fine Tune Training.)

Generator Model: The generator model takes a latent input vector (z) and a class label (c) and generates a fake heartbeat (G(z_n)), where z = {z_n}_{n=1}^{N}, N is the length of the noise vector, G(z_n) ∈ R^k, and k matches the dimension of a real heartbeat. The latent input is randomly sampled from a Gaussian noise distribution. The generator aims to approximate the underlying real heartbeat distribution given the corresponding class label, P(hb|c), and produces fake beats to deceive the discriminator model. The modified generator loss is given by Equation 3.15. It minimizes the log probability that the discriminator assigns to synthetically generated beats being fake, thereby encouraging the generation of beats that are less likely to be classified as fake.

\[
L(G) = \operatorname{minimize}\,[\log(1 - D(G(z|c)))] \tag{3.15}
\]

Figure 3.19a depicts the architecture of the generator. The generator takes a latent input (z) of length 100 sampled from a Gaussian (normal) noise distribution N(µ = 0, σ = 1) [91, 224] and a class label (c), where c ∈ {0, 1, 2}; the values 0, 1, and 2 correspond to Normal, SVEB, and VEB beats, respectively. The label is incorporated through an embedding layer followed by a fully connected layer with 100 neurons, matching the length of the latent input, and is then reshaped to match the dimensions of the Gaussian input noise.
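The two generator inputs described above can be sketched with NumPy as follows. This is a shape-level illustration only: the weights are random stand-ins for trained parameters, and the embedding width `EMBED_DIM` as well as the helper names `sample_latent` and `encode_label` are hypothetical, since the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 3    # 0 = Normal, 1 = SVEB, 2 = VEB (from the text)
LATENT_DIM = 100   # length of the latent input z (from the text)
EMBED_DIM = 50     # hypothetical embedding width, not given in the text

# Untrained stand-ins for the learned embedding table and dense layer.
embedding = rng.standard_normal((NUM_CLASSES, EMBED_DIM))
fc_weight = rng.standard_normal((EMBED_DIM, LATENT_DIM)) * 0.1
fc_bias = np.zeros(LATENT_DIM)

def sample_latent(batch_size):
    """Draw z ~ N(0, 1) with shape (batch_size, 100)."""
    return rng.standard_normal((batch_size, LATENT_DIM))

def encode_label(labels):
    """Embedding lookup -> fully connected (100) -> reshape to (batch, 100, 1)."""
    labels = np.asarray(labels)
    if labels.min() < 0 or labels.max() >= NUM_CLASSES:
        raise ValueError("labels must be in {0, 1, 2}")
    hidden = embedding[labels] @ fc_weight + fc_bias
    return hidden.reshape(-1, LATENT_DIM, 1)

z = sample_latent(4)              # shape (4, 100)
c = encode_label([0, 1, 2, 1])    # shape (4, 100, 1)
```

Beats with the same class label receive the same encoding, which is what allows the generator to condition its output on the class.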

The Gaussian noise is provided to a fully connected layer with 100 neurons, followed by a LeakyReLU activation function with a negative slope coefficient of 0.2 and a reshaping layer for concatenation with the encoded class information. The input noise and class label are then concatenated and provided to a 1-dimensional (1D) upsampling layer that increases the dimension of the input by a factor of 2, followed by a 1D convolution (Conv) layer with 32 × 16 filters of size 15 and stride 1 and a LeakyReLU activation function with a negative slope coefficient of 0.2. Six more upsampling and Conv layers are added in a cascaded fashion to form the generator model, as illustrated in Figure 3.19a. The combination of the 1D upsampling layer and the 1D Conv layer imitates a 1D deconvolution layer. In the penultimate layer, the activation function is changed from LeakyReLU to the hyperbolic tangent function, and the generated heartbeat of 186 dimensions should resemble a realistic-looking heartbeat of class c.
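The first upsampling and convolution block can be sketched in NumPy as shown below. Several points are assumptions made for illustration: upsampling is taken to be repetition of each time step, the convolution uses zero padding that preserves the length, the two inputs are concatenated along the channel axis, and only 4 filters are used instead of 32 × 16 to keep the example small. The weights are random, not trained.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU with a negative slope coefficient of 0.2."""
    return np.where(x > 0, x, slope * x)

def upsample1d(x, factor=2):
    """Repeat every time step `factor` times; x has shape (length, channels)."""
    return np.repeat(x, factor, axis=0)

def conv1d_same(x, kernel):
    """Stride-1 cross-correlation with zero padding that preserves the length.

    x: (length, in_channels); kernel: (kernel_size, in_channels, filters).
    """
    k = kernel.shape[0]
    left = (k - 1) // 2
    right = k - 1 - left
    padded = np.pad(x, ((left, right), (0, 0)))
    out = np.empty((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        # Sum over the kernel window and the input channels for each filter.
        out[t] = np.tensordot(padded[t:t + k], kernel, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
noise_path = rng.standard_normal((100, 1))   # reshaped noise branch
label_path = rng.standard_normal((100, 1))   # reshaped label branch
x = np.concatenate([noise_path, label_path], axis=1)   # (100, 2), assumed axis
x_up = upsample1d(x)                                    # (200, 2)
kernel = rng.standard_normal((15, 2, 4)) * 0.1          # kernel size 15
y = leaky_relu(conv1d_same(x_up, kernel))               # (200, 4)
```

Upsampling doubles the temporal length and the convolution then smooths the repeated samples with learned filters, which is the sense in which the pair stands in for a deconvolution layer.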

Figure 3.19: Illustration of Generator and Discriminator Architecture. Layer parameters are listed as [filters (F), kernel size (K), stride (S), activation (Act)].

(a) Generator Architecture
Class Label (C) → Embedding Layer → Fully Connected (100) → Reshape
Gaussian Noise (100) → Fully Connected (100) → LeakyReLU Activation → Reshape
Concatenate (100, 100)
1D Upsampling + 1D Convolution [32*16, 15, 1, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [32*8, 4, 2, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [32*8, 4, 2, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [32*4, 4, 2, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [32*4, 4, 2, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [32*2, 4, 2, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [32, 4, 2, LeakyReLU (0.2)]
1D Upsampling + 1D Convolution [F=1, K=4, S=2, Act=TanH]
Generated Heartbeat (186) with Class Label (C)

(b) Discriminator Architecture
Class Label (C) → Embedding Layer → Fully Connected (186) → Reshape
Generated Beat (186) OR Real Beat (186)
Concatenate (186, 186)
Left branch: 1D Conv [48, 19, 4, LeakyReLU (0.2)] → 1D Conv [64, 15, 3, LeakyReLU (0.2)] → 1D Conv [80, 11, 2, LeakyReLU (0.2)] → 1D Conv [96, 9, 2, LeakyReLU (0.2)] → 1D Conv [112, 7, 2, LeakyReLU (0.2)] → 1D Global Average Pooling (112)
Right branch: 1D Conv [48, 9, 4, LeakyReLU (0.2)] → 1D Conv [64, 7, 3, LeakyReLU (0.2)] → 1D Conv [80, 5, 2, LeakyReLU (0.2)] → 1D Conv [96, 3, 2, LeakyReLU (0.2)] → 1D Conv [112, 3, 2, LeakyReLU (0.2)] → 1D Global Average Pooling (112)
Concatenate (112, 112)
Fully Connected (100) (Linear Activation) → Fully Connected (100) (Linear Activation) → Fully Connected (1) (Sigmoid Activation)

Discriminator Model: The discriminator model takes real heartbeats (x) and synthetically generated heartbeats (G(z|c)), along with their corresponding class label (c), and classifies them as either real or fake. Here, x is represented as x_n ∈ R^T, G(z) is represented as G(z_n) ∈ R^T, and T corresponds to the beat dimension. The modified discriminator loss is given by Equation 3.16. The loss function aims at maximizing the log probability of real beats, log D(x|c), and the log of the complementary probability for synthetically generated beats, log(1 − D(G(z|c))).

\[
L(D) = \operatorname{maximize}\,[\log D(x|c) + \log(1 - D(G(z|c)))] \tag{3.16}
\]



Figure 3.19b depicts the architecture of the discriminator. The discriminator model takes a heartbeat (x), either real or fake, of length 186 and a class label (c), where c ∈ {0, 1, 2}; the values 0, 1, and 2 correspond to the Normal, SVEB, and VEB classes of beats, respectively. The class label is incorporated in the same fashion as described for the generator model. The input beats and the reshaped embedded class label are then concatenated, and a parallel convolutional neural network (PCNN) is employed. The main idea behind applying convolution layers in parallel is that beats of 186 dimensions contain both local and global patterns. The global patterns are extracted using the large kernels in the left branch of the PCNN, and the local patterns are extracted using the small kernels in the right branch. The number of filters, the stride, and the activation function of corresponding convolution layers in the two branches are kept the same, and only the kernel size is varied. For instance, the first convolution layer in the left branch after input concatenation is a 1D convolution layer with 48 filters of size 19 and stride 4, with a LeakyReLU activation function with a negative slope coefficient of 0.2. Four more convolution layers are added in a cascaded fashion in both branches, followed by a 1D global average pooling (GAP) layer [225].
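The effect of the strides on the temporal length can be checked with a short calculation. The sketch below assumes 'same' padding, for which the output length of a strided convolution is ceil(input length / stride); the padding mode is not stated in the text, but this assumption reproduces the temporal length of 2 implied by the (2, 112) shape quoted for the last convolution layer. The layer tuples are read from Figure 3.19b, and the helper name `same_padding_lengths` is hypothetical.

```python
import math

def same_padding_lengths(input_length, strides):
    """Output length after each strided convolution, assuming 'same' padding
    (output length = ceil(input length / stride))."""
    lengths = []
    length = input_length
    for stride in strides:
        length = math.ceil(length / stride)
        lengths.append(length)
    return lengths

# (filters, kernel size, stride) per layer, read from Figure 3.19b.
LEFT_BRANCH = [(48, 19, 4), (64, 15, 3), (80, 11, 2), (96, 9, 2), (112, 7, 2)]
RIGHT_BRANCH = [(48, 9, 4), (64, 7, 3), (80, 5, 2), (96, 3, 2), (112, 3, 2)]

left = same_padding_lengths(186, [s for _, _, s in LEFT_BRANCH])
right = same_padding_lengths(186, [s for _, _, s in RIGHT_BRANCH])
# left == right == [47, 16, 8, 4, 2]
```

Because the two branches share strides and differ only in kernel size, their outputs have identical shapes, and the kernel size only changes how many neighbouring samples each output value summarises.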

GAP calculates the spatial average of each filter's output, making it robust to spatial translations present in the input ECG beats. The advantages of GAP over the combination of a flatten layer and a fully connected layer are that (i) it is less prone to overfitting; (ii) it does not depend on external regularization, as it behaves like a structural regularizer; and (iii) it has no trainable parameters, like a max-pooling layer. The reduced number of parameters leads to significantly faster training and a smaller model size, making the model suitable for mobile devices. In each branch, the GAP layer reduces the dimensions of the last convolution layer from (2, 112) to (112). The outputs of the two branches are then concatenated and provided as input to two fully connected layers with 100 neurons each and a linear activation function, followed by a single-neuron layer with the sigmoid activation function that generates a value v ∈ (0, 1). The value v is thresholded: the beat is classified as fake if v ∈ (0, 0.5) and real if v ∈ [0.5, 1).
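The pooling, concatenation, and thresholding steps can be illustrated with the minimal NumPy sketch below. The feature maps are random placeholders rather than real discriminator activations, and the function names are hypothetical.

```python
import numpy as np

def global_average_pool(features):
    """Average over the time axis: (time, channels) -> (channels,)."""
    return features.mean(axis=0)

def classify(v, threshold=0.5):
    """Map the sigmoid output v to 'real' (v >= threshold) or 'fake'."""
    return "real" if v >= threshold else "fake"

rng = np.random.default_rng(0)
left = rng.standard_normal((2, 112))    # last conv output of the left branch
right = rng.standard_normal((2, 112))   # last conv output of the right branch

# Each branch is pooled to 112 values; concatenation gives 224 values
# that feed the fully connected layers.
pooled = np.concatenate([global_average_pool(left), global_average_pool(right)])

# GAP has no weights and ignores where along the time axis a feature occurs,
# so reordering the time steps leaves the pooled vector unchanged.
shifted = np.roll(left, 1, axis=0)
```

A flatten layer would instead pass 2 × 112 values per branch to the fully connected layer, and its output would depend on the position of each activation.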

DCCGAN Training: The training alternates between the generator and the discriminator. During discriminator training, n samples are randomly drawn from the real heartbeats and n synthetic heartbeats are generated using the generator network for the three corresponding classes. For the real heartbeats, the discriminator is provided with labels in the range [0.8, 1], and for the synthetic heartbeats, it is provided with labels in the range [0, 0.2]. The labels are thus changed from the hard values 0 and 1 to intervals, following the concept of soft labels, which allows faster convergence of the discriminator loss during the initial batches of training.
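The soft-label scheme can be sketched as follows. Two points are assumptions: the labels are drawn uniformly within the stated ranges (the text gives only the ranges), and binary cross-entropy is used as the discriminator loss, which matches the form of Equation 3.16 but is not named explicitly in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_labels(n, real):
    """Targets in [0.8, 1] for real beats and [0, 0.2] for synthetic beats.
    The uniform draw is an assumption; the text only gives the ranges."""
    low, high = (0.8, 1.0) if real else (0.0, 0.2)
    return rng.uniform(low, high, size=n)

def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Mean binary cross-entropy between targets and predicted probabilities."""
    p = np.clip(predictions, eps, 1.0 - eps)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))

n = 8
y_real = soft_labels(n, real=True)
y_fake = soft_labels(n, real=False)

# With a soft target t, the per-sample loss is smallest at prediction p = t,
# so the discriminator is not pushed towards saturated outputs near 0 or 1.
t = 0.9
loss_at_target = binary_cross_entropy(np.array([t]), np.array([t]))
loss_saturated = binary_cross_entropy(np.array([t]), np.array([0.999]))
```

Keeping the discriminator away from saturated outputs also keeps the gradient passed back to the generator from vanishing early in training.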

After training the discriminator, the generator model is trained with the discriminator model weights frozen, so that they are not affected during generator training. During generator training, 2 × n samples are randomly drawn from the Gaussian noise distribution and provided to the generator together with the class information, allowing the generator to produce fake heartbeat samples. These fake heartbeats and the real heartbeats are provided as input to the discriminator with the corresponding class labels. Soft labels are again provided to the discriminator, i.e., labels in the range [0.8, 1] for real beats and labels in the range [0, 0.2] for synthetic beats. The discriminator performs a forward pass, compares the predicted label with the soft labels, and backpropagates the error to the generator network. During backpropagation, the discriminator weights are not changed and only the generator weights are modified, thereby training the generator. The generator network is therefore trained in an adversarial fashion using the discriminator.
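The alternating update with a frozen discriminator can be illustrated on a deliberately small scalar toy problem. This is not the DCCGAN: the generator is a linear map G(z) = a·z + b, the discriminator is D(x) = sigmoid(w·x + c), class conditioning and soft labels are omitted, and the gradients of Equations 3.15 and 3.16 are written out by hand. All names, the learning rate, and the target distribution are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discriminator_step(d, g, real, z, lr):
    """Gradient-ascent step on log D(x) + log(1 - D(G(z))) w.r.t. d = (w, c)."""
    w, c = d
    a, b = g
    fake = a * z + b
    p_real = sigmoid(w * real + c)
    p_fake = sigmoid(w * fake + c)
    # d/dt log(sigmoid(t)) = 1 - sigmoid(t); d/dt log(1 - sigmoid(t)) = -sigmoid(t)
    grad_w = np.mean((1.0 - p_real) * real) - np.mean(p_fake * fake)
    grad_c = np.mean(1.0 - p_real) - np.mean(p_fake)
    return (w + lr * grad_w, c + lr * grad_c)

def generator_loss(d, g, z):
    """Batch estimate of log(1 - D(G(z))), the quantity the generator minimizes."""
    w, c = d
    a, b = g
    return float(np.mean(np.log(1.0 - sigmoid(w * (a * z + b) + c))))

def generator_step(d, g, z, lr):
    """Gradient-descent step on log(1 - D(G(z))) w.r.t. g = (a, b).
    The discriminator parameters d are only read, never updated (frozen)."""
    w, c = d
    a, b = g
    p_fake = sigmoid(w * (a * z + b) + c)
    grad_a = np.mean(-p_fake * w * z)
    grad_b = np.mean(-p_fake * w)
    return (a - lr * grad_a, b - lr * grad_b)

rng = np.random.default_rng(0)
d = (0.5, 0.0)   # discriminator parameters (w, c)
g = (1.0, 0.0)   # generator parameters (a, b)
n = 64
for _ in range(200):
    real = rng.normal(3.0, 1.0, size=n)                     # toy "real" data
    d = discriminator_step(d, g, real, rng.standard_normal(n), lr=0.05)
    g = generator_step(d, g, rng.standard_normal(2 * n), lr=0.05)  # 2n noise samples
```

In a deep learning framework the same effect is usually obtained by marking the discriminator as non-trainable while the combined generator and discriminator model is updated.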

This training sequence is followed to train both networks alternately for several batches, and their losses are recorded. No early stopping criterion is used to end training, because choosing such a criterion remains an open problem in GAN research and because GAN training is unstable. However, the generator models were saved after every second batch. The training aims to reach a Nash equilibrium by bringing the distribution of generated beats close to the real heartbeat distribution. The Adam optimizer [226] is employed during training with a learning rate of 0.0002, as suggested in [223], along with batch-wise training. More filters with large sizes are preferred for the generator and discriminator, as larger filters cover more signal timestamps and account for more of the information present in the beat and in the previous filter. Larger filters also help maintain smoothness in the information present in the filters.

Moreover, the DCCGAN took several hundred batches of training before generating any meaningful beats for the corresponding class. In the initial batches, random noise signals were generated, and realistic-looking beats were generated only later. For this reason, no early stopping criterion was used; in some cases the generator model was still unable to generate meaningful beats after several hundred batches of training. For some models, the discriminator loss approached 0, and the generated beats resembled random noise. In such cases, the training was restarted after modifying certain parameters, such as the kernel size, or after adding or removing certain layers.