
[Block diagram. Recoverable block labels: Training set; Test set; Feature extraction (separately on training and test sets); S1 & S2 labels; R-peak and end-T-wave locations in corresponding ECG; Train HSMM; Extraction of duration models; Viterbi algorithm (heart sound segmentation); Performance evaluation.]

Figure 5.3: Block diagram for HSMM based heart sound segmentation algorithm.

5.3.2 Feature extraction

The envelope of a PCG signal carries vital information: its peaks are rhythmically separated by depressed segments, and these can readily be correlated with the probable locations of the FHS and with the silent intervals of systole and diastole, which differ in duration. In clinical practice, a physician or an expert uses such information intuitively to analyze the heart sound. Frequency analysis also reveals that the fundamental heart sounds have most of their signal energy below 150 Hz, with a peak at 50 Hz [19].

Therefore, the features used in this work are defined to closely meet the above description.

As discussed in [5, 6, 112], four envelope features are extracted: the homomorphic envelope, Hilbert envelope, wavelet envelope, and power spectral density (PSD) envelope.

The homomorphic envelope and Hilbert envelope are signal intensity-based features that are easily affected by the presence of noise and murmurs. Before extracting these features, the PCG signals are preprocessed. In this work, the proposed dual filtering technique is incorporated as a denoising tool before envelope extraction. The combination of BPF and TVF has a better noise reduction capability compared to the conventional low-pass filter [56].

The details of the proposed preprocessing method are discussed in Chapter 3.

From the analysis of the PSD of heart sounds, it is found that the spectral peak occurs at approximately 50 Hz [5]. This is true for a healthy subject. In pathological conditions such as myocardial infarction, cardiomyopathy, and valvular defects, the spectral peak of the S1 and S2 sounds may deviate depending on the elastic property or thickness of the myocardium and on the pressure built up in the heart chambers [28]. The exact peak frequencies associated with different pathological cases have not been fully studied. Therefore, the peak frequency of the FHS is searched for within the band from 20 Hz to 150 Hz, and the final feature is calculated as the PSD at that peak frequency. A Hamming window of width 50 ms with 50% overlap is used for the short-time Fourier transform of the PCG. This window size is the shortest expected duration of the FHS that can cover its spectral information [5].
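The PSD feature described here can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the implementation used in this work. It assumes that a single peak frequency is chosen per recording from the time-averaged spectrum; the text does not state whether the peak is selected per recording or per frame. The function name `psd_envelope` and its parameter names are hypothetical.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def psd_envelope(signal, fs, win_s=0.050, overlap=0.5, f_lo=20.0, f_hi=150.0):
    """Return (peak_freq_hz, envelope); envelope[m] is the power of frame m
    at the peak frequency found inside the band [f_lo, f_hi]."""
    n = int(round(win_s * fs))                    # 50 ms window in samples
    hop = max(1, int(round(n * (1 - overlap))))   # 50% overlap -> hop = n / 2
    window = hamming(n)
    # DFT bins whose centre frequency lies inside the search band
    bins = [k for k in range(n // 2 + 1) if f_lo <= k * fs / n <= f_hi]
    frames = []
    for start in range(0, len(signal) - n + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(n)]
        row = []
        for k in bins:
            coeff = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n)
                        for i in range(n))
            row.append(abs(coeff) ** 2 / n)
        frames.append(row)
    if not frames:
        raise ValueError("signal is shorter than one analysis window")
    # Assumption: one peak frequency per recording, from the mean spectrum
    mean_power = [sum(f[j] for f in frames) / len(frames)
                  for j in range(len(bins))]
    j_peak = max(range(len(bins)), key=mean_power.__getitem__)
    return bins[j_peak] * fs / n, [f[j_peak] for f in frames]
```

With a 50 ms window the frequency resolution is 20 Hz, so the band from 20 Hz to 150 Hz is covered by only a handful of bins; a practical implementation would more likely rely on an FFT routine from a numerical library.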

All these features are normalized to zero mean and unit variance and later downsampled to 50 Hz in order to minimize the computation cost.
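The normalization and rate reduction can be illustrated as follows. This is a sketch under stated assumptions: the original sampling rate of the features is not given here, so `fs_in` is a hypothetical parameter, and block averaging stands in for whatever anti-aliasing and decimation scheme was actually used.

```python
import statistics


def normalize(feature):
    """Scale a feature sequence to zero mean and unit variance."""
    mean = statistics.fmean(feature)
    std = statistics.pstdev(feature)
    if std == 0:
        raise ValueError("a constant feature cannot be scaled to unit variance")
    return [(x - mean) / std for x in feature]


def downsample(feature, fs_in, fs_out=50):
    """Reduce the rate to fs_out by averaging blocks of fs_in // fs_out samples.

    Assumption: fs_in is an integer multiple of fs_out.
    """
    if fs_in % fs_out:
        raise ValueError("this sketch supports integer rate ratios only")
    step = fs_in // fs_out
    return [statistics.fmean(feature[i:i + step])
            for i in range(0, len(feature) - step + 1, step)]
```

Normalizing before downsampling follows the order given in the text.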

5.3.3 Estimation of parameters

The standard method for training the HSMM parameters ($\pi_i$, $a_{ij}$, $b_j(O_t)$) is the Baum–Welch algorithm [113, 114]. It iteratively estimates the model parameters under the maximum likelihood criterion, using the forward-backward algorithm to compute the statistics for the expectation step without any information about the true states of the observations. This method may not be feasible for PCG signals, however, because they produce identical instantaneous observations in different states, so the likelihood of similar observation sequences may be maximized at multiple states. Without prior knowledge of the true states, the EM algorithm may therefore yield incorrect estimates. In addition, the computation involved is time-consuming. Therefore, the parameter values of the model are defined as a set of global parameters applicable to all recordings, and they are determined using statistics from a training dataset in which the true states are known.

Since the initial state of a PCG signal can be any of the four states, the initial state probability $\pi_i$ is set to $1/4$. In defining the transition matrix, introducing the duration model into the HSMM rules out the need for self-transition probabilities. Assuming that the heart sound components always occur in a fixed order, i.e.,

$$
\underbrace{\text{S1} \rightarrow \text{systole (Sys)} \rightarrow \text{S2} \rightarrow \text{diastole (Dia)}}_{\text{one heart cycle}} \rightarrow \text{next heart cycle},
$$

there is only one possible transition from each state, and its probability is set to unity.
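The uniform initial probabilities and the deterministic cyclic transitions can be written down directly. The following is a small illustrative sketch; the names `STATES`, `pi`, `A`, and `next_state` are introduced here for illustration only.

```python
STATES = ["S1", "Sys", "S2", "Dia"]

# Uniform initial state probability: pi_i = 1/4 for every state
pi = {s: 1 / len(STATES) for s in STATES}

# Each state moves to the next one with probability 1; Dia wraps around to S1.
# There are no self-transitions because durations are modelled explicitly.
A = [[1 if j == (i + 1) % len(STATES) else 0 for j in range(len(STATES))]
     for i in range(len(STATES))]


def next_state(state):
    """Return the only state reachable from `state`."""
    row = A[STATES.index(state)]
    return STATES[row.index(1)]
```

Row i of `A` holds the transition probabilities out of state i, so every row sums to one.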


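$$
\begin{array}{c|cccc}
i \backslash j & \text{S1} & \text{Sys} & \text{S2} & \text{Dia} \\
\hline
\text{S1}  & 0 & 1 & 0 & 0 \\
\text{Sys} & 0 & 0 & 1 & 0 \\
\text{S2}  & 0 & 0 & 0 & 1 \\
\text{Dia} & 1 & 0 & 0 & 0
\end{array}
$$

Estimation of observation probability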

During the training phase, the locations of the R-peak and the end-T-wave in the corresponding ECG are used as the labels of the S1 and S2 sounds in the PCG signal. The S1 sound is taken to begin at the location of the R-peak and to last for the mean S1 duration ($\mu_{dS1}$).

Similarly, the S2 sound is identified around the location of the end-T-wave, at the point of maximum S2 sound amplitude. Taking this position as the center and the mean S2 duration ($\mu_{dS2}$) as the window length, the S2 sound interval in the PCG is approximated. The values of $\mu_{dS1}$ and $\mu_{dS2}$ are discussed in Section 5.1. The features of each state/component from all the training data are concatenated into one class, forming a feature matrix. Finally, the emission probability $b_j(O_t)$ is derived from the feature matrix using a logistic regression (LR) classifier based on a one-versus-all approach [5].
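The labelling rule can be sketched as below. This is an illustration under assumptions, not the code used in this work: all positions and durations are in samples, the search radius around the end-T-wave is a hypothetical parameter (the text gives no value), the gap between S1 and S2 is labelled systole, and everything else, including samples before the first R-peak, is labelled diastole.

```python
S1, SYS, S2, DIA = 1, 2, 3, 4


def find_s2_center(envelope, t_end, radius):
    """Index of the largest envelope value within `radius` samples of the
    end-T-wave position t_end (radius is a hypothetical parameter)."""
    lo = max(0, t_end - radius)
    hi = min(len(envelope), t_end + radius + 1)
    return max(range(lo, hi), key=envelope.__getitem__)


def label_states(n_samples, r_peaks, s2_centers, d_s1, d_s2):
    """Assign one of the four state labels to every PCG sample.

    r_peaks    : sample indices of ECG R-peaks, taken as S1 onsets
    s2_centers : sample indices of the S2 amplitude maxima
    d_s1, d_s2 : mean S1 and S2 durations in samples
    """
    labels = [DIA] * n_samples                     # default label: diastole
    for r, c in zip(r_peaks, s2_centers):
        s1_end = min(r + d_s1, n_samples)
        s2_start = max(c - d_s2 // 2, s1_end)      # window centred on c
        s2_end = min(s2_start + d_s2, n_samples)
        for i in range(r, s1_end):
            labels[i] = S1
        for i in range(s1_end, min(s2_start, n_samples)):
            labels[i] = SYS
        for i in range(s2_start, s2_end):
            labels[i] = S2
    return labels
```

The resulting label sequence can then be used to group the feature vectors by state before fitting the classifier.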

5.3.4 Testing

For testing, the state duration models are estimated from the individual signal itself. The formulations of the existing and proposed duration models are explained in Sections 5.1 and 5.2.

The performance of these two models is compared by conducting the following experiments.

• Experiment using only the homomorphic envelope [16].

• Experiment using the homomorphic envelope after the proposed dual filtering (HEoDF).

• Experiment using the homomorphic, Hilbert, PSD, and wavelet envelope features [5].

• Experiment using the HEoDF, the Hilbert envelope after dual filtering, the modified PSD, and the wavelet envelope features.

The performance is measured by how accurately the S1 and S2 sounds are detected. The reference locations used to validate the S1 and S2 sounds are the R-peak and the end of the T-wave in the corresponding ECG, respectively. These references only approximate the beginning of the heart sounds, so a tolerance of 50 ms is allowed, as suggested in [5].

If the detected start of an S1 or S2 sound lies within 50 ms of its reference, it is counted as a true positive (TP); otherwise, it is counted as a false positive (FP). If no heart sound is detected at a reference position, it is counted as a false negative (FN). The sensitivity (Se), positive predictivity (P+), and F1 score are then computed. The experiments are repeated over 30 iterations for both algorithms, each time with a randomly drawn training set and test set.
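The scoring rule can be illustrated with a short sketch. It uses the standard definitions Se = TP/(TP+FN), P+ = TP/(TP+FP), and F1 = 2·Se·P+/(Se+P+). The one-to-one greedy matching of each detection to its nearest unclaimed reference is an assumption made here for illustration; the function name `score_detections` is hypothetical.

```python
def score_detections(detected, reference, tol=0.050):
    """Count TP/FP/FN for onset times in seconds and derive Se, P+ and F1."""
    unmatched = sorted(reference)
    tp = fp = 0
    for d in sorted(detected):
        # Nearest reference that has not been claimed by an earlier detection
        best = min(unmatched, key=lambda r: abs(r - d), default=None)
        if best is not None and abs(best - d) <= tol:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    fn = len(unmatched)                  # references with no detection
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * se * ppv / (se + ppv) if se + ppv else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "Se": se, "P+": ppv, "F1": f1}
```

For example, with detections at 0.02 s, 0.38 s, and 1.30 s against references at 0.00 s, 0.35 s, and 0.80 s, the first two detections are true positives, the third is a false positive, and the reference at 0.80 s is a false negative.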