values of detail coefficients. To check how well the wavelet discriminates the FHS, the ratio of the sum of the absolute values of selected coefficients corresponding to S1 and S2 sounds against other intervals and across all recordings is measured. The wavelet yielding the highest ratio value is considered the best choice.

the expectation of the given observation sequenceO from a given modelΛ.ˆ Qˆ = arg max

Q P(Q|O,Λ)ˆ (2.45)

The traditional method of solving equation (2.45) is to calculate likelihood for all possible combinations of state sequencesQand select the sequence which holds maximum probability.

This approach is computationally costly. Therefore, a dynamic programming method called
the Viterbi algorithm is used to find the optimal state sequence. It involves estimation of the
instantaneous probabilityδ_{t}(i)that yields the best score along the path defined by the firstt
observations and ends at stateS_{i} in the forward direction:

δ_{t}(i) = P(O_{1}, O_{2}, O_{3}, ..., O_{t}, q_{t} =S_{i}|λ) (2.46)
by induction:

δ_{t}(j) = max

1≤i≤N[δ_{t−1}(i)a_{ij}]b_{j}(O_{t}) (2.47)

It is initialized using the initial state distribution:

δ_{1}(i) = π_{i}b_{i}(O_{t}) (2.48)
To keep track of the instantaneous state which has the maximum probability to transit from
previous state to present state, another parameterΨt is used:

Ψ_{t}(j) =

0, t = 1

arg max

1≤i≤N[δt−1(i)a_{ij}], ∀ t >1

(2.49)
Then the optimal state sequence is back-track starting from the end stateqˆ_{T} holding maximum
probabilityδ_{T}:

ˆ

q_{T} = arg max

1≤i≤N[δ_{T}(i)] (2.50)

q_{t}^{∗} = Ψ_{t+1}(q_{t+1}^{∗} ), t=T −1, T −2, ...,1 (2.51)
It is worth noting that the HMM model does not explicitly incorporate any duration infor-
mation. The self transition probabilitya_{ii}, which represents statistical duration information, is

used to determine the duration of being in the same state. For any non-zeroa_{ii}, the result is
a geometrical distribution, which is equivalent to the duration being a one-time step. Also,
the transition probability remains the same no matter at what instance it is calculated. This
increases the chances of misclassification of an instantaneous state in the sequence where
it is expected to have the same state. Therefore, HMM poorly suits for the segmentation of
signals where duration information is important.

In order to incorporate duration information, duration probability ‘p’ is introduced as an additional parameter in the standard HMM. The duration dependent hidden semi-Markov model (HSMM) is defined as:

Λ =ˆ {A, B, π, p} (2.52)

wherep={p_{i}(d)}is the probability density function of being in stateialong durationd. Based
on the inclusion of new parameter, the Viterbi algorithm is modified as under:

δ_{t}(j) = max

d max

i6=j [δt−d(i)a_{ij}]p_{j}(d)

d−1

Y

s=0

b_{j}(Ot−s) (2.53)

For HSS, dis considered as the entire duration of a heart cycle [5]. Both duration of being in current state and previous state that maximize (2.53) are vital information and they are tracked at each instant of time t:

∆_{t}(j) = arg max

d max

i6=j [δt−d(i)a_{ij}]p_{j}(d)

d−1

Y

s=0

b_{j}(Ot−s) (2.54)

Ψ_{t}(j) = arg max

i6=j

δt−∆t(j)(i)a_{ij}

, 1≤i≤N (2.55)

The most likely state sequence is obtained by using the extended back-tracking algorithm [5]. This algorithm extrapolates the state likelihood δt−d beyond the beginning or end of a PCG sequence enabling to model the state sequence at the start or end of the PCG signal.

This is achieved by equating the parameters beyond the beginning or end of the sequence to

the respective parameter value at the start or the end point of the sequence. These changes
are introduced to (2.53) and (2.54) by equating δt−d = δ_{1}, ∀ t < d andδt−d = δ_{T}, ∀ t > T.
Similar changes are made in the observation sequence too. The extended algorithm is given
as:

T^{∗} = arg max

t [δ_{T}_{:T}_{+d}_{max}−1(j)] 1≤j ≤N

q_{T}^{∗}∗ = arg max

1≤j≤N[δ_{T}^{∗}(j)]

t =T^{∗}

Perform the following back-tracking algorithm to get the desired optimal state sequence:

d^{∗} = ∆_{t}(q_{t}^{∗}),
q^{∗}_{t−d}∗:t−1 =q^{∗}_{t},
q^{∗}_{t−d}∗−1 = Ψt(q^{∗}_{t}),
t=t−d^{∗},

∀t >1 (2.56)

**2.4.1** **HSMM for segmentation of PCG**

In [5,16], the HSMM model for HSS is built over two assumptions. First, the state sequence will always occur in a fixed order, i.e. S1→systole(Sys)→S2→diastole(Dia)

| {z }

one heart cycle

→next heart cycle.

Secondly, the duration of each component (state) is consistent for an individual subject.

Discussions of how the duration parameters are estimated is presented in chapter 5. These assumptions simplify the initialization of model parameters and improves the segmentation accuracy. The transition matrix{aij}and the initial state probability {πj}need not be trained.

It also avoid misclassification of an observation in the expected location of any standard heart cycle.

Both these assumptions are ideal for almost all the heart sound signals, both normal and pathological conditions. Certain abnormal conditions, such as skipped heartbeat (also known as heart palpitations) or heart rhythm problem (arrhythmias), may fail the model due

to the presence of irregularities in the heart rate. The algorithm will wrongly predict the expected state in a pre-programmed location even though it does not occur in the signal.

Such misclassification may easily occur in a PCG signal exhibiting large heart rate variation (HRV). A healthy subject may also experience HRV during inspiration or expiration (also termed as respiratory sinus arrhythmia), deep breathing during medical examination, or because of physical or mental stress. Sometimes the difference between the minimum and the maximum heart cycle duration (HCD) values is much larger than the average intervals.

In such condition, taking a single Gaussian probability density function (PDF) as a state
duration model may not fit the desired spectrum. The model is constrained by the meanµ
value indicating the center of the PDF and the varianceσ^{2} describing the dispersion around
it. If a single-mode distribution has to cover the full extent of such data distribution, the soft
gradient of the model will not match the distribution of the actual data points. The estimated
likelihood using the model will not be accurate.

A multi-modal duration distribution based HSMM can be introduced to address this problem. The model will be a cascade of weighted Gaussian distribution of all possible state duration of a subject. That way each mode acts as a local search space with a sharper gradient converging the likelihood towards the expected duration. The final state duration is estimated as the instance of the distribution which bears the maximum likelihood value.