**3.3 Conclusion**

**4.1.1 Training block**

**4.1.1.2 High-band feature vector extraction**

The high-band feature vector $Y_K$ contains information about the proposed synthesis filter, which is used in the bandwidth extension process. The bandwidth extension process is applied to a stationary NB speech signal (NB frame) $S_{NB}[n]$ to estimate the corresponding HB signal $\tilde{S}_{HB}[n']$, as shown in Figure 4.2, where $n$ and $n'$ denote the sample indices at 8 kHz and 16 kHz, respectively. In Figure 4.2, $A$ is a linear discrete time-invariant (LDTI) LP analysis filter, $\uparrow 2$ is an ideal upsampler with upsampling factor 2, and filter $K$ is an LDTI synthesis filter. The transfer function of filter $A$ is the inverse of the transfer function of an all-pole filter. The all-pole filter of order 11 is found from the signal $S_{NB}[n]$ by linear prediction analysis (see Figure 4.2) [5]. The NB residue signal ($NB_{res}$) is the response of filter $A$ driven by the signal $S_{NB}[n]$ [4]. The WB residue signal is the NB residue signal upsampled by a factor of 2. It is fed into the synthesis filter $K$ to estimate the high-band signal $\tilde{S}_{HB}[n']$.

[Figure 4.2 block diagram: $S_{NB}[n]$ → $A$ → $NB_{res}$ → $\uparrow 2$ → $K$ → $\tilde{S}_{HB}[n']$]

Figure 4.2: Bandwidth extension process applied to a stationary narrowband signal in order to estimate the corresponding high-band signal.

TH-2564_156102023

4.1 A proposed set-up based on high-band modeling for artificial bandwidth extension of speech signals
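The chain in Figure 4.2 (LP analysis, inverse filtering, upsampling by 2, synthesis filtering) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in this work: the autocorrelation method with a Levinson-Durbin recursion is an assumed choice of LP analysis, all function names are hypothetical, and `k_fir` stands in for an already designed FIR approximation of the synthesis filter K.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations.
    Returns a = [1, a1, ..., a_order], the taps of the analysis filter
    A(z) = 1 + a1 z^-1 + ... + a_order z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy; assumed non-zero (non-silent frame)
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def fir_filter(b, x):
    """Causal FIR filtering: y[n] = sum_k b[k] x[n-k]."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def upsample2(x):
    """Ideal upsampler by 2: insert one zero after every sample."""
    y = [0.0] * (2 * len(x))
    y[::2] = x
    return y

def extend_bandwidth(s_nb, k_fir, order=11):
    """NB frame -> NB residue -> WB residue -> estimated HB frame."""
    a = levinson_durbin(autocorr(s_nb, order), order)  # analysis filter A
    nb_res = fir_filter(a, s_nb)                       # NB residue
    wb_res = upsample2(nb_res)                         # WB residue (16 kHz)
    return fir_filter(k_fir, wb_res)                   # estimated HB signal
```

The output frame has twice as many samples as the input frame, matching the change from the 8 kHz index n to the 16 kHz index n′.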

Here, our primary focus is to design the synthesis filter $K$ in order to estimate the HB information $\tilde{S}_{HB}[n']$ related to the NB signal $S_{NB}[n]$. This can be done by considering the NB signal $S_{NB}[n]$ generation process, the bandwidth extension process, and the HB signal generation process.

For this, an error system is constructed, as depicted in Figure 4.3. In Figure 4.3, HPF is a non-causal FIR high-pass filter, which produces the true (original) high-band signal $S_{HB}[n']$. In the high-band signal generation process, the signal $S_{HB}[n']$ is generated by high-pass filtering the original wideband signal $S_{WB}[n']$. In this chapter, we focus on the reconstruction of $S_{HB}[n']$, which contains the information about the high-band frequencies. This is justified because the narrowband information is available at the receiver end and can be used as it is. LPF is a non-causal FIR low-pass filter. In the narrowband signal generation process, the narrowband signal $S_{NB}[n]$ is generated at the transmitter side by passing the wideband signal $S_{WB}[n']$ through the LPF and then downsampling by a factor of 2. The synthesis filter $K$ is designed to minimize the reconstruction error, which is measured with a system norm [59]. In Figure 4.3, $e = S_{HB}[n'] - \tilde{S}_{HB}[n']$, i.e., the error between the true HB signal and the estimated HB signal.

[Figure 4.3 block diagram. Upper branch: $S_{WB}[n']$ → HPF → $S_{HB}[n']$ (+). Lower branch: $S_{WB}[n']$ → LPF → $\downarrow 2$ → $S_{NB}[n]$ → $A$ → $NB_{res}$ → $\uparrow 2$ → $K$ → $\tilde{S}_{HB}[n']$ (−). Output: $e$.]

Figure 4.3: An error system set-up.

To minimize the error, it is beneficial to extract prior information associated with the wideband speech signal $S_{WB}[n']$. A signal model $F$ is used to represent this prior information. Incorporating it into Figure 4.3 gives the set-up shown in Figure 4.4. Here, $H_0$ and $H_1$ denote the LPF and HPF, respectively.

4. Artificial bandwidth extension technique based on the high-band modeling

[Figure 4.4 block diagram. $w$ → $F$ → $S_{WB}[n']$. Upper branch: $S_{WB}[n']$ → $H_1$ → $S_{HB}[n']$ (+). Lower branch: $S_{WB}[n']$ → $H_0$ → $\downarrow 2$ → $S_{NB}[n]$ → $A$ → $NB_{res}$ → $\uparrow 2$ → $K$ → $\tilde{S}_{HB}[n']$ (−). Output: $e$.]

Figure 4.4: A proposed architecture of the error system set-up for estimating the high-band signal.

The signal $S_{WB}[n']$ is the output of the signal model $F$ driven by an input $w$ with known features (finite energy, specifically $w \in \ell^2(\mathbb{Z}, \mathbb{R}^n)$). $F(z)$, the rational transfer function of $F$, is assumed to be stable and causal. This model can be obtained by Prony's method, available in MATLAB [62,63]. The obtained model is causal but may be unstable. Hence, it is converted into a stable model by inverting its unstable poles. This does not affect the magnitude spectrum of $F(z)$ but changes the phase [40]. The human auditory system is less sensitive to the phase of the speech signal [40]. Further, the number of poles and zeros in the signal model was determined empirically for each frame so as to minimize the error.
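The stabilization step can be illustrated with a short sketch. It assumes that "inverting" an unstable pole p means replacing it by 1/conj(p), and it adds a gain factor of 1/|p| per reflected pole so that the magnitude response matches exactly; the gain compensation and the function names are assumptions made for this illustration, and only the all-pole part of the model is shown.

```python
import cmath

def reflect_unstable_poles(poles):
    """Replace every pole outside the unit circle by 1/conj(p).
    Returns (stable_poles, gain); gain compensates the magnitude so that
    the frequency response magnitude is unchanged (assumed convention)."""
    stable, gain = [], 1.0
    for p in poles:
        if abs(p) > 1.0:
            stable.append(1.0 / p.conjugate())
            gain /= abs(p)
        else:
            stable.append(p)  # poles inside (or on) the unit circle are kept
    return stable, gain

def allpole_magnitude(poles, omega, gain=1.0):
    """Magnitude of gain / prod_k (1 - p_k e^{-j omega})."""
    z_inv = cmath.exp(-1j * omega)
    denom = 1.0
    for p in poles:
        denom *= 1.0 - p * z_inv
    return abs(gain / denom)
```

Evaluating `allpole_magnitude` before and after reflection at any frequency gives the same value, while the phase differs, which is the property relied on above.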

In Figure 4.4, $H_1F$ and $H_0F$ play the roles of the signal models $G_1$ and $G_2$ defined in (1.3), respectively, i.e., $G_1 = H_1F$ and $G_2 = H_0F$.

Signal models $G_1$ and $G_2$ carry the spectral envelope information of the high-band signal (16 kHz) and the narrowband signal (16 kHz), respectively. The signal of interest is the high-band signal $S_{HB}[n']$. Hence, high-band modeling is performed.

Problem formulation

We solve the following optimization problem for designing an optimal K(z).

Problem 3. Given a stable and causal transfer function $F(z)$ and two non-causal FIR filters $H_0(z)$ and $H_1(z)$, design an optimal stable and causal synthesis filter $K_{opt}$ defined as

$$K_{opt} := \arg\min_{K} \|P\|_\infty, \tag{4.1}$$

where $P := H_1F - K(\uparrow 2)A(\downarrow 2)H_0F$ is the system that maps $w$ to $e$ in Figure 4.4. Here, $\|P\|_\infty$ denotes the $H^\infty$-norm of the system $P$.
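For orientation, the cost in (4.1) can be written out explicitly. The expression below assumes that the $H^\infty$-norm of the multirate system $P$ is read as the $\ell^2$-induced gain from $w$ to $e$; this reading is consistent with the finite-energy input assumed in Figure 4.4, but it is stated here as an assumption rather than as the definition used in Appendix B.

```latex
\|P\|_\infty \;=\; \sup_{w \in \ell^2,\; w \neq 0} \frac{\|e\|_2}{\|w\|_2},
\qquad e = P\,w,
\qquad \|x\|_2 = \Big( \sum_{n \in \mathbb{Z}} |x[n]|^2 \Big)^{1/2}.
```

Under this reading, minimizing (4.1) over $K$ minimizes the worst-case ratio of reconstruction-error energy to input energy.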

Solution of Problem 3

Problem 3 is solved to obtain an optimal filter $K_{opt}$. The filters $H_0$ and $H_1$ in the system $P$ are non-causal, so $P$ is itself non-causal. Hence, this system needs to be converted into a causal system before the solution given in Appendix B can be applied. The FIR filters $H_0$ and $H_1$ are related in the z-domain by [48]

$$H_1(z) = H_0(-z). \tag{4.2}$$
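In the time domain, relation (4.2) means that the high-pass taps are the low-pass taps with alternating signs, h1[n] = (-1)^n h0[n]. The sketch below checks this on a small symmetric filter; the tap values, the storage convention (index i holds the tap for n = i - Q), and the function names are illustrative assumptions, not the filters used in this work.

```python
import cmath
import math

def modulate_to_highpass(h0, Q):
    """Taps of H1(z) = H0(-z) for a non-causal FIR filter stored as
    h0[i] = tap at time n = i - Q, n = -Q..Q: h1[n] = (-1)^n h0[n]."""
    return [(-c if (i - Q) % 2 else c) for i, c in enumerate(h0)]

def freq_response(h, Q, omega):
    """H(e^{j omega}) = sum_n h[n] e^{-j omega n}, n = -Q..Q."""
    return sum(c * cmath.exp(-1j * omega * (i - Q)) for i, c in enumerate(h))

# Hypothetical low-pass prototype with Q = 2 (binomial taps, unit DC gain).
h0 = [0.0625, 0.25, 0.375, 0.25, 0.0625]
h1 = modulate_to_highpass(h0, 2)
```

For this prototype, `h0` passes ω = 0 and rejects ω = π, while `h1` does the opposite, since replacing z by -z shifts the frequency response by π.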

Consider the FIR filter $H_0(z)$,

$$\begin{aligned} H_0(z) &= a_Q z^{-Q} + \dots + a_1 z^{-1} + a_0 + a_1 z^{1} + \dots + a_Q z^{Q} \\ &= z^{Q} H_a(z), \end{aligned} \tag{4.3}$$

with $H_a(z) := a_Q z^{-2Q} + \dots + a_1 z^{-(Q+1)} + a_0 z^{-Q} + a_1 z^{-(Q-1)} + \dots + a_Q$, where $a_i \in \mathbb{R}$ and $Q$ can be assumed to be an even integer without loss of generality. The filter $H_1(z)$ is obtained by substituting (4.3) into (4.2); since $Q$ is even, $(-z)^Q = z^Q$, so

$$H_1(z) = z^{Q} H_a(-z). \tag{4.4}$$

Next, we replace $H_0(z)$ by (4.3) and $H_1(z)$ by (4.4) in the system $P$:

$$\begin{aligned} P(z) &= z^{Q} H_a(-z) F(z) - K(z)(\uparrow 2) A(z) (\downarrow 2)\, z^{Q} H_a(z) F(z) \\ &= z^{Q} \big( F_b(z) - K(z)(\uparrow 2) A(z) (\downarrow 2) F_a(z) \big), \end{aligned} \tag{4.5}$$

where $F_b(z) := H_a(-z)F(z)$ and $F_a(z) := H_a(z)F(z)$. Further, the system $P$ is transformed into a causal system by delaying its response by $Q$ samples:

$$\bar{P} = z^{-Q} P, \tag{4.6}$$

where the system $\bar{P}$ is causal. The $H^\infty$-norm of $\bar{P}$ equals that of the original system $P$, because a delay does not change the $H^\infty$-norm of a system [1]. Further, the solution of (4.6) is obtained using the solution given in Appendix B, wherein the system $\bar{P}$ is converted into the generalized error system (see Figure B.1) as follows:

$$\begin{aligned} G_1(z) &= F_b(z), \\ G_2(z) &= F_a(z), \\ G_3(z) &= A(z), \\ K_d(z) &= K(z). \end{aligned} \tag{4.7}$$
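The norm equivalence invoked for (4.6) can be verified in one line. Writing $\bar{P}$ for the delayed system on the left-hand side of (4.6), and assuming the $H^\infty$-norm is read as the $\ell^2$-induced gain from $w$ to $e$ (an interpretation, not a statement taken from Appendix B), the argument is that a pure delay only shifts a sequence and therefore preserves its energy:

```latex
\|\bar{P}\|_\infty
  = \sup_{w \neq 0} \frac{\| z^{-Q} (P w) \|_2}{\|w\|_2}
  = \sup_{w \neq 0} \frac{\| P w \|_2}{\|w\|_2}
  = \|P\|_\infty,
\qquad \text{since} \quad
\| z^{-Q} x \|_2^2 = \sum_{n \in \mathbb{Z}} |x[n-Q]|^2 = \sum_{n \in \mathbb{Z}} |x[n]|^2 .
```

Consequently, a filter $K$ that is optimal for $\bar{P}$ is also optimal for $P$.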

The obtained optimal filter $K$ ($K_{opt}$) contains the high-band information of the wideband signal.

Filter $K$ has an infinite impulse response (IIR). It is therefore converted into an approximate FIR filter by truncating the Taylor series of $K$ at the origin, and the resulting FIR filter is taken as the high-band feature vector $Y_K$. The number of coefficients in the FIR filter is set to 20, a value chosen experimentally, as explained in Section 4.2.
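The truncation step can be sketched as follows, assuming that the Taylor series is taken in powers of z^-1, so that its coefficients are the leading samples of the impulse response of K. The rational form K(z) = B(z)/A(z), the coefficient lists, and the function name are hypothetical; the default of 20 taps follows the value stated above.

```python
def impulse_response(b, a, n_taps=20):
    """First n_taps impulse-response samples of K(z) = B(z)/A(z), where
    B(z) = sum_k b[k] z^-k and A(z) = sum_k a[k] z^-k with a[0] != 0.
    These samples are the coefficients of the truncated series in z^-1."""
    h = []
    for n in range(n_taps):
        acc = b[n] if n < len(b) else 0.0
        # Difference equation: a[0] h[n] = b[n] - sum_{k>=1} a[k] h[n-k]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * h[n - k]
        h.append(acc / a[0])
    return h

# Hypothetical stable first-order example: K(z) = 1 / (1 - 0.5 z^-1).
y_k = impulse_response([1.0], [1.0, -0.5])
```

For a stable K the discarded tail decays geometrically, so a short truncation can already capture most of the response; how many taps are enough for the filters in this work is the experimental question addressed in Section 4.2.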