4.1.1 Training block
4.1.1.2 High-band feature vector extraction
The high-band feature vector $Y_K$ contains the information of the proposed synthesis filter, which is used in the bandwidth extension process. This process is applied to a stationary NB speech signal (NB frame) $S_{NB}[n]$ to estimate the corresponding HB signal $\tilde{S}_{HB}[n']$, as shown in Figure 4.2, where $n$ and $n'$ denote the sample indices at 8 kHz and 16 kHz, respectively. In Figure 4.2, $A$ is a linear discrete time-invariant (LDTI) LP analysis filter, $\uparrow 2$ is an ideal upsampler with upsampling factor 2, and $K$ is an LDTI synthesis filter. The transfer function of filter $A$ is the inverse of the transfer function of an all-pole filter. The all-pole filter, of order 11, is obtained from the signal $S_{NB}[n]$ by linear prediction analysis (see Figure 4.2) [5]. The NB residue signal ($NB_{res}$) is the response of filter $A$ driven by the signal $S_{NB}[n]$ [4]. The WB residue signal is the NB residue signal upsampled by a factor of 2. It is fed into the synthesis filter $K$ to estimate the high-band signal $\tilde{S}_{HB}[n']$.
[Block diagram: $S_{NB}[n]$ -> $A$ -> $NB_{res}$ -> $\uparrow 2$ -> $K$ -> $\tilde{S}_{HB}[n']$]
Figure 4.2: Bandwidth extension process applied to a stationary narrowband signal in order to estimate the corresponding high-band signal.
TH-2564_156102023
4.1 A proposed set-up based on high-band modeling for artificial bandwidth extension of speech signals
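The signal chain of Figure 4.2 can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion for the order-11 LP analysis (the text does not state which LP method is used), a hypothetical 160-sample noise frame in place of real speech, and a placeholder 20-tap filter standing in for the designed $K$. All function names are illustrative only.

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real frame (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion. Returns a with a[0] = 1 such that
    A(z) = sum_k a[k] z^-k is the LP analysis (inverse all-pole) filter."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        refl = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + refl * prev[m - k]
        a[m] = refl
        err *= 1.0 - refl * refl               # prediction error energy
    return a

def fir_filter(b, x):
    """Causal FIR filtering y[n] = sum_k b[k] x[n-k] with zero initial state."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def upsample2(x):
    """Ideal upsampler with factor 2: insert a zero after every sample."""
    y = [0.0] * (2 * len(x))
    y[::2] = x
    return y

def extend_frame(s_nb, k_fir, lp_order=11):
    """S_NB -> A -> NB_res -> (up 2) -> K -> estimated S_HB, as in Figure 4.2."""
    a = levinson(autocorr(s_nb, lp_order), lp_order)
    nb_res = fir_filter(a, s_nb)               # NB residue at 8 kHz
    wb_res = upsample2(nb_res)                 # WB residue at 16 kHz
    return a, fir_filter(k_fir, wb_res)

# Hypothetical inputs: a noise frame and a placeholder K (a unit impulse),
# neither of which comes from the thesis.
rng = random.Random(0)
frame = [rng.gauss(0.0, 1.0) for _ in range(160)]
k_placeholder = [1.0] + [0.0] * 19
a_coeffs, s_hb_est = extend_frame(frame, k_placeholder)
```

With the placeholder $K$, the output is simply the upsampled residue; in the proposed set-up, the truncated optimal filter would take its place.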
Here, our primary focus is to design the synthesis filter $K$ so as to estimate the HB information $\tilde{S}_{HB}[n']$ related to the NB signal $S_{NB}[n]$. This is done by jointly considering the generation process of the NB signal $S_{NB}[n]$, the bandwidth extension process, and the generation process of the HB signal.
For this, an error system is constructed, as depicted in Figure 4.3.
[Block diagram: $S_{WB}[n']$ -> HPF -> $S_{HB}[n']$; $S_{WB}[n']$ -> LPF -> $\downarrow 2$ -> $S_{NB}[n]$ -> $A$ -> $NB_{res}$ -> $\uparrow 2$ -> $K$ -> $\tilde{S}_{HB}[n']$; $e = S_{HB}[n'] - \tilde{S}_{HB}[n']$]
Figure 4.3: An error system set-up.
In Figure 4.3, HPF is a non-causal FIR high-pass filter, which produces the true (original) high-band signal $S_{HB}[n']$. In the high-band signal generation process, the signal $S_{HB}[n']$ is generated by high-pass filtering the original wideband signal $S_{WB}[n']$. In this chapter, we focus on the reconstruction of $S_{HB}[n']$, which contains the information about the high-band frequencies. This is justified because the narrowband information is available at the receiver end and can be used as it is. LPF is a non-causal FIR low-pass filter. In the narrowband signal generation process, the narrowband signal $S_{NB}[n]$ is generated at the transmitter side by passing the wideband signal $S_{WB}[n']$ through the LPF and then downsampling by a factor of 2. The synthesis filter $K$ is designed to minimize the reconstruction error. We use a system norm to measure the reconstruction error [59]. In Figure 4.3, $e = S_{HB}[n'] - \tilde{S}_{HB}[n']$, i.e., the error between the true HB signal and the estimated HB signal. To minimize the error, it is beneficial to extract prior information associated with the wideband speech signal $S_{WB}[n']$. A signal model $F$ is used to represent the prior information of the signal $S_{WB}[n']$. It is incorporated into Figure 4.3, and the resulting set-up is shown in Figure 4.4.
4. Artificial bandwidth extension technique based on the high-band modeling
[Block diagram: $w$ -> $F$ -> $S_{WB}[n']$; $S_{WB}[n']$ -> $H_1$ -> $S_{HB}[n']$; $S_{WB}[n']$ -> $H_0$ -> $\downarrow 2$ -> $S_{NB}[n]$ -> $A$ -> $NB_{res}$ -> $\uparrow 2$ -> $K$ -> $\tilde{S}_{HB}[n']$; $e = S_{HB}[n'] - \tilde{S}_{HB}[n']$]
Figure 4.4: A proposed architecture of the error system set-up for estimating the high-band signal.
Here, $H_0$ and $H_1$ denote the LPF and HPF, respectively. The signal $S_{WB}[n']$ is the output of the signal model $F$ driven by an input $w$ with known features (finite energy, specifically $w \in \ell^2(\mathbb{Z}, \mathbb{R}^n)$). $F(z)$, the rational transfer function of $F$, is assumed to be stable and causal. This model can be obtained by Prony's method, which is available in MATLAB [62,63]. The obtained model is causal but may be unstable. In that case, it is converted into a stable model by inverting its unstable poles. This does not affect the magnitude spectrum of $F(z)$ but changes its phase [40]. The phase change is acceptable because the human auditory system is less sensitive to the phase of the speech signal [40]. Further, the numbers of poles and zeros in the signal model were chosen empirically for each frame so as to minimize the error.
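The stabilization step can be illustrated with a small sketch. It assumes that "inverting" a pole $p$ with $|p| > 1$ means replacing it by $1/p^{*}$ (reflection about the unit circle), and it uses hypothetical pole values. Under this assumption the shape of the magnitude spectrum is unchanged; each reflected pole scales the magnitude by the constant factor $|p|$, which can be absorbed into the gain of the model. Poles lying exactly on the unit circle are not handled.

```python
import cmath

def reflect_unstable_poles(poles):
    """Replace every pole p with |p| > 1 by 1/conj(p); keep the others."""
    return [1 / p.conjugate() if abs(p) > 1 else p for p in poles]

def all_pole_magnitude(poles, omega):
    """Magnitude of 1 / prod_k (1 - p_k e^{-j omega})."""
    z_inv = cmath.exp(-1j * omega)
    den = 1.0 + 0j
    for p in poles:
        den *= 1 - p * z_inv
    return 1 / abs(den)

# Hypothetical model: one unstable conjugate pair and one stable real pole.
poles = [cmath.rect(1.25, 0.6), cmath.rect(1.25, -0.6), 0.5 + 0j]
stable_poles = reflect_unstable_poles(poles)

# Constant gain factor introduced by the reflection: product of |p| over
# the reflected poles.
gain = 1.25 ** 2

# Ratio of the magnitude responses at a few frequencies; it should equal
# the constant gain at every frequency.
ratios = [all_pole_magnitude(stable_poles, w) / all_pole_magnitude(poles, w)
          for w in (0.0, 0.3, 1.0, 2.0, 3.0)]
```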
In Figure 4.4, $H_1F$ and $H_0F$ correspond to the signal models $G_1$ and $G_2$ defined in (1.3), respectively, i.e., $G_1 = H_1F$ and $G_2 = H_0F$. The signal models $G_1$ and $G_2$ carry the spectral envelope information of the high-band signal (sampled at 16 kHz) and the narrowband signal (sampled at 16 kHz), respectively. The signal of interest is the high-band signal $S_{HB}[n']$. Hence, high-band modeling is performed.
Problem formulation
We solve the following optimization problem for designing an optimal K(z).
Problem 3. Given a stable and causal transfer function $F(z)$ and two non-causal FIR filters $H_0(z)$ and $H_1(z)$, design an optimal stable and causal synthesis filter $K_{opt}$ defined as
$$K_{opt} := \arg\min_{K} \|P\|_{\infty}, \tag{4.1}$$
where $P := H_1F - K(\uparrow 2)A(\downarrow 2)H_0F$ maps $w$ to $e$ in Figure 4.4, and $\|P\|_{\infty}$ denotes the $H_\infty$-norm of the system $P$.
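To give a feel for the quantity minimized in (4.1), the following sketch estimates the $H_\infty$-norm of a stable single-input single-output LTI system as the peak of its magnitude response on a frequency grid. This is only an illustration under a simplifying assumption: the system $P$ above contains the samplers $\uparrow 2$ and $\downarrow 2$ and is therefore not LTI, so its norm cannot be read off a single frequency response in this way. The transfer function used here is hypothetical, and a grid search only gives a lower bound on the true peak.

```python
import cmath
import math

def freq_response(b, a, omega):
    """B(e^{j omega}) / A(e^{j omega}) for coefficients in powers of z^-1."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den

def hinf_norm_grid(b, a, n_grid=4096):
    """Peak magnitude over a uniform grid on [0, pi] (real coefficients)."""
    return max(abs(freq_response(b, a, math.pi * i / (n_grid - 1)))
               for i in range(n_grid))

# Hypothetical stable example: 1 / (1 - 0.5 z^-1), whose gain peaks at
# omega = 0 with value 1 / (1 - 0.5) = 2.
peak = hinf_norm_grid([1.0], [1.0, -0.5])
```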
Solution of Problem 3
Problem 3 is solved to obtain an optimal filter $K_{opt}$. The filters $H_0$ and $H_1$ in the system $P$ are non-causal, so $P$ is itself a non-causal system. Hence, $P$ must be converted into a causal system before the solution given in Appendix B can be applied. The FIR filters $H_0$ and $H_1$ are related in the $z$-domain by [48]
$$H_1(z) = H_0(-z). \tag{4.2}$$
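In the time domain, relation (4.2) amounts to changing the sign of every odd-indexed tap, $h_1[n] = (-1)^n h_0[n]$. The sketch below illustrates this for a hypothetical symmetric 5-tap low-pass filter with taps indexed from $-Q$ to $Q$; the tap values are not taken from the thesis.

```python
import cmath
import math

def highpass_from_lowpass(h0, Q):
    """h0 lists the taps for n = -Q..Q; H1(z) = H0(-z) gives (-1)^n h0[n]."""
    return [(-1) ** abs(n) * c for n, c in zip(range(-Q, Q + 1), h0)]

def zero_phase_response(h, Q, omega):
    """Frequency response of a non-causal FIR filter with taps at n = -Q..Q."""
    return sum(c * cmath.exp(-1j * omega * n)
               for n, c in zip(range(-Q, Q + 1), h))

# Hypothetical low-pass prototype with unit gain at omega = 0.
Q = 2
h0 = [-0.05, 0.25, 0.6, 0.25, -0.05]
h1 = highpass_from_lowpass(h0, Q)

# The high-pass response at omega equals the low-pass response at omega + pi,
# so the gain of H1 at omega = pi matches the gain of H0 at omega = 0.
gain_h0_dc = zero_phase_response(h0, Q, 0.0)
gain_h1_pi = zero_phase_response(h1, Q, math.pi)
```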
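Consider the FIR filter $H_0(z)$:
$$\begin{aligned} H_0(z) &= a_Q z^{-Q} + \dots + a_1 z^{-1} + a_0 + a_1 z + \dots + a_Q z^{Q} \\ &= z^{Q} H_a(z), \end{aligned} \tag{4.3}$$
with $H_a(z) := a_Q z^{-2Q} + \dots + a_1 z^{-(Q+1)} + a_0 z^{-Q} + a_1 z^{-(Q-1)} + \dots + a_Q$, where $a_i \in \mathbb{R}$ and $Q$ can be assumed to be an even integer without loss of generality. The filter $H_1(z)$ is obtained by substituting (4.3) into (4.2) and using $(-z)^{Q} = z^{Q}$ for even $Q$, i.e.,
$$H_1(z) = z^{Q} H_a(-z). \tag{4.4}$$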
Next, we replace $H_0(z)$ by (4.3) and $H_1(z)$ by (4.4) in the system $P$:
$$\begin{aligned} P(z) &= z^{Q} H_a(-z) F(z) - K(z)(\uparrow 2) A(z) (\downarrow 2) z^{Q} H_a(z) F(z) \\ &= z^{Q} \left( F_b(z) - K(z)(\uparrow 2) A(z) (\downarrow 2) F_a(z) \right), \end{aligned} \tag{4.5}$$
where $F_b(z) := H_a(-z)F(z)$ and $F_a(z) := H_a(z)F(z)$. The factor $z^{Q}$ can be moved in front of $K(z)(\uparrow 2)A(z)(\downarrow 2)$ because $Q$ is even. Further, the system $P$ is transformed into a causal system by delaying its response by $Q$ samples:
$$\bar{P} = z^{-Q} P, \tag{4.6}$$
where $\bar{P}$ is a causal system. The $H_\infty$-norm of $\bar{P}$ equals that of the original system $P$, because a pure delay does not change the $H_\infty$-norm of a system [1]. The solution for (4.6) is then obtained using the solution given in Appendix B, wherein the system $\bar{P}$ is converted into the generalized error system (see Figure B.1) as follows:
$$G_1(z) = F_b(z), \quad G_2(z) = F_a(z), \quad G_3(z) = A(z), \quad K_d(z) = K(z). \tag{4.7}$$
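The effect of the delay in (4.6) can be checked numerically on a single FIR filter: shifting the taps from $n = -Q, \dots, Q$ to $n = 0, \dots, 2Q$ changes only the phase, not the magnitude response, so the peak gain is preserved. This sketch uses hypothetical tap values and covers only the LTI case; it is meant as an illustration of the argument, not as a proof for the multirate system.

```python
import cmath
import math

def response(taps, first_index, omega):
    """Frequency response of an FIR filter whose first tap sits at first_index."""
    return sum(c * cmath.exp(-1j * omega * (first_index + i))
               for i, c in enumerate(taps))

# Hypothetical symmetric filter with taps a_2, a_1, a_0, a_1, a_2.
Q = 2
taps = [-0.05, 0.25, 0.6, 0.25, -0.05]
omegas = [math.pi * i / 8 for i in range(9)]

# Non-causal filter (taps at n = -Q..Q) versus its delayed, causal version
# (taps at n = 0..2Q), i.e. the same coefficients multiplied by z^-Q.
noncausal_mag = [abs(response(taps, -Q, w)) for w in omegas]
causal_mag = [abs(response(taps, 0, w)) for w in omegas]
max_gap = max(abs(u - v) for u, v in zip(noncausal_mag, causal_mag))
```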
The obtained optimal filter $K_{opt}$ contains the high-band information of the wideband signal. $K_{opt}$ has an infinite impulse response (IIR). It is therefore approximated by an FIR filter, obtained by truncating the Taylor series of $K_{opt}$ at the origin, and the resulting FIR coefficients are taken as the high-band feature vector $Y_K$. The number of FIR coefficients is set to 20, a value chosen experimentally, as explained in Section 4.2.
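The truncation step can be sketched as follows. The sketch assumes that $K_{opt}$ is available as a rational function in $z^{-1}$ and that truncating its series expansion is equivalent to keeping the first 20 samples of its impulse response; the coefficients below are hypothetical and do not come from the thesis.

```python
def impulse_response(b, a, n_taps):
    """First n_taps impulse-response samples of B(z)/A(z), with a[0] != 0.

    b and a hold the numerator and denominator coefficients in powers of
    z^-1; the samples follow from the difference equation of the filter.
    """
    h = []
    for n in range(n_taps):
        acc = b[n] if n < len(b) else 0.0
        acc -= sum(a[k] * h[n - k] for k in range(1, len(a)) if n - k >= 0)
        h.append(acc / a[0])
    return h

# Hypothetical stable IIR filter standing in for K_opt:
# (1 + 0.4 z^-1) / (1 - 0.5 z^-1).
b_k = [1.0, 0.4]
a_k = [1.0, -0.5]

# 20-tap FIR approximation, used here in the role of the feature vector Y_K.
y_k = impulse_response(b_k, a_k, 20)
```

Because the impulse response of a stable filter decays, the discarded tail is small when the poles lie well inside the unit circle; how many taps are needed in practice is the experimental question addressed in Section 4.2.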