Automatic Speech Recognition

[Figure: Overview of an ASR system. Speech from the speaker, shaped by room acoustics, noise, and the microphone, enters the front end (signal processing and feature extraction). A local match stage performs probability estimation on the features, and a global decoder time-aligns and pattern-matches the utterance, guided by a language model (which can include semantics and hypotheses), to produce the output word string.]
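
This pipeline implements the standard probabilistic formulation of ASR: the decoder searches for the word string that best explains the acoustic evidence, with the local match supplying the acoustic score and the language model supplying the prior over word strings:

$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \; p(X \mid W)\, P(W)$$

where $X$ is the sequence of feature vectors produced by the front end and $W$ ranges over candidate word strings.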


[Figure: A three-state left-to-right HMM. States q1, q2, q3 have self-loop probabilities p(q1 | q1), p(q2 | q2), p(q3 | q3) and forward transition probabilities p(q2 | q1) and p(q3 | q2); each state qk emits the acoustic vector xn with probability p(xn | qk).]
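
To make the role of these quantities concrete, here is a minimal Python/NumPy sketch of the forward algorithm for such a three-state left-to-right HMM: it combines the transition probabilities p(qj | qi) and emission probabilities p(xn | qk) from the figure into the total likelihood of an observation sequence. The transition values and the random emission scores are illustrative assumptions, not numbers from the source.

```python
import numpy as np

# Transition matrix A[i, j] = p(q_{j+1} | q_{i+1}) for the 3-state
# left-to-right HMM of the figure: self-loops plus one forward jump.
# The numeric values are assumptions for illustration.
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])

def forward_likelihood(emission):
    """emission[n, k] = p(x_n | q_{k+1}); returns p(x_1, ..., x_N)."""
    N, Q = emission.shape
    alpha = np.zeros((N, Q))
    alpha[0, 0] = emission[0, 0]          # paths must start in q1
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * emission[n]
    return alpha[-1, -1]                  # and end in q3

# Example: 10 frames of made-up emission probabilities.
rng = np.random.default_rng(0)
print(forward_likelihood(rng.uniform(0.1, 1.0, size=(10, 3))))
```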


[Figure: ANN estimating P(phone | acoustic vectors) from the input acoustic vectors.]


[Equation/figure residue: expected feature counts ⟨F1⟩ = (1 − p)g and ⟨F2⟩ = p(1 − g), expressed in terms of the probabilities p and g.]


[Figure: (top) HMM states q1-q6 as a function of time n for the word "dad" (phone sequence d-a-d); (bottom) an ANN that takes the acoustic vector xn plus acoustic context as input and outputs the posterior probabilities p(qk | xn) for the HMM states k = 1, 2, ..., K.]
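
In hybrid HMM/ANN systems of this kind, the network's posteriors p(qk | xn) are converted to scaled likelihoods by dividing by the state priors P(qk) (Bayes' rule, up to the state-independent factor p(xn)), so they can stand in for p(xn | qk) inside the HMM. A minimal sketch, assuming the priors are estimated as relative state frequencies in the training alignment:

```python
import numpy as np

def scaled_likelihoods(posteriors, priors):
    """posteriors: (N, K) MLP outputs p(q_k | x_n).
    priors: (K,) state priors P(q_k), e.g. relative frequencies
    of each state in the training alignment.
    Returns p(x_n | q_k) / p(x_n) = p(q_k | x_n) / P(q_k)."""
    return posteriors / priors

# Illustrative usage with made-up numbers.
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1]])
priors = np.array([0.5, 0.3, 0.2])
print(scaled_likelihoods(posteriors, priors))
```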


[Figure: MLP estimating P(phone | acoustic vectors). Input: 9 acoustic vectors of 26 dimensions each; hidden layer: 500-4000 hidden units; output: 61 phones.]
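
A minimal NumPy sketch of a network with these dimensions: 9 stacked 26-dimensional vectors in (234 inputs), 61 phone posteriors out. The 1000-unit hidden layer, sigmoid hidden activation, and random initialization are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 9 * 26, 1000, 61   # 9 frames x 26 dims; 61 TIMIT phones

W1 = rng.normal(0, 0.01, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.01, (n_hid, n_out)); b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_posteriors(x):
    """x: (batch, 234) window of 9 stacked 26-dim acoustic vectors.
    Returns (batch, 61) phone posteriors P(phone | acoustic vectors)."""
    h = sigmoid(x @ W1 + b1)
    return softmax(h @ W2 + b2)

print(mlp_posteriors(rng.normal(size=(2, 234))).sum(axis=-1))  # ~[1. 1.]
```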


Example TIMIT phone transcription: dh ax kcl k ae tcl t ("the cat"; kcl and tcl mark stop closures).


[Figure: Embedded training flowchart, summarized as the following procedure.]

1. Train the MLP with TIMIT; use the resulting MLP weights to recognize the developmental set, giving a baseline score.
2. Run a Viterbi alignment of the labeled training data with the current MLP weights.
3. Train the MLP for the task on the new alignment.
4. Recognize the developmental set with the new MLP weights and compute the score.
5. If the score improved, return to step 2; otherwise, done.
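
The loop can be summarized in code. The helper functions (train_mlp, recognize, viterbi_align) are hypothetical stand-ins for the boxes in the flowchart, not a real API:

```python
def embedded_training(timit_data, task_data, dev_set):
    # Bootstrap: train the MLP on TIMIT, then score the developmental set.
    weights = train_mlp(timit_data)            # hypothetical helper
    best_score = recognize(dev_set, weights)   # baseline score
    while True:
        # Re-label the task training data with a Viterbi alignment,
        # then retrain the MLP on the new labels.
        alignment = viterbi_align(task_data, weights)
        weights = train_mlp(alignment)
        score = recognize(dev_set, weights)
        if score <= best_score:                # no improvement: done
            return weights
        best_score = score
```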


[Figure: Context-dependent MLP. The data input, together with binary-encoded left context and right context, feeds a shared hidden layer with three output layers: a context-dependent (c.d.) output for the left context, a c.d. output for the right context, and a context-independent (c.i.) output.]
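
A NumPy sketch of the topology the figure suggests: one shared hidden layer fed by the data plus binary context codes, and three softmax output heads. All layer sizes and the 8-bit context encoding are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_ctx, n_hid, n_out = 234, 8, 500, 61   # assumed sizes

W_h = rng.normal(0, 0.01, (n_in + 2 * n_ctx, n_hid))
heads = {name: rng.normal(0, 0.01, (n_hid, n_out))
         for name in ("cd_left", "cd_right", "ci")}

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, left_bits, right_bits):
    """x: (batch, 234) data; left_bits, right_bits: (batch, 8) binary
    context codes. Returns one posterior distribution per output head."""
    inp = np.concatenate([x, left_bits, right_bits], axis=-1)
    h = 1.0 / (1.0 + np.exp(-(inp @ W_h)))     # shared sigmoid hidden layer
    return {name: softmax(h @ W) for name, W in heads.items()}
```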


[Figure: MLP with data input, a hidden layer, and output probabilities, with a male/female (M/F) distinction.]


[Figure: Embedded training with pronunciation modeling, summarized as the following procedure.]

1. Train the MLP with TIMIT; use the resulting MLP weights to recognize the developmental set, giving a baseline score.
2. Run a Viterbi alignment of the labeled training data and use it to generate a multiple-pronunciation lexicon.
3. Train the MLP for the task.
4. Recognize the developmental set with the new MLP weights and compute the score.
5. If the score improved, return to step 2; otherwise, done.
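
A sketch of the added lexicon step, assuming the Viterbi alignments yield (word, phone-sequence) pairs (a hypothetical format): count the aligned pronunciations of each word and keep the most frequent variants as the multiple-pronunciation lexicon.

```python
from collections import Counter, defaultdict

def build_lexicon(aligned_words, top_k=3):
    """aligned_words: iterable of (word, phone_sequence) pairs taken
    from Viterbi alignments of the labeled training data.
    Returns word -> list of up to top_k pronunciation variants."""
    counts = defaultdict(Counter)
    for word, phones in aligned_words:
        counts[word][tuple(phones)] += 1
    return {w: [list(p) for p, _ in c.most_common(top_k)]
            for w, c in counts.items()}

# Illustrative usage with made-up alignments.
alignments = [("the", ["dh", "ax"]), ("the", ["dh", "iy"]),
              ("the", ["dh", "ax"]), ("cat", ["k", "ae", "t"])]
print(build_lexicon(alignments))
```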


[Figure: Monolithic net vs. parallel net architecture. The monolithic net has I input units, H hidden units, and O output units. The parallel architecture uses n nets (Net 1, Net 2, ..., Net n), each with I input units, H/n hidden units, and O output units, whose outputs are combined by function averaging.]
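
Function averaging is straightforward: run the input through each of the n parallel nets and average their output probabilities. A minimal sketch, where each net is any callable mapping an input batch to (batch, O) outputs:

```python
import numpy as np

def function_average(nets, x):
    """Ensemble output: the mean of the n nets' output probabilities."""
    return np.mean([net(x) for net in nets], axis=0)

# Illustrative usage with two dummy "nets".
net1 = lambda x: np.array([[0.8, 0.2]])
net2 = lambda x: np.array([[0.6, 0.4]])
print(function_average([net1, net2], None))   # [[0.7 0.3]]
```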

