In this section, we describe a frame-synchronous speech recognition algorithm that uses word trigram models and is based on Viterbi search. Among the many speech recognition algorithms, Viterbi search (or one-pass DP) is well suited to language models that are Markov models, such as bigrams and trigrams.
To compute the Viterbi path to the n-th recognized word at a given time, the bigram algorithm needs to know only the Viterbi paths emerging from each candidate for the preceding word at the preceding time frame. The trigram algorithm, however, needs to know, for each candidate for the previous word, not only the Viterbi paths emerging from those words at that time, but also the most likely paths passing through all possible word-pair combinations. The trigram algorithm therefore requires considerably more memory and computation than the bigram algorithm.
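The difference in bookkeeping can be illustrated with a word-level sketch. It is a simplification, not the frame-synchronous algorithm of Table 1: it assumes one log-score per word position instead of HMM states over frames, and the names `trigram_viterbi`, `acoustic`, `trigram_logp`, and `toy_lm` are hypothetical. The point is only that the trigram search keeps one hypothesis per word pair, where a bigram search would keep one per word.

```python
import math


def trigram_viterbi(acoustic, vocab, trigram_logp, bos="<s>"):
    """Best word sequence for per-position word scores plus a trigram LM.

    acoustic[n][w]        : assumed log-score of word w at word position n
    trigram_logp(u, v, w) : assumed log P(w | u, v)
    """
    # One surviving hypothesis per (previous word, current word) pair.
    # A bigram search would key this table by the current word alone.
    scores = {(bos, bos): 0.0}
    backptrs = []
    for position in acoustic:
        new_scores, bp = {}, {}
        for (u, v), score in scores.items():
            for w in vocab:
                cand = score + trigram_logp(u, v, w) + position[w]
                if cand > new_scores.get((v, w), -math.inf):
                    new_scores[(v, w)] = cand
                    bp[(v, w)] = u  # the word that drops out of the pair
        scores = new_scores
        backptrs.append(bp)

    # Trace back from the best final word pair.
    pair = max(scores, key=scores.get)
    best = scores[pair]
    words = []
    for bp in reversed(backptrs):
        words.append(pair[1])
        pair = (bp[pair], pair[0])
    return words[::-1], best


def toy_lm(u, v, w):
    """Hypothetical trigram: after "a b", the word "a" is strongly preferred."""
    if (u, v) == ("a", "b"):
        return math.log(0.9 if w == "a" else 0.1)
    return math.log(0.5)


flat = [{"a": 0.0, "b": 0.0}] * 3  # no acoustic preference at any position
words, score = trigram_viterbi(flat, ["a", "b"], toy_lm)
print(words)  # ['a', 'b', 'a']
```

With a vocabulary of V words the `scores` table holds up to V² entries after the second position, which is the source of the extra memory and computation described above.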
We define symbols for the word uttered at a given time and for the word uttered prior to it, together with a quantity that can be calculated recursively using the following algorithm (Table 1).
| [ Definition ] |
| from state … |
| at state … |
| … |
| after … |
| [ Initialization ] |
| execute step 1 for … |
| 1) … |
| [ Viterbi search ] |
| execute step 2 and step 6 for … |
| 2) execute step 3 for … |
| 3) execute step 4 for … |
| 4) execute step 5 for … |
| 5) if … |
| … |
| else |
| … |
| [ Viterbi search (word boundaries) ] |
| 6) execute step 7 for … |
| 7) execute step 8 for … |
| 8) … |
| if … |
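The two search phases named in Table 1, the search within words and the search at word boundaries, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the table's exact recursion: every word is assumed to be a left-to-right model with the same number of states and zero transition cost, and `frame_sync_trigram`, `emit`, and `lm_logp` are hypothetical names. Each hypothesis is keyed by (previous word, current word, state), so the trigram probability of the next word is available when the current word ends.

```python
import math


def _relax(table, key, score, words):
    """Keep only the best-scoring hypothesis for each key."""
    if key not in table or score > table[key][0]:
        table[key] = (score, words)


def frame_sync_trigram(emit, vocab, n_states, lm_logp, bos="<s>"):
    """Return (best log-score, word tuple) after the last frame.

    emit[t][w][k]    : assumed log-likelihood of frame t in state k of word w
    lm_logp(u, v, w) : assumed trigram log P(w | u, v)
    Requires at least one frame; a path must end in the last state of a word.
    """
    last = n_states - 1
    # Initialization: every word may start the utterance in its first state.
    hyp = {
        (bos, w, 0): (lm_logp(bos, bos, w) + emit[0][w][0], (w,))
        for w in vocab
    }
    for frame in emit[1:]:
        new = {}
        for (u, w, k), (score, words) in hyp.items():
            # Within-word search: stay in state k or advance to state k + 1.
            _relax(new, (u, w, k), score + frame[w][k], words)
            if k < last:
                _relax(new, (u, w, k + 1), score + frame[w][k + 1], words)
            else:
                # Word boundary: leave word w (history u, w), enter each word x.
                for x in vocab:
                    entry = score + lm_logp(u, w, x) + frame[x][0]
                    _relax(new, (w, x, 0), entry, words + (x,))
        hyp = new
    finals = [h for (_, _, k), h in hyp.items() if k == last]
    return max(finals, key=lambda h: h[0]) if finals else (-math.inf, ())
```

Carrying the word tuple inside each hypothesis keeps the sketch short. A practical decoder would more likely store back-pointers and prune low-scoring hypotheses, since the table can hold one entry per word pair and state at every frame.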