In this section, we describe a frame synchronous speech recognition algorithm using word trigram models based on Viterbi search. Among the many speech recognition algorithms, the Viterbi search (or one-pass DP) is well suited for Markov models used as language models like bigrams or trigrams.
To compute the Viterbi path to the n'th recognized word
, at time
for the bigram algorithm, we need to
know the Viterbi paths emerging from each
word candidate, at time
. However, for the trigram
algorithm, for each previous word candidate
,
we need to know not only the Viterbi paths emerging from
words at time
, but also the most likely paths
passing through all possible
word
pair combinations. This means that this algorithm
essentially requires a lot of memory and high
computational costs.
We define as the word uttered at time
,
as the word uttered prior to
, and
. We can calculate
recursively using
the following algorithm (Table 1 ).
[ Definition ] |
![]() ![]() |
![]() ![]() |
from state ![]() ![]() |
![]() ![]() |
at state ![]() ![]() |
![]() ![]() |
after ![]() |
![]() |
![]() |
![]() |
[ Initialization ] |
execute step1 for ![]() |
1)
![]() |
( ![]() |
[ Viterbi search ] |
execute step2 and step6 for ![]() |
2) execute step3 for ![]() |
3) execute step4 for ![]() |
4) execute step5 for
![]() |
5) if ![]() |
![]() ![]() |
else |
![]() |
![]() ![]() |
![]() |
[ Viterbi search ( word boundaries ) ] |
6) execute step7 for
![]() |
7) execute step8 for
![]() |
8)
![]() |
![]() ![]() |
if
![]() ![]() |