In this section, we describe a frame synchronous speech recognition algorithm using word trigram models based on Viterbi search. Among the many speech recognition algorithms, the Viterbi search (or one-pass DP) is well suited for Markov models used as language models like bigrams or trigrams.
To compute the Viterbi path to the n'th recognized word , at time for the bigram algorithm, we need to know the Viterbi paths emerging from each word candidate, at time . However, for the trigram algorithm, for each previous word candidate , we need to know not only the Viterbi paths emerging from words at time , but also the most likely paths passing through all possible word pair combinations. This means that this algorithm essentially requires a lot of memory and high computational costs.
We define as the word uttered at time ,
as the word uttered prior to
, and
. We can calculate
recursively using
the following algorithm (Table 1 ).
[ Definition ] |
:number of states of word |
:transition probability in word |
from state to state |
:symbol output probability in word |
at state for observation vector at frame |
:trigram probability of word |
after have appeared. |
:vocabulary size |
:number of input frames |
:weight between trigram probability and HMM likelihood |
[ Initialization ] |
execute step1 for |
1) |
( means sentence head ) |
[ Viterbi search ] |
execute step2 and step6 for |
2) execute step3 for |
3) execute step4 for |
4) execute step5 for |
5) if |
else |
[ Viterbi search ( word boundaries ) ] |
6) execute step7 for |
7) execute step8 for |
8) |
if then |