
Recognition Algorithm Using Word Trigram Models


In this section, we describe a frame-synchronous speech recognition algorithm that uses word trigram models and is based on the Viterbi search. Among the many speech recognition algorithms, the Viterbi search (also known as one-pass DP) is particularly well suited to Markov-model language models such as bigrams and trigrams.

To compute the Viterbi path up to the $n$-th recognized word $w_n$ at time $t$, the bigram algorithm needs only the Viterbi paths emerging from each candidate for the previous word $w_{n-1}$ at time $t-1$. For the trigram algorithm, however, each previous-word candidate $w_{n-1}$ requires not only the Viterbi paths emerging from words at time $t-1$, but also the most likely paths passing through every possible word-pair combination $(w_{n-2}, w_{n-1})$. The trigram algorithm therefore inherently demands large amounts of memory and computation.
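For example, with a hypothetical vocabulary of $Q = 1{,}000$ words, the bigram search keeps one best-path score per HMM state for each of the $Q = 10^3$ previous-word candidates, whereas the trigram search must keep one for each of the $Q^2 = 10^6$ word-pair combinations $(w_{n-2}, w_{n-1})$.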

We define $w(t)$ as the word uttered at time $t$, $w_{\mathrm{Prev}}(t)$ as the word uttered immediately before $w(t)$, and
$G_t(w_1,w_0,i) = \mathrm{Prob}\bigl(O_0,O_1,\ldots,O_{t-1} \wedge s(t)=i,\ w(t)=w_0,\ w_{\mathrm{Prev}}(t)=w_1\bigr)$.
We can calculate $G_t(w_1,w_0,i)$ recursively using the following algorithm (Table 1).


Table 1: Recognition algorithm using word trigram models (Viterbi search)

[ Definition ]
$l_w$: number of states of word $w$
$a^w_{ij}$: transition probability in word $w$ from state $s_i$ to state $s_j$
$b^w_j(O_t)$: symbol output probability in word $w$ at state $s_j$ for the observation vector at frame $t$
$P(w_c \vert w_a, w_b)$: trigram probability of word $w_c$ after $w_a, w_b$ have appeared
$Q$: vocabulary size
$T$: number of input frames
$\alpha$: weight between the trigram probability and the HMM likelihood

[ Initialization ]
Execute step 1 for $w_0 = 0, \ldots, Q-1$:
1) $G_0(start, w_0, 0) = P(w_0 \vert start, start)$   ($start$ denotes the sentence head)

[ Viterbi search ]
Execute steps 2 and 6 for $t = 0, 1, \ldots, T-1$:
2) execute step 3 for $w_1 = 0, \ldots, Q-1$
3) execute step 4 for $w_0 = 0, \ldots, Q-1$
4) execute step 5 for $i = 0, 1, \ldots, l_{w_0}-2$
5) if $i = 0$:
$G_t(w_1,w_0,i) = G_{t-1}(w_1,w_0,i) \times a^{w_0}_{i,i} \times b^{w_0}_i(O_t)$
else:
$G_t(w_1,w_0,i) = \max\bigl( G_{t-1}(w_1,w_0,i) \times a^{w_0}_{i,i} \times b^{w_0}_i(O_t),\ G_{t-1}(w_1,w_0,i-1) \times a^{w_0}_{i-1,i} \times b^{w_0}_{i-1}(O_t) \bigr)$

[ Viterbi search ( word boundaries ) ]
6) execute step 7 for $w_1 = 0, 1, \ldots, Q-1$
7) execute step 8 for $w_0 = 0, 1, \ldots, Q-1$
8) $\Delta = \max_{0 \leq w_2 \leq Q-1} \bigl( G_{t-1}(w_2,w_1,l_{w_1}-2) \times a^{w_1}_{l_{w_1}-2,\,l_{w_1}-1} \times b^{w_1}_{l_{w_1}-2}(O_t) \times P(w_0 \vert w_2, w_1)^{\alpha} \bigr)$
if $\Delta \geq G_t(w_1,w_0,0)$ then $G_t(w_1,w_0,0) = \Delta$
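To make the control flow of Table 1 concrete, the following is a minimal Python sketch of the recursion. It is an illustration under simplifying assumptions, not the paper's implementation: all words share the same left-to-right HMM topology, the function emis(w, j, o) stands in for $b^w_j(O_t)$, the array trigram[w2][w1][w0] stands in for $P(w_0 \vert w_2, w_1)$ (with index $Q$ reserved for $start$), and the initialization is treated as the prior before the first frame is consumed.

    import numpy as np

    def trigram_viterbi(obs, n_words, n_states, trans, emis, trigram, alpha):
        # obs:      list of T observation frames O_0 .. O_{T-1}
        # n_words:  vocabulary size Q
        # n_states: l_w, number of states per word (shared here for simplicity)
        # trans:    trans[w][i][j] = a^w_{ij}
        # emis:     emis(w, j, o)  = b^w_j(o)
        # trigram:  trigram[w2][w1][w0] = P(w0 | w2, w1), shape (Q+1, Q+1, Q),
        #           where index Q stands for the sentence head "start"
        # alpha:    weight between trigram probability and HMM likelihood
        START = n_words
        n_emit = n_states - 1        # states 0 .. l_w-2 emit; l_w-1 is the exit

        # G[w1, w0, i]: best score with current word w0 in state i, previous word w1
        G = np.zeros((n_words + 1, n_words, n_emit))
        for w0 in range(n_words):    # step 1: sentence-head initialization
            G[START, w0, 0] = trigram[START][START][w0]

        for o in obs:                # steps 2 and 6, for t = 0 .. T-1
            G_prev = G.copy()        # G_{t-1}

            # Steps 2-5: within-word transitions (self-loop, or advance one state).
            for w1 in range(n_words + 1):
                for w0 in range(n_words):
                    for i in range(n_emit):
                        stay = G_prev[w1, w0, i] * trans[w0][i][i] * emis(w0, i, o)
                        if i == 0:
                            G[w1, w0, i] = stay
                        else:
                            move = (G_prev[w1, w0, i - 1]
                                    * trans[w0][i - 1][i] * emis(w0, i - 1, o))
                            G[w1, w0, i] = max(stay, move)

            # Steps 6-8: word boundaries. Exit word w1, enter state 0 of word w0,
            # maximizing over the word-before-last w2 with the trigram weight.
            # (The w2 range here includes the start index so the first word can be
            # exited; the table writes 0 .. Q-1.)
            for w1 in range(n_words):
                last = n_states - 2  # final emitting state of w1
                exit_score = [G_prev[w2, w1, last]
                              * trans[w1][last][n_states - 1]
                              * emis(w1, last, o)
                              for w2 in range(n_words + 1)]
                for w0 in range(n_words):
                    delta = max(exit_score[w2] * trigram[w2][w1][w0] ** alpha
                                for w2 in range(n_words + 1))
                    if delta >= G[w1, w0, 0]:    # step 8
                        G[w1, w0, 0] = delta
        return G

    # Toy usage with random (unnormalized) model parameters:
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        Q, L, T = 3, 3, 5
        trans = rng.random((Q, L, L))
        trigram = rng.random((Q + 1, Q + 1, Q))
        emis = lambda w, j, o: 0.5   # dummy b^w_j(O_t)
        G = trigram_viterbi(list(range(T)), Q, L, trans, emis, trigram, alpha=1.0)
        print(G.shape)               # (Q+1, Q, L-1)

A practical decoder would work with log probabilities to avoid underflow, prune low-scoring hypotheses, and keep backpointers to recover the recognized word sequence; those details are omitted here for readability.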


