
CONTINUOUS SPEECH RECOGNITION ALGORITHM USING WORD TRIGRAM MODELS

In this section, we describe a continuous speech recognition algorithm using word trigram models based on Viterbi search. Well-known algorithms for continuous speech recognition include two-level DP matching, level building, and Viterbi search (one-pass DP). Among them, the Viterbi search (one-pass DP) algorithm is well suited to Markov models combined with a language model such as a bigram or trigram. To compute the Viterbi path to the $n$-th recognized word $w_n$ at time $t$, we need to know the Viterbi paths emerging from each word candidate $w_{n-1}$ at time $t-1$. For the trigram algorithm, however, for each previous word candidate $w_{n-1}$ we need to know not only the Viterbi paths emerging from words at time $t-1$, but also the most likely paths passing through all possible $(w_{n-2},w_{n-1})$ word-pair combinations.
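To make the added cost concrete, here is a rough accounting of our own (the source does not state it). With vocabulary size $Q$ and at most $l$ states per word, a bigram search keeps one Viterbi score per (word, state) pair, i.e. $O(Ql)$ scores, whereas the trigram search below keeps a score $G_t(w_1,w_0,i)$ for every (previous word, current word, state) triple, i.e. $O(Q^2l)$ scores. Moreover, its word-boundary step must maximize over $w_2$ separately for each $(w_1,w_0)$ pair, because the trigram probability $P(w_0\vert w_2,w_1)$ appears inside the maximization, giving $O(Q^3)$ operations per frame.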

We define $w(t)$ as the word uttered at time $t$, $w_{\mbox{\tiny {Prev}}}(t)$ as the word uttered immediately before $w(t)$, and $G_t(w_1,w_0,i) = \mbox{Prob}(O_1,O_2,\ldots,O_{t-1} \bigwedge s(t) = i,\, w(t) = w_0,\, w_{\mbox{\tiny {Prev}}}(t) = w_1)$. We can calculate $G_t(w_1,w_0,i)$ recursively using the following algorithm.


Table 1: A Viterbi Algorithm Using Word Trigram Models
[ Definition ]
$l_w$: the number of states of word $w$
$a^w_{ij}$: transition probability in word $w$ from state $s_i$ to state $s_j$
$b^w_j(O_t)$: symbol output probability in word $w$ at state $s_j$ for the observation vector at frame $t$
$P(w_0\vert w_2,w_1)$: trigram probability of word $w_0$ after $w_2,w_1$ have appeared
$Q$: vocabulary size
$T$: the number of input frames
[ Initialization ]
execute step 1 for $w_0=0,\ldots,Q-1$
1) $G_0(start,w_0,0) = P(w_0\vert start,start)$
where $start$ denotes the sentence head
[ Viterbi search ]
execute steps 2 and 6 for $t=0,1,\ldots,T-1$
2) execute step 3 for $w_1=0,\ldots,Q-1$
3) execute step 4 for $w_0=0,\ldots,Q-1$
4) execute step 5 for $i=0,1,\ldots,l_{w_0}-2$
5) $G_t(w_1,w_0,i) = \max( G_{t-1}(w_1,w_0,i) \times a^{w_0}_{i,i} \times b^{w_0}_i(O_t),\; G_{t-1}(w_1,w_0,i-1) \times a^{w_0}_{i-1,i} \times b^{w_0}_{i-1}(O_t) )$ (for $i=0$, only the first term applies)
[ Calculate word boundaries ]
6) execute step 7 for $w_1=0,1,\ldots,Q-1$
7) execute step 8 for $w_0=0,1,\ldots,Q-1$
8) $\Delta = \max_{0\leq w_2 \leq Q-1} ( G_{t-1}(w_2,w_1,l_{w_1}-2) \times a^{w_1}_{l_{w_1}-2,l_{w_1}-1} \times b^{w_1}_{l_{w_1}-1}(O_t) \times P(w_0\vert w_2,w_1) )$
if $\Delta \geq G_t(w_1,w_0,0)$ then $G_t(w_1,w_0,0)=\Delta$
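
As a concrete illustration, the following is a minimal log-domain sketch in Python of the forward pass in Table 1. It is a sketch under assumptions, not the paper's implementation: the interfaces n_states, log_a, log_b, log_P, and the START token are our own choices, every word model is assumed to have at least two states, and the traceback pointers needed to recover the recognized word sequence are omitted. Working in log space turns the products of steps 5 and 8 into sums, which avoids numerical underflow over long utterances.

NEG_INF = float("-inf")

def trigram_viterbi(obs, Q, n_states, log_a, log_b, log_P, START=-1):
    # obs                : observation vectors O_0 ... O_{T-1}
    # Q                  : vocabulary size; words are numbered 0 .. Q-1
    # n_states[w]        : l_w; state l_w - 1 is the word-final (exit) state
    # log_a(w, i, j)     : log a^w_{ij}
    # log_b(w, j, o)     : log b^w_j(o)
    # log_P(w2, w1, w0)  : log P(w0 | w2, w1)
    # START              : sentence-head token, usable as w1 or w2
    prevs = list(range(Q)) + [START]
    # Step 1: before the first frame only sentence-initial paths exist.
    G = {(w1, w0): [NEG_INF] * n_states[w0]
         for w1 in prevs for w0 in range(Q)}
    for w0 in range(Q):
        G[(START, w0)][0] = log_P(START, START, w0)

    for o in obs:
        newG = {k: [NEG_INF] * len(v) for k, v in G.items()}
        # Steps 2-5: within-word updates for every (w1, w0, i) triple;
        # at i = 0 only the self-loop term applies.
        for (w1, w0), g in G.items():
            for i in range(n_states[w0] - 1):
                best = g[i] + log_a(w0, i, i) + log_b(w0, i, o)
                if i > 0:
                    best = max(best, g[i - 1] + log_a(w0, i - 1, i)
                                     + log_b(w0, i - 1, o))
                newG[(w1, w0)][i] = best
        # Steps 6-8: word boundaries.  P(w0 | w2, w1) sits inside the
        # maximization over w2, so this double loop is O(Q^3) per frame.
        # We also allow w2 = START so the second word can follow the first.
        for w1 in range(Q):
            last = n_states[w1] - 1
            for w0 in range(Q):
                delta = max(G[(w2, w1)][last - 1]
                            + log_a(w1, last - 1, last)
                            + log_b(w1, last, o)
                            + log_P(w2, w1, w0)
                            for w2 in prevs)
                if delta >= newG[(w1, w0)][0]:
                    newG[(w1, w0)][0] = delta
        G = newG
    return G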

