Many continuous speech recognition systems have recently been reported. These systems use language information to improve the recognition rate, because syllable recognition rates are too low on their own. Many different language models exist. These include a word trigram model for isolated word recognition, e.g., in an IBM system (Averbuch, 1986). Bigram models are used in CMU's SPHINX continuous speech recognition system (Lee, 1988). ATR's continuous speech recognition system (Kita, 1989) uses a context-free grammar. Other language models, such as network grammars and unification grammars, have also been used. Among these language models, the word bigram model (Lee, 1988) seems to be the most widely used.
Word bigram models are effective in spite of their simplicity. However, they sometimes generate grammatically incorrect sentences, because word bigram models are too simple to express natural language. This can be improved by using word trigram models, but there are two problems with using word trigram models in continuous speech recognition systems. One is the reliability of the trigram probabilities. Word trigram models have many parameters (a 1,000-word vocabulary yields 1,000,000,000 parameters), so a very large amount of text data is needed to estimate them reliably. The other problem is the large amount of memory and computational cost required for speech recognition. For these reasons, very few studies on continuous speech recognition systems using only word trigram models have been reported.
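The parameter count quoted above follows from the size of a full n-gram table, which holds one probability per ordered sequence of n words, i.e. V^n entries for a vocabulary of V words. The short sketch below only reproduces that arithmetic; the function name `ngram_parameter_count` is illustrative, and the count assumes an unsmoothed table with no pruning or back-off.

```python
def ngram_parameter_count(vocab_size: int, order: int) -> int:
    """Return the number of entries in a full n-gram table.

    Assumes one probability per ordered word sequence of length
    `order`, with no pruning or back-off (an illustrative simplification).
    """
    if vocab_size < 1 or order < 1:
        raise ValueError("vocab_size and order must be positive")
    return vocab_size ** order


if __name__ == "__main__":
    vocab_size = 1000  # the vocabulary size used as the example in the text
    bigram = ngram_parameter_count(vocab_size, 2)
    trigram = ngram_parameter_count(vocab_size, 3)
    print(f"bigram parameters:  {bigram:,}")
    print(f"trigram parameters: {trigram:,}")
    print(f"trigram / bigram:   {trigram // bigram:,}")
```

Moving from bigrams to trigrams multiplies the table size by the vocabulary size, which is why both the training data requirement and the memory requirement grow so sharply.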
One possible way to avoid these problems is to use categories. If we use categories instead of words, the number of parameters is reduced, and so are the memory and computational cost required for speech recognition. Therefore, this method addresses both problems together. Shikano (Shikano, 1987) and Murase (Murase, 1990) reported this model and showed experimental results. However, a word trigram model has a lower perplexity than a category trigram model, so the need for efficient implementations of the word trigram model remains.
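To illustrate why categories shrink the model, the sketch below uses one common class-based factorization, P(w3 | w1, w2) ≈ P(c3 | c1, c2) · P(w3 | c3), where each word belongs to exactly one category. This form is an assumption made for illustration and is not necessarily the formulation used in the cited work; the category count of 100, the toy probabilities, and all function names are likewise hypothetical.

```python
def category_trigram_parameter_count(vocab_size: int, num_categories: int) -> int:
    """Parameter count for the assumed class-based factorization.

    One table of category trigrams (C ** 3 entries) plus one
    word-given-category probability per word (V entries), assuming
    every word belongs to exactly one category.
    """
    return num_categories ** 3 + vocab_size


def category_trigram_prob(w1, w2, w3, word_to_cat, cat_trigram, word_given_cat):
    """Approximate P(w3 | w1, w2) as P(c3 | c1, c2) * P(w3 | c3).

    Unseen events get probability 0.0 here; a real system would smooth.
    """
    c1, c2, c3 = word_to_cat[w1], word_to_cat[w2], word_to_cat[w3]
    return cat_trigram.get((c1, c2, c3), 0.0) * word_given_cat.get((w3, c3), 0.0)


if __name__ == "__main__":
    # Hypothetical sizes: 1,000 words grouped into 100 categories.
    print(f"{category_trigram_parameter_count(1000, 100):,} parameters")

    # Toy tables with made-up probabilities, for illustration only.
    word_to_cat = {"the": "DET", "dog": "N", "runs": "V"}
    cat_trigram = {("DET", "N", "V"): 0.5}
    word_given_cat = {("runs", "V"): 0.25}
    print(category_trigram_prob("the", "dog", "runs",
                                word_to_cat, cat_trigram, word_given_cat))
```

Under these assumed sizes the category model needs roughly a thousandth of the parameters of the full word trigram table, at the price of treating all words in a category alike in their context.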
In this paper, we first introduce a continuous speech recognition algorithm that uses word trigram models and is based on Viterbi search. Next, we show that the algorithm reduces memory requirements and computational cost. We explain how to use beam search and how to calculate the Viterbi path. Finally, we present the experimental results obtained using this algorithm. The experiments show that, for text-open data, the recognition rates obtained using word trigram models are the same as those obtained with word bigram models. For text-closed data, however, word trigram models give better results.