
Introduction

There are many ways of modeling natural language for speech recognition. Among these models, the network language model is often used because of its simplicity. However, this model normally has high perplexity, which degrades speech recognition performance. Therefore, stochastic network language models (SNLMs), which attach probabilities to the network language model, have been studied to resolve this problem.

On the other hand, Hidden Markov Models (HMMs) are popular for acoustic modeling in speech recognition [1]. One of the advantages of HMMs is that they can be trained automatically with the Baum-Welch maximum likelihood estimation procedure using training speech data. Among the various types of HMMs, the fully connected model, in which every state can transition to every other state, is called an Ergodic HMM.

This Ergodic HMM has a structure similar to that of an SNLM [2]. Therefore, it can be trained automatically as an SNLM with the Baum-Welch algorithm. The resulting state transition probabilities of the Ergodic HMM are interpreted as the transition probabilities of the SNLM, and its output probabilities are interpreted as the word output probabilities of the SNLM.
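As a brief illustration of this correspondence (the notation is assumed here, not taken from the paper): if $a_{ij}$ denotes the probability of a transition from state $i$ to state $j$, $b_j(w)$ the probability that state $j$ outputs word $w$, and $\pi_i$ the initial state probability, then the Ergodic HMM assigns a word sequence $w_1,\dots,w_T$ the probability

\[
P(w_1,\dots,w_T) \;=\; \sum_{s_1,\dots,s_T} \pi_{s_1}\, b_{s_1}(w_1) \prod_{t=2}^{T} a_{s_{t-1}s_t}\, b_{s_t}(w_t),
\]

so the $a_{ij}$ play the role of the SNLM transition probabilities and the $b_j(w)$ play the role of its word output probabilities.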

However, when the number of states in the Ergodic HMM is large, a large amount of memory is required and the computational cost is high. Past studies have therefore limited the number of states; because of this, the perplexity has remained high and results as good as those of word bigram models have not been obtained [3],[4].
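To make the scale of the problem concrete (a standard complexity estimate, not a figure reported in this paper): for an Ergodic HMM with $N$ states and a vocabulary of $V$ words trained on a corpus of $T$ words, storing the parameters and running one Baum-Welch iteration with the forward-backward procedure cost on the order of

\[
\text{memory: } O(N^2 + NV), \qquad \text{computation per iteration: } O(N^2 T),
\]

so both memory and computation grow quadratically with the number of states.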

This paper proposes new techniques that significantly reduce the memory requirements and computational costs of the Baum-Welch algorithm. These techniques were evaluated on their ability to automatically produce an SNLM for an international conference registration task. The Ergodic HMM was found to have lower perplexity than word bigram models. Furthermore, continuous speech recognition experiments were performed. The results showed that the Ergodic HMM can outperform word bigram models on text-closed data and word trigram models on text-open data. These results indicate that the proposed techniques are effective.


Jin'ichi Murakami, January 19, 2001