Abstract
This paper describes new techniques for language modeling in speech recognition based on a discrete-density Ergodic Hidden Markov Model (HMM).
A discrete-output Ergodic HMM has a structure similar to that of a stochastic network language model (SNLM), so an SNLM can in principle be acquired automatically from a large amount of text data through the Baum-Welch algorithm. However, when the number of states in such an Ergodic HMM is large, a large amount of memory is required and the computational cost is high. Past studies have therefore limited the number of states, and consequently the perplexity of the resulting Ergodic HMM has been high, failing to match word bigram models.
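The abstract does not spell out the standard algorithm whose costs the paper attacks, so the following is a minimal sketch of conventional (unoptimized) Baum-Welch re-estimation for a discrete-output ergodic HMM with word IDs as output symbols. The function name, the scaled forward-backward formulation, and the single-training-sequence setup are illustrative assumptions, not the paper's implementation; note that the transition table alone is S x S for S states, and each iteration costs O(T S^2) for a training text of T words, which is why large state counts become impractical without the reductions the paper proposes.

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Plain Baum-Welch for a discrete ergodic HMM (all transitions allowed).

    obs : 1-D int array of output symbols (here, word IDs).
    Returns (pi, A, B): initial-state, transition, and emission probabilities.
    """
    rng = np.random.default_rng(seed)
    # Random ergodic initialization: every transition and emission is possible.
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    T = len(obs)

    for _ in range(n_iter):
        # Forward pass, scaled at each step to avoid underflow on long texts.
        alpha = np.zeros((T, n_states)); scale = np.zeros(T)
        alpha[0] = pi * B[:, obs[0]]
        scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
        # Backward pass, reusing the forward scaling factors.
        beta = np.zeros((T, n_states)); beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
        # Posterior state occupancies (gamma) and transition counts (xi).
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((n_states, n_states))
        for t in range(T - 1):
            x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
            xi += x / x.sum()
        # Re-estimation (the M step).
        pi = gamma[0]
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        B = np.zeros_like(B)
        for t in range(T):
            B[:, obs[t]] += gamma[t]
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B

# Toy usage with a vocabulary of 3 word IDs; a real SNLM would involve
# thousands of words and far more states, hence the memory pressure.
pi, A, B = baum_welch(np.array([0, 1, 2, 1, 0, 2, 2, 1]), n_states=3, n_symbols=3)
```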
This paper proposes new techniques that reduce the memory requirements and computational costs of the Baum-Welch algorithm. The techniques were evaluated on their ability to automatically produce an SNLM for an international conference registration task. On both perplexity and continuous speech recognition results, the Ergodic HMM was found to outperform word bigram and trigram models, which indicates that the proposed techniques are effective.
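For reference, the perplexity used to compare the models is the standard test-set measure. The formulation below is the textbook definition, with the Ergodic HMM's sequence probability obtained by marginalizing over state paths; the notation (pi, a, b) is a conventional HMM parameterization assumed here, not quoted from the paper.

```latex
% Test-set perplexity of a language model P over a word sequence w_1 ... w_N:
PP = P(w_1,\ldots,w_N)^{-1/N}
   = \exp\!\Bigl( -\tfrac{1}{N} \ln P(w_1,\ldots,w_N) \Bigr)

% For the Ergodic HMM with initial probabilities \pi_i, transitions a_{ij},
% and word emissions b_j(w), the sequence probability sums over all state
% paths s_1 ... s_N (computed efficiently by the forward algorithm):
P(w_1,\ldots,w_N) = \sum_{s_1,\ldots,s_N} \pi_{s_1}\, b_{s_1}(w_1)
                    \prod_{t=2}^{N} a_{s_{t-1} s_t}\, b_{s_t}(w_t)
```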
Key words: stochastic network model, Baum-Welch algorithm, continuous speech recognition, language model, perplexity, Ergodic HMM
Summary
In a discrete HMM in which transitions between all states are permitted (an Ergodic HMM), if words are taken as the output symbols, its structure is formally similar to a stochastic network grammar. It may therefore be possible to acquire a stochastic network grammar automatically from a large amount of word-sequence data using the Baum-Welch training algorithm. However, as the number of states grows, the memory and computational requirements increase to the point where training becomes practically infeasible. Previous studies have therefore used small numbers of states, and their recognition performance and perplexity compare poorly with word bigram models. This paper proposes a Baum-Welch algorithm with reduced memory and computational requirements for training Ergodic HMMs with many states. It also reports experimental results of continuous speech recognition using the resulting Ergodic HMM as a language model.
Key words: stochastic network grammar, Baum-Welch algorithm, continuous speech recognition, language model, perplexity, Ergodic HMM