
On the Possibility of Language Acquisition Using HMMs
Investigation of language model using HMM

田本真詞 村上仁一 嵯峨山茂樹
TAMOTO Masafumi, MURAKAMI Jin-ichi, SAGAYAMA Shigeki

Faculty of Engineering, Tokyo Institute of Technology
Tokyo Institute of Technology
ATR 自動翻訳電話研究所
ATR Interpreting Telephony Research Laboratories

Abstract:

In this paper, we investigate language models based on the HMM (Hidden Markov Model: a doubly embedded stochastic process with an underlying stochastic process that is not directly observable).

There are two broad classes of natural language models. One is the class of deterministic models, which exploit known specific properties of the language; the other is the class of statistical models, which attempt to characterize the statistical properties of a corpus.

These statistical models include stochastic context-free grammars and Markov processes, a kind of non-deterministic finite-state automaton. An HMM models language as a random process. By choosing specific parameters of the HMM (i.e., the number of states in the model), grammatical rules can be estimated in a well-defined manner as a transition network. The HMM is rich enough in mathematical structure that language models can be determined more precisely than with a stochastic context-free grammar or a simple Markov process.

This paper presents results on automatically characterizing inter-clause grammar from 25 syntactic categories, using an ergodic HMM trained on a 30,000-clause corpus of syntactic category sequences. The resulting model shows that common subnetworks emerge even though the process is carried out fully automatically.
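The setup above can be sketched in code. The following is a minimal pure-Python illustration, not the authors' implementation: it builds a fully connected (ergodic) discrete HMM whose observation symbols stand for the 25 syntactic categories, and scores a category sequence with the forward algorithm. The number of states (here 3) and the toy sequence are illustrative assumptions; in the paper the parameters would be re-estimated from the corpus (e.g., by Baum-Welch), which is omitted here.

```python
import random

random.seed(0)

def make_ergodic_hmm(n_states, n_symbols):
    # Ergodic topology: every state may transition to every state,
    # so no grammatical structure is imposed in advance.
    def norm(row):
        s = sum(row)
        return [x / s for x in row]
    pi = norm([random.random() for _ in range(n_states)])                       # initial probs
    A = [norm([random.random() for _ in range(n_states)]) for _ in range(n_states)]   # transitions
    B = [norm([random.random() for _ in range(n_symbols)]) for _ in range(n_states)]  # emissions
    return pi, A, B

def forward_likelihood(seq, pi, A, B):
    # Forward algorithm: P(sequence | model), summed over all state paths.
    n = len(pi)
    alpha = [pi[i] * B[i][seq[0]] for i in range(n)]
    for o in seq[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# 3 hidden states over an alphabet of 25 syntactic categories (ids 0..24).
pi, A, B = make_ergodic_hmm(n_states=3, n_symbols=25)
seq = [0, 4, 7, 4, 2]          # a toy sequence of category ids
p = forward_likelihood(seq, pi, A, B)
print(0.0 < p < 1.0)
```

After training, the learned transition matrix `A` can be read as the estimated transition network; states sharing similar outgoing transitions would correspond to the common subnetworks mentioned above.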



Download the paper in PS format (approx. 1 MB)


Jin'ichi Murakami, October 5, 2001