Investigation of language model using HMM (Japanese title: HMMを利用した言語獲得の可能性について, "On the possibility of language acquisition using HMM")

In this paper, we investigate language models using an HMM (hidden Markov model: a doubly embedded stochastic process whose underlying stochastic process is not observable).

There are two broad types of natural language model. One is the class of deterministic models, which exploit known specific properties of the language; the other is the class of statistical models, which try to characterize the statistical properties of a corpus.

These statistical models include stochastic context-free grammars and Markov processes, which are a kind of non-deterministic finite-state automaton. An HMM models language effectively as a random process. By choosing specific parameters of the HMM (e.g. the number of states in the model), grammatical rules can be estimated in a well-defined manner as a transition network. HMMs are very rich in mathematical structure, so language models can be determined more precisely than with stochastic context-free grammars or Markov processes.
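To make the idea of an HMM over syntactic categories concrete, the following is a minimal sketch, not the authors' implementation. It assumes a discrete-output ergodic (fully connected) HMM whose symbols are integer category indices; the function names, the random initialisation, and the choice of 4 states are all hypothetical and chosen only for illustration. It shows how the number of states enters as a free parameter and how the forward algorithm scores a category sequence.

```python
import random


def random_stochastic_row(n, rng):
    """Return n strictly positive probabilities that sum to 1."""
    weights = [rng.random() + 0.1 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def init_ergodic_hmm(n_states, n_symbols, seed=0):
    """Randomly initialise an ergodic HMM (hypothetical helper).

    Every transition probability is strictly positive, so any state
    can be reached from any other state in one step.
    """
    rng = random.Random(seed)
    initial = random_stochastic_row(n_states, rng)
    transition = [random_stochastic_row(n_states, rng) for _ in range(n_states)]
    emission = [random_stochastic_row(n_symbols, rng) for _ in range(n_states)]
    return initial, transition, emission


def sequence_likelihood(observations, initial, transition, emission):
    """Forward algorithm: P(observations | model), summed over all state paths.

    Assumes `observations` is a non-empty list of symbol indices.
    """
    n_states = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Example: 25 output symbols (one per syntactic category) and an
# arbitrarily chosen 4 hidden states; the category indices are made up.
initial, transition, emission = init_ergodic_hmm(n_states=4, n_symbols=25)
likelihood = sequence_likelihood([3, 17, 0, 8], initial, transition, emission)
```

In practice the parameters would be re-estimated from the corpus (for example with the Baum-Welch algorithm) rather than left at their random initial values; that training step is omitted here.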

This paper reports the results of automatically characterizing inter-clause grammar over 25 syntactic categories, using an ergodic HMM on a 30,000-clause corpus of syntactic category sequences. The resulting model indicates that some common subnetworks exist even though the process is carried out automatically.
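One simple way to read a transition network off a trained ergodic HMM is to keep only the high-probability arcs. The sketch below illustrates this under stated assumptions: the thresholding rule, the function name `transition_network`, and the 3-state matrix are all hypothetical and are not taken from the paper's method or results.

```python
def transition_network(transition, threshold=0.1):
    """Keep only the arcs whose probability is at least `threshold`.

    `transition[i][j]` is the probability of moving from state i to state j.
    Returns a dict mapping (source, destination) to that probability.
    """
    return {
        (src, dst): prob
        for src, row in enumerate(transition)
        for dst, prob in enumerate(row)
        if prob >= threshold
    }


# Made-up 3-state transition matrix, for illustration only.
example = [
    [0.05, 0.90, 0.05],
    [0.02, 0.08, 0.90],
    [0.60, 0.35, 0.05],
]
network = transition_network(example, threshold=0.1)
for (src, dst), prob in sorted(network.items()):
    print(f"state {src} -> state {dst}: {prob:.2f}")
```

Pruning weak arcs in this way turns the fully connected model into a sparse graph, which makes recurring substructures easier to inspect; the threshold value is a design choice and would need tuning for a real model.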

田本真詞	村上仁一	嵯峨山茂樹
TAMOTO masafumi	MURAKAMI jin-ichi	SAGAYAMA shigeki
