
[

On the Convergence of Markov Chain Probabilities with Respect to Training Data Size, and a Phrase Speech Recognition System Using Word HMMs and Bigrams



A Study of Convergence Properties of N-gram Language Models for Japanese and A Phrase Recognition System using the Word Bigram Model applied to HMM Word Models



村上仁一 坪井俊明

Jin'ichi Murakami Toshiaki Tuboi

ATR音声翻訳通信研究所 NTTヒューマンインターフェース研究所

ATR Interpreting Telephony Research Labs. NTT Human Interface Labs.

Abstract:

This paper describes the frequency of occurrence of n-grams over syllables, kanji-kana characters, parts of speech, and words in Japanese newspaper text and X-ray CT scanning reports. It also discusses an algorithm for continuous speech sentence recognition using word HMMs and a word bigram model.

It is well known that word bigram and word trigram models are effective tools for speech recognition systems. However, the convergence properties of n-gram probabilities have not been reported for Japanese.

In this paper, we first report the convergence of unigram, bigram, trigram, and 4-gram language models for syllable, kanji-kana, word, and part-of-speech units extracted from newspaper text and X-ray CT scanning reports.
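
As a rough illustration of what "convergence with training data size" can mean, the following sketch counts n-grams and measures how much of a held-out text is already covered as the training text grows. The abstract does not state which convergence measure the authors used, so held-out coverage is an assumption here, and the function names and the toy token lists are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def conditional_probs(tokens, n):
    """Maximum-likelihood estimate of P(last unit | preceding n-1 units)."""
    grams = ngram_counts(tokens, n)
    history_totals = Counter()
    for gram, count in grams.items():
        history_totals[gram[:-1]] += count
    return {gram: count / history_totals[gram[:-1]]
            for gram, count in grams.items()}


def coverage(train_tokens, test_tokens, n):
    """Fraction of held-out n-gram occurrences already seen in training.

    Assumed proxy for convergence: the value approaches 1.0 as the
    training text grows large enough to contain most n-grams.
    """
    seen = ngram_counts(train_tokens, n)
    test = ngram_counts(test_tokens, n)
    total = sum(test.values())
    hit = sum(count for gram, count in test.items() if gram in seen)
    return hit / total if total else 0.0


# Hypothetical toy data; real units would be syllables, kanji-kana
# characters, words, or part-of-speech tags.
corpus = "a b a b a c a b".split()
held_out = "a b a c".split()
for size in (2, 4, 8):
    print(size, coverage(corpus[:size], held_out, 2))
```

The same functions work for any unit type, since they only require a list of hashable tokens; comparing the curves for n = 1 to 4 shows how much more data higher-order models need.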

Second, we report a phrase recognition algorithm for X-ray CT scanning reports that uses the word bigram model and word HMMs. The large amount of training data usually needed for word HMMs could be reduced to a single utterance by using a technique known as fuzzy vector quantization.
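
To make the idea of fuzzy vector quantization concrete, here is a minimal sketch in which a feature vector is assigned soft membership weights over all codewords instead of a single nearest codeword, and a discrete HMM state scores the vector as a weighted sum of its codeword probabilities. The abstract only names the technique; the fuzzy c-means style membership formula, the fuzziness parameter `m`, and the toy codebook are assumptions, not the authors' stated configuration.

```python
import math


def fuzzy_memberships(x, codebook, m=2.0):
    """Soft membership of vector x in each codeword (weights sum to 1).

    Assumed form: u_i is proportional to d_i ** (-2 / (m - 1)), where d_i
    is the Euclidean distance to codeword i and m > 1 is the fuzziness.
    """
    dists = [math.dist(x, c) for c in codebook]
    exact = [i for i, d in enumerate(dists) if d == 0.0]
    if exact:
        # x coincides with one or more codewords: share the weight among them.
        return [1.0 / len(exact) if i in exact else 0.0
                for i in range(len(codebook))]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def soft_emission(memberships, emission_row):
    """Emission score of one HMM state: sum_i u_i * b(i)."""
    return sum(u * b for u, b in zip(memberships, emission_row))


# Hypothetical two-codeword codebook and one state's emission probabilities.
codebook = [(0.0, 0.0), (2.0, 0.0)]
u = fuzzy_memberships((0.5, 0.0), codebook)
print(u, soft_emission(u, [0.2, 0.8]))
```

Because every codeword receives some weight, a single training utterance updates many emission entries at once, which is one common explanation for why soft quantization helps when training data is scarce.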

A sentence recognition experiment using this algorithm was carried out to test the effectiveness of the word bigram model with a vocabulary of about 3000 words. The following phrase recognition rates were obtained in this experiment: 96.8% for text-closed data with normal findings, 78.1% for text-closed data with abnormal findings, 86.5% for text-open data with normal findings, and 72.1% for text-open data with abnormal findings. (The terms "normal findings" and "abnormal findings" refer to the content of the X-ray CT scanning reports.)
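
The role a word bigram model plays in such an experiment can be sketched as follows: each candidate word sequence receives an acoustic score from the word HMMs and a language score from the bigram model, and the candidate with the best combined score is chosen. The abstract does not say how the two scores are combined, so the log-linear combination over a fixed candidate list, the `lm_weight` and `floor` parameters, and all probabilities and words below are hypothetical.

```python
import math


def bigram_log_prob(words, bigram, floor=1e-6):
    """Sum of log P(w_i | w_{i-1}), starting from a sentence-start token.

    Unseen word pairs receive a small floor probability (an assumed,
    deliberately crude stand-in for proper smoothing).
    """
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(bigram.get((prev, word), floor))
        prev = word
    return total


def best_hypothesis(hypotheses, bigram, lm_weight=1.0):
    """Pick the (words, acoustic_log_likelihood) pair with the best total score."""
    return max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * bigram_log_prob(h[0], bigram),
    )


# Hypothetical bigram table and two acoustically similar candidates.
bigram = {
    ("<s>", "no"): 0.5,
    ("no", "abnormality"): 0.8,
    ("<s>", "know"): 0.1,
    ("know", "abnormality"): 0.01,
}
hypotheses = [
    (["know", "abnormality"], -10.0),
    (["no", "abnormality"], -10.5),
]
print(best_hypothesis(hypotheses, bigram)[0])
```

With `lm_weight=0.0` the acoustically better but implausible candidate wins, which illustrates why text-open data, where more word pairs are unseen, tends to be harder than text-closed data.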




]






Download the paper in PostScript format (about 1 MB)


Jin'ichi Murakami, October 5, 2001 (Heisei 13)