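Sentence Speech Recognition Using Word Trigrams and Its Extension to Spontaneous Speech Recognition

Jin'ichi Murakami, Shouichi Matunaga

ATR Interpreting Telecommunications Research Laboratories

2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02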

Summary

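In human-to-human communication, filled pauses such as "ano" and "eeto", as well as hesitations, misstatements, and self-corrections, occur frequently. Recognition of such speech, so-called spontaneous speech recognition, is expected to become an important research topic. In this paper, we first describe a sentence speech recognition algorithm based on word trigrams and Viterbi search (one-pass DP). We then describe a version of the algorithm with reduced memory and computation requirements. This reduction was obtained through beam search and an improved Viterbi path computation, among other changes. As a result of this improvement, the algorithm knows the most likely word sequence at each time and in each state. Exploiting this property, speech containing filled pauses can be recognized by recognizing the filled pauses in the acoustic model while skipping them in the language model. Finally, we carried out recognition experiments on spontaneous speech. The results demonstrate the effectiveness of the algorithm.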
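The idea of skipping filled pauses in the language model can be illustrated with a minimal sketch. It is not the authors' implementation: the filler inventory `FILLED_PAUSES`, the toy trigram table, the floor value for unseen trigrams, and the choice to give fillers no language-model score at all are assumptions made here for illustration only.

```python
import math

# Hypothetical filled-pause inventory (assumption, not from the paper).
FILLED_PAUSES = {"ano", "eeto"}

def sentence_log_prob(words, trigram_logp, fillers=FILLED_PAUSES,
                      floor=math.log(1e-6)):
    """Score a word sequence with a trigram table while skipping fillers.

    Fillers stay in the recognized word sequence (the acoustic side sees
    them), but they add no language-model score and never enter the
    two-word trigram history.
    """
    history = ["<s>", "<s>"]          # sentence-start padding
    total = 0.0
    for w in words:
        if w in fillers:
            continue                  # skipped by the language model
        key = (history[-2], history[-1], w)
        total += trigram_logp.get(key, floor)  # floor for unseen trigrams
        history.append(w)
    return total

# Toy trigram log-probabilities (made-up numbers).
toy = {
    ("<s>", "<s>", "kyou"): math.log(0.5),
    ("<s>", "kyou", "wa"): math.log(0.4),
    ("kyou", "wa", "hare"): math.log(0.25),
}
clean = ["kyou", "wa", "hare"]
noisy = ["ano", "kyou", "wa", "eeto", "hare"]
```

With this sketch, `sentence_log_prob(noisy, toy)` equals `sentence_log_prob(clean, toy)`, because the fillers neither break the trigram context nor contribute a score.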

Keywords • word trigram model • one-pass DP • spontaneous speech recognition • filled pauses

A Spontaneous Speech Recognition Algorithm with Pause and Filled Pause Procedure

Jin'ichi Murakami, Shouichi Matunaga

ATR Interpreting Telecommunications Research Laboratories

2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02

Abstract

This paper describes an effective recognition algorithm that uses word trigram models directly, together with a procedure for handling filled pauses in spontaneous speech. The algorithm greatly reduces memory requirements and computational cost through two techniques: beam search and an improved Viterbi search. With these techniques, recognition runs within 15 MB of memory for a vocabulary of about 1500 words. Next, focusing on spontaneous speech recognition, we introduce a filled-pause procedure to handle the many filled pauses that occur in spontaneous speech. Although the proposed procedure is simple, we obtain a sentence recognition rate of 64.4% for semi-spontaneous speech and 34.4% for spontaneous speech.
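As a rough illustration of how beam search limits memory and computation in a time-synchronous Viterbi search, the sketch below prunes the active hypotheses at one time frame. The abstract does not give the actual pruning criterion, so the score-threshold rule, the function name `prune`, and the hypothesis layout (state mapped to a log score and a word sequence) are assumptions.

```python
def prune(hyps, beam):
    """Keep only hypotheses whose log score is within `beam` of the best.

    hyps: dict mapping a search state to (log_score, word_sequence),
          where word_sequence is the best word history reaching that state.
    beam: non-negative log-score margin (assumed pruning rule).
    """
    if not hyps:
        return {}
    best = max(score for score, _ in hyps.values())
    return {state: (score, words)
            for state, (score, words) in hyps.items()
            if score >= best - beam}

# Toy hypotheses at one time frame (made-up scores).
active = {
    0: (-1.0, ("kyou",)),
    1: (-3.0, ("kyou", "wa")),
    2: (-10.0, ("kyoto",)),
}
survivors = prune(active, beam=5.0)
```

Here states 0 and 1 survive and state 2 is discarded, so later frames extend fewer paths. Storing the word sequence with each surviving state mirrors the property that the best word sequence is available at each time and state.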

Keywords • word trigram model • one-pass DP • spontaneous speech recognition • filled pauses


Jin'ichi Murakami, October 4, 2001