Continuous Speech Recognition Using a Frame-Synchronous Full Search Algorithm and Its Application to Spontaneous Speech

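Jin'ichi Murakami

NTT Information and Communication System Laboratories
(ATR Interpreting Telecommunications Research Laboratories)

1-2356 Take, Yokosuka-shi, Kanagawa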

Summary

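In human-to-human communication, filled pauses such as "ano" and "eto" (roughly "um" and "er"), along with hesitations, slips of the tongue, and self-corrections, occur frequently. Recognizing speech of this kind, so-called spontaneous speech recognition, is expected to become an important research topic. However, filled pauses and self-corrections can appear at any position in a sentence. In addition, building an acoustic model that achieves high recognition accuracy for the spontaneous speaking style is considered difficult. This paper therefore focuses on language models with low perplexity. Using a word trigram model as the base language model, we attempt to recognize spontaneous speech by skipping filled pauses and self-corrections with a garbage model or phoneme models.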

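This paper first describes a continuous speech recognition system that uses a frame-synchronous full search (exhaustive search) algorithm. It then describes improvements that reduce the computation and memory requirements, followed by modifications of the algorithm for recognizing spontaneous speech. Finally, it reports the results of spontaneous speech recognition experiments using this algorithm.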
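The skipping idea in the summary can be illustrated at the language-model level. The following is a minimal sketch, not the paper's implementation: it assumes that a skipped filler receives a fixed penalty and is left out of the trigram history, so the trigram context spans across it. The names `FILLERS`, `trigram_logprob`, `filler_logprob`, and `floor` are hypothetical, and in the paper the skipping is done with a garbage model or phoneme models rather than a word list.

```python
import math

# Hypothetical filler inventory, used only for illustration.
FILLERS = {"ano", "eto"}


def trigram_logprob(words, trigram_probs, filler_logprob=math.log(0.1), floor=1e-6):
    """Score a word sequence with a word trigram model, skipping fillers.

    trigram_probs maps (w1, w2, w3) -> P(w3 | w1, w2).
    Fillers get an assumed constant penalty and are not added to the
    history, so the words on either side of a filler stay adjacent
    from the language model's point of view.
    """
    history = ("<s>", "<s>")
    total = 0.0
    for w in words:
        if w in FILLERS:
            total += filler_logprob  # assumed fixed cost for a skipped filler
            continue
        # Unseen trigrams fall back to a small floor probability.
        p = trigram_probs.get((history[0], history[1], w), floor)
        total += math.log(p)
        history = (history[1], w)
    return total
```

With this scheme, inserting a filler into a sentence changes its score only by the filler penalty; the trigram probabilities of the surrounding words are unaffected.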

key words $\bullet$ Spontaneous Speech Recognition $\bullet$ Full Search $\bullet$ Beam Search $\bullet$ Time Synchronous $\bullet$ Garbage Model $\bullet$ Word trigram Model

Frame Synchronous Full Search Algorithm and Applied for Spontaneous Speech Recognition

Jin'ichi Murakami

NTT Information and Communication System Laboratories
(ATR Interpreting Telecommunications Research Laboratories)

1-2356 Take, Yokosuka-shi, Kanagawa 238-03, Japan

Abstract

This paper describes a spontaneous speech recognition algorithm based on word trigram models.

The recognition algorithm greatly reduces memory requirements and computational cost by employing beam search. With these methods, recognition runs in 15 Mbytes of memory for a vocabulary of about 500 words. Next, focusing on spontaneous speech recognition, we examine a phone-skipping procedure that handles the many filled pauses and false starts found in spontaneous speech. Although the proposed method is a simple procedure, it achieves a sentence recognition rate of 47.7% for spontaneous speech. When semantically correct sentences are also counted, the sentence recognition rate is about 75%.
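As a rough illustration of how beam search limits memory and computation in a frame-synchronous decoder, the sketch below advances all active hypotheses one frame at a time and discards those whose score falls too far below the best. It is a generic Viterbi-style example under assumed data structures, not the algorithm of this paper: the function names, the `frames` and `transitions` layouts, and the state labels are all hypothetical.

```python
def beam_prune(hyps, beam_width):
    """Keep hypotheses whose log score is within beam_width of the best."""
    if not hyps:
        return {}
    best = max(hyps.values())
    return {state: s for state, s in hyps.items() if s >= best - beam_width}


def frame_synchronous_search(frames, transitions, start, beam_width):
    """Advance every active hypothesis by one frame, then prune.

    frames: list of dicts, state -> acoustic log-likelihood in that frame
    transitions: dict, state -> list of (next_state, transition log-prob)
    Returns the surviving states and their best log scores.
    """
    active = {start: 0.0}
    for obs in frames:
        expanded = {}
        for state, score in active.items():
            for nxt, trans_lp in transitions.get(state, []):
                if nxt not in obs:
                    continue  # no acoustic score for this state in this frame
                cand = score + trans_lp + obs[nxt]
                # Viterbi recombination: keep only the best path into each state.
                if cand > expanded.get(nxt, float("-inf")):
                    expanded[nxt] = cand
        # Pruning bounds the number of hypotheses carried to the next frame.
        active = beam_prune(expanded, beam_width)
    return active
```

A narrow beam keeps fewer hypotheses per frame and so uses less memory and time, at the risk of discarding a path that would have recovered later; a very wide beam approaches an exhaustive search.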

key words $\bullet$ Spontaneous Speech Recognition $\bullet$ Full Search $\bullet$ Beam Search $\bullet$ Frame Synchronous $\bullet$ Garbage Model $\bullet$ Word trigram Model

Jin'ichi Murakami, October 2, 2001