
EXPERIMENTAL RESULTS OF CONTINUOUS SPEECH RECOGNITION USING WORD TRIGRAM MODELS

In this section, we describe the experimental results obtained with this algorithm. The experiment is speaker-dependent continuous speech recognition, and the test sentences were spoken by a broadcast announcer. The flooring probability was set to $e^{-1000}$. The experimental conditions are shown in Table 2.


Table 2: Experimental conditions for continuous speech recognition using word trigram models
Algorithm                    continuous mixture HMM + beam search
                             + word trigram models
Number of mixtures           max. 14 (valid for each syllable)
Number of states             3-state 4-loop left-to-right model
Acoustic parameters          16th-order LPC cepstrum + power
                             + delta power + 16th-order delta cepstrum
Frame window                 20 ms
Frame period                 5 ms
Training speech              word speech (5,240 words)
Number of syllable categories  52
Vocabulary                   1,567
Beam width                   4,096
Duration control             no
Language information         word trigram models
Unit of recognition          sentence
Number of test sentences     38
Speaking style               read speech
Speech content               international conference task (model conversation)

The word trigram probabilities used as language information are calculated under the following two conditions.

  1. The test data is not included in the training data from which the word trigram probabilities are calculated.
    (text-open data)

  2. The test data is included in the training data from which the word trigram probabilities are calculated.
    (text-closed data)
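
The two conditions differ only in whether the test sentences contribute to the trigram counts. The following is a minimal sketch of that difference, not the implementation used in the experiment. It assumes plain relative-frequency estimation (the text does not state the estimator) and applies the flooring probability $e^{-1000}$ to unseen trigrams. The function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter

LOG_FLOOR = -1000.0  # log of the flooring probability e^{-1000} from this section


def train_trigrams(sentences):
    """Count word trigrams and their two-word histories.

    Assumption: relative-frequency estimation without sentence-boundary
    symbols; the estimator is not specified in the text.
    """
    tri, hist = Counter(), Counter()
    for words in sentences:
        for i in range(len(words) - 2):
            w1, w2, w3 = words[i:i + 3]
            tri[(w1, w2, w3)] += 1
            hist[(w1, w2)] += 1
    return tri, hist


def log_prob(tri, hist, w1, w2, w3):
    """Return log P(w3 | w1, w2), floored at LOG_FLOOR for unseen trigrams."""
    count = tri[(w1, w2, w3)]
    if count == 0:
        return LOG_FLOOR
    return max(math.log(count / hist[(w1, w2)]), LOG_FLOOR)


# Hypothetical toy data: one test sentence whose trigram is absent from training.
train = [["a", "b", "c"], ["a", "b", "d"]]
test = [["a", "b", "e"]]

open_model = train_trigrams(train)           # text-open: test data excluded
closed_model = train_trigrams(train + test)  # text-closed: test data included

open_lp = log_prob(*open_model, "a", "b", "e")      # falls back to the floor
closed_lp = log_prob(*closed_model, "a", "b", "e")  # log(1/3)
```

In the text-open model the unseen trigram receives the floor value, so a hypothesis containing it is penalized very heavily; in the text-closed model every test trigram has a nonzero count.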

The training data consists of about 15,000 sentences (190,000 words) from the ATR Dialog Database. Table 3 shows the task entropy. The experimental results are shown in Figure 1, together with the results for word bigram models for comparison. We obtained a sentence recognition rate of 78% for text-closed data and 40% for text-open data. Table 4 shows the erroneous outputs for the text-closed data. Many of these outputs are semantically correct, and only four sentences are completely wrong. If only these four are counted as errors, the sentence recognition rate is 89%. The speech data for the completely wrong sentences contain pauses.

We think that the pauses cause these errors because the acoustic parameters and the word trigram models do not correspond at a pause. For text-open data, however, the recognition rate is the same for word bigram models and word trigram models. We attribute this to the small flooring probability and the small amount of text data. Moreover, with statistical language models such as word trigram models, the recognition rate for text-open data depends on how well the training data covers the test data. We therefore believe that the recognition rate for the text-open data is not very reliable.
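
One way to make "coverage" concrete is the fraction of trigrams in the test sentences that also occur in the training data. The sketch below illustrates this under that assumed definition; the text does not define coverage formally, and the function names and toy sentences are hypothetical.

```python
def trigram_set(sentences):
    """Collect the distinct word trigrams in a list of tokenized sentences."""
    return {tuple(words[i:i + 3])
            for words in sentences
            for i in range(len(words) - 2)}


def trigram_coverage(train_sentences, test_sentences):
    """Fraction of test trigram tokens that also occur in the training data."""
    seen = trigram_set(train_sentences)
    total = 0
    covered = 0
    for words in test_sentences:
        for i in range(len(words) - 2):
            total += 1
            if tuple(words[i:i + 3]) in seen:
                covered += 1
    return covered / total if total else 0.0


# Hypothetical toy data.
train = [["a", "b", "c", "d"]]
test = [["a", "b", "c", "e"]]

open_coverage = trigram_coverage(train, test)           # test data excluded
closed_coverage = trigram_coverage(train + test, test)  # test data included
```

By construction the text-closed coverage is 1.0, while the text-open coverage depends on how much training text is available. Every uncovered trigram is scored with the flooring probability.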


Table 3: Task perplexity
n-gram    Entropy  Perplexity
unigram   8.33     321.8
bigram    3.77     13.6
trigram   2.01     4.0
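
The two columns of Table 3 are related by perplexity = 2^entropy, assuming the entropy is measured in bits per word (an assumption, though it is consistent with the tabulated values). A short check:

```python
# Entropy values copied from Table 3; the unit (bits per word) is an assumption.
entropy_bits = {"unigram": 8.33, "bigram": 3.77, "trigram": 2.01}

# Perplexity is 2 raised to the per-word entropy in bits.
perplexity = {name: 2.0 ** h for name, h in entropy_bits.items()}

for name, pp in perplexity.items():
    print(f"{name}: {pp:.1f}")
```

The computed values agree with the Perplexity column to one decimal place.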

Figure 1: Experimental results


Table 4: Sentence errors in the experiment (text-closed, training data 15,713)
correct → output

kaiginoshukuhakushisetsunitsuiteotazuneshitainodesuga
→ kaiginoshukuhakushisetsunitsuiteotazuneshitaiNdesuga
会議の宿泊施設についてお尋ねしたいのですが
→ 会議の宿泊施設についてお尋ねしたいですが

kyoutopuriNsuhoterugakaigizyounihachikainodesuga
→ kyoutopuriNsuhoterugakaigizyounihachikaiNdesuga
京都プリンスホテルが会議場には近いんですが
→ 京都プリンスホテルが会場には近いですが

soredehakyoutopuriNsuhoteruoyoyakushitainodesuga
→ soredehakyoutopuriNsuhoteruoyoyakushitaiNdesuga
それでは京都プリンスホテルを予約したいのですが
→ それでは京都プリンスホテルを予約したいですが

hoterunotehaimoshiteitadakerunodesuka
→ hoterunotehaimoshiteitadakeruNdesuka
ホテルの手配もしていただけるのですか
→ ホテルの手配もしていただけるですか

dehaonamaetogozyuushooonegaishimasu
→ gohaQpyouninarukatanogozyuushooonegaishimasu
ではお名前とご住所をお願いします
→ ご発表になる方のご住所をお願いします

zyuushohatoukyoutominatokushiNbashiiQchoumeichibaNchisaNgoudesu
→ zyuushohanechoQtokyoutonokaiginihatourokunikaNshimashitekyounoseQshoNnoichibaNsaNgoudesu
住所は東京都港区新橋1丁目1番3号です
→ 住所はねちょっと京都の会議には登録に関しまして今日のセッションの1番3号です

deNwabaNgoumoonegaishimasu
→ deNwabaNgouonegaishimasu
電話番号もお願いします
→ 電話番号お願いします

deNwabaNgouhasaNsaNichinoniigoniiichidesu
→ roNbuNnohaQpyouhagozeNchuunokuzinikaizyounichikaidesu
電話番号は331の2521です
→ 論文の発表は午前中の9時に会場に近いです

kyoutopuriNsuhoterunihachigatsuyoQkakarayoukamadehitoribeyaootorishimashita
→ kyoutopuriNsuhoterunihachigatsuyoQkakarayoukamadenihahaQpyoushaootorishimashita
京都プリンスホテルに8月4日から8日まで一人部屋をお取りしました
→ 京都プリンスホテルに8月4日から8日までには発表者にお送りしました


Jin'ichi Murakami, February 19, 2001