The continuous mixture HMMs were trained on 2635 word utterances. The number of states of the Ergodic HMM used as the language model was set to 2, 4, or 8. The sentence likelihood is the product of the acoustic model likelihood and the language model likelihood raised to a power, which acts as the language model weight; a fixed value of this exponent was used in our experiments.
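To make the scoring concrete, the following is a minimal sketch, not the implementation used in the experiments. It assumes the Ergodic HMM is a fully connected HMM over word indices, scored with the standard forward algorithm, and that the weighted product is evaluated in the log domain. The function names, the parameter layout (`log_init`, `log_trans`, `log_emit`) and the argument `lm_weight` are hypothetical and chosen only for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ergodic_hmm_log_likelihood(words, log_init, log_trans, log_emit):
    """Log-likelihood of a word sequence under a fully connected HMM.

    Assumed (hypothetical) parameter layout:
      log_init[i]     : log P(first state is i)
      log_trans[i][j] : log P(next state is j | current state is i)
      log_emit[i][w]  : log P(word index w | state i)
    """
    if not words:
        raise ValueError("word sequence must not be empty")
    n = len(log_init)
    # Forward variable for the first word.
    alpha = [log_init[i] + log_emit[i][words[0]] for i in range(n)]
    # Every state may follow every state, so sum over all predecessors.
    for w in words[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_emit[j][w]
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def sentence_log_score(acoustic_loglik, lm_loglik, lm_weight):
    """Log of (acoustic likelihood) * (language model likelihood) ** lm_weight."""
    return acoustic_loglik + lm_weight * lm_loglik


# Toy 2-state model over a 2-word vocabulary with uniform parameters.
half = math.log(0.5)
toy_init = [half, half]
toy_trans = [[half, half], [half, half]]
toy_emit = [[half, half], [half, half]]
toy_lm = ergodic_hmm_log_likelihood([0, 1, 0], toy_init, toy_trans, toy_emit)
```

Working in the log domain turns the weighted product into a weighted sum, which avoids underflow on long utterances. The exponent value itself is left as a parameter here.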
We used 38 sentences as test data. To train the Ergodic HMM in the text-open experiment, we used 4000 sentences from the ATR Dialog Database; in the text-closed experiment, we used the same 4000 sentences plus the 38 test sentences. This allows the text-closed and text-open experiments to be run on the same test data.
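The two training conditions can be sketched as follows. This is an illustration only: the function name and the placeholder sentence lists are hypothetical, and the sketch assumes the 38 test sentences do not already occur among the 4000 database sentences.

```python
def build_lm_training_sets(corpus_sentences, test_sentences):
    """Return (text_open, text_closed) training sets for the language model.

    text_open   : database sentences only, so the test text is unseen
    text_closed : database sentences plus the test sentences themselves
    """
    text_open = list(corpus_sentences)
    text_closed = list(corpus_sentences) + list(test_sentences)
    return text_open, text_closed


# Placeholder data with the sizes reported in the text.
corpus = [f"corpus sentence {i}" for i in range(4000)]
test = [f"test sentence {i}" for i in range(38)]
text_open, text_closed = build_lm_training_sets(corpus, test)
```

Both language models are then evaluated on the same 38 test sentences, so any difference in recognition results is attributable to whether the test text was seen during language model training.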
The experimental conditions are shown in more detail in Table 3.
#syllable models | 52 |
Syllable model | 4-state 3-loop Gaussian mixture continuous HMM |
Learning data | male announcer, 2635 word utterances |
Parameter | log power + 16th-order LPC cepstrum + Δ log power + 16th-order Δ cepstrum |
Test data | same speaker, 38 sentences |
Vocabulary | 435 |
Beam width | 1024 |
Speech style | read speech |
#states of Ergodic HMM | 2, 4, 8 |
Ergodic HMM learning data | 4000 sentences, 57354 words |