The continuous mixture HMM was trained on 2,635 word utterances. The Ergodic HMM was trained on 8,475 sentences from the ATR Dialog Database in the text-open experiments, and on the same 8,475 sentences plus 38 test sentences in the text-closed experiments. This made it possible to run both the text-closed and text-open experiments on the same test data. The experimental conditions are summarized in Table 2.
algorithm | continuous mixture HMM + beam search + Ergodic HMM |
mixture count | max 14 (valid for each syllable) |
state number | 3-state 4-loop left-to-right model |
acoustic parameter | 16th order LPC cepstrum + power + power + 16th order cepstrum |
frame window | 20 ms |
frame period | 5 ms |
training speech | word utterances (2,635 words) |
phone category | 52 syllables |
vocabulary | 435 |
beam width | 4,096 |
duration control | no |
test sentence count | 261 sentences; same speaker |
speaking style | read speech |
speech content | international conference task |
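
To make the framing conditions in Table 2 concrete, the following minimal sketch computes how many analysis frames a 20 ms window advanced every 5 ms yields for an utterance of a given length. The function names `frame_count` and `frame_bounds` are hypothetical, and the edge handling (no padding, trailing partial frame dropped) is an assumption, since the table does not specify it.

```python
WINDOW_MS = 20  # frame window from Table 2
SHIFT_MS = 5    # frame period from Table 2


def frame_count(duration_ms: int, window_ms: int = WINDOW_MS,
                shift_ms: int = SHIFT_MS) -> int:
    """Number of complete frames in an utterance of duration_ms.

    Assumption: no padding at the edges, and a trailing partial
    frame is discarded.
    """
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // shift_ms + 1


def frame_bounds(index: int, window_ms: int = WINDOW_MS,
                 shift_ms: int = SHIFT_MS) -> tuple[int, int]:
    """Start and end time (in ms) of the frame with the given index."""
    start = index * shift_ms
    return start, start + window_ms


if __name__ == "__main__":
    print(frame_count(1000))                 # frames in a 1 s utterance
    print(frame_bounds(0), frame_bounds(1))  # consecutive frames overlap by 15 ms
```

With these settings, consecutive frames share 15 ms of signal, so each 20 ms window overlaps its predecessor by 75%.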