We constructed word HMMs by concatenating phone HMMs. The continuous mixture HMMs were trained on 2,635 word utterances. Bigram probabilities were trained on 8,475 sentences from the ATR Dialog Database, smoothed with the deleted-interpolation algorithm [4], for the text-open experiments; for the text-closed experiments, the same 8,475 sentences plus the 38 test sentences were used. In this way, both the text-open and text-closed experiments could be performed on the same test data. To reduce memory requirements, we used beam pruning. The results were evaluated in terms of the word correct rate and the word accuracy rate [3]. The experimental conditions are summarized in Table 1.
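Deleted interpolation smooths the bigram by mixing the maximum-likelihood bigram estimate with unigram and uniform estimates, choosing the mixture weights to maximize the likelihood of held-out ("deleted") data via EM. The following is a minimal sketch of that idea, not the implementation from [4]; the three-component mixture, the held-out split, and all function names are illustrative assumptions:

```python
from collections import Counter

def ml_estimates(sentences):
    """Relative-frequency unigram and bigram estimates from training text."""
    uni, bi = Counter(), Counter()
    for sent in sentences:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    total = sum(uni.values())
    p_uni = {w: c / total for w, c in uni.items()}
    p_bi = {(w1, w2): c / uni[w1] for (w1, w2), c in bi.items()}
    return p_uni, p_bi

def estimate_weights(held_out, p_uni, p_bi, vocab_size, iters=20):
    """EM re-estimation of interpolation weights on held-out (deleted) data."""
    lam = [1 / 3, 1 / 3, 1 / 3]          # bigram, unigram, uniform components
    for _ in range(iters):
        resp = [0.0, 0.0, 0.0]           # accumulated responsibilities
        for sent in held_out:
            for w1, w2 in zip(sent, sent[1:]):
                parts = [lam[0] * p_bi.get((w1, w2), 0.0),
                         lam[1] * p_uni.get(w2, 0.0),
                         lam[2] / vocab_size]
                z = sum(parts)
                if z > 0.0:
                    for i in range(3):
                        resp[i] += parts[i] / z
        total = sum(resp)
        if total > 0.0:
            lam = [r / total for r in resp]
    return lam

def smoothed_bigram(w1, w2, lam, p_uni, p_bi, vocab_size):
    """Interpolated bigram probability P(w2 | w1)."""
    return (lam[0] * p_bi.get((w1, w2), 0.0)
            + lam[1] * p_uni.get(w2, 0.0)
            + lam[2] / vocab_size)
```

Because the uniform component keeps every probability strictly positive, a smoothed bigram never assigns zero probability to a word pair unseen in training, which is what allows the text-open experiments to score test sentences absent from the training text.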
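Beam pruning bounds memory by keeping only the best-scoring partial hypotheses at each frame (the beam width of 4,096 in Table 1 caps this number). A minimal frame-synchronous Viterbi sketch, assuming log-domain transition and emission scorers; the rank-based pruning rule and all interfaces here are illustrative, not the paper's decoder:

```python
import math

def viterbi_beam(observations, states, log_trans, log_emit, beam_width=4096):
    """Frame-synchronous Viterbi search keeping at most beam_width hypotheses."""
    # Uniform initial probabilities assumed; active maps state -> best log score.
    active = {s: log_emit(s, observations[0]) for s in states}
    for obs in observations[1:]:
        expanded = {}
        for s, score in active.items():
            for t in states:
                cand = score + log_trans(s, t) + log_emit(t, obs)
                if cand > expanded.get(t, -math.inf):
                    expanded[t] = cand
        # Beam pruning: keep only the beam_width best-scoring hypotheses,
        # so memory grows with the beam rather than the full search space.
        if len(expanded) > beam_width:
            best = sorted(expanded.items(), key=lambda kv: kv[1],
                          reverse=True)[:beam_width]
            expanded = dict(best)
        active = expanded
    return max(active.items(), key=lambda kv: kv[1])  # (best state, log score)
```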
Table 1: Experimental conditions

| condition | value |
|---|---|
| phone model | 3-state, 4-loop continuous mixture HMM |
| mixture number | max. 10 (per syllable) |
| acoustic parameters | 16th-order LPC cepstrum + power + Δpower + 16th-order Δcepstrum |
| frame window | 20 ms |
| frame period | 5 ms |
| acoustic training data | word speech (2,620 words) |
| number of syllable categories | 26 |
| language model | word bigram |
| LM training data | 8,475 sentences (57,354 words) |
| vocabulary size | 435 words |
| beam width | 4,096 |
| test data | 38 sentences |
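For reference, the word correct rate counts substitutions (S) and deletions (D) against the N reference words, correct = (N - S - D)/N, while word accuracy also penalizes insertions (I), accuracy = (N - S - D - I)/N [3]. A minimal sketch that obtains the error counts from a standard Levenshtein alignment; the code is illustrative, not taken from [3]:

```python
def error_counts(ref, hyp):
    """Levenshtein alignment of two word lists.
    Returns (substitutions, deletions, insertions)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total errors, S, D, I) aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)                      # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)                      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag, up, left = dp[i-1][j-1], dp[i-1][j], dp[i][j-1]
            s = 0 if ref[i-1] == hyp[j-1] else 1
            dp[i][j] = min(
                (diag[0] + s, diag[1] + s, diag[2], diag[3]),  # match / substitution
                (up[0] + 1, up[1], up[2] + 1, up[3]),          # deletion
                (left[0] + 1, left[1], left[2], left[3] + 1),  # insertion
            )
    _, s, d, i = dp[n][m]
    return s, d, i

def word_correct_and_accuracy(ref, hyp):
    """Word correct = (N-S-D)/N; word accuracy = (N-S-D-I)/N."""
    s, d, i = error_counts(ref, hyp)
    n = len(ref)
    return (n - s - d) / n, (n - s - d - i) / n
```

For example, a 10-word reference recognized with one substitution and one insertion scores 90% word correct but only 80% word accuracy, since insertions lower the accuracy rate only.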