Concatenative Synthesis

Our aim is to produce response-type dialogue spoken by a single narrator, in which synthesized words fill the slots of an appropriate template recorded by that same narrator. In a complete commercial service, the number of words to be handled may be extremely large. To synthesize such words while retaining the narrator's voice characteristics, using his/her recorded voice with no signal processing seems to be the best approach. The synthesis method proposed in this paper therefore concatenates syllabic components with minimal signal processing.

Simply concatenating syllables that were pronounced in isolation clearly produces poor-quality speech. To avoid this, our method collects syllabic components by dividing recorded words into syllabic waveforms. It is well known that each component should have acoustic features suited to its context. In our method, each syllabic component carries information about its preceding and following phonemes, and only components that match the required phoneme context are selected. This yields smooth spectra at the concatenation points. Smoothness in pitch and power is also needed; this is discussed in Sec. 3. To achieve natural continuity not only of spectra but also of pitch and power with minimal signal processing, our improvements focus on how the syllabic components are collected.
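The context-matched selection described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the data layout (the hypothetical `SyllableUnit` class, syllables written as tuples of romanized phonemes, and `"#"` marking a word boundary) and the helper names `contexts`, `build_inventory` and `select_units` are assumptions made for illustration. Waveforms are represented only by the recorded word and syllable position they were cut from. When several units match, the sketch simply takes the first one; pitch and power continuity, treated in Sec. 3, is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical representation: one syllable = its phonemes, e.g. ("k", "a").
Syllable = Tuple[str, ...]
ContextKey = Tuple[Syllable, str, str]  # (syllable, prior phoneme, posterior phoneme)


@dataclass(frozen=True)
class SyllableUnit:
    """A syllabic waveform segment cut from a recorded word (hypothetical layout)."""
    syllable: Syllable
    prev_phoneme: str  # last phoneme of the preceding syllable, "#" at a word boundary
    next_phoneme: str  # first phoneme of the following syllable, "#" at a word boundary
    source_word: str   # recorded word the segment was cut from
    position: int      # index of the syllable within that word


def contexts(word: List[Syllable]) -> Iterator[Tuple[Syllable, str, str, int]]:
    """Yield (syllable, prior phoneme, posterior phoneme, index) for each syllable."""
    for i, syl in enumerate(word):
        prev = word[i - 1][-1] if i > 0 else "#"
        nxt = word[i + 1][0] if i + 1 < len(word) else "#"
        yield syl, prev, nxt, i


def build_inventory(recorded: Dict[str, List[Syllable]]) -> Dict[ContextKey, List[SyllableUnit]]:
    """Divide every recorded word into syllabic units indexed by phoneme context."""
    inventory: Dict[ContextKey, List[SyllableUnit]] = {}
    for name, word in recorded.items():
        for syl, prev, nxt, i in contexts(word):
            unit = SyllableUnit(syl, prev, nxt, name, i)
            inventory.setdefault((syl, prev, nxt), []).append(unit)
    return inventory


def select_units(target: List[Syllable], inventory: Dict[ContextKey, List[SyllableUnit]]) -> List[SyllableUnit]:
    """For each target syllable, choose a unit whose prior and posterior phonemes match."""
    chosen = []
    for syl, prev, nxt, _ in contexts(target):
        candidates = inventory.get((syl, prev, nxt))
        if not candidates:
            raise LookupError(f"no unit for {syl} in context {prev}_{nxt}")
        chosen.append(candidates[0])  # assumed tie-break: first matching candidate
    return chosen


if __name__ == "__main__":
    recorded = {
        "kato": [("k", "a"), ("t", "o")],
        "kani": [("k", "a"), ("n", "i")],
        "nara": [("n", "a"), ("r", "a")],
        "sana": [("s", "a"), ("n", "a")],
    }
    inventory = build_inventory(recorded)
    target = [("k", "a"), ("n", "a")]  # the word "kana"
    for unit in select_units(target, inventory):
        print(unit.source_word, unit.position)  # prints "kani 0", then "sana 1"
```

For the target word "kana", the sketch takes "ka" from "kani" (where it is followed by /n/) rather than from "kato", and "na" from "sana" (where it is preceded by /a/) rather than from "nara", so each joint sees the same neighboring phoneme it had in the original recording.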

Jin'ichi Murakami
2000-01-17