Our aim is to produce a response-type dialogue
spoken by a single narrator
by using synthesized words to fill the slots of an appropriate template
recorded by the same narrator;
the number of words to be handled may be extremely large
in a full-scale commercial service.
To synthesize such words while
retaining the characteristics of the narrator,
using his/her recorded voice with no signal processing
seems to be the best approach.
The synthesis method proposed in this paper
basically concatenates syllabic components
with minimal signal processing
to realize the concept described above.
It is obvious that simply concatenating syllables
pronounced separately
will produce a voice of poor quality.
To avoid this,
our method collects syllabic components by
dividing recorded words into syllabic waveforms.
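The division step can be illustrated with a minimal sketch. It assumes that syllable boundaries (sample indices) are already available for each recorded word, for example from hand labeling; the paper does not specify how they are obtained here. The names SyllabicComponent and split_word are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SyllabicComponent:
    """One syllabic waveform cut from a recorded word (hypothetical structure)."""
    syllable: str
    samples: List[float]
    source_word: str


def split_word(
    word: str,
    waveform: Sequence[float],
    segments: Sequence[Tuple[str, int, int]],
) -> List[SyllabicComponent]:
    """Cut a recorded word into syllabic components.

    `segments` lists (syllable, start, end) with sample indices, end exclusive.
    The boundaries are assumed to be given; no signal processing is applied.
    """
    components = []
    for syllable, start, end in segments:
        if not (0 <= start < end <= len(waveform)):
            raise ValueError(f"bad boundaries for {syllable!r}: {start}..{end}")
        components.append(
            SyllabicComponent(syllable, list(waveform[start:end]), word)
        )
    return components


# Toy example: a 12-sample "waveform" for the word "sakura".
toy_waveform = [float(i) for i in range(12)]
parts = split_word(
    "sakura", toy_waveform, [("sa", 0, 4), ("ku", 4, 8), ("ra", 8, 12)]
)
```

Because each component is cut from a word spoken naturally, it keeps the coarticulation of its original neighbors, which separately pronounced syllables lack.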
It is well known that each component
should have the acoustic features
suitable for the context.
In our method, each syllabic component carries
information about its preceding and following phonemes,
and only those components that match the required phoneme context
are chosen.
This yields smooth spectra at concatenation points.
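The context-matching selection described above can be sketched as follows. This is only an illustration under stated assumptions: the inventory layout, the use of None for a word boundary, and the names ContextUnit and select_units are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextUnit:
    """A syllabic component tagged with its phoneme context (hypothetical)."""
    syllable: str
    prev_phoneme: Optional[str]  # None is assumed to mean "word start"
    next_phoneme: Optional[str]  # None is assumed to mean "word end"
    source_word: str


def select_units(
    inventory: List[ContextUnit],
    syllable: str,
    prev_phoneme: Optional[str],
    next_phoneme: Optional[str],
) -> List[ContextUnit]:
    """Return only the units whose syllable and both neighbors match."""
    return [
        u
        for u in inventory
        if u.syllable == syllable
        and u.prev_phoneme == prev_phoneme
        and u.next_phoneme == next_phoneme
    ]


# Toy inventory: three recordings of "ku" in different contexts.
inventory = [
    ContextUnit("ku", "a", "r", "sakura"),   # s-a-[ku]-r-a
    ContextUnit("ku", None, "r", "kuruma"),  # word-initial [ku]-r-u-m-a
    ContextUnit("ku", "a", None, "haku"),    # word-final h-a-[ku]
]

# Candidates for "ku" preceded by /a/ and followed by /r/.
candidates = select_units(inventory, "ku", "a", "r")
```

In this toy inventory only the unit taken from "sakura" qualifies. An exact match on both neighbors is the strictest reading of the text; how to proceed when no unit matches is not addressed in this sketch.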
Smoothness in terms of pitch and power is also needed; this
is elaborated in Sec. 3.
To achieve natural continuity of
not only spectra but also pitch and power
with minimal signal processing,
our improvements focus on the collection
of syllabic components.
Jin'ichi Murakami
2000-01-17