Filled-pauses procedure


In section 4.3, we described a procedure to counter in speech data. The same technique can also be used to deal with filled-pauses in spontaneous speech. We therefore propose the following two techniques.

Filled-pauses Skip
Filled-pauses are known in advance as phone strings (e.g. ``well'' is /W/ /EH/ /L/) and are registered in the word dictionary. This method skips filled-pauses like a . In addition, the unigram probability of each filled-pause is used as a penalty.
For example, let's assume that the speech data is as follows.
`` Do you already have well a registration form ? ''
In this case, ``well'' is a filled-pause. The trigram probabilities are calculated as follows.
$P($ ``do'' $) \times P($ ``you'' $\vert$ ``do'' $) \times P($ ``already'' $\vert$ ``do''``you'' $) \times P($ ``have'' $\vert$ ``you''``already'' $) \times P($ ``well'' $) \times P($ ``a'' $\vert$ ``already''``have'' $) \times P($ ``registration'' $\vert$ ``have''``a'' $)$
In this equation, $P($ ``well'' $)$ is the penalty, and conditioning ``a'' on ``already''``have'' rather than on ``have''``well'' is what skips the filled-pause.
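The factorization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the decoder used in this work: the function names `skip_factorize` and `log_score`, the filled-pause set, and the flat toy probability are all hypothetical.

```python
import math

def skip_factorize(words, filled_pauses):
    """List the (word, context) terms of the product, skipping filled-pauses.

    A filled-pause gets an empty context (its unigram probability is the
    penalty) and is not added to the history, so the next word is still
    conditioned on the two preceding ordinary words.
    """
    history = []   # ordinary (non filled-pause) words seen so far
    terms = []
    for word in words:
        if word in filled_pauses:
            terms.append((word, ()))                    # penalty term P(word)
        else:
            terms.append((word, tuple(history[-2:])))   # trigram term
            history.append(word)
    return terms

def log_score(terms, prob):
    """Sum log probabilities; prob(word, context) is a hypothetical model lookup."""
    return sum(math.log(prob(word, context)) for word, context in terms)

sentence = "do you already have well a registration form".split()
terms = skip_factorize(sentence, filled_pauses={"well"})
for word, context in terms:
    print(word, context)

# Toy model (assumption): every term has probability 0.1.
total = log_score(terms, lambda word, context: 0.1)
```

Keeping the filled-pause out of `history` is the whole skip: the term for ``a'' is built from ``already''``have'', exactly as if ``well'' had not been spoken.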
Phone-strings Skip
Filled-pauses can be regarded as sequences of phones, so this technique skips phone strings as filled-pauses and uses the phone trigram probabilities as the penalty. The same technique is also used to deal with hesitations, retractions and out-of-vocabulary words.
For example, let's assume the speech data is as follows.
`` Do you already have well a registration form ? ''
( /D/UW/ /Y/UW/ /AO/L/R/EH/D/IY/ /H/AE/V/ /W/EH/L/ /AH/ /R/EH/J/IH/S/T/R/EY/SH/UN/ /F/OH/M/ ? )
The trigram probabilities are calculated as follows.
$P($ ``do'' $) \times P($ ``you'' $\vert$ ``do'' $) \times P($ ``already'' $\vert$ ``do''``you'' $) \times P($ ``have'' $\vert$ ``you''``already'' $) \times P( /W/ \vert /AE/,/V/ ) \times P( /EH/ \vert /V/,/W/ ) \times P( /L/ \vert /W/,/EH/ ) \times P($ ``a'' $\vert$ ``already''``have'' $) \times P($ ``registration'' $\vert$ ``have''``a'' $)$
In this equation, $P( /W/ \vert /AE/,/V/ )$, $P( /EH/ \vert /V/,/W/ )$ and $P( /L/ \vert /W/,/EH/ )$ are the penalty, and conditioning ``a'' on ``already''``have'' is what skips the phone string.
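As a rough sketch of how the phone-level penalty could be assembled, the Python below lists the phone trigram terms for the skipped string /W/ /EH/ /L/, taking /AE/ and /V/ (the last two phones of ``have'' in the phone string above) as the left context. The function names `phone_penalty_terms` and `log_penalty` and the probability values are assumptions chosen for illustration, not figures from this work.

```python
import math

def phone_penalty_terms(skipped, left_phones):
    """List the (phone, context) trigram terms for a skipped phone string.

    Each skipped phone is conditioned on the two phones before it, starting
    from the last phones of the preceding word.
    """
    seq = list(left_phones[-2:]) + list(skipped)
    start = len(seq) - len(skipped)   # index of the first skipped phone
    return [(seq[i], tuple(seq[max(0, i - 2):i])) for i in range(start, len(seq))]

def log_penalty(terms, phone_trigram):
    """Sum the log phone trigram probabilities of the skipped phones."""
    return sum(math.log(phone_trigram[term]) for term in terms)

have = ["H", "AE", "V"]   # phones of the word before the skipped string
well = ["W", "EH", "L"]   # phones being skipped
terms = phone_penalty_terms(well, have)

# Toy phone trigram probabilities (hypothetical values).
toy_phone_trigram = {
    ("W", ("AE", "V")): 0.01,
    ("EH", ("V", "W")): 0.02,
    ("L", ("W", "EH")): 0.05,
}
penalty = log_penalty(terms, toy_phone_trigram)
print(terms, penalty)
```

Because the penalty depends only on phones, the same routine would apply to any skipped segment, which is why the approach extends to hesitations, retractions and out-of-vocabulary words that have no dictionary entry.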

Jin'ichi Murakami, January 19, 2001