Next: Experiments for spontaneous speech Up: Spontaneous Speech Recognition Previous: Spontaneous Speech Recognition

Filled-pauses procedure


In Section 4.3, we described a procedure for handling $/pause/$ in speech data. The same technique can be applied to filled-pauses in spontaneous speech. We therefore propose the following two techniques.

  1. Filled-pauses Skip

    Filled-pauses are known in advance as phone strings (e.g. ``well'' is /W/ /EH/ /L/) and are registered in the word dictionary. This method skips a filled-pause in the same way as a $/pause/$: the filled-pause is left out of the trigram history of the words that follow it. In addition, the unigram probability of the filled-pause is used as a penalty.

    For example, let's assume that the speech data is as follows.

    ``Do you already have well a registration form?''

    In this case, ``well'' is a filled-pause. The trigram probabilities are calculated as follows.

    $
    P(\text{``do''}) \times P(\text{``you''} \vert \text{``do''}) \times P(\text{``already''} \vert \text{``do''}, \text{``you''}) \times P(\text{``have''} \vert \text{``you''}, \text{``already''}) \times P(\text{``well''}) \times P(\text{``a''} \vert \text{``already''}, \text{``have''}) \times P(\text{``registration''} \vert \text{``have''}, \text{``a''}).
    $

    In this equation, $P(\text{``well''})$ is the penalty, and $P(\text{``a''} \vert \text{``already''}, \text{``have''})$ shows the skip: ``a'' is conditioned on ``already'' and ``have'', with the filled-pause ``well'' left out of the history.

  2. Phone-strings Skip

    Because a filled-pause can be regarded as a sequence of phones, this technique skips an arbitrary phone string in the same way as a filled-pause. The phone trigram probabilities of the skipped phones are used as the penalty. This technique can also handle hesitations, retractions and out-of-vocabulary words.

    For example, let's assume the speech data is as follows.

    ``Do you already have well a registration form?''

    ( /D/UW/  /Y/UW/  /AO/L/R/EH/D/IY/  /H/AE/V/  /W/EH/L/  /AH/  /R/EH/J/IH/S/T/R/EY/SH/UN/  /F/OH/M/ )

    The trigram probabilities are calculated as follows.

    $
    P(\text{``do''}) \times P(\text{``you''} \vert \text{``do''}) \times P(\text{``already''} \vert \text{``do''}, \text{``you''}) \times P(\text{``have''} \vert \text{``you''}, \text{``already''}) \times
    P(/W/ \vert /AE/, /V/) \times
    P(/EH/ \vert /V/, /W/) \times
    P(/L/ \vert /W/, /EH/) \times
    P(\text{``a''} \vert \text{``already''}, \text{``have''}) \times P(\text{``registration''} \vert \text{``have''}, \text{``a''}).
    $

    In this equation, $P(/W/ \vert /AE/, /V/)$, $P(/EH/ \vert /V/, /W/)$ and $P(/L/ \vert /W/, /EH/)$ are the penalty, and $P(\text{``a''} \vert \text{``already''}, \text{``have''})$ shows the skip: the phone string /W/ /EH/ /L/ is left out of the word history.
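
The two techniques can be illustrated with a minimal Python sketch. It only lists which factors enter the product and sums their log-probabilities; it is not the decoder used in this work. The function names (`filled_pause_skip`, `phone_string_skip`, `log_score`), the `(kind, symbol, history)` tuple layout, and the language-model callables are hypothetical names introduced for illustration. The sketch also assumes that the phone history of a skipped string starts from the last two phones of the preceding word, as in the example above.

```python
import math


def filled_pause_skip(words, fillers):
    """List the factors of the product for technique 1 (Filled-pauses Skip).

    Each factor is (kind, symbol, history); an empty history means a unigram.
    """
    factors, history = [], ()
    for w in words:
        if w in fillers:
            # Penalty: unigram P(w). The word history is left untouched,
            # so the next word is conditioned as if w were absent.
            factors.append(("word", w, ()))
        else:
            factors.append(("word", w, history))
            history = (history + (w,))[-2:]
    return factors


def phone_string_skip(segments, lexicon):
    """List the factors for technique 2 (Phone-strings Skip).

    A str segment is a dictionary word; a tuple segment is a skipped phone string.
    """
    factors, word_hist, phone_hist = [], (), ()
    for seg in segments:
        if isinstance(seg, str):
            factors.append(("word", seg, word_hist))
            word_hist = (word_hist + (seg,))[-2:]
            phone_hist = (phone_hist + tuple(lexicon[seg]))[-2:]
        else:
            for ph in seg:
                # Penalty: phone trigram P(ph | two preceding phones).
                factors.append(("phone", ph, phone_hist))
                phone_hist = (phone_hist + (ph,))[-2:]
    return factors


def log_score(factors, word_lm, phone_lm):
    """Sum log-probabilities; each model maps (symbol, history) to a probability."""
    total = 0.0
    for kind, sym, hist in factors:
        lm = word_lm if kind == "word" else phone_lm
        total += math.log(lm(sym, hist))
    return total


# Pronunciations taken from the phone transcription of the example utterance.
lexicon = {
    "do": ["D", "UW"], "you": ["Y", "UW"],
    "already": ["AO", "L", "R", "EH", "D", "IY"],
    "have": ["H", "AE", "V"], "a": ["AH"],
    "registration": ["R", "EH", "J", "IH", "S", "T", "R", "EY", "SH", "UN"],
    "form": ["F", "OH", "M"],
}
words = ["do", "you", "already", "have", "well", "a", "registration", "form"]
# Technique 2 sees "well" only as the phone string /W/ /EH/ /L/.
segments = words[:4] + [("W", "EH", "L")] + words[5:]

for factor in filled_pause_skip(words, {"well"}):
    print(factor)
for factor in phone_string_skip(segments, lexicon):
    print(factor)
```

In both listings the word ``a'' is conditioned on ``already'' and ``have''. The techniques differ only in the penalty: a single unigram factor for ``well'' in the first, and three phone trigram factors in the second. Passing real word and phone language models to `log_score` in place of toy callables would give the corresponding log-probability of the utterance.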



Jin'ichi Murakami, January 19, 2001