In section 4.3, we described a procedure to counter in speech data. The same technique can also be applied to filled-pauses in spontaneous speech. We therefore propose two techniques for handling them.
Filled-pauses are known in advance as phone strings (e.g. ``well'' is /W/ /EH/ /L/) and are recorded in the word dictionary. This method skips filled-pauses like a . Additionally, the unigram probability of each filled-pause is used as the penalty.
For example, let's assume that the speech data is as follows.
``Do you already well have a registration form?''
In this case, ``well'' is a filled-pause. The trigram probabilities are calculated as follows.
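\[
P(\text{``do''}) \, P(\text{``you''} \mid \text{``do''}) \, P(\text{``already''} \mid \text{``do''}, \text{``you''}) \, P(\text{``well''}) \, P(\text{``have''} \mid \text{``you''}, \text{``already''}) \, P(\text{``a''} \mid \text{``already''}, \text{``have''}) \, P(\text{``registration''} \mid \text{``have''}, \text{``a''})
\]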
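In this equation, $P(\text{``well''})$ is the penalty, and $P(\text{``have''} \mid \text{``you''}, \text{``already''})$ shows the skip: the trigram for ``have'' is conditioned on ``you'' and ``already'', so the filled-pause is left out of the context.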
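The scoring rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `score_skipping_filled_pauses`, the callback signatures `ngram_prob(word, history)` and `unigram_prob(word)`, and the toy probability values are all assumptions introduced here. The word order in the example places ``well'' between ``already'' and ``have'', as the factorization above implies.

```python
import math

def score_skipping_filled_pauses(words, filled_pauses, ngram_prob, unigram_prob):
    """Return (log_score, factors) for a word sequence.

    ngram_prob(word, history) -> P(word | history), history is a tuple of 0-2 words
    unigram_prob(word)        -> P(word), used as the penalty for a filled-pause
    (Both callbacks are hypothetical interfaces assumed for this sketch.)
    """
    history = []      # words seen so far, with filled-pauses left out
    log_score = 0.0
    factors = []      # symbolic record of the factors, for inspection
    for w in words:
        if w in filled_pauses:
            # Penalty: the filled-pause contributes only its unigram probability
            # and is NOT appended to the history, so later words skip over it.
            p = unigram_prob(w)
            factors.append(f"P({w})")
        else:
            ctx = tuple(history[-2:])
            p = ngram_prob(w, ctx)
            factors.append(f"P({w}|{','.join(ctx)})" if ctx else f"P({w})")
            history.append(w)
        log_score += math.log(p)
    return log_score, factors

# Toy usage with made-up constant probabilities (assumed values, not measured ones).
words = ["do", "you", "already", "well", "have", "a", "registration", "form"]
log_score, factors = score_skipping_filled_pauses(
    words,
    filled_pauses={"well"},
    ngram_prob=lambda w, h: 0.1,
    unigram_prob=lambda w: 0.01,
)
print(factors)
```

Keeping the filled-pause out of `history` is what makes the factor for ``have'' depend on ``you'' and ``already''; the unigram factor is accumulated separately as the penalty.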
As filled-pauses can be regarded as sequences of phones, this technique skips phone strings as filled-pauses and uses the phone trigram probabilities as the penalty. This technique is also used to deal with hesitations, retractions and out-of-vocabulary words.
For example, let's assume the speech data is as follows.
The trigram probabilities are calculated as follows.
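\[
P(\text{``do''}) \, P(\text{``you''} \mid \text{``do''}) \, P(\text{``already''} \mid \text{``do''}, \text{``you''}) \, \langle \text{phone trigram penalty terms} \rangle \, P(\text{``have''} \mid \text{``you''}, \text{``already''}) \, P(\text{``a''} \mid \text{``already''}, \text{``have''}) \, P(\text{``registration''} \mid \text{``have''}, \text{``a''})
\]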
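In this equation, the phone trigram terms are the penalty, and $P(\text{``have''} \mid \text{``you''}, \text{``already''})$ shows the skip: the trigram for ``have'' is conditioned on ``you'' and ``already'', so the phone string is left out of the context.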
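The phone-string variant can be sketched in the same way. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the names `phone_string_penalty` and `score_skipping_phone_strings`, the callback signatures, the representation of a skipped segment as a tuple of phones, and the choice to restart the phone history at the beginning of each phone string are all hypothetical. The phone string /W/ /EH/ /L/ is borrowed from the dictionary example purely for illustration.

```python
import math

def phone_string_penalty(phones, phone_ngram_prob):
    """Log of the product of phone n-gram probabilities over one phone string.

    The phone history is restarted for each string (an assumption of this sketch).
    """
    log_p = 0.0
    for i, ph in enumerate(phones):
        ctx = tuple(phones[max(0, i - 2):i])   # up to two preceding phones
        log_p += math.log(phone_ngram_prob(ph, ctx))
    return log_p

def score_skipping_phone_strings(items, word_ngram_prob, phone_ngram_prob):
    """items: list whose elements are either a word (str) or a phone string (tuple of str)."""
    history = []      # words only; skipped phone strings never enter the history
    log_score = 0.0
    for item in items:
        if isinstance(item, tuple):
            # Skipped segment: add the phone trigram penalty, leave history unchanged.
            log_score += phone_string_penalty(item, phone_ngram_prob)
        else:
            log_score += math.log(word_ngram_prob(item, tuple(history[-2:])))
            history.append(item)
    return log_score

# Toy usage with made-up constant probabilities (assumed values).
items = ["do", "you", "already", ("W", "EH", "L"), "have", "a", "registration", "form"]
score = score_skipping_phone_strings(
    items,
    word_ngram_prob=lambda w, h: 0.1,
    phone_ngram_prob=lambda p, h: 0.5,
)
print(score)
```

Because the penalty is computed from phones rather than from a dictionary entry, the same routine could in principle be applied to any segment treated as a phone string, which matches the stated use for hesitations, retractions and out-of-vocabulary words.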