
Reducing memory requirements and computational costs

Training an Ergodic HMM with a large number of states requires a large amount of memory. We therefore propose new techniques that significantly reduce the memory requirements and computational costs of the Baum-Welch algorithm. These techniques consist of two main parts.

Technique 1
Set small probabilities to 0.

If $b_N(i,j,w) $ is less than a given small value, this probability is set to 0.0, so it requires neither computation nor memory in the Baum-Welch algorithm.

Concretely, in the basic Baum-Welch algorithm the forward value $ \alpha_t(j) $ is calculated as follows [1]:

\begin{displaymath}\alpha_t(j)= [ \sum_{i} \alpha_{t-1}(i) a_N(i,j)] b_N(i,j,O_t) \end{displaymath}

Instead of this equation, the following equation is used:

\begin{displaymath}
\alpha_t(j)=
\left\{
\begin{array}{ll}
\left[ \sum_{i} \alpha_{t-1}(i)\, a_N(i,j) \right] b_N(i,j,O_t) & \quad b_N(i,j,O_t) > C \\[2mm]
0.0 & \quad b_N(i,j,O_t) \leq C
\end{array}
\right.
\end{displaymath}

In our experiments, $ C $ was set to $ 10^{-300} $.

The output probability $b_N(i,j,w) $ accounts for the largest share of memory among the Ergodic HMM parameters, so this algorithm reduces the memory required for the word output probabilities; the computational cost is reduced at the same ratio.
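As an illustration only (not code from the paper), the following Python sketch shows one time step of such a thresholded forward pass. Since $b_N(i,j,w)$ depends on the source state $i$ as well as the destination state $j$, the sketch applies the cutoff $C$ per arc $(i,j)$ and simply skips arcs whose output probability falls below it; the array shapes and loop structure are assumptions made for the example.

\begin{verbatim}
import numpy as np

C = 1e-300  # cutoff value used in the experiments described above


def forward_step(alpha_prev, a, b, o_t, cutoff=C):
    """One thresholded forward step for an arc-emission Ergodic HMM.

    alpha_prev : (N,)      forward values alpha_{t-1}(i)
    a          : (N, N)    transition probabilities a_N(i, j)
    b          : (N, N, V) word output probabilities b_N(i, j, w)
    o_t        : index of the observed word O_t
    """
    n_states = a.shape[0]
    alpha_t = np.zeros(n_states)
    for j in range(n_states):
        for i in range(n_states):
            p = b[i, j, o_t]
            if p <= cutoff:
                # Technique 1: treat tiny output probabilities as 0, so this
                # arc contributes neither computation nor stored probability.
                continue
            alpha_t[j] += alpha_prev[i] * a[i, j] * p
    return alpha_t
\end{verbatim}

In an actual implementation, the entries of $b_N$ below $C$ would not be stored at all (for example, they could be kept in a sparse table keyed by $(i,j,w)$), which is where the memory saving comes from.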

Technique 2
Increase the number of states successively.

Calculating a large-state Ergodic HMM directly requires a large amount of memory, so we start with a 1-state Ergodic HMM and successively increase the number of states.

If the parameters of the $N$-state Ergodic HMM have already been estimated, the initial parameters of the $2N$-state Ergodic HMM are constructed as follows:

$ \pi_{2N} (i) = 0.5 \times \pi_N (i/2); \ \ i=1,..,2N $.
$ a_{2N} (i,j) = 0.5 \times a_N (i/2,j/2); \ \ i=1,..,2N; j=1,..,2N $.
The symbol $/$ denotes integer division with the result rounded up.

The output probability $b$ is treated slightly differently, since random weights are used:

$ b_{2N} (i,j,w) = b_N(i/2,j/2,w) \times random(i,j,w) $
$ i=1,..,2N; j=1,..,2N; w=1,..,V$.
$b_{2N}$ is then normalized so that $ \sum_w b_{2N}(i,j,w)=1.0 $.
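As a minimal sketch (assuming NumPy arrays for $\pi$, $a$ and $b$, 0-based indexing, and a uniform random-weight distribution; these details are not specified in the paper), the doubling step could be written as:

\begin{verbatim}
import numpy as np


def double_states(pi, a, b, rng=None):
    """Build initial 2N-state parameters from estimated N-state parameters.

    pi : (N,)       initial state probabilities pi_N
    a  : (N, N)     transition probabilities a_N
    b  : (N, N, V)  word output probabilities b_N
    """
    rng = rng or np.random.default_rng()
    n = pi.shape[0]
    # With 0-based indexing, i // 2 plays the role of the rounded-up i/2.
    idx = np.arange(2 * n) // 2

    pi2 = 0.5 * pi[idx]              # pi_2N(i)  = 0.5 * pi_N(i/2)
    a2 = 0.5 * a[np.ix_(idx, idx)]   # a_2N(i,j) = 0.5 * a_N(i/2, j/2)

    # b_2N(i,j,w) = b_N(i/2, j/2, w) * random(i,j,w); the choice of the
    # random distribution below is an assumption made for this sketch.
    weights = rng.uniform(0.5, 1.5, size=(2 * n, 2 * n, b.shape[2]))
    b2 = b[np.ix_(idx, idx)] * weights
    b2 /= b2.sum(axis=2, keepdims=True)  # renormalize: sum_w b_2N(i,j,w) = 1
    return pi2, a2, b2
\end{verbatim}

The factor 0.5 keeps $\pi_{2N}$ and each row of $a_{2N}$ normalized, since every $N$-state index is duplicated exactly twice.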

The training of this large-state Ergodic HMM is carried out as follows.

  1. First, a 1-state Ergodic HMM is constructed. In this model, the initial state probability and the state transition probability are set to 1.0, and the word output probabilities are set to random values.

  2. Second, Ergodic HMM parameters are re-estimated using the proposed Baum-Welch algorithm (Technique 1).

  3. Third, the number of states in the Ergodic HMM is increased using Technique 2.

  4. Fourth, the Ergodic HMM parameters are re-estimated using the proposed Baum-Welch algorithm (Technique 1).

Steps 3 and 4 are repeated until the desired number of states is reached; a schematic driver loop is sketched below.
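Combining the two techniques, the overall procedure can be sketched as follows. This is only a sketch: baum_welch_reestimate is a hypothetical placeholder for the full thresholded forward-backward re-estimation (not shown), and double_states refers to the sketch given for Technique 2.

\begin{verbatim}
import numpy as np


def baum_welch_reestimate(pi, a, b, corpus, cutoff=1e-300):
    """Hypothetical placeholder: a full Baum-Welch pass that skips and drops
    output probabilities b(i,j,w) <= cutoff (Technique 1)."""
    return pi, a, b  # a real implementation would return updated parameters


def train_ergodic_hmm(corpus, vocab_size, target_states, rng=None):
    rng = rng or np.random.default_rng(0)

    # Step 1: 1-state Ergodic HMM; pi and a are 1.0, word outputs are random.
    pi = np.ones(1)
    a = np.ones((1, 1))
    b = rng.random((1, 1, vocab_size))
    b /= b.sum(axis=2, keepdims=True)

    # Step 2: re-estimate with the thresholded Baum-Welch algorithm.
    pi, a, b = baum_welch_reestimate(pi, a, b, corpus)

    # Steps 3 and 4, repeated: double the states, then re-estimate again.
    while pi.shape[0] < target_states:
        pi, a, b = double_states(pi, a, b, rng)             # Technique 2
        pi, a, b = baum_welch_reestimate(pi, a, b, corpus)  # Technique 1
    return pi, a, b
\end{verbatim}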

A similar technique, in which small probabilities are set to 0.0, is described in [5]. However, that study focuses mainly on reducing the computational costs rather than the memory requirements. Moreover, as in earlier work, the number of states is kept small there, and no outstanding results are reported.


Jin'ichi Murakami, January 19, 2001