
Evaluation Method of Speaker Classification Rate


Forward decoding or Viterbi decoding yields an optimal state sequence, but it does not tell us which state corresponds to which speaker. We therefore calculated the classification rate with the following expression.


\begin{equation}
R = \frac{1}{T} \max_{\sigma} \sum_{t=1}^T
d (\tau(\mbox{\boldmath$x$}_t),\sigma(S_t))
\end{equation}

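Here, $\tau(\mbox{\boldmath$x$}_t)$ is the state assigned to the observation $\mbox{\boldmath$x$}_t$ in the optimal state sequence, $\sigma$ is an arbitrary permutation of $(1,2,\ldots, N)$, where $N$ is the number of speakers, and $S_t$ is the correct speaker of the utterance. $d$ is a function that takes the value 1 if its two arguments agree and 0 otherwise.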

In this study, $\sigma$ determines which class number is assigned to which speaker. The correct classification rate is calculated for each of the $N!$ permutations, and the maximum is defined as the classification rate. For example, in the case of 4 speakers, $24(=4!)$ assignments are examined.
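The search over permutations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function name `classification_rate`, its arguments, and the use of 0-based labels `0..n-1` (instead of $1,\ldots,N$) are assumptions made for the example.

```python
from itertools import permutations


def classification_rate(states, speakers, n):
    """Return (R, sigma) maximizing agreement over all n! permutations.

    states   -- decoded state label per step, i.e. tau(x_t), in 0..n-1 (assumed encoding)
    speakers -- correct speaker label per step, i.e. S_t, in 0..n-1 (assumed encoding)
    n        -- number of speakers N
    """
    if not states or len(states) != len(speakers):
        raise ValueError("states and speakers must be non-empty and of equal length")

    best_rate, best_sigma = -1.0, None
    for sigma in permutations(range(n)):
        # sigma[s] is the state label that this permutation assigns to speaker s.
        # d(.,.) is 1 when the decoded state equals sigma(S_t), and 0 otherwise.
        hits = sum(1 for q, s in zip(states, speakers) if q == sigma[s])
        rate = hits / len(states)
        if rate > best_rate:
            best_rate, best_sigma = rate, sigma
    return best_rate, best_sigma


if __name__ == "__main__":
    # Hypothetical toy data: 4 speakers, 8 steps.
    decoded = [2, 2, 0, 0, 1, 1, 3, 0]
    correct = [0, 0, 1, 1, 2, 2, 3, 3]
    print(classification_rate(decoded, correct, 4))
```

With the toy data, seven of the eight steps agree under the best assignment, so the rate is 7/8. Exhaustive enumeration is cheap for 4 speakers (24 permutations), but the number of permutations grows factorially with $N$.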




Jin'ichi Murakami, January 19, 2001