This paper described the problem of identifying the utterances of multiple speakers as an example of the unknown-multiple signal source clustering problem. However, many problems remain unsolved, as described below.
A method is needed for evaluating the speaker classification rate when the number of speakers is large. In this paper, every possible correspondence between categories and speakers was examined, and the highest resulting value was taken as the classification rate. However, the number of possible correspondences is the factorial of the number of speakers, so the evaluation must be made faster.
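The exhaustive evaluation described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used in the experiments: the function name `classification_rate`, the per-frame integer labels, and the toy data are all hypothetical.

```python
from itertools import permutations


def classification_rate(true_labels, cluster_labels, n_speakers):
    """Best-case rate over all category-to-speaker correspondences.

    Hypothetical helper: labels are integers in range(n_speakers),
    one per analysis frame.
    """
    # confusion[c][s] = number of frames placed in category c
    # that were actually spoken by speaker s
    confusion = [[0] * n_speakers for _ in range(n_speakers)]
    for s, c in zip(true_labels, cluster_labels):
        confusion[c][s] += 1

    best_correct = 0
    # n_speakers! candidate correspondences: this loop is the bottleneck
    for perm in permutations(range(n_speakers)):
        # perm[c] is the speaker that category c is assumed to represent
        correct = sum(confusion[c][perm[c]] for c in range(n_speakers))
        best_correct = max(best_correct, correct)

    return best_correct / len(true_labels)


# Toy data (invented for illustration): 8 frames, 4 speakers
true_labels = [0, 0, 1, 1, 2, 2, 3, 3]
cluster_labels = [2, 2, 0, 0, 1, 3, 3, 3]
rate = classification_rate(true_labels, cluster_labels, 4)
```

With 4 speakers the loop visits 24 correspondences, but with 10 speakers it would visit 3,628,800. Finding the best correspondence from the confusion counts is an instance of the assignment problem, for which polynomial-time methods such as the Hungarian algorithm exist. Whether such a method suits this evaluation was not examined in this paper.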
Because the LPC analysis is performed frame by frame, some frames contained a speaker transition. In such a frame, the speaker cannot be determined uniquely. In other words, the time resolution of the speaker classification depends on the length of the LPC frame window. This problem must be studied and resolved.
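The dependence on the window length can be illustrated with a small sketch. The function name `frames_spanning`, the frame length, the hop size, and the transition position are all assumed values chosen for illustration. They are not the analysis conditions of the experiment.

```python
def frames_spanning(transition_sample, frame_len, hop):
    """Indices of analysis frames whose window contains the transition.

    Frame i is assumed to cover samples [i * hop, i * hop + frame_len).
    """
    # Earliest frame whose window still reaches the transition sample
    first = max(0, (transition_sample - frame_len) // hop + 1)
    # Latest frame that starts at or before the transition sample
    last = transition_sample // hop
    return list(range(first, last + 1))


# Assumed values: 256-sample window, 128-sample hop, transition at sample 1000
ambiguous = frames_spanning(1000, frame_len=256, hop=128)
```

Under these assumed values, frames 6 and 7 both contain samples from two speakers. A longer window enlarges the set of such frames, and a shorter one reduces it.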
In this experiment, the number of speakers (the number of categories) was set to 4. This means that the number of speakers was assumed to be known a priori. A technique that can estimate the number of speakers is needed.