| Speaker | Mixed HMM (fewer than 10) | Mixed HMM (fewer than 20) | Mixed HMM (fewer than 30) |
|---|---|---|---|
| mau | 90.69% (2376/2620) | 91.45% (2396/2620) | 91.45% (2396/2620) |
| mmy | 89.54% (2346/2620) | 90.95% (2383/2620) | 90.95% (2383/2620) |
| mnm | 88.97% (2331/2620) | 89.20% (2337/2620) | 89.20% (2337/2620) |
| faf | 91.30% (2392/2620) | 91.95% (2409/2620) | 91.95% (2409/2620) |
| fms | 89.73% (2351/2620) | 90.61% (2374/2620) | 90.61% (2374/2620) |
| ftk | 89.85% (2354/2620) | 92.79% (2431/2620) | 92.79% (2431/2620) |
| Average | 90.01% (14150/15720) | 91.16% (14330/15720) | 91.16% (14330/15720) |

| Speaker | Mixed HMM (fewer than 10) | Mixed HMM (fewer than 20) | Mixed HMM (fewer than 30) |
|---|---|---|---|
| mau | 85.04% (2228/2620) | 88.74% (2325/2620) | 89.39% (2342/2620) |
| mmy | 86.41% (2264/2620) | 87.37% (2289/2620) | 88.36% (2315/2620) |
| mnm | 84.69% (2219/2620) | 85.73% (2246/2620) | 88.13% (2309/2620) |
| faf | 87.86% (2302/2620) | 89.05% (2333/2620) | 89.16% (2336/2620) |
| fms | 85.53% (2241/2620) | 87.14% (2283/2620) | 89.24% (2338/2620) |
| ftk | 85.69% (2245/2620) | 90.46% (2370/2620) | 91.26% (2391/2620) |
| Average | 85.87% (13499/15720) | 88.08% (13846/15720) | 89.26% (14031/15720) |

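
Each percentage in the table directly above is the number of correctly recognized words divided by the number of test words, and the bottom row pools the counts over all six speakers. The following is a minimal sketch of that arithmetic, using only the counts printed in the table; the names `correct`, `accuracy`, and `pooled_counts` are illustrative and do not come from the original experiments.

```python
# Correct-word counts per speaker from the table directly above,
# one entry per threshold column (fewer than 10, 20, 30).
TOTAL_PER_SPEAKER = 2620

correct = {
    "mau": [2228, 2325, 2342],
    "mmy": [2264, 2289, 2315],
    "mnm": [2219, 2246, 2309],
    "faf": [2302, 2333, 2336],
    "fms": [2241, 2283, 2338],
    "ftk": [2245, 2370, 2391],
}


def accuracy(n_correct, n_total):
    """Recognition accuracy in percent."""
    return 100.0 * n_correct / n_total


def pooled_counts(table):
    """Sum the correct-word counts over all speakers, column by column."""
    return [sum(column) for column in zip(*table.values())]


pooled = pooled_counts(correct)
pooled_total = TOTAL_PER_SPEAKER * len(correct)
pooled_accuracy = [round(accuracy(c, pooled_total), 2) for c in pooled]

print(pooled, pooled_total, pooled_accuracy)
```

Because every speaker has the same number of test words, the pooled accuracy equals the mean of the per-speaker accuracies up to rounding.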
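The average error rates over the six speakers for word speech recognition with the speaker-independent HMM, the speaker-adapted HMM, and the mixed HMM are shown in Figure 8 for the 164-word training data and in Figure 9 for the 82-word training data.
The experiments yielded the following results.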
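Recognition accuracy with speaker adaptation drops sharply for the 82-word training data. In addition, the fewer-than-30 mixed HMM gives the highest recognition accuracy, and accuracy falls as the threshold is lowered to fewer than 20 and then to fewer than 10. From these two observations we conclude that, in speaker adaptation, the more phonemes with few occurrences the training data contains, the lower the recognition accuracy.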
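
The trend across thresholds can be checked by converting the pooled counts in the "Average" rows of the two tables into error rates. This is only a sketch under stated assumptions: the labels "first table" and "second table" refer to the order in which the tables appear in this section, and the helper names `error_rate` and `is_non_increasing` are hypothetical.

```python
TOTAL = 15720  # 6 speakers x 2620 test words each

# Pooled correct-word counts, keyed by mixed-HMM threshold (10, 20, 30).
pooled_correct = {
    "first table": {10: 14150, 20: 14330, 30: 14330},
    "second table": {10: 13499, 20: 13846, 30: 14031},
}


def error_rate(n_correct, n_total=TOTAL):
    """Word error rate in percent: misrecognized words over all test words."""
    return 100.0 * (n_total - n_correct) / n_total


def is_non_increasing(values):
    """True if each value is no larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


for name, counts in pooled_correct.items():
    rates = [error_rate(counts[t]) for t in sorted(counts)]
    print(name, [f"{r:.2f}%" for r in rates], is_non_increasing(rates))
```

In both tables the error rate never increases as the threshold rises from 10 to 30, which is consistent with the conclusion stated above.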