
Automatic Evaluations

We used 10,000 test sentences in this experiment. Among these 10,000 sentences, 1,143 matched the Japanese-English patterns. For 725 of these 1,143 sentences, the ``English''-English translation produced output that differed from that of the standard SMT system (Moses). The remaining 8,857 sentences (10,000 - 1,143) did not match any Japanese-English pattern.
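The differing-output count above can be obtained by a direct line-by-line comparison of the two systems' translations for the matched subset. The following is a minimal sketch of that check; the file names are placeholders, not from the paper.

  # Minimal sketch: count how many pattern-matched sentences receive a
  # translation different from the Moses baseline.
  # File names are hypothetical placeholders.

  def read_lines(path):
      with open(path, encoding="utf-8") as f:
          return [line.strip() for line in f]

  proposed = read_lines("proposed_matched.en")  # outputs for the 1,143 matched sentences
  moses = read_lines("moses_matched.en")        # Moses outputs for the same sentences

  differing = sum(1 for p, m in zip(proposed, moses) if p != m)
  print(f"{differing} of {len(proposed)} matched sentences differ from Moses")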

We used BLEU [Papineni et al. 2002], NIST [NIST 2003], and METEOR [Banerjee and Lavie 2005] as evaluation tools. Table 11 summarizes the automatic evaluation results for the Japanese-English simple sentences. The table shows the results for the 1,143 sentences that matched the Japanese-English patterns. ``Proposed'' indicates our proposed system (PBMT+SMT), and ``Moses'' indicates the standard SMT system.
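As an illustration of the scoring step, the sketch below computes corpus-level BLEU with the sacrebleu package; NIST and METEOR are computed with their own official tools and are not shown here. The file names are placeholders, not from the paper.

  # Minimal sketch of corpus-level BLEU scoring with sacrebleu.
  # "system_output.en" and "reference.en" are hypothetical file names.
  import sacrebleu

  hypotheses = [line.strip() for line in open("system_output.en", encoding="utf-8")]
  references = [line.strip() for line in open("reference.en", encoding="utf-8")]

  bleu = sacrebleu.corpus_bleu(hypotheses, [references])
  print(f"BLEU = {bleu.score:.2f}")  # sacrebleu reports BLEU on a 0-100 scale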

Our proposed system obtained a BLEU score of 0.1821 on the Japanese-English simple sentences, whereas the standard SMT system (Moses) obtained 0.2218. This means that our proposed system was not effective in terms of automatic evaluation on the Japanese-English simple sentences.


Table 11: Experimental Results (1,143 sentences)
           BLEU     NIST    METEOR
Proposed   0.1821   4.817   0.4426
Moses      0.2218   5.239   0.4363

Table 12 shows the results for all 10,000 test sentences. The 1,143 matched sentences were translated with the proposed method, and the remaining 8,857 sentences were translated with the standard SMT system (Moses).


Table 12: Experimental Results (10,000 sentences)
           BLEU     NIST     METEOR
Proposed   0.1101   4.4511   0.3175
Moses      0.1130   4.5131   0.3160
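The combined output scored in Table 12 can be assembled by routing each test sentence to one of the two systems, as described above. The following is a minimal sketch under an assumed data layout (sentence indices mapped to translations); the function and variable names are illustrative, not from the paper.

  # Minimal sketch: build the combined 10,000-sentence output of Table 12.
  # Pattern-matched sentences take the proposed (PBMT+SMT) translation;
  # all others fall back to Moses. Data layout is an assumption.

  def combine_outputs(matched_ids, proposed_out, moses_out):
      """matched_ids: set of sentence indices that matched a Japanese-English pattern.
      proposed_out / moses_out: dicts mapping sentence index -> translation string."""
      combined = []
      for i in sorted(moses_out):               # Moses covers all 10,000 sentences
          if i in matched_ids:
              combined.append(proposed_out[i])  # 1,143 matched sentences
          else:
              combined.append(moses_out[i])     # remaining 8,857 sentences
      return combined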

