We found a number of errors and mistakes in these experiments, so we repeated them under the same conditions. In addition, we used reordering models and optimized their parameters using MERT. Table 5 shows the results of these experiments. As the table shows, the proposed method was highly effective, and its automatic evaluation scores were very high. For example, the BLEU score of the proposed method was 0.3598 for the JE task and 0.3911 for the EJ task; these were the best scores in NTCIR-9.
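Since the headline results are reported in BLEU [8], which is also the criterion MERT typically tunes toward, the following is a minimal corpus-level BLEU sketch (uniform 4-gram weights, one reference per segment, whitespace tokenization, no smoothing). It is only an illustration under those assumptions; the official NTCIR-9 evaluation tools may differ in tokenization and smoothing.

```python
# Minimal corpus-level BLEU sketch following Papineni et al. [8]:
# clipped modified n-gram precisions (n = 1..4) combined with a brevity penalty.
# Assumes one reference per segment and pre-tokenized input; illustrative only.
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """hypotheses, references: parallel lists of token lists."""
    clipped = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            clipped[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if min(clipped) == 0 or min(totals) == 0:
        return 0.0  # no smoothing in this sketch
    precision = sum(log(c / t) for c, t in zip(clipped, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else exp(1.0 - ref_len / hyp_len)
    return bp * exp(precision)


# Toy usage: identical hypothesis and reference give BLEU = 1.0.
hyp = [["the", "patent", "claims", "a", "novel", "device"]]
print(corpus_bleu(hyp, hyp))  # 1.0
```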
Table 5. Results of the automatic evaluation.

| System | Task | Parameter Tuning | BLEU [8] | NIST [8] | METEOR [9] | TER [10] | WER [10] | RIBES [12] | IMPACT [13] |
|---|---|---|---|---|---|---|---|---|---|
| Proposed (RBMT+SMT) | JE | ○ | 0.3598 | 8.1769 | 0.6676 | 0.5387 | 0.6436 | 0.7412 | 0.5654 |
| Proposed (RBMT+SMT) | JE | × | 0.2697 | 7.1982 | 0.6049 | 0.5666 | 0.6566 | 0.7240 | 0.5197 |
| Rule-based MT (a state-of-the-art system) | JE | × | 0.2761 | 6.8759 | 0.6099 | 0.6172 | 0.7048 | 0.7114 | 0.5064 |
| Baseline (SMT: Moses) | JE | ○ | 0.2886 | 7.1503 | 0.6567 | 0.6684 | 0.8307 | 0.6334 | 0.4527 |
| Baseline (SMT: Moses) | JE | × | 0.2120 | 6.9635 | 0.5741 | 0.6431 | 0.7852 | 0.6727 | 0.4078 |
| Proposed (RBMT+SMT) | EJ | ○ | 0.3911 | 8.3941 | – | 0.4991 | 0.6184 | 0.6709 | 0.5753 |
| Proposed (RBMT+SMT) | EJ | × | 0.3076 | 7.6219 | – | 0.5441 | 0.6492 | 0.6562 | 0.5326 |
| Rule-based MT (a state-of-the-art system) | EJ | × | 0.1998 | 5.4690 | – | 0.7274 | 0.8075 | 0.5632 | 0.4393 |
| Baseline (SMT: Moses) | EJ | ○ | 0.2408 | 6.4319 | – | 0.5441 | 0.6492 | 0.6563 | 0.4743 |
| Baseline (SMT: Moses) | EJ | × | 0.2531 | 7.1181 | – | 0.5968 | 0.7377 | 0.5532 | 0.4394 |

(○/× indicates whether parameter tuning with MERT was applied; METEOR scores are reported only for the JE task.)
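For reference, WER in Table 5 is the word error rate: the Levenshtein (edit) distance between the hypothesis and reference word sequences divided by the reference length, so lower is better (TER is likewise an error rate). A minimal sketch, assuming whitespace-tokenized input and a single reference:

```python
# Minimal WER sketch: word-level edit distance normalized by reference length.
# Assumes pre-tokenized input and one reference; illustrative only.
def wer(hyp_tokens, ref_tokens):
    m, n = len(ref_tokens), len(hyp_tokens)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_tokens[i - 1] == hyp_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n] / m if m else 0.0


print(wer("a b c d".split(), "a b c d".split()))  # 0.0
print(wer("a x c".split(), "a b c d".split()))    # 0.5 (1 substitution + 1 deletion over 4 reference words)
```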