
Statistical Machine Translation with Long Phrase Table and without Long Parallel Sentences

Jin'ichi Murakami, Masato Tokuhisa, Satoru Ikehara
Department of Information and Knowledge Engineering
Faculty of Engineering
Tottori University, Japan
4-101 koyamachou Tottori City Tottori 680-8552, Japan
murakami@ike.tottori-u.ac.jp

Abstract:

In this study, we focused on the reliability of the phrase table. To build the phrase table, we used Och's method[3]. This method sometimes generates a completely wrong phrase table. We found that such phrase tables are caused by long parallel sentences. Therefore, we removed these long parallel sentences from the training data. We also used general tools for statistical machine translation, such as "Giza++"[4], "moses"[5], and "training-phrase-model.perl"[6].

With our proposed method, we obtained a BLEU score of 0.2229 on the Intrinsic-JE task and 0.2393 on the Intrinsic-EJ task. With the standard method, we obtained a BLEU score of 0.2162 on the Intrinsic-JE task and 0.2533 on the Intrinsic-EJ task.

This means that our proposed method was effective for the Intrinsic-JE task but not for the Intrinsic-EJ task. Overall, our system's performance was about average among all systems: it ranked 20th out of 34 systems on the Intrinsic-JE task and 12th out of 20 systems on the Intrinsic-EJ task.
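
The filtering step described above, removing long parallel sentences from the training data before building the phrase table, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name remove_long_pairs, the whitespace tokenization, and the threshold max_tokens=40 are all assumptions made for illustration; the abstract does not state the length limit that was actually used.

```python
def remove_long_pairs(pairs, max_tokens=40):
    """Drop sentence pairs in which either side is longer than max_tokens.

    pairs      : iterable of (source, target) strings, assumed to be
                 already tokenized and whitespace-separated
    max_tokens : hypothetical length limit (not taken from the paper)
    """
    kept = []
    for src, tgt in pairs:
        # Count tokens on each side by splitting on whitespace.
        if len(src.split()) <= max_tokens and len(tgt.split()) <= max_tokens:
            kept.append((src, tgt))
    return kept


# Toy corpus: the second pair has a 50-token source side.
corpus = [
    ("kore wa pen desu", "this is a pen"),
    (" ".join(["w"] * 50), "a very short target"),
]
filtered = remove_long_pairs(corpus, max_tokens=40)
```

The filtered pairs would then be passed to the usual training pipeline in place of the full corpus. Filtering on both sides is a design choice of this sketch; a pair is dropped if either the Japanese or the English sentence exceeds the limit.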

Keywords: "SMT" "Long Phrase Table" "Remove Long Parallel Sentences"




Jin'ichi Murakami 2008-12-22