Concept of Two-Stage Machine Translation

One problem with phrase-based statistical machine translation is the language model. Generally, an $n$-gram model is used as the language model. However, this model includes only local language information and does not include grammatical information. We studied hierarchical phrase-based statistical machine translation (HSMT) [Li et al. 2009] as a way to include grammatical information. However, HSMT analysis is similar to that of context-free grammars (CFG). We believe that such analysis complicates statistical machine translation by adding too many parameters. As a result, it is unreliable and does not perform well, especially when only a small amount of training data is available. In contrast, PBMT is well known and has been extensively studied. PBMT is generally simple and has few parameters compared with CFG-based MT, and its output contains grammatical information. However, PBMT involves a trade-off between the coverage of input sentences and the translation quality. If the translation quality is high, the coverage of input sentences is low; if the coverage is high, the translation quality is low.
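The locality of the $n$-gram language model mentioned above can be made explicit with the standard $n$-gram approximation. The notation here ($w_i$ for the $i$-th word, $m$ for the sentence length) is introduced only for illustration and is not taken from the paper:

$$P(w_1, \ldots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$

Each word is conditioned only on the preceding $n-1$ words, so dependencies that span a longer distance, such as agreement between a subject and a distant verb, are not modeled.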

We propose a two-stage MT system to overcome these problems. We developed a PBMT system for the first stage. This PBMT system had low coverage and high quality. When Japanese sentences were translated using this system, the quality of the output was good, and the output contained grammatical information. When a Japanese sentence could not be translated by the PBMT system, we used a standard SMT system instead. Therefore, we can obtain good quality from the entire system. In addition, PBMT systems are usually created manually, which incurs a huge labor cost. We therefore developed a PBMT system that is created automatically. However, the output of this automatic PBMT system was sometimes less fluent, so we added SMT after PBMT to improve the fluency. In this system, PBMT thus serves as a pre-processing stage for SMT.
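The control flow described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `two_stage_translate`, the three translator arguments, and the toy stand-ins are all hypothetical, and a PBMT system that cannot cover a sentence is assumed to return `None`.

```python
from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not from the paper):
# a PBMT system returns None when it does not cover the input sentence.
PatternTranslator = Callable[[str], Optional[str]]
Translator = Callable[[str], str]


def two_stage_translate(
    sentence: str,
    pbmt: PatternTranslator,
    smt_after_pbmt: Translator,
    smt_standard: Translator,
) -> str:
    """Translate one sentence with PBMT first and SMT second."""
    draft = pbmt(sentence)
    if draft is None:
        # Sentence not covered by PBMT: fall back to standard SMT.
        return smt_standard(sentence)
    # Sentence covered by PBMT: SMT post-processes the draft for fluency.
    return smt_after_pbmt(draft)


# Toy stand-ins so the sketch runs; real systems would replace these.
_TOY_PATTERNS = {"kore wa pen desu": "this be pen"}


def toy_pbmt(sentence: str) -> Optional[str]:
    return _TOY_PATTERNS.get(sentence)


def toy_smt_after_pbmt(draft: str) -> str:
    return draft.replace("this be pen", "this is a pen")


def toy_smt_standard(sentence: str) -> str:
    return "<smt> " + sentence
```

Passing the three translators as arguments keeps the routing logic independent of any particular PBMT or SMT toolkit, which are not specified in this section.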

Jin'ichi Murakami 2012-11-06