Concept of Two-Stage Machine Translation

One problem with phrase-based statistical machine translation lies in the language model. Generally, an n-gram model is used as the language model. However, this model captures only local context and has no grammatical information. To include grammatical information, we studied hierarchical phrase-based machine translation (HPMT) [13]. However, HPMT analysis is similar to parsing with a context-free grammar (CFG). We believe that such analysis complicates statistical machine translation by introducing too many parameters. Therefore, HPMT is unreliable and does not perform well, especially when the amount of training data is small. On the other hand, PBMT is well known and has been extensively studied. PBMT is normally simple and has few parameters compared with CFG-based MT, and its output contains grammatical information. However, PBMT involves a trade-off between coverage of the input sentences and translation quality. If we obtain good translation quality, the coverage of PBMT for input sentences is low; if we obtain high coverage for input sentences, the translation quality is low.

To overcome these problems, we propose a two-stage MT system. We developed a PBMT system for the first stage. This PBMT system has low coverage but high quality: if a French sentence can be translated by this system, the output is of good quality and contains grammatical information. If a French sentence cannot be translated by PBMT, we use standard SMT instead. Therefore, we obtain good quality from the system as a whole. However, PBMT systems are normally created manually, which incurs high labor costs, so we developed a PBMT system that is created automatically. The output of this automatic PBMT is sometimes less natural, so we added SMT after PBMT to improve naturalness. In this system, we use RBMT as the pre-processing stage for SMT.
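The routing logic of the two-stage system can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function `two_stage_translate`, the toy pattern table `PATTERNS`, and the stand-in translators `toy_pbmt`, `toy_smt`, and `toy_post_edit` are all hypothetical. We assume that the PBMT stage signals failure by returning `None`, and that the SMT applied after PBMT maps its rough English output to more natural English.

```python
import re
from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not the paper's API):
# the first stage returns None when no pattern covers the sentence.
FirstStage = Callable[[str], Optional[str]]
Translator = Callable[[str], str]


def two_stage_translate(french: str,
                        pbmt: FirstStage,
                        smt_fallback: Translator,
                        smt_post_edit: Translator) -> str:
    """Route one French sentence through the two-stage scheme (sketch)."""
    draft = pbmt(french)
    if draft is None:
        # PBMT has low coverage: sentences it cannot translate go to standard SMT.
        return smt_fallback(french)
    # PBMT succeeded: SMT is applied to its output to improve naturalness.
    return smt_post_edit(draft)


# Toy stand-ins for illustration only (hypothetical).
PATTERNS = [
    # French pattern -> English template with a slot for the matched word.
    (re.compile(r"^(\w+) est grand$"), r"\1 is big"),
]


def toy_pbmt(sentence: str) -> Optional[str]:
    for pattern, template in PATTERNS:
        match = pattern.match(sentence)
        if match:
            return match.expand(template)
    return None  # no pattern matched: outside PBMT coverage


def toy_smt(sentence: str) -> str:
    return "[SMT] " + sentence  # placeholder for a real SMT decoder


def toy_post_edit(english: str) -> str:
    return english + "."  # placeholder for SMT-based smoothing of PBMT output


print(two_stage_translate("Pierre est grand", toy_pbmt, toy_smt, toy_post_edit))
# → Pierre is big.
print(two_stage_translate("Il pleut", toy_pbmt, toy_smt, toy_post_edit))
# → [SMT] Il pleut
```

Passing the translators in as functions keeps the routing separate from the models, so a real pattern-based system and real SMT decoders could be substituted for the toy ones without changing `two_stage_translate`.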

Jin'ichi Murakami, December 20, 2010