Next: Concept of Two-Stage Machine Up: Statistical Pattern-Based Machine Translation Previous: Statistical Pattern-Based Machine Translation

Introduction

Machine translation (MT) has been studied extensively, and the technology is commonly divided into three generations. The first generation is rule-based MT (RBMT); pattern-based MT (PBMT) is a kind of RBMT. The second generation is example-based MT, and the third generation is statistical MT (SMT), which has become very popular. Many SMT systems are available. Early SMT systems were based on word-based models (IBM Models 1 through 5 [1]). Recent SMT systems usually use phrase-based models.

However, phrase-based SMT has some problems. One of them lies in the language model. Generally, an $N$-gram model is used as the language model. This model captures only local word-sequence information and carries no grammatical information. To address this problem, we developed a two-stage MT system. The first stage is an automatically created PBMT system. The second stage is a standard SMT system.
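The locality of an $N$-gram model can be illustrated with a minimal sketch. The two-sentence corpus and the function names below are hypothetical and chosen only for illustration; they are not taken from our system. A bigram model estimated by relative frequency assigns the same probability to a sentence with correct subject-verb agreement and to one without it, because the subject and the verb never fall within the same bigram.

```python
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left contexts over a list of sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def sentence_probability(sentence, context_counts, bigram_counts):
    """Unsmoothed maximum-likelihood bigram probability of a sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob

# Hypothetical toy corpus (assumption, for illustration only).
corpus = [
    "the dogs in the park bark",
    "the dog in the park barks",
]
contexts, bigrams = train_bigram(corpus)

grammatical = sentence_probability("the dog in the park barks", contexts, bigrams)
ungrammatical = sentence_probability("the dog in the park bark", contexts, bigrams)
print(grammatical, ungrammatical)
```

Both sentences receive the same score, since every bigram in the ungrammatical sentence also occurs in the training data. A real language model would use a higher order and smoothing, but the window remains finite.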

In French-English translation, the first stage is a French-English PBMT system. In this stage, we obtain "English" sentences from French sentences. Our aim is to produce grammatically correct "English" sentences. However, these "English" sentences are sometimes unnatural, because they are produced by an automatically created PBMT system. In the second stage, we use a standard SMT system, which translates from "English" to English. The aim of this stage is to revise the outputs of the first stage and improve their naturalness.
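The data flow of the two stages can be sketched as a simple function composition. This is only an illustration under stated assumptions: the names `make_two_stage`, `toy_pattern_stage`, and `toy_revision_stage` are hypothetical, and the lookup tables stand in for the automatically created PBMT system and the standard SMT system, neither of which is reproduced here.

```python
from typing import Callable

Translator = Callable[[str], str]

def make_two_stage(first_stage: Translator, second_stage: Translator) -> Translator:
    """Chain two translators: source -> intermediate "English" -> English."""
    def translate(source: str) -> str:
        intermediate = first_stage(source)   # French -> "English"
        return second_stage(intermediate)    # "English" -> English
    return translate

def toy_pattern_stage(french: str) -> str:
    """Hypothetical stand-in for the first stage (pattern lookup)."""
    patterns = {"je voudrais un café": "i want a coffee"}
    return patterns.get(french.lower().strip(), french)

def toy_revision_stage(rough_english: str) -> str:
    """Hypothetical stand-in for the second stage (naturalness revision)."""
    revisions = {"i want a coffee": "i would like a coffee"}
    return revisions.get(rough_english, rough_english)

translate = make_two_stage(toy_pattern_stage, toy_revision_stage)
print(translate("Je voudrais un café"))
```

Keeping the stages as independent functions reflects the design described above: the second stage sees only the output of the first, so either stage could be replaced without changing the other.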

We developed the PBMT system for the first stage using "training-model.perl" [4]. We developed the standard SMT system for the second stage using general SMT tools, such as "Moses" [4]. With these data and tools, we participated in the Basic Travel Expression Corpus French-English (BTEC-FE) task at the International Workshop on Spoken Language Translation 2010 (IWSLT2010).

The proposed system was effective in the BTEC-FE task. We obtained a Bilingual Evaluation Understudy (BLEU) score of 0.5201 with our proposed system. In contrast, we obtained a BLEU score of 0.5077 on the same task using a standard SMT system (Moses) alone. This indicates that the proposed system improves on the standard SMT baseline for the BTEC-FE task. However, our system placed 7th out of 9 systems.


Jin'ichi Murakami, December 20, 2010