One problem with phrase-based statistical machine translation (SMT)
lies in the language model. Generally, an N-gram model is used as the
language model. However, this model captures only local language
information and does not
have grammatical information. To include grammatical information, we
studied hierarchical phrase-based machine translation (HPMT)
[13]. However, HPMT analysis is similar to that of a context-free
grammar (CFG). We believe that such analysis complicates statistical
machine translation with too many parameters. Therefore, it is
unreliable and does not perform well, especially when the amount of
training data is small. On the other hand, PBMT is well known and has been
extensively studied. Normally, PBMT is simple and has few parameters
compared to CFG-based MT, and the output of PBMT has grammatical
information. However, there is a trade-off between the coverage of input
sentences and the translation quality of PBMT output. If we
obtain good translation quality, the coverage of PBMT for input
sentences is low. If we obtain high coverage for input sentences, the
translation quality is low.
To overcome these problems, we propose a two-stage MT system. The first stage is a PBMT system with low coverage and high quality. If a French sentence can be translated by this system, the output is of good quality and carries grammatical information. If a French sentence cannot be translated by PBMT, we use a standard SMT system instead. We therefore obtain good quality from the system as a whole. In addition, PBMT systems are normally created manually, which incurs high labor costs, so we developed an automatically created PBMT system. The output of this automatic PBMT was sometimes less natural, so we added SMT after PBMT to improve naturalness. In this system, PBMT thus serves as the pre-processing stage for SMT.
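
The control flow of the two-stage system described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the actual implementation: the function name `two_stage_translate`, the three component callables, and the toy stand-ins are all hypothetical, and we assume here that the first stage signals an uncovered sentence by returning `None` and that the SMT step applied after PBMT is a separate component from the fallback SMT.

```python
from typing import Callable, Optional

# Hypothetical component types; none of these names come from the paper.
# The first stage returns None when it cannot translate the input.
FirstStage = Callable[[str], Optional[str]]
Translator = Callable[[str], str]


def two_stage_translate(
    source: str,
    pbmt: FirstStage,
    smt_after_pbmt: Translator,
    smt_fallback: Translator,
) -> str:
    """Route one source sentence through the two-stage system."""
    draft = pbmt(source)
    if draft is None:
        # Not covered by the first stage: use a standard SMT system.
        return smt_fallback(source)
    # Covered: the PBMT output is passed to SMT to improve naturalness.
    return smt_after_pbmt(draft)


# Toy stand-ins (assumptions for illustration only) that make the routing visible.
def toy_pbmt(sentence: str) -> Optional[str]:
    table = {"bonjour": "hello"}
    return table.get(sentence)


def toy_smt_after_pbmt(draft: str) -> str:
    return f"smt-smoothed({draft})"


def toy_smt_fallback(sentence: str) -> str:
    return f"smt({sentence})"


if __name__ == "__main__":
    for s in ["bonjour", "au revoir"]:
        print(two_stage_translate(s, toy_pbmt, toy_smt_after_pbmt, toy_smt_fallback))
```

With the toy components, a sentence covered by the first stage is routed through both stages, whereas an uncovered sentence goes directly to the fallback SMT. Passing the components as callables keeps the routing logic independent of how each stage is actually built.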