Machine translation systems have been studied for a long time, and there have been three generations of this technology.
The first generation was the rule-based translation method, which was developed over the course of many years. This method used translation rules written by hand. Thus, if an input sentence completely matched a rule, the output sentence had the best quality. However, because natural language contains a huge variety of expressions, this technology had very limited coverage. In addition, the main problems were that the cost of writing rules was too high and that maintaining the rules was difficult.
The second generation was the example-based machine translation method. This method finds a similar sentence in a corpus and generates a correspondingly similar output sentence. The problem with this method lies in calculating the similarity. Many methods, such as dynamic programming (DP), are available. However, they are heuristic and intuitive rather than mathematically grounded.
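As an illustration (not taken from any specific example-based system), one common DP-based similarity is the word-level edit distance. The following minimal Python sketch retrieves the closest example from a parallel corpus; the function names and the corpus format are our own assumptions:

\begin{verbatim}
def edit_distance(a, b):
    # Classic DP: d[i][j] is the edit distance between a[:i] and b[:j].
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def most_similar_example(query_tokens, corpus):
    # corpus: list of (source_tokens, target_tokens) pairs (assumed format).
    return min(corpus, key=lambda pair: edit_distance(query_tokens, pair[0]))
\end{verbatim}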
The third generation is the statistical machine translation method, which is very popular now. This method is based on statistics, which makes it theoretically well grounded. Many versions of statistical machine translation models are available. An early approach was based on the IBM Models 1--5 [1]. These models operate on individual words, and thus a ``null word'' model is needed. However, this ``null word'' model sometimes causes serious problems, especially in decoding. Thus, recent statistical machine translation systems usually use phrase-based models. A phrase-based statistical machine translation model consists of a translation model and a language model. The phrase table serves as the translation model for phrase-based SMT; it consists of Japanese phrases, the corresponding English phrases, and their translation probabilities. A word $n$-gram model is used as the language model. There are two criteria for evaluating English sentences in Japanese-to-English machine translation: one is adequacy, and the other is fluency. We believe adequacy is related to the translation model and fluency is related to the language model.
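This intuition can be stated through the standard noisy-channel formulation underlying phrase-based SMT (given here for reference; it is not specific to our system). For a Japanese input sentence $j$, the decoder selects
\[
\hat{e} = \arg\max_{e} P(e \mid j) = \arg\max_{e} P(j \mid e)\,P(e),
\]
where $P(j \mid e)$ is the translation model (here, the phrase table) and $P(e)$ is the language model; adequacy is governed chiefly by the former and fluency by the latter.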
However, some problems arise with phrase-based statistical machine translation. One problem is as follows: normally, a translation model requires a large parallel corpus, and if we use a smaller parallel corpus, many unknown words appear in the output translation. The second problem is that an $n$-gram model is normally used as the language model; this model captures only local language information and contains no grammatical information.
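For reference, a word $n$-gram model approximates the probability of an English sentence $e = w_1 w_2 \cdots w_I$ using only a fixed local window of preceding words:
\[
P(e) = \prod_{i=1}^{I} P(w_i \mid w_1, \ldots, w_{i-1})
\approx \prod_{i=1}^{I} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}).
\]
Each factor sees at most $n-1$ preceding words, so dependencies longer than this window, such as long-distance agreement, are invisible to the model.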
Our proposed method consists of two stages. In Japanese-to-English translation, the first stage is Japanese-to-``ENGLISH'' rule-based machine translation. In this stage, we obtain ``ENGLISH'' sentences from Japanese sentences. We aim to produce ``ENGLISH'' sentences that contain few unknown words and that are generally grammatically correct. However, these ``ENGLISH'' sentences have low fluency and naturalness because they are produced by rule-based machine translation. In the second stage, we use a normal statistical machine translation system; this stage performs ``ENGLISH''-to-English machine translation. With this stage, we aim to revise the outputs of the first stage to improve their naturalness and fluency.
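Conceptually, the pipeline can be sketched as follows; rbmt_translate and smt_translate are hypothetical stand-ins for the actual rule-based and statistical systems described below:

\begin{verbatim}
def two_stage_translate(japanese_sentence, rbmt_translate, smt_translate):
    # Stage 1: rule-based Japanese -> "ENGLISH"
    # (few unknown words, mostly grammatical, but low fluency).
    intermediate = rbmt_translate(japanese_sentence)
    # Stage 2: statistical "ENGLISH" -> English
    # (revises the stage-1 output to improve naturalness and fluency).
    return smt_translate(intermediate)
\end{verbatim}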
We used IBM King504 (翻訳の王様, ``The King of Translation,'' in Japanese) for the first stage. For the second stage, we used standard statistical machine translation tools, namely ``Giza++'' [5], ``moses'' [7], and ``training-phrase-model.perl'' [10]. We could not use all of the data because of computer memory restrictions and computational costs; we used only the NTCIR-7 data, that is, 1,798,571 sentences. Also, the parameters were not optimized, and yet our method still proved very promising. Using these data and tools, we participated in the Intrinsic-JE, Intrinsic-EJ, and Extrinsic-JE tasks at NTCIR-8.
In the experiments, we obtained a BLEU score of 0.2565 in the Intrinsic-JE task using our proposed method, whereas the standard method (moses) obtained a BLEU score of 0.2165 in the same task. In the Intrinsic-EJ task, we obtained a BLEU score of 0.2602 using our proposed method, whereas the standard method (moses) obtained 0.2501. These results mean that our proposed method was effective for both tasks.
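For reference, these BLEU scores follow the standard definition: with modified $n$-gram precisions $p_n$, uniform weights $w_n = 1/N$ (typically $N = 4$), candidate length $c$, and reference length $r$,
\[
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1, & c > r, \\
e^{1 - r/c}, & c \le r.
\end{cases}
\]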
In summary, the proposed method was effective for all tasks. Even though we used only the NTCIR-7 database (1,798,571 sentences), our system achieved average performance relative to the NTCIR-7 Patent Translation Task [14]. For example, our system placed 11th among 20 systems in the Intrinsic-JE task and 19th among 22 systems in the Intrinsic-EJ task.
In future work, we will try to improve performance by using all of the NTCIR-8 and NTCIR-7 databases and by optimizing the parameters. We will continue to develop this method and participate again in future evaluations.