Next: Experimental Results Up: Pattern-Based Statistical Machine Translation Previous: Notes

Experimental conditions

Database
We used the NTCIR-10 Japanese-English patent sentence pairs, of which 3,000,000 pairs served as training data. We used MeCab [8] as the morphological analyzer and the standard tokenizer of Moses [6].
Threshold
1. Making Japanese-English patterns
  We used a word dictionary with $\alpha = 0.1$ to make the Japanese-English translation patterns ($\alpha$ is used in Step 2 of Section 4.1). As a result, we obtained 31,843 Japanese-English word pairs (the word dictionary) and 3,158,406 Japanese-English translation patterns from the training data.
2. Generating Japanese translation sentences
  We used a word dictionary with $\alpha$ = 0.01 to generate the Japanese translation sentences. As a result, we obtained 125,194 Japanese-English word pairs (word dictionary).
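
The effect of the two thresholds can be illustrated with a minimal sketch. It assumes that $\alpha$ acts as a lower bound on some per-pair score, which is consistent with the smaller value ($\alpha = 0.01$) yielding the larger dictionary, but the actual criterion is the one defined in Step 2 of Section 4.1. The function name `filter_word_pairs`, the scores, and the example word pairs are hypothetical.

```python
def filter_word_pairs(scored_pairs, alpha):
    """Keep the word pairs whose score is at least alpha.

    Assumption: alpha is a lower bound on a per-pair score. The real
    criterion is the one given in Step 2 of Section 4.1.
    """
    return {pair: score for pair, score in scored_pairs.items() if score >= alpha}


# Hypothetical (Japanese, English) word pairs with made-up scores.
scored_pairs = {
    ("特許", "patent"): 0.62,
    ("装置", "device"): 0.15,
    ("装置", "apparatus"): 0.08,
    ("前記", "said"): 0.03,
    ("前記", "the"): 0.005,
}

strict = filter_word_pairs(scored_pairs, alpha=0.1)   # stricter dictionary, as for pattern making
loose = filter_word_pairs(scored_pairs, alpha=0.01)   # larger dictionary, as for sentence generation

print(len(strict), len(loose))
```

Under this assumption every pair kept at the stricter threshold is also kept at the looser one, so the dictionary used for generation is a superset of the one used for pattern making.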
Tri-gram Data
We used the English side of about 3,000,000 Japanese-English sentence pairs to estimate the English word tri-gram language model.
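
As an illustration of what a word tri-gram model estimates, the following minimal sketch counts tri-grams over tokenized English sentences and computes unsmoothed maximum-likelihood probabilities. It is not the toolkit used in the experiments, which is not named here; a practical language model would also apply smoothing. The function names, the `<s>` and `</s>` boundary markers, and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def train_trigram(sentences):
    """Count word tri-grams and their two-word histories."""
    tri_counts = Counter()
    history_counts = Counter()
    for tokens in sentences:
        # Pad so that the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for i in range(len(padded) - 2):
            w1, w2, w3 = padded[i:i + 3]
            tri_counts[(w1, w2, w3)] += 1
            history_counts[(w1, w2)] += 1
    return tri_counts, history_counts


def trigram_prob(tri_counts, history_counts, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2)."""
    history = history_counts[(w1, w2)]
    return tri_counts[(w1, w2, w3)] / history if history else 0.0


# Toy corpus of already tokenized English sentences (hypothetical).
corpus = [
    ["the", "device", "is", "provided"],
    ["the", "device", "is", "shown"],
    ["the", "method", "is", "provided"],
]

tri_counts, history_counts = train_trigram(corpus)
p_is = trigram_prob(tri_counts, history_counts, "the", "device", "is")
p_provided = trigram_prob(tri_counts, history_counts, "device", "is", "provided")
print(p_is, p_provided)
```

A tri-gram whose history never occurs in the training data receives probability 0.0 in this sketch, which is the case that smoothing is meant to handle.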
Phrase-Based Statistical Machine Translation (Moses)
We used Moses [6] as the phrase-based SMT for comparison.
Rule-Based Machine Translation System
For comparison, we used a trial version of a state-of-the-art rule-based machine translation system as the rule-based baseline.


Jin'ichi Murakami 2013-06-26