
Experimental conditions

  1. Database

    We used the NTCIR-10 Japanese-English patent sentence pairs; 3,000,000 sentence pairs were used for training. We used MeCab [8] as the Japanese morphological analyzer and the standard tokenizer of Moses [6] for English (a tokenization sketch follows this list).

  2. Threshold

    1. Making Japanese-English patterns

      We used a word dictionary built with $\alpha$ = 0.1 to make the Japanese-English translation patterns ($\alpha$ is the threshold used in Step 2 of Section 4.1; a filtering sketch follows this list). As a result, we obtained 31,843 Japanese-English word pairs (the word dictionary) and 3,158,406 Japanese-English translation patterns from the training data.

    2. Generating Japanese translation sentences

      We used a word dictionary built with $\alpha$ = 0.01 to generate the Japanese translation sentences. As a result, we obtained 125,194 Japanese-English word pairs (the word dictionary).

  3. Tri-gram Data

    We used the English side of about 3,000,000 Japanese-English sentence pairs to build the English word tri-gram language model (a counting sketch follows this list).

  4. Phrase-Based Statistical Machine Translation (Moses)

    We used Moses [6] as the phrase-based SMT for comparison.

  5. Rule-Based Machine Translation System

    For comparison, we used a trial version of a rule-based machine translation system.
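
The preprocessing in item 1 (word segmentation of Japanese with MeCab and tokenization of English with the Moses tokenizer) can be reproduced roughly as follows. This is a minimal sketch, assuming the mecab-python3 and sacremoses packages as stand-ins for the original command-line tools; the exact options used in the experiments may differ.

    # Minimal preprocessing sketch (assumed tooling: mecab-python3, sacremoses).
    import MeCab                             # Japanese morphological analyzer
    from sacremoses import MosesTokenizer    # stand-in for the Moses tokenizer script

    # "-Owakati" makes MeCab output space-separated surface forms only.
    ja_tagger = MeCab.Tagger("-Owakati")
    en_tokenizer = MosesTokenizer(lang="en")

    def tokenize_ja(sentence):
        """Segment a Japanese sentence into words with MeCab."""
        return ja_tagger.parse(sentence).split()

    def tokenize_en(sentence):
        """Tokenize an English sentence with the Moses tokenizer."""
        return en_tokenizer.tokenize(sentence, escape=False)

    print(tokenize_ja("本発明は半導体装置に関する。"))
    print(tokenize_en("The present invention relates to a semiconductor device."))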

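The threshold $\alpha$ in item 2 can be illustrated with the sketch below, under the assumption that $\alpha$ cuts off a lexical translation probability estimated from the word-aligned training data; the precise criterion is the one defined in Step 2 of Section 4.1. The function name and the toy probabilities are hypothetical.

    # Hypothetical sketch: keep word pairs whose lexical translation probability
    # reaches the threshold alpha (an assumption about how alpha is applied in
    # Step 2 of Section 4.1).
    def extract_word_dictionary(lex_table, alpha):
        """lex_table: iterable of (japanese_word, english_word, probability) triples."""
        return {(j, e) for j, e, p in lex_table if p >= alpha}

    # Toy lexical table; the probabilities are made up for illustration.
    lex_table = [
        ("半導体", "semiconductor", 0.62),
        ("装置", "device", 0.35),
        ("装置", "apparatus", 0.04),
    ]

    pattern_dict = extract_word_dictionary(lex_table, alpha=0.1)      # pattern extraction
    generation_dict = extract_word_dictionary(lex_table, alpha=0.01)  # sentence generation
    print(len(pattern_dict), len(generation_dict))  # prints "2 3"

With the looser threshold $\alpha$ = 0.01 more word pairs survive the cut, which is consistent with the larger dictionary reported above (125,194 pairs versus 31,843 pairs).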

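The English tri-gram language model of item 3 can be estimated, in its simplest maximum-likelihood form, by counting word tri-grams on the English side of the training corpus. The sketch below omits smoothing; the toolkit and smoothing method actually used are not specified in this section.

    # Minimal tri-gram counting sketch (maximum likelihood, no smoothing).
    from collections import Counter

    def train_trigram_lm(sentences):
        """sentences: iterable of token lists from the English side of the corpus."""
        tri_counts, bi_counts = Counter(), Counter()
        for tokens in sentences:
            padded = ["<s>", "<s>"] + tokens + ["</s>"]
            for i in range(len(padded) - 2):
                w1, w2, w3 = padded[i], padded[i + 1], padded[i + 2]
                tri_counts[(w1, w2, w3)] += 1
                bi_counts[(w1, w2)] += 1
        # P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2)
        return {tri: c / bi_counts[tri[:2]] for tri, c in tri_counts.items()}

    lm = train_trigram_lm([["the", "present", "invention", "relates", "to", "a", "device"]])
    print(lm[("the", "present", "invention")])  # 1.0 on this toy corpus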