
5-gram language model

We calculated the 5-gram model with ngram-count from the SRI Language Modeling (SRILM) toolkit [7], using the smoothing option "-ukndiscount", i.e., original (unmodified) Kneser-Ney discounting. Among the n-gram language models we compared in our previous results from the NTCIR-7 Patent Translation Task dry run, this 5-gram model was the best.
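
As a concrete reference, an ngram-count invocation for this setting might look like the following sketch; the corpus and output file names are placeholders, not the files actually used in our experiments.

  # sketch only: train.txt and patent.5gram.lm are placeholder names
  # -ukndiscount selects original (unmodified) Kneser-Ney discounting
  ngram-count -order 5 -text train.txt -lm patent.5gram.lm -ukndiscount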

From the 1,798,581 parallel sentences, we obtained the following numbers of n-gram entries.

For the Japanese-to-English translation model:

  1-gram:   214,265 lines
  2-gram: 3,249,108 lines
  3-gram: 4,139,515 lines
  4-gram: 5,697,384 lines
  5-gram: 5,872,543 lines

For the English-to-Japanese translation model:

  1-gram:    91,772 lines
  2-gram: 1,754,357 lines
  3-gram: 3,752,249 lines
  4-gram: 6,262,883 lines
  5-gram: 7,684,568 lines
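
These per-order line counts presumably correspond to the entry counts listed in the \data\ header of the ARPA-format model file. Assuming the Japanese-to-English model was written to a file named ja-en.5gram.lm (a placeholder name), they could be inspected with, for example:

  # the \data\ header contains one "ngram N=count" line per order
  head ja-en.5gram.lm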


