
Removed long parallel sentences

We used only the NTCIR-7 Patent Translation Task training corpus. This corpus included some very long parallel sentences, and we found that these long sentence pairs caused errors in the phrase table. Therefore, we removed them from the training data.
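The filtering step described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used: the function name `filter_long_pairs`, the cutoff `MAX_TOKENS = 80`, and the whitespace tokenization are all assumptions, since the text does not state the length limit or how length was measured.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical cutoff: the source does not state the limit that was used.
MAX_TOKENS = 80


def filter_long_pairs(
    pairs: Iterable[Tuple[str, str]],
    max_tokens: int = MAX_TOKENS,
) -> Iterator[Tuple[str, str]]:
    """Yield only the sentence pairs whose two sides are both short enough."""
    for src, tgt in pairs:
        # Whitespace tokenization is an assumption; it presumes that both
        # sides have already been split into space-separated tokens.
        if len(src.split()) <= max_tokens and len(tgt.split()) <= max_tokens:
            yield src, tgt
```

A pair is dropped if either side exceeds the limit, so the two sides of the corpus stay aligned line by line. Text without spaces between words would need to be segmented into tokens before a filter like this is applied.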



Jin'ichi Murakami 2008-12-22