
Removed long parallel sentences

We used only the IWSLT2008 training corpus of Chinese-English parallel sentences. This gave us 19972 Chinese-English parallel sentences for the BTEC-CE, Challenge-CE, and Challenge-EC tasks. We refer to these experiments as "primary".

In addition, for the BTEC-CE and Challenge-CE tasks, we removed from the training data every parallel sentence pair whose Chinese sentence was longer than 48 characters, leaving 19327 Chinese-English parallel sentences. Likewise, for the Challenge-EC task, we removed every pair whose English sentence was longer than 96 characters, leaving 19387 English-Chinese parallel sentences. We refer to these experiments as "contrast".
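
The length filtering described above can be sketched as follows. This is only a minimal illustration, not the script used in the experiments: the function name remove_long_pairs, the (Chinese, English) tuple layout, and the use of Python's len() to count characters (which also counts spaces and punctuation) are all assumptions, since the text does not say how characters were counted.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout: each pair is (Chinese sentence, English sentence).
Pair = Tuple[str, str]

CHINESE, ENGLISH = 0, 1  # index of each language within a pair


def remove_long_pairs(pairs: Iterable[Pair], side: int, max_chars: int) -> List[Pair]:
    """Drop every pair whose sentence on `side` has more than `max_chars` characters.

    Assumption: length is the number of Unicode code points, as given by len().
    """
    return [pair for pair in pairs if len(pair[side]) <= max_chars]


if __name__ == "__main__":
    # Toy data, not taken from the IWSLT2008 corpus.
    toy = [
        ("你好", "Hello."),
        ("中" * 49, "A short English sentence."),
        ("中" * 48, "x" * 97),
    ]
    # "contrast" setting for BTEC-CE and Challenge-CE: Chinese side, limit 48.
    ce_train = remove_long_pairs(toy, CHINESE, 48)
    # "contrast" setting for Challenge-EC: English side, limit 96.
    ec_train = remove_long_pairs(toy, ENGLISH, 96)
    print(len(ce_train), len(ec_train))
```

Filtering on one side of the pair drops both sentences together, so the remaining training data stays aligned.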



Jin'ichi Murakami 2008-10-28