
Removed long parallel sentences

When we built a phrase table from the NTCIR-7 Patent Translation Task training corpus, some of its entries were completely wrong. Table 1 presents examples of such wrong entries. Wrong phrase table entries lead to poor translation results, especially in terms of adequacy.

We also found that long parallel sentences in the training data easily resulted in such wrong phrase table entries. We therefore removed these long parallel sentences from the training data.


Table 1: Examples of Wrong Phrase Table Entries
  図 3 及び 図 4 に 示す よう に |||As shown in FIGS . 6 and 7  
  0.047619 4.03037e-09 0.0243902 1.52103e-11 2.718  
  、 それ 以外 |||4 , while all other  
  1 3.84583e-05 0.0217391 4.24439e-09  
  1 3 は |||1 is a  
  0.0010582 0.000901274 0.000698568 0.00721987  
  1 3 は |||1  
  4.97396e-06 7.91046e-05 0.000349284 0.0497444  
  コンデンサ |||i ) a  
  0.333333 0.0001482 9.04732e-05 1.01046e-06  
  コンデンサ |||i )  
  0.000857633 0.0001482 9.04732e-05 2.10078e-06  
  マイクロコンピュータ |||merely  
  0.00263852 0.000423 0.000176398 0.0001555  

We used only the NTCIR-7 Patent Translation Task training corpus (Japanese-English parallel sentences). Thus, we used 1798581 Japanese-English parallel sentences for both the Intrinsic-JE task and the Intrinsic-EJ task. We refer to this experiment as "standard".

In contrast, for the Intrinsic-JE task, we removed from the training data every sentence pair whose Japanese sentence was longer than 64 characters, which left 614298 Japanese-English parallel sentences. Likewise, for the Intrinsic-EJ task, we removed every sentence pair whose English sentence was longer than 128 characters, which left 1062596 English-Japanese parallel sentences. We refer to this experiment as "proposed".
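The length filter described above can be sketched as follows. This is only a minimal illustration, not the authors' actual preprocessing script: the function name `filter_by_length`, the constant names, and the in-memory list of (Japanese, English) tuples are all hypothetical. It also assumes that "characters" means Unicode code points as counted by Python's `len`, and that the limit is applied to the raw, untokenized sentence; the text does not state either point.

```python
from typing import Iterable, Iterator, Tuple

# Thresholds taken from the text: 64 characters on the Japanese side
# (Intrinsic-JE) and 128 characters on the English side (Intrinsic-EJ).
MAX_JA_CHARS = 64
MAX_EN_CHARS = 128

SentencePair = Tuple[str, str]  # (Japanese sentence, English sentence)


def filter_by_length(
    pairs: Iterable[SentencePair], side: str, max_chars: int
) -> Iterator[SentencePair]:
    """Yield only the pairs whose `side` sentence has at most `max_chars` characters.

    `side` is "ja" or "en". Character counting via len() is an assumption;
    the paper does not say how characters were counted.
    """
    index = {"ja": 0, "en": 1}[side]
    for pair in pairs:
        if len(pair[index]) <= max_chars:
            yield pair


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the NTCIR-7 corpus.
    corpus = [
        ("これは短い文です。", "This is a short sentence."),
        ("長" * 65, "A pair whose Japanese side exceeds 64 characters."),
    ]
    je_training = list(filter_by_length(corpus, "ja", MAX_JA_CHARS))
    ej_training = list(filter_by_length(corpus, "en", MAX_EN_CHARS))
    print(len(je_training), len(ej_training))
```

With the counts reported above, the filter keeps about 34% of the 1798581 pairs for the Intrinsic-JE task (614298) and about 59% for the Intrinsic-EJ task (1062596).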

Examples of long parallel sentences are presented in Table 2.


Table 2: Examples of Long Parallel Sentences
  J    1    図30図に示す実施例は、路面のセンターに1本のガイド5を敷設し、そのガイド5を車体15に取り付けたガイドローラ3により挟み込んで支持するように構成したものであり、前方又は後方から観た状態を示している。  
  J    2    基本的には、図31(a)の平面図に示すようにガイド5を両側から挟み込んで支持する対のガイドローラ3を2組設けることによって、安定性を確保するものであり、図31(b)はその正面図を示している。  
  J    3    また、車軸9は、ダブルウィッシュボーン式のリンク14により車体15に支持されていると共に、コイルスプリング16及びダンパー(不図示)により衝撃が吸収緩和されるようになっている。  
  J    4    3輪車の場合は、その構造から明らかなようにセンターに車輪があるため、図30に示す実施例は適用できず、車両の両外側に2本のガイド5を敷設した図34が最も適用しやすい構成である。  
  E    1    In the embodiment shown in FIG. 30, one guide 5 is laid at the center of the road surface, and the chassis 15 is supported by clamping the guide 5 by the guide rollers 3 attached to the chassis 15, the view being taken from the front or rear of the vehicle.  
  E    2    Basically, as shown in the plan view of FIG. 31(a), stability is secured by providing two sets of guide rollers 3 for clamping the guide 5 from both sides thereof to support the chassis 15, and FIG. 31(b) shows a front elevational view thereof.  
  E    3    The axle 9 is supported on a chassis 15 by means of a double wish-bone type link 14, and shocks are absorbed and alleviated by a coil spring 16 and a damper (not shown).  
  E    4    In the case of the three-wheeled vehicle, since one wheel is present at the center as is apparent from its structure, the embodiment shown in FIG. 30 is not applicable, and FIG. 34 in which two guides 5 are laid on the opposite outer sides of the vehicle is easiest to apply.  
         


Jin'ichi Murakami 2008-12-22