
Vector Space Model based on Semantic Attributes of Words

Satoru Ikehara *, Jin'ichi Murakami * , Yasuhiro Kimoto * , Teturou Araki **

* ikehara, murakami,
Faculty of Engineering, Tottori University, Minami 4-101, Tottori-city, 680-8552 Japan
** araki
Department of Human and Artificial Intelligent Systems, Fukui University, Bunkyou 3-9-1, Fukui, Fukui 910-8507, Japan


To reduce the dimension of the Vector Space Model (VSM) used for information retrieval and clustering, this paper proposes a new method, Semantic-VSM, which uses the Semantic Attribute System defined by "A-Japanese-Lexicon" as vector elements instead of the literal words used in conventional VSM.

The attribute system is a tree structure of 2,710 semantic attributes covering 400 thousand literal words. With this attribute system, vector elements can be generalized easily along the upper-lower relationships of semantic attributes, so the dimension can be reduced at very low cost. Synonyms are automatically accounted for through their semantic attributes, which improves the recall of retrieval systems.
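
The generalization step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny attribute tree, the word-to-attribute table, and the function names (`generalize`, `vectorize`, `cosine`) are all hypothetical, and cutting the tree at a fixed depth is only one assumed way of choosing upper attributes.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy attribute tree (attribute -> upper attribute).
# The real system has 2,710 attributes; these names are made up.
PARENT = {
    "noun": None,
    "concrete": "noun",
    "animal": "concrete",
    "vehicle": "concrete",
    "mammal": "animal",
    "bird": "animal",
    "automobile": "vehicle",
}

# Hypothetical lexicon: literal word -> semantic attribute.
WORD_TO_ATTR = {
    "dog": "mammal",
    "hound": "mammal",
    "sparrow": "bird",
    "car": "automobile",
    "truck": "automobile",
}


def depth(attr):
    """Number of steps from attr up to the root of the tree."""
    d = 0
    while PARENT[attr] is not None:
        attr = PARENT[attr]
        d += 1
    return d


def generalize(attr, max_depth):
    """Replace attr by its upper attribute until it is at most max_depth deep."""
    while depth(attr) > max_depth:
        attr = PARENT[attr]
    return attr


def vectorize(words, max_depth):
    """Count semantic attributes (not literal words), generalized to max_depth."""
    return Counter(
        generalize(WORD_TO_ATTR[w], max_depth) for w in words if w in WORD_TO_ATTR
    )


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc1 = ["dog", "car"]
doc2 = ["hound", "truck"]    # synonyms or near-synonyms of doc1's words
doc3 = ["sparrow", "truck"]

# Literal-word vectors share no element; attribute vectors do.
print(cosine(Counter(doc1), Counter(doc2)))
print(cosine(vectorize(doc1, 3), vectorize(doc2, 3)))

# Cutting the tree higher up merges "mammal" and "bird" into "animal",
# which lowers the dimension and raises the similarity of doc1 and doc3.
print(cosine(vectorize(doc1, 3), vectorize(doc3, 3)))
print(cosine(vectorize(doc1, 2), vectorize(doc3, 2)))
```

In this toy setting, lowering `max_depth` shrinks the set of possible vector elements without any retraining or matrix decomposition, which is the sense in which the reduction is cheap; the trade-off is that finer distinctions such as mammal versus bird are lost.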

Experiments on the BMIR-J2 database of 5,079 newspaper articles showed that the dimension can be reduced from 2,710 to 300 or 600 with only a small degradation in performance. Semantic-VSM also achieved higher recall than conventional VSM.

Download the paper in PS format (approx. 1 MB)

Jin'ichi Murakami, October 5, 2001