(1) Possibility of Reducing Dimension

VSM assumes that the basis vectors of the document vector space are mutually independent. However, this assumption does not hold in actual documents: many axes are mutually dependent, and others are rarely used at all. It is therefore important to find the latent semantic structure underlying the observed dimensions.

LSI is known as a useful method for finding such structures and reducing the vector dimension. However, the Singular Value Decomposition it relies on poses a problem: it requires too much computation time to apply to a large collection of documents.
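As a point of reference, LSI projects documents onto the subspace spanned by the largest singular values of the term-document matrix. The following is a minimal sketch of that idea with NumPy; the matrix values are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical 4-document x 5-term frequency matrix (rows = documents).
A = np.array([
    [2.0, 1.0, 0.0, 0.0, 1.0],
    [1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],
])

# SVD: A = U * diag(s) * Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (rank-k approximation).
# The cost of computing the full SVD is what makes LSI expensive at scale.
k = 2
doc_vectors = U[:, :k] * s[:k]  # documents in the reduced k-dim space

print(doc_vectors.shape)
```

Here each document is represented by k = 2 coordinates instead of 5, and similarity is computed in the reduced space.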

Compared with this method, our method can reduce the dimension easily by generalizing the vector elements using the semantic relations among the Semantic Attributes.

In the generalization (see below) of the vector space, attributes of low importance are deleted, but their values are added to the upper node in the tree. In this way the deleted attributes still contribute, to some extent, to retrieval performance, and we expect that the dimension can be reduced greatly without degrading the performance of information retrieval.
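The generalization step above can be sketched as follows. The attribute names, tree links, and weights here are hypothetical placeholders, not the paper's actual Semantic Attribute system:

```python
# parent[child] = parent attribute in a (hypothetical) Semantic Attribute tree.
parent = {"dog": "animal", "cat": "animal", "animal": "concrete"}

# A document vector: weight per Semantic Attribute (hypothetical values).
vec = {"dog": 0.4, "cat": 0.3, "animal": 0.2, "concrete": 0.1}

def generalize(vec, parent, drop):
    """Delete the attributes in `drop`, adding each deleted weight to
    its parent node so the information is not lost entirely."""
    out = dict(vec)
    for attr in drop:
        if attr in out and attr in parent:
            out[parent[attr]] = out.get(parent[attr], 0.0) + out.pop(attr)
    return out

# Dropping "dog" and "cat" folds their weights into "animal":
reduced = generalize(vec, parent, drop={"dog", "cat"})
print(reduced)
```

The vector shrinks from four dimensions to two, yet the weight of the deleted attributes survives in the parent node, which is why retrieval performance is expected to degrade little.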


Jin'ichi Murakami, October 5, 2001