
Concluding Remarks

This paper has proposed Semantic-VSM, which uses the Semantic Attribute System defined by "A-Japanese Lexicon" as the bases of its vectors. Focusing on the semantic relations between Semantic Attributes, we also proposed generalization methods that reduce the dimension without degrading retrieval performance.
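The core idea can be sketched as follows: documents are indexed not by surface words but by the Semantic Attributes those words map to, so the attributes serve as the vector bases. The word-to-attribute table below is hypothetical, not the actual lexicon.

```python
# Hedged sketch of the Semantic-VSM idea: each word is mapped to its
# Semantic Attribute(s), and the attributes, not the surface words,
# form the bases of the document vector. The mapping table here is
# an illustrative assumption, not the real "A-Japanese Lexicon".

from collections import Counter

WORD_TO_ATTRS = {                 # hypothetical entries
    "car": ["vehicle"],
    "automobile": ["vehicle"],    # a synonym maps to the same base
    "driver": ["person"],
}

def semantic_vector(words):
    """Count semantic attributes instead of surface words."""
    vec = Counter()
    for w in words:
        for attr in WORD_TO_ATTRS.get(w, []):
            vec[attr] += 1
    return dict(vec)

print(semantic_vector(["car", "automobile", "driver"]))
# "car" and "automobile" fall on the same base:
# {'vehicle': 2, 'person': 1}
```

Because synonyms share a base, documents that use different surface words for the same concept still match, which is the source of the recall gain discussed below.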

In the experiments, this method was applied to the BMIR-J2 test collection, which contains 5,079 newspaper articles. The results are as follows:

First, this method achieves higher recall than conventional Word-VSM because it is insensitive to variations in word surface forms, so documents containing synonyms are also retrieved. On the other hand, it tends to pick up irrelevant documents, so precision decreases. As a result, the total performance ($F$ value) is almost the same as that of Word-VSM.
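The $F$ value referred to above is the standard harmonic mean of precision and recall; a minimal sketch, with purely illustrative numbers (not results from the paper):

```python
# Hedged sketch: F value as the harmonic mean of precision and recall.
# The numbers below are made up to illustrate the trade-off, not taken
# from the BMIR-J2 experiments.

def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Higher recall with lower precision can yield roughly the same F value
# as a more balanced precision/recall pair:
print(round(f_value(0.50, 0.80), 3))  # -> 0.615
print(round(f_value(0.62, 0.62), 3))  # -> 0.62
```

This is why the recall gain and the precision loss can cancel out in the overall score.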

Second, in this method, generalization of the vector bases can be performed easily and at very low cost, and the dimension can be greatly reduced compared to conventional Word-VSM. Accordingly, the new method can be applied to large databases.
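Generalization of this kind can be sketched as merging the components of child attributes into their parent node in the attribute hierarchy. The tree below is hypothetical, not the actual Semantic Attribute System.

```python
# Hedged sketch of base generalization: the weights of generalization
# targets are folded into their parent attribute, shrinking the number
# of dimensions. The parent table is an illustrative assumption.

from collections import defaultdict

def generalize(vector, parent_of, targets):
    """Merge each target attribute's weight into its parent attribute."""
    reduced = defaultdict(float)
    for attr, weight in vector.items():
        # Replace a generalization target by its parent node.
        reduced[parent_of[attr] if attr in targets else attr] += weight
    return dict(reduced)

parent_of = {"dog": "animal", "cat": "animal"}
doc = {"dog": 0.4, "cat": 0.2, "vehicle": 0.3}
print(generalize(doc, parent_of, targets={"dog", "cat"}))
# dimensions drop from 3 to 2: "animal" (about 0.6) and "vehicle" (0.3)
```

Since the operation is a simple component merge, it can be applied to a large collection at negligible cost, which matches the low-cost claim above.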

The remaining problems are as follows:

The first concerns how to select the targets of generalization. In this study, the generalization targets were nodes with small granularity or light weight. However, some large-granularity nodes also tend to pick up irrelevant documents.

The second is the problem of polysemy. This study did not consider the influence of polysemy, although the semantic attribute system used here is able to resolve the ambiguity of word senses in actual sentences.

These problems will be addressed in future work.


Jin'ichi Murakami, October 5, 2001