
Reduction of Dimension

To reduce the number of vector bases, generalization by granularity and generalization by weight were applied. Fig. 5 shows the relation between the number of bases and information retrieval performance, together with the value of the evaluation function $H$.



From this figure, we obtained the minimum number of vector bases at which information retrieval performance falls no more than 10% or 20% below its maximum value. The results are shown in Table 1.
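The selection rule described above can be sketched as follows. This is only an illustrative sketch, not the procedure used in the experiments: the function name `min_bases_within_drop`, the interpretation of "10% or 20%" as a drop relative to the maximum score, and the data points in `example_curve` are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def min_bases_within_drop(
    curve: Sequence[Tuple[int, float]], max_drop: float
) -> int:
    """Return the smallest number of bases whose score stays within
    `max_drop` (a relative fraction, e.g. 0.10) of the best score.

    `curve` holds (number_of_bases, retrieval_score) pairs.
    """
    if not curve:
        raise ValueError("curve must not be empty")
    # Best retrieval score observed over all tested numbers of bases.
    best = max(score for _, score in curve)
    # Lowest score that is still acceptable under the allowed drop.
    threshold = best * (1.0 - max_drop)
    # Smallest number of bases that still meets the threshold.
    return min(n for n, score in curve if score >= threshold)


# Hypothetical measurements, for illustration only (not from Fig. 5).
example_curve = [(50, 0.30), (100, 0.42), (200, 0.47), (400, 0.49), (800, 0.50)]

print(min_bases_within_drop(example_curve, 0.10))
print(min_bases_within_drop(example_curve, 0.20))
```

If the measured curve is not monotonic, this rule returns the smallest number of bases that meets the threshold even when some larger setting falls below it. Whether that is acceptable depends on how the curve in Fig. 5 is read.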

From these figures and the table, the following observations can be made:

(1)
S-VSM is more robust than W-VSM to reductions in the number of vector bases.
(2)
In particular, generalization by weight is more robust than generalization by granularity.

Fig. 5: Determination of Minimum Number of Vector Bases
\begin{figure}\begin{center}
\epsfile{file=gra3.eps,height=8.5cm}
\vspace{-5mm}
\end{center}\end{figure}



Table 1: Minimum Number of Vector Bases
\begin{table}\begin{center}
\epsfile{file=hyo1.eps,height=3cm}
\end{center}\end{table}


Under the condition that information retrieval performance falls no more than 10% to 20% below its maximum value, conventional W-VSM requires 2,000 dimensions, whereas S-VSM requires only 300 to 600 dimensions.


Jin'ichi Murakami, October 5, 2001