
(1) Meanings of Sentence and Document

Conventional VSM assumes that the meaning of a sentence or a document is represented by the set of words used in it. This set of words is represented by a specific vector $V$ as follows:



\begin{displaymath}
V=(w_1, w_2, \cdots, w_i, \cdots, w_n) \qquad (1)
\end{displaymath}

Here, $i$ $(1 \leq i \leq n)$ is the index of a word, and $n$ is the number of words used to represent the meanings of sentences.

The words used as vector elements are important words selected statistically from all of the documents in the database by some conventional method such as "$tf \cdot idf$", similarly to information retrieval systems that use controlled keywords. The value of each weight $w_i$ is usually determined by the frequency of appearance of word $\#i$.

Here, we call the specific vector given by Eq. (1) the "Word-Vector", and the VSM that uses this type of specific vector the "W-VSM" (Word-Vector Space Model).
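
The construction of a Word-Vector can be illustrated with a minimal sketch. It is not the implementation used in this work; several details are assumptions made only for illustration: the function name build_word_vectors is hypothetical, documents are split on whitespace, the $n$ vector elements are chosen as the words with the largest $tf \cdot idf$ summed over the collection, and each weight $w_i$ is the raw term frequency multiplied by a natural-log idf.

```python
import math
from collections import Counter


def build_word_vectors(documents, n):
    """Return (vocabulary, vectors) for a list of document strings.

    vocabulary: the n words selected as vector elements.
    vectors:    one list [w_1, ..., w_n] of tf-idf weights per document.
    Tokenization and the selection rule are illustrative assumptions.
    """
    tokenized = [doc.lower().split() for doc in documents]
    num_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    idf = {word: math.log(num_docs / count) for word, count in df.items()}

    # Term frequencies per document, and a collection-wide tf-idf score per word.
    term_freqs = [Counter(tokens) for tokens in tokenized]
    score = Counter()
    for tf in term_freqs:
        for word, count in tf.items():
            score[word] += count * idf[word]

    # Keep the n highest-scoring words; ties are broken alphabetically.
    vocabulary = sorted(sorted(score), key=lambda word: -score[word])[:n]

    # Each document becomes V = (w_1, ..., w_n), with w_i = tf * idf of word #i.
    vectors = [[tf[word] * idf[word] for word in vocabulary] for tf in term_freqs]
    return vocabulary, vectors
```

A word that appears in every document receives an idf of zero under this weighting, so it is unlikely to be selected as a vector element. This matches the intent of choosing only important words.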



Jin'ichi Murakami, October 5, 2001