Consequently, semantic attributes whose generalization would increase the weight imbalance among the elements of the vector should not be generalized, even if their weights are small. Taking these conditions into account, we show how to select the attributes to be generalized.
Now, let us define the specific vector for all of the documents in the database as follows:
$$V = (v_1, v_2, \ldots, v_m) \tag{4}$$
Here, $v_i$ represents the total frequency of words in the database whose meaning is the semantic attribute $i$, and $m$ is the number of attributes used by the specific vector.
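Let us introduce the evaluation function $E$ to assess the weight balance of the bases by their "variation", where $v_i$ is the weight of the $i$-th of the $m$ attributes:
$$E = \frac{1}{m}\sum_{i=1}^{m}\left(v_i - \bar{v}\right)^2 \tag{5}$$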
Here, $\bar{v}$ represents the mean value of the elements $v_i$:
$$\bar{v} = \frac{1}{m}\sum_{i=1}^{m} v_i \tag{6}$$
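The evaluation function can be sketched in a few lines. This is a minimal illustration that assumes the "variation" is the population variance of the attribute weights; the function name `variation` and the sample weights are hypothetical and do not come from the original.

```python
def variation(weights):
    """Population variance of the attribute weights.

    Assumption: this is the form of the evaluation function; the
    original may use a different normalization.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    m = len(weights)                      # number of attributes
    mean = sum(weights) / m               # mean weight
    return sum((w - mean) ** 2 for w in weights) / m


balanced = [4, 4, 4]     # perfectly even weights
skewed = [1, 1, 10]      # one dominant attribute

print(variation(balanced))  # 0.0
print(variation(skewed))    # 18.0
```

A smaller value indicates a more even distribution of weight across the attributes.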
According to the above discussion, generalization should be performed by selecting the semantic attributes whose generalization decreases the value of the evaluation function $E$.
Now, let us consider the case in which a semantic attribute $x$ is generalized into its upper node $y$. The weight $v_x$ of $x$ is added to the weight $v_y$ of $y$, and $m$ decreases by 1. Let the evaluation function be $E'$ after the generalization. The change of the evaluation function, $\Delta E = E' - E$, is given as follows:
(7) |
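The change in the evaluation function caused by one generalization step can be checked numerically. The sketch below assumes the evaluation function is the population variance of the weights; `variation`, `delta_after_merge`, and the sample weights are hypothetical names and data introduced for illustration only.

```python
def variation(weights):
    """Population variance of the weights (assumed evaluation function)."""
    m = len(weights)
    mean = sum(weights) / m
    return sum((w - mean) ** 2 for w in weights) / m


def delta_after_merge(weights, child, parent):
    """Return E' - E when weights[child] is folded into weights[parent].

    The child attribute is removed, so the number of attributes
    drops by one, and its weight is added to the parent's weight.
    """
    if child == parent:
        raise ValueError("child and parent must differ")
    merged = [w for i, w in enumerate(weights) if i != child]
    # The parent's index shifts down by one if it followed the child.
    p = parent - 1 if parent > child else parent
    merged[p] += weights[child]
    return variation(merged) - variation(weights)


weights = [1, 1, 10]
print(delta_after_merge(weights, child=0, parent=1))  # -2.0 (balance improves)
print(delta_after_merge(weights, child=0, parent=2))  # 7.0  (balance worsens)
```

In the second call the child has a small weight, yet folding it into the already dominant attribute makes the imbalance worse, which is the situation the selection rule is meant to avoid.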
Requiring that the change of the evaluation function be negative ($\Delta E < 0$) as the condition, we obtain the following relation:
(8) |
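As a worked sketch, and not necessarily the form of the original equations, suppose the variation is the population variance of the weights and that an attribute $x$ with weight $v_x$ is merged into its upper node $y$ with weight $v_y$, with $m \ge 2$. The symbols $S$ and $N$ are introduced here only for illustration. The total weight $N$ is unchanged by the merge, while the sum of squares grows by $2 v_x v_y$, so:

```latex
\begin{align*}
S &= \sum_{i=1}^{m} v_i^2, \qquad
N = \sum_{i=1}^{m} v_i, \qquad
\bar{v} = \frac{N}{m}, \qquad
E = \frac{S}{m} - \bar{v}^2, \\
E' &= \frac{S + 2 v_x v_y}{m-1} - \frac{N^2}{(m-1)^2}, \\
\Delta E &= E' - E
  = \frac{S}{m(m-1)} + \frac{2 v_x v_y}{m-1}
    - \frac{(2m-1)\,N^2}{m^2 (m-1)^2}, \\
\Delta E < 0 &\iff 2 v_x v_y < \frac{m}{m-1}\,\bar{v}^2 - E .
\end{align*}
```

For example, with weights $(1, 1, 10)$ we have $\bar{v} = 4$ and $E = 18$, so the bound is $\tfrac{3}{2}\cdot 16 - 18 = 6$. Merging the two attributes of weight 1 gives $2 v_x v_y = 2 < 6$, so the variation decreases. Merging an attribute of weight 1 into the one of weight 10 gives $2 v_x v_y = 20 > 6$, so the variation increases even though the merged attribute has a small weight.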
Thus the generalization procedure is as follows: