Estimation of the number of words to be recorded

Of the 174,000 city and town names in Japan, we focused on those with four to six moras. The proposed method is especially suited to words of these lengths, and together they account for about 60% of all names (105,287 of 174,000). Table 1 shows, for each mora length, the number of words, the number of components appearing in those words, and the minimum number of words in a set that covers all the necessary components. Only about one sixth of these words (16,888 of 105,287) need to be recorded.

Table 1: The number of words and components

                        4 moras   5 moras   6 moras
All words                31,222    38,430    35,635
Components                7,606    10,670    12,025
Words to be recorded      4,560     6,066     6,262
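
The "words to be recorded" row is the size of a set of words that together contain every necessary component. This section does not say how that set was computed. Finding a truly minimum covering set is an instance of the set-cover problem, which is NP-hard in general, so the sketch below uses the standard greedy heuristic as a stand-in; it is not necessarily the procedure used to produce Table 1. The function name `greedy_cover`, the word labels, and the component labels are all hypothetical, and components are treated as opaque labels.

```python
def greedy_cover(word_components):
    """Greedy set-cover heuristic (an assumption, not the paper's stated method).

    word_components maps each candidate word to the set of components it contains.
    Returns a list of words whose components together cover every component.
    """
    # Every component that appears in at least one word must be covered.
    uncovered = set().union(*word_components.values())
    chosen = []
    while uncovered:
        # Pick the word covering the most still-uncovered components.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(word_components),
                   key=lambda w: len(word_components[w] & uncovered))
        chosen.append(best)
        uncovered -= word_components[best]
    return chosen


# Toy data: five hypothetical words and the component labels each contains.
toy = {
    "w1": {"a", "b", "c"},
    "w2": {"c", "d"},
    "w3": {"d", "e"},
    "w4": {"a", "e"},
    "w5": {"b"},
}
recorded = greedy_cover(toy)
print(recorded, len(recorded), "of", len(toy), "words")
```

On the toy data, two of the five words suffice to cover all five components. The greedy choice gives an upper bound on the minimum, so a cover found this way may be somewhat larger than the true minimum for real data.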

