
Making conclusions and fitness estimation

In the first phase of the analysis, groupings within the distinct answer categories were located and named with symbols. In the second phase, each original data vector was replaced with a vector of truth values indicating membership in the first-phase groups, and these vectors were then organized with the SOM. The concluding SOM is analyzed by computing, for each neuron, the average truth value of each categorical group; if the average exceeds a predefined threshold, the group is identified as present in that neuron.
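The thresholding step can be illustrated with a minimal Python sketch. It is not the original implementation: the function name groups_per_neuron, the argument names, and the default threshold of 0.75 are hypothetical, and it assumes that the average for a neuron is taken over the truth vectors of the data records mapped to that neuron.

```python
from collections import defaultdict


def groups_per_neuron(truth_vectors, neuron_of_record, threshold=0.75):
    """Return, for each neuron, the set of group indices judged present.

    truth_vectors    -- one list of 0/1 truth values per data record
    neuron_of_record -- index of the neuron each record is mapped to
                        (assumed to be known from the trained SOM)
    threshold        -- hypothetical cut-off for the average truth value
    """
    # Collect the truth vectors of the records mapped to each neuron.
    by_neuron = defaultdict(list)
    for vector, neuron in zip(truth_vectors, neuron_of_record):
        by_neuron[neuron].append(vector)

    present = {}
    for neuron, vectors in by_neuron.items():
        count = len(vectors)
        # Average truth value of every group over this neuron's records.
        averages = [sum(column) / count for column in zip(*vectors)]
        # A group exists in the neuron if its average exceeds the threshold.
        present[neuron] = {g for g, avg in enumerate(averages) if avg > threshold}
    return present


if __name__ == "__main__":
    vectors = [[1, 0], [1, 1], [1, 0], [0, 1]]
    mapping = [0, 0, 0, 1]
    print(groups_per_neuron(vectors, mapping))
```

In this toy input, group 0 holds for all three records of neuron 0, while group 1 holds for only one of them, so only group 0 is identified there.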

However, because of the group membership calculation, the initial subgroups and the final groups might not be exactly the same. In figure 5, some data records belong to both the 1st and the 2nd group. If the group membership threshold for group 'Just 1st' is set too high (0.9), some data records are left outside the final group. If the threshold is too low (0.6), some additional data records, including representatives of 'Both 1st and 2nd', are included. To evaluate the differences, the indices of the original data records in the categorical analysis groupings can be obtained and checked against the data records in the concluding groupings of the final picture.

Figure 5: An example of misclassifications.

To formulate these verifications as probabilities, all the data records indicating certain categorical behavior (group g) can be collected into sets tex2html_wrap_inline418, and these can be compared to the original categorical sets tex2html_wrap_inline368. Good estimates can be obtained with

displaymath414

where $|A|$ denotes the number of items in set A.
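The comparison of record index sets can be sketched in Python as follows. The estimate formula itself is not reproduced here, so the two ratios below are only an assumption about what such estimates might look like (shared records divided by the size of each set), not the author's formula. The function name compare_groups and the dictionary keys are hypothetical.

```python
def compare_groups(original, final):
    """Compare an original categorical set with the final group set.

    Both arguments are collections of data record indices. The two
    fractions are assumed illustrations, not the formula of the text.
    """
    original, final = set(original), set(final)
    common = original & final
    return {
        # Records of the original group left outside the final group.
        "missed": sorted(original - final),
        # Records in the final group that were not in the original group.
        "extra": sorted(final - original),
        # Share of the original group that was recovered (assumed ratio).
        "recovered_fraction": len(common) / len(original) if original else None,
        # Share of the final group that is original (assumed ratio).
        "pure_fraction": len(common) / len(final) if final else None,
    }


if __name__ == "__main__":
    print(compare_groups({1, 2, 3, 4}, {3, 4, 5}))
```

A threshold that is too high tends to lengthen the "missed" list, and one that is too low tends to lengthen the "extra" list, which corresponds to the two failure cases described for figure 5.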



Anssi Lensu
Tue Nov 3 11:38:53 EET 1998