In the first phase of the analysis, groupings from distinct answer categories were located and named with symbols. In the second phase, the original data vectors were replaced with vectors of truth values indicating membership in the groups of the first phase, and these were then organized with the SOM. The resulting SOM is analyzed by computing, for each neuron, the average truth value of each categorical group; if the average exceeds a predefined threshold, the group is considered present in that neuron.
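The thresholding step described above can be sketched as follows. This is a minimal illustration and not the implementation used in the analysis: it assumes the truth values are available as a records-by-groups 0/1 matrix and that each record has already been mapped to its best-matching neuron; the function name `neuron_group_membership` and its arguments are hypothetical.

```python
import numpy as np

def neuron_group_membership(truth, bmu, n_neurons, threshold):
    """Average truth values per neuron and flag groups above a threshold.

    truth     : (n_records, n_groups) array of 0/1 group-membership values
    bmu       : (n_records,) index of the neuron each record is mapped to
                (assumed to be computed beforehand by the SOM)
    n_neurons : total number of neurons on the map
    threshold : a group is flagged in a neuron if its average exceeds this
    """
    truth = np.asarray(truth, dtype=float)
    bmu = np.asarray(bmu)

    # Sum the truth vectors of all records that hit each neuron.
    sums = np.zeros((n_neurons, truth.shape[1]))
    np.add.at(sums, bmu, truth)

    # Number of records per neuron; empty neurons keep an average of 0.
    counts = np.bincount(bmu, minlength=n_neurons)[:, None]
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

    return means, means > threshold

# Toy data: 4 records, 2 groups, 2 neurons (illustrative values only).
truth = [[1, 0], [1, 0], [1, 1], [0, 1]]
bmu = [0, 0, 0, 1]
means, present = neuron_group_membership(truth, bmu, n_neurons=2, threshold=0.6)
```

In this toy case the third record belongs to both groups, so the second group reaches an average of only 1/3 in the first neuron and is not flagged there at a threshold of 0.6.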
However, because group membership is decided from thresholded averages, the initial subgroups and the final groups might not be exactly the same. In Figure 5, some data records belong to both the 1st and the 2nd group. If the group membership threshold for group 'Just 1st' is set too high (0.9), some data records are left outside the final group. If the threshold is too low (0.6), some additional data records, including representatives of 'Both 1st and 2nd', are included. To evaluate the differences, the record indices of the original data records in the categorical analysis groupings can be obtained and verified against the data records in the concluding groupings of the final picture.
Figure 5: An example of misclassifications.
To formulate these verifications as probabilities, all the data records indicating certain categorical behavior (group g) can be collected into sets, and these can be compared to the original categorical sets. Good estimates can be obtained with
where |A| denotes the number of items in set A.
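As a rough illustration of such a set comparison, the sketch below computes two overlap ratios from record indices: the share of the original categorical set that is recovered in the final group, and the share of the final group that comes from the original set. These two ratios, the function name `overlap_estimates`, and the example index sets are assumptions made for illustration; they are not necessarily the estimates defined above.

```python
def overlap_estimates(original, final):
    """Compare an original categorical set with a final SOM-derived group.

    original : record indices in the categorical grouping (first phase)
    final    : record indices in the concluding group (final picture)

    Returns (recovered, purity), where
      recovered = |original & final| / |original|
      purity    = |original & final| / |final|
    Both ratios are illustrative assumptions, not the source's formula.
    """
    original, final = set(original), set(final)
    common = original & final
    recovered = len(common) / len(original) if original else float("nan")
    purity = len(common) / len(final) if final else float("nan")
    return recovered, purity

# Hypothetical record indices for a group such as 'Just 1st'.
original = {1, 2, 3, 4, 5}

# Threshold too high: some original records are left outside the group.
high = overlap_estimates(original, {1, 2, 3})

# Threshold too low: extra records (here 6 and 7) are pulled in.
low = overlap_estimates(original, {1, 2, 3, 4, 5, 6, 7})
```

With a threshold that is too high, the first ratio drops below 1 while the second stays at 1; with a threshold that is too low, the opposite happens, which mirrors the two kinds of misclassification shown in Figure 5.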