
Group membership calculation

Ungrouped data records can be interpreted in relation to known groups by calculating group memberships, which are functions of the distance between a record and a group. The distance to a group can be calculated using either the closest member of the group (single linkage) or the center of the group (group-average linkage), where the center is taken as the mean of the group's records.

Figure 3: Different linkages for calculating the distance to a group: a) single and b) group-average. c) A problematic distribution for group-average linkage.

The group-average linkage calculation is computationally more efficient: O(n+m), compared with O(mn) for single linkage, where n is the number of data records and m the number of records in the group. Some data distributions, however, should not be evaluated with group-average linkage (see figure 3 c).
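
The two distance calculations can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a distance metric, so Euclidean distance is assumed here, and all function names are hypothetical. The group center is computed once in O(m) and then reused for all n records, whereas single linkage scans all m group members for every record.

```python
import math

def euclidean(a, b):
    # Assumed metric: the source text does not specify one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage_distance(record, group):
    # Distance to the closest member of the group: O(m) per record.
    return min(euclidean(record, member) for member in group)

def group_center(group):
    # Coordinate-wise mean of the group's records: O(m), done once.
    m = len(group)
    return tuple(sum(coords) / m for coords in zip(*group))

def group_average_distance(record, center):
    # Distance to the precomputed group center: O(1) per record.
    return euclidean(record, center)

# Hypothetical example data: one known group and two ungrouped records.
group = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
records = [(1.0, 1.0), (3.0, 0.0)]

center = group_center(group)
single = [single_linkage_distance(r, group) for r in records]    # O(mn) in total
average = [group_average_distance(r, center) for r in records]   # O(n) after the O(m) center
```

In this example the first record lies exactly at the group center, so its group-average distance is zero even though no group member is located there. For a ring-shaped or otherwise hollow group this is one way the two linkages can disagree.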

The actual group memberships can then be calculated as a function of this distance. For examples, see figure 4.

Figure 4: a) Group membership distributions of two groups in a two-dimensional case. b) Examples of group membership functions. Dashed lines indicate the identified group boundary; max marks the located maximum distance.
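
As a rough illustration of what a membership function of distance might look like, the sketch below defines two simple candidates. Both are assumptions made for the example and are not taken from the figure: a linear function that falls from 1 at distance 0 to 0 at a maximum distance, and a crisp function that is 1 inside a group boundary and 0 outside it. The names `linear_membership`, `crisp_membership`, `d_max` and `boundary` are hypothetical.

```python
def linear_membership(d, d_max):
    # Assumed shape: 1 at distance 0, falling linearly to 0 at d_max and beyond.
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    return max(0.0, 1.0 - d / d_max)

def crisp_membership(d, boundary):
    # Assumed shape: full membership inside the boundary, none outside.
    return 1.0 if d <= boundary else 0.0

# Hypothetical distances from three records to one group.
distances = [0.0, 1.0, 5.0]
linear = [linear_membership(d, d_max=4.0) for d in distances]
crisp = [crisp_membership(d, boundary=2.0) for d in distances]
```

Either function can be applied to a distance obtained with single or group-average linkage; the choice of shape and of the boundary or maximum distance depends on the data.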



Anssi Lensu
Tue Nov 3 11:38:53 EET 1998