
Group membership evaluation

This command evaluates group memberships. Some dissimilarity measures take the value 0 when a data record is very similar to another record (for example, a group representative) and a large value when the record is very different. Such measures can be converted into group memberships between 0.0 (very far from the group) and 1.0 (belongs to the group).

To evaluate these memberships, the maximum dissimilarity (or distance) for each original field is located. With normal distance calculation, group membership decreases globally across the whole data distribution. With the local mode of distance calculation, however, group membership decreases to 0.0 before the closest group boundary or average is reached.

Several membership functions can be used. The default is linear, but negative exponential functions of increasing order and a logistic sigmoid can also be used.
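The three families of membership function named above can be sketched in Python as follows. This is only an illustration: the exact formulas used by the command are not given in this section, so the function names, the `steepness` parameters and the scaling by `d_max` are all assumptions made for the sketch.

```python
import math

def linear_membership(d, d_max):
    """1.0 at d = 0, falling linearly to 0.0 at d = d_max (clamped below at 0.0)."""
    return max(0.0, 1.0 - d / d_max)

def neg_exp_membership(d, d_max, order=1, steepness=5.0):
    """Negative exponential of the given order.

    Assumed form: exp(-(steepness * d / d_max) ** order).
    order=2 gives a second-order (Gaussian-like) decay.
    """
    return math.exp(-((steepness * d / d_max) ** order))

def logsig_membership(d, d_max, steepness=10.0):
    """Decreasing logistic sigmoid, assumed to be centred at d_max / 2.

    It approaches, but never exactly reaches, 1.0 at d = 0 and 0.0 at d = d_max.
    """
    return 1.0 / (1.0 + math.exp(steepness * (d / d_max - 0.5)))
```

All three map a dissimilarity of 0 to a membership at or near 1.0 and decrease as the dissimilarity grows; they differ only in how quickly the membership falls off.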


The following figure illustrates a two-dimensional case with three groups. The distances have been calculated with the single linkage local model, and the group memberships have been evaluated with a second-order negative exponential function.

[Figure: two-dimensional case with three groups]


The command calculates group memberships from the dissimilarity (or distance) between a data vector and the selected groups. Group memberships are scaled between 1.0 and 0.0 and decrease according to a linear, negative exponential or logistic sigmoid function. The cumulative group membership of each data record can be limited to 1.0, scaled so that the sum is 1.0, or scaled by dividing by the largest sum found.
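The three ways of handling cumulative memberships can be sketched in Python as below. The function names are hypothetical, and the way a sum is "limited" to 1.0 (proportional rescaling of records whose sum exceeds 1.0) is an assumption; the section does not say how the command does it.

```python
def limit_sum(rows):
    """Rescale only those records whose memberships sum to more than 1.0."""
    out = []
    for row in rows:
        s = sum(row)
        out.append([m / s for m in row] if s > 1.0 else list(row))
    return out

def scale_to_unit_sum(rows):
    """Rescale every record so that its memberships sum to 1.0."""
    out = []
    for row in rows:
        s = sum(row)
        # A record with no membership in any group is left as it is.
        out.append([m / s for m in row] if s > 0.0 else list(row))
    return out

def scale_by_max_sum(rows):
    """Divide all memberships by the largest per-record sum found in the data."""
    max_sum = max(sum(row) for row in rows)
    if max_sum == 0.0:
        return [list(row) for row in rows]
    return [[m / max_sum for m in row] for row in rows]
```

Here each element of `rows` holds one data record's memberships in the selected groups, for example `[[0.9, 0.6], [0.2, 0.1]]` for two records and two groups.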

Example: The following commands create a frame containing the group memberships of a rule-based grouping. The memberships are based on Euclidean distances between the data vectors and the selected group centroids.

#
# Load two-dimensional test data containing random points
#  within the square with corners (0,0), (0,1), (1,0) and (1,1)
#
NDA> load testi.dat -n t
- field <x> (len 1)
- field <y> (len 1)
#
# Select two groups of data:
#  xL - x large and y small
#  yL - y large and x small
#
NDA> selcl -cout cldO -clout xL -expr 't.x' >= 0.7 and 't.y' <= 0.3;
NDA> selcl -cout cldO -clout yL -expr 't.y' >= 0.7 and 't.x' <= 0.3;
#
# Calculate distances from all data records to both group averages and
# limit the distances to the distance between the group averages
#
NDA> cldist -c cldO -d t -dout Dist -gavg -local
#
# Evaluate group memberships according to the distances
#
NDA> grpms -d Dist -dout GrpMs -logsig
NDA> ls -fr GrpMs
GrpMs.xL
GrpMs.yL

If the group memberships of the example, xL and yL, are used as the z-axis of the original two-dimensional data and the points are drawn in a window, the result is similar to the following figure. xL is drawn in red.

[Figure: group memberships xL and yL plotted as the z-axis over the two-dimensional data]
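For readers without access to the tool, the example above can be approximated in plain Python. This is a rough sketch, not a reimplementation of the commands: it assumes that the local limit is the distance between the two group averages, and it uses a linear membership function instead of the logistic sigmoid selected in the example. The function names are hypothetical.

```python
import math

def centroid(group):
    """Average point of a non-empty list of (x, y) pairs."""
    n = len(group)
    return (sum(x for x, _ in group) / n, sum(y for _, y in group) / n)

def euclid(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def group_memberships(points):
    """Return an (xL, yL) membership pair for every point."""
    # Rule-based groups, mirroring the two selection expressions of the example.
    x_large = [p for p in points if p[0] >= 0.7 and p[1] <= 0.3]
    y_large = [p for p in points if p[1] >= 0.7 and p[0] <= 0.3]
    c_x, c_y = centroid(x_large), centroid(y_large)
    # Assumed local limit: the distance between the two group averages.
    d_max = euclid(c_x, c_y)
    # Linear membership: 1.0 at the group average, 0.0 at d_max and beyond.
    return [
        tuple(max(0.0, 1.0 - euclid(p, c) / d_max) for c in (c_x, c_y))
        for p in points
    ]
```

A point halfway between the two group averages receives a membership of 0.5 in both groups, and any point farther than `d_max` from a group average receives 0.0 for that group.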



Erkki Hakkinen
Thu Sep 24 11:51:34 EET DST 1998