
Group membership evaluation

 

This command evaluates group memberships. Dissimilarity measures in which a value of 0 means that a data record is very similar to another record (for example, a group representative) and a large value means that the record is very different can be converted into group memberships between 0.0 (far from the group) and 1.0 (belongs to the group).

To evaluate these memberships, the maximum dissimilarity (or distance) is located for each original field, unless it has been specified on the command line. With normal distance calculation, group membership decreases globally across the whole data distribution. With the local mode of distance calculation, however, group membership decreases to 0.0 before the closest group boundary or average is reached. All dissimilarity values larger than the specified maximum result in a membership of 0.0.
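The default linear conversion for a single group (one field of the dissimilarity frame) can be sketched as follows. This is a minimal illustration under stated assumptions, not the grpms source: it assumes that membership falls linearly from 1.0 at zero dissimilarity to 0.0 at the maximum, and the function name `linear_memberships` is hypothetical. The local mode is a property of the distance calculation, so it is not modelled here.

```python
from typing import List, Optional, Sequence


def linear_memberships(dists: Sequence[float], dmax: Optional[float] = None) -> List[float]:
    """Convert one group's dissimilarities into memberships in [0.0, 1.0].

    Assumed behaviour (hypothetical sketch, not the grpms source):
    membership is 1.0 at d = 0, falls linearly to 0.0 at d = dmax,
    and stays 0.0 for every d >= dmax.
    """
    if dmax is None:
        # No maximum given (cf. -dmax): locate it from the data itself
        dmax = max(dists)
    if dmax <= 0.0:
        # Degenerate maximum: only exact matches count as members
        return [1.0 if d <= 0.0 else 0.0 for d in dists]
    return [max(0.0, 1.0 - d / dmax) for d in dists]


print(linear_memberships([0.0, 0.5, 1.0, 2.0]))            # [1.0, 0.75, 0.5, 0.0]
print(linear_memberships([0.0, 0.5, 1.0, 2.0], dmax=1.0))  # [1.0, 0.5, 0.0, 0.0]
```

With a specified maximum, every dissimilarity beyond it is clipped to a membership of 0.0, which matches the behaviour described above.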

Several membership functions can be used. The default is linear, but negative exponential functions of first, second, third or higher order and a logistic sigmoid can also be used.
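The exact formulas used by grpms are not preserved in this text. The sketch below therefore uses illustrative shapes only, all applied to a normalised dissimilarity x = d / dmax. These are assumptions: the negative exponential is taken as exp(-steep * x**order) and the sigmoid as 1 / (1 + exp(steep * (x - 0.5))), each rescaled so that it runs from exactly 1.0 at x = 0 to 0.0 at x = 1. The steepness parameter `steep` and the function name `membership` are hypothetical.

```python
import math


def membership(x: float, kind: str = "linear", order: int = 1, steep: float = 4.0) -> float:
    """Membership for a normalised dissimilarity x = d / dmax.

    Illustrative shapes only; the exact grpms formulas are not reproduced here:
      "linear"  1 - x
      "negexp"  exp(-steep * x**order), rescaled so f(0) = 1 and f(1) = 0
      "logsig"  1 / (1 + exp(steep * (x - 0.5))), rescaled the same way
    `steep` is a hypothetical steepness parameter. Any x >= 1 gives 0.0.
    """
    if kind not in ("linear", "negexp", "logsig"):
        raise ValueError(f"unknown membership function: {kind}")
    if x >= 1.0:
        return 0.0  # at or beyond the maximum dissimilarity
    x = max(x, 0.0)
    if kind == "linear":
        return 1.0 - x

    def f(t: float) -> float:
        if kind == "negexp":
            return math.exp(-steep * t ** order)
        return 1.0 / (1.0 + math.exp(steep * (t - 0.5)))

    # Rescale so the curve runs from exactly 1.0 at x = 0 to 0.0 at x = 1
    return (f(x) - f(1.0)) / (f(0.0) - f(1.0))


for x in (0.0, 0.25, 0.5, 0.75):
    print(x, membership(x), membership(x, "negexp", 2), membership(x, "logsig"))
```

Under these assumptions, a higher exponential order keeps memberships high for longer before they drop, while the sigmoid makes the sharpest transition around the middle of the range.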


The following figure illustrates a two-dimensional case with three groups. The distances have been calculated with the single linkage local model, and group averages have been evaluated with a second-order negative exponential function.

[Figure: two-dimensional case with three groups]

grpms Evaluate group memberships according to dissimilarity
-d <datain> dissimilarity measures for each group
-dout <dataout> calculated group memberships
[-dmax <maxfld>] name of field containing specified maximum dissimilarity
[-npow <exp>] decrease according to an <exp>-order negative exponential function (default linear)
[-logsig] decrease according to a logistic sigmoid
[-scale] divide all memberships by the located maximum cumulative membership
[-scale1] scale all cumulative memberships to 1.0
[-limit1] limit cumulative memberships to 1.0
[-null] set other group memberships to 0.0 if one is 1.0; works only with -limit1

The command calculates group memberships from the dissimilarity (or distance) between a data vector and the selected groups. Group memberships are scaled between 1.0 and 0.0 and decrease according to a linear, negative exponential or logistic sigmoid function. The maximum dissimilarity is located separately for each group, unless it has been specified. The cumulative group memberships of each data record can be limited to 1.0, scaled to sum to 1.0, or scaled by dividing by the located maximum sum.
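The three cumulative options can be sketched on a table with one row per data record and one column per group. This is a hypothetical illustration, not the grpms implementation. Two points are assumptions: -limit1 is taken to shrink a row proportionally when its sum exceeds 1.0, and -null is taken to keep the first membership equal to 1.0 and zero the others before any limiting. The function name `cumulative` is also hypothetical.

```python
from typing import List

Rows = List[List[float]]  # one row per data record, one column per group


def cumulative(rows: Rows, mode: str, null: bool = False) -> Rows:
    """Hypothetical sketch of -scale, -scale1 and -limit1 (optionally with -null)."""
    if mode not in ("scale", "scale1", "limit1"):
        raise ValueError(f"unknown mode: {mode}")
    if null and mode != "limit1":
        raise ValueError("-null works only with -limit1")
    if mode == "scale":
        # Divide every membership by the largest per-record sum
        top = max((sum(r) for r in rows), default=0.0)
        return [[m / top for m in r] if top > 0.0 else list(r) for r in rows]
    out: Rows = []
    for r in rows:
        if null and 1.0 in r:
            # Assumed: keep the first full membership and zero all others
            i = r.index(1.0)
            out.append([1.0 if j == i else 0.0 for j in range(len(r))])
            continue
        total = sum(r)
        if mode == "scale1" and total > 0.0:
            out.append([m / total for m in r])  # row now sums to 1.0
        elif mode == "limit1" and total > 1.0:
            out.append([m / total for m in r])  # assumed: proportional shrink to 1.0
        else:
            out.append(list(r))  # sum already within the limit
    return out


rows = [[0.5, 0.25], [1.0, 0.5], [0.2, 0.1]]
for mode in ("scale", "scale1", "limit1"):
    print(mode, cumulative(rows, mode))
print("limit1 + null", cumulative(rows, "limit1", null=True))
```

The design difference is that -scale uses one divisor for the whole frame, which preserves the relative memberships between records, while -scale1 and -limit1 act on each record independently.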

Example: The following commands create a frame containing the group memberships of a rule-based grouping. The memberships are based on Euclidean distances between the data vectors and the selected group centroids.

# Load two-dimensional test data containing random points
#  within the square (0,0), (0,1), (1,0) and (1,1)
NDA> load testi.dat -n t
- field <x> (len 1)
- field <y> (len 1)
# Select groups: xL - x large & y small, yL - y large & x small
NDA> selcl -cout cldO -clout xL -expr 't.x'>=0.7 and 't.y'<=0.3;
NDA> selcl -cout cldO -clout yL -expr 't.y'>=0.7 and 't.x'<=0.3;
# Calc. distances from data records to group averages, local mode
NDA> cldist -c cldO -d t -dout Dist -gavg -local
# Evaluate group memberships according to the distances
NDA> grpms -d Dist -dout GrpMs -logsig
NDA> ls -fr GrpMs
 GrpMs.xL
 GrpMs.yL
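The same pipeline can be mirrored roughly in Python to show what the commands compute. This is a hypothetical sketch, not NDA code. It generates its own random points instead of reading testi.dat, uses plain (global) Euclidean distances to each group centroid instead of `cldist -local`, and uses the default linear decrease instead of `-logsig`. The function name `example_memberships` is illustrative.

```python
import math
import random


def example_memberships(n: int = 200, seed: int = 0):
    """Rough Python analogue of the NDA session above (hypothetical sketch).

    Differences: distances are plain (global) Euclidean distances to each
    group centroid rather than cldist -local, and memberships use the
    default linear decrease instead of -logsig.
    """
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]  # unit square
    rules = {
        "xL": lambda p: p[0] >= 0.7 and p[1] <= 0.3,  # x large & y small
        "yL": lambda p: p[1] >= 0.7 and p[0] <= 0.3,  # y large & x small
    }
    grpms = {}
    for name, rule in rules.items():
        members = [p for p in pts if rule(p)]
        if not members:
            raise ValueError(f"group {name} selected no records")
        centroid = (sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members))
        dist = [math.dist(p, centroid) for p in pts]
        dmax = max(dist)  # located maximum, as when -dmax is not given
        grpms[name] = [1.0 - d / dmax for d in dist]
    return pts, grpms


pts, grpms = example_memberships()
print(sorted(grpms))  # ['xL', 'yL'], cf. "ls -fr GrpMs"
```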

If the group memberships of the example, xL and yL, are used as the z-axis of the original two-dimensional data and the points are drawn into a window, the result is similar to the following figure, in which xL is shown in red.

[Figure: memberships xL (red) and yL used as the z-axis over the two-dimensional test data]



Anssi Lensu
Thu May 17 15:00:44 EET DST 2001