Set operations for classes


The set operations described here compare either single classes or complete classifications that use the same class names.

[Table: syntax of the three class-level set operation commands, including cldiff]

These three commands apply the usual set operations (intersection, union and difference) to two classes of classified data. The intersection contains the data record indices that belong to both classes, the union contains every index that belongs to at least one of the two classes, and the difference contains the indices that belong to just one of the classes but not to both.

Example: For a more thorough example, see the example of classification comparisons. The following basic example uses a small data set.

...
#
# Data file with 8 numbers (1, 2, 3, 5, 8, 6, 7 and 4)
#
NDA> getdata koje
x
1
1
2
3
5
8
6
7
4
#
# Select two classes containing the indices of values larger than or
# equal to 3 (first) and of values between 5 and 7 (second)
#
NDA> selcl -cout cli -clout first -expr 'koje.x' >= 3;
NDA> selcl -cout cli -clout second -expr 'koje.x' >= 5 and 'koje.x' <= 7;
#
# Evaluate difference of classes
#
NDA> cldiff -cl1 cli.first -cl2 cli.second -clout cli.res
NDA> save cli
...
#
# This results in cli.cld containing the following data lines
#
...
class_info
201 1 first 6 2 3 4 5 6 7 
201 1 second 3 3 5 6 
201 1 res 3 7 4 2 
#
# Class first:  6 indices (2, 3, 4, 5, 6, 7)
# Class second: 3 indices (3, 5, 6)
# Difference:   3 indices (2, 4, 7)
#
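
The same result can be checked outside NDA. The following is a minimal Python sketch, not NDA code: it rebuilds the two classes from the eight values listed above and applies the three set operations. The helper name select_class is hypothetical. The sketch assumes that "difference" means the indices found in exactly one class, as the description above words it. In this data set the second class is a subset of the first, so a plain difference (first minus second) gives the same indices.

```python
# Values of the field koje.x from the listing above. Indices are 0-based,
# which matches the indices stored in cli.cld.
values = [1, 2, 3, 5, 8, 6, 7, 4]


def select_class(data, predicate):
    """Return the set of record indices whose value satisfies predicate.

    Hypothetical helper that mimics the role of selcl in the listing.
    """
    return {index for index, value in enumerate(data) if predicate(value)}


first = select_class(values, lambda v: v >= 3)         # values >= 3
second = select_class(values, lambda v: 5 <= v <= 7)   # values between 5 and 7

intersection = first & second   # indices in both classes
union = first | second          # indices in at least one class
# Assumption: indices in exactly one class (symmetric difference),
# following the wording of the text.
difference = first ^ second

print("first:       ", sorted(first))         # [2, 3, 4, 5, 6, 7]
print("second:      ", sorted(second))        # [3, 5, 6]
print("intersection:", sorted(intersection))  # [3, 5, 6]
print("union:       ", sorted(union))         # [2, 3, 4, 5, 6, 7]
print("difference:  ", sorted(difference))    # [2, 4, 7]
```

The difference holds the same three indices (2, 4 and 7) as the third class line of cli.cld above.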

[Table: syntax of the classification-level set operation commands, including cldinsec and cldunion]

These commands perform class-by-class set operations on entire classifications. Each class name in classified data 1 is compared with the class names in classified data 2. When a match is found, a new class with the same name is created in the target classified data; it contains the intersection, union or difference of the two matching classes.
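
The name matching can be sketched in Python as follows. This is an illustration only, not the NDA implementation: a classification is modelled as a dictionary from class name to a set of record indices, the function name combine_classifications and the toy data are hypothetical, and the sketch assumes that classes without a matching name are simply left out of the result, which the text above does not state.

```python
import operator


def combine_classifications(cld1, cld2, op):
    """Apply the set operation op to every pair of classes that share a name.

    cld1, cld2: dict mapping class name -> set of record indices.
    Assumption: classes present in only one classification are skipped.
    """
    return {
        name: op(indices, cld2[name])
        for name, indices in cld1.items()
        if name in cld2
    }


# Hypothetical toy classifications, invented for this sketch.
cld1 = {"GP": {0, 1, 2}, "GN": {3, 4}, "only_in_1": {9}}
cld2 = {"GP": {1, 2, 5}, "GN": {4}, "only_in_2": {7}}

insec = combine_classifications(cld1, cld2, operator.and_)  # intersection
union = combine_classifications(cld1, cld2, operator.or_)   # union

print(insec)  # classes GP and GN only
print(union)
```

Passing the operation as an argument keeps a single matching loop for all three operations; operator.xor or operator.sub could be passed for the difference, depending on how it is defined.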

Example: Union and intersection can be used to compare the classification results of two different layers of a TS-SOM.

...
#
# Data loaded into d - Selection of fields, training of a TS-SOM
#
NDA> select datas -f d.GPos d.GNeg d.TPos d.TNeg
NDA> somtr -d datas -sout som -cout cldata -l 4
train layer: 0
train layer: 1
train layer: 2
train layer: 3
#
# Average value calculation for each neuron
#
NDA> clstat -c cldata -d datas -dout st -avg
NDA> ls -fr st
st.GPos_avg
st.GNeg_avg
st.TPos_avg
st.TNeg_avg
#
# TS-SOM layer information is needed for selection of data records
#
NDA> somlayer -s som -fout soml
NDA> selcl -cout grp2 -clout GP -expr 'soml' = 2 and 'st.GPos_avg' >= 0.7;
NDA> selcl -cout grp2 -clout GN -expr 'soml' = 2 and 'st.GNeg_avg' >= 0.7;
NDA> selcl -cout grp2 -clout TP -expr 'soml' = 2 and 'st.TPos_avg' >= 0.7;
NDA> selcl -cout grp2 -clout TN -expr 'soml' = 2 and 'st.TNeg_avg' >= 0.7;
NDA> selcl -cout grp3 -clout GP -expr 'soml' = 3 and 'st.GPos_avg' >= 0.7;
NDA> selcl -cout grp3 -clout GN -expr 'soml' = 3 and 'st.GNeg_avg' >= 0.7;
NDA> selcl -cout grp3 -clout TP -expr 'soml' = 3 and 'st.TPos_avg' >= 0.7;
NDA> selcl -cout grp3 -clout TN -expr 'soml' = 3 and 'st.TNeg_avg' >= 0.7;
#
# Selected groups of records need to be converted to classified data
#
NDA> mergecld -c1 grp2 -c2 cldata -cout clo2
NDA> mergecld -c1 grp3 -c2 cldata -cout clo3
#
# Perform intersection and union of indices in classifications
#
NDA> cldinsec -c1 clo2 -c2 clo3 -cout clo_insec
NDA> cldunion -c1 clo2 -c2 clo3 -cout clo_union
#
# Calculate the index count of each class and collect the counts into one data set
#
NDA> clhits -c clo_insec -fout ins
NDA> clhits -c clo_union -fout uni
NDA> select clprop -f ins uni
NDA> ls -fr clprop
clprop.ins
clprop.uni
#
# Calculate the ratio of the set sizes (intersection / union)
# for each class (GP, GN, TP and TN)
#
NDA> expr -dout props -fout prop -expr 'clprop.ins' / 'clprop.uni';
NDA> getdata props
prop
2
0.870521
0.940236
0.810526
0.972215
...
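
The ratio computed in the last step (intersection size divided by union size) is commonly known as the Jaccard index: it is 1 when two classes contain exactly the same indices and 0 when they share none. The Python sketch below shows the calculation on hypothetical toy data; the function name overlap_ratio and the index sets are invented for illustration and are not taken from the TS-SOM run above.

```python
def overlap_ratio(cld1, cld2):
    """Return |A & B| / |A | B| for every class name found in both inputs.

    cld1, cld2: dict mapping class name -> set of record indices.
    Classes whose union is empty are skipped to avoid division by zero.
    """
    ratios = {}
    for name in cld1.keys() & cld2.keys():
        union = cld1[name] | cld2[name]
        if union:
            ratios[name] = len(cld1[name] & cld2[name]) / len(union)
    return ratios


# Hypothetical classes for two layers, invented for this sketch.
layer2 = {"GP": {0, 1, 2, 3}, "TN": {4, 5}}
layer3 = {"GP": {1, 2, 3}, "TN": {4, 5}}

ratios = overlap_ratio(layer2, layer3)
for name in sorted(ratios):
    print(name, ratios[name])  # GP 0.75, then TN 1.0
```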

Erkki Hakkinen
Thu Sep 24 11:51:34 EET DST 1998