
Class statistics

 

Operations for computing class statistics follow the same principles as those for field statistics. An operation takes classified data as a parameter, collects all the data records pointed to by each class, and computes statistical values for them. The result is a data frame in which each class has one data record and each statistical value, computed over all the classes, is stored in its own field. The principle of these operations is illustrated in the following figure:

[Figure: principle of computing class statistics]
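The principle can also be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NDA implementation: the function name `class_stats`, the dictionary layout and the toy values are hypothetical, and of the field-name suffixes only `_avg` appears in the manual's own output listings (`_min` and `_max` are guesses).

```python
from statistics import mean

def class_stats(source, classes):
    """Compute one result record per class.

    source  -- dict: field name -> list of values, one per data record
    classes -- dict: class name -> list of record indices in that class
    """
    result = []
    for name, indices in classes.items():
        # One record per class; each statistic gets its own field.
        row = {"name": name, "hits": len(indices)}
        for field, column in source.items():
            values = [column[i] for i in indices]
            if values:  # an empty class gets no statistics
                row[field + "_avg"] = mean(values)
                row[field + "_min"] = min(values)  # suffix is an assumption
                row[field + "_max"] = max(values)  # suffix is an assumption
        result.append(row)
    return result

# Toy data: four records with two fields, split into two classes.
source = {"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]}
classes = {"a": [0, 1], "b": [2, 3]}
table = class_stats(source, classes)
```

Each entry of `table` corresponds to one class, and each key other than `name` corresponds to one field of the resulting data frame.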

clstat              Macro command for computing statistics for classes
 -c <cldata>        classified data for defining classes
 -d <srcdata>       source data frame
 -dout <trgdata>    target data for statistics in classes
 [-all]             compute all the statistics (other flags)
 [-sum]             compute sums
 [-avg]             compute averages
 [-med]             compute medians
 [-var]             compute variances
 [-adev]            compute average [symbol missing] standard deviation
 [-min]             find minimum values
 [-max]             find maximum values
 [-quar]            compute the first and the third quartiles (25%, 75%)
 [-hits]            create a variable for the number of items in classes
 [-name]            the name of the class as a string
 [-id]              the identifier of the class as an integer
 [-md <value>]      missing value to be skipped

This macro command computes statistics for each class of the classified data <cldata>, using the data records of <srcdata>, and stores the results in <trgdata>. The procedures called by this command are documented in their own sections below.
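As a rough illustration of what skipping a missing value means for the statistics of one class, the following Python sketch drops a sentinel value before computing the average, median, minimum and maximum. The helper name `skip_missing`, the sentinel -9999.0 and the sample values are all hypothetical. How NDA itself compares values against the one given with -md is not described here.

```python
from statistics import mean, median

def skip_missing(values, md=None):
    """Return the values with every occurrence of the missing-value marker removed."""
    if md is None:
        return list(values)
    return [v for v in values if v != md]

# Values of one field for the records of a single class;
# -9999.0 plays the role of the missing value (an assumption).
values = [1.0, -9999.0, 3.0, 5.0]
kept = skip_missing(values, md=-9999.0)

summary = {
    "avg": mean(kept),
    "med": median(kept),
    "min": min(kept),
    "max": max(kept),
}
```

Without the filtering step, the sentinel would be treated as an ordinary value and would distort every statistic except the median of a large class.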

Example (ex4.9): In the first example, the Boston data is classified according to the unique values of the field chas, which has two possible values: the apartment is located near the river (1) or not (0). The average values of the other fields are then computed for each class, together with the number of items in each class.

...
NDA> select key1 -f boston.chas
NDA> uniq -d key1 -cout cld1
NDA> clstat -d boston -c cld1 -dout sta -avg -hits
NDA> ls -fr sta -f
 hits
 crim_avg
 zn_avg
...
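The same steps (pick a key field, form one class per unique value, then compute averages and hit counts) can be mimicked in plain Python. This is a sketch only: the function `uniq` below is a hypothetical stand-in for the NDA command of the same name, and the records are made-up toy values, not the real Boston data.

```python
from collections import defaultdict
from statistics import mean

# Made-up toy records; only the field names follow the example above.
data = {
    "chas": [0, 1, 0, 0, 1],
    "crim": [0.1, 0.5, 0.3, 0.2, 0.7],
    "zn": [18.0, 0.0, 0.0, 12.0, 0.0],
}

def uniq(key_values):
    """Group record indices by the unique values of the key field."""
    classes = defaultdict(list)
    for index, value in enumerate(key_values):
        classes[value].append(index)
    return dict(classes)

classes = uniq(data["chas"])

# One result record per class: the hit count plus the average of each other field.
sta = {}
for label, indices in classes.items():
    row = {"hits": len(indices)}
    for field in ("crim", "zn"):
        row[field + "_avg"] = mean(data[field][i] for i in indices)
    sta[label] = row
```

Here `sta[0]` and `sta[1]` play the role of the two data records of the frame sta, one for each value of chas.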

Example (ex4.10): The second example demonstrates computing statistics for SOM neurons. One advantage of using statistical values instead of the neuron weights is that they have clear interpretations, such as the mean, minimum and maximum of the data records assigned to each neuron.

...
NDA> somtr -d predata -sout som1 -l 4
...
NDA> somcl -d predata -s som1 -cout cld1
NDA> clstat -d boston -c cld1 -dout sta -all
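The example classifies the preprocessed data (predata) with the SOM but takes the statistics from the original data (boston), so the results stay in the original units. The Python sketch below mirrors that idea under several assumptions: the min-max scaling, the two neuron weight vectors and the nearest-neuron rule (squared Euclidean distance) are illustrative choices, not a description of how somtr or somcl work internally.

```python
from statistics import mean

# Original records (made-up values) with two fields in different units.
original = [[2.0, 100.0], [4.0, 300.0], [10.0, 900.0], [12.0, 700.0]]

def scale(records):
    """Hypothetical preprocessing: scale every field to the range [0, 1]."""
    columns = list(zip(*records))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in records]

def nearest(record, weights):
    """Index of the neuron whose weight vector is closest to the record."""
    dists = [sum((a - b) ** 2 for a, b in zip(record, w)) for w in weights]
    return dists.index(min(dists))

pre = scale(original)
weights = [[0.1, 0.1], [0.9, 0.9]]  # made-up weights of two trained neurons

# Classification: each neuron becomes a class holding record indices.
classes = {n: [] for n in range(len(weights))}
for i, record in enumerate(pre):
    classes[nearest(record, weights)].append(i)

# Statistics per neuron, computed from the original (unscaled) records.
stats = {}
for n, indices in classes.items():
    columns = list(zip(*(original[i] for i in indices)))
    stats[n] = {
        "hits": len(indices),
        "avg": [mean(c) for c in columns],
        "min": [min(c) for c in columns],
        "max": [max(c) for c in columns],
    }
```

Unlike the weight vectors, which live in the scaled space, the values in `stats` can be read directly as means, minima and maxima of the records assigned to each neuron.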





Anssi Lensu
Thu May 17 15:00:44 EET DST 2001