
Treatment for missing data

  


The operation replaces every missing value in the source data with the corresponding value from the nearest data record. The nearest data record is determined with a normalized distance computed according to the following formula:

displaymath5312

displaymath5313

where M is the number of components whose value is missing in r(a) or in r(b), and N is the number of data fields. The distance computation takes the missing values into account and scales the distance so that pairs of data records with more missing values get larger distances.
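A minimal Python sketch of nearest-record imputation follows. It assumes the scaled Euclidean form d(a, b) = sqrt(N / (N - M) * sum of (r_i(a) - r_i(b))^2), where the sum runs over the components present in both records, N is the number of data fields and M is the number of components missing in either record. This form behaves as described above (more missing values give a larger distance), but it is an assumption and not necessarily the exact formula of the operation. The sketch also assumes that missing values are represented by None and that, when the nearest record lacks the needed field as well, the next-nearest record that has a value is used. The function names `scaled_distance` and `impute_nearest` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Record = List[Optional[float]]  # None marks a missing value


def scaled_distance(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Euclidean distance over the components present in both records,
    scaled by N / (N - M), where M counts components missing in a or b.
    Assumed form; the operation's exact formula may differ."""
    n = len(a)
    sq_sum = 0.0
    m = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            m += 1  # component cannot be compared
        else:
            sq_sum += (x - y) ** 2
    if m == n:
        return math.inf  # no comparable components at all
    return math.sqrt(n / (n - m) * sq_sum)


def impute_nearest(data: List[Record]) -> List[Record]:
    """Fill each missing value from the nearest other record that has a value
    in that field. Distances are computed on the original, unfilled data."""
    result = [list(r) for r in data]
    for i, rec in enumerate(data):
        missing = [k for k, v in enumerate(rec) if v is None]
        if not missing:
            continue
        # Other records ordered from nearest to farthest.
        neighbours = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: scaled_distance(rec, data[j]),
        )
        for k in missing:
            for j in neighbours:
                if data[j][k] is not None:
                    result[i][k] = data[j][k]
                    break
    return result
```

For example, with the records [1.0, 2.0, 3.0], [1.1, None, 3.1] and [9.0, 8.0, 7.0], the second record is nearest to the first, so its missing field is filled with 2.0.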

The operation msdbycen replaces missing values with the values of the prototypes given in a separate data frame. The classified data must specify which data record is associated with which prototype. See also the command somcl (Section 5.1.2) with the missing data flag.
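The idea behind msdbycen can be sketched in Python as below. The actual interface of msdbycen (how the prototype frame and the classified data are passed) is not described here, so the function name `fill_from_prototypes`, the use of None for missing values, and the representation of the classification as one integer prototype index per record are all hypothetical.

```python
from typing import List, Optional, Sequence

Record = List[Optional[float]]  # None marks a missing value


def fill_from_prototypes(
    data: Sequence[Sequence[Optional[float]]],
    prototypes: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> List[Record]:
    """Replace each missing value with the corresponding component of the
    prototype the record is classified to. labels[i] is the (hypothetical)
    index of the prototype assigned to data[i]."""
    if len(labels) != len(data):
        raise ValueError("need exactly one prototype label per data record")
    filled: List[Record] = []
    for rec, lab in zip(data, labels):
        proto = prototypes[lab]
        # Keep present values; take missing ones from the assigned prototype.
        filled.append([proto[k] if v is None else v for k, v in enumerate(rec)])
    return filled
```

For example, the data [[1.0, None], [None, 4.0]] with prototypes [[0.5, 0.5], [9.0, 9.0]] and labels [0, 1] gives [[1.0, 0.5], [9.0, 4.0]].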



Erkki Hakkinen
Thu Sep 24 11:51:34 EET DST 1998