
Data Preprocessing Requirements

Because the Best Matching Unit (BMU) search used in Self-Organizing Maps is based on a distance measure between the input vector and the weight vectors, the ranges of all the variables should be roughly equal. If one of the original variables dominates the others, the resulting network usually shows only the variation of that particular variable. Therefore, the variable values should be scaled to a common range. Binary data sometimes gives the best results with Self-Organizing Maps, but other normalized data vectors are well suited for training, too.
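As an illustration of scaling to a common range, the following is a minimal sketch of min-max scaling to [0, 1]. It is not taken from the original analysis software: the function name scale_columns, the choice of the range [0, 1], and the use of None to mark a missing field are all assumptions made for this example.

```python
def scale_columns(rows):
    """Min-max scale each column to [0, 1]; None marks a missing field.

    Hypothetical helper for illustration only.
    """
    n_cols = len(rows[0])
    scaled = [list(r) for r in rows]  # copy so the input is left untouched
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        if not present:
            continue  # column never observed; its entries stay None
        lo, hi = min(present), max(present)
        span = hi - lo
        for r in scaled:
            if r[j] is not None:
                # A constant column carries no information; map it to 0.
                r[j] = (r[j] - lo) / span if span > 0 else 0.0
    return scaled


# Two variables with very different ranges, one value missing.
data = [[0.0, 1000.0], [1.0, 3000.0], [0.5, None]]
scaled = scale_columns(data)
```

After scaling, both variables span the same interval, so neither dominates the distance computation in the BMU search.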

All of the steps of SOM analysis (normalization, BMU search, weight update, calculation of statistics, etc.) can be performed with certain variables left out, for example missing data fields. This changes the learning process slightly, but in most cases it should not affect the results much. However, omitting fields may leave a few neurons in which some of the weights hold an arbitrary value, namely the value assigned when the neural net was initialized. This can happen if none of the data vectors indicating similar behavior contain a certain variable, as shown in figure 4.

Figure 4: The consequences of missing data. Each rectangle represents one neuron.

In this case the neuron marked with dark gray would get sensible values for all fields except the one marked with black. The neurons marked with light gray receive information from their neighboring neurons as part of the learning process, so the missing field values do not result in arbitrary weight values in these neurons. This approach to the problem of missing field values needs a more thorough study.
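The handling of missing fields described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the analysis: the function names, the squared Euclidean distance, the one-dimensional map, the fixed learning rate alpha, and the simple fixed-radius neighborhood are all hypothetical choices, and None marks a missing field.

```python
def masked_distance(x, w):
    """Squared Euclidean distance over the fields present in x (None = missing)."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w) if xi is not None)


def find_bmu(x, weights):
    """Index of the neuron closest to x, comparing observed fields only."""
    return min(range(len(weights)), key=lambda i: masked_distance(x, weights[i]))


def update(x, weights, bmu, alpha=0.5, radius=1):
    """Move the BMU and its neighbors on a 1-D map toward x.

    Weights of fields that are missing in x are left unchanged.
    """
    first = max(0, bmu - radius)
    last = min(len(weights), bmu + radius + 1)
    for i in range(first, last):
        for j, xj in enumerate(x):
            if xj is not None:
                weights[i][j] += alpha * (xj - weights[i][j])


# Two neurons with two weights each; the second field of x is missing.
weights = [[0.0, 0.9], [1.0, 0.1]]
x = [0.9, None]
bmu = find_bmu(x, weights)
update(x, weights, bmu, radius=0)
```

In this sketch the second weight of the BMU is never touched by x, so if no other data vector supplies that field, the weight keeps its initialization value. This corresponds to the situation of the field marked with black in figure 4.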



Anssi Lensu
Tue Nov 3 12:42:42 EET 1998