
The Self-Organizing Map

There are many methods for reducing the dimensionality of data. The Self-Organizing Map (SOM) works both as a projection method and as a clustering algorithm. High-dimensional data is mapped to a low-dimensional space, typically one or two dimensions. In clustering, similar data vectors are mapped to nearby neurons, so the topology of the original data is preserved. Both properties are very useful in data analysis.
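This section does not spell out the training procedure. As a minimal sketch, assuming the standard online (sequential) Kohonen update with a Gaussian neighborhood and a learning rate and radius that both decay linearly, a one-dimensional SOM (a chain of neurons) can be trained as follows. The function names `train_som_1d` and `best_matching_unit`, and all parameter values, are hypothetical illustration choices, not the settings used in this analysis.

```python
import math
import random


def best_matching_unit(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i], x)),
    )


def train_som_1d(data, n_neurons=10, n_iter=3000, lr0=0.5, seed=0):
    """Online training of a hypothetical 1-D SOM; returns the prototype vectors."""
    rng = random.Random(seed)
    # Initialize each prototype as a copy of a randomly chosen data vector.
    prototypes = [list(rng.choice(data)) for _ in range(n_neurons)]
    radius0 = n_neurons / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # learning rate decays toward 0
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floor 0.5
        x = rng.choice(data)
        winner = best_matching_unit(prototypes, x)
        for i, p in enumerate(prototypes):
            # Neighborhood is measured along the chain of neurons, not in data space.
            h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
            for d in range(len(p)):
                p[d] += lr * h * (x[d] - p[d])
    return prototypes


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy data: two well-separated 2-D clusters.
    data = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(100)]
    data += [(rng.gauss(10, 0.1), rng.gauss(10, 0.1)) for _ in range(100)]
    protos = train_som_1d(data)
    print(best_matching_unit(protos, (0.0, 0.0)), best_matching_unit(protos, (10.0, 10.0)))
```

Because the neighborhood function depends on the distance between neuron indices, a winning neuron drags its chain neighbors toward the same input, which is what orders the prototypes by similarity.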

The number of neurons in a SOM needs to be large enough that each larger group in the data is represented by a few neurons. These small sets of neurons provide a subsymbolic representation of the data groups. During training, these small sets of neurons form automatically and become the prototypes, which are ordered according to their similarity. Figure 3 compares how the SOM and the first principal component reduce two-dimensional data to one dimension.

Figure 3: Dimensionality reduction in a two-dimensional case. If only the first principal component is used, no clear decision boundaries can be found, and reducing the dimensionality does not improve the analysis. The Self-Organizing Map provides a "curve" onto which the data points can be projected so that the groups are easy to separate. In this sense the SOM behaves similarly to principal curves [6, 7].
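For comparison, the first-principal-component baseline can be sketched for two-dimensional data using the closed-form leading eigenvector of the 2x2 covariance matrix. The names `first_principal_component` and `project_on_pc` and the toy data are hypothetical; the toy data is not the data of Figure 3 and only illustrates how two distinct groups can collapse onto the same coordinate.

```python
import math


def first_principal_component(points):
    """Mean and unit leading eigenvector of the 2x2 covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n          # var(x)
    c = sum((p[1] - my) ** 2 for p in points) / n          # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n  # cov(x, y)
    # Largest eigenvalue of [[a, b], [b, c]].
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = lam - c, b  # from the second row: b*vx + (c - lam)*vy = 0
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm)


def project_on_pc(points):
    """One-dimensional coordinate of each point along the first principal component."""
    (mx, my), (vx, vy) = first_principal_component(points)
    return [(x - mx) * vx + (y - my) * vy for x, y in points]


if __name__ == "__main__":
    # Hypothetical toy data: a horizontal bar plus one point above and one below it.
    bar = [(float(x), 0.0) for x in range(-5, 6)]
    proj = project_on_pc(bar + [(0.0, 1.0), (0.0, -1.0)])
    print(proj[-2], proj[-1])  # the two off-bar points get the same coordinate
```

The principal axis here is horizontal, so the points above and below the bar become indistinguishable after projection; a curve that follows the data, such as the one a SOM chain provides, does not have to discard that distinction.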

After the neurons are trained, each data vector can be assigned to one of the neurons, the one whose prototype is closest to it. Once the original data has been grouped in this way, statistics of the training variables (and also of the background variables) can be calculated and visualized for each group.
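Assigning each data vector to its nearest prototype and summarizing the resulting groups can be sketched as below. This assumes the prototypes are already trained and that the background variable is a single numeric column aligned row by row with the training vectors; `per_neuron_statistics` and its return layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def best_matching_unit(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i], x)),
    )


def per_neuron_statistics(prototypes, training_vectors, background_values):
    """Group rows by best-matching neuron and summarize each group."""
    groups = defaultdict(list)
    for x, bg in zip(training_vectors, background_values):
        groups[best_matching_unit(prototypes, x)].append((x, bg))
    stats = {}
    for neuron, rows in sorted(groups.items()):
        vectors = [x for x, _ in rows]
        stats[neuron] = {
            "count": len(rows),
            # Column-wise mean of the training variables in this group.
            "mean_vector": tuple(mean(col) for col in zip(*vectors)),
            "mean_background": mean(bg for _, bg in rows),
        }
    return stats


if __name__ == "__main__":
    protos = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical trained prototypes
    train = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (10.0, 9.0), (11.0, 11.0)]
    background = [2.0, 4.0, 1.0, 2.0, 3.0]
    for neuron, s in per_neuron_statistics(protos, train, background).items():
        print(neuron, s)
```

Neurons that win no data vectors are simply absent from the result; a visualization would typically show them as empty cells.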



Anssi Lensu
Tue Nov 3 12:42:42 EET 1998