
Training a TS-SOM

  

somtr Train a SOM
-d <data> name of the training data frame
-sout <som> name of TS-SOM structure to be created
[-cout <cldata>] SOM classification to be created
[-top] create the root class
[-w <wmat>] specify a different name for the weight matrix (default <som>_W)
[-l <layers>] the number of layers (default 3)
[-md <missing-data>] a code value to indicate missing data
[-D <dimension>] dimension of the TS-SOM (default 2)
[-t <type>] type of topology (default 0)
[-wght <weighting>] weighting of neighbors (default 0.5)
[-c <stop-crit>] stopping criteria (default 0.001)
[-m <max-iter>] maximum number of iterations (default 20)
[-L] do not use a lookup table (lookup table is used by default)
[-f <corr-layers>] number of corrected layers (default 3)
[-r <train-rule>] training rule (default 0)
partr Train a SOM using several threads (UNIX/Linux)
In addition to somtr:
[-np <threads>] number of threads to use (default 2)

SOM training creates a TS-SOM structure and organizes it. The result includes the structure of the network, stored under the given name <som>. In addition, the weight matrix of the TS-SOM is stored in a data frame named <som>_W, as described in the figure above. If -cout <cldata> is specified, somtr also creates a classified data frame containing the BMU classification of each data record.
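
For intuition, the BMU classification assigns every data record to its best-matching unit, i.e. the neuron whose weight vector lies nearest to the record. The following Python sketch illustrates only the idea; the names bmu_classify, data and weights are hypothetical and not part of NDA:

import numpy as np

def bmu_classify(data, weights):
    # For each record, return the index of its best-matching unit:
    # the neuron whose weight vector is nearest in Euclidean distance.
    distances = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

# Illustrative use: 100 records with 5 features, a 4x4 layer of 16 neurons.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
weights = rng.normal(size=(16, 5))
classes = bmu_classify(data, weights)   # one neuron index per record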

The somtr command has many parameters, of which the first three are the most commonly used. If you are unsure about the use of the other parameters, you will probably get the best results with their default values. A few of the parameters are described in more detail here:

-d <data>
Data frame containing training data
-sout <som>
Name for the resulting TS-SOM structure and weight matrix.
-l <layers>
The number of layers in the TS-SOM. The default value is 3
-D <dim>
The dimension of the TS-SOM, i.e. the dimension of the SOM in each layer. The default value is 2
-t <type>
Type of topology: 0 = lattice, 1 = ring, 2 = tree-structured vector quantizer
-wght <weighting>
Factor for weighting the neighbors of neurons during the training process (illustrated in the sketch after this list)
-c <stop-crit>
The stopping criterion, defined in terms of the quantization error
-m <max-iter>
The maximum number of epochs when training one layer of the TS-SOM
-L
The lookup table (references from data vectors to their BMUs) is used by default. Setting this flag disables it
-f <corr-layers>
The number of layers from which the lookup tables will be corrected. A larger value gives better results but slows down the training
-r <train-rule>
The training rule: 0 = vector quantization (VQ), 1 = spreading. The first rule tries to follow the distribution of the data, while the second tries to spread the neurons over the data points as completely as possible
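
To make the roles of the -wght, -c and -m parameters concrete, here is a rough Python sketch of a batch-style training pass over one SOM layer. It is an illustration under stated assumptions, not NDA's actual algorithm: a real TS-SOM also exploits the tree structure during BMU search, which is omitted here, and the exact form of somtr's neighbor weighting is not documented in this section.

import numpy as np

def train_layer(data, weights, neighbors, wght=0.5, stop_crit=0.001, max_iter=20):
    # Batch-style SOM training of one layer (hypothetical sketch).
    # neighbors[j] lists the lattice neighbors of neuron j; wght plays
    # the role of the neighbor-weighting factor, stop_crit the relative
    # drop in quantization error below which training stops, and
    # max_iter the maximum number of epochs for this layer.
    prev_err = np.inf
    for _ in range(max_iter):
        # Best-matching unit of every record.
        d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
        bmu = np.argmin(d, axis=1)
        # Quantization error: mean distance of the records to their BMUs.
        err = d[np.arange(len(data)), bmu].mean()
        if prev_err - err < stop_crit * prev_err:
            break                      # improvement fell below the criterion
        prev_err = err
        # Each neuron moves to the weighted mean of the records mapped to
        # itself (weight 1) and to its neighbors (weight wght).
        for j, nbrs in enumerate(neighbors):
            own = data[bmu == j]
            nbr = data[np.isin(bmu, nbrs)]
            total = len(own) + wght * len(nbr)
            if total > 0:
                weights[j] = (own.sum(axis=0) + wght * nbr.sum(axis=0)) / total
    return weights

# Illustrative use on a 2x2 lattice (neuron indices 0..3):
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
weights = rng.normal(size=(4, 3))
neighbors = [[1, 2], [0, 3], [0, 3], [1, 2]]
weights = train_layer(data, weights, neighbors)

The sketch also hints at why rule 0 (VQ) follows the distribution of the data: each neuron is pulled towards the mean of the records assigned to it, so dense regions of the data attract neurons more strongly.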

partr is a parallel version of the TS-SOM training for UNIX/Linux, and it understands the same switches as the normal single-threaded version. In addition, the user may specify the number of threads to use with -np. This number should be a power of 2 (1, 2, 4, 8, 16, ..., 256), and it is inefficient to exceed the number of processors in the computer used.
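
This section does not describe the internals of partr, but a natural way to exploit several threads is to partition the data and run the costly BMU searches in parallel, one partition per thread. A minimal hypothetical sketch:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def bmu_chunk(chunk, weights):
    # BMU indices for one partition of the data.
    d = np.linalg.norm(chunk[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def parallel_bmu(data, weights, n_threads=2):
    # Split the data into n_threads partitions, search BMUs in parallel,
    # and concatenate the results in the original record order.
    chunks = np.array_split(data, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = pool.map(bmu_chunk, chunks, [weights] * n_threads)
    return np.concatenate(list(parts))

Splitting the data into a power-of-2 number of equal partitions matches the advice on -np above; NumPy releases the interpreter lock inside large array operations, so even Python threads can overlap usefully in this step.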

Example (ex5.1): Training data is created by preprocessing, and a SOM is trained on it. In addition, the SOM is used for classification (see the command somcl in section 5.1.2).

...
NDA> prepro -d boston -dout predata -e -n
NDA> somtr -d predata -sout som1 -l 4
...
NDA> somcl -d predata -s som1 -cout cld1


