Indexing through a joining operation

 


This operation builds an index for the results of a joining operation and stores it as a classified data. The classified data can be created in two ways:

With the flag -full:
A class is created for each record in the first data <key1>, and the indexes in that class refer to the matching records in the data <key2>. The search covers all the records in both data frames. The classes are named with a running index beginning from 0.
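
The -full behaviour can be sketched in Python as follows. This is only an illustration of the description above, not the NDA implementation: the function name join_index_full, the use of plain lists as key data, equality as the matching rule, and 0-based record indexes are all assumptions.

```python
def join_index_full(key1, key2):
    """Return one class per key1 record, holding the indexes of matching key2 records."""
    # Hypothetical lookup table: key value -> positions of that value in key2
    positions = {}
    for j, value in enumerate(key2):
        positions.setdefault(value, []).append(j)
    # Classes are named with a running index starting from 0
    return {str(i): positions.get(value, []) for i, value in enumerate(key1)}


# Illustrative key values only
full_classes = join_index_full(["a", "b", "c"], ["b", "a", "a", "d"])
```

In this sketch a record of key1 without any match still gets a class, which is simply empty.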


Without the flag -full:
The joining operation finds the matching records and stores their indexes in two classes. If the flag -first is given, only the first matching pair is picked for each record in the data frame <key1>. The result is a classified data with two classes of the same length. The first class holds the indexes to the data <key1> and the second class holds the indexes to the matching records in the data <key2> (with -first, the first matching record for each). The classes are named "0" and "1".
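
A minimal Python sketch of the pairwise form, under the same assumptions as before (plain lists as key data, equality as the matching rule, 0-based indexes). The function name join_index_pairs and its first parameter are hypothetical stand-ins for the command and its -first flag.

```python
def join_index_pairs(key1, key2, first=False):
    """Return two equally long classes, "0" and "1", of matching index pairs."""
    class0, class1 = [], []
    for i, left in enumerate(key1):
        for j, right in enumerate(key2):
            if left == right:
                class0.append(i)  # index into key1
                class1.append(j)  # index into key2
                if first:
                    break  # keep only the first match for this key1 record
    return {"0": class0, "1": class1}


# Illustrative key values only
all_pairs = join_index_pairs(["a", "b", "c"], ["b", "a", "a", "d"])
first_pairs = join_index_pairs(["a", "b", "c"], ["b", "a", "a", "d"], first=True)
```

Reading the two classes position by position gives the matching (key1 index, key2 index) pairs.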


Example: SOM 1 has been trained with data about baseball teams and SOM 2 with data about players (hitters). The following script picks the identifiers of the teams from the selected neuron and finds all the players in those teams. Then all the neurons that include at least one of those players are referenced through a classified data (see the baseball example).

...
# The user has selected a neuron identified by $1
#
# 1. Pick teams and their identifiers from the selected neuron
#
NDA> select neuronclass -cl grp.neuron_$1
NDA> pickrec -d teamKey -c neuronclass -dout selTeamKeys
#
# 2. Join teams to players based on the teams' names and find the
#    players' identifiers. The names of the players are stored
#    in the data frame /hitter/hitterKey.
#
NDA> rm -fr /hitter/selHitterKey
NDA> join -k1 selTeamKeys -k2 /hitter/hitterTeamKey
     -d2 /hitter/hitterKey -dout /hitter/selHitterKey
#
# 3. The player names are mapped to a player data and matching
#    pairs are stored in a classified data /hitter/hitterRows
#    as (team_index,player_index) pairs.
#
NDA> joinind -k1 /hitter/selHitterKey -k2 /hitter/hitterKey
     -cout /hitter/hitterRows -first
#
# 4. Find the neurons of the player SOM that include
#    at least one player from the selected teams
#
NDA> select /hitter/xxx -cl /hitter/hitterRows.1
#
# select one layer from a SOM classification
NDA> somlayer -s  /hitter/som -f /hitter/somlayer
NDA> selcld -c /hitter/cld -expr '/hitterdir/somlayer'=2;
     -cout /hitter/layercld -empty
#
# Find the BMU for each index in /hitter/xxx (joined rows) and
# remove duplicate indexes
#
NDA> findcl -c1 /hitter/layercld -c2 /hitter/xxx -cout /hitter/grp
NDA> uniqcl -c /hitter/grp -cout /hitter/groups
#
# The result /hitter/groups includes the indexes of the neurons,
# which can be used as a group of neurons.
#
...
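
The last part of the script (finding the neuron of each joined row, then removing duplicates) can be pictured with the Python sketch below. It only mirrors the intent stated in the script comments; the actual behaviour of findcl and uniqcl is not specified here, and the function names, the dictionary layout, and the neuron names are hypothetical.

```python
def neurons_for_rows(neuron_classes, rows):
    """Map each data row index to the neuron (class) that contains it."""
    # Invert the classification: row index -> neuron name
    row_to_neuron = {
        row: neuron
        for neuron, members in neuron_classes.items()
        for row in members
    }
    # Rows that belong to no neuron of the chosen layer are skipped
    return [row_to_neuron[row] for row in rows if row in row_to_neuron]


def unique_in_order(items):
    """Drop repeated entries while keeping the first occurrence of each."""
    return list(dict.fromkeys(items))


# Illustrative data: three neurons and the player rows found by the join
layer = {"n0": [0, 3], "n1": [1], "n2": [2, 4]}
joined_rows = [3, 4, 4, 0]
hits = neurons_for_rows(layer, joined_rows)
group = unique_in_order(hits)
```

Here group lists each neuron once, even when several of the joined players fall into the same neuron.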



Erkki Hakkinen
Thu Sep 24 11:51:34 EET DST 1998