Next: Finding unique data records Up: Data reorganization Previous: Joining data frames

Indexing through a joining operation

 

joinind              Store indexes from a joining operation
  -k1 <key1>         keys corresponding to data 1
  -k2 <key2>         keys corresponding to data 2
  -cout <cldataout>  target data
  [-first]           only the first matching pair is picked
  [-full]            classifying by the full joining

This operation creates a classified data frame containing the indexing for the results of a joining operation. The classified data can be created in two alternative ways:

With the flag -full:
Each data record in the first frame <key1> gets its own class, whose indexes refer to the matching data records in <key2>. Searching is performed over all records in both data frames. The classes are named with a running index beginning from `0'.
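
The -full layout can be pictured with a minimal Python sketch. This is not NDA code: the function name full_join_index, the sample keys, and the 0-based record indexes are hypothetical, and the sketch assumes that two records match when their key values are equal and that a record without matches still gets an (empty) class.

```python
def full_join_index(key1, key2):
    """Sketch of the -full layout: one class per record of key1.

    Assumption: records match when their key values are equal.
    """
    classes = {}
    for i, k1 in enumerate(key1):
        # Classes are named with a running index starting from "0".
        classes[str(i)] = [j for j, k2 in enumerate(key2) if k2 == k1]
    return classes


teams = ["Cubs", "Mets"]                    # hypothetical <key1>
players = ["Mets", "Cubs", "Mets", "Reds"]  # hypothetical <key2>
print(full_join_index(teams, players))
# {'0': [1], '1': [0, 2]}
```

Here class `0' belongs to the first record of <key1> and lists the one matching record of <key2>, while class `1' lists two matches.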


Without the flag -full:
The joining operation finds the matching records and stores their indexes into two data fields. If the flag -first is given, only the first matching pair is picked for each record in data frame <key1>. The result is a classified data containing two classes of the same length. The first class is a running index to <key1> and the second class contains the indexes of the first matching records in <key2>. The classes are named `0' and `1'.
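
The two-class layout can be sketched in Python in the same spirit. Again this is not NDA code: pair_join_index and the sample keys are hypothetical, matching is assumed to mean equal key values, indexes are assumed to be 0-based, and the sketch assumes that without -first every matching pair is stored and that records without a match contribute no pair.

```python
def pair_join_index(key1, key2, first=False):
    """Sketch of the layout without -full: two parallel index lists."""
    class0, class1 = [], []  # contents of classes "0" and "1"
    for i, k1 in enumerate(key1):
        for j, k2 in enumerate(key2):
            if k1 == k2:
                class0.append(i)  # index of the record in key1
                class1.append(j)  # index of the matching record in key2
                if first:
                    break  # keep only the first match for record i
    return {"0": class0, "1": class1}


teams = ["Cubs", "Mets"]                    # hypothetical <key1>
players = ["Mets", "Cubs", "Mets", "Reds"]  # hypothetical <key2>
print(pair_join_index(teams, players, first=True))
# {'0': [0, 1], '1': [1, 0]}
print(pair_join_index(teams, players))
# {'0': [0, 1, 1], '1': [1, 0, 2]}
```

Both classes always have the same length, because every stored pair adds one index to each of them.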


Example: SOM 1 has been trained with a data set describing baseball teams and SOM 2 with a data set describing the players (hitters). The following script picks the identifiers of the teams in a selected neuron and finds all the players in those teams. A classified data is then created that refers to all the neurons containing at least one of those players (see the baseball example).

...
# User has selected a neuron identified by $1
#
# 1. Pick teams and their identifiers from the selected neuron
#
NDA> select neuronclass -cl grp.neuron_$1
NDA> pickrec -d teamKey -c neuronclass -dout selTeamKeys
#
# 2. Join teams to players based on the teams' names and find the
#    players' identifiers. The names of the players are stored
#    in data frame /hitter/hitterKey.
#
NDA> rm -fr /hitter/selHitterKey
NDA> join -k1 selTeamKeys -k2 /hitter/hitterTeamKey
     -d2 /hitter/hitterKey -dout /hitter/selHitterKey
#
# 3. The player names are mapped to a player data and matching
#    pairs are stored in a classified data /hitter/hitterRows
#    as (team_index, player_index) pairs.
#
NDA> joinind -k1 /hitter/selHitterKey -k2 /hitter/hitterKey
     -cout /hitter/hitterRows -first
#
# 4. Find the neurons of the player SOM that include
#    at least one player from the selected teams
#
NDA> select /hitter/xxx -cl /hitter/hitterRows.1
# Select one layer from a SOM classification
NDA> somlayer -s  /hitter/som -f /hitter/somlayer
NDA> selcld -c /hitter/cld -expr '/hitterdir/somlayer'=2;
     -cout /hitter/layercld -empty
# Find the BMU for each index in /hitter/xxx (joined rows) and
# remove duplicate indexes
NDA> findcl -c1 /hitter/layercld -c2 /hitter/xxx -cout /hitter/grp
NDA> uniqcl -c /hitter/grp -cout /hitter/groups
# The result /hitter/groups includes the indexes of the neurons,
# which can be used as a group of neurons.
...



Anssi Lensu
Tue Jul 23 11:58:18 EET DST 2002