
Joining data frames

  

join    Fully join two data frames
        -d1 <data1>       source data 1
        -d2 <data2>       source data 2
        -k1 <key1>        keys corresponding to data 1
        -k2 <key2>        keys corresponding to data 2
        -dout <dataout>   target data
ijoin   Join only the first occurrences
        -d1 <data1>       source data 1
        -d2 <data2>       source data 2
        -k1 <key1>        keys corresponding to data 1
        -k2 <key2>        keys corresponding to data 2
        -dout <dataout>   target data

These two commands join two data frames on key fields. join performs a full join, while ijoin joins only the first occurrences: it uses only the first pair of keys that match.
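The difference between the two commands can be illustrated with a minimal Python sketch. Because the text does not define the semantics in detail, the sketch assumes that a full join combines every record of data 1 with every record of data 2 that has an equal key, and that ijoin keeps only the first occurrence of each key value on both sides. Records are modelled as tuples and keys as parallel lists; the function names full_join and first_occurrence_join are hypothetical and are not part of NDA.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def full_join(d1: Sequence[tuple], k1: Sequence[Hashable],
              d2: Sequence[tuple], k2: Sequence[Hashable]) -> List[tuple]:
    """Assumed 'join': pair every record of d1 with every d2 record of equal key."""
    by_key = defaultdict(list)          # key value -> all matching d2 records
    for key_b, row_b in zip(k2, d2):
        by_key[key_b].append(row_b)
    out = []
    for key_a, row_a in zip(k1, d1):
        for row_b in by_key.get(key_a, []):
            out.append(tuple(row_a) + tuple(row_b))   # concatenate the fields
    return out


def first_occurrence_join(d1: Sequence[tuple], k1: Sequence[Hashable],
                          d2: Sequence[tuple], k2: Sequence[Hashable]) -> List[tuple]:
    """Assumed 'ijoin': join only the first occurrence of each key on both sides."""
    first_b = {}                        # key value -> first matching d2 record
    for key_b, row_b in zip(k2, d2):
        first_b.setdefault(key_b, row_b)
    out, seen = [], set()
    for key_a, row_a in zip(k1, d1):
        if key_a in seen or key_a not in first_b:
            continue                    # later occurrence, or no match in d2
        seen.add(key_a)
        out.append(tuple(row_a) + tuple(first_b[key_a]))
    return out


# Hypothetical example data: key "a" occurs twice in each data set.
d1, k1 = [(1,), (2,), (3,)], ["a", "b", "a"]
d2, k2 = [(10,), (20,), (30,)], ["a", "a", "b"]
print(full_join(d1, k1, d2, k2))              # -> [(1, 10), (1, 20), (2, 30), (3, 10), (3, 20)]
print(first_occurrence_join(d1, k1, d2, k2))  # -> [(1, 10), (2, 30)]
```

Under these assumptions a repeated key multiplies the output of the full join, whereas the first-occurrence join yields at most one record per distinct key value.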

Example: These commands are useful when the data have a relational form, that is, when clear `index' keys identifying the data records can be found. In such a case, the analysis can proceed as follows. Suppose we have data describing which customer has bought which product(s), and we want to profile the customers, who are identified by their codes. First, the data records can be summarized by customer code, for instance by computing the average value of some fields for each customer. Then the calculated values can be added back into the original order frame with join, using the unique customer keys found earlier (see the baseball example).
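The summarize-then-join workflow can also be sketched in Python. This is an illustration under stated assumptions, not NDA's implementation: the hypothetical function average_by_key stands in for uniq followed by clstat -avg, and the hypothetical function join_back stands in for the final join, assuming every order record has a matching customer summary. The sample customer numbers and field values are invented for the example.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def average_by_key(keys: Sequence[Hashable],
                   rows: Sequence[Tuple[float, ...]]) -> Tuple[list, List[tuple]]:
    """Average each numeric field over the records that share a key (like clstat -avg)."""
    sums = {}                           # key -> running per-field sums
    counts = defaultdict(int)           # key -> number of records
    for key, row in zip(keys, rows):
        if key not in sums:
            sums[key] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[key][i] += value
        counts[key] += 1
    unique_keys = list(sums)            # unique keys in first-seen order (like uniq)
    averages = [tuple(s / counts[k] for s in sums[k]) for k in unique_keys]
    return unique_keys, averages


def join_back(keys: Sequence[Hashable], rows: Sequence[tuple],
              unique_keys: Sequence[Hashable], summaries: Sequence[tuple]) -> List[tuple]:
    """Append each record's per-key summary to the record; assumes every key has one."""
    lookup = dict(zip(unique_keys, summaries))
    return [tuple(row) + tuple(lookup[key]) for key, row in zip(keys, rows)]


# Hypothetical order data: customer numbers and two numeric fields per order.
cust_nro = [101, 102, 101]
order_fields = [(2.0, 4.0), (5.0, 1.0), (4.0, 0.0)]
uniq, avg = average_by_key(cust_nro, order_fields)
print(uniq, avg)   # -> [101, 102] [(3.0, 2.0), (5.0, 1.0)]
print(join_back(cust_nro, order_fields, uniq, avg))
# -> [(2.0, 4.0, 3.0, 2.0), (5.0, 1.0, 5.0, 1.0), (4.0, 0.0, 3.0, 2.0)]
```

Because each customer key appears only once in the summary table, joining it back gives every order record exactly one set of customer-level values.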

...
NDA> ls -fr orderData
 custNro
 custSize
 custCover
# Summarize the data related to the customer number (custNro)
NDA> select keyFr -f orderData.custNro
NDA> uniq -d keyFr -cout custUniqs
NDA> select custFlds -f orderData.custSize orderData.custCover
NDA> clstat -d custFlds -c custUniqs -dout custData -avg
...
# SOM analysis -> groups of neurons
#              -> binarize them into data2
...
# Bind the customer key to data2 by the key operation
NDA> select custKeyOrg -f orderData.custNro
NDA> clkey -d custKeyOrg -c custUniqs -dout custData
NDA> select custKey -f custData.custNro
...
# Join binarized groups to the original data
NDA> cd ..
NDA> join -k1 keyFr -d1 orderData -k2 custKey -d2 custData -dout data2



Anssi Lensu
Thu May 17 15:00:44 EET DST 2001