join | Fully join two data frames |
-d1 <data1> | source data 1 |
-d2 <data2> | source data 2 |
-k1 <key1> | keys corresponding to data 1 |
-k2 <key2> | keys corresponding to data 2 |
-dout <dataout> | target data |
ijoin | Join only the first occurrences |
-d1 <data1> | source data 1 |
-d2 <data2> | source data 2 |
-k1 <key1> | keys corresponding to data 1 |
-k2 <key2> | keys corresponding to data 2 |
-dout <dataout> | target data |
These two commands perform join operations. join makes a full join, whereas ijoin uses only the first pair of matching keys.
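The difference between the two commands can be sketched in Python. This is a hypothetical model, not NDA's implementation. It makes two assumptions:

- a full join pairs each record of data 1 with every record of data 2 whose key matches;
- ijoin keeps only the first matching record of data 2 for each record of data 1.

The function names `full_join` and `first_join` and the sample fields are made up for illustration.

```python
from typing import Any

Record = dict[str, Any]


def full_join(d1: list[Record], d2: list[Record], k1: str, k2: str) -> list[Record]:
    """Assumed join semantics: pair every d1 record with every d2 record whose key matches."""
    out = []
    for r1 in d1:
        for r2 in d2:
            if r1[k1] == r2[k2]:
                out.append({**r1, **r2})  # fields of r2 overwrite same-named fields of r1
    return out


def first_join(d1: list[Record], d2: list[Record], k1: str, k2: str) -> list[Record]:
    """Assumed ijoin semantics: pair each d1 record only with the first matching d2 record."""
    first: dict[Any, Record] = {}
    for r2 in d2:
        first.setdefault(r2[k2], r2)  # remember only the first occurrence of each key
    return [{**r1, **first[r1[k1]]} for r1 in d1 if r1[k1] in first]


# Hypothetical sample data: key 1 occurs twice in the second data set.
orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}]
info = [{"id": 1, "size": 10}, {"id": 1, "size": 11}, {"id": 2, "size": 20}]

print(full_join(orders, info, "cust", "id"))
print(first_join(orders, info, "cust", "id"))
```

Under these assumptions, the duplicated key produces two output records with the full join but only one with the first-occurrence join.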
Example: These commands are useful when the data has a relational form, that is, when clear `index' keys of the data records can be found. In such a case, the analysis can proceed as follows. Suppose we have data that tells which customer bought which product(s), and we want to profile the customers, who are identified by their codes. First, the data records can be summarized by customer code, for instance by computing the average value of some fields for each customer. Then the calculated values can be added into the original order frame using join and the previously located unique customer keys (see the baseball example).
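The summarize-then-join idea can be illustrated with plain Python. The sketch assumes each order is a record with a customer key and numeric fields. It averages the fields per customer, then attaches the averages back to every order with a key lookup.

It is only an analogy to the NDA session below:

- The SOM and binarization steps are left out.
- The helper names `summarize` and `attach` are hypothetical.
- The `avg_` field prefix is hypothetical.
- The sample values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order records: custNro is the customer key.
orders = [
    {"custNro": 7, "custSize": 3.0, "custCover": 0.5},
    {"custNro": 9, "custSize": 1.0, "custCover": 0.2},
    {"custNro": 7, "custSize": 5.0, "custCover": 0.7},
]


def summarize(records, key, fields):
    """Group records by key (first-seen order) and average the given fields per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return [
        {key: k, **{f"avg_{f}": mean(r[f] for r in rs) for f in fields}}
        for k, rs in groups.items()
    ]


def attach(records, summary, key):
    """Join each record with its group's summary; the summary has exactly one row per key."""
    by_key = {s[key]: s for s in summary}
    return [{**r, **by_key[r[key]]} for r in records]


customers = summarize(orders, "custNro", ["custSize", "custCover"])
enriched = attach(orders, customers, "custNro")
for row in enriched:
    print(row)
```

Because the summary holds a single row per customer, every order gains exactly one set of averages. This is why the unique keys are located before joining.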
...
NDA> ls -fr orderData custNro custSize custCover
# Summarize the data related to the customer number (custNro)
NDA> select keyFr -f orderData.custNro
NDA> uniq -d keyFr -cout custUniqs
NDA> select custFlds -f orderData.custSize orderData.custCover
NDA> clstat -d custFlds -c custUniqs -dout custData -avg
...
# SOM analysis -> groups of neurons
#   -> binarize them into data2
...
# Bind the customer key to data2 by the key operation
NDA> select custKeyOrg -f orderData.custNro
NDA> clkey -d custKeyOrg -c custUniqs -dout custData
NDA> select custKey -f custData.custNro
...
# Join binarized groups to the original data
NDA> cd ..
NDA> join -k1 keyFr -d1 orderData -k2 custKey -d2 custData -dout data2