Basics of data management in the NDA

This section introduces the basic data types and the interface to them. The interface part describes how operations refer to data through names, which identify the stored data structures. In addition to names, directories can be used to organize data. The directory structure is described in section 1.2.

Basic data types

The NDA has only a few basic data types, with which all data management is defined:

Data field (field, f):
The basis of all data storage is the data field, which corresponds to a data vector. All values in the vector must have the same data type, which can be integer, floating-point number or character string.

Data frame (data, d):
A data frame stores data in matrix form. It is a collection of data fields that must have the same length but may contain different data types. One data record consists of the field values that share the same row index.

[Figure: dataframe.ps]

Classified data frame (cldata, c):
A classified data frame is used for data classification. This type of frame contains data fields of integers, which may have different lengths. Each field in the frame corresponds to one class, and the integer values in the field are references to the data records that some classifier has assigned to that class. Normally, the integers are indexes referring to data records stored in some data frame.



[Figure: cldatafr.ps]


Structures:
There are a few structural data items that cannot be easily or efficiently converted into these three basic types. The system can therefore store arbitrary data structures, but the operations that handle them must be defined separately. Here are two typical examples of structures:
TS-SOM (som, s):
The TS-SOM is one specific structure type implemented in the current version. Note, however, that the weights of the SOMs are not stored as a matrix; a separate data frame is used instead.
Graphics (gr):
Properties of the graphical presentations are stored in special structures.
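
The three basic types can be pictured with a small Python model. This is only an illustrative sketch: the Python names (frame, classes, record, members) and the sample values are assumptions made for the example and are not part of the NDA.

```python
# Illustrative model only; these Python names are not part of the NDA.

# A data field is a vector whose values all share one type.
age = [34, 51, 27, 45, 62]
name = ["ann", "bob", "cid", "dee", "eve"]

# A data frame groups fields of equal length; types may differ between fields.
frame = {"age": age, "name": name}
assert len({len(values) for values in frame.values()}) == 1


def record(frame, row):
    """Return one data record: the value of every field at the same row index."""
    return {field_name: values[row] for field_name, values in frame.items()}


# A classified data frame holds integer fields of possibly different lengths.
# Each field is one class; each integer is a row index into the data frame.
classes = {"under_40": [0, 2], "over_40": [1, 3, 4]}


def members(frame, classes, class_name):
    """Resolve the row indexes stored for one class into full data records."""
    return [record(frame, row) for row in classes[class_name]]
```

Keeping only indexes in the classified data frame means that the records themselves are stored once, in the data frame, however many classifications refer to them.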

Frames (data and classified data) actually contain references to fields. One field can be referred to by several frames, for instance after the select command (see section 2.9) has been used. This should be taken into account when frames and fields are deleted. When a frame is deleted, typically all of its fields are deleted as well, provided that no other frame refers to them. There are also operations to unlink fields from frames, and to delete a field together with all references to it from frames.
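
The deletion rules can be sketched as simple reference counting. The Store class below is hypothetical and only mirrors the behavior described in the text; it makes no claim about how the NDA implements it.

```python
class Store:
    """Toy store in which frames hold references to shared fields."""

    def __init__(self):
        self.fields = {}   # field id -> list of values
        self.frames = {}   # frame name -> ordered list of field ids

    def add_frame(self, frame, field_ids):
        self.frames[frame] = list(field_ids)

    def refs(self, field_id):
        """Count how many frames refer to a field."""
        return sum(field_id in ids for ids in self.frames.values())

    def delete_frame(self, frame):
        """Delete a frame and each of its fields that no other frame uses."""
        for field_id in self.frames.pop(frame):
            if self.refs(field_id) == 0:
                self.fields.pop(field_id, None)

    def delete_field(self, field_id):
        """Delete a field together with every reference to it."""
        for ids in self.frames.values():
            if field_id in ids:
                ids.remove(field_id)
        self.fields.pop(field_id, None)
```

If a frame all refers to fields x and y, and a second frame sel refers only to y, then deleting all removes x but leaves y in place, because sel still refers to it.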

The order of fields is not explicitly defined and cannot be easily changed. However, some operations assume that fields are ordered. The select operation creates references to fields in the new frame in the given order. Correspondingly, the listing operations show the fields in the order in which they were chosen or created.


Naming of data items

The basic idea is to store all data items in the Name space, where commands can access them easily. In addition to simple names, the Name space can be divided into parts using directories, which are analogous to the directories of file systems. Frames can also be used to group data fields, provided that the fields have the same length.

Directories

A directory appears in the Name space like any other data structure. As in file systems, each directory (excluding the root) has a root entry called `.' and a link to its parent directory called `..'. The separator between directory levels in names is `/', and the root of the whole directory tree is called `/'. Thus, an unambiguous path can be formed by concatenating all subdirectory references and prefixing the result with `/', for instance `/lev1/lev2/'.

Note that operations do not automatically create new subdirectories for the data structures they produce. That is, a directory must be created before it can be used to store new named entries. However, you can refer to data structures in other directories by using absolute or relative paths. The current directory can be changed with the cd command.
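
How a relative or absolute name maps to a location in the directory tree can be sketched in Python. The sketch assumes that NDA paths behave like POSIX paths, which the text states only by analogy; resolve is a hypothetical helper, not an NDA command.

```python
import posixpath


def resolve(current_dir, name):
    """Turn a relative or absolute name into a normalized absolute path."""
    # posixpath.join discards current_dir when name is already absolute;
    # normpath then folds away `.', `..' and a trailing `/'.
    return posixpath.normpath(posixpath.join(current_dir, name))
```

For example, with /lev1/lev2 as the current directory, resolve("/lev1/lev2", "../data") yields /lev1/data.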

Please note that the directory structure of the NDA is not the same as that of the host computer. In the NDA all data items are stored only in the main (or virtual) memory of the computer, and files in the file system have to be loaded into the Name space. Results are also lost if they are not saved into a file before exiting the NDA.

Frames

Fields can be stored directly in directories or within frames. At the directory level they can be referred to like any other entries, but at the frame level fields can only be referred to through their frames. The same field name can be used in several frames residing in the same directory. Otherwise, names within each subdirectory or within each single frame must be unambiguous.

Items inside frames are always referred to with <framename>.<itemname> and, thus, they cannot be seen directly in any directory. The separator between the name of the frame and its items is always `.'. If a frame contains other frames, which can further include data items, fields can be accessed via the whole name, for instance, `frm1.frm2.item1'.
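
The two separators can be illustrated by splitting a full name into its directory part and its chain of frame and item names. The helper split_name is hypothetical, and the sketch assumes that the last path component is an entry name rather than a bare `.' or `..'.

```python
def split_name(full_name):
    """Split '/dir/sub/frm1.frm2.item1' into (directory, [frm1, frm2, item1])."""
    # `/' separates directory levels; `.' separates a frame from its items.
    directory, _, entry = full_name.rpartition("/")
    if not directory and full_name.startswith("/"):
        directory = "/"            # the entry sits directly in the root
    return directory, entry.split(".")
```

This also shows why `.' and `/' cannot be allowed inside names: either character would make such a split ambiguous.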

Rules of naming

Names should start with an alphabetical character (a - z, A - Z) and may contain at least alphanumeric characters. Names are case sensitive. Some special characters, such as `#', `$', `\', `.', `/', `(', `)', `{', `}', `[', `]', `"' and `'', may not be used, and characters like `:', `;', `,', `_', `@', `+', `-', `*', `=', `<', `>' and `^' should be avoided to prevent confusion.
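
These rules can be turned into a small checker. The function check_name and its three result labels are assumptions made for this sketch. It treats a first character other than a letter as invalid, which is a stricter reading than the "should" of the text, and it does not judge characters that the rules do not mention.

```python
# Characters that may not be used, and characters that should be avoided.
FORBIDDEN = set("#$\\./(){}[]\"'")
DISCOURAGED = set(":;,_@+-*=<>^")


def check_name(name):
    """Classify a candidate name as 'ok', 'discouraged' or 'invalid'."""
    if not name or not (name[0].isascii() and name[0].isalpha()):
        return "invalid"           # assumption: a leading letter is mandatory
    if any(ch in FORBIDDEN for ch in name):
        return "invalid"
    if any(ch in DISCOURAGED for ch in name):
        return "discouraged"
    return "ok"
```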

Commands, switches, scripts and script parameters

Commands can be run directly from the command line of ndashell or from script files. Script files are ordinary text files containing one command per line. `#' can be used to add comments to scripts: on each line, all characters after a `#' are ignored when the command is executed. A script file can be run directly by writing its name on the command line. You can pass parameters to a script on the command line and refer to their values in the script with $1 ... $9, ${10} ... ${nn}.

Parameter substitution can be temporarily postponed by adding a `\' character on the command line. Nothing after the backslash is substituted unless a closing `\' is encountered. That is, all characters between two backslashes are taken literally in the first evaluation of the command line. Some commands, such as if and while, re-evaluate the specified command line before executing it. This can be useful when executing a command for all script parameters (see section 11.2).

Normally commands are echoed before execution, but this can be disabled either with echo -off (see section 11.3) or by starting a command line with `@'. The `@' must be the very first character on the line; not even whitespace may precede it. echo -off is only useful in scripts, because command echoing is restored when the execution of a script ends. Error messages are displayed regardless of the echo setting.
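
The treatment of comments and of the `@' prefix can be sketched as a preprocessing step applied to each script line. The function preprocess is hypothetical. It assumes that `#' always starts a comment (the text mentions no way to escape it) and leaves a line with whitespace before `@' unchanged, since the text does not say how the NDA handles such a line.

```python
def preprocess(line, echo_on=True):
    """Return (command, echo) for one script line.

    Everything from the first `#' on is dropped; a `@' in the first
    column suppresses echoing for this line only.
    """
    command = line.split("#", 1)[0].rstrip()
    if command.startswith("@"):
        return command[1:], False
    return command, echo_on
```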

All command parameters have to be written on the same line, and switches cannot be combined after a single `-'. For example, ls -tup (see section 2.2) will try to show an item named -tup, and rm -rfr boston (see section 2.3) results in an Invalid parameter(s) for command error. Instead, write each switch separately, for example ls -t -u -p. Most other commands also produce an error message if they are given unexpected switches or illegal data types.

ndashell for UNIX™ itself recognizes four special switches:



-d <directory>
-f <cmdfile> [<parameters>]
-quit
-batch

If a directory is specified with -d, the current working directory in the UNIX file system is first changed to this directory. If a command (or script) file is specified with -f, it is run immediately after ndashell has initialized itself. Parameters for the script can also be given, if applicable. If -quit is specified, ndashell exits as soon as the execution of the script has finished. If -batch is specified, the XView graphical user interface is not started at all.

The Windows version only accepts the name of the command file to be executed on the command line. To change the current UNIX or Windows file system directory while the NDA is running, use the cwd command (see section 11.1).

If an error is encountered while running a script, the execution of all running scripts is terminated immediately, unless stop -off (see section 11.6) has been executed earlier.


Definition of macros within script files

Several macros or scripts can be defined within a single script file and these macros can be called by specifying the name of the macro in a script or the command line. To explicitly define a new macro the programmer can start the definition with BeginMacro macro_name, put the command lines of the macro after that, and end the definition with EndMacro.

To define an implicit block of commands to be executed within a command, Begin macro_name ... EndBegin can be used ``within the command line''. The commands to be executed have to be specified on separate lines, but the original command line may continue after EndBegin. As an example, we present a script that calculates the value of $\log_{10}(x)$ for $x = 1, 2, 3, 4, \ldots, 10$ and outputs the results in the form log(x) is n.

@echo -off
expr -fout x -expr 1;
while -expr 'x' <= 10; -cmd Begin evalLog
  expr -fout n -expr log('x');
  echo  log(${x}) is ${n}
EndBegin: -loop expr -fout x -expr 'x' + 1;:

Or alternatively, the same execution could be obtained with:

@echo -off
BeginMacro evalLog
  expr -fout n -expr log($1);
  echo  log($1) is ${n}
EndMacro

expr -fout x -expr 1;
while -expr 'x' <= 10; -cmd evalLog \${x}\: \
 -loop expr -fout x -expr 'x' + 1;:
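
For readers more familiar with general-purpose languages, the control flow of the two scripts corresponds roughly to the Python sketch below. It is only a comparison aid: the comments map each line to the script construct it stands for, and the number formatting (`:g`) is an assumption, since the output format of the NDA is not specified here.

```python
import math


def eval_log(x):
    """Counterpart of the evalLog macro: format one output line."""
    n = math.log10(x)                  # expr -fout n -expr log(...)
    return f"log({x}) is {n:g}"        # echo log(...) is ${n}


def run():
    lines = []
    x = 1                              # expr -fout x -expr 1;
    while x <= 10:                     # while -expr 'x' <= 10;
        lines.append(eval_log(x))      # -cmd evalLog
        x = x + 1                      # -loop expr -fout x -expr 'x' + 1;
    return lines
```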


Variables and parameter substitution

Variables and script parameters are denoted in a similar way as in UNIX command interpreters (shells). A $-sign should precede either a single number or a brace-delimited structure, such as {nn}, {name} or {name[${ind}]}. Numbers refer to parameters passed to the currently running script file and names refer to data (or names of data fields within frames) stored in the name space.

To be able to refer to a certain data value within a data field, indexes can be used. Indexes can also be used to get the name of a certain field from a data frame or classified data. Indexes are delimited with brackets. An index should be a numeric constant or a variable.

The substitution is performed recursively and therefore the name part of ${name[${ind}]} can be replaced with $1, for example.

Examples:

Parameter reference    Description
$2                     2nd parameter of the script invocation command line
${12}                  12th parameter on the command line
${ind}                 First (index 0) value of data field (or variable) ind
${${ind}}              indth parameter on the command line
${fld[${ind}]}         indth value of field fld
${frm[${ind}]}         Name of the indth field of frm
${frm.f1[${ind}]}      indth value of field frm.f1
${$1[${ind}]}          indth value of field (or indth field name of frame) $1

The number of available parameters can be obtained with $?, and $* can be used to put all specified parameters into the new command line.
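
The substitution rules can be modelled with a short Python sketch that expands references from the inside out. It is not the NDA implementation. The function names expand and lookup, the representation of the name space as nested Python dictionaries and lists, and the assumption that field and frame indexes count from 0 while script parameters count from 1 are all choices made for this example. Postponing substitution with backslashes is not modelled.

```python
import re

# One reference: ${...} without nested references, a single digit, $? or $*.
TOKEN = re.compile(r"\$(?:\{([^${}]*)\}|(\d)|([?*]))")
INDEXED = re.compile(r"(.+)\[(\d+)\]")


def lookup(name, index, names):
    """Return a field value, or a field name if `name` denotes a frame."""
    item = names
    for part in name.split("."):       # 'frm.f1' -> frame 'frm', field 'f1'
        item = item[part]
    if isinstance(item, dict):         # a frame: name of its index-th field
        return list(item)[index]
    return item[index]                 # a field: its index-th value


def expand(text, params, names, max_passes=20):
    """Expand $N, ${nn}, ${name}, ${name[i]}, $? and $* recursively."""

    def replace(match):
        braced, digit, special = match.groups()
        if special == "?":
            return str(len(params))
        if special == "*":
            return " ".join(params)
        key = braced if braced is not None else digit
        if key.isdigit():              # script parameter, counted from 1
            return params[int(key) - 1]
        indexed = INDEXED.fullmatch(key)
        if indexed:
            return str(lookup(indexed.group(1), int(indexed.group(2)), names))
        return str(lookup(key, 0, names))   # bare name: value at index 0

    for _ in range(max_passes):        # repeat until nothing is left to expand
        expanded = TOKEN.sub(replace, text)
        if expanded == text:
            break
        text = expanded
    return text
```

Because each pass only replaces references that contain no further references, ${$1[${ind}]} first becomes, say, ${fld[1]} and is resolved to a value on the next pass.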

For an additional example, see the while command (section 11.2).


Anssi Lensu 2006-02-23