Statements specify the flow of control through a utility when it executes. A statement that contains expressions also computes values and/or alters the values stored in variables when the statement executes.
QTAwk departs from Awk by adopting the C convention of using the semicolon, ';', as a statement terminator. QTAwk treats newline characters as white space, nothing more. Comments are introduced by the symbol '#' and continue to the next newline character. Thus the Awk practice of letting newlines terminate some statements cannot be used under QTAwk, and the Awk rules for when a newline does or does not terminate a statement can be forgotten. In QTAwk, terminate all statements with a semicolon, ';'.
QTAwk Keywords
The QTAwk keywords are:
The keywords cycle, deletea, local and endfile are new to QTAwk. The keywords switch, case and default have been appropriated from C with expanded functionality over C.
Statements
A statement can be one of the following:
break;
case expr_list:
continue;
cycle;
default:
delete variable[expr_list];
deletea variable;
do statement while ( expr_list );
endfile;
exit;
exit expression;
for ( expr_list ; expr_list ; expr_list ) statement
for ( variable1 in variable2 ) statement
if ( expr_list ) statement
if ( expr_list ) statement else statement
local variable;
local variable1 , variable2 , ...;
local variable1 = value;
local variable1 = value , variable2 = value , ...;
next;
return;
return expr_list;
switch ( expr_list ) statement
while ( expr_list ) statement
;
expression;
expression , expression , expression , ...;
{ statement statement statement ... }
QTAwk provides braces for grouping statements to form compound statements. Various keywords are available for controlling the logical flow of statement execution and for looping over statements multiple times.
cycle and next
The cycle and next statements allow the user to control the execution of the QTAwk outer loop, which reads records from the current input file and compares them against the patterns. Both statements restart the pattern matching.
The next statement causes the next input record to be read before restarting the outer pattern matching loop with the first pattern-action pair.
The cycle statement may use the current input record or the next input record for restarting the outer pattern matching loop. As each input record is read from the current input file, the built-in variable CYCLE_COUNT is set to one. The 'cycle' statement increments the numeric value of CYCLE_COUNT by one and compares the new value to the numeric value of the built-in variable MAX_CYCLE. One of two actions is taken depending on the result of this comparison:
The default value of MAX_CYCLE is 100. Both CYCLE_COUNT and MAX_CYCLE are built-in variables and may be set by the user's utility. Setting MAX_CYCLE is useful to control the number of iterations possible on a record. Setting MAX_CYCLE to 1 would make the cycle and 'next' keywords identical.
If the value of CYCLE_COUNT is altered by the user's utility, care should be taken to prevent the possibility of the utility entering a loop from which it cannot exit.
The cycle statement is useful when it is necessary to process the current input record through the outer pattern match loop more than once. The following utility is a trivial example of one such use. It prints each record, preceded by its record number, multiple times. The number of times is determined by the value assigned to MAX_CYCLE in the BEGIN action.
BEGIN {
    MAX_CYCLE = 10;
}
{
    print FNR,$0;
    cycle;
}
The "next" record read for both the next; and cycle; statements depends on the value of the built-in variable FILE_SEARCH. If FILE_SEARCH is false, then the next physical record is read. If FILE_SEARCH is true, the next record is the record (or records) containing a string matching a pattern in FILE_SEARCH_PAT.
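The CYCLE_COUNT and MAX_CYCLE bookkeeping described above can be modeled in a few lines of Python. This is only a sketch, not QTAwk code. It assumes that cycle reuses the current record while the incremented CYCLE_COUNT is less than or equal to MAX_CYCLE and reads the next record otherwise. That rule is inferred from the statement that MAX_CYCLE = 1 makes cycle behave like next. The function name run_outer_loop is hypothetical, and FILE_SEARCH is ignored.

```python
def run_outer_loop(records, max_cycle=100):
    """Model a utility whose only action is: { print FNR,$0; cycle; }

    Assumption (inferred, not stated in the text): `cycle` restarts the
    pattern loop on the same record while CYCLE_COUNT <= MAX_CYCLE and
    otherwise behaves like `next`.
    """
    printed = []
    for fnr, record in enumerate(records, start=1):
        cycle_count = 1                      # set to one as each record is read
        while True:
            printed.append((fnr, record))    # print FNR,$0;
            cycle_count += 1                 # cycle; increments CYCLE_COUNT
            if cycle_count > max_cycle:
                break                        # read the next input record
    return printed


# Two records with MAX_CYCLE = 3: each record is printed three times.
lines = run_outer_loop(["alpha", "beta"], max_cycle=3)
```

Under the same assumption, calling run_outer_loop with max_cycle=1 prints each record exactly once, which is the behavior of a plain next.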
delete and deletea
The delete and deletea statements allow the user to delete individual elements of an array or an entire array, respectively. The forms of the delete and deletea statements are:
delete A[expr_list];
and
deletea A;
The first form will delete the element of array A referenced by the subscript determined by 'expr_list'. The second form will delete the entire array. Note that for singly dimensioned arrays, the deletea statement is equivalent to the statement:
for ( j in A ) delete A[j];
The use of the deletea statement is encouraged for simplicity and speed of execution. The delete statement may be used for arrays of any dimension. However, for arrays with dimension greater than 2, the elements of the array are not deleted, but simply initialized to zero and the null string. This behavior has to do with the structure of arrays and the 'holes' which could be left by deleting elements. For singly dimensioned arrays, there is no problem, since there can be no 'hole' left by deleting an element. For example consider the singly dimensioned array:
A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
If the array element A[5] is deleted
A[1] A[2] A[3] A[4] ____ A[6] A[7] A[8] A[9]
Then the remaining elements 'shift' to fill the 'hole'.
A[1] A[2] A[3] A[4] A[6] A[7] A[8] A[9]
For two-dimensional arrays a complication arises in trying to fill the 'hole' left by deleting an array element.
A[1][1] A[1][2] A[1][3] A[1][4] A[1][5] A[1][6]
A[2][1] A[2][2] A[2][3] A[2][4] A[2][5] A[2][6]
A[3][1] A[3][2] A[3][3] A[3][4] A[3][5] A[3][6]
A[4][1] A[4][2] A[4][3] A[4][4] A[4][5] A[4][6]
A[5][1] A[5][2] A[5][3] A[5][4] A[5][5] A[5][6]
A[6][1] A[6][2] A[6][3] A[6][4] A[6][5] A[6][6]
If element A[4][4] is deleted, then we have the 'hole':
A[1][1] A[1][2] A[1][3] A[1][4] A[1][5] A[1][6]
A[2][1] A[2][2] A[2][3] A[2][4] A[2][5] A[2][6]
A[3][1] A[3][2] A[3][3] A[3][4] A[3][5] A[3][6]
A[4][1] A[4][2] A[4][3] _______ A[4][5] A[4][6]
A[5][1] A[5][2] A[5][3] A[5][4] A[5][5] A[5][6]
A[6][1] A[6][2] A[6][3] A[6][4] A[6][5] A[6][6]
In trying to fill the 'hole', we have a choice of shifting the elements below the deleted element up to fill the 'hole', column priority, or shifting the elements to the right of the deleted element to fill the 'hole', row priority. In QTAwk, row priority is used in filling the 'hole':
A[1][1] A[1][2] A[1][3] A[1][4] A[1][5] A[1][6]
A[2][1] A[2][2] A[2][3] A[2][4] A[2][5] A[2][6]
A[3][1] A[3][2] A[3][3] A[3][4] A[3][5] A[3][6]
A[4][1] A[4][2] A[4][3] A[4][5] A[4][6]
A[5][1] A[5][2] A[5][3] A[5][4] A[5][5] A[5][6]
A[6][1] A[6][2] A[6][3] A[6][4] A[6][5] A[6][6]
For arrays of higher dimensions the situation is even more complicated. Not only do elements have to be "shifted", but elements in the array will have to be discarded to do so. For example, if A is a 3x3x3 array and element A[2][2][2] is deleted, then element A[2][2][3], if it existed, would also be deleted by shifting other elements to fill the 'hole'. QTAwk will in this case initialize the element A[2][2][2] to zero and the null string rather than delete the element and lose other elements. Thus, the delete statement only truly deletes elements for one and two dimensional arrays.
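The rule described above (true deletion for one and two subscripts, re-initialization for three or more) can be sketched with nested Python dictionaries. This is an illustrative model only. The names qtawk_delete and EMPTY are hypothetical, and the sketch assumes the call always names a single leaf element with a full set of subscripts.

```python
EMPTY = (0, "")  # stand-in for "zero and the null string"


def qtawk_delete(array, *subscripts):
    """Model `delete array[s1][s2]...;` on nested dicts.

    With one or two subscripts the element is removed; with three or
    more it is kept and reset to EMPTY, as the text describes.
    """
    *parents, last = subscripts
    node = array
    for s in parents:          # walk down to the innermost level
        node = node[s]
    if len(subscripts) <= 2:
        del node[last]         # element really disappears
    else:
        node[last] = EMPTY     # element stays, value is re-initialized


# Two-dimensional case: row 2 simply loses one element.
grid = {i: {j: f"v{i}{j}" for j in range(1, 4)} for i in range(1, 4)}
qtawk_delete(grid, 2, 2)

# Three-dimensional case: the element survives with an empty value.
cube = {i: {j: {k: 1 for k in range(1, 4)} for j in range(1, 4)}
        for i in range(1, 4)}
qtawk_delete(cube, 2, 2, 2)
```

After these calls, grid[2] holds only the subscripts 1 and 3, while cube[2][2] still holds all three subscripts and cube[2][2][2] is EMPTY.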
The deletea statement, however, works on arrays of any dimension. For multidimensional arrays, the deletea would be equivalent to nested for statements. For example, if the delete statement truly deleted elements of a three dimensional array, then the deletea statement could be imagined as equivalent to:
for ( i in A )
    for ( j in A[i] )
        for ( k in A[i][j] ) delete A[i][j][k];
if/else
The if and else keywords provide for executing one of two statements depending on the TRUE or FALSE value of an expr_list. The form of the if/else statement is:
if ( expr_list ) statement1
or
if ( expr_list ) statement1 else statement2
If expr_list, when evaluated, produces a TRUE value, then statement1 is executed. If expr_list produces a FALSE value, then, in the second form, statement2 is executed.
switch, case, default
QTAwk includes an expanded form of the C switch/case statements. In C, the switch/case statements must be of the form:
switch ( expr_list ) {
    case constant1: statement
    case constant2: statement
    case constant3: statement
    case constant4: statement
    default: statement
}
In the C language, the expr_list of the switch statement must evaluate to an integral value, and the case labels 'constant1', 'constant2', 'constant3' and 'constant4' must be compile-time integral constants.
In QTAwk, the 'switch' expr_list may evaluate to any valid value and the case labels may be any valid QTAwk expression or expr_list:
switch ( expr_list ) {
    case expr_list1: statement
    case expr_list2: statement
    case expr_list3: statement
    case expr_list4: statement
    default: statement
}
The expr_lists of the case statements are evaluated in turn at execution time. The resultant value is checked against the value of the expr_list of the switch statement using the following logic.
if ( cexpr is a regular expression ) logical_value = sexpr ~~ cexpr;
else logical_value = sexpr == cexpr;
if ( logical_value ) execute case statement
where cexpr is the value of the case expr_list and sexpr is the value of the switch statement expr_list. logical_value is 0 or 1. If logical_value is 1, then the statements following the case label are executed. Thus if cexpr is a regular expression, a match operation is performed. If cexpr is a string, a string comparison is performed. If cexpr is a numeric value, a numerical comparison is performed. Case statements with differing types of expr_list values may appear in the same switch statement, and the proper comparison is made for each.
In addition a given case expr_list can evaluate to different types at different times.
Once a true value is returned by a case statement comparison, the execution falls through from case to case with no further comparisons made. The fall through of execution is broken by the use of the break statement as in C.
Note that the expr_list of a case statement is evaluated at execution time and it is possible for some case expr_lists to never be evaluated. Thus, side effects from the evaluation of case expr_lists should not be relied upon. This is particularly true where execution falls through from one case statement to the next.
If the expr_list of a case statement evaluates to a regular expression, then two built-in variables are set when the match operation is performed: CLENGTH and CSTART. CLENGTH is set to the length of the matching string found (or zero) and CSTART is set to the starting position of the matching string found (or zero). CLENGTH and CSTART are completely analogous to RLENGTH and RSTART, set by the match function, and to MLENGTH and MSTART, set by the match operators ~~ and !~.
The default keyword is provided in analogy to C. The statements following the default statement are executed if the switch expr_list matches no case label. The default statement may be combined with other case statements. It need not be the last statement as shown.
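The dispatch logic described in this section can be modeled in Python as a rough sketch. Nothing here is QTAwk code. The names qtawk_switch and DEFAULT are hypothetical, each case label is a zero-argument callable so that it is evaluated only when reached, and plain Python equality stands in for the string and numeric comparisons. The sketch further assumes that the match operation searches anywhere in the string, that CSTART is 1-based, and that a default label is the entry point only when no case label matches.

```python
import re

DEFAULT = object()  # hypothetical marker standing in for the default label


def qtawk_switch(sexpr, cases):
    """cases: list of (label, name, has_break).

    label is DEFAULT or a callable returning the case value. Returns the
    names of the bodies executed and the CSTART/CLENGTH values.
    """
    state = {"CSTART": 0, "CLENGTH": 0}
    start = default_pos = None
    for pos, (label, _name, _brk) in enumerate(cases):
        if label is DEFAULT:
            default_pos = pos
            continue
        cexpr = label()                      # evaluated at execution time
        if isinstance(cexpr, re.Pattern):    # regular expression: match
            m = cexpr.search(str(sexpr))
            state["CSTART"] = m.start() + 1 if m else 0
            state["CLENGTH"] = len(m.group()) if m else 0
            matched = m is not None
        else:                                # string or number: equality
            matched = sexpr == cexpr
        if matched:
            start = pos
            break                            # later labels are never evaluated
    if start is None:
        start = default_pos
    executed = []
    if start is not None:
        for _label, name, has_break in cases[start:]:
            executed.append(name)            # fall through without comparing
            if has_break:
                break
    return executed, state


cases = [
    (lambda: re.compile(r"[0-9]+"), "digits", False),  # no break: falls through
    (lambda: "stop", "stop", True),
    (DEFAULT, "default", True),
]
```

With these cases, the value "ab42" enters at the regular expression label and falls through into the next body, "stop" enters at the string label, and "xyz" reaches only the default body.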
Loops
QTAwk has four forms of loop control statements:
for ( expr_list1 ; expr_list2 ; expr_list3 ) statement
for ( var in array ) statement
while ( expr_list ) statement
do statement while ( expr_list );
for
The for statement has two forms:
In the first form the following sequence of operations are performed:
The second form of the for statement is used to loop through all indices of an array. Arrays in QTAwk are associative arrays, i.e., they consist of pairs of indices and values. Since the arrays are sparse, i.e., not all consecutive index values need exist in a given array, it can be difficult to loop through all index values using the first form of the for loop above. The second form solves that problem by assigning 'var' successive index values on each loop iteration.
Thus, if an array had the index and element values:
AR[1] = 556;
AR[3] = 335;
AR["state"] = "Washington";
AR[9] = "string value";
AR[20] = "population count";
AR[30] = "density";
AR["county"] = "Whitman";
AR[5] = 56.445;
AR["city"] = "Pullman";
AR["zip"] = 99111;
AR[2] = "nonsense";
Then the for loop:
for ( index in AR ) print index,AR[index];
would output the following:
1 556
2 nonsense
3 335
5 56.445
9 string value
20 population count
30 density
city Pullman
county Whitman
state Washington
zip 99111
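The ordering visible in this sample output (numeric indices in ascending order, followed by string indices in alphabetical order) can be reproduced with a short Python sketch. The ordering rule is inferred from the sample output alone and is an assumption, not a documented guarantee. The function name index_order is hypothetical.

```python
def index_order(array):
    """Return the indices of `array` in the order the sample output shows.

    Assumption: numeric indices come first in ascending numeric order,
    then string indices in ascending string order.
    """
    numeric = sorted(k for k in array if isinstance(k, (int, float)))
    strings = sorted(k for k in array if isinstance(k, str))
    return numeric + strings


AR = {
    1: 556, 3: 335, "state": "Washington", 9: "string value",
    20: "population count", 30: "density", "county": "Whitman",
    5: 56.445, "city": "Pullman", "zip": 99111, 2: "nonsense",
}

# Mirrors: for ( index in AR ) print index,AR[index];
for index in index_order(AR):
    print(index, AR[index])
```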
The second form may also be used for multidimensional arrays:
for ( var in array[s_expr_list]...[s_expr_list] ) statement
For each subscript in the next higher index level in the array reference, var is set to the index value and 'statement' is executed. 'statement' may be a compound statement. For a multidimensional array, the second form may be used to loop sequentially through the indices of the next higher index level. Thus for a two dimensional array:
for ( i in A )
for ( j in A[i] )
will loop through the indices in the array in row order.
while
The while statement has the form:
while ( expr_list ) statement
The expr_list is evaluated and, if TRUE, 'statement' is executed and expr_list is re-evaluated. This cycle continues until expr_list evaluates to FALSE, at which point the loop is terminated and execution resumes with the part of the utility following 'statement'.
do/while
The form of the do/while statement is:
do statement while ( expr_list );
'statement' is executed and expr_list is evaluated. If expr_list is TRUE, 'statement' is executed again; otherwise the loop is terminated. Note that 'statement' is executed at least once.
local
The local keyword is used to define variables within a compound statement that are local to the compound statement and that disappear when the compound statement is exited. The local keyword may be used within any compound statement, but is especially useful in user-defined functions as described later. Variables defined with the local keyword may be assigned an initial value in the statement and multiple variables may be defined with a single statement. If a variable is not assigned an initial value, it is initialized to zero and the null string just as global variables are initialized.
Thus:
local i, j = 12, k = substr(str,5);
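will define three variables local to the enclosing compound statement: i, initialized to zero and the null string; j, initialized to 12; and k, initialized to the value returned by substr(str,5).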
Local variables initialized explicitly in local statements may be initialized to constants, the values of global variables, values returned by built-in functions, values returned by user-defined functions or previously defined local variables. If the value is set to that of a previously defined local variable, the variable may not be defined in the same local statement. Thus:
local k = 5;
local j = k;
is correct, but
local k = 5, j = k;
is not. In the latter case QTAwk will quietly assume that the k, to which j is assigned, is a global variable.
endfile
The endfile keyword causes the utility to behave as if the end of the current input file has been reached. Any FINAL actions are executed. If any input files remain to be processed from the command line, the next is opened for processing. If no further input files remain to be processed, any END actions are executed.
break
This keyword terminates the execution of the enclosing while, for or do/while loop, or breaks the fall-through of execution in cascaded case statements.
continue
This keyword will cause execution to jump to just after the last statement in the loop body and execute the next iteration of the enclosing loop. The loop may be any for, while or do/while loop.
exit
This statement causes the utility to behave as if the end of the current input file had been reached. Any further input files specified are ignored. If there are any FINAL or END actions, they are executed.
If encountered in a FINAL action, the action is terminated, any further input files are ignored and any END actions are executed.
If encountered in an END action, the execution of the action is terminated and utility execution is terminated.
The optional expr_list is evaluated and the resultant value is returned to the operating system as the exit status when QTAwk terminates. If no expr_list is present, or no exit statement is encountered, QTAwk returns a value of zero for the exit status.
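The difference between endfile and exit, when either is encountered in an ordinary pattern-action pair, can be modeled with a small Python driver. This is a sketch under stated assumptions, not QTAwk code. The function run and the action protocol (returning None, "endfile" or ("exit", status)) are hypothetical, and the sketch assumes that FINAL actions run once at the end of each input file.

```python
def run(files, action):
    """files: list of (name, records). action(name, record) returns None,
    "endfile", or ("exit", status). Returns the event log and exit status.
    """
    log = []
    status = 0                     # default exit status is zero
    exiting = False
    for name, records in files:
        for record in records:
            result = action(name, record)
            if result == "endfile":
                break              # act as if end of file was reached
            if isinstance(result, tuple) and result[0] == "exit":
                status = result[1]
                exiting = True
                break
            log.append(f"record {name}:{record}")
        log.append(f"FINAL {name}")   # FINAL actions for the current file
        if exiting:
            break                  # exit: remaining input files are ignored
    log.append("END")              # END actions run in both cases
    return log, status


files = [("a", ["1", "skip", "2"]), ("b", ["3"])]

# endfile on "skip": file a is cut short, file b is still processed.
endfile_log, endfile_status = run(
    files, lambda n, r: "endfile" if r == "skip" else None)

# exit 2 on "skip": file b is never opened and the status is 2.
exit_log, exit_status = run(
    files, lambda n, r: ("exit", 2) if r == "skip" else None)
```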
return
This statement will cause execution to return from a user-defined function. If the optional expr_list is present, it is evaluated and the resultant value is returned as the function value. The optional expr_list may evaluate to a scalar value or an array. If it evaluates to an array, then the array is the return value of the function.