The fundamental QTAwk processing sequence is:
A detailed account of the QTAwk processing sequence includes many other actions that are executed by QTAwk, but these four actions capture the essence of the basic processing loop.
The above basic processing loop may be altered for special cases as explained at the end of this section.
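The basic loop can be modeled outside QTAwk. The following is a minimal Python sketch, not QTAwk code. It assumes the four steps are the ones described later in this section: read a record, split it into fields, evaluate each pattern, and execute the actions of the patterns that are true. The names run, rules and field_sep are hypothetical and chosen only for illustration.

```python
import re

def run(records, rules, field_sep=None):
    """Apply (pattern, action) rules to every record, in order."""
    output = []
    for record in records:                      # 1. read the next record
        fields = record.split(field_sep)        # 2. split the record into fields
        for pattern, action in rules:
            if pattern(record, fields):         # 3. evaluate each pattern
                output.append(action(record, fields))  # 4. execute its action
    return output

# Two illustrative rules: a regular expression test and a field-count test.
rules = [
    (lambda rec, f: re.search(r"error", rec) is not None, lambda rec, f: rec),
    (lambda rec, f: len(f) > 2, lambda rec, f: f[0]),
]
result = run(["disk error", "a b c", "ok"], rules)
```

Every pattern is tested against every record, so a single record may trigger more than one action.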
QTAwk recognizes utilities in the following format:
pattern { action }
The opening brace, {, of the action must be on the same line as the pattern. Patterns control the execution of actions. When a pattern matches a record, i.e., the pattern expression evaluates to a true value, the associated action is executed. Patterns consist of valid QTAwk expressions or regular expressions. The sequence operator acquires a special meaning in pattern expressions and loses its meaning as a sequence operator.
QTAwk follows the C practice in logical operations of considering a nonzero numeric value as true and a zero numeric value as false. This has been expanded in QTAwk for strings by considering the null string as false and any non-null string as true. When a logical operation is performed, the operation returns an integer value of one (1) for a true condition and an integer value of zero (0) for a false condition.
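The truth rule above can be written down as a small function. This is a Python sketch of the rule as stated, not QTAwk code; the function name truth is hypothetical.

```python
def truth(value):
    """Return 1 or 0 following the rule described above."""
    if isinstance(value, str):
        # Strings: only the null (empty) string is false.
        return 1 if value != "" else 0
    # Numbers: zero is false, anything else is true.
    return 1 if value != 0 else 0

checks = [truth(0), truth(3.5), truth(""), truth("0"), truth("abc")]
```

Under this reading of the rule, the string "0" is true because it is a non-null string, even though the number 0 is false.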
QTAwk Patterns
QTAwk recognizes the following types of pattern/action pairs:
{print;}
is executed for every record for which expression evaluates to TRUE.
$0 ~~ /regular expression/ { action }
The regular expression may be specified explicitly as shown or specified by a variable with a regular expression value. For example, setting the variable, var_re, as:
var_re = /Replacement String/;
and specifying the pattern as:
var_re { action }
would be identical to:
/Replacement String/ { action }
The advantage of using a variable is that its value can be changed. Assigning another regular expression to the variable lets the QTAwk utility change dynamically which patterns it recognizes.
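The effect of changing a pattern variable while records are being processed can be illustrated with a Python sketch. This is not QTAwk code. The second pattern, "Second Pattern", and the function name scan are assumptions made for the example.

```python
import re

def scan(records):
    """Collect matching records, switching patterns after the first match."""
    var_re = re.compile(r"Replacement String")    # initial pattern value
    matched = []
    for record in records:
        if var_re.search(record):
            matched.append(record)
            # Reassigning the variable changes what later records are tested against.
            var_re = re.compile(r"Second Pattern")
    return matched

hits = scan([
    "Second Pattern early",
    "Replacement String here",
    "Replacement String again",
    "Second Pattern late",
])
```

The first record is not matched because the variable still holds the original pattern when it is read, and the third is not matched because the variable has already been changed.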
the bitwise operators:
the relational operators:
the equality operators:
and the matching operators:
The action is executed for each record for which the compound pattern is TRUE.
QTAwk provides seven predefined patterns, all of which (except for the 'GROUP' pattern) require actions.
There may be multiple predefined pattern-action pairs defined in a QTAwk utility. Each action is executed at the appropriate time in the order defined.
The seven predefined patterns are BEGIN, INITIAL, GROUP, NOMATCH, FINAL, END and function. Each is described below.
BEGIN Predefined Pattern
The action(s) associated with the BEGIN pattern are executed once prior to opening the first input file. There may be multiple BEGIN { action } combinations. Each action is executed in the order in which it is specified.
INITIAL Predefined Pattern
The action(s) associated with the INITIAL (INITIALIZE) pattern are executed after each input file is opened and before the first record is read. There may be multiple INITIAL { action } combinations. Each action is executed in the order in which it is specified.
GROUP Predefined Pattern
The pattern associated with the GROUP pattern keyword may be any valid QTAwk expression. All expressions in a GROUP are evaluated when the GROUP is first matched against an input record. The result of the evaluation is converted to a regular expression for matching. If the result of evaluating a GROUP expression is an array, the entire array is used for matching at the current position in the GROUP, i.e., all elements of the array are converted to regular expressions and each is matched against the current input record.
All consecutive GROUP/action pairs are grouped and the search for the regular expressions is optimized over the group. Each expression of the GROUP may have a separate action associated with it. In this case the appropriate action is executed if the expression is matched on the current input record. If the action for an expression is not given, then the next action explicitly given is executed. If no action is given for the last expression of a GROUP, then the default action
{ print ; }
is assigned to it. When one of the expressions of the GROUP is matched, the built-in variable NG is set equal to the number of the expression. The numbering of the expressions in the GROUP starts at one (1).
Thus, besides the form of the GROUP given above, the following two forms are available:
GROUP expression1
GROUP expression2 { action }
GROUP expression3
GROUP expression4 { action }
or
GROUP expression1
GROUP expression2 { action }
GROUP expression3
GROUP expression4
There may be more than one GROUP of expression patterns. Any pattern not preceded by the GROUP keyword terminates the current GROUP. The next occurrence of the GROUP keyword starts a new GROUP, and the numbering of the new group starts again at one (1).
GROUP patterns are discussed in more detail in the section Group Patterns.
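The way actions are assigned to GROUP expressions, and the way NG is numbered, can be sketched in Python. This is a model of the rules stated above, not QTAwk code. It makes two assumptions that the text does not confirm: the first matching expression determines NG, and an expression with no explicit action anywhere after it receives the default print action. The names resolve_actions and match_group are hypothetical.

```python
import re

def resolve_actions(group, default_action):
    """Give each expression the next explicitly given action, or the default."""
    resolved = []
    pending = default_action            # used when no later action exists
    for _pattern, action in reversed(group):
        if action is not None:
            pending = action
        resolved.append(pending)
    resolved.reverse()
    return resolved

def match_group(group, record, default_action=lambda rec: rec):
    """Return (NG, action result); NG is 0 when no expression matches."""
    actions = resolve_actions(group, default_action)
    for number, (pattern, _action) in enumerate(group, start=1):
        if re.search(pattern, record):
            return number, actions[number - 1](record)
    return 0, None

# Mirrors the second form above: only expression 2 has an explicit action.
group = [
    (r"alpha", None),
    (r"beta", lambda rec: "B:" + rec),
    (r"gamma", None),
    (r"delta", None),
]
first = match_group(group, "an alpha line")
third = match_group(group, "gamma ray")
none = match_group(group, "nothing here")
```

Expression 1 borrows the action of expression 2, while expressions 3 and 4 fall back to the default action, modeled here as returning the record unchanged.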
NOMATCH Predefined Pattern
The action(s) associated with the NOMATCH pattern are executed for each record for which no pattern is TRUE. There may be multiple NOMATCH { action } combinations. Each action is executed in the order in which it is specified.
FINAL Predefined Pattern
The action(s) associated with the FINAL (FINALIZE) pattern are executed after the last record of each input file has been read and before the file is closed. There may be multiple FINAL { action } combinations. Each action is executed in the order in which it is specified.
END Predefined Pattern
The action(s) associated with the END pattern are executed once after the last input file has been closed. There may be multiple END { action } combinations. Each action is executed in the order in which it is specified.
function Predefined Pattern
The function pattern defines a user-defined function that may be called like a built-in function.
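The points at which the BEGIN, INITIAL, NOMATCH, FINAL and END actions are executed can be traced with a small Python model. This is a sketch of the ordering described above, not QTAwk code; the function process and the handler dictionary are hypothetical constructs for illustration.

```python
def process(files, handlers, rules):
    """files: list of (name, records). handlers: lists of actions keyed by pattern name."""
    trace = []

    def fire(name, *args):
        # Multiple actions for one predefined pattern run in the order given.
        for action in handlers.get(name, []):
            trace.append(action(*args))

    fire("BEGIN")                          # once, before the first file
    for name, records in files:
        fire("INITIAL", name)              # after each file is opened
        for record in records:
            matched = False
            for pattern, action in rules:
                if pattern(record):
                    matched = True
                    trace.append(action(record))
            if not matched:
                fire("NOMATCH", record)    # no pattern was TRUE
        fire("FINAL", name)                # before each file is closed
    fire("END")                            # once, after the last file
    return trace

handlers = {
    "BEGIN": [lambda: "begin-1", lambda: "begin-2"],
    "INITIAL": [lambda name: "open " + name],
    "NOMATCH": [lambda rec: "nomatch " + rec],
    "FINAL": [lambda name: "close " + name],
    "END": [lambda: "end"],
}
rules = [(lambda rec: "x" in rec, lambda rec: "hit " + rec)]
trace = process([("a.txt", ["x1", "y1"]), ("b.txt", ["x2"])], handlers, rules)
```

BEGIN and END each appear once in the trace, while INITIAL and FINAL appear once per input file.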
Altering The Basic Processing Loop
The basic QTAwk processing loop is described in QTAwk Processing Sequence. There are two basic functions in the loop that may be influenced directly by the user:
The first function of the basic loop may be bypassed using the File Searching process which delays reading individual records until a match to a specified regular expression or set of regular expressions is found.
The second function of the basic loop, splitting individual records into fields, may be delayed until a field is needed. Splitting each record into fields can be time consuming and may not be necessary for each record.
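The idea of delaying the split until a field is needed can be sketched in Python. This illustrates the general lazy-evaluation technique, not how QTAwk implements it internally; the Record class and its split_count counter are hypothetical.

```python
class Record:
    """Holds one input record and splits it only on first field access."""

    def __init__(self, text, field_sep=None):
        self.text = text
        self.field_sep = field_sep
        self._fields = None      # not split yet
        self.split_count = 0     # how many times the split was performed

    def field(self, index):
        if index == 0:
            return self.text     # the whole record needs no splitting
        if self._fields is None:
            self._fields = self.text.split(self.field_sep)
            self.split_count += 1
        # Fields beyond the last one are returned as the null string.
        return self._fields[index - 1] if index <= len(self._fields) else ""

rec = Record("alpha beta gamma")
whole = rec.field(0)
splits_before = rec.split_count
second = rec.field(2)
third = rec.field(3)
splits_after = rec.split_count
```

A utility that only ever examines the whole record never pays for the split, and one that reads several fields pays for it once.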
File Searching
For some QTAwk utilities, the basic processing loop as outlined at the beginning of this chapter may be slower than necessary. If all actions of a utility are associated with regular expressions or if a certain record matching one or more regular expressions must be found before any actions are executed, then the process of reading all records and parsing into fields before executing pattern expressions can be slow. For this purpose QTAwk has two special built-in variables:
FILE_SEARCH
FILE_SEARCH_PAT
When FILE_SEARCH is TRUE, the next record read will be the record matching a regular expression from FILE_SEARCH_PAT. If FILE_SEARCH is FALSE, the normal file input process is followed. The file search process may be turned on and off as necessary for a single input file in this manner.
FILE_SEARCH_PAT is set by the user utility to one or more regular expressions against which records from the current input file are matched. FILE_SEARCH_PAT may be set to a single regular expression as a simple variable, e.g.,
FILE_SEARCH_PAT = /test string/;
or a singly dimensioned array, e.g.,
FILE_SEARCH_PAT[1] = /test string 1/;
FILE_SEARCH_PAT[2] = /test string 2/;
FILE_SEARCH_PAT[3] = /test string 3/;
FILE_SEARCH_PAT[4] = /test string 4/;
or a multidimensioned array, e.g.,
FILE_SEARCH_PAT[1][1] = /test string 1,1/;
FILE_SEARCH_PAT[1][2] = /test string 1,2/;
FILE_SEARCH_PAT[1][3] = /test string 1,3/;
FILE_SEARCH_PAT[2][1] = /test string 2,1/;
FILE_SEARCH_PAT[2][2] = /test string 2,2/;
FILE_SEARCH_PAT[2][3] = /test string 2,3/;
FILE_SEARCH_PAT[3][1] = /test string 3,1/;
FILE_SEARCH_PAT[3][2] = /test string 3,2/;
FILE_SEARCH_PAT[3][3] = /test string 3,3/;
When FILE_SEARCH is TRUE, the current input file is scanned for a match to FILE_SEARCH_PAT. When a record is found matching a regular expression in FILE_SEARCH_PAT, the record is read and parsed into fields according to FS, and each pattern expression is evaluated. The associated actions for TRUE pattern expressions are executed. Note that the variables RS or RECLEN still determine the parsing of the input file into records.
Under some circumstances, the above process can return in '$0' multiple records from the current input file. In searching the input file for a match with FILE_SEARCH_PAT, a match may span more than one record if the variable, SPAN_RECORDS, is TRUE. In this case, '$0' is set to the full set of records spanning the match to FILE_SEARCH_PAT and FNR is set to the record number of the last record in $0.
If SPAN_RECORDS is FALSE, any matches to FILE_SEARCH_PAT are not allowed to span input records and '$0' will contain only a single record.
The simple QTAwk utility QTGrep.exp mimics the QTGrep program: it searches for multiple regular expressions and keywords and prints the records that match. Using FILE_SEARCH to process only those lines that match a desired pattern speeds the processing of the file considerably.
Another handy use of FILE_SEARCH is to use FILE_SEARCH_PAT to search for the first line in a range of records to be processed. When the first record is found using the file searching capabilities of QTAwk, FILE_SEARCH is set to false and the range of records is processed normally. After the last record in the range has been processed, FILE_SEARCH is set to true to continue the file search for the next cluster.
Such a utility may resemble the one in the example utility.
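The range technique described above can be modeled in Python. This is a sketch of the control flow only, not QTAwk code. The boolean file_search stands in for FILE_SEARCH, and the start and end patterns are assumptions chosen for the example.

```python
import re

def process_ranges(records, start_pat, end_pat):
    """Process only the records lying between a start match and an end match."""
    file_search = True                 # begin in search mode
    processed = []
    for record in records:
        if file_search:
            if not re.search(start_pat, record):
                continue               # skipped by the search, never processed
            file_search = False        # first record of a range was found
        processed.append(record)       # normal processing inside the range
        if re.search(end_pat, record):
            file_search = True         # resume searching for the next range
    return processed

records = ["x", "START a", "b", "END c", "y", "START d", "END e", "z"]
ranges = process_ranges(records, r"^START", r"^END")
```

Records outside a range are never handed to the normal processing step, which is where the time saving comes from.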
Note that if FILE_SEARCH_PAT is not set, it has the default value of the null string. A match against the null string will match null, or zero-length, records (QTAwk silently replaces a null string regular expression pattern with the regular expression /^$/). Thus, the following very simple QTAwk utility will find all null records in a given file and print the corresponding record numbers. The FINAL action prints the current filename, the number of null records in the current file, the number of null records in all previous files and the current file, and the total number of records in the current file. The END action prints the total number of null records in all files and the total number of records in all files.
BEGIN {
    FILE_SEARCH = TRUE;
}
{
    i++;
    print FNR;
}
FINAL {
    ti += i;
    print FILENAME,"--",i,ti,FNR,"--";
    i = 0;
}
END {
    print "--",ti,NR,"--";
}
Not all QTAwk utilities reference the fields in a record, and within a utility not all expressions reference a field of the input record. In such cases, the basic QTAwk processing loop can be altered to delay splitting the current input record into fields until a field is needed. The sample utility QTGrep.exp scans a file for specified pattern(s) and prints the line or lines containing a pattern match. Since the utility does not reference any fields of any input records, splitting the input records into fields is wasted effort and can be turned off to save time.
Turning off field splitting in QTAwk can be done in two ways:
The first method automatically sets the variable DELAY_INPUT_PARSE to FALSE.