Pattern-Action Pairs

The fundamental QTAwk processing sequence is:

  1. QTAwk opens each input file and reads it record by record.
  2. As each record is read, it is split into fields.
  3. Each pattern expression is evaluated.
  4. The associated action is executed for each pattern expression that evaluates to true.

A detailed account of the QTAwk processing sequence includes many other actions that are executed by QTAwk, but these four actions capture the essence of the basic processing loop.
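
The four steps above can be sketched as a small Python model. This is only an illustration of the loop as described, not QTAwk's implementation: the names run, rules and out are hypothetical, input files are given as in-memory strings, and whitespace splitting is an assumption.

```python
import re

def run(files, rules, split=str.split):
    """Model of the basic loop: read, split, test patterns, run actions."""
    for name, text in files:                    # 1. open each input file and
        for record in text.splitlines():        #    read it record by record
            fields = split(record)              # 2. split the record into fields
            for pattern, action in rules:       # 3. evaluate each pattern expression
                if pattern(record, fields):     # 4. run the action of each true pattern
                    action(record, fields)

out = []
rules = [
    # pattern: record contains "error"; action: keep the whole record
    (lambda rec, f: re.search(r"error", rec) is not None,
     lambda rec, f: out.append(rec)),
    # pattern: more than two fields; action: keep the first field
    (lambda rec, f: len(f) > 2,
     lambda rec, f: out.append(f[0])),
]
run([("log.txt", "error here\na b c\nfine")], rules)
```

Every pattern is tested against every record, so a single record can trigger more than one action.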

The above basic processing loop may be altered for special cases as explained at the end of this section.

QTAwk recognizes utilities in the following format:

pattern { action }

The opening brace, {, of the action must be on the same line as the pattern. Patterns control the execution of actions: when a pattern matches a record, i.e., the pattern expression evaluates to a true value, the associated action is executed. Patterns consist of valid QTAwk expressions or regular expressions. In pattern expressions the sequence operator (the comma) loses its usual meaning and instead separates the two expressions of a range pattern, described under Range Pattern below.

QTAwk follows the C practice in logical operations of considering a nonzero numeric value as true and a zero numeric value as false. This has been expanded in QTAwk for strings by considering the null string as false and any non-null string as true. When a logical operation is performed, the operation returns an integer value of one (1) for a true condition and an integer value of zero (0) for a false condition.
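
The truth rules in the preceding paragraph can be modelled in a few lines of Python. The function names is_true, logical_and, logical_or and logical_not are hypothetical and exist only for this sketch.

```python
def is_true(value):
    """Strings are true when non-null; numbers are true when nonzero."""
    if isinstance(value, str):
        return value != ""
    return value != 0

def logical_and(a, b):
    """Logical operations return the integer 1 or 0, never the operands."""
    return 1 if is_true(a) and is_true(b) else 0

def logical_or(a, b):
    return 1 if is_true(a) or is_true(b) else 0

def logical_not(a):
    return 0 if is_true(a) else 1
```

For instance, logical_and(3, "abc") yields 1, while logical_or(0, "") yields 0.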

QTAwk Patterns

QTAwk recognizes the following types of pattern/action pairs:

Blank Pattern
{ action }
The pattern is assumed TRUE for every record and the action is executed for all records.
Blank Action
expression
The default action

{print;}

is executed for every record for which expression evaluates to TRUE.

Expression Pattern
expression { action }
The action is executed for each record for which expression evaluates to TRUE.
Regular Expression Pattern
/regular expression/ { action }
The action is executed for each record for which the regular expression matches a string in the record (TRUE condition). Using a regular expression as the pattern in this manner is equivalent to the pattern expression:

$0 ~~ /regular expression/ { action }

The regular expression may be specified explicitly as shown or specified by a variable with a regular expression value. For example, setting the variable, var_re, as:

var_re = /Replacement String/;

and specifying the pattern as:

var_re { action }

would be identical to:

/Replacement String/ { action }

The advantage of using a variable is that its value can be changed. Assigning another regular expression to the variable lets the QTAwk utility change the patterns it recognizes dynamically.

Compound Pattern
compound pattern { action }
The pattern combines regular expressions with:
  • the logical operators:
      • NOT, !
      • AND, &&
      • OR, ||
  • the bitwise operators:
      • AND, &
      • OR, |
      • XOR, @
  • the relational operators:
      • Less Than or Equal, <=
      • Less Than, <
      • Greater Than, >
      • Greater Than or Equal, >=
  • the equality operators:
      • Equality, ==
      • Not Equal, !=
  • and the matching operators:
      • Matches, ~~
      • No Match, !~
The action is executed for each record for which the compound pattern is TRUE.

    Range Pattern
    expression1,expression2 { action }
    The action is executed for the first record for which expression1 is TRUE and for every following record up to and including the record for which expression2 evaluates TRUE; that is, the range is inclusive. This illustrates the special meaning of the sequence operator in patterns.
    Predefined Pattern
    predefined pattern { action }
    The predefined patterns are:
    1. BEGIN
    2. INITIAL
    3. GROUP
    4. NOMATCH
    5. FINAL
    6. END
    7. function
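
The inclusive behavior of the Range Pattern described earlier in this list can be illustrated with a short Python model. The name range_select is hypothetical. Two points are assumptions borrowed from conventional awk behavior rather than taken from this guide: expression2 is also tested on the record that opened the range, and a new range may open after the previous one has closed.

```python
def range_select(records, start, stop):
    """Yield each record from one matching start through one matching stop."""
    in_range = False
    for rec in records:
        if not in_range and start(rec):
            in_range = True          # expression1 is TRUE: the range opens here
        if in_range:
            yield rec                # the action runs for this record
            if stop(rec):
                in_range = False     # expression2 is TRUE: this record is the last

records = ["a", "BEGIN", "x", "END", "b", "BEGIN", "END"]
selected = list(range_select(records,
                             lambda r: r == "BEGIN",
                             lambda r: r == "END"))
```

Both boundary records are included in selected, which is what "the range is inclusive" means.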

    QTAwk Predefined Patterns

    QTAwk provides seven predefined patterns, all of which (except for the 'GROUP' pattern) require actions.

    There may be multiple predefined pattern-action pairs defined in a QTAwk utility. Each action is executed at the appropriate time in the order defined.

    BEGIN Predefined Pattern

    The action(s) associated with the BEGIN pattern are executed once prior to opening the first input file. There may be multiple BEGIN { action } combinations. Each action is executed in the order in which it is specified.

    INITIAL Predefined Pattern

    The action(s) associated with the INITIAL (INITIALIZE) pattern are executed after each input file is opened and before the first record is read. There may be multiple INITIAL { action } combinations. Each action is executed in the order in which it is specified.

    GROUP Predefined Pattern

    The pattern associated with the GROUP pattern keyword may be any valid QTAwk expression. All expressions in a GROUP are evaluated when the GROUP is first matched against an input record. The result of the evaluation is converted to a regular expression for matching. If the result of evaluating a GROUP expression is an array, the entire array is used for matching at the current position in the GROUP, i.e., all elements of the array are converted to regular expressions and each is matched against the current input record.

    All consecutive GROUP/action pairs are grouped and the search for the regular expressions is optimized over the group. Each expression of the GROUP may have a separate action associated with it. In this case the appropriate action is executed if the expression is matched on the current input record. If the action for an expression is not given, then the next action explicitly given is executed. If no action is given for the last expression of a GROUP, then the default action

    { print ; }

    is assigned to it. When one of the expressions of the GROUP is matched, the built-in variable, NG, is set equal to the number of the expression. The numbering of the expressions in the GROUP starts with one, 1.

    Thus, besides the form of the GROUP given above, the following two forms are available:

    GROUP expression1           
    GROUP expression2 { action }
    GROUP expression3           
    GROUP expression4 { action }
    

    or

    GROUP expression1           
    GROUP expression2 { action }
    GROUP expression3           
    GROUP expression4           
    

    There may be more than one GROUP of expression patterns. Any pattern not preceded with the GROUP keyword will cause a GROUP to be terminated. The occurrence of the GROUP keyword again will start a new GROUP and the numbering of the new group starts at one, 1.
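
The rule for which action belongs to which GROUP expression can be modelled in Python as follows. This is a sketch of the rule as stated above, not of QTAwk internals. The names resolve_group_actions and DEFAULT_ACTION are hypothetical, and actions are represented as plain strings. It assumes that a run of trailing expressions without actions all receive the default action assigned to the last expression.

```python
DEFAULT_ACTION = "{ print ; }"

def resolve_group_actions(group):
    """group: list of (expression, action-or-None) pairs in source order.

    Returns (NG, expression, action) triples with every action filled in,
    where NG is the 1-based number of the expression within the GROUP.
    """
    resolved = []
    next_action = DEFAULT_ACTION     # used when the last expression has no action
    # Walk backwards so each expression sees the next explicitly given action.
    for number in range(len(group), 0, -1):
        expression, action = group[number - 1]
        if action is not None:
            next_action = action
        resolved.append((number, expression, next_action))
    resolved.reverse()
    return resolved

first_form = resolve_group_actions(
    [("expression1", None), ("expression2", "A"),
     ("expression3", None), ("expression4", "B")])
second_form = resolve_group_actions(
    [("expression1", None), ("expression2", "A"),
     ("expression3", None), ("expression4", None)])
```

In first_form, expression1 shares action A with expression2, and expression3 shares action B with expression4. In second_form, expression3 and expression4 both fall back to the default action.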

    GROUP patterns are discussed in more detail in the section Group Patterns.

    NOMATCH Predefined Pattern

    The action(s) associated with the NOMATCH pattern are executed for each record for which no pattern is TRUE. There may be multiple NOMATCH { action } combinations. Each action is executed in the order in which it is specified.

    FINAL Predefined Pattern

    The actions associated with the FINAL (FINALIZE) pattern are executed after the last record of each input file has been read and before the file is closed. There may be multiple FINAL { action } combinations. Each action is executed in the order in which it is specified.

    END Predefined Pattern

    The action(s) associated with the END pattern are executed once after the last input file has been closed. There may be multiple END { action } combinations. Each action is executed in the order in which it is specified.
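
The relative timing of the BEGIN, INITIAL, FINAL and END actions described above can be summarized with a small Python trace. The function name trace and the file names are hypothetical, and ordinary pattern-action processing (including NOMATCH) is reduced to a single "record" event per input record.

```python
def trace(files):
    """Return the order in which the predefined-pattern actions would fire."""
    events = ["BEGIN"]                       # once, before the first file is opened
    for name, records in files:
        events.append(f"INITIAL {name}")     # after open, before the first record
        for rec in records:
            events.append(f"record {rec}")   # ordinary patterns, or NOMATCH
        events.append(f"FINAL {name}")       # after the last record, before close
    events.append("END")                     # once, after the last file is closed
    return events

events = trace([("a.txt", ["1", "2"]), ("b.txt", ["3"])])
```

BEGIN and END each appear once in events, while INITIAL and FINAL appear once per input file.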

    function Predefined Pattern

    The function keyword introduces a user-defined function that may be called like a built-in function.

    Altering The Basic Processing Loop

    The basic QTAwk processing loop is described in QTAwk Processing Sequence. There are two basic functions in the loop that may be influenced directly by the user:

    1. Reading each record and evaluating each pattern for a true condition (see File Searching), and
    2. Splitting each record into fields when the record is read (see Field Splitting).

    The first function of the basic loop may be bypassed using the File Searching process which delays reading individual records until a match to a specified regular expression or set of regular expressions is found.

    The second function of the basic loop, splitting individual records into fields, may be delayed until a field is needed. Splitting each record into fields can be time consuming and may not be necessary for each record.

    File Searching

    For some QTAwk utilities, the basic processing loop as outlined at the beginning of this chapter may be slower than necessary. If all actions of a utility are associated with regular expressions or if a certain record matching one or more regular expressions must be found before any actions are executed, then the process of reading all records and parsing into fields before executing pattern expressions can be slow. For this purpose QTAwk has two special built-in variables:

    FILE_SEARCH
    FILE_SEARCH_PAT

    When FILE_SEARCH is TRUE, the next record read will be the record matching a regular expression from FILE_SEARCH_PAT. If FILE_SEARCH is FALSE, the normal file input process is followed. The file search process may be turned on and off as necessary for a single input file in this manner.

    FILE_SEARCH_PAT is set by the user utility to one or more regular expressions against which records from the current input file are matched. FILE_SEARCH_PAT may be set to a single regular expression as a simple variable, e.g.,

    FILE_SEARCH_PAT = /test string/;

    or a singly dimensioned array, e.g.,

    FILE_SEARCH_PAT[1] = /test string 1/;
    FILE_SEARCH_PAT[2] = /test string 2/;
    FILE_SEARCH_PAT[3] = /test string 3/;
    FILE_SEARCH_PAT[4] = /test string 4/;

    or a multidimensioned array, e.g.,

    FILE_SEARCH_PAT[1][1] = /test string 1,1/;
    FILE_SEARCH_PAT[1][2] = /test string 1,2/;
    FILE_SEARCH_PAT[1][3] = /test string 1,3/;
    FILE_SEARCH_PAT[2][1] = /test string 2,1/;
    FILE_SEARCH_PAT[2][2] = /test string 2,2/;
    FILE_SEARCH_PAT[2][3] = /test string 2,3/;
    FILE_SEARCH_PAT[3][1] = /test string 3,1/;
    FILE_SEARCH_PAT[3][2] = /test string 3,2/;
    FILE_SEARCH_PAT[3][3] = /test string 3,3/;

    When FILE_SEARCH is TRUE, the current input file is scanned for a match to FILE_SEARCH_PAT. When a record is found matching a regular expression in FILE_SEARCH_PAT, the record is read, parsed into fields according to FS and each pattern expression executed. The associated actions for TRUE pattern expressions are executed. Note that the variables RS or RECLEN still determine the parsing of the input file into records.

    Under some circumstances, the above process can return in '$0' multiple records from the current input file. In searching the input file for a match with FILE_SEARCH_PAT, a match may span more than one record if the variable, SPAN_RECORDS, is TRUE. In this case, '$0' is set to the full set of records spanning the match to FILE_SEARCH_PAT and FNR is set to the record number of the last record in $0.

    If SPAN_RECORDS is FALSE, any matches to FILE_SEARCH_PAT are not allowed to span input records and '$0' will contain only a single record.

    The simple QTAwk utility QTGrep.exp mimics the QTGrep program: it searches for multiple regular expressions and keywords and prints the lines that match. Using FILE_SEARCH to process only those lines that match a desired pattern speeds the processing of the file considerably.

    Another handy use of FILE_SEARCH is to use FILE_SEARCH_PAT to search for the first line in a range of records to be processed. When the first record is found using the file searching capabilities of QTAwk, FILE_SEARCH is set to false and the range of records is processed normally. After the last record in the range has been processed, FILE_SEARCH is set to true to continue the file search for the next cluster.

    Such a utility may resemble the one in the example utility.
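
The technique of switching the file search off at the start of a range and back on at its end can be modelled in Python. This is an illustration of the control flow only: the function name process_clusters and the sample patterns are hypothetical, the local variable file_search merely stands in for FILE_SEARCH, and the model still visits every record, whereas the point of the real feature is that skipped records are never split or tested against the patterns.

```python
import re

def process_clusters(records, first_pat, last_pat):
    """Search for a record matching first_pat, process records normally
    until one matches last_pat, then resume the search."""
    file_search = True                # stands in for FILE_SEARCH
    processed = []
    for rec in records:
        if file_search:
            if not re.search(first_pat, rec):
                continue              # passed over by the search, never processed
            file_search = False       # first record of the range found
        processed.append(rec)         # normal processing of a record in the range
        if re.search(last_pat, rec):
            file_search = True        # range finished: search for the next cluster
    return processed

records = ["x", "<<start", "a", "b", "end>>", "y", "<<start", "c", "end>>", "z"]
clusters = process_clusters(records, r"^<<start", r"end>>$")
```

Only the records inside the two clusters end up in clusters; the records x, y and z are skipped.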

    Note that if FILE_SEARCH_PAT is not set, it has the default value of the null string. A match against the null string matches null, or zero-length, records (QTAwk silently replaces a null string regular expression pattern with the regular expression /^$/). Thus, the following very simple QTAwk utility finds all null records in a given file and prints the corresponding record numbers. The FINAL action prints the current filename, the number of null records in the current file, the cumulative number of null records in the current and all previous files, and the total number of records in the current file. The END action prints the total number of null records in all files and the total number of records in all files.

    BEGIN {
        FILE_SEARCH = TRUE;
    }

    {
        i++;
        print FNR;
    }

    FINAL {
        ti += i;
        print FILENAME,"--",i,ti,FNR,"--";
        i = 0;
    }

    END {
        print "--",ti,NR,"--";
    }

    Field Splitting

    Not every QTAwk utility references the fields of a record, and within a utility not every expression references a field. In such cases the basic QTAwk processing loop can be altered to delay splitting the current input record into fields until a field is needed. The sample utility QTGrep.exp scans a file for specified pattern(s) and prints the line or lines containing a pattern match. Since the utility does not reference any fields of any input record, splitting the input records into fields is wasted effort and can be turned off to save a little time.

    Turning off field splitting in QTAwk can be done in two ways:

    1. On the command line, with the option '-Wd'.
    2. In the utility, by setting the built-in variable DELAY_INPUT_PARSE to FALSE.

    The first method automatically sets the variable DELAY_INPUT_PARSE to FALSE.
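
The idea of delaying the split until a field is actually needed can be sketched in Python. This is a model of the general technique, not of QTAwk's implementation. The class name Record, its delay parameter and the split_count counter are hypothetical, and splitting on whitespace is an assumption.

```python
class Record:
    """Input record whose fields are split only when first requested."""

    def __init__(self, text, delay=True):
        self.text = text
        self.split_count = 0          # how many times the record was split
        self._fields = None
        if not delay:
            self._split()             # eager splitting, as in the basic loop

    def _split(self):
        self._fields = self.text.split()
        self.split_count += 1

    def field(self, n):
        """Return field n (1-based); field 0 is the whole record."""
        if n == 0:
            return self.text          # the whole record needs no splitting
        if self._fields is None:
            self._split()             # first field reference triggers the split
        return self._fields[n - 1]

rec = Record("alpha beta gamma")
whole = rec.field(0)                  # no split has happened yet
splits_before = rec.split_count
second = rec.field(2)                 # the split happens here, once
third = rec.field(3)                  # reuses the fields already split
```

A utility that only ever looks at the whole record, as QTGrep.exp is described as doing, would never pay for the split in this model.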

