Strings and regular expressions in QTAwk are very similar, yet very different. Regular expressions can be used wherever strings are used, and strings may be used in most cases where a regular expression may be used.
Regular Expression and String Translation
Regular expressions and strings used as regular expressions are turned into an internal form for scanning the target string for a match. For regular expressions this process of conversion into the internal form is done once, when the regular expression is first used for matching. For strings the process is done every time the string is used as a regular expression.
The process of conversion into the internal form can be time consuming if done repeatedly. The judicious use of strings and regular expressions can give both flexibility and speed. By using regular expressions in those places where the content of the regular expression will not change after the first use, the speed of a single conversion can be attained. By using strings in those places where a regular expression is called for, e.g., the first argument of the sub, gsub or gensub functions and the right hand expression for the match operators, the flexibility of dynamically changing expressions can be gained at the expense of speed.
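The trade-off described above can be modelled outside QTAwk. The following is a minimal Python sketch, not QTAwk code: the class names `RegexConstant` and `StringPattern` and the `conversions` counter are hypothetical, introduced only to count how often the internal form would be derived. Python's `re` syntax also differs from QTAwk's regular expression syntax.

```python
import re


class RegexConstant:
    """Models a regular expression constant: converted once, on first use."""

    def __init__(self, source):
        self.source = source
        self._internal = None   # internal form, derived lazily
        self.conversions = 0    # how many times the internal form was derived

    def matches(self, text):
        if self._internal is None:
            self._internal = re.compile(self.source)
            self.conversions += 1
        return self._internal.search(text) is not None


class StringPattern:
    """Models a string used as a regular expression: converted on every use."""

    def __init__(self, source):
        self.source = source    # may be reassigned between matches
        self.conversions = 0

    def matches(self, text):
        # Counted as a fresh conversion each time. (Python's re module caches
        # compiled patterns internally; the counter models the cost described
        # in the text, not Python's own behaviour.)
        self.conversions += 1
        return re.compile(self.source).search(text) is not None


records = ["error: disk full", "ok", "error: timeout"]
constant = RegexConstant("error")
dynamic = StringPattern("error")
constant_hits = [r for r in records if constant.matches(r)]
dynamic_hits = [r for r in records if dynamic.matches(r)]
```

Both objects select the same records, but in this model the constant is converted once while the string is converted once per record scanned. The string form earns that cost only when `source` actually needs to change between matches.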
QTAwk has a form of regular expression that fits between regular expression constants and string constants: array variables. When an array variable is used for matching, the internal regular expression form for the entire array is derived. The derived internal form is kept after the matching operation and is discarded only when the array or any array element is changed. Thus, arrays are like regular expression constants in that the internal form is kept between matching operations, but also like string constants in that the internal form can be changed when necessary.
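The caching behaviour of array variables can be sketched the same way. This is a Python model under stated assumptions, not QTAwk code: the class `ArrayPattern` and its `conversions` counter are hypothetical, and the sketch assumes that an array matches when any one of its elements matches, which this section does not itself state.

```python
import re


class ArrayPattern:
    """Models an array used as a regular expression."""

    def __init__(self):
        self._elements = {}
        self._internal = None   # internal form for the entire array
        self.conversions = 0    # how many times the internal form was derived

    def __setitem__(self, index, source):
        self._elements[index] = source
        self._internal = None   # changing any element discards the internal form

    def matches(self, text):
        if self._internal is None:
            # ASSUMPTION: the elements are combined as alternatives.
            combined = "|".join(
                "(?:%s)" % self._elements[k] for k in sorted(self._elements)
            )
            self._internal = re.compile(combined)
            self.conversions += 1
        return self._internal.search(text) is not None


pattern = ArrayPattern()
pattern[1] = "error"
first = [pattern.matches(r) for r in ("error 1", "warning 2", "error 3")]
pattern[1] = "warning"          # re-assignment forces a new derivation
second = pattern.matches("warning 2")
```

In this model three matches against the unchanged array cost one conversion, and the re-assignment costs exactly one more, at a moment chosen by the program rather than on every use.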
Regular Expressions in Patterns
There are, however, some places where strings cannot be used as regular expressions. The most notable of these is stand-alone regular expressions in patterns. Stand-alone regular expressions in patterns are a shorthand for:
$0 ~~ /re/
Thus, complex expressions may be built from stand-alone regular expressions in patterns. For example, the pattern:
/re1/ && /re2/
expands to:
($0 ~~ /re1/) && ($0 ~~ /re2/)
This pattern will match only those records for which both regular expressions re1 and re2 match.
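As an illustration of the expanded form, here is a short Python sketch that selects records the way the pattern above is described. It is a model, not QTAwk code: the function name `select_records` is hypothetical, and it assumes that a match may occur anywhere in the record.

```python
import re


def select_records(records, re1, re2):
    """Keep records for which both expressions match: ($0 ~~ re1) && ($0 ~~ re2)."""
    first = re.compile(re1)
    second = re.compile(re2)
    return [
        record
        for record in records
        # Each stand-alone expression is tested against the whole record.
        if first.search(record) is not None and second.search(record) is not None
    ]


records = ["alpha beta", "alpha", "beta", "gamma"]
selected = select_records(records, "alpha", "beta")
```

Only the record containing both `alpha` and `beta` is kept; records matching just one of the two expressions are rejected.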
Using the logical, relational, equality and bit-wise operators, two or more regular expressions may be combined in patterns to test records against more than one regular expression. The following pattern:
/re1/ != /re2/
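will select those records matching re1 and NOT matching re2. Records matching re2 and not matching re1 will also be selected; that is, a record is selected when exactly one of the two regular expressions matches.
Similarly, the pattern: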
!/re1/
will select those records not matching re1. To combine regular expressions in this manner, the following logical truth table may be used. For each operator, it shows which combinations of match (T) and no match (F) for two regular expressions r1 and r2 select a record:
| r1 | T | T | F | F |
| r2 | T | F | T | F |
| == | T | F | F | T |
| != | F | T | T | F |
| <= | T | F | T | T |
| < | F | F | T | F |
| > | F | T | F | F |
| >= | T | T | F | T |
| & | T | F | F | F |
| | | T | T | T | F |
| @ | F | T | T | F |
| && | T | F | F | F |
| || | T | T | T | F |
Thus, if you wanted to select only those records that matched both regular expressions and reject those records that did not match both, the following patterns are the only ones to do so:
/re1/ & /re2/
or
/re1/ && /re2/
To select those records matching re1 but not re2 (rejecting records that match both or neither), the following patterns could be used:
/re1/ > /re2/
or
/re1/ && !/re2/
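The truth table above can be reproduced by treating each match result as 1 (match) or 0 (no match) and applying the operator to the two values. The following Python sketch does this; it is a model rather than QTAwk code. The names `OPERATORS`, `CASES` and `truth_row` are hypothetical, and `@` is modelled with Python's exclusive OR, which is consistent with the `@` row of the table.

```python
import operator

# QTAwk operator symbol -> Python function modelling it on 0/1 match results.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    "<": operator.lt,
    ">": operator.gt,
    ">=": operator.ge,
    "&": operator.and_,
    "|": operator.or_,
    "@": operator.xor,              # assumption: @ behaves as exclusive OR
    "&&": lambda a, b: a and b,
    "||": lambda a, b: a or b,
}

# Columns of the table: (r1, r2) = (T,T), (T,F), (F,T), (F,F).
CASES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def truth_row(symbol):
    """Return the table row for one operator as a string of T and F."""
    op = OPERATORS[symbol]
    return "".join("T" if op(r1, r2) else "F" for r1, r2 in CASES)


table = {symbol: truth_row(symbol) for symbol in OPERATORS}
```

Looking up a desired selection in `table` gives the operators that produce it. For example, the row `FTFF` (re1 matches, re2 does not) belongs only to `>`, and `TFFF` (both match) is shared by `&` and `&&`.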
Note that strings could be used for regular expressions in patterns instead of stand-alone regular expressions. However, the economy of expression of the stand-alone regular expressions would be lost. For example, for the stand-alone regular expression pattern:
/re1/ && !/re2/
The following string expression could be used:
($0 ~~ "re1") && ($0 !~ "re2")
or
($0 ~~ "re1") && !($0 ~~ "re2")
If re1 or re2 contained named expressions, then the values of the variables contained in re1 or re2 could be changed to dynamically alter the lines matched. The matching process for the above string expressions would be much slower than for the corresponding expressions with stand-alone regular expressions, because the strings are converted into the internal form every time they are used.
Array variables could also be used for those situations where re1 and/or re2 contained named expressions which change and for which the change must be reflected in the matching process. Also, arrays allow the user better control over when the internal regular expression form is derived. Setting:
are1[1] = re1;
and
are2[1] = re2;
The use of are1 and are2 as matching patterns:
are1 && are2
would be identical to
re1 && re2
as a pattern expression. The advantage of using the arrays appears when any named expression in re1 and/or re2 changes value. Then simply re-assigning the arrays as:
are1[1] = re1;
and
are2[1] = re2;
would discard the internal regular expression form for both arrays, which would then be re-derived when the pattern is next matched against an input record. The use of arrays as pattern matching regular expressions yields truly dynamic regular expressions, with the user utility having total control over when the internal form is discarded and re-derived.
Regular expressions and strings may also be used in case statements. However, strings are not equivalent to regular expressions in the case statement.