Expressions specify values to QTAwk or specify the computations that a utility performs when it executes. An expression consists of a sequence of variable name(s), constant(s), built-in functions, user-defined functions and operator(s). The purpose of an expression is to yield a value or assign a value to a variable.
Numeric Forms and Arithmetic Operations
QTAwk maintains two separate numeric forms, integral and floating point. The main difference between the two forms involves the type of arithmetic performed for the binary and unary arithmetic operators. The arithmetic operators are:
| Operation | Operator |
| one's complement | ~ |
| increment/decrement | ++ -- |
| unary plus/minus | + - |
| exponentiation | ^ |
| multiply, divide, remainder | * / % |
| binary plus/minus | + - |
| bitwise AND | & |
| bitwise XOR | @ |
| bitwise OR | | |
| assignment | ^= *= /= %= += -= &= @= |= |
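For all binary arithmetic operations (except the bitwise operators), the following two simple binary arithmetic rules are followed:
If both operand values are integers, integer arithmetic is performed and the result is an integer value.
If either or both operand values are floating point, floating point arithmetic is performed and the result is a floating point value.
If either or both operand values are strings or regular expressions, the numeric value of each such operand is obtained before applying the above rules. If a string or regular expression has no numeric form, then an integer zero value is used.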
Single character operands are treated as integers, using the ASCII integer value of the single character.
The application of the above rules is most noticeable when one numeric is divided by a second:
numeric_1 / numeric_2
If both numeric_1 and numeric_2 are integers, integer division is performed and the result will be an integer. Thus:
(1 / 2) = 0
If either numeric_1 or numeric_2 is a floating point value, floating point division is performed and the result is a floating point value:
1.0 / 2 = 0.5
and
1 / 2.0 = 0.5
and
1.0 / 2.0 = 0.5
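The division rules above can be modelled outside QTAwk. The following Python sketch is only an illustration: the function name qt_divide is hypothetical, and truncation toward zero for integer division is an assumption borrowed from C, since the text shows only the positive case.

```python
def qt_divide(numeric_1, numeric_2):
    """Model of the division rules described above (not QTAwk itself)."""
    if isinstance(numeric_1, int) and isinstance(numeric_2, int):
        # Both operands integral: integer division, integer result.
        # Assumption: the quotient truncates toward zero, as in C.
        quotient = abs(numeric_1) // abs(numeric_2)
        same_sign = (numeric_1 >= 0) == (numeric_2 >= 0)
        return quotient if same_sign else -quotient
    # Either operand floating point: floating point division.
    return float(numeric_1) / float(numeric_2)


print(qt_divide(1, 2))      # 0
print(qt_divide(1.0, 2))    # 0.5
print(qt_divide(1, 2.0))    # 0.5
print(qt_divide(1.0, 2.0))  # 0.5
```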
Numerics and Strings
QTAwk maintains the separate numeric forms in converting strings to numerics. "123" will be converted to integral form, while "123.0" will be converted to floating point form. The difference in the two forms may become significant in cases where variables are assigned numeric values derived from input string fields or other strings. If numeric operations are performed and full floating point division is desired, 0.0 should be added to a numeric value to assure floating point form.
Thus, QTAwk has three idioms for converting between strings and numeric values:
Note that if the string has a floating point form, then a floating point numeric will always result.
Also, the conversion from numeric value to string form will result in a floating point value or integer value depending upon both the numeric value and the value of CONVFMT. In converting a numeric value to string value, QTAwk first considers whether the numeric value is integer or floating point. If an integer value, then a straight integer to ASCII string conversion is performed, retaining the full accuracy of the numeric value. If, however, the numeric value is a floating point value, then the value is formatted using the format string in the built-in variable, CONVFMT. The default value of CONVFMT is "%.6g". Changing the value of CONVFMT to "%6u" will result in integer results for all conversions of a floating point numeric to a string. A value of "%.2f" for CONVFMT will always result in a floating point string form with two decimal places.
The above conversions from numeric value to string value also apply to the default conversions made in outputting numeric values with the print and fprint functions, except that the value of the built-in variable OFMT is used.
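The numeric-to-string conversion described above can be sketched in Python, whose % operator accepts the same printf-style format strings. The function name num_to_str is hypothetical and the sketch only models the described behaviour; passing the value of OFMT instead of CONVFMT would model the output case.

```python
def num_to_str(value, fmt="%.6g"):
    """Model of numeric-to-string conversion (fmt plays the role of CONVFMT)."""
    if isinstance(value, int):
        # Integer values: straight conversion, full accuracy retained.
        return str(value)
    # Floating point values: formatted with the format string.
    return fmt % value


print(num_to_str(1234567890))   # 1234567890
print(num_to_str(3.14159265))   # 3.14159
print(num_to_str(2.5, "%.2f"))  # 2.50
```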
Operators
QTAwk provides a rich set of operators which may be used in expressions. The QTAwk operators are listed below from highest to lowest precedence:
| Operation | Operator | Associativity |
| grouping | ( ) | left to right |
| subscripting | [ ] | left to right |
| field | $ | left to right |
| tag operator | [<>] | left to right |
| logical negation (NOT) | ! | left to right |
| one's complement | ~ | left to right |
| pre/post-fix increment/decrement | ++ -- | right to left |
| unary plus/minus | + - | left to right |
| exponentiation | ^ | right to left |
| multiply, divide, remainder | * / % | left to right |
| binary plus/minus | + - | left to right |
| concatenation | adjacency | left to right |
| concatenation | >< | left to right |
| shift left/right | << >> | left to right |
| relational | < <= > >= | left to right |
| equality | == != | left to right |
| matching | ~~ !~ | left to right |
| array membership | in | left to right |
| bitwise AND | & | left to right |
| bitwise XOR | @ | left to right |
| bitwise OR | | | left to right |
| logical AND | && | left to right |
| logical OR | || | left to right |
| conditional | ? : | right to left |
| assignment | = ^= *= /= %= | right to left |
| | += -= &= @= |= | |
| | <<= >>= ><= | |
| sequence | , | left to right |
Note that QTAwk has changed some operators from C and Awk. QTAwk has retained the Awk exponentiation operator (the C bitwise XOR operator) and made @ the QTAwk bitwise XOR operator. QTAwk has changed the Awk match operators to ~~ and !~ to make them consistent with the equality operators, == and !=. This has freed up the single tilde to restore it to its C meaning of unary one's complement. QTAwk has added a concatenation operator, ><. This operator makes string concatenation explicit.
QTAwk has also brought forward the remainder of the C operators:
NOTE: The shift operators, << and >>, are bit shift operators for numeric operands and character shift operators for string operands.
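Note that all expression operators are left associative except the increment/decrement, exponentiation, conditional and assignment operators, which the table above lists as right to left.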
Left associativity means that operators of the same precedence are evaluated left to right.
Operators of higher precedence are evaluated before operators of lower precedence. Thus, 10 + 5 * 9 means 10 + (5 * 9) and evaluates to 55. Since the multiply operator has a higher precedence than the addition operator, it is evaluated first.
Arithmetic Operators
The arithmetic operators are:
The assignment operators are used to assign values to variables. Thus:
total = 0;
assigns the value zero, 0, to the variable 'total'. Multiple assignments may be accomplished in a single statement:
total_a = total_b = total_c = total_d = 0;
initializes the four variables 'total_a', 'total_b', 'total_c', 'total_d' to 0.
The left operand of the assignment operator, =, must be a variable.
Another form of the assignment operator is available, op=, where op is one of the operators, ^, *, /, %, +, -, &, @, |, ><, <<, >>. The left operand of op= must be a variable. The expression
var op= expression
has the same effect as
var = var op expression
except that the variable 'var' is evaluated only once. Thus,
total = total + 45;
could be written as:
total += 45;
The effect is to increase the value of 'total' by 45.
Unary Ones Complement: ~
The unary ones complement operator, ~, works on both numeric and string values.
For numeric values, i.e., character, integer and floating point values, the result of this operation is the bitwise complement of the operand value. Each 1 bit in the operand results in a 0 bit in the result and each 0 bit in the operand results in a 1 bit in the result. For floating point operands, the equivalent integer value is used for the operand.
For string values (regular expressions are considered strings for this purpose), the ones complement operation is performed on each character of the string.
Pre/Postfix Increment/Decrement: ++/--
The operand of either the increment or decrement operator, ++ or -- must be a variable, either global or local. The pre/postfix increment and decrement operators may be used as either prefix operators or postfix operators.
The result of the postfix ++ operator is the value of the operand. After the result is obtained, the operand value is incremented. That is, the operand has the value 1 added to it. If the operand is a string, it is converted to a numeric value and the value incremented.
The result of the postfix -- operator is analogous to the postfix ++ operator except that the operand value is decremented, i.e., the value 1 is subtracted from it.
The result of the prefix ++ operator is the incremented value of the operand. The operand has the value 1 added to it, the new value of the operand is the result of the prefix ++ operator. If the operand is a string, it is converted to a numeric value and the value incremented.
The result of the prefix -- operator is analogous to the prefix ++ operator except that the operand value is decremented, i.e., the value 1 is subtracted from it.
These rules can be easily seen in the expressions:
a = b = c++ = 1;
print a,b,c;
output --> 1 1 2
a++ = ++b + c++;
print a,b,c;
output --> 5 2 3
The first statement sets c to 1, and that value is used as the result of the assignment operation. The postfix operator then increments the value of c to 2. The value of b is then set to the value of the assignment operation on c, 1. This value is then used to set the value of a.
In the second statement, the prefix increment operator increments b to 2 and the new value is used for the binary '+' operator. The current value of c, 2, is used for the binary '+' operator and the value of c is incremented to 3 by the postfix increment operator after the value, 2, has been set aside for use in the addition operation. a is set to 2 + 2 = 4. The postfix increment operator then increments a to 5.
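The order of events in the two statements can be replayed step by step. The Python sketch below is an illustration only: the Var class and its pre_inc and post_inc methods are hypothetical helpers, since Python has no ++ operator.

```python
class Var:
    """Hypothetical stand-in for a QTAwk variable holding a number."""

    def __init__(self, value=0):
        self.value = value

    def post_inc(self):
        # Postfix ++: the result is the old value, then the variable is incremented.
        old = self.value
        self.value += 1
        return old

    def pre_inc(self):
        # Prefix ++: the variable is incremented, the result is the new value.
        self.value += 1
        return self.value


def run_example():
    a, b, c = Var(), Var(), Var()

    # a = b = c++ = 1;
    result = 1
    c.value = result   # c is assigned 1; the assignment yields 1
    c.post_inc()       # postfix ++ then raises c to 2
    b.value = result
    a.value = result
    first = (a.value, b.value, c.value)

    # a++ = ++b + c++;
    total = b.pre_inc() + c.post_inc()   # 2 + 2, c becomes 3 afterwards
    a.value = total                      # a is set to 4
    a.post_inc()                         # postfix ++ then raises a to 5
    second = (a.value, b.value, c.value)
    return first, second


print(run_example())  # ((1, 1, 2), (5, 2, 3))
```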
Unary Plus/Minus: +/-
The value of the unary plus operator is the numeric value of its operand. The value of the unary minus operator is the negative of the numeric value of its operand.
Exponentiation: ^
The result of the binary exponentiation operator, ^, is the value of the left operand raised to the power of the right operand. Both operands are converted to floating point numerics in performing the operation. If both operands are integer values, the result is an integer value. If either or both operands are floating point values, the result is a floating point value.
If either operand has a string value, the numeric value is obtained before applying the above rule.
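A minimal Python sketch of the exponentiation rule, assuming the floating point result is simply truncated back to an integer when both operands are integers (the text does not say how that conversion is done). The name qt_pow is hypothetical.

```python
def qt_pow(base, exponent):
    """Model of the ^ operator as described above (not QTAwk itself)."""
    # The operation itself is carried out in floating point.
    result = float(base) ** float(exponent)
    if isinstance(base, int) and isinstance(exponent, int):
        # Both operands integral: integer result (assumed truncation).
        return int(result)
    return result


print(qt_pow(2, 10))             # 1024
print(qt_pow(2.0, 3))            # 8.0
# ^ associates right to left, so 2 ^ 3 ^ 2 groups as 2 ^ (3 ^ 2).
print(qt_pow(2, qt_pow(3, 2)))   # 512
```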
Multiplicative and Additive Operators: * / % + -
The multiplicative operators, multiply, *, divide, /, and remainder, %, and the additive operators, add, +, and subtract, -, follow the standard arithmetic rules for their result. The remainder operator, x % y, yields the remainder of x/y, which is x - i*y for some integer i, such that i*y <= x < (i+1)*y.
If both operands are integer values, the result is an integer value. If either or both operands are floating point values, the result is a floating point value.
Bitwise Operators: & @ |
For these operators only integer values are used. If the value of either operand is a string or floating point numeric, the corresponding integer value is obtained for use in the operation.
The result of the binary & operator is the bitwise AND of the operand values, i.e., a bit in the result is a 1 if and only if each of the corresponding bits in the operand values are 1s.
The result of the binary @ operator is the bitwise exclusive OR, XOR, of the operand values, i.e., a bit in the result is a 1 if and only if either the corresponding bit of the left operand is 1 or the corresponding bit of the right operand is 1, but not both.
The result of the binary | operator is the bitwise OR of the operand values, i.e., a bit in the result is a 1 if and only if at least one of the corresponding bits in either operand value is 1.
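The three bitwise operators can be illustrated with a short Python sketch. Note that Python spells XOR as ^ where QTAwk uses @. The helper to_int is hypothetical, and it assumes that floating point values are truncated and that strings without a numeric form yield zero.

```python
def to_int(value):
    """Obtain the integer value used by the bitwise operators (assumed rules)."""
    try:
        return int(float(value))   # floats truncate; numeric strings are parsed
    except (TypeError, ValueError, OverflowError):
        return 0                   # no numeric form: integer zero


def bit_and(a, b):
    return to_int(a) & to_int(b)


def bit_xor(a, b):
    return to_int(a) ^ to_int(b)   # QTAwk writes this operator as @


def bit_or(a, b):
    return to_int(a) | to_int(b)


# 12 is binary 1100 and 10 is binary 1010.
print(bit_and(12, 10))  # 8  (1000)
print(bit_xor(12, 10))  # 6  (0110)
print(bit_or(12, 10))   # 14 (1110)
```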
Grouping Operators: ( )
Parentheses are used for grouping parts of expressions to ensure a particular order of evaluation. For example, the expression:
4 + 5 * 9
is evaluated as:
4 + (5 * 9)
which yields 49. If, instead, the addition is to be performed first, parentheses are necessary to ensure the alternative order of evaluation:
(4 + 5) * 9
which yields 81. The parentheses group (4 + 5) to ensure that the lower precedence additive operator is evaluated before the multiplicative operator.
Subscripting: [ ]
Array elements are accessed using the subscripting operators, []. If the variable AA is a singly dimensioned array, the elements would be accessed as AA[expression]. The index expression can be any valid QTAwk expression. For multidimensional arrays, multiple subscripting operators are used. If BB is a two-dimensional array, then the rows are subscripted as BB[row_number_exp] and the individual columns in each row as BB[row_number_exp][column_number_exp]. Note that the comma is evaluated as the sequence operator and does not separate array indices. Refer to the Section Arrays for a discussion of arrays.
Shift Operators: << >>
The shift operators, << and >>, perform either a bitwise or character shift of the left operand. If the left operand is a string, a character shift is performed. If the left operand is a numeric, the equivalent integer value is obtained and used for a bitwise shift. For either shift, the right operand specifies the amount of the shift. The equivalent integer value of the right operand is used.
The shift operators operate on string left operands in QTAwk by shifting the string characters. For string shifts, the characters shifted off one end of the string, wrap to the other end.
For numeric operands and the right shift operator, >>, the integer value of the left operand is treated as unsigned, i.e., the sign bit is not extended in the shift, zero bits are shifted in. If either operand is a floating point value, the result is a floating point value.
Thus, these operators yield the results indicated in the following situations:
"Test String" << 2 --> "st StringTe"
"Test String" >> 3 --> "ingTest Str"
55 << 1 --> 110
55 << 2 --> 220
55.0 << 1 --> 110.0
55.0 << 2 --> 220.0
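The results listed above can be reproduced with a small Python sketch. The function names are hypothetical, and only non-negative numeric operands are handled: the unsigned treatment of negative left operands by >> depends on the integer width, which the text does not state.

```python
def shift_left(value, count):
    """Model of << : character rotation for strings, bit shift for numerics."""
    n = int(count)
    if isinstance(value, str):
        if not value:
            return value
        n %= len(value)
        # Characters shifted off the left end wrap to the right end.
        return value[n:] + value[:n]
    result = int(value) << n
    # A floating point operand gives a floating point result.
    is_float = isinstance(value, float) or isinstance(count, float)
    return float(result) if is_float else result


def shift_right(value, count):
    """Model of >> for strings and non-negative numerics."""
    n = int(count)
    if isinstance(value, str):
        if not value:
            return value
        n %= len(value)
        # Characters shifted off the right end wrap to the left end.
        return value[-n:] + value[:-n] if n else value
    result = int(value) >> n
    is_float = isinstance(value, float) or isinstance(count, float)
    return float(result) if is_float else result


print(shift_left("Test String", 2))   # st StringTe
print(shift_right("Test String", 3))  # ingTest Str
print(shift_left(55, 1))              # 110
print(shift_left(55.0, 2))            # 220.0
```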
QTAwk has retained the practice of forcing string concatenation by placing two constants, variables or function calls adjacent. QTAwk has introduced the string concatenation operator, ><. The string concatenation operator has the advantage of making concatenation explicit and allowing the string concatenation assignment operator, ><=. Thus, string concatenation operations which previously had to be written as:
new_string = new_string old_string;
may now be written:
new_string ><= old_string;
Thus a loop to build a string of numerics which previously was written as:
for( i = 8 , j = 9 ; i ; i-- ) j = j i;
can be written as:
for( i = 8 , j = 9 ; i ; i-- ) j ><= i;
and will produce a value for j of:
"987654321"
The string concatenation operator will make some constructs work as expected. For example, the statements:
ostr = "prefix";
suff = "suffix";
k = 1;
j = ostr ++k suff;
print "j = ",j;
print "ostr = ",ostr;
will produce the seemingly odd output:
j = prefix1suffix
ostr = 1
This results from two factors:
Thus, QTAwk processes the following stream of tokens:
In interpreting the stream, '++' is encountered immediately after 'ostr' and is interpreted as a postfix operator operating on 'ostr' instead of a prefix operator operating on 'k'. Thus, the stream appears to QTAwk as:
j = ostr ++ k suff;
After concatenating the current string value of ostr, "prefix", with the string value of k, "1", ostr is converted to a numeric, yielding a value of zero, 0, which is incremented to one, 1.
This seemingly anomalous situation can be remedied in two ways:
j = ostr (++k) suff;
j = ostr >< ++k suff;
or
j = ostr >< ++k >< suff;
The output produced by any of these is what was really desired:
j = prefix2suffix
ostr = prefix
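The way '++' attaches to its neighbour once white space has been discarded can be illustrated with a toy tokenizer. This Python sketch is not QTAwk's actual parser: the token pattern and the rule "a ++ directly after a variable name is postfix" are simplifying assumptions that reproduce the behaviour described above.

```python
import re

# Identifiers, the multi-character operators used here, numbers, then any other symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\+\+|--|><|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")


def classify_increments(source):
    """Report how each ++ or -- binds: ('postfix', var) or ('prefix', var)."""
    tokens = TOKEN.findall(source)   # white space is dropped here
    found = []
    for i, tok in enumerate(tokens):
        if tok not in ("++", "--"):
            continue
        prev = tokens[i - 1] if i > 0 else ""
        if IDENT.fullmatch(prev):
            # A variable name precedes the operator: taken as postfix.
            found.append(("postfix", prev))
        else:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            found.append(("prefix", nxt))
    return found


print(classify_increments("j = ostr ++k suff;"))        # [('postfix', 'ostr')]
print(classify_increments("j = ostr (++k) suff;"))      # [('prefix', 'k')]
print(classify_increments("j = ostr >< ++k >< suff;"))  # [('prefix', 'k')]
```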
The field operator, $ is a unary operator used for accessing the fields of the current input record. "$expression" evaluates expression and converts the value to an integer value if necessary. The operator yields the field of the current input record corresponding to the integer specified. Field numbering starts at 1 with the first field and increases to the maximum number of fields found for the current input record. The number of fields in the current input record is stored by QTAwk in the built-in variable, NF, as each record is read.
If the value of the operand expression is zero, the operator yields the entire input record. If the value of the operand expression is greater than the number of fields in the current input record, i.e., expression > NF, the operator yields a null string.
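The field selection rules can be sketched in Python. The function name field is hypothetical, fields are assumed to be separated by runs of white space, and negative operands are rejected because the text does not describe them.

```python
def field(record, expression):
    """Model of the $ operator applied to one input record."""
    n = int(expression)          # the operand is converted to an integer
    fields = record.split()      # assumption: white space separates fields
    nf = len(fields)             # corresponds to the built-in variable NF
    if n < 0:
        raise ValueError("behaviour for negative field numbers is not described")
    if n == 0:
        return record            # $0 is the entire input record
    if n > nf:
        return ""                # beyond NF: null string
    return fields[n - 1]         # field numbering starts at 1


record = "alpha beta gamma"
print(field(record, 0))        # alpha beta gamma
print(field(record, 2))        # beta
print(repr(field(record, 4)))  # ''
```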
Tag Operator: [<>]
QTAwk has introduced the tag operator, '[<>]'. The tag operator works on variables and is somewhat analogous in appearance to the array operator. It is closer in operation to the field operator, '$', except that the tag operator can be applied to all variables, not just the current input record. The field operator yields a variable with a value equal to the entire current input record, $0, or a sub-string of the current input record, $i, i > 0.
The result of the tag operator is another variable with a string value equal to a tagged string within the referenced variable, i.e., a sub-string of the string value of the referenced variable. The tag operator yields a result at all times, but if no tagged string exists for the variable or if no tagged string exists for the variable at the row and count specified, then a null string results. Tag strings are created for a variable when the variable is used with a match operator, either '~~' or '!~' and when a variable is used in the 'match' function. The use of the match operator may be explicit or implicit. Implicit use of a match operator can occur in two ways:
There is one restriction on the generation of tagged strings and on obtaining a non-null string from the tag operator: if the built-in function 'e_type' is executed with the variable as an argument, the return value must be 1, 2, 3 or 4, i.e., the variable must be a regular expression or a string type. If the variable used in the match operation has a numeric value or is a single character type, then the string value is derived, the match operation executed and the string value discarded. Thus, for single character and numeric values, any tagged strings are discarded after the match operation is completed. Unexpected single character values can be obtained from the 'split' built-in function.
The tag operator has the basic form:
Avar[<r>][<c>]
where 'Avar' is a variable. In this form, the result is the tagged string from the string value of the variable Avar at row r, count c. The following forms are recognized:
Avar[<R>][<C>]
where 'R' and 'C' are any valid QTAwk expressions. The expressions are evaluated and the integer values of the results used.
r = int(R)
c = int(C)
The result then refers to the tagged string at row r, count c. Valid values of r and c are in the range:
1 <= r <= 7
1 <= c <= 31
If r is greater than 7, a value of 7 is used. If c is greater than 31, a value of 31 is used. If both r and c are zero, then the tag operator returns the entire match string from the match operation.
Avar[<C>]
where 'C' is any valid QTAwk expression. The expression is evaluated and the integer value of the result used.
c = int(C)
The result then refers to the tagged string at row 1 and count c. Valid values of c are in the range 1 <= c <= 31. Identical to Avar[<1>][<C>].
Avar[<0>]
This is a special form of the tag operator. Like the '$0' form of the field operator, 'Avar[<0>]' returns the entire match string from the matching operation.
These rules are consistent with the rules for field selection with the '$' operator.
The tag operator, '[<>]', may also be used in the replacement expression in the built-in sub, gsub, and gensub functions in the same manner, except that it is not used with a variable name. Tagged strings found matching the regular expression of the scan string will be substituted for the tag operator in the replacement string.
Using the tag operator, a simple QTAwk utility could be written to read a file containing a list of floating point numbers, one per line, and change the exponents of the numbers. For example, to add 5 to the exponents:
/{_d}+\.{_d}+([eEdD][-+]?({_d}{1,3}))?/ {
if ( ($0)[<2>][<1>] ) ($0)[<2>][<1>] += 5;
}
{ print; }
The following lines in a file:
1.23
1.23e+45
1.23E+45
342187.3465
342187.3465e-55
342187.3465E-55
will be output by the utility as:
1.23
1.23e+50
1.23E+50
342187.3465
342187.3465e-60
342187.3465E-60
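For readers more familiar with conventional capture groups, the effect of this utility can be approximated in Python. This is only an analogy: Python has no tag operator, so the sketch rebuilds the line with re.sub instead of assigning to a tagged sub-string, and the name bump_exponent is hypothetical.

```python
import re

# Mantissa, then an optional exponent split into marker-plus-sign and digits.
NUMBER = re.compile(r"(\d+\.\d+)(?:([eEdD][-+]?)(\d{1,3}))?")


def bump_exponent(line, delta=5):
    """Add delta to the exponent digits of the first number in the line."""

    def replace(match):
        digits = match.group(3)
        if digits is None:
            return match.group(0)   # no exponent: leave the number alone
        return match.group(1) + match.group(2) + str(int(digits) + delta)

    return NUMBER.sub(replace, line, count=1)


for text in ("1.23", "1.23e+45", "342187.3465E-55"):
    print(bump_exponent(text))
# 1.23
# 1.23e+50
# 342187.3465E-60
```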
If the exponent sign were included in the second row parenthesis set, as in the following utility:
/{_d}+\.{_d}+([eEdD]([-+]?{_d}{1,3}))?/ {
if ( ($0)[<2>][<1>] ) ($0)[<2>][<1>] += 5;
}
{ print; }
the output is:
1.23
1.23e+50
1.23E+50
342187.3465
342187.3465e-50
342187.3465E-50
All changes to the value of a tag operator variable are automatically reflected in the string value of the associated variable. This is evident above, changing the value of ($0)[<2>][<1>] changes not only the value of the variable "($0)[<2>][<1>]", but also the string value of the variable "$0".
Note the combined use of the field operator, '$', and the tag operator, '[<>]', above. The use of the parentheses around '$0' is necessary. Since the tag operator has a higher precedence than the field operator, without the parentheses, the grouping implied by operator precedence would be:
$(0[<2>][<1>])
which would give an error since the tag operator may not be applied to a constant. In general, when a tagged string in the current input record is desired, the field operator and its operand must be enclosed in parentheses to prevent the above wrong grouping. Thus, in general the tag operator operating on a field in the current input record would be expressed as:
($(field_exp))[<r_exp>][<c_exp>]
A more complicated example of the utility of the Tag Operator, would be its use in converting a file written to be formatted by the "ipf" compiler to "HTML" notation. The "ipf" compiler uses a header 'tag' in the form:
:h2 id=hdr_ID res=3001.Header Title
The same header could be written for use in HTML as:
<A NAME="hdr_ID">
<H2>Header Title</H2>
The regular expression:
Hdr = /^:h({_d}) id=([A-Za-z0-9_]+) res={_d}+\.(.+)$/;
will match the "ipf" headers. Using the tag operator, the equivalent HTML formatting could be easily created:
Hdr {
print "<A NAME=\"" >< ($0)[<1>][<2>] >< "\">";
print "<H" >< ($0)[<1>][<1>] >< ">" >< ($0)[<1>][<3>] >< "</H" >< ($0)[<1>][<1>] >< ">";
}
Similarly, references to the header tag under "ipf" are:
:link reftype=hd refid=hdr_ID.Hdr reference:elink.
The equivalent HTML format is:
<A HREF="#hdr_ID">Hdr reference</A>
The regular expression:
ipf_refer = /:link reftype=hd refid=([A-Za-z0-9_]+)\.([!:]+):elink\./;
matches such references. Thus, the "ipf" reference 'tag' could be changed to the equivalent HTML 'tag' as:
while ( $0 ~~ ipf_refer ) {
($0)[<0>] = "<A HREF=\"#" >< ($0)[<1>][<1>] >< "\">" >< ($0)[<1>][<2>] >< "</A>";
} # endwhile
Thus, using regular expressions and the tag operator, a fairly simple QTAwk utility could be created to convert a document written for the "ipf" compiler to one for use with an HTML browser.
The tag operator may be "chained" to provide access to smaller and smaller sub-strings within the string value of a variable. For example, if the variable Avar is defined as:
Avar = "All changes to the value of a";
and it is matched with the regular expression value of:
Apat = /{_w}+({_a}+{_w}+({_a}+){_w}+({_a}+)){_w}+({_a}+)/;
then the statement:
Avar ~~ Apat
will yield the following sub-strings for Avar:
Avar[<1>][<1>] == Avar[<1>] == "changes to the"
Avar[<1>][<2>] == Avar[<2>] == "value"
Avar[<2>][<1>] == "to"
Avar[<2>][<2>] == "the"
In addition, the statement:
Avar[<1>] ~~ /({_a}+){_w}+({_a}+){_w}+({_a}+)/;
further breaks Avar into the sub-sub-strings:
(Avar[<1>])[<1>] == "changes"
(Avar[<1>])[<2>] == "to"
(Avar[<1>])[<3>] == "the"
and changing the sub-sub-string, (Avar[<1>])[<2>], as:
(Avar[<1>])[<2>] = ",from this point forward, are reflected in";
immediately changes Avar and Avar[<1>] as:
Avar[<1>] == "changes, from this point forward, are reflected in the";
Avar == "All changes, from this point forward, are reflected in the value of a";
Changing a sub-string containing nested tagged strings, resets the nested tagged strings to null strings. Thus, the nested tagged strings Avar[<2>][<1>] and Avar[<2>][<2>] are set to null strings when Avar[<1>] is assigned above. Thus, printing Avar[<2>][<1>] and Avar[<2>][<2>] after changing Avar[<1>], would yield null strings.
Attempts to assign values, using the tag operator, to tagged strings which do not exist are quietly ignored by QTAwk. Thus, the statement:
Avar[<1>][<4>] = "A string value";
would be quietly ignored by QTAwk since the tagged string at row 1, count 4 does not exist.
Logical Operators: && ||
The logical operators are:
A logical expression evaluates to 1 if true and 0 if false. The operands of && and || are evaluated from left to right with && having a higher precedence than ||. Evaluation of the binary && and || operators ceases as soon as the value of the operator can be determined.
Thus, for
exp1 && exp2
if 'exp1' evaluates to false, i.e., a zero numeric or null string, then 'exp2' is not evaluated and the value of the expression is 0, or false. Otherwise, 'exp2' is evaluated and the value of the expression is 1 if 'exp2' is true and 0 if 'exp2' is false.
For
exp1 || exp2
if 'exp1' evaluates to true, i.e., a nonzero numeric or non-null string, then 'exp2' is not evaluated and the value of the expression is 1. Otherwise, 'exp2' is evaluated and the value of the expression is 1 if 'exp2' is true and 0 if 'exp2' is false.
For expressions involving multiple && and || operators, the precedence of the operators must be remembered. It is probably best to use parentheses to group expressions to avoid confusion.
For example, predict the output for the following expressions:
a = b = 1;
c = 0;
d = a++ || b++ && c++;
print a,b,c,d; #1
a = b = 1;
c = 0;
d = a++ || (b++ && c++);
print a,b,c,d; #2
a = b = 1;
c = 0;
d = (a++ || b++) && c++;
print a,b,c,d; #3
a = b = 0;
c = 1;
d = a++ || b++ && c++;
print a,b,c,d; #4
a = b = 0;
c = 1;
d = a++ || (b++ && c++);
print a,b,c,d; #5
a = b = 0;
c = 1;
d = (a++ || b++) && c++;
print a,b,c,d; #6
Since && has higher precedence than ||, the expressions:
a++ || b++ && c++
and
a++ || (b++ && c++)
are equivalent.
See output for the correct output.
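The six cases can also be worked through mechanically. Python's 'and' and 'or' short-circuit in the same way and share the same relative precedence, so the following sketch applies the rules stated above. The Var class is a hypothetical helper, and the printed values are derived from those rules rather than from running QTAwk.

```python
class Var:
    """Hypothetical stand-in for a QTAwk variable holding a number."""

    def __init__(self, value):
        self.value = value

    def post_inc(self):
        # Postfix ++: yield the old value, then increment.
        old = self.value
        self.value += 1
        return old


FORMS = {
    "a++ || b++ && c++":
        lambda a, b, c: a.post_inc() or b.post_inc() and c.post_inc(),
    "a++ || (b++ && c++)":
        lambda a, b, c: a.post_inc() or (b.post_inc() and c.post_inc()),
    "(a++ || b++) && c++":
        lambda a, b, c: (a.post_inc() or b.post_inc()) and c.post_inc(),
}


def evaluate(form, a0, b0, c0):
    """Return the values of a, b, c, d after d = <form>."""
    a, b, c = Var(a0), Var(b0), Var(c0)
    d = 1 if FORMS[form](a, b, c) else 0   # a logical expression yields 1 or 0
    return a.value, b.value, c.value, d


for start in ((1, 1, 0), (0, 0, 1)):
    for form in FORMS:
        print(form, "->", *evaluate(form, *start))
# With a = b = 1, c = 0:  2 1 0 1,  2 1 0 1,  2 1 1 0
# With a = b = 0, c = 1:  1 1 1 0,  1 1 1 0,  1 1 1 0
```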
Comparison Operators: < <= >= > == !=
The binary comparison operators compare the value of the left operand expression to the value of the right operand expression. The comparison operators are:
The comparison operators yield a value of 1 or 0.
In comparison expressions with the relational and equality operators, the following table is used to determine the type of comparison to be made. The rows and columns of the table are indexed by the data types of QTAwk. See Variable Value Data Types for a description of the QTAwk data types. The data type for a given expression or variable can be obtained using the built-in function e_type.
| Data Type | e_type | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Uninitialized | 0 | I | S | S | I | F | I | I | F |
| Reg. Exp. | 1 | S | S | S | S | S | S | S | S |
| String Value | 2 | S | S | S | S | S | S | S | S |
| Integer as String | 3 | I | S | S | I | F | I | I | F |
| Float as String | 4 | F | S | S | F | F | F | F | F |
| Character Value | 5 | I | S | S | I | F | C | I | F |
| Integral Value | 6 | I | S | S | I | F | I | I | F |
| Float Value | 7 | F | S | S | F | F | F | F | F |
where:
| S | indicates a String Comparison |
| I | indicates an Integer Numeric Comparison |
| F | indicates a Floating Point Numeric Comparison |
| C | indicates a Character Comparison |
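The table is a plain two-way lookup on the e_type values of the operands, which the following Python sketch makes explicit. The names KIND and comparison_kind are hypothetical; the letters are transcribed from the table.

```python
# One string per row of the table; the column index is the other operand's e_type.
KIND = [
    "ISSIFIIF",  # 0 Uninitialized
    "SSSSSSSS",  # 1 Reg. Exp.
    "SSSSSSSS",  # 2 String Value
    "ISSIFIIF",  # 3 Integer as String
    "FSSFFFFF",  # 4 Float as String
    "ISSIFCIF",  # 5 Character Value
    "ISSIFIIF",  # 6 Integral Value
    "FSSFFFFF",  # 7 Float Value
]


def comparison_kind(left_type, right_type):
    """Return 'S', 'I', 'F' or 'C' for the e_type values of the two operands."""
    return KIND[left_type][right_type]


print(comparison_kind(6, 7))  # F: integral against float
print(comparison_kind(3, 2))  # S: any string or regular expression forces a string comparison
print(comparison_kind(5, 5))  # C: the only character comparison
```

The table is symmetric, so the order of the operands does not change the kind of comparison selected.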
The comparison may entail converting the value of one operand to an appropriate value. If such a conversion is made, a temporary variable is created with the appropriate temporary value. After the comparison is completed, the temporary variable is discarded.
For string comparisons, the ASCII collating sequence will be used. If two strings are identical except for length, the shorter string will be considered to be less than the longer string. The value of the built-in variable IGNORECASE is considered in all string comparisons. If IGNORECASE is false, then all string comparisons are case sensitive, otherwise all comparisons are case insensitive.
A single character comparison is essentially an integer comparison if the built-in variable IGNORECASE is false. If the variable is true, then the lower case equivalent of both operands is derived before the comparison is made.
Regular Expression Matching Operators: ~~ !~
The binary regular expression matching operators match the value of the left operand expression to the value of the right operand expression. The regular expression matching operators are:
The regular expression matching operators yield a value of 1 or 0.
For the match operators, ~~ and !~, the right operand is converted to its regular expression value if it is not already a regular expression. Refer to section Regular Expressions for a full description of regular expressions and their use in matching strings. The value of IGNORECASE is considered in evaluating the match operators. If IGNORECASE is false, then a case sensitive match is performed, otherwise a case insensitive match is performed.
Match Operator Variables
QTAwk has defined two new built-in variables associated with the match operators, MLENGTH and MSTART. Whenever the match operator is executed, MLENGTH is set equal to the length of the matching string or zero if no match is found. MSTART is set equal to the position of the start of the matching string or zero if no match is found. These built-in variables are completely analogous to the built-in variables RLENGTH and RSTART for the built-in match function.
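The values assigned to MSTART and MLENGTH can be illustrated with Python's re module. This is a sketch under assumptions: positions are taken to be counted from 1 (suggested by zero meaning "no match"), Python's regular expression syntax differs from QTAwk's, and the function name match_vars is hypothetical.

```python
import re


def match_vars(text, pattern, ignorecase=False):
    """Return (result, MSTART, MLENGTH) for a ~~ style match of pattern in text."""
    flags = re.IGNORECASE if ignorecase else 0
    found = re.search(pattern, text, flags)
    if found is None:
        return 0, 0, 0                      # no match: both variables are zero
    start = found.start() + 1               # assumed 1-based position
    length = found.end() - found.start()
    return 1, start, length


print(match_vars("hello world", "wor"))                # (1, 7, 3)
print(match_vars("hello world", "xyz"))                # (0, 0, 0)
print(match_vars("Hello", "hello", ignorecase=True))   # (1, 1, 5)
```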
Conditional Operator: ? :
The conditional operator, ? :, is a ternary operator; it takes three operands:
expc ? expt : expf
Depending on the value of the conditional expression, expc, only one of the expressions expt, the true value expression, or expf, the false value expression, is evaluated, but not both. The conditional expression, expc, is evaluated first. If the value obtained from expc is true, i.e., a nonzero numeric or non-null string, then the true expression, expt, is evaluated and the value obtained is the value of the operator. If the conditional expression yields a false value, then the false expression, expf is evaluated and the value is the value of the operator. This operator is typically used in places where an:
if ( condition ) statement else statement
would be used to set a value, but where it is inconvenient to use the "if else" control flow statement and where the expressions expt and expf are simple expressions and not compound statements.
A simple example often cited involves print. Depending upon the value of a flag variable, one of two different variables is to be output:
print flag_var ? output_var1 : output_var2;
Logical Negation: !
The logical negation operator, !, yields the negative logical value of the operand:
!expression
The operand expression is evaluated and if a true value, i.e., a nonzero numeric or non-null string, is obtained, then a value of false, 0, is the value of the operator. If a false value is yielded by the operand expression, then a value of true, 1, is the value of the operator.
Array Membership: in
The operator, in, tests if the value of the left operand is an index, at the next level of indexing, of the right operand:
expi in expa
If the value obtained in evaluating the left operand index expression, expi, is an index of the array obtained from evaluating the right array operand expression, expa, then a true value, 1, is the value of the operator, otherwise a false value, 0, is the value. If the value yielded by expa is not an array, 0 is the value of the operator.
The next level of indexing is always tested. Thus, "i in A" tests if the current value of the variable i is a valid index of the array variable A. "i in A[j]" tests if the value of i is a valid column index in the jth row of the array A.
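The "next level of indexing" rule can be pictured with nested Python dictionaries standing in for QTAwk arrays. The function name is_member and the sample data are hypothetical.

```python
def is_member(index, array):
    """Model of 'index in array': 1 if index is a key at the next level, else 0."""
    if not isinstance(array, dict):
        return 0                  # the right operand is not an array
    return 1 if index in array else 0


# A two-dimensional array: rows 1 and 2, each with its own column indices.
A = {1: {10: "x", 20: "y"}, 2: {30: "z"}}

print(is_member(1, A))          # 1: 1 is a row index of A
print(is_member(20, A))         # 0: only the first level of A is tested
print(is_member(20, A[1]))      # 1: 20 is a column index in row 1
print(is_member(20, A[2]))      # 0: row 2 has no column 20
print(is_member(1, "scalar"))   # 0: not an array
```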
Sequence Operator: ,
QTAwk uses the C sequence operator, the comma (,). Using the sequence operator, expressions may be combined into an expr_list:
expression_1 , expression_2 , expression_3 , ...
As in C, a list of expressions separated by the sequence operator is valid anywhere an expression is valid. Such lists of expressions separated by the sequence operator will be referred to as an expression list or expr_list. Each expression in an expr_list is evaluated in turn. The final value of the expr_list is the value of the last expression. The sequence operator is very useful in the loop control statements.
White Space
In tokenizing a utility, white space is used to break keywords, variable and function names and multi-character operators. Otherwise it is ignored. White space is any of the characters:
\t, \n, \v, \f, \r, \s == [\t\n\v\f\r\s] == [\t-\r\s] == {_z}
Thus none of the characters of the multi-character operators can be separated by one of the white space characters. The multi-character operators are:
| Operation | Operator |
| tag | [<>] |
| increment/decrement | ++ -- |
| shift left/right | << >> |
| relational | <= >= |
| equality | == != |
| matching | ~~ !~ |
| array membership | in |
| logical AND | && |
| logical OR | || |
| assignment | ^= *= /= %= |
| | += -= &= @= |= |
| | <<= >>= ><= |
By observing this simple rule for multi-character operators, expressions such as the following will yield the expected results:
a = b = c = d++ = 1;
a++ = b++ + ++c;
print a,b,c,d;
output ==> 4 2 2 2
Expressions in QTAwk can contain several types of constants:
Numeric constants have several forms. Integers follow the C practice of allowing decimal, octal and hexadecimal base constants.
Decimal constants match the form: /{_i}/ --> /[-+]?[0-9]+/
Octal constants match the form: /0{_o}+/ --> /0[0-7]+/
Hexadecimal constants match the form: /0[xX]{_h}+/ --> /0[xX][0-9A-Fa-f]+/
The results of all three of the following expressions are equivalent. All set the variable, int_cons, to the integer value, 11567.
int_cons = 11567;
int_cons = 026457;
int_cons = 0x2d2f;
NOTE: octal and hexadecimal integers are recognized not only in QTAwk expressions in utilities, but also in input record fields and other strings.
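The three integer forms can be checked with a short Python sketch that applies the patterns given above. The function name parse_int_constant is hypothetical; {_o} and {_h} are written out as the character classes shown in the text.

```python
import re


def parse_int_constant(text):
    """Convert a decimal, octal or hexadecimal integer constant to its value."""
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", text):
        return int(text, 16)      # hexadecimal: 0x or 0X prefix
    if re.fullmatch(r"0[0-7]+", text):
        return int(text, 8)       # octal: leading zero, digits 0-7
    if re.fullmatch(r"[-+]?[0-9]+", text):
        return int(text, 10)      # decimal
    raise ValueError("not an integer constant: " + text)


print(parse_int_constant("11567"))   # 11567
print(parse_int_constant("026457"))  # 11567
print(parse_int_constant("0x2d2f"))  # 11567
```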
Floating point numeric constants match the form:
{_f} == /[-+]?({_d}+\.{_d}*|{_d}*\.{_d}+)/
or
{_r} == /{_f}{_e}/ == /[-+]?({_d}+\.{_d}*|{_d}*\.{_d}+)[DdEe][-+]?{_d}{1,3}/
Character Constants
Character constants are single characters enclosed in single quotes, '. The same escape sequences allowed in strings and regular expressions are allowed in character constants. All three of the following expressions will set the variable, chr_cons, to 'A':
chr_cons = 'A';
chr_cons = '\x041';
chr_cons = '\101';
QTAwk will maintain variables set to character constants as single characters, but they may be used in arithmetic expressions as any other number. When used in arithmetic expressions, QTAwk will automatically convert them to their ASCII numeric value. Thus:
chr_cons = 'a';
int_cons = chr_cons / 2;
print chr_cons,chr_cons + 0,int_cons;
output: a 97 48
Printing int_cons yields 48, the integer ASCII value of 'a', 97, divided by 2. Note that QTAwk maintains 'chr_cons' as a character on output and prints the character value and not the integer value, 97.
The substr function will return a character constant when the requested substring is only a single character wide.
String Constants
String constants are character sequences enclosed in double quotes, ". The same escape sequences allowed in regular expressions are allowed in string constants.
Regular Expression Constants
Regular expression constants are character sequences enclosed in slashes, /. Regular expressions are discussed fully in the Regular Expression section.