User Guide
Chapters
Table of Contents
Expressions
Strings and Regular Expressions

Arrays

Arrays in QTAwk are "associative": each array is a collection of pairs, an index and its corresponding element value. One advantage of associative arrays is that new pairs can be added at any time. Another is that indices need not be positive integers: any number, or even a string, can be an index. QTAwk extends the Awk associative array in one respect. In Awk all array indices are strings; in QTAwk both string and integer indices are allowed.

Multidimensional Arrays

QTAwk allows multidimensional arrays referenced in the same manner as C. Thus:

A[i][j]

references the jth column of the ith row of the two-dimensional array A.

The use of the comma to delineate multiple array indices is discontinued. The comma is now the sequence operator and will be so treated in array index expressions. Thus, the reference

A[i,j]

will now reference the element of A subscripted by the current value of the variable j. As a consequence of this the Awk built-in variable SUBSEP has been assigned a new meaning.

Integer and String Indices

Array subscripts may be strings or integers. Thus:

A[i]["state"]

would reference the "state" element of the ith row of the two-dimensional array A. Single-character indices, e.g., A['a'], are treated as integer indices whose value is the integer ASCII value of the character. Integer and string indices may be used in the same array. Integer indices are stored before string indices. Integer indices follow the usual numeric ordering and string indices follow the ASCII collating sequence. This ordering becomes apparent when the "in" form of the for statement is used.

If a floating point numeric is used to reference an element in an array, the integer value of the floating point numeric is used.

In utilities that use arrays, you often need a loop that executes once for each element of an array. In other languages, where arrays are contiguous and indices are limited to positive integers, this is easy: every valid index can be found by counting from the lowest index to the highest. This technique won't do the job in QTAwk, since any integer or string can be an array index. So QTAwk has a special kind of for statement for scanning an array:

for ( k in A ) statement

This loop executes 'statement' once for each index in A, with the variable k set to that index. k is stepped through the indices of the singly dimensioned array A in the order stored. Thus, if A has the indices 1, 3, 5, 7, 8, 9, 10, 12, 14, "county", "state", "zip", then k would be stepped through the indices in that order.
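The stored order described above can be modeled outside QTAwk. The following is a minimal Python sketch, not QTAwk code: the helper name stored_order and the sample array are hypothetical, and a Python dict stands in for a QTAwk array.

```python
def stored_order(indices):
    """Return indices in the order the text describes: all integer
    indices first (numeric order), then all string indices
    (ASCII / code-point order)."""
    # The first tuple field puts every int before every str; the second
    # field orders indices within each group.
    return sorted(indices, key=lambda k: (isinstance(k, str), k))


# Hypothetical sample array mixing integer and string indices.
A = {"zip": "z", 10: "j", "state": "s", 2: "b", "county": "c", 1: "a"}

for k in stored_order(A):
    print(repr(k), A[k])

print(stored_order(A))       # [1, 2, 10, 'county', 'state', 'zip']

# With string-only indices, "10" collates before "2".
print(sorted(["2", "10"]))   # ['10', '2']
```

The last line illustrates why integer indices avoid the "stringized" ordering: compared as strings, "10" sorts ahead of "2".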

Note that allowing both string and integer indices overcomes the disconcerting order of the "stringized numerical" indices of Awk. Specifically, index 10 does not precede 2 as "10" precedes "2" in Awk. QTAwk still allows the use of numeric strings such as "10", "2", etc., but in most cases where such strings would be used, the user should be aware that integer indices are now available and will prevent the counterintuitive ordering of Awk.

Character constants, or variables assigned character constant values, may also be used for array indices. When used as array indices, character constants are converted to their integer ASCII value and that value used as the index value. Thus:

A['a']

and

A[97]

would access the same array element of A.
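As a rough illustration of the index conversions described in this section, here is a small Python sketch. It is a model only: the Char class and normalize_index function are hypothetical names, and the use of truncation for floating point indices is an assumption, since the text says only that "the integer value" is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """Hypothetical stand-in for a QTAwk character constant such as 'a'."""
    value: str


def normalize_index(index):
    """Map an index to the key actually used for the array element."""
    if isinstance(index, Char):
        return ord(index.value)   # character constant -> integer ASCII value
    if isinstance(index, float):
        return int(index)         # assumption: truncation toward zero
    return index                  # integers and strings are used as-is


A = {}
A[normalize_index(Char("a"))] = "stored via 'a'"

print(normalize_index(Char("a")))   # 97
print(A[normalize_index(97)])       # stored via 'a'
print(normalize_index("a"))         # a  (a string index stays a string)
```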

QTAwk Arrays in Arithmetic Expressions

When arrays are used in arithmetic expressions in QTAwk, the entire array is operated on or assigned. For example, if the variable 'B' is a 3x3 array with the following values:

B[1][1] = 11, B[1][2] = 12, B[1][3] = 13
B[2][1] = 21, B[2][2] = 22, B[2][3] = 23
B[3][1] = 31, B[3][2] = 32, B[3][3] = 33

Assigning B to the variable 'A':

A = B

will duplicate the entire array into A.

A[1][1] = 11, A[1][2] = 12, A[1][3] = 13
A[2][1] = 21, A[2][2] = 22, A[2][3] = 23
A[3][1] = 31, A[3][2] = 32, A[3][3] = 33

If A and B are array variables and C is a scalar (non-array) variable, then the following expressions using assignment and the assignment operators, op=, are legal:

A = B
assign one array to a second. The original elements of array A are deleted and the elements of B duplicated into A.
C = B
assigning an array to a variable currently a scalar. Again the elements of B are duplicated into elements of C which becomes an array.
A = C
assigning a scalar to a variable which is an array. The elements of the array are discarded and the variable becomes a scalar.
A = B[i]...[j]
assigning an array element to a variable which is currently an array. Since the element of an array is a scalar, this case is essentially the same as the immediately previous case, and A becomes a scalar.
A[i]...[j] = B[k]...[l]
since array elements are scalars, this is the usual scalar assignment case.
A op= C
the op= operator is applied to every element of A. Thus, A += 2, would add '2' to every element of A.
A op= B
the op= operator is applied to every element of A for which an element exists in B with identical indices. No elements are created in A to match elements of B with indices different from any element of A. Thus, the sequence of statements:

A = B;

A += B;

would leave every element of A with twice the value of the corresponding element of B.
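The legal assignment cases can be sketched in Python to make the index-matching rule concrete. This is a model under stated assumptions, not QTAwk code: arrays are represented as flat dicts keyed by (i, j) tuples, and the function names assign and op_assign are hypothetical.

```python
import copy
import operator


def assign(source):
    """Model of `A = B`: the target becomes an independent duplicate of B."""
    return copy.deepcopy(source)


def op_assign(a, op, operand):
    """Model of `A op= operand`, updating the array `a` in place."""
    if isinstance(operand, dict):
        # A op= B: only indices present in both arrays are updated.
        # No elements are created in A for indices found only in B.
        for key in a:
            if key in operand:
                a[key] = op(a[key], operand[key])
    else:
        # A op= C: the scalar is applied to every element of A.
        for key in a:
            a[key] = op(a[key], operand)
    return a


# The 3x3 array B from the text: B[i][j] = 10*i + j.
B = {(i, j): 10 * i + j for i in range(1, 4) for j in range(1, 4)}

A = assign(B)                    # A = B;
op_assign(A, operator.add, B)    # A += B;
print(A[(2, 3)], B[(2, 3)])      # 46 23

E = {(1, 1): 1, (9, 9): 5}       # (9, 9) has no counterpart in B
op_assign(E, operator.add, B)
print(E)                         # {(1, 1): 12, (9, 9): 5}
```

Note that B is left unchanged by both operations, and that E gains no new elements even though B has indices E lacks.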

There are three cases of using arrays with the assignment operators that are not legal and for which QTAwk will issue an error message at runtime.

  1. A[i]...[j] = B
  2. A[i]...[j] op= B
  3. C op= B

These are all variations on the same expression. In the first case, the expression is attempting to assign an array to an array element. Since an array element cannot be further expanded into an array, the assignment is not allowed. In the second and third cases, the expressions are attempting to operate on a scalar with an array and assign the result to the scalar. Both of these expressions fail for the same reason. It is possible for a single value, a scalar, to operate on every element of an array, but the reverse, having each element of the array operate on the scalar, is not permitted.

The reasoning prohibiting the second and third cases above is extended to all binary expressions involving arrays in QTAwk. In general, arrays are allowed in expressions with binary arithmetic operators:

~ ^ * / % + - << >> & @ |

as well as string concatenation:

A B (equivalent to A >< B)

In such expressions, arrays are allowed in the following forms:

  1. A op B
  2. A op C

But not as

C op A

It could be argued that expressions such as,

2 + A

should be allowed since '+' is commutative and the expression could be written equivalently as,

A + 2

This is true for addition, but not for all of the binary arithmetic operators. For example, the division operator is not commutative.

2 / A

could not be written equivalently as:

A / 2

For this reason, QTAwk does not allow any array expressions of the form:

scalar op array

The unary arithmetic operators may also be used to operate on entire arrays:
++A (prefix increment operator)
--A (prefix decrement operator)
A++ (postfix increment operator)
A-- (postfix decrement operator)
-A (Unary minus operator)
+A (Unary plus operator)
~A (Unary one's complement operator)

An expression such as:

A + B

will result in an array with element indices identical to those of A, and with values which are the sum of the elements of A and B, which have identical indices. If A has an element for which B does not have a corresponding element, the resultant element value is equal to the A element value. Elements of B which have no corresponding element in A are not represented in the resultant array.
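The rules for binary expressions involving arrays can be summarized in a short Python sketch. It is only a model: the function name array_op is hypothetical, dicts stand in for arrays, and raising TypeError is an illustrative choice for the disallowed "scalar op array" form rather than QTAwk's actual diagnostic.

```python
import operator


def array_op(left, op, right):
    """Model of `left op right` where either operand may be an array (dict)."""
    if not isinstance(left, dict):
        if isinstance(right, dict):
            # scalar op array is the form the text says is not allowed.
            raise TypeError("scalar op array is not allowed")
        return op(left, right)            # ordinary scalar expression
    if isinstance(right, dict):
        # array op array: the result has exactly the indices of `left`.
        # Where `right` has no matching index, the left value is kept.
        return {k: op(v, right[k]) if k in right else v
                for k, v in left.items()}
    # array op scalar: the scalar is applied to every element.
    return {k: op(v, right) for k, v in left.items()}


A = {1: 10, 2: 20, 3: 30}
B = {2: 1, 3: 2, 4: 3}                    # index 4 has no counterpart in A

print(array_op(A, operator.add, B))       # {1: 10, 2: 21, 3: 32}
print(array_op(A, operator.truediv, 2))   # {1: 5.0, 2: 10.0, 3: 15.0}

try:
    array_op(2, operator.truediv, A)      # models 2 / A
except TypeError as err:
    print("rejected:", err)
```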

An array whose elements have double the value of the elements of B can be created as:

A = B;

D = A + B;

or as

D = B + B;

or as

D = B * 2;

Any of the above sequences of statements will result in an array, D, whose elements have indices identical to those of B and double the element values. The array A could be made an array with elements twice the element values of B with the statements:

A = B;

A *= 2;

Arrays may be used in expressions with arithmetic operators and the whole array will be utilized in the expression. This does not extend to the logical operators:

! < <= >= > == != && ||

Using an array with a logical operator listed above will result in the first element in the array only being used in the expression.
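A brief Python sketch of this behavior follows. It is a model under an explicit assumption: "the first element" is taken to mean the first element in stored order (integer indices before string indices), which the text does not state outright. The function names are hypothetical.

```python
def first_element(array):
    """Value of the first element, assuming 'first' means first in stored
    order: integer indices (numeric order) before string indices."""
    first_key = min(array, key=lambda k: (isinstance(k, str), k))
    return array[first_key]


def array_less_than(array, scalar):
    """Model of `array < scalar`: only the first element takes part."""
    return first_element(array) < scalar


A = {3: 50, 1: 5, "state": 0}

print(first_element(A))          # 5
print(array_less_than(A, 10))    # True, although A[3] is 50
```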

Arrays as Regular Expressions & Dynamic Regular Expressions

Arrays can be used with the match operators, ~~ and !~, both in their explicit use in expressions and in their implicit use in patterns. Arrays can also be used as case statement labels and with the built-in functions match, sub, gsub, and gensub.

The use of arrays as regular expressions is similar to the use of the predefined pattern, GROUP. Each element of the array is treated as a separate regular expression and the matching process matches against all elements of the array. If a match is found with one of the elements of the array, the built-in variable, MATCH_INDEX, is set to the string value of the array element index. If the array is multidimensional, the indices are separated by the string value of the built-in variable, SUBSEP. The default value of SUBSEP is a single comma, ','.
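The way MATCH_INDEX is formed can be illustrated with a Python sketch. Several things here are assumptions made for illustration: the patterns use Python's re syntax rather than QTAwk's, the function name match_array is hypothetical, a multidimensional array is modeled as a dict keyed by tuples, and elements are simply tried in insertion order because the text does not say which element wins when more than one matches.

```python
import re

SUBSEP = ","   # default value given in the text


def match_array(text, patterns):
    """Return a MATCH_INDEX-style string for the first element of
    `patterns` that matches `text`, or None if no element matches."""
    for index, pattern in patterns.items():
        if re.search(pattern, text):
            parts = index if isinstance(index, tuple) else (index,)
            # Multidimensional indices are joined with SUBSEP.
            return SUBSEP.join(str(p) for p in parts)
    return None


# Hypothetical two-dimensional pattern array.
tp = {
    (1, "err"): r"^ERROR",
    (1, "warn"): r"^WARN",
    (2, "num"): r"^[0-9]+$",
}

print(match_array("WARN: disk low", tp))   # 1,warn
print(match_array("12345", tp))            # 2,num
print(match_array("hello", tp))            # None
```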

When an array is used for searching for multiple regular expressions, the internal form of the combined patterns is generated. The internal form is not discarded once the match operation has been completed. The internal form is discarded only when an element of the array is changed, the array is discarded via the deletea statement, or the variable is assigned a new value.

The array can be assigned to another variable after a match operation and the internal form is also assigned to the new variable. This could preserve the internal form if the old variable is assigned a new value.

Using arrays to search for matches to one or more regular expressions makes it possible to produce dynamic regular expressions: expressions that can change over time, as when strings are used for regular expressions, but that retain the speed of static regular expressions and their static internal form. For example, setting a single array element as:

tp[1] = /Test: {var1}/;

and matching against the array:

if ( match_var ~~ tp ) statement

As long as the element of tp is not changed, the internal regular expression form is used for matching. Since the internal form is not re-derived for each match, the matching process is faster. If the value of 'var1' changes or a new test pattern is desired, then tp[1] can be re-assigned:

tp[1] = /Test: {var1}/;

The assignment discards the internal regular expression form for the tp array which will be re-derived on the next match operation.
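The derive-once, discard-on-change behavior can be modeled with a small cache. The following Python sketch is an illustration only and says nothing about how QTAwk is implemented: the class PatternArray, its derivations counter, and the variables dict are hypothetical, the {var1} placeholder is filled with plain text substitution, and Python's re module stands in for QTAwk's regular expressions.

```python
import re


class PatternArray:
    """Array of pattern templates with a cached combined 'internal form'."""

    def __init__(self, variables):
        self._variables = variables   # shared dict of named values
        self._elements = {}
        self._compiled = None         # cached internal form
        self.derivations = 0          # how often the form was derived

    def __setitem__(self, index, template):
        self._elements[index] = template
        self._compiled = None         # any element change discards the cache

    def matches(self, text):
        if self._compiled is None:
            # Derive the internal form from the current variable values.
            combined = "|".join(
                "(?:" + t.format(**self._variables) + ")"
                for t in self._elements.values()
            )
            self._compiled = re.compile(combined)
            self.derivations += 1
        return self._compiled.search(text) is not None


variables = {"var1": "alpha"}
tp = PatternArray(variables)
tp[1] = "Test: {var1}"

print(tp.matches("Test: alpha"), tp.derivations)   # True 1
print(tp.matches("Test: beta"), tp.derivations)    # False 1

variables["var1"] = "beta"
print(tp.matches("Test: beta"), tp.derivations)    # False 1 (cached form)

tp[1] = "Test: {var1}"                             # re-assign the element
print(tp.matches("Test: beta"), tp.derivations)    # True 2
```

In this model, changing var1 alone has no effect until the element is re-assigned, which mirrors the reason the text gives for re-assigning tp[1].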

Arrays can be used in this manner not only in the action portion of pattern-action pairs, but also in the pattern portion. Thus a pattern such as:

tp {
.
.
.
}

can be used. The first match against tp will set the internal regular expression form for the array. The internal form will not change until an element of tp is changed, then the internal form will be re-derived on the next match against the array.

Note that both individual elements of an array and the array as a whole can be used for matching. Thus, matching against tp[1]:

if ( mvar ~~ tp[1] ) statement

will match against the regular expression:

/Test: {var1}/

The first such match will set the internal form of the regular expression using the value of the variable 'var1' at that time. This internal form does not change. Re-assigning tp[1]:

tp[1] = /Test: {var1}/;

does not alter the regular expression internal form even though the internal regular expression form for the array tp has been discarded by the assignment. Matching against tp as an array will use the new value for 'var1' at the time of the new match operation.

Arrays can be used in GROUP patterns. Any match against an element of the array will activate the action associated with the array position in the GROUP and set the built-in variable, NG, to the integer value for the position of the array in the GROUP. Using arrays in GROUP patterns does not set the MATCH_INDEX variable. Also the internal regular expression form for the GROUP patterns is not re-derived when the array changes.

The descriptions of the sub, gsub, and gensub functions explain the use of arrays as the matching regular expression. They also describe the use of an array as the matching regular expression in these functions as opposed to a loop over multiple regular expressions.

