- Regular Expressions
- are a separate type of their own, on an equal footing with strings,
integers and floating point numbers. Thus, regular expressions may be assigned
to variables, and those variables used wherever regular expressions would be
used. This also changes the accepted Awk behavior in which a regular
expression constant always matches the current input record. That behavior
is retained only for a regular expression constant in a pattern. Elsewhere,
the match must be explicitly coded. Thus, the Awk
line:
if ( /return/ )
which implicitly performs a match against $0, must be coded under
QTAwk as:
if ( $0 ~~ /return/ )
Under Awk, assigning a regular expression to a variable
is not possible. The line:
Are = /return/;
assigns a value of 0 or 1 to the variable 'Are', depending on whether
/return/ matches $0. Under QTAwk, Are is assigned the regular expression.
Awk does not contain the concept of a regular expression as a separate data
type.
- Expanded Regular Expressions.
- All of the Awk regular expression operators are allowed plus the following:
- complemented character lists using the Awk notation, '[^...]', as well
as the Awk/QTAwk and C logical negation operator, '[!...]'.
- Matched character lists, '[#...]'. These lists are used in pairs.
The position of the character matched in the first list of the pair, determines
the character which must match in the position occupied by the second list
of the pair.
- Look-ahead Operator. r@t: regular expression r is matched only
when followed by regular expression t.
- Repetition Operator. r{n1,n2}: at least n1 and up to n2 repetitions
of regular expression r.
- Named Expressions. {named_expr}: replaced by the string value
of the corresponding variable.
- Tagged Expressions. Enclosing a portion of a regular expression
in parentheses, "()", makes the matching string available for use with the
Tag Operator, '[< >]'.
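For readers who know other regular expression dialects, the following sketch shows rough Python `re` analogues of three of the operators above. It is not QTAwk code. Mapping `r@t` to Python's `(?=...)` look-ahead, and tagged expressions to numbered capture groups, is an assumption based only on the descriptions in this list.

```python
import re

# Look-ahead: QTAwk r@t is assumed to behave like Python's r(?=t).
# "foo" matches only when "bar" follows, and "bar" is not consumed.
lookahead = re.compile(r"foo(?=bar)")
matched_text = lookahead.search("xfoobar").group(0)
no_match = lookahead.search("foobaz")

# Repetition: r{n1,n2} accepts at least n1 and at most n2 copies of r.
repeat = re.compile(r"(?:ab){2,3}")
two_copies_ok = repeat.fullmatch("abab") is not None
four_copies_ok = repeat.fullmatch("abababab") is not None

# Tagged expressions: a parenthesized part of the pattern can be retrieved
# after the match, much like a numbered capture group.
tagged = re.search(r"(\d+)-(\d+)", "pages 10-42")
first_tag, second_tag = tagged.group(1), tagged.group(2)
```

Matched character lists, '[#...]', have no direct Python counterpart and are not modeled here.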
- Consistent statement termination syntax.
- The QTAwk Utility Creation Tool utilizes the semi-colon, ';',
to terminate all statements. The practice in Awk of using new lines to "sometimes"
terminate statements is no longer allowed.
- Expanded Operator Set.
- The Awk set of operators has been changed to make them more consistent
and to more closely match those of C. The Awk match operator, ~,
has been changed to ~~ so that the similarity between the match
operators, ~~ and !~, and the equality operators, '==' and
'!=', is complete. The single tilde symbol, ~, reverts to the C
one's complement operator, an addition to the operator set over Awk. An
explicit string concatenation operator, '><', has also been introduced.
The "new" operators in QTAwk are:
| Operation | Operator |
|---|---|
| tag | [< >] |
| one's complement | ~ |
| concatenation | >< |
| shift left/right | << >> |
| matching | ~~ !~ |
| bit-wise AND | & |
| bit-wise XOR | @ |
| bit-wise OR | \| |
| sequence | , |
The caret, ^, remains as the exponentiation operator. The
symbol @ is used for the exclusive OR operator. For string operands,
the shift operators, << and >>, shift the strings
with wrap-around instead of a bit shift as for numeric operands.
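The difference between the two shift behaviors can be sketched in Python. This is an emulation, not QTAwk code: the helper names `shift_left` and `shift_right` are hypothetical, and the direction of rotation (a left shift moves leading characters to the end) is an assumption, since the text only says that strings wrap around.

```python
def shift_left(s, n):
    """Rotate string s left by n characters, wrapping the front onto the end."""
    if not s:
        return s
    n %= len(s)  # shifting by a multiple of the length is a no-op
    return s[n:] + s[:n]


def shift_right(s, n):
    """Rotate string s right by n characters (a left rotation by -n)."""
    return shift_left(s, -n)


rotated_left = shift_left("abcdef", 2)    # characters wrap, none are lost
rotated_right = shift_right("abcdef", 2)
numeric_shift = 12 << 2                   # numeric operands: ordinary bit shift
```

With a numeric operand, bits shifted past the end are discarded; with the string rotation sketched here, every character is preserved.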
- Expanded set of recognized constants:
-
- decimal integers,
- octal integers,
- hexadecimal integers,
- character constants, and
- floating point constants.
These constants are recognized in utilities, input fields and strings.
- Expanded predefined patterns
- giving more control:
- INITIAL
- similar to BEGIN. Actions executed after opening each input file
and before reading first record.
- FINAL
- similar to END. Actions executed after reading last record of each
input file and before closing file.
- NOMATCH
- actions executed for each input record for which no pattern was matched.
- GROUP
- used to group multiple regular expressions for search optimization.
Can speed search by a factor of six.
- True multidimensional Arrays.
- The use of the comma in index expressions to simulate multiple array
indices is no longer supported. True multiple indices are supported. Indexing
is in the C manner, 'a[i1][i2]'. The use of the SUBSEP built-in variable
of Awk has been redefined.
- Integer array indices as well as string indices.
- Array indices have been expanded to include integers as well as the
string indices of Awk. Indices are not automatically converted to strings
as in Awk. Thus, for true integer indices, the index ordering follows the
numeric sequence with an integer index value of '10' following an integer
value of '2' instead of preceding it.
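The ordering difference is the familiar one between numeric and string comparison. A minimal Python illustration (not QTAwk code) of why '10' sorts before '2' as a string but after it as an integer:

```python
int_keys = [10, 2, 1]
str_keys = ["10", "2", "1"]

# Integer indices compare numerically, so 10 follows 2.
numeric_order = sorted(int_keys)

# String indices compare character by character, so "10" precedes "2".
string_order = sorted(str_keys)
```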
- Arrays integrated into QTAwk.
- QTAwk integrates arrays with arithmetic operators so that the
operations are carried out on the entire array. QTAwk also integrates
arrays into user-defined functions so that they can be passed to and returned
from such functions in a natural and intuitive manner. Awk does not allow
returning arrays from user-defined functions or allow arithmetic operators
to operate on whole arrays.
In addition, with Version 6.00 for PC/MS-DOS and Version 1.00 for
OS/2, arrays have been fully integrated into all aspects of QTAwk
including the match operators, '~~' and '!~', and their implied use in patterns
and the built-in functions, 'sub', 'gsub', and 'match'. The MATCH_INDEX
built-in variable has been added to return the matching array element index
when an array has been used for pattern matching. The string value of the
SUBSEP built-in variable is used as the index separator in MATCH_INDEX for
multidimensional arrays.
Arrays used as regular expressions with the match operators, both
explicit and implied, retain their internal regular expression form between
uses. In addition, when the array as a whole is assigned to another variable,
the internal regular expression form is also assigned. The internal regular
expression form is discarded only when the array is changed. This gives the
user a level of control over dynamic regular expressions that lies between that
of true regular expressions, which retain the internal form until execution is
halted, and strings used as regular expressions, which discard the internal
regular expression form after each use.
- NEW Keywords:
-
- cycle
- similar to 'next' except that it may use the current record in restarting
the outer pattern matching loop.
- deletea
- similar to 'delete' except that ALL array values deleted.
- switch
- case
- default
- similar to C syntax with the allowed 'switch' values and 'case'
labels expanded to include any legal QTAwk expression, evaluated at
run-time. The expressions may evaluate to any value including any numeric
value, string or regular expression.
- local
- new keyword to allow the declaration and use of local variables
within compound statements, including user-defined functions. Its use in
user defined functions instead of the Awk practice of defining excess formal
parameters, leads to easier to read and maintain functions. The C 'practice'
of allowing initialization in the 'local' statement is followed.
- endfile
- similar to 'exit'. Simulates end of current input file only, any
remaining input files are still processed.
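The run-time evaluation of 'case' labels described above can be emulated in Python as a rough sketch. This is not QTAwk syntax: the `switch` helper is hypothetical, C-style fall-through and 'break' are not modeled, and treating a regular expression label as "matches anywhere in the value" is an assumption.

```python
import re


def switch(value, cases, default=None):
    """Return the result paired with the first label that matches value.

    Labels are ordinary run-time values. A compiled regular expression
    label matches when it is found in str(value); any other label matches
    on equality.
    """
    for label, result in cases:
        if isinstance(label, re.Pattern):
            if label.search(str(value)):
                return result
        elif label == value:
            return result
    return default


limit = 40
cases = [
    (limit + 2, "computed numeric label"),     # expression evaluated at run time
    ("stop", "string label"),
    (re.compile(r"^ret"), "regular expression label"),
]

numeric_hit = switch(42, cases)
regex_hit = switch("return", cases)
fallback = switch("other", cases, default="default branch")
```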
- New Arithmetic Functions.
- QTAwk includes 18 built-in arithmetic functions. All of the
functions supported by Awk plus the following:
- acos(x)
- arc-cosine of x
- asin(x)
- arc-sine of x
- cosh(x)
- hyperbolic cosine of x
- fract(x)
- fractional portion of x
- log10(x)
- logarithm base 10
- pi or
- pi()
- pi
- sinh(x)
- hyperbolic sine of x
- New String Functions.
- QTAwk includes 33 built-in string functions. All of the functions
supported by Awk plus the following:
- center(s,w) or
- center(s,w,c)
- center string
- copies(s,n)
- copies of string
- deletec(s,p,n)
- delete characters from a string
- gensub(re,rs,how,target)
- generalized substitution function
- insert(s1,s2,p)
- insert one string into another string
- justify(a,n,w) or
- justify(a,n,w,c)
- justify string
- overlay(s1,s2,p)
- overlay one string on another
- remove(s,c)
- remove characters from a string
- replace(s)
- replace all variables in a string
- srange(c1,c2)
- return string formed of all characters from c1 to c2
- srev(s)
- reverse characters of string
- stran(s) or
- stran(s,st) or
- stran(s,st,sf)
- translate characters
- strim(s) or
- strim(s,c) or
- strim(s,c,d)
- trim leading and/or trailing characters
- strlwr(s)
- translate to lower case
- strupr(s)
- translate to upper case
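For orientation, a few of these functions have close Python analogues. The sketch below is an approximation only: the one-line descriptions above do not specify edge cases (for example, how 'center' splits odd padding or treats a width shorter than the string), so the Python behavior in those cases is an assumption rather than a statement about QTAwk.

```python
def center(s, w, c=" "):
    """Center s in a field of width w, padding with character c."""
    return s.center(w, c)


def copies(s, n):
    """Return n copies of s joined end to end."""
    return s * n


def srev(s):
    """Return s with its characters reversed."""
    return s[::-1]


def srange(c1, c2):
    """Return the string of all characters from c1 through c2 inclusive."""
    return "".join(chr(i) for i in range(ord(c1), ord(c2) + 1))


centered = center("ab", 6, "*")
tripled = copies("ab", 3)
reversed_text = srev("abc")
letters = srange("a", "e")
upper = "return".upper()   # counterpart of strupr(s)
```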
- New Date and Time functions
-
- _time()
- Local time (seconds since midnight)
- _ftime(format_str,sjdn,time)
- Format date/time
- jdn or
- jdn() or
- jdn(y,m,d) or
- Julian Day Number of date specified
- jdn(fdate)
- Calendar date of Julian Day Number specified
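As background on what a Julian Day Number is, the sketch below computes one in Python with the widely used integer formula. It is not QTAwk's implementation, which this list does not describe. The `gregorian` parameter is a hypothetical stand-in for the choice between the Gregorian and Julian calendars, which QTAwk controls through its Gregorian built-in variable.

```python
def jdn(y, m, d, gregorian=True):
    """Julian Day Number of the calendar date y-m-d.

    Uses the standard integer formula; January and February are counted
    as months 13 and 14 of the preceding year.
    """
    a = (14 - m) // 12
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    base = d + (153 * mm + 2) // 5 + 365 * yy + yy // 4
    if gregorian:
        # Gregorian calendar: apply the century leap-year corrections.
        return base - yy // 100 + yy // 400 - 32045
    # Julian calendar: every fourth year is a leap year.
    return base - 32083


millennium = jdn(2000, 1, 1)
first_gregorian_day = jdn(1582, 10, 15)
same_day_julian = jdn(1582, 10, 5, gregorian=False)
```

The last two values agree because 15 October 1582 (Gregorian) and 5 October 1582 (Julian) name the same day.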
- New Miscellaneous Functions.
-
- rotate(a)
- rotate the elements of the array a.
- execute(s) or
- execute(s,se) or
- execute(s,se,rf)
- execute string s
- execute(a) or
- execute(a,se) or
- execute(a,se,rf)
- execute array a
- findfile(var,pattern,attributes)
- find files with specified names and attributes
- pd_sym
- access pre-defined variables
- ud_sym
- access user defined variables
- resetre
- return QTAwk utility to start-up condition for all regular
expressions, including patterns and GROUP patterns. Only the internal regular
expression forms for arrays are not re-initialized. The internal regular
expression forms for arrays are re-initialized whenever the array is changed
in any manner.
- setlocale
- set the locale under which QTAwk is operating
- New I/O Functions.
- I/O function syntax has been made consistent with syntax of other functions.
The redirection operators, '<', '>' and '>>', and pipeline operator,
'|', have been deleted as excessively error prone in expressions. The functional
syntax of the 'getline' function has been made identical to that of the other
built-in functions. The new functions 'fgetline', 'fprint' and 'fprintf'
have been introduced for reading and writing to files other than the current
input file and to replace the redirection operators.
- Single character input/output functions have been added:
- getc()
- return next character from current input file,
- fgetc(F)
- return next character from named file, F
- putc(c)
- output character c to standard output file
- fputc(c,F)
- output character c to file F
- The dropped file re-direction operator, '>>', has been replaced
by the 'append' function:
- append(F) -- Opens the file F for output to the end of the file.
All subsequent output to the file is appended to the end of the file. This
function must be called before the first output to the file to append. Any
output to the file prior to calling this function will open the file and discard
any existing contents, i.e., truncate to zero length.
- Two functions to search files for one or more regular expressions:
- srchrecord( sp ) or
- srchrecord( sp , rs ) or
- srchrecord( sp , rs , var )
- search current input file for next record containing match
to 'sp', using 'rs' as record separator (RS if 'rs' not specified), returning
record found in 'var', $0 if 'var' not specified. Update NR and FNR. Also
reparse $0 if 'var' not specified and update NF.
Returns:
- n ==> Record Present And Read, n == Number Of Characters In Record
plus EOR length plus 1.
- 0 ==> End-Of-File, EOF, Encountered
- -1 ==> Read Error Occurred (Including Failure To Open
File)
- fsrchrecord( fn , sp ) or
- fsrchrecord( fn , sp , rs ) or
- fsrchrecord( fn , sp , rs , var )
- search file 'fn' for next record containing match to 'sp',
using 'rs' as record separator (RS if 'rs' not specified), returning record
found in 'var', $0 if 'var' not specified. Reparse $0 if 'var' not specified
and update NF.
Returns:
- n ==> Record Present And Read, n == Number Of Characters In Record
plus EOR length plus 1.
- 0 ==> End-Of-File, EOF, Encountered
- -1 ==> Read Error Occurred (Including Failure To Open
File)
- The function 'get_FNR(F)' has been introduced. This function returns
the current record number of the input file 'F'. This function is necessary
to obtain the current input record number for input files used with the 'fgetline'
and 'fsrchrecord' functions.
- Expanded capability of formatted Output.
- The limited output formatting available with the Awk 'printf' function
has been expanded by adopting the complete output format specification of
the ANSI C standard.
- 'local' keyword.
- The 'local' keyword has been introduced to allow for variables local
to user-defined functions (and any compound statement). This expansion makes
the Awk practice of defining 'extra' formal parameters no longer necessary.
- Expanded user-defined functions.
- With the 'local' keyword, QTAwk allows the user to define functions
that may accept a variable number of arguments. Functions, such as finding
the minimum/maximum of a variable number of variables, are possible with
one function rather than defining separate functions for each possible combination
of arguments.
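The minimum/maximum case mentioned above looks like this as a Python analogue. It is not QTAwk syntax. The names `vargc` and `vargv` are borrowed from the QTAwk built-in variables of the same names purely to show the correspondence.

```python
def minimum(*vargv):
    """Return the smallest of a variable number of arguments."""
    vargc = len(vargv)   # number of arguments actually passed
    if vargc == 0:
        raise ValueError("minimum() needs at least one argument")
    smallest = vargv[0]
    for value in vargv[1:]:
        if value < smallest:
            smallest = value
    return smallest


of_three = minimum(3, 1, 2)
of_one = minimum(7)
```

One definition covers every argument count, which is the point made in the text.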
- User controlled trace capability.
- A user controlled statement trace capability has been added. This
gives the user a simple to use mechanism to trace utility execution. Rather
than adding 'print' statements, merely re-defining the value of a built-in
variable will give utility execution trace information, including utility
line number.
- Expanded built-in variable list.
- With 57 built-in variables, QTAwk includes all of the built-in
variables of Awk plus the following:
- _arg_chk
- used to determine whether to check number of arguments passed to user-defined
functions.
- ARGI
- index value in ARGV of next command line argument. Gives more control
of command line argument processing.
- CONVFMT
- used for converting floating point numbers to strings. OFMT used
only for output floating point numbers.
- CLENGTH
- similar to 'RLENGTH' of Awk. Set whenever a 'case' value evaluates
to a regular expression.
- CSTART
- similar to 'RSTART' of Awk. Set whenever a 'case' value evaluates
to a regular expression.
- CYCLE_COUNT
- count number of outer loop cycles with current input record.
- DEGREES
- if TRUE, trigonometric functions assume degree values, radians if
FALSE.
- ENVIRON
- one dimensional array with elements equal to the environment strings
passed to QTAwk
- ECHO_INPUT
- controls echo of standard input file to standard output file.
- FALSE
- predefined with constant value, 0.
- FIELDFILL
- string value used for filling fixed length fields when fields changed.
- FIELDWIDTHS
- can be assigned a value for fixed width fields, over-riding the use
of FS for splitting current record into fields.
- FILEATTR
- file attributes of current input file.
- FILEDATE
- date as a Julian Day Number, JDN, of current input file.
- FILETIME
- time in seconds since midnight of current input file.
- FILEDATE_CREATE
- creation date as a JDN of current input file.
- FILETIME_CREATE
- creation time in seconds since midnight of current input file.
- FILEDATE_LACCESS
- last access date as a JDN of current input file.
- FILETIME_LACCESS
- last access time in seconds since midnight of current input file.
- FILESIZE
- size in bytes of current input file.
- FILE_SORT
- string value to define sort order of array returned by "findfile"
function.
- FILE_SEARCH
- TRUE/FALSE value to search current input file for record(s) containing
match to regular expression(s) in FILE_SEARCH_PAT. Default value FALSE.
- FILE_SEARCH_PAT
- contains one or more patterns for searching current input file.
- FS
- FS allowed to be an array. If FS is an array, multiple patterns may
be set for field separators.
- Gregorian
- TRUE/FALSE value to distinguish using Gregorian or Julian calendar
in computing Julian Day Number or converting back to calendar date.
- IGNORECASE
- if assigned a true value, QTAwk ignores case in all string
and regular expression match operations.
- LOCALE
- single dimensioned array containing the string values for locale
dependent values.
- LONGEST_EXP
- used to control whether the longest or the first string matching
a regular expression is found.
- MATCH_INDEX
- assigned the string value of the matching array element when an array
used for regular expression match.
- MAX_CYCLE
- maximum number of outer loop cycles permitted with current input
record.
- MLENGTH
- similar to 'RLENGTH' of Awk. Set whenever a stand-alone regular
expression is encountered in evaluating a pattern.
- MSTART
- similar to 'RSTART' of Awk. Set whenever a stand-alone regular expression
is encountered in evaluating a pattern.
- NF
- if value changed, current input record changed to reflect new value.
- NG
- equal to the number of the regular expression in a GROUP matching
a string in the current input record.
- OFMT
- string value used only as format for output of floating point numbers.
- RECLEN
- if assigned a non-zero numeric value, integral value used for length
of fixed length records. RS not used unless RECLEN has a zero numeric value.
- RETAIN_FS
- if TRUE the original characters separating the fields of the current
input record are retained whenever a field is changed, causing the input
record to be re-constructed. If FALSE the output field separator, OFS, is
used to separate fields in the current input record during reconstruction.
The latter practice is the only method available in Awk.
- RS
- RS allowed to be an array. If RS is an array, and RECLEN has a zero
numeric value, multiple patterns may be set for record separators.
- RT
- automatically assigned string value of record terminator for current
input record.
- SUBSEP
- string value used as the array element index separator in MATCH_INDEX.
- SPAN_RECORDS
- TRUE/FALSE, default value FALSE. If TRUE, allows matches to FILE_SEARCH_PAT
to span multiple input records and return multiple records in $0. If FALSE,
matches are confined to a single record. Also controls matches spanning records
in 'srchrecord' and 'fsrchrecord' functions.
- TRACE
- value used to determine utility tracing.
- TRANS_FROM/TRANS_TO
- strings used by 'stran' function if second and/or third arguments
not specified.
- TRUE
- predefined with constant value, 1
- QTAwk_Path
- initialized from 'QTAWK' environment variable. Sets paths searched
for input files.
- vargc
- used only in user-defined functions defined with a variable number
of arguments. At run-time, set equal to the actual number of variable arguments
passed.
- vargv
- used only in user-defined functions defined with a variable number
of arguments. At run-time, a single dimensioned array with each element
set to the argument actually passed.
- New command line options available:
-
- -ffilename
- multiple utility files may be specified. In addition, the file directive:
#include "filename"
may be used to include other files.
- -vvar=value
- sets 'var' to value before any "BEGIN" actions executed
- -Wd
- delays parsing of input record until any fields or the NF variable
referenced.
- -Wm
- forces QTAwk to allow multiple includes of the same file,
instead of issuing an error message and skipping the repeated includes.
- Definition of built-in variable, RS, expanded.
- When value assigned to RS, it is converted to regular expression form.
Strings matching regular expression act as record separator. Similar in
behavior to field separator, FS. If an array, multiple record separator
patterns may be specified.
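A Python emulation of reading records with a regular expression separator may make this concrete. It is a sketch, not QTAwk code: `read_record` is a hypothetical helper, the matched separator text plays the role this list gives to RT, and the returned count follows the rule stated for 'getline' (record length plus End-Of-Record length plus 1, with 0 at end of file).

```python
import re


def read_record(text, pos, rs):
    """Read one record from text starting at offset pos.

    rs is a regular expression; the text it matches ends the record.
    Returns (record, rt, n, next_pos), where n is 0 at end of input.
    """
    if pos >= len(text):
        return None, "", 0, pos
    m = re.compile(rs).search(text, pos)
    if m is None or m.end() == m.start():
        # No (non-empty) separator left: the rest of the text is the record.
        record, rt, next_pos = text[pos:], "", len(text)
    else:
        record, rt, next_pos = text[pos:m.start()], m.group(0), m.end()
    n = len(record) + len(rt) + 1
    return record, rt, n, next_pos


text = "a;b\n\nc"
rs = r";|\n+"          # a semicolon or a run of newlines separates records

results = []
pos = 0
while True:
    record, rt, n, pos = read_record(text, pos, rs)
    if n == 0:
        break
    results.append((record, rt, n))
```

Because of the added 1, a successfully read record always yields a positive count, even when both the record and its terminator are empty.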
- FILENAME
- In QTAwk, setting built-in variable, "FILENAME", to another
value will change the current input file. Setting the variable in Awk, has
no effect on current input file.
- NF
- In QTAwk, setting built-in variable, NF to another value will
change the current contents of $0. If the new value is greater than the
current value, the current input line is lengthened with new empty fields
separated by the output field separator strings, OFS. If the new value is
less than the current value, then $0 is shortened by truncating at the end
of the field corresponding to the new NF value.
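The effect of assigning to NF can be emulated in Python as follows. This is a sketch under stated assumptions, not QTAwk code: `set_nf` is a hypothetical helper, and fields are taken to be runs of non-blank characters, which corresponds to default field splitting only.

```python
import re


def set_nf(record, new_nf, ofs=" "):
    """Return the record as it would look after NF is set to new_nf."""
    spans = [m.span() for m in re.finditer(r"\S+", record)]
    nf = len(spans)
    if new_nf > nf:
        # Lengthen: each added field is empty and preceded by OFS.
        return record + ofs * (new_nf - nf)
    if new_nf < nf:
        if new_nf <= 0:
            return ""
        # Shorten: cut at the end of the field numbered new_nf.
        return record[: spans[new_nf - 1][1]]
    return record


lengthened = set_nf("a b c", 5, ofs=":")
shortened = set_nf("a  b c", 2)
unchanged = set_nf("a b", 2)
```

Note that shortening keeps the original separators between the surviving fields, since the record is truncated rather than rebuilt.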
- The Tag Operator, '[< >]'
- The Tag operator may be used to obtain or to set a particular part
of the string matching the regular expression pattern.
- getline
- The return value of the 'getline' function has been changed when a
valid record has been read. The return value is the length of the record
plus the length of the End-Of-Record plus 1.
- Awk Problems
- Corrected admitted problems with Awk. The problems mentioned
on page 182 of "The Awk Programming Language" have been corrected. Specifically:
- true multidimensional arrays have been implemented,
- the 'getline' syntax has been made to match that of other functions,
- declaring local variables in user-defined functions has been corrected,
- intervening blanks are allowed between the function call name and
the opening parenthesis (in fact, under QTAwk it is permissible to
have no opening parenthesis or argument list for user-defined functions that
have been defined with no formal arguments).