QTAwk


Distributed under the GPL

See below for information on downloading executables and source

QTAwk Manual


Refer to "The AWK Programming Language", by Alfred V. Aho, Brian W. Kernighan and Peter J. Weinberger, or the GNU gawk reference material. A tarred, bzip2-compressed version of the QTAwk reference document is available, with a signature file. The Linux executable is available, with a signature file. The tarred, bzip2-compressed source is available under the GPL, with a signature file. Sample QTAwk utility files are available, with a signature file; that file also contains some QTGrep pattern files.

Downloading the QTCrypt package/executables will allow you to check the source file hashes included with the source. Also, you will be able to check the signatures. See the Home page for more information on the signatures.

Differences Between QTAwk and Awk:

Dynamically Loaded/Unloaded modules
modules written in C, compiled and linked may be dynamically loaded into QTAwk to add user defined functions which run natively and extend the capabilities of QTAwk.
Regular Expressions
are a separate type of their own on an equal footing with strings, integer and floating point numbers. Thus, regular expressions may be assigned to variables and the variables used wherever regular expressions would be used. This behavior also changes the Awk accepted behavior of a regular expression constant always matching the current input record. This behavior is only retained for a regular expression constant in a pattern. Elsewhere, the match must be explicitly coded. Thus, the Awk behavior for the line:

if ( /return/ )

which implicitly performs a match against $0, must be coded under QTAwk as:

if ( $0 ~~ /return/ )

Under Awk attempting to assign a regular expression to a variable is not possible. The line:

Are = /return/;

assigns a value of 0 or 1 to the variable 'Are', depending on whether /return/ matches $0. Under QTAwk, Are is assigned the regular expression. Awk does not contain the concept of a regular expression as a separate data type.
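
The distinction can be illustrated outside QTAwk. The following Python sketch is only an analogy, not QTAwk code: a compiled pattern object stands in for a QTAwk regular expression value, and the names 'pattern' and 'matches' are hypothetical.

```python
import re

# A compiled pattern is a value in its own right, like a QTAwk regular expression.
pattern = re.compile(r"return")

def matches(regex, record):
    """Explicit match, analogous to 'record ~~ regex' in QTAwk."""
    return regex.search(record) is not None

record = "    return 0;"

# Awk-style assignment stores only the 0/1 result of matching the record.
awk_style = 1 if pattern.search(record) else 0

# QTAwk-style assignment stores the expression itself for later use.
qtawk_style = pattern
```

In the Awk-style case the variable can no longer be used to match other records; in the QTAwk-style case it can, for example 'matches(qtawk_style, other_record)'.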

Expanded Regular Expressions.
All of the Awk regular expression operators are allowed plus the following:
  1. complemented character lists using the Awk notation, '[^...]', as well as the Awk/QTAwk and C logical negation operator, '[!...]'. This is more consistent, since the same operator symbol is used for negation.
  2. Matched character lists, '[#...]'. These lists are used in pairs. The position of the character matched in the first list of the pair, determines the character which must match in the position occupied by the second list of the pair.
  3. Look-ahead Operator. r@t: regular expression r is matched only when followed by regular expression t.
  4. Interval Operator. r{n1,n2} at least n1 and up to n2 repetitions of regular expression r. Also called the Repetition Operator.
  5. Named Expressions. {named_expr} is replaced by the string value of the corresponding variable named 'named_expr'. If no such variable exists, the operator is not replaced.
  6. Tagged Expressions. Enclosing a portion of a regular expression in parentheses, "()", makes the matching string available for use with the Tag Operator, '[< >]'.
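
The Named Expression rule (item 5) can be sketched as a text substitution. This is a Python emulation under stated assumptions, not QTAwk's implementation: the function name 'expand_named' is hypothetical, a brace group counts as a name only when it looks like an identifier, and details such as nested expansion are not modelled.

```python
import re

def expand_named(pattern, variables):
    """Replace each {name} with the string value of variable 'name'.

    Interval operators such as {2,4} are not identifiers and are left alone.
    Names with no matching variable are left unreplaced.
    """
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return re.sub(r"\{([A-Za-z_][A-Za-z_0-9]*)\}", substitute, pattern)
```

For example, with a variable 'digit' holding "[0-9]", the pattern "{digit}{2,4}-{missing}" expands to "[0-9]{2,4}-{missing}".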
Consistent statement termination syntax.
The QTAwk Utility Creation Tool utilizes the semi-colon, ';', to terminate all statements. The practice in Awk of using new lines to "sometimes" terminate statements is no longer allowed.
Expanded Operator Set.
The Awk set of operators has been changed to make them more consistent and to more closely match those of C. The Awk match operator, ~, has been changed to ~~ so that the similarity between the match operators, '~~' and '!~', and the equality operators, '==' and '!=', is complete. The single tilde symbol, ~, reverts to the C one's complement operator, an addition to the operator set over Awk. An explicit string concatenation operator, '><', has also been introduced. The "new" operators in QTAwk are:
Operation            Operator
tag                  [< >]
one's complement     ~
concatenation        ><
shift left/right     << >>
matching             ~~ !~
bit-wise AND         &
bit-wise XOR         @
bit-wise OR          |
sequence             ,

The caret, ^, remains as the exponentiation operator. The symbol @ is used for the exclusive OR operator. For string operands, the shift operators, << and >>, shift the strings with wrap-around instead of performing a bit shift as for numeric operands. The expression sequence operator has been introduced to match that in C.
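
A wrap-around string shift amounts to a rotation. The Python sketch below emulates the idea; it assumes the shift count is in characters and that '<<' moves characters toward the start of the string, neither of which is stated in the text. The function names are hypothetical.

```python
def shift_left(s, n):
    """Rotate string s left by n characters, wrapping around."""
    if not s:
        return s
    n %= len(s)
    return s[n:] + s[:n]

def shift_right(s, n):
    """Rotate string s right by n characters, wrapping around."""
    return shift_left(s, -n)
```

Under these assumptions, shifting "abcdef" left by 2 gives "cdefab" and shifting it right by 2 gives "efabcd".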

Expanded set of recognized constants:
  1. decimal integers,
  2. octal integers,
  3. hexadecimal integers,
  4. character constants, and
  5. floating point constants.

These constants are recognized in utilities, input fields and strings.

Expanded predefined patterns
giving more control:
INITIAL
similar to BEGIN. Actions executed after opening each input file and before reading first record.
FINAL
similar to END. Actions executed after reading last record of each input file and before closing file.
NOMATCH
actions executed for each input record for which no pattern was matched.
GROUP
used to group multiple regular expressions for search optimization. Can speed search by a factor of six.
True multidimensional Arrays.
The use of the comma in index expressions to simulate multiple array indices is no longer supported. True multiple indices are supported. Indexing is in the C manner, 'a[i1][i2]'. The use of the SUBSEP built-in variable of Awk has been redefined.
Integer array indices as well as string indices.
Array indices have been expanded to include integers as well as the string indices of Awk. Indices are not automatically converted to strings as in Awk. Thus, for true integer indices, the index ordering follows the numeric sequence with an integer index value of '10' following an integer value of '2' instead of preceding it.
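
The ordering difference is the familiar one between numeric and string comparison. A small Python illustration (an analogy only, not QTAwk code):

```python
int_keys = [10, 2, 1]
str_keys = ["10", "2", "1"]

# Integer indices sort numerically: 10 follows 2.
numeric_order = sorted(int_keys)   # [1, 2, 10]

# String indices sort character by character: "10" precedes "2".
string_order = sorted(str_keys)    # ['1', '10', '2']
```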
Arrays integrated into QTAwk.
QTAwk integrates arrays with arithmetic operators so that the operations are carried out on the entire array. QTAwk also integrates arrays into user-defined functions so that they can be passed to and returned from such functions in a natural and intuitive manner. Awk does not allow returning arrays from user-defined functions or allow arithmetic operators to operate on whole arrays.

In addition, for all Linux versions, arrays have been fully integrated into all aspects of QTAwk including the match operators, '~~' and '!~', and their implied use in patterns and the built-in functions, 'sub', 'gsub', and 'match'. The MATCH_INDEX built-in variable has been added to return the matching array element index when an array has been used for pattern matching. The string value of the SUBSEP built-in variable is used as the index separator in MATCH_INDEX for multidimensional arrays.

Arrays used as regular expressions with the match operators, both explicit and implied, retain their internal regular expression form between uses. In addition, when the array as a whole is assigned to another variable, the internal regular expression form is also assigned. The internal regular expression form is discarded only when the array is changed. This gives the user a more balanced control over dynamic regular expressions between that of true regular expressions, which retain the internal form until execution is halted, and strings used as regular expressions, which discard the internal regular expression form after each use.

New Keywords:
cycle similar to 'next', except that the outer pattern matching loop may be restarted with either the current input record or the next input record. The current values of CYCLE_COUNT and MAX_CYCLE determine which input record is used: if CYCLE_COUNT <= MAX_CYCLE, the current input record is used; otherwise the next input record is read.
switch
case
default
similar to C syntax with the allowed 'switch' values and 'case' labels expanded to include any legal QTAwk expression, evaluated at run-time. The expressions may evaluate to any value including any numeric value, string or regular expression.
local new keyword to allow the declaration and use of local variables within compound statements, including user-defined functions. Its use in user defined functions instead of the Awk practice of defining excess formal parameters, leads to easier to read and maintain functions. The C 'practice' of allowing initialization in the 'local' statement is followed.
endfile similar to 'exit'. Simulates end of current input file only, any remaining input files are still processed.
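
The expanded 'switch'/'case' matching described above can be emulated in Python as follows. This is a sketch under assumptions: the helper name 'switch' is hypothetical, a regular expression label is taken to match when it is found anywhere in the switch value, and C-style fall-through between cases is not modelled.

```python
import re

def switch(value, cases, default=None):
    """Return the result paired with the first matching case label.

    'cases' is a list of (label, result) pairs evaluated at run time.
    A compiled regular expression label matches when it is found in
    str(value); any other label matches on equality.
    """
    for label, result in cases:
        if isinstance(label, re.Pattern):
            if label.search(str(value)):
                return result
        elif label == value:
            return result
    return default
```

With the labels 1, "two" and the expression /^th/ (written 're.compile(r"^th")' here), the value "three" selects the regular expression case, while 1 selects the numeric case.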
New Arithmetic Functions.
QTAwk includes 18 built-in arithmetic functions. All of the functions supported by Awk plus the following:
acos(x) arc-cosine of x
asin(x) arc-sine of x
cosh(x) hyperbolic cosine of x
fract(x) fractional portion of x
log10(x) logarithm base 10
pi or pi() the constant pi
sinh(x) hyperbolic sine of x
New String Functions.
QTAwk includes 33 built-in string functions. All of the functions supported by Awk plus the following:
center(s,w) or center(s,w,c) center string
copies(s,n) copies of string
deletec(s,p,n) delete characters from a string
gensub(re,rs,how,target) generalized substitution function
insert(s1,s2,p) insert one string into another string
justify(a,n,w) or justify(a,n,w,c) justify string
overlay(s1,s2,p) overlay one string on another
remove(s,c) remove characters from a string
replace(s) replace all variables in a string
srange(c1,c2) return string formed of all characters from c1 to c2
srev(s) reverse characters of string
stran(s) or stran(s,st) or stran(s,st,sf) translate characters
strim(s) or strim(s,c) or strim(s,c,d) trim leading and/or trailing characters
strlwr(s) translate to lower case
strupr(s) translate to upper case
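
To give a feel for a few of these functions, here are rough Python equivalents. They are guesses based only on the one-line descriptions above: the argument meanings (w as total width, c as fill character, n as repeat count) and edge-case behavior such as uneven padding are assumptions and may differ in QTAwk.

```python
def center(s, w, c=" "):
    """Center s in a field of width w, padding with character c."""
    return s.center(w, c)

def copies(s, n):
    """Return n copies of s joined together."""
    return s * n

def srange(c1, c2):
    """Return the string of all characters from c1 to c2 inclusive."""
    return "".join(chr(code) for code in range(ord(c1), ord(c2) + 1))

def srev(s):
    """Return s with its characters reversed."""
    return s[::-1]
```

Under these assumptions, 'srange("a","e")' yields "abcde" and 'center("ab",6,"*")' yields "**ab**".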
New Date and Time functions
_time() Local time (seconds since midnight)
_ftime(format_str,sjdn,time) Format date/time
jdn or jdn() or jdn(y,m,d) Julian Day Number of today or date specified
jdn(fdate) Calendar date of Julian Day Number specified
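
For readers unfamiliar with Julian Day Numbers, the conversion from a calendar date can be sketched in Python with the widely used integer formula. This shows the standard calculation, not QTAwk's source; the 'gregorian' parameter is a hypothetical stand-in for the role of the 'Gregorian' built-in variable.

```python
def jdn(year, month, day, gregorian=True):
    """Julian Day Number of a calendar date (integer arithmetic only)."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # years counted from March 1, 4801 BC
    m = month + 12 * a - 3          # March = 0 ... February = 11
    days = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if gregorian:
        # Gregorian calendar: drop century leap days, restore every 400th.
        return days - y // 100 + y // 400 - 32045
    # Julian calendar: every fourth year is a leap year.
    return days - 32083
```

The two calendars meet at the 1582 reform: October 4, 1582 (Julian) is day 2299160 and October 15, 1582 (Gregorian) is day 2299161.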
New Miscellaneous Functions.
rotate(a) rotate the elements of the array a.
execute(s) or execute(s,se) or execute(s,se,rf) execute string s
execute(a) or execute(a,se) or execute(a,se,rf) execute array a
findfile(var,pattern,attributes) find files with specified names and attributes
pd_sym access pre-defined variables
ud_sym access user defined variables
resetre return QTAwk utility to start-up condition for all regular expressions, including patterns and GROUP patterns. Only the internal regular expression forms for arrays are not re-initialized. The internal regular expression forms for arrays are re-initialized whenever the array is changed in any manner.
setlocale set the locale under which QTAwk is operating
New I/O Functions.
I/O function syntax has been made consistent with syntax of other functions. The redirection operators, '<', '>' and '>>', and pipeline operator, '|', have been deleted as excessively error prone in expressions because of confusion with the value testing and shifting operators. The pipeline operator has been replaced by the new pipeline operator, '|>'. The functional syntax of the 'getline' function has been made identical to that of the other built-in functions. The new functions 'fgetline', 'fprint' and 'fprintf' have been introduced for reading and writing to files other than the current input file and to replace the redirection operators.
  1. Single character input/output functions have been added:
    getc() return next character from current inputfile
    fgetc(F) return next character from named file, F
    putc(c) output character c to standard output file
    fputc(c,F) output character c to file F
  2. The dropped file re-direction operator, '>>', has been replaced by the 'append' function:
    append(F) -- Opens the file F for output to the end of the file. All subsequent output to the file is appended to the end of the file. This function must be called before the first output to the file to append. Any output to the file prior to calling this function will open the file and discard any existing contents, i.e., truncate to zero length.
  3. Two functions to search files for one or more regular expressions:
    srchrecord(sp) or
    srchrecord(sp,rs) or
    srchrecord(sp,rs,var)
    search current input file for next record containing match to 'sp', using 'rs' as record separator (RS if 'rs' not specified), returning record found in 'var', $0 if 'var' not specified. Update NR and FNR. Also reparse $0 if 'var' not specified and update NF.

    Returns:

    1. n ==> Record Present And Read, n == Number Of Characters In Record plus EOR length plus 1.
    2. 0 ==> End-Of-File, EOF, Encountered
    3. -1 ==> Read Error Occurred (Including Failure To Open File)

    fsrchrecord(fn,sp) or
    fsrchrecord(fn,sp,rs) or
    fsrchrecord(fn,sp,rs,var)
    search file 'fn' for next record containing match to 'sp', using 'rs' as record separator (RS if 'rs' not specified), returning record found in 'var', $0 if 'var' not specified. Reparse $0 if 'var' not specified and update NF.

    Returns:

    1. n ==> Record Present And Read, n == Number Of Characters In Record plus EOR length plus 1.
    2. 0 ==> End-Of-File, EOF, Encountered
    3. -1 ==> Read Error Occurred (Including Failure To Open File)
  4. The function 'get_FNR(F)' has been introduced. This function returns the current record number of the input file 'F'. This function is necessary to obtain the current input record number for input files used with the 'fgetline' and 'fsrchrecord' functions.
Expanded capability of formatted Output.
The limited output formatting available with the Awk 'printf' function has been expanded by adopting the complete output format specification of the ANSI C standard.
'local' keyword.
The 'local' keyword has been introduced to allow for variables local to user-defined functions (and any compound statement). This expansion makes the Awk practice of defining 'extra' formal parameters no longer necessary.
Expanded user-defined functions.
With the 'local' keyword, QTAwk allows the user to define functions that may accept a variable number of arguments. Functions, such as finding the minimum/maximum of a variable number of variables, are possible with one function rather than defining separate functions for each possible combination of arguments.
User controlled trace capability.
A user controlled statement trace capability has been added. This gives the user a simple to use mechanism to trace utility execution. Rather than adding 'print' statements, merely re-defining the value of a built-in variable will give utility execution trace information, including utility line number.
Expanded built-in variable list.
With 61 built-in variables, QTAwk includes all of the built-in variables of Awk plus the following:
_arg_chk
used to determine whether to check number of arguments passed to user-defined functions.
ARGI
index value in ARGV of next command line argument. Gives more control of command line argument processing.
CONVFMT
used for converting floating point numbers to strings. OFMT is used only for output of floating point numbers.
CLENGTH
similar to 'RLENGTH' of Awk. Set whenever a 'case' value evaluates to a regular expression.
CSTART
similar to 'RSTART' of Awk. Set whenever a 'case' value evaluates to a regular expression.
CYCLE_COUNT
count number of outer loop cycles with current input record.
DEGREES
if TRUE, trigonometric functions assume degree values, radians if FALSE.
DELAY_INPUT_PARSE
If TRUE parsing of input record into fields is delayed until the value of NF or one of the input fields is needed in an expression. Useful when the values of NF or any input field are only rarely used. Record parsing is done only when needed.
ENVIRON
one dimensional array with elements equal to the environment strings passed to QTAwk
ECHO_INPUT
controls echo of standard input file to standard output file.
FALSE
predefined with constant value, 0.
FIELDFILL
string value used for filling fixed length fields when fields changed.
FIELDWIDTHS
can be assigned a value for fixed width fields, over-riding the use of FS for splitting current record into fields. Similar to the same variable in gawk.
FILEATTR
file attributes of current input file.
FILEDATE
date as a Julian Day Number, JDN, of current input file.
FILETIME
time in seconds since midnight of current input file.
FILEDATE_CREATE
creation date as a JDN of current input file.
FILETIME_CREATE
creation time in seconds since midnight of current input file.
FILEDATE_LACCESS
last access date as a JDN of current input file.
FILETIME_LACCESS
last access time in seconds since midnight of current input file.
FILESIZE
size in bytes of current input file.
FILE_SORT
string value to define sort order of array returned by "findfile" function.
FILE_SEARCH
TRUE/FALSE value to search current input file for record(s) containing match to regular expression(s) in FILE_SEARCH_PAT. Default value FALSE.
FILE_SEARCH_PAT
contains one or more patterns for searching current input file. Useful when the next record wanted matches known regular expression(s) and may not be the next input record. Speeds reading of file in such cases.
FS
FS allowed to be an array. If FS is an array, multiple patterns may be set for field separators.
Gregorian
TRUE/FALSE value to distinguish using Gregorian or Julian calendar in computing Julian Day Number or converting back to calendar date.
IGNORECASE
if assigned a true value, QTAwk ignores case in all string and regular expression match operations.
LOCALE
single dimensioned array containing the string values for locale dependent values.
LONGEST_EXP
used to control whether the longest or the first string matching a regular expression is found.
MATCH_INDEX
assigned the string value of the matching array element when an array used for regular expression match.
MAX_CYCLE
maximum number of outer loop cycles permitted with current input record.
MLENGTH
similar to 'RLENGTH' of Awk. Set whenever a stand-alone regular expression is encountered in evaluating a pattern.
MSTART
similar to 'RSTART' of Awk. Set whenever a stand-alone regular expression is encountered in evaluating a pattern.
NF
if value changed, current input record changed to reflect new value.
NG
equal to the number of the regular expression in a GROUP matching a string in the current input record.
OFMT
string value used only as format for output of floating point numbers.
RECLEN
if assigned a non-zero numeric value, integral value used for length of fixed length records. RS not used unless RECLEN has a zero numeric value.
RETAIN_FS
if TRUE the original characters separating the fields of the current input record are retained whenever a field is changed, causing the input record to be re-constructed. If FALSE the output field separator, OFS, is used to separate fields in the current input record during reconstruction. The latter practice is the only method available in Awk.
RS
RS allowed to be an array. If RS is an array, and RECLEN has a zero numeric value, multiple patterns may be set for record separators.
RT
automatically assigned string value of record terminator for current input record.
SUBSEP
string value used as the array element index separator in MATCH_INDEX.
SPAN_RECORDS
TRUE/FALSE, default value FALSE. if TRUE allows matches to FILE_SEARCH_PAT to span multiple input records and return multiple records in $0. If FALSE, matches confined to a single record. Also controls matches spanning records in 'srchrecord' and 'fsrchrecord' functions.
TRACE
value used to determine utility tracing.
TRANS_FROM/TRANS_TO
strings used by 'stran' function if second and/or third arguments not specified.
TRUE
predefined with constant value, 1
QTAwk_Path
initialized from resource configuration file(s). Sets paths searched for input files.
vargc
used only in user-defined functions defined with a variable number of arguments. At run-time, set equal to the actual number of variable arguments passed.
vargv
used only in user-defined functions defined with a variable number of arguments. At run-time, a single dimensioned array with each element set to the argument actually passed.
Module_Path
initialized from resource configuration file(s). Sets paths searched for loadable module files.
USER_FUNCTIONS
singly dimensioned array with indices equal to the names of user defined functions and element values equal to the names of the files in which the functions were defined.
MODULES
singly dimensioned array with indices equal to the file names of currently loaded modules and element values equal to the module count.
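
The difference that RETAIN_FS makes when a field is changed can be sketched in Python. This is an emulation, not QTAwk code: the function name 'rebuild_record' and its parameters are hypothetical, the field separator is given as a regular expression without capturing groups, and the field number is assumed to be within range.

```python
import re

def rebuild_record(record, fs, field_no, new_value, ofs=" ", retain_fs=True):
    """Replace field 'field_no' (1-based) and rebuild the record."""
    # A capturing group makes re.split return fields and separators interleaved.
    parts = re.split("(" + fs + ")", record)
    fields, seps = parts[0::2], parts[1::2]
    fields[field_no - 1] = new_value
    if not retain_fs:
        # Awk behavior: every original separator is replaced by OFS.
        return ofs.join(fields)
    # RETAIN_FS behavior: the original separator text is put back.
    rebuilt = fields[0]
    for sep, field in zip(seps, fields[1:]):
        rebuilt += sep + field
    return rebuilt
```

For the record "a, b;c" split on the expression "[,;]\s*", changing field 2 to "X" gives "a, X;c" when the separators are retained and "a X c" when OFS (a single blank here) is used.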
New command line options available:
-ffilename
multiple utility files may be specified. In addition, the file directive:

#include "filename"

or

#include <filename>

may be used to include other files. The path for finding files follows the C pract

-vvar=value
sets 'var' to value before any "BEGIN" actions executed

-Wd
delays parsing of input record until any fields or the NF variable referenced.

-Wm
forces QTAwk to allow multiple includes of the same file, issuing an error message and skipping multiple includes. Without this option specified, QTAwk exits with an error message upon finding multiple includes of the same file.

Definition of built-in variable, RS, expanded.
When value assigned to RS, it is converted to regular expression form. Strings matching regular expression act as record separator. Similar in behavior to field separator, FS. If an array, multiple record separator patterns may be specified.
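
Record splitting on a regular expression, with the matched terminator kept for each record (the role played by the RT variable), can be emulated in Python as below. The function name 'split_records' is hypothetical, and handling an array of separators by joining the patterns with '|' is only a guess at equivalent behavior.

```python
import re

def split_records(text, rs):
    """Yield (record, terminator) pairs, using regular expression rs."""
    pos = 0
    for match in re.finditer(rs, text):
        if match.end() == match.start():
            continue  # ignore empty matches so only real separators split
        yield text[pos:match.start()], match.group(0)
        pos = match.end()
    if pos < len(text):
        yield text[pos:], ""  # final record with no terminator
```

With the separator expression "\r?\n", the text "a\nb\r\nc" yields the records "a", "b" and "c" with terminators "\n", "\r\n" and the empty string.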

FILENAME
In QTAwk, setting built-in variable, "FILENAME", to another value will change the current input file. Setting the variable in Awk, has no effect on current input file.

NF
In QTAwk, setting built-in variable, NF to another value will change the current contents of $0. If the new value is greater than the current value, the current input line is lengthened with new empty fields separated by the output field separator strings, OFS. If the new value is less than the current value, then $0 is shortened by truncating at the end of the field corresponding to the new NF value.
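
The effect of assigning to NF can be emulated in Python as follows. This sketch assumes a non-empty record and a separator expression that never matches the empty string; the function name 'set_nf' is hypothetical.

```python
import re

def set_nf(record, fs, new_nf, ofs=" "):
    """Return the record after NF is set to new_nf (fields split on fs)."""
    # End offset of each field: just before each separator, then end of record.
    ends = [m.start() for m in re.finditer(fs, record)] + [len(record)]
    current_nf = len(ends)
    if new_nf < current_nf:
        # Shorten: truncate at the end of field number new_nf.
        return record[:ends[new_nf - 1]] if new_nf > 0 else ""
    # Lengthen: each new empty field is preceded by one OFS string.
    return record + ofs * (new_nf - current_nf)
```

For the record "a b c", setting NF to 2 leaves "a b", while setting NF to 5 with OFS equal to "-" gives "a b c--".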

The Tag Operator, '[< >]'
The Tag operator may be used to obtain or to set a particular part of the string matching the regular expression pattern.

getline
The return value of the 'getline' function has been changed when a valid record has been read. The return value is the length of the record plus the length of the End-Of-Record plus 1.
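
As a worked example of this arithmetic, the Python sketch below computes the value for a given record and End-Of-Record string. The function name is hypothetical. One consequence of the added 1, offered here as an observation rather than a documented rationale, is that even an empty record with an empty terminator yields a non-zero value.

```python
def getline_return_value(record, end_of_record):
    """Length of the record plus length of the End-Of-Record plus 1."""
    return len(record) + len(end_of_record) + 1
```

A record "hello" terminated by a single newline therefore returns 5 + 1 + 1 = 7.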
Loadable modules
The ability to dynamically load and unload modules defining user defined functions.

Awk Problems
Corrected admitted problems with Awk. The problems mentioned on page 182 of "The Awk Programming Language" have been corrected. Specifically:
  1. true multidimensional arrays have been implemented,
  2. the 'getline' syntax has been made to match that of other functions,
  3. declaring local variables in user-defined functions has been corrected,
  4. intervening blanks are allowed between the function call name and the opening parenthesis (in fact, under QTAwk it is permissible to have no opening parenthesis or argument list for user-defined functions that have been defined with no formal arguments).

Resource Configuration File(s)
With Linux version 1.60, QTAwk uses up to two resource configuration files. Both are named ".qtawkrc". One is global, used by all users, and located in "/usr/local/etc". It must be placed there by the system administrator (root user or superuser). The second, "local" resource configuration file is located in the user's Home directory, which is named in the "HOME" environment variable by the Bash shell.

QTAwk first tries to open the global resource configuration file. If it exists, it is opened and executed. QTAwk then tries to open the user's local resource configuration file. If it exists in the user's Home directory, it is opened and executed.
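
The lookup order can be summarized with a short Python sketch. It only models which files are tried and in what order; the function names are hypothetical and nothing here reflects how QTAwk itself is implemented.

```python
import os
import posixpath

GLOBAL_RC = "/usr/local/etc/.qtawkrc"

def config_files(environ=None):
    """Return the resource configuration files in the order they are tried."""
    environ = os.environ if environ is None else environ
    candidates = [GLOBAL_RC]                 # global file comes first
    home = environ.get("HOME")
    if home:
        candidates.append(posixpath.join(home, ".qtawkrc"))  # then the local file
    return candidates

def existing_config_files(environ=None):
    """Keep only the files that exist; a missing file is simply skipped."""
    return [path for path in config_files(environ) if os.path.isfile(path)]
```

Because the local file is read second, anything it sets is processed after the global settings.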

The use of resource configuration file(s) by QTAwk offers more ways for the user to customize QTAwk for the user's own use. The QTAWK environment variable was limited to setting the search path for QTAwk utility files. The resource configuration files are now used for that plus much more.

The following commands are available for use in resource configuration files:

  1. pattern_action:
    This command allows the user to insert a pattern/action pair into the resource configuration file. Any pattern/action pair may be inserted including those for pre-defined patterns (BEGIN, INITIAL, NOMATCH, GROUP, FINAL and END) and user defined functions. This allows the use of pre-defined pattern actions and user defined functions for all QTAwk utilities without having to "include" the file into all utility files. The pattern/action pair or user defined function may span as many lines as needed, simply use the backslash, '\', character as the last character on a line to indicate to continue on the next line.
  2. Statement: 
    This command allows any valid "statement" to be inserted into the resource configuration file. The statement will be executed immediately before any "BEGIN" pre-defined pattern action. The statement may span as many lines as needed; simply use the backslash, '\', character as the last character on a line to indicate continuation on the next line.
  3. Expression:
    This command allows any valid "expression" to be inserted into the resource configuration file. The expression will be executed after any statements defined by the above command and immediately before any "BEGIN" pre-defined pattern action. The expression may span as many lines as needed; simply use the backslash, '\', character as the last character on a line to indicate continuation on the next line.
  4. Immediate Expression:
    This command allows any valid "expression". The expression is executed immediately. This allows the user to specify the values of pre-defined variables. It is usually used to set the value of the pre-defined variable "QTAwk_Path" - the search path(s) for utility files.
  5. Delay Input Parse = on/off.
    This command allows the user to turn on or off the Delay Input Parse mode of QTAwk without having to invoke the mode on every command line.
  6. Replace Pattern_Actions
    This command instructs QTAwk to first delete any pattern/action pairs read previously from this configuration file or from a resource configuration file read before the present one. This command is only useful in the user's local resource configuration file, when the user wishes to exclude any pattern/action pairs and user defined functions from the global resource configuration file. Normally, the pattern/action pairs in the user's local resource configuration file would simply be used in addition to any from the global file.
  7. Replace Statements
    This command is the same as the previous command, except it works with any statements.
  8. Replace Expressions
    This command is the same as the previous two commands, except it works with any expressions.

Note that the resource configuration file is scanned for the keyword commands listed above. Any line not matching the keys is ignored. A sample configuration file, ".qtawkrc", is included and should be customized for the user's or system administrator's use.

© Terry D. Boldt 1997-2006
All Rights Reserved
Last Updated: Feb. 13, 2006