Regular Expression/Keyword File Input

Regular expressions or keywords may be input to QTGrep on the command line, from ASCII text files, or from internal form files written by QTGrep.  ASCII text files and internal form files containing regular expressions or keywords are specified to QTGrep with the 'f' option.  In addition, ASCII text files may be input using the redirection and piping capabilities of the operating system and the 'p' or 'P' options.

In regular expressions or keywords input via the 'f', 'p' or 'P' options, leading and trailing blanks and tabs are ignored.  Leading or trailing blanks and tabs that are part of a pattern must therefore be entered as '\s' or '\t'.  In addition, lines in which the first non-blank character is '#' are treated as comment lines.  Comment lines with 'd', 'e', 'f', 'i', 'm', 'n', 'o', 'p', 'r' or 't' (upper or lower case) immediately following the '#' are treated specially:

  #d --> Set Default Match Replacement String
  #e --> Set Named Expressions
  #f --> Set Name Mask For Replacement File Name
  #i --> Include Key Word/Regular Expression Files
  #m --> Specify Comment Appended to Pattern Count
  #n --> Set Expression Numbers and Expression Increment
  #o --> Output Comment Lines
  #p --> Set Preprocessor Macro Name
  #r --> Set Match Replacement String
  #t --> Set Record Terminator(s)
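
The line handling described above can be sketched in Python.  This is only an illustration, not QTGrep's actual parser: the function name `classify_line` and the returned tuples are hypothetical, and the sketch assumes that any line whose second character is an operator letter counts as an operator line.

```python
# Hypothetical sketch of pattern-file line classification; not QTGrep source code.
OPERATOR_LETTERS = set("defimnoprt")

def classify_line(line):
    """Return (kind, operator, text) for one pattern-file line."""
    # Leading and trailing blanks and tabs are ignored.
    text = line.strip(" \t\r\n")
    if not text:
        return ("blank", None, "")
    if text[0] != "#":
        return ("pattern", None, text)
    # An operator letter (upper or lower case) immediately after '#'
    # marks a special comment line.
    letter = text[1:2].lower()
    if letter in OPERATOR_LETTERS:
        return ("operator", letter, text[2:].strip(" \t"))
    return ("comment", None, text[1:].strip(" \t"))
```

Under this assumption a comment such as "#define ..." would be read as a '#d' operator, so ordinary comments are safest written with a blank after the '#'.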


  #i include_file_identifier

The include file is opened and read as an expression file.  At the end of the include file, reading of the current file is continued.  The include file must be in the current directory or a directory specified with the "QTGrep" environment variable unless the full drive and path are specified as part of the file identifier.

If the included file name is one of the following names (upper or lower case):

    standard input
    stdinp
    keyboard
    console

then the input is read from the Standard Input File, normally the console keyboard.

The file name '-', a single hyphen, also reads input from the Standard Input File.  As with the 'p' and 'P' command line options, input is ended by either a blank line or an End-Of-File marker.  The End-Of-File marker may be input with the Ctrl-Z key combination under OS/2.  As with the 'P' command line option, no prompt is issued by QTGrep.

This operator may be used in conjunction with the output operator, '#o', to custom design prompts and read input from the user.  For example, the following lines may be used in a pattern file to obtain input from the user:

    #o Input Patterns, One Per Line
    #o A Blank Line Terminates Input
    #i keyboard
    #o Patterns Received

Output lines set with the '#o' operator are stored in compiled files created with the 'W' command line option and are output when the file is read. However, if the include operator, '#i', is used in the above manner, to obtain input from the Standard Input File, the operator is not available when the compiled file is read.


  #m User Comment

When the 'C', 'c', 'k', or 's' command line options are specified, a count of matches found in a file or across all files is output.  If more than one search pattern is specified, then a count for each pattern is also output. The pattern number is also output to identify which pattern each count is for.  This command allows the user to append to the end of the count output a comment to identify the pattern.  The command must follow the pattern to be identified and only the first command is recognized, i.e., if more than one command follows any search pattern, all after the first are ignored.

In addition, to keep the comment from overflowing onto another line, only the first 50 characters of the comment are utilized.

See the description of the 'C' command line option for a depiction of how the comment is appended to the count output.


  #n nn ii any character string

The expression number for the next expression is set to nn, and each following expression number is incremented by ii.  Both nn and ii must be greater than zero.  If ii is not specified, it defaults to 1.
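
As a rough illustration of this numbering rule, the following Python sketch pairs each pattern with its expression number.  The function name `number_expressions` is hypothetical, and the starting number of 1 with an increment of 1 before any '#n' line is an assumption, not something stated here.

```python
# Hypothetical sketch of '#n nn ii' handling; not QTGrep source code.
def number_expressions(lines):
    """Pair each pattern with its expression number, honoring '#n nn ii' lines."""
    number, step = 1, 1  # assumed defaults before any '#n' line is seen
    numbered = []
    for raw in lines:
        text = raw.strip(" \t")
        if text[:2].lower() == "#n":
            fields = text[2:].split()
            if not fields or not fields[0].isdigit():
                raise ValueError("'#n' requires a starting number nn")
            nn = int(fields[0])
            # ii is optional; non-numeric text after nn is treated as free text.
            ii = int(fields[1]) if len(fields) > 1 and fields[1].isdigit() else 1
            if nn <= 0 or ii <= 0:
                raise ValueError("nn and ii must be greater than zero")
            number, step = nn, ii
        elif text and not text.startswith("#"):
            numbered.append((number, text))
            number += step
    return numbered
```

For example, the lines "foo", "#n 10 5 renumber from ten", "bar", "baz" would number "foo" as 1, "bar" as 10 and "baz" as 15 under these assumptions.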


  #e name expression

In all regular expression(s) or keyword(s), "{name}" is replaced by "expression".  No blanks or tabs are allowed in "name".  "name" may start with an underscore or an upper or lower case alphabetic character, and may be followed by any number of underscores, upper or lower case alphabetic characters or digits.

NOTE:  The order of named expressions is not important; it is only required that all named expressions be defined for use in regular expression(s) or keyword(s).  All lines in all pattern files are read before named expressions are replaced.  Unrecognized names are not replaced.

In general, names starting with an underscore and followed by a single upper or lower case letter are reserved as predefined.  The following predefined names are currently available:

  Name  Description                                   Definition
  {_a}  Alphabetic                                    [[:alpha:]] (== [A-Za-z] in US)
  {_b}  Brackets                                      [{}()[\]<>]
  {_c}  Control Character                             [[:cntrl:]] (== [\x001-\x01f\x07f] in US)
  {_d}  Digit                                         [[:digit:]] (== [0-9] in US)
  {_e}  Exponent                                      [DdEe][-+]?{_d}{1,3}
  {_f}  Floating Point Number                         [-+]?({_d}+\.{_d}*|{_d}*\.{_d}+)
  {_g}  Float, Optional Exponent                      {_f}({_e})?
  {_h}  Hexadecimal Digit                             [[:xdigit:]] (== [0-9A-Fa-f] in US)
  {_i}  Integer                                       [-+]?{_d}+
  {_n}  Alpha-Numeric                                 [[:alnum:]] (== [A-Za-z0-9] in US)
  {_o}  Octal Digit                                   [[:digit:]] (== [0-7] in US)
  {_p}  Punctuation                                   [[:punct:]] (== [\!-/:-@[-`{-\x07f] in US)
  {_q}  Double or Single Quote                        {_s}["'`]
  {_r}  Real Number                                   [-+]?({_d}+(\.{_d}*)?|{_d}*\.{_d}+){_e}
  {_s}  Zero or Even Number of Slashes                (^|[!\\](\\\\)*)
  {_t}  Printable Character                           [[:print:]] (== [\s-~] in US)
  {_u}  Graphical Character                           [[:graph:]] (== [\x01f-~] in US)
  {_w}  White Space                                   [[:blank:]] == [\s\t]
  {_z}  Expanded White Space, \t, \n, \v, \f, \r, \s  [[:space:]] (== [\t-\r\s]  in US)
NOTE: for {_r} and {_f}, the decimal point, '.', is replaced by the decimal point from the current locale.

Example:

  # define named expressions for use in regular expressions:
  # Define C name expression
  #e c_n [[:alpha:]_][[:alnum:]_]*
  # Define C comment expression
  # Note: Does NOT allow comment to span lines
  #e c_c (/\*.*\*/)
  # Define single line comment
  #e c_slc ({_w}*{c_c}{_w}*)*
  # Define C name with pointer
  #e c_np \**{c_n}
  # Define C name with pointer or address
  #e c_ni [\*&]*{c_n}
  # Define C function type and name declaration
  #e c_fname {c_n}({_w}+{c_np})*
  # Define expression for first argument in function list
  #e c_first_arg ({_w}*{c_ni})
  # Define expression for remaining argument in function list
  #e c_rem_arg ({_w}*,{c_first_arg})*
  # Define C function argument list
  #e c_arg_list \(({c_first_arg}{c_rem_arg})*\)
  #
  # Expression to find all C function definitions
  ^{c_fname}{c_arg_list}{c_slc}$
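
The substitution of named expressions can be approximated with the Python sketch below.  It assumes that all definitions have already been collected into a dictionary (mirroring the rule that all lines are read before replacement) and that nested names are expanded repeatedly until nothing changes.  The function name `expand_names` and the `max_depth` guard are hypothetical.

```python
import re

# A name starts with an underscore or letter, so repeat operators such as
# {1,3} are never mistaken for named expressions.
NAME = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_names(pattern, definitions, max_depth=32):
    """Replace {name} with its definition; unrecognized names are left as-is."""
    def substitute(match):
        return definitions.get(match.group(1), match.group(0))

    for _ in range(max_depth):  # guard against self-referential definitions
        expanded = NAME.sub(substitute, pattern)
        if expanded == pattern:
            return expanded
        pattern = expanded
    raise ValueError("named expressions nested too deeply (possible cycle)")
```

Because the dictionary is complete before any expansion starts, the order in which the '#e' lines appeared does not matter in this sketch.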


  #o Output Comment

The output comment is written to the Standard Error File, normally the console.  The comments are saved to compiled QTGrep files written via the 'W' option and are written to the Standard Error File when the file is used via the 'f' option.

Output Comments defined with this operator may be used with the include operator, '#i', to obtain input from the user. Details are available under the '#i' operator.

Output Comments are also used by the "prnpat" companion application.  The comments are written to the "C" language file within "C" language comment delimiters.  This allows the user to create regular expression parsing structures and write these structures to a file, e.g., reg_exp.h.  The '#o' operator allows the final file to be fully commented, described and explained.


  #r Set Match Replacement String

If the 'r' command line option is activated, QTGrep uses the strings specified with this operator to replace matching strings.  NOTE: If no replacement string is specified for a keyword or regular expression, then the matched string is replaced with the null string when the 'r' command line option is invoked.  The null string is the default replacement string; see the '#d' operator below.

There are five forms for this pattern file operator:

  #r replace-string-pattern

Form 1 is used to specify a particular replacement string for the previous search pattern.

  #r #nn

Form 2 is used to specify default replacement string number 'nn' for the previous search pattern.  If nn == 0, the null string is used.  If nn is greater than the number of default replacement strings, the last default replacement string is used.

  #r #dr

Form 3 is used to specify that the record containing the matching string is to be deleted from the replacement file.  This operator may be used to selectively delete records from the replacement file.  Using this operator mimics the 'v' command line option, but gives the user more control over which records are deleted.

  #r #nr

Form 4 is used to specify that the record containing the matching string is to be nulled in the replacement file.  In form 3, the entire record and end-of-record, EOR, string are deleted from the replacement file.  This form deletes the record, but leaves the EOR string, thus nulling the record in the replacement file.

  #r

Form 5 is used to specify using the last default replacement string as the replacement string for the previous search pattern.  See the '#d' operator below for specifying default replacement strings.
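
The five forms can be summarized by the following Python sketch, which maps the text after '#r' to an action.  It is only an approximation: the function name `resolve_replacement`, the action labels, the 1-based numbering of default strings, and the case-insensitive handling of '#dr' and '#nr' are all assumptions.

```python
# Hypothetical sketch of '#r' form selection; not QTGrep source code.
def resolve_replacement(argument, defaults):
    """Map the text after '#r' to an (action, string) pair.

    defaults is the list of '#d' strings defined so far, in order.
    """
    arg = argument.strip(" \t")
    last_default = defaults[-1] if defaults else ""  # null string if none defined
    if arg == "":
        return ("replace", last_default)              # form 5
    lowered = arg.lower()
    if lowered == "#dr":
        return ("delete_record", None)                # form 3
    if lowered == "#nr":
        return ("null_record", None)                  # form 4
    if arg.startswith("#") and arg[1:].isdigit():     # form 2
        nn = int(arg[1:])
        if nn == 0 or not defaults:
            return ("replace", "")
        # Numbers past the end fall back to the last default string.
        return ("replace", defaults[min(nn, len(defaults)) - 1])
    return ("replace", arg)                           # form 1
```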
 
The replacement string for a pattern is specified on a line with the '#r' operator following the pattern to be replaced.  Only one replacement string may be specified for each pattern.  If more than one replacement string is specified after a pattern, only the first is used by QTGrep, all others are ignored.  Thus the following replacement strings are recognized:

  # 1st Replacement Pattern
  First pattern to match
  # The following pattern will replace the above pattern when found
  #r First replacement pattern
  # the following pattern is ignored by QTGrep
  #r Second Replacement Pattern
  # 2nd Replacement Pattern
  Second pattern to match
  # The following pattern will replace the above pattern when found
  #r Second replacement pattern
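
The "first '#r' wins" rule can be illustrated with a short Python sketch.  The function name `pair_replacements` is hypothetical, and other operator lines are simply skipped here for brevity.

```python
# Hypothetical sketch of pairing patterns with their first '#r' line.
def pair_replacements(lines):
    """Attach the first '#r' line after each pattern to that pattern."""
    pairs = []  # each entry is [pattern, replacement or None]
    for raw in lines:
        text = raw.strip(" \t")
        if not text:
            continue
        if text[:2].lower() == "#r":
            # Only the first '#r' after a pattern is used; later ones are ignored.
            if pairs and pairs[-1][1] is None:
                pairs[-1][1] = text[2:].strip(" \t")
        elif not text.startswith("#"):
            pairs.append([text, None])
    return [tuple(pair) for pair in pairs]
```

Applied to the example above, this yields "First replacement pattern" for the first pattern and "Second replacement pattern" for the second; the line "#r Second Replacement Pattern" is discarded.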

The replacement string commences with the first non-space character after the '#r' operator and ends with the last non-space character.  Named expressions, {ne}, and repeat operators, {n1,n2}, may be used in replacement strings.  The named expressions and repeat operators are replaced or expanded as appropriate.  Escaped characters may be used and QTGrep will substitute the appropriate character.

In addition, tagged string operators may be included to use all or portions of the matched string in the replacement string.  The tagged string operator is used as follows:

  [<0>]    Replaced By Matched String
  [<0,0>]  Replaced By Matched String
  [<i,j>]  Replaced By Tagged String At Level i, Count j
           (0 <= i <= 7, 0 <= j <= 31)
  [<0,j>]  Replaced By Tagged String At Level 1, Count j
  [<i,0>]  Replaced By Tagged String At Level i, Count 1
  [<j>]    Replaced By Tagged String At Level 1, Count j

Refer to the information on Tagged Strings, '-?$', for Tagged String levels and counts.

  # The following replacement string contains the matched string within it:
  # '[<0>]' will be replaced by the match string before replacement.
  #r The matched string:  [<0>] 
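
A minimal Python sketch of the whole-match case follows.  It handles only the '[<0>]' and '[<0,0>]' operators, since the level and count semantics of the other forms are described elsewhere.  The function name `apply_replacement` is hypothetical.

```python
import re

# Matches only the two whole-match operators, [<0>] and [<0,0>].
WHOLE_MATCH_TAG = re.compile(r"\[<0(?:,0)?>\]")

def apply_replacement(template, matched):
    """Substitute the whole matched string for [<0>] and [<0,0>]."""
    # A function is used as the replacement so that backslashes in the
    # matched text are inserted literally.
    return WHOLE_MATCH_TAG.sub(lambda _match: matched, template)
```

Other tagged string operators, such as '[<1,2>]', are left untouched by this sketch.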

  #d Default Replacement String

This operator sets the default replacement string to be used for replacing the match string when no replacement string has been specified for a pattern with the '#r' operator.  This operator may be used multiple times in pattern file(s).  Each occurrence of this operator specifies the default replacement string for all remaining patterns, or until the next occurrence of this operator.  If no string is specified on the line with this operator, the default replacement string becomes the null string.  NOTE: If no default replacement strings are specified, the null string becomes the only default replacement string.


  #f Replacement File Name Mask

The string specified with this operator becomes the mask used by QTGrep to determine the output file drive, path, name and extension used for the replacement file.  The default mask used if none is specified is:

  {_filename_}.rpl

There are several default named expressions available for the file mask, which are not available for search patterns:
  {_filename_}    replaced by the name of the current input file
  {_file_path_}   replaced by the path of the current input file
  {_file_ext_}    replaced by the extension of the current input file
  {_file_drive_}  replaced by the drive of the current input file
                  (Note: under OS/2 Only)

In addition, any other named expressions, both pre-defined and user defined, will be replaced in the file name mask.
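
The mask expansion might look roughly like the Python sketch below.  Several details are assumptions made for illustration only: that {_filename_} is the base name without its extension, that {_file_ext_} omits the leading dot, and that {_file_path_} excludes the drive.  The function name `expand_file_mask` is hypothetical, and user-defined named expressions are not handled here.

```python
import ntpath  # drive-letter aware path handling, independent of the host OS

def expand_file_mask(mask, input_file):
    """Expand the file-related named expressions in a replacement file mask."""
    drive, rest = ntpath.splitdrive(input_file)
    directory, base = ntpath.split(rest)
    stem, ext = ntpath.splitext(base)
    fields = {
        "{_file_drive_}": drive,
        "{_file_path_}": directory,
        "{_filename_}": stem,            # assumed: name without extension
        "{_file_ext_}": ext.lstrip("."),  # assumed: extension without the dot
    }
    for name, value in fields.items():
        mask = mask.replace(name, value)
    return mask
```

With these assumptions, the default mask applied to an input file named main.c would give main.rpl.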

If the file mask is set to 'stdout':

   #f stdout

the replacement file is written to the standard output file, normally the console screen.  It may be redirected to another disk file or piped to the input of another program.  For example, if the file mask is set to 'stdout' in the expression file 'rpl.exp', the following command:

  qtgrep -rfrpl.exp | more

will replace matched strings and the output will be displayed on the console screen by the 'more' program.  The following replacement string will surround any matched strings with ANSI screen color commands to highlight the matched strings when displayed as above with the 'more' program.

  # use high intensity green on blue to highlight
  #e hi_g \x01b[1;32;44m
  # assume normal screen colors are white on blue
  #e norm \x01b[0;37;44m
  # surround matching string with ANSI commands
  #r {hi_g}[<0>]{norm}

This behavior mimics the 'D' option, but allows the user to select specific screen colors for each expression.

The following ANSI attribute values may be used in named expressions for highlighting text:
  Value  Meaning
  0      Normal White On Black
  1      High Intensity
  4      Underscore (Monochrome Display Only)
  5      Blink
  7      Reverse Video
  8      Invisible
  30     Black   Foreground
  31     Red     Foreground
  32     Green   Foreground
  33     Yellow  Foreground
  34     Blue    Foreground
  35     Magenta Foreground
  36     Cyan    Foreground
  37     White   Foreground
  39     Default Foreground
  40     Black   Background
  41     Red     Background
  42     Green   Background
  43     Yellow  Background
  44     Blue    Background
  45     Magenta Background
  46     Cyan    Background
  47     White   Background
  49     Default Background

  # use high intensity black on blue
  #e hi_blk \x01b[1;30;44m

Thus, the following named expressions could be used for setting the screen colors:

  #e hi_r_on_blk        \x01b[1;31;40m
  #e norm_cyan_on_white \x01b[0;36;47m
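
For readers who want to check what such a sequence contains, the following Python sketch assembles the same ANSI strings from the attribute values listed above.  The helper names `sgr` and `highlight` are hypothetical and are not part of QTGrep.

```python
ESC = "\x1b"  # the escape character written as \x01b in the pattern file

def sgr(*codes):
    """Build an ANSI attribute sequence such as ESC[1;32;44m from its codes."""
    return ESC + "[" + ";".join(str(code) for code in codes) + "m"

def highlight(text, on=(1, 32, 44), off=(0, 37, 44)):
    """Surround text with an 'on' sequence and a restoring 'off' sequence."""
    return sgr(*on) + text + sgr(*off)
```

Here sgr(1, 32, 44) corresponds to high intensity green on blue, and sgr(0, 37, 44) to normal white on blue.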


  #t Record Delimiter

By default, QTGrep uses a single newline (Unix Input Mode) or a Carriage Return/Newline pair (PC/MS-DOS Input Mode) as the record terminator keyword.  In scanning to match Regular Expressions, QTGrep will match the End-Of-Record, EOR, operator, '$', at any position within the record terminating string.  The user may set the record terminator regular expression(s) or keyword(s) with this pattern file operator.  Some useful Regular Expression Record Terminators are:

  One or More Blank Lines as Record Terminator (PC/MS-DOS or Unix Input Mode):
  #t \r?\n({_w}*\r?\n)+

  Horizontal and Vertical White Space as Record Terminator:
  #t {_z}+

In scanning for record terminators, QTGrep always scans for the longest possible match to the terminator regular expression or keyword.
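
The effect of a blank-line record terminator can be approximated in Python as shown below.  The expression is a Python spelling of the first '#t' example, with {_w} written out as [ \t].  The function name `split_records` is hypothetical.  Note that Python's `re` module uses leftmost, greedy matching, which coincides with the longest match for this expression but is not a longest-match engine in general.

```python
import re

# Python equivalent of '#t \r?\n({_w}*\r?\n)+', using a non-capturing group
# so that re.split() does not return the terminator text.
BLANK_LINE_TERMINATOR = re.compile(r"\r?\n(?:[ \t]*\r?\n)+")

def split_records(text, terminator=BLANK_LINE_TERMINATOR):
    """Split text into records separated by one or more blank lines."""
    return [record for record in terminator.split(text) if record]
```

A single newline inside a paragraph does not end the record; only a newline followed by at least one blank (or whitespace-only) line does.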

  #p Macro_Name

The Macro_Name commences with the first non-space character after the '#p' operator and ends with the last non-space character. The Macro_Name for a pattern is specified on a line with the '#p' operator following the pattern to which the Macro_Name applies.

This operator is used only by the 'prnpat' QTGrep companion application.  The 'prnpat' application reads QTGrep pattern files written with the 'W' option and writes a 'C' language data file.

Double asterisks, "**", are replaced with the number of the previous pattern expression.  NOTE: ONLY strings of EXACTLY 2 asterisks are replaced.  Single asterisks or strings of 3 or more asterisks are not replaced.

For a Macro_Name definition of (assuming the previous pattern expression number is 6):

  #p Macro_**_Name  **

Macro_Name is written to the data file as:

  #define Macro_6_Name   6
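
The "exactly two" rule can be expressed with lookaround assertions, as in this Python sketch.  The function name `expand_macro_name` is hypothetical, and the sketch covers only the substitution itself, not the formatting of the '#define' line written by 'prnpat'.

```python
import re

# Two asterisks that are neither preceded nor followed by another asterisk.
EXACTLY_TWO_ASTERISKS = re.compile(r"(?<!\*)\*\*(?!\*)")

def expand_macro_name(macro, pattern_number):
    """Replace each run of exactly two asterisks with the pattern number."""
    return EXACTLY_TWO_ASTERISKS.sub(str(pattern_number), macro)
```

Runs of one, three or more asterisks pass through unchanged.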


© Terry D. Boldt 1997-2005
All Rights Reserved
Last Updated: Feb. 03, 2005