NAME
  agrep - search a file for a string or regular expression, with approximate
  matching capabilities

SYNOPSIS
  agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename... ]

DESCRIPTION
  agrep searches the input filenames (standard input is the default, but see
  a warning under LIMITATIONS) for records containing strings which either
  exactly or approximately match a pattern. A record is by default a line,
  but it can be defined differently using the -d option (see below).  Nor-
  mally, each record found is copied to the standard output.  Approximate
  matching allows finding records that contain the pattern with several
  errors including substitutions, insertions, and deletions.  For example,
  Massechusets matches Massachusetts with two errors (one substitution and
  one insertion).  Running agrep -2 Massechusets foo outputs all lines in foo
  containing any string with at most 2 errors from Massechusets.
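
  The notion of "at most k errors" can be illustrated with a small Python
  sketch.  This is not the algorithm agrep uses (see the papers cited
  below); it is a textbook dynamic-programming computation of the smallest
  edit distance between the pattern and any substring of a line.  The names
  min_errors and approx_match are hypothetical helpers introduced here only
  for illustration.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # prev[i] = cheapest way to match pattern[:i] against a substring of
    # text that ends at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]  # cost of matching the pattern against the empty string
    for ch in text:
        cur = [0]  # a match may start anywhere, so the empty prefix costs 0
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (pc != ch),  # exact match or substitution
                prev[i] + 1,               # extra character in the text
                cur[i - 1] + 1,            # pattern character missing from the text
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


def approx_match(pattern: str, line: str, max_errors: int) -> bool:
    """Illustrative stand-in for 'agrep -k pattern': at most max_errors errors."""
    return min_errors(pattern, line) <= max_errors
```

  Under these assumptions, approx_match("Massechusets", "Boston,
  Massachusetts", 2) is true, while allowing only one error is not enough.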

  agrep supports many kinds of queries including arbitrary wild cards, sets
  of patterns, and in general, regular expressions.  See PATTERNS below.  It
  supports most of the options supported by the grep family plus several more
  (but it is not 100% compatible with grep).  For more information on the
  algorithms used by agrep see Wu and Manber, "Fast Text Searching With
  Errors," Technical Report #91-11, Department of Computer Science,
  University of Arizona, June 1991 (available by anonymous ftp from
  cs.arizona.edu in agrep/agrep.ps.1), and Wu and Manber, "Agrep -- A Fast
  Approximate Pattern Searching Tool," to appear in USENIX Conference,
  January 1992 (available by anonymous ftp from cs.arizona.edu in
  agrep/agrep.ps.2).

  As with the rest of the grep family, the characters `$', `^', `*', `[',
  `]', `|', `(', `)', `!', and `\' can cause unexpected results when
  included in the pattern, as these characters are also meaningful to the
  shell.  To avoid these problems, always enclose the entire pattern
  argument in single quotes, i.e., 'pattern'.  Do not use double quotes
  (").

  When agrep is applied to more than one input file, the name of the file is
  displayed preceding each line which matches the pattern.  The filename is
  not displayed when processing a single file, so if you actually want the
  filename to appear, use /dev/null as a second file in the list.

OPTIONS

  -#   # is a non-negative integer (at most 8) specifying the maximum number
       of errors permitted in finding the approximate matches (defaults to
       zero).  Generally, each insertion, deletion, or substitution counts as
       one error.  It is possible to adjust the relative cost of insertions,
       deletions, and substitutions (see the -I, -D, and -S options).

  -c   Display only the count of matching records.

  -f patternfile
       patternfile contains a set of (simple) patterns.  The output is all
       lines that match at least one of the patterns in patternfile.
       Currently, the -f option works only for exact match and for simple
       patterns (any meta symbol is interpreted as a regular character); it
       is compatible only with -c, -h, -i, -l, -s, -v, -w, and -x options.
       See BUGS/LIMITATIONS for size bounds.

  -h   Do not display filenames.

  -i   Case-insensitive search - e.g., "A" and "a" are considered equivalent.

  -k   No symbol in the pattern is treated as a meta character. For example,
       agrep -k 'a(b|c)*d' foo will find the occurrences of a(b|c)*d in foo
       whereas agrep 'a(b|c)*d' foo will find substrings in foo that match
       the regular expression 'a(b|c)*d'.

  -l   List only the files that contain a match.  This option is useful for
       looking for files containing a certain pattern.  For example,
       agrep -l 'wonderful' * will list the names of those files in the
       current directory that contain the word 'wonderful'.

  -n   Each line that is printed is prefixed by its record number in the
       file.

  -p   Find records in the text that contain a supersequence of the pattern.
       For example,
        agrep -p DCS foo will match "Department of Computer Science."
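
       The supersequence test can be sketched in Python as a subsequence
       check: the record matches if the pattern's characters appear in it
       in order, not necessarily adjacent.  The function name
       contains_supersequence is a hypothetical helper, and the sketch
       ignores agrep's other options (such as -i or an error count).

```python
def contains_supersequence(pattern: str, record: str) -> bool:
    """True if the characters of pattern occur in record in the same order."""
    chars = iter(record)
    # `ch in chars` advances the iterator up to the first occurrence of ch,
    # so later pattern characters can only match later record positions.
    return all(ch in chars for ch in pattern)
```

       With this sketch, contains_supersequence("DCS", "Department of
       Computer Science.") is true, whereas the pattern "DSC" does not
       match the same record because no upper-case C follows the S.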

  -s   Work silently, that is, display nothing except error messages.  This
       is useful for checking the error status.

  -t   Output the record starting from the end of delim to (and including)
       the next delim. This is useful for cases where delim should come at
       the end of the record.

  -v   Inverse mode - display only those records that do not contain the pat-
       tern.

  -w   Search for the pattern as a word - i.e., surrounded by non-
       alphanumeric characters.  The non-alphanumeric must surround the
       match;  they cannot be counted as errors.  For example, agrep -w -1
       car will match cars, but not characters.

  -x   The pattern must match the whole line.

  -y   Used with -B option. When -y is on, agrep will always output the best
       matches without giving a prompt.

  -B   Best match mode.  When -B is specified and no exact matches are found,
       agrep will continue to search until the closest matches (i.e., the
       ones with minimum number of errors) are found, at which point the fol-

  -Sk  Set the cost of a substitution to k (k is a positive integer).  This
       option does not currently work with regular expressions.

PATTERNS

  agrep supports a large variety of patterns, including simple strings,
  strings with classes of characters, sets of strings, wild cards, and regu-
  lar expressions.

  Strings
       any sequence of characters, including the special symbols `^' for
       beginning of line and `$' for end of line.  The special characters
       listed above (`$', `^', `*', `[', `]', `|', `(', `)', `!', and `\')
       should be preceded by `\' if they are to be matched as regular charac-
       ters.  For example, \^abc\\ corresponds to the string ^abc\, whereas
       ^abc corresponds to the string abc at the beginning of a line.

  Classes of characters
       a list of characters inside [] (in order) corresponds to any character
       from the list.  For example, [a-ho-z] is any character between a and h
       or between o and z.  The symbol `^' inside [] complements the list.
       For example, [^i-n] denotes any character except the characters 'i'
       through 'n'.  The symbol `^' thus has two meanings, but this is
       consistent with egrep.  The symbol `.' (don't care) stands for any
       symbol (except for the newline symbol).

  Boolean operations
       agrep supports an `and' operation `;' and an `or' operation `,', but
       not a combination of both.  For example, 'fast;network' searches for
       all records containing both words.
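
       The semantics of the two operators can be sketched in Python for the
       exact-match case.  The function boolean_match is a hypothetical
       helper, plain substring search stands in for agrep's matching, and
       rejecting a mixed pattern with an exception is an assumption of this
       sketch rather than documented agrep behavior.

```python
def boolean_match(pattern: str, record: str) -> bool:
    """Evaluate an 'and' (;) or 'or' (,) pattern against one record."""
    if ";" in pattern and "," in pattern:
        # The two operators cannot be combined in a single pattern.
        raise ValueError("';' and ',' cannot be combined in one pattern")
    if ";" in pattern:
        # 'and': every term must occur somewhere in the record.
        return all(term in record for term in pattern.split(";"))
    # 'or': at least one term must occur (a plain string is a single term).
    return any(term in record for term in pattern.split(","))
```

       For instance, boolean_match("fast;network", "a fast local network")
       is true, while the same pattern does not match "a fast car".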

  Wild cards
       The symbol '#' is used to denote a wild card.  # matches zero or more
       arbitrary characters.  For example, ex#e matches example.
       The symbol # is equivalent to .* in egrep.  In fact, .* will work too,
       because it is a valid regular expression (see below), but unless this
       is part of an actual regular expression, # will work faster.
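
       The stated equivalence between # and .* can be sketched with
       Python's re module standing in for egrep.  The helper wildcard_match
       is hypothetical, and the sketch assumes the pattern is a plain
       string whose only special symbol is #; every other character is
       taken literally.

```python
import re


def wildcard_match(pattern: str, record: str) -> bool:
    """Treat each '#' in pattern as '.*' and search for it in record."""
    # Escape the literal pieces so that only '#' acts as a wild card.
    literal_parts = [re.escape(part) for part in pattern.split("#")]
    regex = ".*".join(literal_parts)
    return re.search(regex, record) is not None
```

       Here wildcard_match("ex#e", "example") is true, since # absorbs the
       characters between "ex" and the final "e".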

  Combination of exact and approximate matching
       any pattern inside angle brackets <> must match the text exactly even
       if the match is with errors.  For example, <mathemat>ics matches
       mathematical with one error (replacing the last s with an a), but
       mathe<matics> does not match mathematical no matter how many errors we
       allow.

  Regular expressions
       The syntax of regular expressions in agrep is in general the same as
       that for egrep.  The union operation `|', Kleene closure `*', and
       parentheses () are all supported.  Currently '+' is not supported.
       Regular expressions are currently limited to approximately 30 charac-
       ters (generally excluding meta characters).  Some options (-d, -w, -f,
       -t, -x, -D, -I, -S) do not currently work with regular expressions.

EXAMPLES

  agrep -5 -p abcdefghij /usr/dict/words
       outputs the list of all words containing at least 5 of the first 10
       letters of the alphabet in order.  (Try it:  any list starting with
       academia and ending with sacrilegious must mean something!)

  agrep -1 'abc[0-9](de|fg)*[x-z]' foo
       outputs the lines containing, within up to one error, the string that
       starts with abc followed by one digit, followed by zero or more
       repetitions of either de or fg, followed by either x, y, or z.

  agrep -d '^From ' 'breakdown;internet' mbox
       outputs all mail messages (the pattern '^From ' separates mail mes-
       sages in a mail file) that contain keywords 'breakdown' and 'inter-
       net'.

  agrep -d '$$' -1 '<word1> <word2>' foo
       finds all paragraphs that contain word1 followed by word2 with one
       error in place of the blank. In particular, if word1 is the last word
       in a line and word2 is the first word in the next line, then the space
       will be substituted by a newline symbol and it will match.  Thus, this
       is a way to overcome separation by a newline.  Note that -d '$$' (or
       another delim which spans more than one line) is necessary, because
       otherwise agrep searches only one line at a time.

  agrep '^agrep' <this manual>
       outputs all the examples of the use of agrep in this man page.

SEE ALSO
  ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS/LIMITATIONS
  Any bug reports or comments will be appreciated! Please mail them to
  sw@cs.arizona.edu or udi@cs.arizona.edu

  Regular expressions do not support the '+' operator (match 1 or more
  instances of the preceding token).  Such a pattern can be searched for
  by using this syntax in the pattern:

          'pattern(pattern)*'

  (search for strings containing one instance of the pattern, followed by 0
  or more instances of the pattern).
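
  The rewrite can be checked with a short Python sketch that uses the re
  module as a stand-in for agrep's regular expressions (an assumption; the
  two syntaxes are not identical).  The helpers one_or_more and agrees are
  hypothetical names.  A sub-pattern containing `|' must already be
  parenthesized by the caller.

```python
import re


def one_or_more(sub: str) -> str:
    """Rewrite 'sub+' as 'sub(sub)*', avoiding the '+' operator."""
    return f"{sub}({sub})*"


def agrees(sub: str, samples: list[str]) -> bool:
    """Check that the rewrite and 'sub+' accept exactly the same samples."""
    rewritten = re.compile(one_or_more(sub))
    with_plus = re.compile(f"{sub}+")
    return all(
        bool(rewritten.fullmatch(s)) == bool(with_plus.fullmatch(s))
        for s in samples
    )
```

  For example, one_or_more("(ab|cd)") yields (ab|cd)((ab|cd))*, which
  accepts the same sample strings as (ab|cd)+ in this sketch.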

  The following can cause an infinite loop: agrep pattern * > output_file.
  If the number of matches is high, they may be written to output_file
  before agrep has finished reading it, leading to more matches of the
  pattern within output_file (the matches are against the whole
  directory).  It's not clear whether this is a "bug" (grep will do the
  same), but be warned.

  The maximum size of the patternfile is limited to 250Kb, and the maximum
  number of patterns is limited to 30,000.


AUTHORS
  Sun Wu and Udi Manber, Department of Computer Science, University of
  Arizona, Tucson, AZ 85721.  {sw|udi}@cs.arizona.edu.