 GLIMPSE(l)                                                       GLIMPSE(l)
                              November 10, 1997



 NAME
      glimpse 4.1 - search quickly through entire file systems

 OVERVIEW
      Glimpse (which stands for GLobal IMPlicit SEarch) is a very popular
      UNIX indexing and query system that allows you to search through a
      large set of files very quickly.  Glimpse supports most of agrep's
      options (agrep is our powerful version of grep) including approximate
      matching (e.g., finding misspelled words), Boolean queries, and even
      some limited forms of regular expressions.  It is used in the same
      way, except that you don't have to specify file names.  So, if you are
      looking for a needle anywhere in your file system, all you have to do
      is say glimpse needle and all lines containing needle will appear
      preceded by the file name.

      To use glimpse you first need to index your files with glimpseindex.
      For example, glimpseindex -o ~ will index everything at or below your
      home directory.  See man glimpseindex for more details.

      Glimpse is also available for web sites, as a set of tools called
      WebGlimpse.  (The old glimpseHTTP is no longer supported and is not
      recommended.)  See http://webglimpse.net/ for more information.

      Glimpse includes all of agrep and can be used instead of agrep by
      giving one or more file names at the end of the command.  This will
      cause glimpse to ignore the index and run agrep as usual.  For
      example, glimpse -1 pattern file is the same as agrep -1 pattern
      file.  Agrep is distributed as a self-contained package within
      glimpse, and can be used separately.  We added a new option to agrep:
      -r recursively searches the directory and everything below it (see
      agrep options below); it is used only when glimpse reverts to agrep.

      Mail majordomo@webglimpse.net with SUBSCRIBE wgusers in the body to
      be added to the WebGlimpse users mailing list.  This is now the
      location where glimpse questions are also discussed.  Bugs can be
      reported at http://webglimpse.net/bugzilla/.  An HTML version of
      these manual pages can be found at
      http://webglimpse.net/docs/glimpsehelp.html.  Also, see the glimpse
      home pages at http://webglimpse.net/glimpse.

 SYNOPSIS
      glimpse - [almost all letters] pattern

 INTRODUCTION
      We start with simple ways to use glimpse and describe all the options
      in detail later on.  Once an index is built with glimpseindex,
      searching for pattern is as easy as saying

           glimpse pattern

      The output of glimpse is similar to that of agrep (or any other
      grep).  The pattern can be any legal agrep pattern, including a
      regular expression or a Boolean query (e.g., searching for Tucson AND
      Arizona is done by glimpse 'Tucson;Arizona').  The speed of glimpse
      depends mainly on the number and sizes of the files that contain a
      match and only to a second degree on the total size of all indexed
      files.  If the pattern is reasonably uncommon, then all matches will
      be reported in a few seconds even if the indexed files total 500MB or
      more.  Some information on how glimpse works and a reference to a
      detailed article
      are given below.  Most of agrep's options (and those of other greps)
      are supported, including approximate matching.  For example,
      glimpse -1 'Tuson;Arezona' will output all lines containing both
      patterns, allowing one spelling error in any of the patterns (an
      insertion, deletion, or substitution), which in this case is
      definitely needed.
      glimpse -w -i 'parent' specifies case insensitive (-i) and match on
      complete words (-w).  So 'Parent' and 'PARENT' will match,
      'parent/child' will match, but 'parenthesis' or 'parents' will not
      match.  (Starting at version 3.0, glimpse can be much faster when
      these two options are specified, especially for very large indexes.
      You may want to set an alias especially for "glimpse -w -i".) The -F
      option provides a pattern that must match the file name.  For example,
      glimpse -F '\.c$' needle will find the pattern needle in all files
      whose name ends with .c.  (Glimpse will first check its index to
      determine which files may contain the pattern and then run agrep on
      the file names to further limit the search.) The -F option should not
      be put at the end after the main pattern (e.g., "glimpse needle -F
      hay" is incorrect).

 A Detailed Description of All the Options of Glimpse
      -#   # is an integer between 1 and 8 specifying the maximum number of
           errors permitted in finding the approximate matches (the default
           is zero).  Generally, each insertion, deletion, or substitution
           counts as one error.  It is possible to adjust the relative cost
           of insertions, deletions and substitutions (see -I -D and -S
           options).  Since the index stores only lower case characters,
           errors of substituting upper case with lower case may be missed
           (see LIMITATIONS).  Allowing errors in the match requires more
           time and can slow down the match by a factor of 2-4.  Be very
           careful when specifying more than one error, as the number of
           matches tends to grow very quickly.
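
           The error count can be pictured as a weighted edit distance
           between the pattern and the closest substring of a record.  The
           Python sketch below is only an illustration of that idea, not
           agrep's implementation; the function name min_errors and its
           parameters are hypothetical, and the reading of "insertion" as
           an extra character in the text (and "deletion" as a pattern
           character missing from the text) is an assumption.

```python
def min_errors(pattern, text, ins=1, dele=1, sub=1):
    """Smallest total edit cost for `pattern` to match some substring
    of `text` (a simple dynamic-programming illustration)."""
    # prev[i] = cheapest cost of matching pattern[:i] against a substring
    # of the text that ends at the current text position.
    prev = [i * dele for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0]  # an empty pattern prefix matches anywhere for free
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (0 if pc == ch else sub),  # match or substitute
                prev[i] + ins,       # extra character in the text
                cur[i - 1] + dele,   # pattern character missing from text
            ))
        prev = cur
        best = min(best, prev[-1])
    return best


if __name__ == "__main__":
    line = "Tucson, Arizona"
    for word in ("Tuson", "Arezona"):
        print(word, min_errors(word, line))
```

           Under these assumptions both misspellings are within one error
           of the line, which is the situation glimpse -1 is meant for.
           Raising one of the costs (as -I, -D, and -S do) changes which
           edits are cheapest.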

      -a   prints attribute names.  This option applies only to Harvest SOIF
           structured data (used with glimpseindex -s).  (See
           http://harvest.sourceforge.net/ for more information about the
           Harvest project.)

      -A   used for glimpse internals.

      -b   prints the byte offset (from the beginning of the file) of the
           end of each match.  The first character in a file has offset 0.

      -B   Best match mode.  (Warning: -B sometimes misses matches.  It is
           safer to specify the number of errors explicitly.) When -B is
           specified and no exact matches are found, glimpse will continue
           to search until the closest matches (i.e., the ones with minimum
           number of errors) are found, at which point the following message
           will be shown: "the best match contains x errors, there are y
           matches, output them? (y/n)" This message refers to the number of
           matches found in the index.  There may be many more matches in
           the actual text (or there may be none if -F is used to filter
           files).  When the -#, -c, or -l options are specified, the -B
           option is ignored.  In general, -B may be slower than -#, but not
           by very much.  Since the index stores only lower case characters,
           errors of substituting upper case with lower case may be missed
           (see LIMITATIONS).

      -c   Display only the count of matching records.  Only files with
           count > 0 are displayed.

      -C   tells glimpse to send its queries to glimpseserver.

      -d 'delim'
           Define delim to be the separator between two records.  The
           default value is '$', namely a record is by default a line.
           delim can be a string of size at most 8 (with possible use of ^
           and $), but not a regular expression.  Text between two delim's,
           before the first delim, and after the last delim is considered as
           one record.  For example, -d '$$' defines paragraphs as records
           and -d '^From ' defines mail messages as records.  glimpse
           matches each record separately.  This option does not currently
           work with regular expressions.  The -d option is especially
           useful for Boolean AND queries, because the patterns need not
           appear in the same line but in the same record.  For example,
           glimpse -F mail -d '^From ' 'glimpse;arizona;announcement' will
           output all mail messages (in their entirety) that have the 3
           patterns anywhere in the message (or the header), assuming that
           files with 'mail' in their name contain mail messages.  If you
           want the scope of the record to be the whole file, use the -W
           option.  Glimpse warning: Use this option with care.  If the
           delimiter is set to match mail messages, for example, and glimpse
           finds the pattern in a regular file, it may not find the
           delimiter and will therefore output the whole file.  (The -t
           option - see below - can be used to put the delim at the end of
           the record.) Performance Note: Agrep (and glimpse) resorts to
           more complex search when the -d option is used.  The search is
           slower and unfortunately no more than 32 characters can be used
           in the pattern.
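
           The effect of a record delimiter on a Boolean AND query can be
           sketched in a few lines of Python.  This is an illustration
           only: split_records and and_query are hypothetical helpers, the
           delimiter is treated as a literal string at the start of a line
           (as in -d '^From '), and the delimiter is assumed to stay at
           the head of each record.

```python
def split_records(text, delim="From "):
    """Split text into records that each begin at a line starting with
    `delim`; text before the first delimiter forms its own record."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(delim) and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def and_query(records, words):
    """Return the records that contain every word (case-insensitive)."""
    return [r for r in records
            if all(w.lower() in r.lower() for w in words)]


if __name__ == "__main__":
    mbox = ("From alice\nSubject: glimpse announcement\n"
            "Hello from Arizona\n"
            "From bob\nSubject: lunch\nNothing here\n")
    for message in and_query(split_records(mbox),
                             ["glimpse", "arizona", "announcement"]):
        print(message, end="")
```

           The three words sit on different lines of the first message, so
           a line-by-line AND would find nothing, while the record-level
           AND returns the whole message.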

      -Dk  Set the cost of a deletion to k (k is a positive integer).  This
           option does not currently work with regular expressions.

      -e pattern
           Same as a simple pattern argument, but useful when the pattern
           begins with a `-'.

      -E   prints the lines in the index (as they appear in the index) which
           match the pattern.  Used mostly for debugging and maintenance of
           the index.  This is not an option that a user needs to know
           about.

      -f file_name
           this option has a different meaning for agrep than for glimpse:
           In glimpse, only the files whose names are listed in file_name
           are matched.  (The file names have to appear as in
           .glimpse_filenames.) In agrep, the file_name contains the list of
           the patterns that are searched.  (Starting at version 3.6, this
           option for glimpse is much faster for large files.)

      -F file_pattern
           limits the search to those files whose name (including the whole
           path) matches file_pattern.  This option can be used in a variety
           of applications to provide limited search even for one large
           index.  If file_pattern matches a directory, then all files with
           this directory on their path will be considered.  To limit the
           search to actual file names, use $ at the end of the pattern.
           file_pattern can be a regular expression and even a Boolean
           pattern.  This option is implemented by running agrep
           file_pattern on the list of file names obtained from the index.
           Therefore, searching the index itself takes the same amount of
           time, but limiting the second phase of the search to only a few
           files can speed up the search significantly.  For example,

           glimpse -F 'src#\.c$' needle

           will search for needle in all .c files with src somewhere along
           the path.  The -F file_pattern must appear before the search
           pattern (e.g., glimpse needle -F '\.c$' will not work).  It is
           possible to use some of agrep's options when matching file names.
           In this case all options as well as the file_pattern should be in
           quotes.  (-B and -v do not work very well as part of a
           file_pattern.) For example,

           glimpse -F '-1 \.html' pattern

           will allow one spelling error when matching .html to the file
           names (so ".htm" and ".shtml" will match as well).

           glimpse -F '-v \.c$' counter

           will search for 'counter' in all files except for .c files.

      -g   prints the file number (its position in the .glimpse_filenames
           file) rather than its name.

      -G   Output the (whole) files that contain a match.

      -h   Do not display filenames.

      -H directory_name
           searches for the index and the other .glimpse files in
           directory_name.  The default is the home directory.  This option
           is useful, for example, if several different indexes are
           maintained for different archives (e.g., one for mail messages,
           one for source code, one for articles).

      -i   Case-insensitive search - e.g., "A" and "a" are considered
           equivalent.  Glimpse's index stores all patterns in lower case
           (see LIMITATIONS below).  Performance Note: When -i is used
           together with the -w option, the search may become much faster.
           It is recommended to have -i and -w as defaults, for example,
           through an alias.  We use the following alias in our .cshrc file:
           alias glwi 'glimpse -w -i'

      -Ik  Set the cost of an insertion to k (k is a positive integer).
           This option does not currently work with regular expressions.

      -j   If the index was constructed with the -t option, then -j will
           output the files' last modification dates in addition to
           everything else.  There are no major performance penalties for
           this option.

      -J host_name
           used in conjunction with glimpseserver (-C) to connect to one
           particular server.

      -k   No symbol in the pattern is treated as a meta character.  For
           example, glimpse -k 'a(b|c)*d' will find the occurrences of
           a(b|c)*d whereas glimpse 'a(b|c)*d' will find substrings that
           match the regular expression 'a(b|c)*d'.  (The only exception is
           ^ at the beginning of the pattern and $ at the end of the
           pattern, which are still interpreted in the usual way.  Use \^ or
           \$ if you need them verbatim.)

      -K port_number
           used in conjunction with glimpseserver (-C) to connect to one
           particular server at the specified TCP port number.

      -l   Output only the names of the files that contain a match.  This
           option differs from the -N option in that the files themselves
           are searched, but the matching lines are not shown.

      -L x | x:y | x:y:z
           if one number is given, it is a limit on the total number of
           matches.  Glimpse outputs only the first x matches.  If -l is
           used (i.e., only file names are sought), then the limit is on the
           number of files; otherwise, the limit is on the number of
           records.  If two numbers are given (x:y), then y is an added
           limit on the total number of files.  If three numbers are given
           (x:y:z), then z is an added limit on the number of matches per
           file.  If any of the x, y, or z is set to 0, it means to ignore
           it (in other words 0 = infinity in this case);  for example, -L
           0:10 will output all matches to the first 10 files that contain a
           match.  This option is particularly useful for servers that need
           to limit the amount of output provided to clients.
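
           The way the three limits interact can be sketched in Python.
           This is a simplified model and not glimpse's code: apply_limits
           is a hypothetical function, matches are represented as
           (filename, record) pairs in output order, and the -l case is
           ignored.

```python
def apply_limits(matches, spec):
    """Filter (filename, record) pairs according to an 'x', 'x:y' or
    'x:y:z' limit string; a value of 0 means no limit."""
    parts = [int(p) for p in spec.split(":")] + [0, 0]
    x, y, z = parts[:3]
    out, per_file = [], {}
    for fname, record in matches:
        if x and len(out) >= x:
            break                      # total number of matches reached
        if fname not in per_file:
            if y and len(per_file) >= y:
                continue               # enough files already reported
            per_file[fname] = 0
        if z and per_file[fname] >= z:
            continue                   # enough matches from this file
        per_file[fname] += 1
        out.append((fname, record))
    return out


if __name__ == "__main__":
    found = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("c", 1)]
    for spec in ("2", "0:2", "0:0:1", "4:2:2"):
        print(spec, apply_limits(found, spec))
```

           In this model, "0:2" keeps every match from the first two files
           that contain one, and "0:0:1" keeps one match per file.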

      -m   used for glimpse internals.

      -M   used for glimpse internals.

      -n   Each matching record (line) is prefixed by its record (line)
           number in the file.  Performance Note: To compute the record/line
           number, agrep needs to search for all record delimiters (or line
           breaks), which can slow down the search.

      -N   searches only the index (so the search is faster).  If the index
           was built with -o or -b, then the result is the number of files
           that have a potential match plus a prompt to ask if you want to
           see the file names.  (If -y is used, then there is no prompt and
           the names of the files will be shown.) This could be a way to get
           the matching file names without even having access to the files
           themselves.
           However, because only the index is searched, some potential
           matches may not be real matches.  In other words, with -N you
           will not miss any file but you may get extra files.  For example,
           since the index stores everything in lower case, a case-sensitive
           query may match a file that has only a case-insensitive match.
           Boolean queries may match a file that has all the keywords but
           not in the same line (indexing with -b allows glimpse to figure
           out whether the keywords are close, but it cannot figure out from
           the index whether they are exactly on the same line or in the
           same record without looking at the file).  If the index was not
           built with -o or -b, then this option outputs the number of
           blocks matching the pattern.  This is useful as an indication of
           how long the search will take.  All files are partitioned into
           usually 200-250 blocks.  The file .glimpse_statistics contains
           the total number of blocks (or glimpse -N a will give a pretty
           good estimate; only blocks with no occurrences of 'a' will be
           missed).

      -o   the opposite of -t: the delimiter is not output at the tail, but
           at the beginning of the matched record.

      -O   the file names are not printed before every matched record;
           instead, each filename is printed just once, and all the matched
           records within it are printed after it.

      -p   (from version 4.0B1 only) Supports reading a compressed set of
           filenames.  The -p option allows you to utilize compressed
           `neighborhoods' (sets of filenames) to limit your search, without
           uncompressing them.  Added mostly for WebGlimpse.  The usage is:
           "-p filename:X:Y:Z" where "filename" is the file with compressed
           neighborhoods, X is an offset into that file (usually 0, must be
           a multiple of sizeof(int)), Y is the length glimpse must access
           from that file (if 0, then whole file; must be a multiple of
           sizeof(int)), and Z must be 2 (it indicates that "filename" has
           the sparse-set representation of compressed neighborhoods: the
           other values are for internal use only).  Note that any colon
           ":" in filename must be escaped using a backslash (\).

      -P   used for glimpse internals.

      -q   prints the offsets of the beginning and end of each matched
           record.  The difference between -q and -b is that -b prints the
           offsets of the actual matched string, while -q prints the offsets
           of the whole record where the match occurred.  The output format
           is @x{y}, where x is the beginning offset and y is the end
           offset.

      -Q   when used together with -N glimpse not only displays the filename
           where the match occurs, but the exact occurrences (offsets) as
           seen in the index.  This option is relevant only if the index was
           built with -b;  otherwise, the offsets are not available in the
           index.  This option is ignored when used without -N.

      -r   This option is an agrep option and it will be ignored in glimpse,
           unless glimpse is used with a file name at the end which makes it
           run as agrep.  If the file name is a directory name, the -r
           option will search (recursively) the whole directory and
           everything below it.  (The glimpse index will not be used.)

      -R k defines the maximum size (in bytes) of a record.  The maximum
           value (which is the default) is 48K.  Defining the maximum to be
           lower than the default may speed up some searches.

      -s   Work silently, that is, display nothing except error messages.
           This is useful for checking the error status.

      -Sk  Set the cost of a substitution to k (k is a positive integer).
           This option does not currently work with regular expressions.

      -t   Similar to the -d option, except that the delimiter is assumed to
           appear at the end of the record.  Glimpse will output the record
           starting from the end of delim to (and including) the next delim.
           (See warning for the -d option.)

      -T directory
           Use directory as a place where temporary files are built.
           (Glimpse produces some small temporary files usually in /tmp.)
           This option is useful mainly in the context of structured queries
           for the Harvest project, where the temporary files may be non-
           trivial, and the /tmp directory may not have enough space for
           them.

      -U   (starting at version 4.0B1) Interprets an index created with the
           -X or the -U option in glimpseindex.  Useful mostly for
           WebGlimpse or similar web applications.  When glimpse outputs
           matches, it will display the filename, the URL, and the title
           automatically.

      -v   (This option is an agrep option and it will be ignored in
           glimpse, unless glimpse is used with a file name at the end which
           makes it run as agrep.) Output all records/lines that do not
           contain a match.  (Glimpse does not support the NOT operator
           yet.)

      -V   prints the current version of glimpse.

      -w   Search for the pattern as a word - i.e., surrounded by non-
           alphanumeric characters.  For example, glimpse -w car will match
           car, but not characters and not car10.  The non-alphanumeric must
           surround the match;  they cannot be counted as errors.  This
           option does not work with regular expressions.  Performance Note:
           When -w is used together with the -i option, the search may
           become much faster.  The -w will not work with $, ^, and _ (see
           BUGS below).  It is recommended to have -i and -w as defaults,
           for example, through an alias.  We use the following alias in our
           .cshrc file:
           alias glwi 'glimpse -w -i'
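
           The word-boundary rule described for -w can be approximated
           with a regular expression in Python.  This is only a sketch of
           the rule as stated above (no letter or digit may touch the
           match); word_match is a hypothetical helper, and the special
           cases for $, ^, and _ are not modeled.

```python
import re


def word_match(pattern, line, ignore_case=False):
    """True if the literal `pattern` occurs in `line` with no letter or
    digit immediately before or after it."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = r"(?<![A-Za-z0-9])" + re.escape(pattern) + r"(?![A-Za-z0-9])"
    return re.search(regex, line, flags) is not None


if __name__ == "__main__":
    for text in ("my car is red", "characters", "car10"):
        print(repr(text), word_match("car", text))
    for text in ("PARENT", "parent/child", "parents", "parenthesis"):
        print(repr(text), word_match("parent", text, ignore_case=True))
```

           The second loop mirrors the earlier glimpse -w -i 'parent'
           example: 'PARENT' and 'parent/child' qualify, while 'parents'
           and 'parenthesis' do not.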

      -W   The default for Boolean AND queries is that they cover one record
           (the default for a record is one line) at a time.  For example,
           glimpse 'good;bad' will output all lines containing both 'good'
           and 'bad'.  The -W option changes the scope of Booleans to be the
           whole file.  Within a file glimpse will output all matches to any
           of the patterns.  So, glimpse -W 'good;bad' will output all lines
           containing 'good' or 'bad', but only in files that contain both
           patterns.  The NOT operator '~' can be used only with -W.  It is
           described later on.  The OR operator is essentially unaffected
           (unless it is in combination with the other Boolean operations).
           For structured queries, the scope is always the whole attribute
           or file.
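
           The difference between the default record scope and the -W
           file scope can be sketched as follows.  The Python below is a
           simplified model with a hypothetical function and_lines; it
           uses plain substring tests and treats every line as a record.

```python
def and_lines(files, words, whole_file=False):
    """`files` maps a file name to its list of lines.  Returns the
    (name, line) pairs reported for an AND query over `words`."""
    hits = []
    for name, lines in files.items():
        if whole_file:
            # -W style: the file must contain every word somewhere,
            # then every line with any of the words is reported.
            text = "\n".join(lines)
            if all(w in text for w in words):
                hits += [(name, line) for line in lines
                         if any(w in line for w in words)]
        else:
            # Default: every word must appear on the same line.
            hits += [(name, line) for line in lines
                     if all(w in line for w in words)]
    return hits


if __name__ == "__main__":
    files = {
        "a.txt": ["good day", "bad day", "good and bad"],
        "b.txt": ["good only"],
    }
    print(and_lines(files, ["good", "bad"]))
    print(and_lines(files, ["good", "bad"], whole_file=True))
```

           In this model the default query reports only the line holding
           both words, while the whole-file variant reports all three
           lines of a.txt and nothing from b.txt.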

      -x   The pattern must match the whole line.  (This option is
           translated to -w when the index is searched and it is used only
           when the actual text is searched.  It is of limited use in
           glimpse.)

      -X   (from version 4.0B1 only) Output the names of files that contain
           a match even if these files have been deleted since the index was
           built.  Without this option glimpse will simply ignore these
           files.

      -y   Do not prompt.  Proceed with the match as if the answer to any
           prompt is y.  Servers (or any other scripts) using glimpse will
           probably want to use this option.

      -Y k If the index was constructed with the -t option, then -Y k will
           output only matches to files that were created or modified within
           the last k days.  There are no major performance penalties for
           this option.

      -z   Allow customizable filtering, using the file .glimpse_filters to
           run the programs listed there for each match.  The best example
           is compress/decompress.  If .glimpse_filters includes the line
           *.Z   uncompress <
           (separated by tabs) then before indexing any file that matches
           the pattern "*.Z" (same syntax as the one for .glimpse_exclude)
           the command listed is executed first (assuming input is from
           stdin, which is why uncompress needs <) and its output (assuming
           it goes to stdout) is indexed.  The file itself is not changed
           (i.e., it stays compressed).  Then if glimpse -z is used, the
           same program is used on these files on the fly.  Any program can
           be used (we run 'exec').  For example, one can filter out parts
           of files that should not be indexed.  Glimpseindex tries to apply
           all filters in .glimpse_filters in the order they are given.  For
           example, if you want to uncompress a file and then extract some
           part of it, put the uncompress command (the example above) first
           and then another line that specifies the extraction.  Note that
           this can slow down the search because the filters need to be run
           before files are searched.  (See also glimpseindex.)

      -Z   No op.  (It's useful for glimpse's internals. Trust us.)

      The characters `$', `^', `*', `[', `]', `|', `(', `)', `!', and `\'
      can cause unexpected results when included in the pattern, as these
      characters are also meaningful to the shell.  To avoid these
      problems, enclose the entire pattern in single quotes, i.e.,
      'pattern'.  Do not use double quotes (").

 PATTERNS
      glimpse supports a large variety of patterns, including simple
      strings, strings with classes of characters, sets of strings, wild
      cards, and regular expressions (see LIMITATIONS).

      Strings
           Strings are any sequence of characters, including the special
           symbols `^' for beginning of line and `$' for end of line.  The
           following special characters ( `$', `^', `*', `[', `|', `(',
           `)', `!', and `\' ) as well as the following meta characters
           special to glimpse (and agrep): `;', `,', `#', `<', `>', `-', and
           `.', should be preceded by `\' if they are to be matched as
           regular characters.  For example, \^abc\\ corresponds to the
           string ^abc\, whereas ^abc corresponds to the string abc at the
           beginning of a line.

      Classes of characters
           a list of characters inside [] (in order) corresponds to any
           character from the list.  For example, [a-ho-z] is any character
           between a and h or between o and z.  The symbol `^' inside []
           complements the list.  For example, [^i-n] denotes any character
           in the character set except the characters 'i' through 'n'.  The
           symbol `^' thus has two meanings, but this is consistent with
           egrep.  The symbol `.' (don't care) stands for any symbol (except
           for the newline symbol).

      Boolean operations
           Glimpse supports an `AND' operation denoted by the symbol `;', an
           `OR' operation denoted by the symbol `,', a limited version of a
           `NOT' operation (starting at version 4.0B1) denoted by the symbol
           `~', or any combination.  For example, glimpse
           'pizza;cheeseburger' will output all lines containing both
           patterns.  glimpse -F 'gnu;\.c$' 'define;DEFAULT' will output all
           lines containing both 'define' and 'DEFAULT' (anywhere in the
           line, not necessarily in order) in files whose name contains
           'gnu' and ends with .c.  glimpse '{political,computer};science'
           will match 'political science' or 'science of computers'.  The
           NOT operation works only together with the -W option, and it
           generally applies only to the whole file rather than to
           individual records.  Its output may sometimes seem
           counterintuitive.  Use with care.  glimpse -W 'fame;~glory' will
           output all lines containing 'fame' in all files that contain
           'fame' but do not contain 'glory'.  This is the most common use
           of NOT, and in this case it works as expected.  glimpse -W
           '~{fame;glory}' will be
           limited to files that do not contain both words, and will output
           all lines containing one of them.

      Wild cards
           The symbol '#' is used to denote a sequence of any number
           (including 0) of arbitrary characters (see LIMITATIONS).  The
           symbol # is equivalent to .* in egrep.  In fact, .* will work
           too, because it is a valid regular expression (see below), but
           unless this is part of an actual regular expression, # will work
           faster.  (Currently glimpse is experiencing some problems with
           #.)

      Combination of exact and approximate matching
           Any pattern inside angle brackets <> must match the text exactly
           even if the match is with errors.  For example, <mathemat>ics
           matches mathematical with one error (replacing the last s with an
           a), but mathe<matics> does not match mathematical no matter how
           many errors are allowed.  (This option is buggy at the moment.)

      Regular expressions
           Since the index is word based, a regular expression must match
           words that appear in the index for glimpse to find it.  Glimpse
           first strips all non-alphabetic characters from the regular
           expression, and searches the index for all remaining words.  It
           then applies the regular expression matching algorithm to the
           files found in the index.  For example, glimpse 'abc.*xyz' will
           search the index for all files that contain both 'abc' and 'xyz',
           and then search directly for 'abc.*xyz' in those files.  (If you
           use glimpse -w 'abc.*xyz', then 'abcxyz' will not be found,
           because glimpse will think that abc and xyz need to match whole
           words.) The syntax of regular expressions in glimpse is in
           general the same as that for agrep.  The union operation `|',
           Kleene closure `*', and parentheses () are all supported.
           Currently '+' is not supported.  Regular expressions are
           currently limited to approximately 30 characters (generally
           excluding meta characters).  Some options (-d, -w, -t, -x, -D,
           -I, -S) do not currently work with regular expressions.  The
           maximal number of errors for regular expressions that use '*' or
           '|' is 4. (See LIMITATIONS.)
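
           The index lookup described above can be approximated with
           standard tools.  The following is only a sketch, not glimpse's
           actual code: the function name regex_words is made up for
           illustration, and it assumes that every run of non-alphabetic
           characters simply separates index words.

```shell
# Hypothetical helper (not part of glimpse): list the words that would be
# looked up in the index for a given regular expression, assuming that
# every run of non-alphabetic characters acts as a separator.
regex_words() {
    printf '%s\n' "$1" | tr -cs '[:alpha:]' '\n' | sed '/^$/d'
}

# Prints abc and xyz on separate lines.
regex_words 'abc.*xyz'
```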

      Structured queries
           Glimpse supports some form of structured queries using Harvest's
           SOIF format.  See STRUCTURED QUERIES below for details.

 EXAMPLES
      (Run "glimpse '^glimpse' this-file" to get a list of all examples,
      some of which were given earlier.)

      glimpse -F 'haystack.h$' needle
           finds all occurrences of needle in all haystack.h files.

      glimpse -2 -F html Anestesiology
           outputs all occurrences of Anestesiology with up to two errors in
           files with html somewhere in their full name.

      glimpse -l -F '.c$' variablename
           lists the names of all .c files that contain variablename (the -l
           option lists file names rather than printing the matched lines).

      glimpse -F 'mail;1993' 'windsurfing;Arizona'
           finds all lines containing windsurfing and Arizona in all files
           having `mail' and '1993' somewhere in their full name.

      glimpse -F mail 't.j@#uk'
           finds all mail addresses (search only files with mail somewhere
           in their name) from the uk, where the login name ends with t.j,
           where the . stands for any one character.  (This is very useful
           to find a login name of someone whose middle name you don't
           know.)

      glimpse -F mbox -h -G  . > MBOX
           concatenates all files whose name matches `mbox' into one big
           one.

 SEARCHING IN COMPRESSED FILES
      Glimpse includes an optional new compression program, called cast,
      which allows glimpse (and agrep) to search the compressed files
      without having to decompress them.  The search is actually
      significantly faster when the files are compressed.  However, we have
      not tested cast as thoroughly as we would have liked, and a mishap in
      a compression algorithm can cause loss of data, so for now we
      recommend using cast very carefully.  We do not support or maintain
      cast.  (Unless you specifically use cast, the default is to ignore
      it.)

 GLIMPSEINDEX FILES
      All files used by glimpse are located at the directory(ies) where the
      index(es) is (are) stored and have .glimpse_ as a prefix.  The first
      two files (.glimpse_exclude and .glimpse_include) are optionally
      supplied by the user.  The other files are built and read by glimpse.

      .glimpse_exclude
           contains a list of files that glimpseindex is explicitly told to
           ignore.  In general, the syntax of .glimpse_exclude/include is
           the same as that of agrep (or any other grep).  The lines in the
           .glimpse_exclude file are matched to the file names, and if they
           match, the files are excluded.  Notice that agrep matches to
           parts of the string!  e.g., agrep /ftp/pub will match
           /home/ftp/pub and /ftp/pub/whatever.  So, if you want to exclude
           /ftp/pub/core, you just list it, as is, in the .glimpse_exclude
           file.  If you put "/home/ftp/pub/cdrom" in .glimpse_exclude,
           every file name that matches that string will be excluded,
           meaning all files below it.  You can use ^ to indicate the
           beginning of a file name, and $ to indicate the end of one, and
           you can use * and ? in the usual way.  For example /ftp/*html
           will exclude /ftp/pub/foo.html, but will also exclude
           /home/ftp/pub/html/whatever;  if you want to exclude files that
           start with /ftp and end with html use ^/ftp*html$.  Notice that
           putting a * at the beginning or at the end is redundant (in fact,
           in this case glimpseindex will remove the * when it does the
           indexing).  No other meta characters are allowed in
           .glimpse_exclude (e.g., don't use .* or # or |).  Lines with * or
           ? must have no more than 30 characters.  Notice that, although
           the index itself will not be indexed, the list of file names
           (.glimpse_filenames) will be indexed unless it is explicitly
           listed in .glimpse_exclude.
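
           The matching rules above can be tried out without building an
           index.  This sketch translates an exclude line into a grep
           pattern, assuming that * stands for any sequence of characters
           and ? for any single character.  The helper names
           exclude_to_regex and is_excluded are hypothetical, and
           glimpseindex itself may differ in corner cases.

```shell
# Hypothetical helpers, not part of glimpse.
# Turn a .glimpse_exclude line into a basic regular expression:
# escape literal dots first, then map '*' to '.*' and '?' to '.'.
exclude_to_regex() {
    printf '%s\n' "$1" | sed -e 's/\./\\./g' -e 's/\*/.*/g' -e 's/?/./g'
}

# Succeed if the file name (second argument) matches the exclude line
# (first argument) anywhere in the string, as agrep would.
is_excluded() {
    printf '%s\n' "$2" | grep -q -e "$(exclude_to_regex "$1")"
}

# Unanchored: also catches a path that merely contains /ftp/...html.
is_excluded '/ftp/*html' /home/ftp/pub/html/whatever && echo excluded
# Anchored with ^ and $: the same path is no longer matched.
is_excluded '^/ftp*html$' /home/ftp/pub/html/whatever || echo kept
```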

      .glimpse_filters
           See the description above for the -z option.

      .glimpse_include
           contains a list of files that glimpseindex is explicitly told to
           include in the index even though they may look like non-text
           files.  Symbolic links are followed by glimpseindex only if they
           are specifically included here.  If a file is in both
           .glimpse_exclude and .glimpse_include it will be excluded.

      .glimpse_filenames
           contains the list of all indexed file names, one per line.  This
           is an ASCII file that can also be searched with agrep to locate a
           file by name, which gives a fast find command.  For example,
           glimpse 'count#\.c$' ~/.glimpse_filenames
           will output the names of all (indexed) .c files that have 'count'
           in their name (including anywhere on the path from the index).
           Setting the following alias in the .login file may be useful:
           alias findfile 'glimpse -h \!:1 ~/.glimpse_filenames'
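
           Because .glimpse_filenames is plain text, the same kind of
           lookup can be approximated with grep when glimpse is not at
           hand.  The sketch below assumes only that # behaves like .* (as
           stated under Wild cards); the function name find_indexed and
           the sample paths are invented for illustration.

```shell
# Hypothetical helper: search a list of file names (one per line) using a
# glimpse-style pattern in which '#' means "any sequence of characters".
find_indexed() {
    pattern=$(printf '%s\n' "$1" | sed 's/#/.*/g')
    grep -e "$pattern" "$2"
}

# Sample file list standing in for ~/.glimpse_filenames.
list=$(mktemp)
printf '%s\n' /src/count.c /src/wordcount_main.c /src/count.h \
    /doc/counting.txt > "$list"

# Prints /src/count.c and /src/wordcount_main.c.
find_indexed 'count#\.c$' "$list"
rm -f "$list"
```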

      .glimpse_index
           contains the index.  The index consists of lines, each starting
           with a word followed by a list of block numbers (unless the -o or
           -b options are used, in which case each word is followed by an
           offset into the file .glimpse_partitions where all pointers are
           kept).  The block/file numbers are stored in binary form, so this
           is not an ASCII file.

      .glimpse_messages
           contains the output of the -w option (see above).

      .glimpse_partitions
           contains the partition of the indexed space into blocks and, when
           the index is built with the -o or -b options, some part of the
           index.  This file is used internally by glimpse and it is a non-
           ASCII file.

      .glimpse_statistics
           contains some statistics about the makeup of the index.  Useful
           for some advanced applications and customization of glimpse.

      .glimpse_turbo
           An added data structure (used under glimpseindex -o or -b only)
           that helps to speed up queries significantly for large indexes.
           Its size is 0.25MB.  Glimpse will work without it if needed.

 STRUCTURED QUERIES
      Glimpse can search for Boolean combinations of "attribute=value" terms
      by using the Harvest SOIF parser library (in glimpse/libtemplate).  To
      search this way, the index must be made by using the -s option of
      glimpseindex (this can be used in conjunction with other glimpseindex
      options). For glimpse and glimpseindex to recognize "structured"
      files, they must be in SOIF format. In this format, each value is
      prefixed by an attribute-name with the size of the value (in bytes)
      present in "{}" after the name of the attribute.  For example, the
      following lines are part of an SOIF file:
      type{17}:       Directory-Listing
      md5{32}:        3858c73d68616df0ed58a44d306b12ba
      Any string can serve as an attribute name.  The query glimpse
      "pattern;type=Directory-Listing" will search for "pattern" only in
      files whose type is "Directory-Listing".  The file itself is
      considered to be one "object" and its name/url appears as the first
      attribute with an "@" prefix; e.g., @FILE { http://xxx... } The scope
      of Boolean operations changes from records (lines) to whole files when
      structured queries are used in glimpse (since individual query terms
      can look at different attributes and they may not be "covered" by the
      record/line).  Note that glimpse can only search for patterns in the
      value parts of the SOIF file: there are some attributes (like the TTL,
      MD5, etc.) that are interpreted by Harvest's internal routines.  See
      RFC 2655 for more detailed information on the SOIF format.
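
      The size in braces has to agree with the value, so it is easiest to
      let a script compute it.  A minimal sketch follows; soif_attr is a
      made-up helper name, and the single tab after the colon is an
      assumption rather than something this page specifies.

```shell
# Hypothetical helper: print one SOIF attribute line, computing the byte
# length of the value that goes between the braces.
soif_attr() {
    size=$(printf '%s' "$2" | wc -c | tr -d ' ')
    printf '%s{%s}:\t%s\n' "$1" "$size" "$2"
}

# Reproduces the two sample lines above (sizes 17 and 32).
soif_attr type Directory-Listing
soif_attr md5 3858c73d68616df0ed58a44d306b12ba
```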

 REFERENCES
      1.   U. Manber and S. Wu, "GLIMPSE: A Tool to Search Through Entire
           File Systems," Usenix Winter 1994 Technical Conference (best
           paper award), San Francisco (January 1994), pp. 23-32.  Also,
           Technical Report #TR 93-34, Dept. of Computer Science, University
           of Arizona, October 1993 (a postscript file is available by
           anonymous ftp at ftp://webglimpse.net/pub/glimpse/TR93-34.ps).

      2.   S. Wu and U. Manber, "Fast Text Searching Allowing Errors,"
           Communications of the ACM 35 (October 1992), pp. 83-91.

 SEE ALSO
      agrep(1), ed(1), ex(1), glimpseindex(1), glimpseserver(1), grep(1),
      sh(1), csh(1).

 LIMITATIONS
      The index of glimpse is word based.  A pattern that contains more than
      one word cannot be found in the index.  The way glimpse overcomes this
      weakness is by splitting any multi-word pattern into its set of words
      and looking for all of them in the index.  For example, glimpse
      'linear programming' will first consult the index to find all files
      containing both linear and programming, and then apply agrep to find
      the combined pattern.  This is usually an effective solution, but it
      can be slow for cases where both words are very common, but their
      combination is not.  As was mentioned in the section on PATTERNS
      above, some characters serve as meta characters for glimpse and need
      to be preceded by '\' to search for them.  The most common examples
      are the characters '.' (which stands for a wild card), and '*' (the
      Kleene closure).  So, "glimpse ab.de" will match abcde, but "glimpse
      ab\.de" will not, and "glimpse ab*de" will not match ab*de, but
      "glimpse ab\*de" will.  The meta character - is translated
      automatically to a hyphen unless it appears between [] (in which case
      it denotes a range of characters).  The index of glimpse stores all
      patterns in lower case.  When glimpse searches the index it first
      converts all patterns to lower case, finds the appropriate files, and
      then searches the actual files using the original patterns.  So, for
      example, glimpse ABCXYZ will first find all files containing abcxyz in
      any combination of lower and upper cases, and then search these
      files directly, so only the right cases will be found.  One problem
      with this approach is discovering misspellings that are caused by
      wrong cases.  For example, glimpse -B abcXYZ will first search the
      index for the best match to abcxyz (because the pattern is converted
      to lower case); it will find that there are matches with no errors,
      and will go to those files to search them directly, this time with the
      original upper cases.  If the closest match is, say AbcXYZ, glimpse
      may miss it, because it doesn't expect an error.  Another problem is
      speed.  If you search for "ATT", it will look at the index for "att".
      Unless you use -w to match the whole word, glimpse may have to search
      all files containing, for example, "Seattle" which has "att" in it.
      There is no size limit for simple patterns and simple patterns within
      Boolean expressions.  More complicated patterns, such as regular
      expressions, are currently limited to approximately 30 characters.
      Lines are limited to 1024 characters.  Records are limited to 48K, and
      may be truncated if they are larger than that.  The limit of record
      length can be changed by modifying the parameter Max_record in
      agrep.h.  Glimpseindex does not index words longer than 64 characters.
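
      The two-stage search described in this section can be imitated with
      grep to see why common words are costly.  This is only a sketch: the
      function two_stage_search is hypothetical, its first stage scans the
      files instead of consulting an index, and it handles plain words
      only.

```shell
# Hypothetical imitation of the two stages. Stage 1 stands in for the index
# lookup: keep files that contain every word of the pattern, ignoring case.
# Stage 2 searches the surviving files for the pattern exactly as given.
two_stage_search() {
    pattern=$1
    shift
    for file in "$@"; do
        keep=yes
        for word in $pattern; do          # unquoted on purpose: split into words
            grep -qi -e "$word" "$file" || { keep=no; break; }
        done
        if [ "$keep" = yes ]; then
            grep -H -e "$pattern" "$file"
        fi
    done
    return 0
}
```

      Calling two_stage_search 'linear programming' on a set of files
      prints only lines containing the full phrase, but every file that
      mentions both words somewhere still has to be read in stage 2.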

 BUGS
      In some rare cases, regular expressions using * or # may not match
      correctly.  A query that contains no alphanumeric characters is not
      recommended (unless glimpse is used as agrep and the file names are
      provided).  This is an understatement.  The notion of "match to the
      whole word" (the -w option) can be tricky sometimes.  For example,
      glimpse -w 'word$' will not match 'word' appearing at the end of a
      line, because the extra '$' makes the pattern more than just one
      simple word.  The same thing can happen with ^ and with _.  To be on
      the safe side, use the -w option only when the patterns are actual
      words.  Please send bug reports or comments to gvelez@webglimpse.net.

 DIAGNOSTICS
      Exit status is 0 if any matches are found, 1 if none, 2 for syntax
      errors or inaccessible files.
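
      These codes make glimpse easy to use in scripts.  The sketch below
      shows one way to branch on them; report_status is a hypothetical
      wrapper, it treats any status other than 0 or 1 as an error, and it
      works for any command that follows the same convention.

```shell
# Hypothetical wrapper: run a command quietly and describe its exit status
# using the convention documented above.
report_status() {
    "$@" >/dev/null 2>&1
    case $? in
        0) echo "matches found" ;;
        1) echo "no matches" ;;
        *) echo "syntax error or inaccessible file" ;;
    esac
}

# Example call (requires glimpse and an index):
#   report_status glimpse -l needle
```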

 AUTHORS
      Udi Manber and Burra Gopal, Department of Computer Science, University
      of Arizona, and Sun Wu, the National Chung-Cheng University, Taiwan.
      Now maintained by Golda Velez at Internet WorkShop (Email:
      gvelez@webglimpse.net)