KWICX(1)                                                              KWICX(1)
                                March 29, 1994

NAME
     kwicx - Key Word In Context eXtended program that handles 8 and
     16-bit characters

SYNOPSIS
     kwicx [-8bhivCFISU] [-c n] [-p n] [-f n] [-s c] [-l s] [-L s] [-r s]
     [-R s] [-o f] [-a s] [-m n] -k s <file1> <file2> ...

DESCRIPTION
     kwicx scans the files named on the command line for a keyword.

     One of the strengths of kwicx is that it can also search for 16-bit
     characters (see LIMITATIONS). Currently, the only 16-bit forms kwicx
     can handle are GuoBiao Chinese (with the high bits set), Big5
     Chinese, EUC Japanese, and EUC Korean.

     kwicx also provides a method for specifying what should and should
     not appear in the preceding and following contexts (see OPTIONS).
     These conditions are combined in a logical-and fashion:

     1.  All patterns specified with the -l,--left-with command line
         parameter must appear in the context preceding the match.

     2.  All patterns specified with the -L,--left-without command line
         parameter must not appear in the context preceding the match.

     3.  All patterns specified with the -r,--right-with command line
         parameter must appear in the context following the match.

     4.  All patterns specified with the -R,--right-without command line
         parameter must not appear in the context following the match.

     kwicx handles case insensitive searches on Latin-1 (primarily
     Western European) text correctly if the -8,--eight-bit flag is
     specified on the command line.

     kwicx can handle files of practically any size and any number of
     files.

OPTIONS
     You can specify command line options for kwicx in two ways: the
     short form or the long form. The short form begins with a single
     dash and the long form begins with two dashes. In the following
     list, each option is given as -shortform,--longform. The long form
     is recommended wherever other people will read the command line,
     such as in shell scripts.

     -h,--help
          Print a list of the command line options.
     -v,--version
          Print the version of this program.

     -k,--keyword s
          Search for keyword s.

     -i,--case-insensitive
          Make search case insensitive. Searching defaults to case
          sensitive.

     -8,--eight-bit
          Assume that every byte is an individual character. By default,
          kwicx assumes that the text contains mixed ASCII and 16-bit
          characters (see LIMITATIONS).

     -b,--byte-offsets
          Print a list of byte offsets instead of the text of the
          matches.

     -c,--context-size n
          Set preceding and following context size to n chars (default:
          10).

     -f,--following-context-size n
          Set the following context size to n chars (default: 10).

     -p,--preceding-context-size n
          Set the preceding context size to n chars (default: 10).

     -s,--highlight-string s
          Set the string to print between the match and context. The
          default is to print one ASCII space before and after the
          matched pattern. This can be a string of characters instead of
          a single character as is often used in other kwic-like
          programs. There are four special characters that can be put
          into this string:

               \n - Prints a newline character
               \r - Prints a carriage return character
               \t - Prints a tab character
               \e - Prints an escape character

     -S,--no-highlight-string
          Don't print ANY string between the match and the context.

     -F,--no-filenames
          Don't print the filename when a match occurs.

     -I,--print-lower
          Print the contexts and match in lower case.

     -C,--dont-collapse-spaces
          Don't view contiguous spaces, tabs, newlines, and carriage
          returns as single spaces. By default, all contiguous spaces,
          tabs, newlines and carriage returns are viewed as a single
          space.

     -U,--spaces-between-euc
          Allow spaces between 16-bit chars. The default is to assume
          that no spaces occur between 16-bit characters for searching
          purposes.

     -l,--left-with s
          The pattern s must occur in the left context.

     -L,--left-without s
          The pattern s must not occur in the left context.
     -r,--right-with s
          The pattern s must occur in the right context.

     -R,--right-without s
          The pattern s must not occur in the right context.

     -o,--output-file f
          The output file to print the matches to.

     -a,--print-after s
          The string s is printed after the contexts and match are
          printed. This string can contain the same special characters
          as those available in the -s option.

     -m,--number-matches n
          The maximum number of matches to find.

LIMITATIONS
     kwicx cannot search text from standard input. The filenames to
     search must be specified on the command line.

     If byte offsets are requested on the command line, only the byte
     offsets of the matched pattern are printed. Printing of byte
     offsets of the preceding and following contexts will be added if
     there are enough requests.

     No support for searching mixed Latin-1 and 16-bit characters.

     No regular expression support is available yet.

INTERNALS
     The searching algorithm used by kwicx is a modified version of a
     Boyer-Moore implementation done by Ted Dunning (see AUTHORS).

SEE ALSO
     jkwic(1)

AUTHORS
     ted@crl.nmsu.edu (Ted Dunning), mleisher@crl.nmsu.edu (Mark Leisher)

FILES
     None.

DIAGNOSTICS
     kwicx exits with status 0 if one or more matches are found, 1 if no
     matches are found, and 2 if some other error occurs.

BUGS
     No Big5 handling yet.

     Contexts aren't formatted nicely for mixed 8 and 16-bit characters.

     Creeping Featurism!
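
EXAMPLES
     The logical-and combination of the -l, -L, -r, and -R context
     conditions described in DESCRIPTION can be sketched in Python. This
     is an illustration only, not the kwicx source: the function names
     context_ok and kwic_matches are hypothetical, a plain substring
     search stands in for the Boyer-Moore search, contexts are measured
     in Python characters rather than bytes, whitespace is not
     collapsed, and it is assumed that the patterns are tested against
     the same context window that would be printed.

```python
def context_ok(left, right, left_with=(), left_without=(),
               right_with=(), right_without=()):
    """Return True only if all four groups of conditions hold (logical and)."""
    return (all(p in left for p in left_with)                # like -l
            and not any(p in left for p in left_without)     # like -L
            and all(p in right for p in right_with)          # like -r
            and not any(p in right for p in right_without))  # like -R


def kwic_matches(text, keyword, before=10, after=10, **filters):
    """Yield (left, match, right) for each keyword hit that passes the filters.

    before/after mirror the documented default context size of 10.
    """
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - before):start]
        right = text[end:end + after]
        if context_ok(left, right, **filters):
            yield left, keyword, right
        # Continue searching one position past the start of this hit.
        start = text.find(keyword, start + 1)


if __name__ == "__main__":
    sample = "the red cat sat; the big cat ran; a cat hid"
    for left, match, right in kwic_matches(sample, "cat", left_with=["big"]):
        print(f"{left} {match} {right}")
```

     Passing several patterns in one group requires every one of them
     to hold, which matches the "All patterns ... must" wording of the
     four rules.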