INSTALLATION

1) Change makefile settings to reflect

        ATT vs. BSD software
        termio vs. termcap
        MGR vs. no MGR (MGR is a BELLCORE produced window manager
                that is also available free to the public.)

2) Then, just say "make".
   If you want to "make install", you should first change the
   definition of INSDIR in the makefile.

3) To test the software, say

        spiff Sample.1 Sample.2

   spiff should find 4 differences and you should see the words
   "added", "deleted", "changed", and "altered" as well as four
   numbers in stand-out mode.

        spiff Sample.1 Sample.2 | cat

   should produce the same output, only the differences should be
   underlined. However, on many terminals the underlining does not
   appear. So try the command

        spiff Sample.1 Sample.2 | cat -v

   or whatever the equivalent to cat -v is on your system.

   A more complicated test set is found in Sample.3 and Sample.4.
   These files show how to use embedded commands to do things like
   change the commenting convention and tolerances on the fly.
   Be sure to run the command with the -s option to spiff:

        spiff -s 'command spiffword' Sample.3 Sample.4

   These files by no means provide an exhaustive test of spiff's
   features, but they should give you some idea of whether things
   are working right.

   This code (or its closely related cousins) has been run on
   Vaxen running 4.3BSD, a CCI Power 6, some XENIX machines, and some
   other machines running System V derivatives as well as
   (thanks to eugene@ames.arpa) Cray, Amdahl and Convex machines.

4) Share and enjoy.

AUTHOR'S ADDRESS

Please send complaints, comments, praise, bug reports, etc. to

        Dan Nachbar
        Bell Communications Research (also known as BELLCORE)
        445 South St. Room 2B-389
        Morristown, NJ 07960

        nachbar@bellcore.com
                or
        bellcore!nachbar
                or
        (201) 829-4392 (praise only, please)

OVERVIEW OF OPERATION

Each of the two input files is read and stored in core. Each is then
parsed into a series of tokens (literal strings and floating point
numbers; white space is ignored).
The token sequences are stored in core as well.
After both files have been parsed, a differencing algorithm is applied
to the token sequences. The differencing algorithm produces an edit
script, which is then passed to an output routine.

SIZE LIMITS AND OTHER DEFAULTS

                                file
                                implementing
                                limit         name         default value

maximum number of lines         line.h        _L_MAXLINES  10000
  per file
maximum number of tokens        token.h       K_MAXTOKENS  50000
  per file
maximum line length             misc.h        Z_LINELEN    1024
maximum word length             misc.h        Z_WORDLEN    20
  (length of misc buffers for
  things like literal
  delimiters. NOT length of
  tokens, which can be
  virtually any length)
default absolute tolerance      tol.h         _T_ADEF      "1e-10"
default relative tolerance      tol.h         _T_RDEF      "1e-10"
maximum number of commands      command.h     _C_CMDMAX    100
  in effect at one time
maximum number of commenting    comment.h     W_COMMAX     20
  conventions that can be in
  effect at one time (not
  including commenting
  conventions that are
  restricted to beginning
  of line)
maximum number of commenting    comment.h     W_BOLMAX     20
  conventions that are
  restricted to beginning of
  line that are in effect at
  one time
maximum number of literal       comment.h     W_LITMAX     20
  string conventions that can
  be in effect at one time
maximum number of tolerances    tol.h         _T_TOLMAX    10
  that can be in effect at
  one time

DIFFERENCES BETWEEN THE CURRENT VERSION AND THE ENCLOSED PAPER

The files paper.ms and paper.out contain the nroff -ms input and
output respectively of a paper on spiff that was given at the
Summer '88 USENIX conference in San Francisco. Since that time many
changes have been made to the code. Many flags have changed and some
have had their meanings reversed; see the enclosed man page for the
current usage. Also, there is no longer control over the granularity
of the objects used when applying the differencing algorithm. The
current version of spiff always applies the differencing in terms of
individual tokens. The -t flag controls how the edit script is printed.
This arrangement more closely reflects the original intent of having
multiple differencing granularities.

PERFORMANCE

Spiff is big and slow. It is big because all the storage is in core.
It is a straightforward but boring task to move the temporary storage
into a file. Someone who cares is invited to take on the job.
Spiff is slow because whenever a choice had to be made between
speed of operation and ease of coding, speed of operation almost
always lost. As the program matures it will almost certainly get
smaller and faster. Obvious performance enhancements have been avoided
in order to make the program available as soon as possible.

COPYRIGHT

Our lawyers advise the following:

        Copyright (c) 1988 Bellcore
        All Rights Reserved
        Permission is granted to copy or use this program,
        EXCEPT that it may not be sold for profit, the copyright
        notice must be reproduced on copies, and credit should
        be given to Bellcore where it is due.
        BELLCORE MAKES NO WARRANTY AND ACCEPTS NO LIABILITY
        FOR THIS PROGRAM.

Given that all of the above seems to be very reasonable, there should
be no reason for anyone to not play by the rules.

NAMING CONVENTIONS USED IN THE CODE

All symbols (functions, data declarations, macros) are named as follows:

        L_foo  -- for names exported to other modules
                  and possibly used inside the module as well.
        _L_foo -- for names used by more than one routine
                  within a module
        foo    -- for names used inside a single routine.
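
As an illustration of the three forms, here is a minimal sketch of a
made-up module. The letter "M" and every name in it are hypothetical
(no such module exists in spiff); it only shows where each kind of
name would appear. Note that ISO C reserves identifiers that begin
with an underscore followed by an uppercase letter, so the _L_foo form
should be adopted in new code with that in mind.

```c
#include <assert.h>
#include <string.h>

#define M_MAXLEN 20            /* M_foo: exported to other modules */

static int _M_calls = 0;       /* _M_foo: shared by routines in this module */

static void _M_count(void)     /* _M_foo: helper used only inside the module */
{
    _M_calls++;
}

int M_is_short(const char *word)   /* M_foo: exported routine */
{
    size_t len;                /* foo: used inside a single routine */

    assert(word != NULL);
    _M_count();
    len = strlen(word);
    return len < M_MAXLEN;
}

int M_calls(void)              /* M_foo: exported accessor for the counter */
{
    return _M_calls;
}
```

Another module would call M_is_short() and M_calls() but would never
refer to _M_calls or _M_count directly.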
Each module uses a different value for "L" --

        module files    letter used   implements
        spiff.c         Y             top level routines
        misc.[ch]       Z             various routines used throughout
        strings.[ch]    S             routines for handling strings
        edit.h          E             list of changes found and printed
        tol.[ch]        T             tolerances for real numbers
        token.[ch]      K             storage for objects
        float.[ch]      F             manipulation of floats
        floatrep.[ch]   R             representation of floats
        line.[ch]       L             storage for input lines
        parse.[ch]      P             parser for input files
        command.[ch]    C             storage and recognition of commands
        comment.[ch]    W             comment list maintenance
        compare.[ch]    X             comparisons of a single token
        exact.[ch]      Q             exact match differencing algorithm
        miller.[ch]     G             miller/myers differencing algorithm
        output.[ch]     O             print listing of differences
        flagdefs.h      U             define flag bits that are used in
                                      several of the other modules. These
                                      #defines could have been included
                                      in misc.c, but were separated out
                                      because of their explicit
                                      communication function.
        visual.[ch]     V             screen oriented display for MGR
                                      window manager, also contains
                                      dummy routines for people who
                                      don't have MGR

I haven't cleaned up visual.c yet. It probably doesn't even compile
in this version anyway. But since most people don't have mgr, this
isn't urgent.

NON-OBVIOUS DATA STRUCTURES

The Floating Point Representation

Floating point numbers are stored in a struct R_flstr.
The fractional part is often called the mantissa.

The structure consists of
        a flag for the sign of the fractional part
        the exponent in binary
        a character string containing the fractional part

The structure could be converted to a float via

        atof(strcat(".",mantissa)) * (10^exponent)

To be properly formed, the mantissa string must:

    start with a digit between 1 and 9 (i.e. no leading zeros),
    except for the value zero, in which case the mantissa is
    exactly "0"

    for the special case of zero, the exponent is always 0, and
    the sign is always positive. (i.e.
    no negative 0)

In other words, (except for the value 0) the mantissa is a fractional
number ranging between 0.1 (inclusive) and 1.0 (exclusive).
The exponent is interpreted as a power of 10.

Lines

There are three sets of lines:

    implemented in line.c and line.h
        real_lines    -- the lines as they come from the file
        content_lines -- a subset of real_lines that excludes
                         embedded commands

    implemented in token.c and token.h
        token_lines   -- a subset of content_lines consisting of those
                         lines that have tokens that begin on them
                         (literals can go on for more than one line),
                         i.e. content_lines excluding comments and
                         blank lines.

THE STATE OF THE CODE

Things that should be added

    visual mode should handle tabs and wrapped lines

    handling huge files in chunks when using the ordinal match
    algorithm. right now you have to parse and then diff the whole
    thing before you get any output. often, you run out of memory.

Things that would be nice to add

    output should optionally be expressed in real line numbers
    (i.e. including command lines)

    at present, all storage is in core. there should be a compile
    time decision to allow temporary storage in files rather than
    core. that way the user could decide how to handle the
    speed/space tradeoff

    a front end that looked like diff should be added so that one
    could drop spiff into existing shell scripts

    the parser converts floats into their internal form even when
    it isn't necessary.

    in the miller/myers code, the code should check for matching end
    sequences. it currently looks for matching beginning sequences.

Minor programming improvements (programming botches)

    some of the #defines should really be enumerated types

    all the routines in strings.c that alter the data at the end of
    a pointer but return void should just return the correct data.
    the current arrangement is a historical artifact of the days
    when these routines returned a status code. but the status code
    was never examined, so i made them void . . .

    comments should be added to the miller/myers code

    in visual mode, ask for font by name rather than number
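
The floating point representation described under NON-OBVIOUS DATA
STRUCTURES can be sketched roughly as follows. This is only an
illustration under stated assumptions: struct flrep, its field names,
and both functions are hypothetical stand-ins, not the actual
definitions in floatrep.[ch]. The conversion builds the "." plus
mantissa string in a local buffer (strcat cannot safely append to a
string literal) and applies the power of 10 with a simple loop.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct R_flstr; all names here are assumed. */
struct flrep {
    int negative;          /* sign flag: 1 if the fractional part is negative */
    int exponent;          /* power of 10, stored as an ordinary int */
    const char *mantissa;  /* digits that follow the implied decimal point */
};

/* Return 1 if f obeys the well-formedness rules described in the text. */
int flrep_wellformed(const struct flrep *f)
{
    const char *p;

    if (strcmp(f->mantissa, "0") == 0)       /* zero: exponent 0, positive */
        return f->exponent == 0 && !f->negative;
    if (f->mantissa[0] < '1' || f->mantissa[0] > '9')
        return 0;                            /* must start with 1 through 9 */
    for (p = f->mantissa; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;                        /* digits only */
    return 1;
}

/* Convert to double: "." followed by the mantissa, scaled by 10^exponent. */
double flrep_to_double(const struct flrep *f)
{
    char buf[64];
    double value;
    int i;

    assert(strlen(f->mantissa) + 2 <= sizeof buf);
    buf[0] = '.';
    strcpy(buf + 1, f->mantissa);
    value = atof(buf);
    for (i = 0; i < f->exponent; i++)        /* positive exponent */
        value *= 10.0;
    for (i = 0; i > f->exponent; i--)        /* negative exponent */
        value /= 10.0;
    return f->negative ? -value : value;
}
```

Under these assumptions, 123.45 would be held with a positive sign,
an exponent of 3, and the mantissa "12345", since .12345 * 10^3 is
123.45. A mantissa such as "012" is rejected because of its leading
zero, as is a zero with the sign flag set.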