Changes in current version
- Added support for setting transfer timeouts through environment
variables and config file settings (Peter Scott)
- Fixed problem with broker registry not correctly handling
deletion of duplicate objects (Hrvoje Stipetic)
- Added support for relative URLs in client pull information, and for
the HTML refresh directive (Hrvoje Stipetic)
- Changed the content of the "description" attribute to be the first
255 characters of the object's "body" or "partial-text" attribute
(Peter Valkenburg)
- SGML summariser fixes (Hrvoje Stipetic):
* to allow mapping of Dublin Core directives to attributes
* to automatically prefix META-generated elements with a custom prefix
* to 'fix' META-generated attributes so they only contain legal SOIF
characters
- Added support to nph-search for printing the total number of items
found when in perpage mode (Hermann Straus)
- Altered broker behaviour so that objects with no TTL value are
never expired (Simon Wilkinson)
- Fixed bug in gather which would result in incorrect HELLO messages
being sent to servers (Dave Beckett)
- Fixed broker bug where duplicate copies of a URL could remain in the
broker in certain circumstances (Hrvoje Stipetic)
- Added reporting of the number of objects added to the broker output
(Hrvoje Stipetic)
- Added broker version information to the broker greeting message
(Hrvoje Stipetic)
- Fixed bug where the admin password could be written to the broker
log (Hrvoje Stipetic)
- Removed unnecessary print statements in the nph-search script
- Fixed parsing of broker configuration files so that whitespace is
correctly handled in arguments (Peter J. Scott & Hrvoje Stipetic)
- Added -V option to the broker to return the version number
(Hrvoje Stipetic)
- Fixed definition of random() in stor_man.c for better portability
(Otis Gospodnetic)
- Improved result ranking in nph-search for occurrence of the search
string in URLs and titles (Bruce R. Lewis)
- Fix to LocalMapping code to remove problem where FD 0 (stdin) could
be closed by mistake (Simon Wilkinson)
- Fixes to SGML.sum to handle problems with HTTP-EQUIV and spaces in
META NAME attributes (Hrvoje Stipetic)
- Fix to broker where compression would unlink open files, and
eventually the broker would run out of disk space (Hrvoje Stipetic)
- Fixes to sys_errlist definitions for Linux compatibility (Tim Riker)
- Fixed local file handling so that it doesn't hang if it tries to
access a "special" file (Simon Wilkinson)
- Fixes for glimpse internationalization: ISO_CHAR_SET has to be not
only defined, but non-zero (Marjan Erzen)
- Added status code logging to liburl, so failed requests are logged
with the reason for failure (Marjan Erzen)
- Altered robots.txt code so that if an empty User-Agent exists,
everyone may collect data (Marjan Erzen)
- Fixed gather.c so that it will compile with Sun's CC, and so it
doesn't redefine existing code (Marjan Erzen)
- Added a new admin command to the Broker, 'parse-template', which
will add a SOIF stream to the broker (Marjan Erzen)
- Added a command line md5 tool (Marjan Erzen)
- Altered Makefiles so that objects can be built outside of the
current source directory when using a make that supports VPATH,
such as GNU make (Simon Wilkinson)
- Added support for sending Accept: headers to http.c in liburl.
Currently sends Accept: */* (Simon Wilkinson)
- Fixed filter.c so both host and port number are made available for
regular expressions (Wesley Alan Wright)
- Added a SWISH-based indexing engine, which is considerably more
efficient than Glimpse, if slightly less fully featured
(Simon Wilkinson)
- Changes to nph-search.in so that the truncation warnings aren't
erroneously displayed when the user hasn't supplied a value for
maxobjflag (Allyn Fratkin)
- Added support for META client pull redirections to HTMLurls
(Hrvoje Stipetic)
- Increased limits in HTML.decl (Tim Riker)
- Fix for the gather command's uncompressed transfer routine so that
it now transfers blocks correctly in most circumstances (Marjan
Erzen) [This patch went into pl10, but the ChangeLog entry was
omitted]
- Added N-at-a-time output to the nph-search CGI, so results can be
paged (David Hoekman)
- Added noregex option to nph-search and to the broker Glimpse
interface to allow the disabling of regular expressions in search
patterns (Simon Wilkinson)
- All Harvest binaries should now honour the setting of prefix as the
install location, *when run normally*. Many still rely upon
HARVEST_HOME to point them to the correct location, but this is
provided by the support scripts which run them (Simon Wilkinson)
- Added support for setting defaults in the query configuration file,
so default values can be given for all parameters that are usually
passed into the CGI from the Web search form (Simon Wilkinson)
- Added support for sorting query results by rank to nph-search
(Wesley Alan Wright)
- Rewrote most of the Makefiles and altered the configure scripts so
that the build process is more rational, and so that the targets are
now "standardised". In addition, a number of problems with the
Makefiles have been fixed, and all of the configure routines use one
central cache file, speeding up the build process (Simon Wilkinson)
- Corrected order of adding directories to PATH so that Harvest
binaries are now always used in preference to system ones
(Vincent Winczewski)
- Replaced BrokerAdmin.cgi with a Perl script which accepts POST
requests. This fixes the potential problem of passwords being sent
in GET requests, which were visible in servers' log files
(Simon Wilkinson)
- Fixed bug in robots.txt code which caused segmentation faults when
reading incorrect files with empty User-Agent lines (Simon Wilkinson)
- Updated nph-search so that it only sends nph-headers if invoked as
an nph script, and so that it will automatically decode HTML
entities. Removed BrokerQuery.pl (now a symlink to nph-search)
(Craig Counterman)
- Added a new Perl HTML summariser (Andy Powell)
- Replaced all of the HSR tools with new, fixed versions supplied by
Mic Bowman
- Added changes to SGML.sum to make it compatible with nsgmls, part of
SP. Error reports are now on by default (Craig Counterman)
- Altered broker/Glimpse/index.c so that case and word matching are
disabled when an errorflag is provided (Craig Counterman)
- Fixed broker/Glimpse/index.c so it compiles on an SGI (Simon Woods)
- Altered HTTP code so it now sends a user-controllable User-Agent and
Maintainer address which can be set in the config file
(Hrvoje Stipetic)
- Altered httpenum-breadth so that the count of objects retrieved is
accurate (Hrvoje Stipetic)
- Altered summarisers Makefile so that Pdf.sum is now installed by
default (Simon Wilkinson)
- Added -DAGREP_POINTER=1 to glimpse Makefiles; fixes segmentation
faults with picky mallocs (Dan Riley)
- Replaced HTMLparse with a newer version from Mosaic 2.7b5 which
handles comments correctly (Simon Wilkinson)
- Changed Postscript.sum to a sh script, so stderr doesn't corrupt
the generated SOIF (Simon Wilkinson)
- Added some #includes for AIX compatibility (Simon Wilkinson)
- Default url-filter now blocks VRML files (Craig Counterman)
- Changed Perl scripts which hard-coded /usr/local/harvest
(Craig Counterman)
- Altered RFC 1738 unescaping so it can cope with badly formed URLs
(Simon Wilkinson)
- Assorted changes for compilation under BSDI (Simon Wilkinson)
- Fixed configure files so libraries are included in the correct
order for Solaris (Simon Wilkinson)
- Fixed SGML.sum so that all debug output is sent to STDERR so it
doesn't corrupt generated data (Simon Woods)
- Fixed undef of SOIF associative array for compatibility with
Perl 5.004 (Bill Corley)
- Fixed bug where a URL wasn't closed in httpenum-breadth
(Wolfgang Klimt)
- Made broker more resilient to glimpse bugs (Simon Wilkinson)
- Fixed commenting mistake in glimpse Makefile (Jeffrey Goldberg)
- Changed broker config so numbers are now indexed by glimpse
- Added missing ftp files ftp.pl and chat2.pl (Simon Wilkinson)
- Fix for Year 2000 logging and file naming problem (logs should now
be produced using 4 digits for the year) (Simon Wilkinson, suggested
by Craig Counterman)
- Fixed gatherers so they don't revisit pages that they've previously
found to be broken (Simon Wilkinson, suggested by David A. Nowitz)
- Fixed breadth-first httpenum so it correctly handles META robots
control tags (Simon Wilkinson)
- Rewrote Troff.sum in Perl (David A. Nowitz)
- Added DOCTYPES and tidied up HTML generally (Craig Counterman)
- Fixed &times bug in &timestamp call (Craig Counterman)
- Fixed assorted bugs in nph-search (Craig Counterman)
Changes to v1.5
- Altered some lseek() calls for BSD compatibility (Martin Hamilton)
- Removed optimisation from glimpse configuration (Simon Wilkinson)
- Added -nocol option to broker (Dave Beckett)
- Added virtual host support using the "Host:" header in HTTP/1.1
(Juha Laiho)
- Added the HTML 3.2 ISOlat definition (Simon Wilkinson)
- Added support for META Robots tags (Simon Wilkinson)
- Altered broker/glimpse interface so that an attempt to use regular
expressions on word boundaries results in word matching being turned
off, rather than an error message (Simon Wilkinson)
- Fixed core dumps in httpenum when accessing pages requiring
authentication (Simon Wilkinson)
- Changed url_retrieve behaviour so the original caller gets notified
of a Redirect. This fixes a bug which made it possible to index a site
and completely ignore its robots.txt file (Simon Wilkinson)
- Added wildcards to Local-Mapping (Bruce R. Lewis)
- Added support to the broker for terminating connections if the
client goes away (Bruce R. Lewis)
- Added depth-first gatherer for both HTTP and Gopher enumeration
(Peter Scott)
- Added hooks to enum, prepurls and the Gatherer script to support
user selection of search technique via the Search variable
(Simon Wilkinson)
- Fixed bug in depth-first enumerator where a URL first encountered at
a depth deeper than the maximum depth wasn't indexed when it was later
seen at a lower depth (Peter Scott)
- Added depth counts to the depth-first enumerator (Peter Scott)
- Added robots.txt checks to the depth-first HTTP and Gopher
enumerators (Simon Wilkinson)
- Added code to grab URLs from HTML containing frames and client-side
image maps (Dean Marino)
- Altered breadth-first HTTP and Gopher enumerators so that the server
containing the root page is marked as visited (Julian Field)
- Altered enumerators so the URL count is increased for visited pages
only (Ed Knowles)
- Altered HTMLurls so HREFs can now contain newlines (Peter Scott)
- Altered depth-first enumerators to reduce the number of temporary
files generated (Simon Wilkinson)
- Fixed enumerators to avoid buffer overflows with long URLs
(Simon Wilkinson)
- Added MAX_FILTERS env var to URL filter code (Simon Wilkinson)
- Fixed url code to avoid unnecessary generation of symlinks
(Bruce R. Lewis)
- Added support for HTTP/1.1 servers to url code (Simon Wilkinson)
- Fixed end-of-header detection bug in url code (Paul Johnson)
- Fixed temporary file closing bug in url code (Bruce R. Lewis)
- Altered Makefiles so realclean removes config.cache
(Simon Wilkinson)
- Altered robots.txt handling so it can cope with errors in server
robots files. Also some performance improvements (Ed Knowles)
- Added support for user-specified User-Agent strings to the
robots.txt code (Simon Wilkinson)
- Changed method of comparing the User-Agent string against the
robots.txt file; now correct behaviour as per the robots
specification (Simon Wilkinson)
- Fixed bug where robots.txt code left file pointer unclosed
(Ed Knowles)
- Fixed bug where paths were compared case-insensitively in the
robots.txt code (Simon Wilkinson)
- Altered Gatherer code so it converts ~ and %7E to %7e in
Local-Mapping strings (Simon Wilkinson)
- Added Locale option to Gatherer (Simon Wilkinson)
- Added html32.dtd, the official Wilbur DTD
- Fixed usage of fork in ps2txt wrapper (Peter Scott)
- Fixed Harvest script so the RunGatherd scripts it creates include
the correct PATH (Simon Wilkinson)
- Fixed Rainbow summariser so it uses rm -r instead of unlinking
- Altered files so they will compile under NetBSD (Martin Hamilton)
- Altered configure scripts for NetBSD support (Dave Beckett)
- Upgraded included version of glimpse to 4.0
- Fixed broker so the port picked for glimpseindex is less than 30000
(Peter Scott)
- Added support for NOT operator to broker glimpse interface code
- Fixed signal handling bug in glimpseserver (Simon Wilkinson)
- Added warning for object-truncated result sets to
BrokerQuery.pl.cgi
- Added nph-search CGI for displaying results "as they come"
(Bruce R. Lewis)
- Altered query CGIs to escape quotes in $html_query
(Simon Wilkinson)
- Added explanation of the Log-Key directive to broker.conf
(Dave Beckett)
- Fixed bug in the broker lex file so searches containing an
apostrophe will work
Changes to v1.4.pl2:
- Added "robots.txt" support to the gatherer enumeration.
- Added "prefix =" to components/*/Makefile.
- Changed Gopher timeout to 120 seconds.
- Changed HTTP-Query byurl pattern to be any URL with a question mark.
- Added HARVEST_NOT_VISITED_LOG env var to httpenum.
- Added support for Glimpse-based broker to limit the number of
matched lines per object.
- Protect single quotes in Gatherer-Name.
- Updated html-mcom.dtd
- Changed BrokerQuery.pl to not use a tmpfile, and to sort the results
by number of matched lines.
- Fixed HTTP authentication to work with Netscape server, and support
encoding spaces as RFC1738 escapes.
- Fixed gatherd timeout bug (caused by eliminating the DNS mismatch
warning).
- Fixed Carriage-Return substitution in RTF.sum.
- Changed SGML.sum to not do "word wrap" on very large strings.
- Fixed HTML-lax.sum to turn carriage returns into spaces.
- Added --body-text option to HTML-lax.sum and in the comments of
HTML.sum.
- Fixed SGML.sum to NOT rewrite correct DOCTYPE declarations.
- Changed our log() function to be called Log().
- Changed handling of depth between enum programs.
- Changed broker connect timeout to happen only after some data has
been read.
- Added 'Access-Delay' to gatherer.cf. Now adds delay for LeafNode
URLs.
- Fixed 'gather' bug when reading binary data in non-compressed mode.
- Configure: upgraded to v2.7.
- BrokerQuery.pl.cgi: protect special characters in a Broker name.
- Essence: Print [L] for URLs where local mapping succeeds.
- Updated the User's Manual.
Changes between release v1.4.pl1 (November 17, 1995) and v1.4
- Changes to the Gatherer:
- Fixed NULL BASE URL coredump bug in HTMLurls.
- Fixed Gatherer to make Top-Directory set Lib-Directory value also
(like the manual says it does in section 4.6.1).
- Fixed essence and SGML.sum to look in multiple lib dirs. Look
first in Lib-Directory if set, otherwise in
$HARVEST_HOME/lib/gatherer.
- Changes to the Broker:
- Added <sys/select.h> for compiling on AIX.
- Changed BrokerQuery.pl.cgi to send the query to the broker before
opening the tmpfile. If there is a delay in opening the tmpfile, the
broker query could time out.
- Fixed potential coredump in Log_rotate() due to large local array.
##############################################################################
Changes between release v1.4 (November 10, 1995) and v1.3
- Changes to the Gatherer:
- Added symbolic link loop detection to httpenum.
- Added a GIF image summarizer (GIFImage.sum); requires netpbm. The
GIFImage type is still in the Essence stoplist by default.
- Added 'C' version of ftpget.
- Added ability to rewrite the SOIF template URL with Essence
post-processing. Could be used to gather file:// URLs and have them
exported as http:// URLs.
- Added the ability to specify a program to generate root/leaf URLs.
- Fixed select() timeouts to POSIX semantics.
- Fixed SGML summarizer to give an error if input is empty.
- Fixed a Makefile to actually build and install HTML-lax.sum.
- Fixed liburl problem with AFS. Must *copy* files into the
cache-liburl directory.
- Fixed News gathering: if 'newsget.pl' exits non-zero, close the NNTP
server socket.
- Fixed newsget.pl with a major rewrite.
- Fixed 'fileenum' to use URLs and not always return
file://hostname/.
- Fixed gatherd bug where a child process would remove the parent's
gatherd.pid file.
- Changed NewsArticle.sum TTL to 7 days by default.
- Changed Essence unnesting to occur in individual directories.
- Removed confusing gatherd DNS mismatch warning message.
- Changes to the Broker:
- Added #Restart-Index-Server command to broker admin command set.
- Added error logging and debugging in Glimpse inline query code.
- Fixed select() timeouts to POSIX semantics.
- Fixed Glimpse minor malloc problems.
- Fixed the broker on Linux; needs unbuffered input from gather
process.
- Fixed broker query language bug for high-bit (international)
characters.
- Changed Broker to allow specifically setting GlimpseServer_Port
again; if not set, port is chosen randomly.
- Changed BrokerAdmin.cgi to use unbuffered output.
- Changed Glimpse macros CLEANUP and RETURN to be functions.
- Changed broker admin/LOG to log FQDN instead of IP address.
- Removed glimpse version ambiguities in Glimpse/index.c.
- Removed getpeername() call in the broker; get address from accept().
- Changes to the Cache:
- The cache has been moved to a separate distribution.
- Miscellaneous Changes:
- Don't link with -lmalloc on Solaris.
- Fixed User Manual and FAQ inconsistencies.
##############################################################################
Changes between release v1.3 (September 7, 1995) and v1.3.beta:
- Changes to the Broker:
- Added support for auto-validation from the HSR which
includes a description.html file, RunUpdate program
for each new Broker.
- Changes to the Cache:
- Added support to dynamically toggle debug level via USR1 and
USR2.
- Fixed dnsserver parsing numeric addresses.
- Added patches for FreeBSD.
- Changed source_ping to off by default.
- Added optional code for 'local_ip' line in cached.conf.
Addresses given as 'local_ip' will be retrieved directly,
without sending any probe packets.
- Added 'TIMEOUT_DIRECT' as a new kind of entry in
cache_hierarchy.log.
- Changes to the Gatherer:
- Added LMT.gdbm to liburl to keep last-modified-timestamps.
- Added support for using BASE element in HTML enumeration.
- Added support for HTML-3.0 DTD.
- Added support for Netscape DTD.
- Added support for HotJava DTD.
- Added old HTML.sum as HTML-lax.sum.
- Added MacBinHex as a supported nested type in essence.
- Changed gatherd to die when its data directory gets removed.
- Fixed bug: repeated HTTP redirected URLs (with help from
glenn@rockie.nsc.com)
- Miscellaneous Changes
- Incorporated fixes for FreeBSD port from ted@oz.plymouth.edu.
- Incorporated fixes for Ultrix port from
dsr@lns598.lns.cornell.edu.
##############################################################################
Changes between release v1.3.beta (August 7, 1995) and v1.2:
- Changes to the Broker:
- Upgraded to Glimpse 3.0.
- Improved and updated WAIS, Inc. support to use version 2.1.1.
- Added support for Verity VDK as backend indexer/searcher.
- Added support for GRASS GIS as spatial database.
- Added support for PLS, Inc. PLWeb as backend indexer/searcher.
- Added IP numbers for incoming requests to log information.
- Added support for displaying individual SOIF attributes via WWW.
- Added 'Uniqify' command to Broker; keeps most current object
of duplicate URLs.
- Added security and name lookup to BrokerQuery.pl.
- Added support for Glimpse inline queries.
- Added error message to report incorrect WWW installation.
- Added some support for Internationalization in the Broker
- Added support for automatic validation by HSR.
- Removed need for 'gzip' in the Broker.
- Changed BrokerQuery.pl to try multiple entries from Brokers.cf
- Changed broker to read queries with a timeout. Very long queries
can get segmented by TCP.
- Fixed bug with matching Description attributes.
- Fixed bug with Glimpse regular expression detection.
- Fixed bug in CreateBroker -- wrong default Gatherer port number.
- Changes to the Cache:
- Added persistent disk storage across cached reboots.
- Added IP-based access control.
- Added setting of the TTL based on URL regular expressions.
- Added more sophisticated setting of the TTL based on HTTP
headers.
- Added more statistics information.
- Added support for logging using the common httpd logfile format.
- Added support for HEAD HTTP request method.
- Added support for user-configurable periodic garbage collection.
- Added support for user-configurable stoplist.
- Added support for WAIS proxy'ing (from Edward Moy, Xerox PARC).
- Added support for quick aborting when client drops connection,
cached stops immediately. Useful for slow network links.
- Added high/low water marks for disk storage.
- Added 'source_ping' to cached.conf.
- Added 'dns_children' to cached.conf.
- Added -z to force a cached to discard (zap) its disk storage.
- Added logging of ftpget.pl failures (exit codes and signals).
- Added Expires timestamp to cache log
- Improved error messages for DNS name lookup failures.
- Improved performance of LRU replacement policy.
- Improved performance for generating statistics.
- Increased listen(2) socket queue size to 50 or max of OS.
- Removed all Tcl code.
- Cleaned memory allocation and management.
- Cleaned up and updated cached.conf.
- Cleaned up debugging output.
- Changed default low watermark to 60%.
- Changed trace mail into cached.conf option.
- Changed algorithm for time estimations using echo ports.
- Changed dnsserver to try gethostbyname(3) again sometimes
- Fixed bugs with URL interpretation.
- Fixed bugs with internal IPcache memory management.
- Fixed bug with DNS lookups on IP numbers.
- Fixed bug with not finding 'dnsserver'.
- Fixed bug with hard timeouts in select loop.
- Fixed bug with some platforms needing strdup().
- Fixed bug with ftpget.pl not including MIME content-type for
unknown filename extensions.
- Fixed bug with ftpget.pl not parsing ls output correctly
(wasn't matching dashes in user/group names).
- Fixed copyright messages in source code.
- Fixed realloc() bug for concurrent object access.
- Fixed bug when neighbors specified and dns_servers != 3.
- Fixed bug with new hash tables when deleting from table as it is
being traversed.
- Fixed various minor bugs.
- Changes to the Gatherer:
- Added ability to pass enumerated URLs through an external
filter program. Allows very specific selection of URLs to
further enumerate.
- Added -background flag to the Gatherer; does export work in bg.
- Added IP-based filtering (regular expressions) in host-filter
- Added Post-processing of summaries to Essence
- Added 'gather' check for 'gzip' before setting compression
option.
- Added username/password support for HTTP retrievals
- Changed gatherer to remove cache-liburl directory after a
successful gather session.
- Fixed bug: Infinite loops in 'enum' on Invalid URLs
- Fixed bug: HTTP headers not parsed from slow servers
- Improved URL parsing; support for username/password in FTP urls.
- Miscellaneous Changes
- Upgraded autoconf 'configure' scripts to v2.4.
- liburl: better handling of relative URLs.
- liburl retrieval programs abort very large transfers (at 10
Mbytes)
- Fixed bug with subscribing to harvest-users mailing list.
##############################################################################
Changes between release v1.2 (April 3, 1995) and v1.1:
- Changes to the Broker:
- Major performance improvements to the collector interface.
- Added fast, efficient internal Gatherer ID management.
- Added support for clients requesting attributes with #attribute.
- Added support for log file rotation, and terse logging.
- Added support for #operation in query manager interface.
- Cleaned up the log file format.
- Cleaned up the administrative interface.
- Cleaned up the UNIX file system-based storage manager.
- Fixed major bug with WAIS support.
- Fixed file descriptor leaks in glimpseserver when the index
contained files that had since been deleted.
- Fixed bug with overflowing lines from glimpse.
- Fixed bug with hostname initialization.
- Fixed memory leak with the Description-Tag attribute matching.
- Fixed various minor bugs.
- Changes to the Cache:
- Added httpd accelerator support.
- Added IP number logging.
- Added setuid() to a user when cached is run as root.
- Added support for HTTP servers that die abruptly.
- Added client_timeout which places a hard limit on the life
of incoming connections on the ascii port, or on outgoing
HTTP or Gopher clients.
- Cleaner implementation for retrieving FTP URLs via ftpget.pl.
- Tries to write cached.pid file in same directory as cached.conf.
- Changed FTP support to sacrifice correct HTTP headers for
dramatically decreased latency for large FTP objects.
- Fixed ftpget.pl -htmlify to determine directory vs. file
correctly and send HTTP header as soon as possible.
- Fixed rare core dump during HTTP xfers.
- Fixed how the error messages are printed.
- Better support for larger file descriptor tables.
- Debug levels 0 and 1 now have timestamps logged.
- Cleaned and updated defaults for cached.conf.
- When run as root and setuid, cached will change its current
directory to its swap directory, which is almost certainly writable
by cached, so that it can write a core file in case it crashes.
- Minor modification of store error message.
- Remote client connection resets are handled as soft error.
- Strips an extra \r\n from MIME.
- Hierarchy log (yet another log, but it's optional).
- Periodically hunts for zombie processes.
- Added more information to the stat interface.
- Cleaned up info data for improved parsability/readability.
- Changes to the Gatherer:
- Added support to follow HTTP redirection pointers.
- Added support for $http_proxy environment variable in liburl.
- Added support for summarizing SGML data.
- Added better support for summarizing TeX data.
- Added support for summarizing RTF and MIF data, using Rainbow
software provided by EBT, which we make available in our new
components distribution
- Added support for summarizing WordPerfect 5.1 data.
- Changed HTML summarizing to use SGML summarizer, providing more
easily customizable results
- Added support for local filesystem gathering for NNTP.
- Improved incremental gathering support, and integrated the
support into the Essence program (removed dbcheck program).
- Added support for "fake" MD5 generation per SOIF object on
external presentation unnesting streams (exploders) --
permits incremental gathering on data generated by an Exploder.
- Added --memory-efficient to Essence to trade time for memory
efficiency; this helps users who have limited memory resources
but are dealing with large SOIF objects.
- Added --confirm-host to Essence for explicit host DNS validation.
- Added --max-refresh to Essence to limit refreshing activity.
- RootNode enumerators generate RFC 1738 escaped URLs.
- Improved performance of SOIF parsing.
- Fixed bug in locating gzip in gatherd.
- Fixed bug in the unnesting commands in Essence.
- Fixed bug with HTTP/1.0 requests, now sends encoded URIs for
GETs.
- Fixed ftp.pl for Solaris. Wasn't setting PF_INET correctly.
- Changes to the Replicator:
- Updated with USC's version from 3/15/95
- Changes to the User's Manual:
- Added sections for new plug'n'play components: standard,
SGML, HTML, MIF, RTF, WordPerfect 5.1.
- Updated support policy.
- Added clarification in Local Gathering section.
- Added clarification in RootNode enumeration section.
- Added clarification on Gatherer/Broker information flow.
- Added clarification for some cached internals.
- Added section on upgrading from v1.1 to v1.2.
- Added discussion about httpd_accel for cached.
- Updated info about software for the replicator section.
- Updated numerous facts to v1.2.
- Reorganized essence/content extraction customization section.
- Added description of SGML summarizing and components distribution
(including Rainbow software for MIF and RTF formats)
- Added more troubleshooting comments to all sections.
- Added more detail to cache and replication sections, including
discussions of httpd-accelerator, CreateReplica, and some of the
performance and failure-mode characteristics of the cache.
- Cleared up inaccuracies and unclarities in Gatherer RootNode
specification section.
- Added notes about user-contributed software.
- Updated support policy.
- Added index entries for all programs in appendices.
- Other minor changes.
- Miscellaneous changes:
- Reorganized the source tree to support plug'n'play components.
##############################################################################
Changes between release v1.1 (February 17, 1995) and v1.1.beta.v2:
- Changes to the Broker:
- Added a leading protocol version header for the result set.
- Added support for query flags during Broker-to-Broker
collections.
- Added support for limiting the lifetime of glimpse queries.
- Fixed major bugs in Broker-to-Broker collections.
- Fixed major bugs with deleting Registry entries during initial
build.
- Fixed memory leaks and file descriptor mgmt bugs in
glimpseserver.
- Fixed bug with -L in glimpseserver.
- Fixed bug that increased the size of structured glimpse indexes.
- Fixed bugs in the administrative interface and WAIS support.
- Fixed core dump when searching the Registry during collections.
- Fixed display SOIF links flag in BrokerQuery.pl.
- Fixed .cgi pgms, so that httpd kills them cleanly after user
abort.
- Changed glimpseserver and broker so that they will not block
longer than 15 seconds while waiting for an incoming connection.
This prevents SunOS from blindly swapping out the process.
- Optimized so that a full glimpseindex will only happen if more
than 10% of the objects have changed.
- Added some more logging output.
- Fixed various minor bugs.
- Changes to the Cache:
- Added Gopher->HTML support. For a Mosaic proxy, you'll need to
set gopher_proxy http://cache.server:3128/
instead of
set gopher_proxy gopher://cache.server:3128/
- Fixed bug with HTML-ify FTP directories using ftpget.pl.
- Fixed bug with hierarchical refreshing.
- Fixed bogus client error message.
- Improved cached error messages.
- Changes to the Gatherer:
- Generates the 'Description' attribute whenever possible.
- Fixed bug in the expiring of objects from the PRODUCTION
database.
- Fixed bug in httpenum that wasn't cleaning up correctly.
- Fixed newsenum to obey URL-Max limit.
- Improved the Mail summarizer.
- Improved the USENET support, added NewsArticle and NewsGroup.
- Improved gatherd to speed up SEND-UPDATE timestamp computation.
- Improved preparation for the Gatherer's database to be exported.
- Purify'd Essence to remove memory leaks.
- Changes to the User's Manual:
- Updated the section on the Broker's Collection.conf file.
- Updated many minor points.
- Improved HTML version of the manual, by upgrading latex2html pgm.
- Miscellaneous changes:
- Fixed problems with Solaris' socket.ph for Perl programs.
##############################################################################
Changes between release v1.1.beta.v2 (February 3, 1995) and v1.1.beta:
- Changes to the Broker:
- Major performance improvements while doing collections.
- Uses the customizable BrokerQuery.pl for the WWW interface.
- Fixed major bugs in Broker-to-Broker transfers.
- Fixed minor bug in collections that caused unnecessary indexing.
- Cleaned and improved the information that is logged to
broker.out.
- Changed broker to run cleanly as a daemon by disconnecting from
the controlling terminal.
- glimpseserver now prints its error messages correctly.
- Fixed various minor bugs.
- Changes to the Cache:
- Fixed core dump bug when cached is heavily loaded.
- Improved error messages.
- Changes to the Gatherer:
- Site enumeration filter is based on host:port, and better argv
processing for 'Gatherer' - fixes by "Albert Dvornik"
<bert@MIT.EDU>
- Major performance improvements while preparing databases.
- Fixed Gatherer to change to Top-Directory before running.
- Fixed Gatherer to write dummy index.html files in data/ and tmp/.
- Fixed bug in HTTP enumeration to only extract links from HTML.
- Fixed various minor bugs.
- Changes to the User's Manual:
- Added detailed appendix on Harvest software layout and programs.
- HTML version of the manual now contains local copies of the
icons.
- Added section on customizing BrokerQuery.pl.
- Fixed example for Filters during RootNode enumeration.
- Added a search interface to the User's Manual using a Broker.
- Updated index.
- Miscellaneous changes:
- Improved log output format to be more readable.
- Added HP-UX port/fixes from Chris Dalton (crd@hplb.hpl.hp.com).
##############################################################################
Changes between release v1.1.beta (January 26, 1995) and v1.0:
- Changes to the Broker:
- Upgraded to Glimpse 2.1 which includes glimpseserver.
- Added faster, more memory-efficient internal Registry lookups.
- Added support for switching the indexing subsystem at run-time.
- Added a statistics generator for the Broker.
- Fixed BrokerQuery.cgi so that the rejection message from the
Broker while it is indexing works all of the time.
- Fixed Broker bug that would cause the Broker to hang sometimes
on a pclose() after doing a collection with the gather command.
- Immediately denies outside connections during a collection,
indexing, or other administrative operations.
- Improved the HTML result set generated by BrokerQuery.
- Pointers to content summaries in the result set are now optional.
- Changed /brokers to /Harvest/brokers, etc.
- Limited the time that the Glimpse search engine runs for a query.
- Added Query.cgi which can be used to support Broker replicas.
- Added support for minimal bookkeeping from Gatherer.
- Fixed problems with the Broker's cleaning, and added Registry
compression.
- Fixed problems with the Broker's updating of objects.
- Fixed BrokerQuery syntax error message to point to
queryhelp.html.
- Fixed BrokerRestart for Replicator interface.
- Fixed WWW interface to work with any document root.
- Fixed various minor bugs.
- Changes to the Cache:
- Fixed serious hierarchical cache bug.
- New error messages. HTTP/1.0 compliant.
- Nuke If-Modified-Since to work with Netscape.
- Non-blocking DNS lookup using dnsserver program.
- New config parameter, cache_dns_program.
- Removed Tcl library binaries - have a precompiled version of
Harvest.
- Fixed stat for outgoing message.
- Use multiple directories for on-disk swap storage.
- Changes to the Gatherer:
- Added flexible support for specifying a Gatherer's workload.
- Added support for gathering through the local file system.
- Added support for USENET URLs.
- Added INFO command to Gatherer for statistics.
- Added support for generating minimal bookkeeping attributes.
- Improved HTTP/1.0 support for MIME headers and Last-Modified
headers.
- Fixed bug with 'gather' that caused 'gunzip' decompression to
fail.
- Made automatic keyword generation and the local disk cache maximum
size run-time flags.
- Added a SOIF parser in Perl.
- Changed HTML URL extractor from HTML.sum to separate program.
- Fixed Gopher support to have longer read timeout.
- Consolidated GDBM utilities into the 'gdbmutil' program.
- Fixed bug with gatherd leaving zombie children.
- Fixed various minor bugs.
- Changes to the Replicator:
- Replaced with USC's Replicator distribution.
- Changes to the User's Manual:
- Added a new subsection on Extended RootNode Specifications
- Added discussion about new Local-Mapping support
- Fixed various typos and clarified wording in various places
- Fixed some URLs, and added others
- Fixed the discussion on using Glimpse with the Broker.
- Added a new subsection on the Perl SOIF library.
- Added more descriptions about various system components (e.g.,
HSR)
- Added more index entries, and clarified some of the existing
entries
- Added a note about realtime Gatherer updates
- Added mention of cache RAM requirements
- Added section on Support Policy and Harvest Team Contact
Information
- Updated copyright/licensing discussion
- Added a section about the binary-only distribution
- Changed section names and content at beginning to make it more
clear and to make more sense with the new installation.
- Reorganized manual by subsystem
- Added troubleshooting sections to each subsystem, and shifted
some material into them that had been in other places
- Expanded section on supported platforms and software needed for
running/building Harvest
- Clarified some parts of the ``Querying a Broker'' section
- Added appendix on Directory layout of installed Harvest software
- Updated to reflect new httpd reorg
- Updated default summarizer action list
- Noted that glimpseserver is now part of the system
- Added more discussion to replicator section, including a figure
- Miscellaneous changes:
- Reorganized Harvest's installed directory structure.
- Integrated port to AIX 3.2 and AIX cc by greving@dv.go.dlr.de.
- Integrated port to HP-UX A.09.03 by steff@csc.liv.ac.uk.
- Integrated port to IRIX 5.3 by leclerc@ai.sri.com.
- Integrated port to Linux 1.1.59 by hardy@cs.colorado.edu.
- Integrated port/fixes to HP-UX 09.03 and HP ANSI C compiler
A.09.69 by crd@hplb.hpl.hp.com.
- Changed all Perl scripts to work under Perl 4.x or 5.0.
- Try to use vfork rather than fork to save memory when possible.
- Updated Copyright.
##############################################################################
Changes between release v1.0 (November 7, 1994) and v1.0-beta-1.5:
- Changes to the Broker:
- Upgraded Glimpse from version 1.1 to 2.0.
- Added support for Glimpse 2.0 which allows byte-level indexing,
limiting result set sizes, arbitrary Boolean queries, and more.
- Made case-insensitive matching and word matching the default for Glimpse.
- Improved and updated queryhelp.html and adminhelp.html.
- Added soifhelp.html to the help suite.
- Added a reboot-broker target to the default broker Makefile.
- Fixed various minor bugs.
- Changes to the Gatherer:
- Better HTTP/1.0 support, sends User-Agent and From fields.
- Fixed a problem with cross-site Gopher RootNode enumeration.
- Fixed bug in HTTP RootNode enumeration.
- Generation of unique, sorted keyword list is optional in
config.h.
- Changed Gatherer program to work around Solaris 2.3 Perl 4.036
bug.
- Fixed various minor bugs.
- Changes to the Cache:
- Added support for the Netscape browser.
- No longer caches /cgi-bin/ URLs.
- Updated the Tcl/Tk/dpwish pointers for the Cache manager.
- Changes to the User's Manual:
- Added an index with over 300 entries.
- Added a new section about Querying a Broker.
- Added a new section about common SOIF attribute names.
- Added a new section on periodic gathering.
- Added a new section on tuning Glimpse.
- Added a new section on the WWW interface to the Broker.
- Added a new section on integrating new search/indexing subsystems
into the Broker, and gave a detailed interface description.
- Added more detail to SOIF appendix.
- Improved and updated the Administrating a Broker subsection.
- Added more explanation about manual annotations.
- Folded in content from FAQ.
- Noted particular usefulness of the Essence-Options variable,
e.g., for setting --full-text.
- Added a note to the Customizing the candidate selection step
subsection that it is particularly useful to do selection based on
file and URL naming heuristics when gathering remote data,
because it can avoid retrieving lots of data.
- Added a note in the subsection on Running a Gatherer that you can
set MAX_ENUM in src/common/include/config.h, and that a future
release of Harvest will make it possible to set this limit
more flexibly. Also added a note about the robot guidelines.
- Added an overview of the lib and bin directories for the
Gatherer, including the defaults and descriptions of each file.
- Showed RunGatherer and RunGatherd scripts and added discussion
of how to use them from cron and /etc/rc.local.
- Added pointer to FAQ on setting up HTTPD in the Broker section.
- Put the logo on the cover page.
- Miscellaneous changes:
- Updated the COPYRIGHT and added it to all appropriate source
files.
- Updated the FAQ, and converted to HTML.
- Fixed BSD compatibility bug in src/install.sh.
##############################################################################
Changes between release v1.0-beta-1.5 (October 14, 1994) and v1.0-beta-1.4:
- Added a user manual that is intended to help both novice and advanced
Harvest users better use the system. It covers the following topics:
- Introduction to Harvest (1 page)
- Subsystem Overview (2 pages)
- Getting and Installing the Harvest software (1 page)
- Making Basic Use of Harvest (3 pages)
- Advanced Features of Harvest (5 pages)
- References (1 page)
- Appendix on The Summary Object Interchange Format (SOIF) (3
pages)
- Appendix on Essence Summarizer Actions (1 page)
- Appendix on Gatherer Examples (6 pages)
- Appendix on Broker's Query Manager and Collector Interface (2
pages)
- Changes to the Broker:
- Improved Broker installation, and added the CreateBroker program
that automatically creates and configures a Harvest Broker based
on a brief Question & Answer session with the user.
- Improved the Mosaic interface to be more user-friendly.
- Added support for duplicate removal based on MD5 values.
- Made Query Manager and Administrative interface more extensible.
- Rewrote the Broker registry to improve performance and
readability.
- Added the dumpregistry command to view the Broker's registry.
- Added the test-broker command for simple testing of a Broker.
- Added support for wais-8-b5, freeWAIS, commercial WAIS, and
Nebula.
- Cleaned up the admin.html and query.html files.
- Cleaned up much of the code to make it more extensible.
- Fixed bug in the registry garbage collection.
- Fixed major memory leak bugs.
- Fixed various minor bugs.
- Changes to the Cache:
- Started using version_id 2 of the ICP protocol.
- Improved support for OSF/1 v2.0 on 64-bit DEC Alphas.
- Added password support for administrative interface.
- Fixed bug with FTP "Parent Directory", and cleaned up HTML for
directories.
- Fixed various major bugs with hierarchical caching.
- Fixed various minor bugs.
- Changes to the Gatherer:
- Added support for generating a sorted, unique keyword attribute,
based on the Description, Partial-Text, or Keywords attribute.
- Added an "allow only these types" in the Candidate Selection
step.
- Added stub Exploder type to help users use the unnesting step.
- Gatherer automatically creates a gatherd.cf file if needed.
- Fixed major gatherd bug that caused.
- Fixed various minor bugs and memory leaks.
- Changes to the Replicator:
- Working on instrumenting the code to measure performance.
- Fixed various bugs.
##############################################################################
ChangeLog,v 1.214 1996/02/01 06:35:49 duane Exp