 Changes in current version
         - Added support for setting transfer timeout times through
           environment variables, and config file settings (Peter Scott)
         - Fixed problem with broker registry not correctly handling
           deletion of duplicate objects (Hrvoje Stipetic)
         - Added support for relative URLs in client pull information, and
           for the HTML refresh directive. (Hrvoje Stipetic)
         - Changed the content of the "description" attribute to be the
           first 255 characters of the object's "body" or "partial-text"
           attribute (Peter Valkenburg)
         - SGML summariser fixes:
           * to allow mapping of Dublin Core directives to attributes
           * to automatically prefix meta generated elements with a
             custom prefix
           * to 'fix' meta generated attributes to only contain legal
             SOIF characters
           (Hrvoje Stipetic)
         - Added support for printing total number of items found when in
           perpage mode to nph-search (Hermann Straus)
         - Altered broker behaviour so that objects with no TTL value are
           never expired (Simon Wilkinson)
         - Fixed bug in gather which would result in incorrect HELLO
           messages being sent to servers (Dave Beckett)
         - Fixed broker bug where duplicate copies of a URL could remain in
           the broker in certain circumstances (Hrvoje Stipetic)
         - Added reporting of number of objects added to broker output
           (Hrvoje Stipetic)
         - Added broker version information to broker greeting message
           (Hrvoje Stipetic)
         - Fixed bug where admin password could be written to broker log
           (Hrvoje Stipetic)
         - Removed unnecessary prints in the nph-search script
         - Fixed parsing of broker configuration files so that
           whitespace is correctly handled in arguments
           (Peter J. Scott & Hrvoje Stipetic)
         - Added -V option to the broker to return the version number
           (Hrvoje Stipetic)
         - Fixed definition of random() in stor_man.c for better
           portability (Otis Gospodnetic)
         - Improved result ranking in nph-search for occurrence of search
           string in URL and titles (Bruce R. Lewis)
         - Fix to LocalMapping code to remove problem where FD 0 (stdin)
           could be closed by mistake (Simon Wilkinson)
         - Fixes to SGML.sum to handle problems with HTTP-EQUIV and spaces
           in META NAME attributes (Hrvoje Stipetic)
         - Fix to broker where compression would unlink open files, and
           eventually the broker would run out of disk space
           (Hrvoje Stipetic)
         - Fixes to sys_errlist definitions for Linux compatibility
           (Tim Riker)
         - Fixed local file handling so that it doesn't hang if it tries to
           access a "special" file (Simon Wilkinson)
         - Fixes for glimpse internationalization - ISO_CHAR_SET has to be
           not only defined, but be non-zero (Marjan Erzen)
         - Added status code logging to liburl, so failed requests are
           logged with the reason for failure (Marjan Erzen)
         - Altered robots.txt code, so that if an empty User-Agent exists,
           everyone may collect data (Marjan Erzen)
         - Fixed gather.c so that it will compile with Sun's CC, and so it
           doesn't redefine existing code (Marjan Erzen)
         - Added a new admin command to the Broker - 'parse-template' -
           which will add a SOIF stream to the broker (Marjan Erzen)
         - Added a command line md5 tool (Marjan Erzen)
         - Altered Makefiles so that objects can be built outside of the
           current source directory when using a make that supports VPATH,
           such as gnumake. (Simon Wilkinson)
         - Added support for sending Accept: headers to http.c in liburl.
           Currently sends Accept: */* (Simon Wilkinson)
         - Fixed filter.c so both host and portnumber are made available
           for regular expressions (Wesley Alan Wright)
         - Added a SWISH based indexing engine, which is considerably more
           efficient than Glimpse, if slightly less fully featured
           (Simon Wilkinson)
         - Changes to nph-search.in so that the truncation warnings aren't
           erroneously displayed when the user hasn't supplied a value for
           maxobjflag (Allyn Fratkin)
         - Added support for META client pull redirections to HTMLurls
           (Hrvoje Stipetic)
         - Increased limits in HTML.decl (Tim Riker)
         - Fix for gather command's uncompressed transfer routine so that it
           now transfers blocks correctly in most circumstances
           (Marjan Erzen) [This patch went in in pl10 - but the ChangeLog
           entry was omitted]
         - Added N-at-a-time output to the nph-search CGI, so results can be
           paged. (David Hoekman)
         - Added noregex option to nph-search and to the broker Glimpse
           interface to allow the disabling of regular expressions in
           search patterns (Simon Wilkinson)
         - All Harvest binaries should now honour setting of prefix as being
           the install location, *when run normally*. Many still rely upon
           HARVEST_HOME to point them to the correct location, but this is
           provided by the support scripts which run them (Simon Wilkinson)
         - Added support for setting defaults in the query configuration
           file, so default values can be given for all parameters that are
           usually passed into the CGI from the Web search form
           (Simon Wilkinson)
         - Added support for sorting query results by rank to nph-search
           (Wesley Alan Wright)
         - Rewrote most of the Makefiles and altered the configure scripts
           so that the build process is more rational, and so that the
           targets are now "standardised". In addition a number of problems
           with the makefiles have been fixed, and all of the configure
           routines use one central cache file - speeding up the build
           process (Simon Wilkinson)
         - Corrected order of adding files to PATH so that Harvest binaries
           are now always used in preference to system ones
           (Vincent Winczewski)
         - Replaced BrokerAdmin.cgi with a perl script which accepts POST
           requests. This fixes the potential problem of passwords being
           sent in GET requests, which were visible in servers' log files
           (Simon Wilkinson)
         - Fixed bug in robots.txt code which caused segmentation faults
           when reading incorrect files with empty User-Agent lines
           (Simon Wilkinson)
         - Updated nph-search so that it only sends nph headers if invoked
           as an nph script, and so that it will automatically decode HTML
           entities. Removed BrokerQuery.pl (now symlink to nph-search)
           (Craig Counterman)
         - Added a new perl HTML summariser (Andy Powell)
         - Replaced all of the HSR tools with new, fixed versions supplied
           by Mic Bowman
         - Added changes to SGML.sum to make it compatible with nsgmls, part
           of SP. Error reports now on by default (Craig Counterman)
         - Altered broker/Glimpse/index.c so that case and word matching are
           disabled when an errorflag is provided (Craig Counterman)
         - Fixed broker/Glimpse/index.c so it compiles on an SGI
           (Simon Woods)
         - Altered http code so it now sends a user controllable User-Agent
           and Maintainer address which can be set in the config file
           (Hrvoje Stipetic)
         - Altered httpenum-breadth so that the count of objects retrieved
           is accurate (Hrvoje Stipetic)
         - Altered summarisers Makefile so that Pdf.sum is now installed by
           default (Simon Wilkinson)
         - Added -DAGREP_POINTER=1 to glimpse Makefiles - fixes
           segmentation faults with picky mallocs (Dan Riley)
         - Replaced HTMLparse with a newer version from the Mosaic 2.7b5
           which handles comments correctly (Simon Wilkinson)
         - Changed Postscript.sum to a sh script, so stderr doesn't corrupt
           the generated SOIF (Simon Wilkinson)
         - Added some #includes for AIX compatibility (Simon Wilkinson)
         - Default url-filter now blocks VRML files (Craig Counterman)
         - Changed Perl scripts which forced /usr/local/harvest
           (Craig Counterman)
         - Altered rfc1738 unescaping so it can cope with badly formed URLs
           (Simon Wilkinson)
         - Assorted changes for compilation under BSDI (Simon Wilkinson)
         - Fixed configure files so libraries are included in correct order
           for Solaris (Simon Wilkinson)
         - Fixed SGML.sum so that all debug output is sent to STDERR so it
           doesn't corrupt generated data (Simon Woods)
         - Fixed undef of SOIF associative array for compatibility with
           Perl 5.004 (Bill Corley)
         - Fixed bug where a url wasn't closed in httpenum-breadth
           (Wolfgang Klimt)
         - Made broker more resilient to glimpse bugs (Simon Wilkinson)
         - Fixed commenting mistake in glimpse Makefile (Jeffrey Goldberg)
         - Changed broker config so numbers are now indexed by glimpse
         - Added missing ftp files ftp.pl and chat2.pl (Simon Wilkinson)
         - Fix for Year 2000 logging and file naming problem (logs should
           now be produced using 4 digits for the year) (Simon Wilkinson,
           suggested by Craig Counterman)
         - Fixed gatherers so they don't revisit pages that they've
           previously found to be broken (Simon Wilkinson, suggested by
           David A. Nowitz)
         - Fixed breadth first httpenum so it correctly handles META robots
           control tags (Simon Wilkinson)
         - Rewrote Troff.sum in perl (David A. Nowitz)
         - Added DOCTYPES and tidied up HTML generally (Craig Counterman)
         - Fixed &times bug in &timestamp call (Craig Counterman)
         - Fixed assorted bugs in nph-search (Craig Counterman)

 Changes to v1.5
         - Altered some lseek() calls for BSD compatibility
           (Martin Hamilton)
         - Removed optimisation from glimpse configuration
           (Simon Wilkinson)
         - Added -nocol option to broker (Dave Beckett)
         - Added virtual host support using the "Host:" header in HTTP/1.1
           (Juha Laiho)
         - Added the HTML 3.2 ISOlat definition (Simon Wilkinson)
         - Added support for META Robots tags (Simon Wilkinson)
         - Altered broker / glimpse interface so that an attempt to use
           regular expressions on word boundaries results in word matching
           being turned off, rather than an error message (Simon Wilkinson)
         - Fixed core dumps in httpenum when accessing pages requiring
           authentication (Simon Wilkinson)
         - Changed url_retreive behaviour so original caller gets notified
           of Redirect. This fixes a bug which made it possible to index a
           site and completely ignore their robots.txt file
           (Simon Wilkinson)
         - Added wildcards to Local-Mapping (Bruce R. Lewis)
         - Added support to the broker for terminating connections if the
           client goes away (Bruce R. Lewis)
         - Added depth first gatherer for both HTTP and Gopher enumeration.
           (Peter Scott)
         - Added hooks to enum, prepurls and the Gatherer script to support
           user selection of search technique via the Search variable
           (Simon Wilkinson)
         - Fixed bug in depth first enumerator where a URL first encountered
           at a depth deeper than max depth wasn't indexed when it was seen
           at a lower depth (Peter Scott)
         - Depth counts added to the depth first enumerator (Peter Scott)
         - Robots.txt checks added to the depth first HTTP and Gopher
           enumerators (Simon Wilkinson)
         - Added code to grab URLs from HTML containing frames and client
           side image maps (Dean Marino)
         - Altered breadth first HTTP and Gopher enumerators so that the
           server containing the root page is marked as visited.
           (Julian Field)
         - Altered enumerators so URL count is increased for visited pages
           only (Ed Knowles)
         - Altered HTMLurls so HREFS can now contain new lines
           (Peter Scott)
         - Altered depth first enumerators to reduce number of temporary
           files generated (Simon Wilkinson)
         - Fixed enumerators to avoid buffer overflows with long URLs
           (Simon Wilkinson)
         - Added MAX_FILTERS env var to URL filter code (Simon Wilkinson)
         - Fixed url code to avoid unnecessary generation of symlinks
           (Bruce R. Lewis)
         - Added support for HTTP/1.1 servers to url code
           (Simon Wilkinson)
         - Fixed end of header detection bug in url code (Paul Johnson)
         - Fixed temporary file closing bug in url code (Bruce R. Lewis)
         - Altered Makefiles so realclean removes config.cache
           (Simon Wilkinson)
         - Altered robots.txt handling so it can cope with errors in server
           robots files. Also some performance improvements. (Ed Knowles)
         - Added support for user specified User-Agent strings to the
           robots.txt code (Simon Wilkinson)
         - Changed method of comparing User-Agent string against robots.txt
           file - now correct behaviour as per the robots specification
           (Simon Wilkinson)
         - Fixed bug where robots.txt code left file pointer unclosed
           (Ed Knowles)
         - Fixed bug where paths were compared case-insensitively in the
           robots.txt code. (Simon Wilkinson)
         - Altered Gatherer code so it converts ~ and %7E to %7e in
           Local-Mapping strings (Simon Wilkinson)
         - Added Locale option to Gatherer (Simon Wilkinson)
         - Added html32.dtd - the official Wilbur DTD
         - Fixed usage of fork in ps2txt wrapper (Peter Scott)
         - Fixed Harvest script so the RunGatherd scripts it creates include
           correct PATH (Simon Wilkinson)
         - Fixed Rainbow summariser so it rm -r's instead of unlinking
         - Altered files so they will compile under NetBSD
           (Martin Hamilton)
         - Altered configure scripts for NetBSD support (Dave Beckett)
         - Upgraded included version of glimpse to 4.0
         - Fixed broker so the port picked for glimpseindex is less than
           30000 (Peter Scott)
         - Added support for NOT operator to broker glimpse interface code
         - Fixed signal handling bug in glimpseserver (Simon Wilkinson)
         - Added warning for object truncated result sets to
           BrokerQuery.pl.cgi
         - Added nph-search CGI for displaying results "as they come"
           (Bruce R. Lewis)
         - Altered query cgis to escape quotes in $html_query
           (Simon Wilkinson)
         - Added explanation of the Log-Key directive to broker.conf
           (Dave Beckett)
         - Fixed bug in the broker lex file so searches containing an
           apostrophe will work

 Changes to v1.4.pl2:

         - Added "robots.txt" support to the gatherer enumeration.
         - Added "prefix =" to components/*/Makefile.
         - Changed Gopher timeout to 120 seconds.
         - Changed HTTP-Query byurl pattern to be any URL with a question
           mark.
         - Added HARVEST_NOT_VISITED_LOG env var to httpenum.
         - Added support for Glimpse-based broker to limit the number of
           matched lines per object.
         - Protect single quotes in Gatherer-Name.
         - Updated html-mcom.dtd
         - Changed BrokerQuery.pl to not use a tmpfile, and to sort the
           results by number of matched lines.
         - Fixed HTTP authentication to work with Netscape server, and
           support encoding spaces as RFC1738 escapes.
         - Fixed gatherd timeout bug (caused by eliminating the DNS
           mismatch warning).
         - Fixed Carriage-Return substitution in RTF.sum.
         - Changed SGML.sum to not do "word wrap" on very large strings.
         - Fixed HTML-lax.sum to turn carriage returns to spaces.
         - Added --body-text option to HTML-lax.sum and in the comments of
           HTML.sum.
         - Fixed SGML.sum to NOT rewrite correct DOCTYPE declarations.
         - Changed our log() function to be called Log().
         - Changed handling of depth between enum programs.
         - Changed broker connect timeout to happen only after some data
           has been read.
         - Added 'Access-Delay' to gatherer.cf. Now adds delay for LeafNode
           URLs.
         - Fixed 'gather' bug when reading binary data in non-compressed
           mode.
         - Configure: upgraded to v2.7
         - BrokerQuery.pl.cgi: protect special characters in a Broker name.
         - Essence: Print [L] for URLs where local mapping succeeds.
         - Updated the User's Manual.

 Changes between release v1.4.pl1 (November 17, 1995) and v1.4

 - Changes to the Gatherer
         - Fixed NULL BASE URL coredump bug in HTMLurls
         - Fixed Gatherer to make Top-Directory set Lib-Directory value
           also (like the manual says it does in section 4.6.1).
         - Fixed essence and SGML.sum to look in multiple lib dirs. Look
           first in Lib-Directory if set, otherwise in
           $HARVEST_HOME/lib/gatherer.

 - Changes to the Broker
         - Added <sys/select.h> for compiling on AIX.
         - Changed BrokerQuery.pl.cgi to send the query to the broker
           before opening the tmpfile. If there is a delay in opening the
           tmpfile the broker query could time out.
         - Fixed potential coredump in Log_rotate() due to large local
           array.

 ##############################################################################

 Changes between release v1.4 (November 10, 1995) and v1.3

 - Changes to the Gatherer:
         - Added symbolic link loop detection to httpenum.
         - Added a GIF image summarizer (GIFImage.sum), requires netpbm.
           The GIFImage type is still in the Essence stoplist by default.
         - Added 'C' version of ftpget.
         - Added ability to rewrite the SOIF template URL with Essence
           post-processing. Could be used to gather file:// URLs and have
           them exported as http:// URLs.
         - Added the ability to specify a program to generate root/leaf
           URLs.
         - Fixed select() timeouts to POSIX semantics.
         - Fixed SGML summarizer to give error if input is empty.
         - Fixed a Makefile to actually build and install HTML-lax.sum.
         - Fixed liburl problem with AFS. Must *copy* files into the
           cache-liburl directory.
         - Fixed News gathering: If 'newsget.pl' exits non-zero, close the
           NNTP server socket.
         - Fixed newsget.pl with a major rewrite.
         - Fixed 'fileenum' to use URLs and not always return
           file://hostname/.
         - Fixed gatherd bug where child process would remove parent's
           gatherd.pid file.
         - Changed NewsArticle.sum TTL to 7 days by default.
         - Changed Essence unnesting to occur in individual directories.
         - Removed confusing gatherd DNS mismatch warning message.

 - Changes to the Broker:
         - Added #Restart-Index-Server command to broker admin command set.
         - Added error logging and debugging in Glimpse inline query code.
         - Fixed select() timeouts to POSIX semantics.
         - Fixed Glimpse minor malloc problems.
         - Fixed the broker on Linux; needs unbuffered input from gather
           process.
         - Fixed broker query language bug for high-bit (international)
           characters.
         - Changed Broker to allow specifically setting GlimpseServer_Port
           again; if not set, port is chosen randomly.
         - Changed BrokerAdmin.cgi to use unbuffered output.
         - Changed Glimpse macros CLEANUP and RETURN to be functions.
         - Changed broker admin/LOG to log FQDN instead of IP address.
         - Removed glimpse version ambiguities in Glimpse/index.c.
         - Removed getpeername() call in the broker; get address from
           accept().

 - Changes to the Cache:
         - The cache has been moved to a separate distribution.

 - Miscellaneous Changes
         - Don't link with -lmalloc on Solaris.
         - Fixed User Manual and FAQ inconsistencies.

 ##############################################################################

 Changes between release v1.3 (September 7, 1995) and v1.3.beta:

 - Changes to the Broker:
         - Added support for auto-validation from the HSR which
           includes a description.html file, RunUpdate program
           for each new Broker.

 - Changes to the Cache:
         - Added support to dynamically toggle debug level via USR1 and
           USR2.
         - Fixed dnsserver parsing numeric addresses.
         - Added patches for FreeBSD.
         - Changed source_ping to off by default.
         - Added optional code for 'local_ip' line in cached.conf.
           Addresses given as 'local_ip' will be retrieved directly,
           without sending any probe packets.
         - Added 'TIMEOUT_DIRECT' as a new kind of entry in
           cache_hierarchy.log.

 - Changes to the Gatherer:
         - Added LMT.gdbm to liburl to keep last-modified-timestamps.
         - Added support for using BASE element in HTML enumeration.
         - Added support for HTML-3.0 DTD.
         - Added support for Netscape DTD.
         - Added support for HotJava DTD.
         - Added old HTML.sum as HTML-lax.sum.
         - Added MacBinHex as a supported nested type in essence.
         - Changed gatherd to die when its data directory gets removed.
         - Fixed bug: repeated HTTP redirected URLs (with help from
           glenn@rockie.nsc.com)

 - Miscellaneous Changes
         - Incorporated fixes for FreeBSD port from ted@oz.plymouth.edu.
         - Incorporated fixes for Ultrix port from
           dsr@lns598.lns.cornell.edu.

 ##############################################################################
 Changes between release v1.3.beta (August 7, 1995) and v1.2:

 - Changes to the Broker:
         - Upgraded to Glimpse 3.0.
         - Improved and updated WAIS, Inc. support to use version 2.1.1.
         - Added support for Verity VDK as backend indexer/searcher.
         - Added support for GRASS GIS as spatial database.
         - Added support for PLS, Inc. PLWeb as backend indexer/searcher.
         - Added IP numbers for incoming requests to log information.
         - Added support for displaying individual SOIF attributes via WWW.
         - Added 'Uniqify' command to Broker; keeps most current object
           of duplicate URLs.
         - Added security and name lookup to BrokerQuery.pl.
         - Added support for Glimpse inline queries.
         - Added error message to report incorrect WWW installation.
         - Added some support for Internationalization in the Broker
         - Added support for automatic validation by HSR.
         - Removed need for 'gzip' in the Broker.
         - Changed BrokerQuery.pl to try multiple entries from Brokers.cf
         - Changed broker to read queries with a timeout.  Very long queries
           can get segmented by TCP.
         - Fixed bug with matching Description attributes.
         - Fixed bug with Glimpse regular expression detection.
         - Fixed bug in CreateBroker -- wrong default Gatherer port number.

 - Changes to the Cache:
         - Added persistent disk storage across cached reboots.
         - Added IP-based access control.
         - Added setting of the TTL based on URL regular expressions.
         - Added more sophisticated setting of the TTL based on HTTP
           headers.
         - Added more statistics information.
         - Added support for logging using the common httpd logfile format.
         - Added support for HEAD HTTP request method.
         - Added support for user-configurable periodic garbage collection.
         - Added support for user-configurable stoplist.
         - Added support for WAIS proxy'ing (from Edward Moy, Xerox PARC).
         - Added support for quick aborting when client drops connection,
           cached stops immediately.  Useful for slow network links.
         - Added high/low water marks for disk storage.
         - Added 'source_ping' to cached.conf.
         - Added 'dns_children' to cached.conf.
         - Added -z to force a cached to discard (zap) its disk storage.
         - Added logging of ftpget.pl failures (exit codes and signals).
         - Added Expires timestamp to cache log
         - Improved error messages for DNS name lookup failures.
         - Improved performance of LRU replacement policy.
         - Improved performance for generating statistics.
         - Increased listen(2) socket queue size to 50 or max of OS.
         - Removed all Tcl code.
         - Cleaned memory allocation and management.
         - Cleaned up and updated cached.conf.
         - Cleaned up debugging output.
         - Changed default low watermark to 60%.
         - Changed trace mail into cached.conf option.
         - Changed algorithm for time estimations using echo ports.
         - Changed dnsserver to try gethostbyname(3) again sometimes
         - Fixed bugs with URL interpretation.
         - Fixed bugs with internal IPcache memory management.
         - Fixed bug with DNS lookups on IP numbers.
         - Fixed bug with not finding 'dnsserver'.
         - Fixed bug with hard timeouts in select loop.
         - Fixed bug with some platforms needing strdup().
         - Fixed bug with ftpget.pl not including MIME content-type for
           unknown filename extensions.
         - Fixed bug with ftpget.pl not parsing ls output correctly
           (wasn't matching dashes in user/group names).
         - Fixed copyright messages in source code.
         - Fixed realloc() bug for concurrent object access.
         - Fixed bug when neighbors specified and dns_servers != 3.
         - Fixed bug with new hash tables when deleting from table as it is
           being traversed.
         - Fixed various minor bugs.

 - Changes to the Gatherer:
         - Added ability to pass enumerated URLs through an external
           filter program.  Allows very specific selection of URLs to
           further enumerate.
         - Added -background flag to the Gatherer; does export work in bg.
         - Added IP-based filtering (regular expressions) in host-filter
         - Added Post-processing of summaries to Essence
         - Added 'gather' check for 'gzip' before setting compression
           option.
         - Added username/password support for HTTP retrievals
         - Changed gatherer to remove cache-liburl directory after a
           successful gather session.
         - Fixed bug: Infinite loops in 'enum' on Invalid URLs
         - Fixed bug: HTTP headers not parsed from slow servers
         - Improved URL parsing; support for username/password in FTP urls.

 - Miscellaneous Changes
         - Upgraded autoconf 'configure' scripts to v2.4.
         - liburl: better handling of relative URLs.
         - liburl retrieval programs abort very large transfers (at 10
           Mbytes)
         - Fixed bug with subscribing to harvest-users mailing list.

 ##############################################################################
 Changes between release v1.2 (April 3, 1995) and v1.1:

 - Changes to the Broker:
         - Major performance improvements to the collector interface.
         - Added fast, efficient internal Gatherer ID management.
         - Added support for clients requesting attributes with #attribute.
         - Added support for log file rotation, and terse logging.
         - Added support for #operation in query manager interface.
         - Cleaned up the log file format.
         - Cleaned up the administrative interface.
         - Cleaned up the UNIX file system-based storage manager.
         - Fixed major bug with WAIS support.
         - Fixed file descriptor leaks in glimpseserver when the index
           contained files that had since been deleted.
         - Fixed bug with overflowing lines from glimpse.
         - Fixed bug with hostname initialization.
         - Fixed memory leak with the Description-Tag attribute matching.
         - Fixed various minor bugs.

 - Changes to the Cache:
         - Added httpd accelerator support.
         - Added IP number logging.
         - Added setuid() to a user when cached is run as root.
         - Added support for HTTP servers that die abruptly.
         - Added client_timeout which places a hard limit on the life
           of incoming connections on the ascii port, or on outgoing
           HTTP or Gopher clients.
         - Cleaner implementation for retrieving FTP URLs via ftpget.pl.
         - Tries to write cached.pid file in same directory as cached.conf.
         - Changed FTP support to sacrifice correct HTTP headers for
           dramatically decreased latency for large FTP objects.
         - Fixed ftpget.pl -htmlify to determine directory vs. file
           correctly and send HTTP header as soon as possible.
         - Fixed rare core dump during HTTP xfers.
         - Fixed how the error messages are printed.
         - Better support for larger file descriptor tables.
         - Debug levels 0 and 1 now have timestamps logged.
         - Cleaned and updated defaults for cached.conf.
         - When run as root and doing setuid(), cached will change its
           current directory to its swap directory. The swap directory is
           almost certainly writable by cached, so if it crashes it can
           write a core file there.
         - Minor modification of store error message.
         - Remote client connection resets are handled as soft error.
         - Strip an extra \r\n from MIME.
         - Hierarchy log (yet another log, but it's optional).
         - Periodically hunts for zombie processes.
         - Added more information to the stat interface.
         - Cleaned up info data for improved parsability/readability.

 - Changes to the Gatherer:
         - Added support to follow HTTP redirection pointers.
         - Added support for $http_proxy environment variable in liburl.
         - Added support for summarizing SGML data.
         - Added better support for summarizing TeX data.
         - Added support for summarizing RTF and MIF data, using Rainbow
           software provided by EBT, which we make available in our new
           components distribution
         - Added support for summarizing WordPerfect 5.1 data.
         - Changed HTML summarizing to use SGML summarizer, providing more
           easily customizable results
         - Added support for local filesystem gathering for NNTP.
         - Improved incremental gathering support, and integrated the
           support into the Essence program (removed dbcheck program).
         - Added support for "fake" MD5 generation per SOIF object on
           external presentation unnesting streams (exploders) --
           permits incremental gathering on data generated by an Exploder.
         - Added --memory-efficient to Essence to trade time for memory
           efficiency; this helps users who have limited memory resources
           but are dealing with large SOIF objects.
         - Added --confirm-host to Essence for explicit host DNS validation.
         - Added --max-refresh to Essence to limit refreshing activity.
         - RootNode enumerators generate RFC 1738 escaped URLs.
         - Improved performance of SOIF parsing.
         - Fixed bug in locating gzip in gatherd.
         - Fixed bug in the unnesting commands in Essence.
         - Fixed bug with HTTP/1.0 requests; now sends encoded URIs for GETs.
         - Fixed ftp.pl for Solaris.  Wasn't setting PF_INET correctly.

 - Changes to the Replicator:
         - Updated with USC's version from 3/15/95

 - Changes to the User's Manual:
         - Added sections for new plug'n'play components: standard,
           SGML, HTML, MIF, RTF, WordPerfect 5.1.
         - Updated support policy.
         - Added clarification in Local Gathering section.
         - Added clarification in RootNode enumeration section.
         - Added clarification on Gatherer/Broker information flow.
         - Added clarification for some cached internals.
         - Added section on upgrading from v1.1 to v1.2.
         - Added discussion about httpd_accel for cached.
         - Updated info about software for the replicator section.
         - Updated numerous facts to v1.2.
         - Reorganized essence/content extraction customization section.
         - Added description of SGML summarizing and components distribution
           (including Rainbow software for MIF and RTF formats)
         - Added more troubleshooting comments to all sections.
         - Added more detail to cache and replication sections, including
           discussions of httpd-accelerator, CreateReplica, and some of the
           performance and failure-mode characteristics of the cache.
         - Cleared up inaccuracies and unclear passages in the Gatherer
           RootNode specification section.
         - Added notes about user-contributed software.
         - Added index entries for all programs in appendices.
         - Other minor changes.

 - Miscellaneous changes:
         - Reorganized the source tree to support plug'n'play components.

 ##############################################################################
 Changes between release v1.1 (February 17, 1995) and v1.1.beta.v2:

 - Changes to the Broker:
         - Added a leading protocol version header for the result set.
         - Added support for query flags during Broker-to-Broker collections.
         - Added support for limiting the lifetime of glimpse queries.
         - Fixed major bugs in Broker-to-Broker collections.
         - Fixed major bugs with deleting Registry entries during initial build.
         - Fixed memory leaks and file descriptor mgmt bugs in glimpseserver.
         - Fixed bug with -L in glimpseserver.
         - Fixed bug that increased the size of structured glimpse indexes.
         - Fixed bugs in the administrative interface and WAIS support.
         - Fixed core dump when searching the Registry during collections.
         - Fixed display SOIF links flag in BrokerQuery.pl.
         - Fixed .cgi pgms so that httpd kills them cleanly after user abort.
         - Changed glimpseserver and broker so that they will not block
           longer than 15 seconds while waiting for an incoming connection.
           This prevents SunOS from blindly swapping out the process.
         - Optimized so that a full glimpseindex will only happen if more
           than 10% of the objects have changed.
         - Added some more logging output.
         - Fixed various minor bugs.

 - Changes to the Cache:
         - Added Gopher->HTML support. For the Mosaic proxy, you'll need to
                 set gopher_proxy http://cache.server:3128/
           instead of
                 set gopher_proxy gopher://cache.server:3128/
         - Fixed bug with HTML-ify FTP directories using ftpget.pl.
         - Fixed bug with hierarchical caching when refreshing.
         - Fixed bogus client error message.
         - Improved cached error messages.

 - Changes to the Gatherer:
         - Generates the 'Description' attribute whenever possible.
         - Fixed bug in the expiring of objects from the PRODUCTION database.
         - Fixed bug in httpenum that wasn't cleaning up correctly.
         - Fixed newsenum to obey URL-Max limit.
         - Improved the Mail summarizer.
         - Improved the USENET support, added NewsArticle and NewsGroup.
         - Improved gatherd to speed up SEND-UPDATE timestamp computation.
         - Improved preparation for the Gatherer's database to be exported.
         - Purify'd Essence to remove memory leaks.

 - Changes to the User's Manual:
         - Updated the section on the Broker's Collection.conf file.
         - Updated many minor points.
         - Improved HTML version of the manual, by upgrading latex2html pgm.

 - Miscellaneous changes:
         - Fixed problems with Solaris' socket.ph for Perl programs.

 ##############################################################################
 Changes between release v1.1.beta.v2 (February 3, 1995) and v1.1.beta:

 - Changes to the Broker:
         - Major performance improvements while doing collections.
         - Uses the customizable BrokerQuery.pl for the WWW interface.
         - Fixed major bugs in Broker-to-Broker transfers.
         - Fixed minor bug in collections that caused unnecessary indexing.
         - Cleaned and improved the information that is logged to broker.out.
         - Changed broker to run cleanly as a daemon by disconnecting from
           the controlling terminal.
         - glimpseserver now prints its error messages correctly.
         - Fixed various minor bugs.

 - Changes to the Cache:
         - Fixed core dump bug when cached is heavily loaded.
         - Improved error messages.

 - Changes to the Gatherer:
         - Site enumeration filter is based on host:port, and better argv
           processing for 'Gatherer' (fixes by "Albert Dvornik"
           <bert@MIT.EDU>).
         - Major performance improvements while preparing databases.
         - Fixed Gatherer to change to Top-Directory before running.
         - Fixed Gatherer to write dummy index.html files in data/ and tmp/.
         - Fixed bug in HTTP enumeration to only extract links from HTML.
         - Fixed various minor bugs.

 - Changes to the User's Manual:
         - Added detailed appendix on Harvest software layout and programs.
         - HTML version of the manual now contains the local copy of the icons.
         - Added section on customizing BrokerQuery.pl.
         - Fixed example for Filters during RootNode enumeration.
         - Added a search interface to the User's Manual using a Broker.
         - Updated index.

 - Miscellaneous changes:
         - Improved log output format to be more readable.
         - Added HP-UX port/fixes from Chris Dalton (crd@hplb.hpl.hp.com).

 ##############################################################################
 Changes between release v1.1.beta (January 26, 1995) and v1.0:

 - Changes to the Broker:
         - Upgraded to Glimpse 2.1 which includes glimpseserver.
         - Added faster, more memory-efficient internal Registry lookups.
         - Added support for switching the indexing subsystem at run-time.
         - Added a statistics generator for the Broker.
         - Fixed BrokerQuery.cgi so that the rejection message from the
           Broker while it's doing indexing works all of the time.
         - Fixed Broker bug that would cause the Broker to hang sometimes
           on a pclose() after doing a collection with the gather command.
         - Immediately denies outside connections during a collection,
           indexing, or other administrative operations.
         - Improved the HTML result set generated by BrokerQuery.
         - Pointers to content summaries in the result set are now optional.
         - Changed /brokers to /Harvest/brokers, etc.
         - Limit the time that the Glimpse search engine runs for a query.
         - Added Query.cgi which can be used to support Broker replicas.
         - Added support for minimal bookkeeping from Gatherer.
         - Fixed problems with the Broker's cleaning, added compress Registry.
         - Fixed problems with the Broker's updating of objects.
         - Fixed BrokerQuery syntax error message to point to queryhelp.html.
         - Fixed BrokerRestart for Replicator interface.
         - Fixed WWW interface to work with any document root.
         - Fixed various minor bugs.

 - Changes to the Cache:
         - Fixed serious hierarchical cache bug.
         - New error messages. HTTP/1.0 compliant.
         - Nuke If-Modified-Since to work with Netscape.
         - Non-blocking DNS lookup using dnsserver program.
         - New config parameter, cache_dns_program.
         - Removed Tcl library binaries - have a precompiled version of
           Harvest.
         - Fixed stat for outgoing message.
         - Use multiple directories for on-disk swap storage.

 - Changes to the Gatherer:
         - Added flexible support for specifying a Gatherer's workload.
         - Added support for gathering through the local file system.
         - Added support for USENET URLs.
         - Added INFO command to Gatherer for statistics.
         - Added support for generating minimal bookkeeping attributes.
         - Improved HTTP/1.0 support for MIME headers and Last-Modified
           headers.
         - Fixed bug with 'gather' that caused 'gunzip' decompression to fail.
         - Made automatic keyword generation, and local disk cache maximum
           size a run-time flag.
         - Added a SOIF parser in Perl.
         - Changed HTML URL extractor from HTML.sum to separate program.
         - Fixed Gopher support to have longer read timeout.
         - Consolidated GDBM utilities into the 'gdbmutil' program.
         - Fixed bug with gatherd leaving zombie children.
         - Fixed various minor bugs.

 - Changes to the Replicator:
         - Replaced with USC's Replicator distribution.

 - Changes to the User's Manual:
         - Added a new subsection on Extended RootNode Specifications
         - Added discussion about new Local-Mapping support
         - Fixed various typos and clarified wording in various places
         - Fixed some URLs, and added others
         - Fixed the discussion on using Glimpse with the Broker.
         - Added a new subsection on the Perl SOIF library.
         - Added more descriptions about various system components (e.g., HSR)
         - Added more index entries, and clarified some of the existing entries
         - Added a note about realtime Gatherer updates
         - Added mention of cache RAM requirements
         - Added section on Support Policy and Harvest Team Contact Information
         - Updated copyright/licensing discussion
         - Added a section about the binary-only distribution
         - Changed section names and content at beginning to make it more
           clear and to make more sense with the new installation.
         - Reorganized manual by subsystem
         - Added troubleshooting sections to each subsystem, and shifted
           some stuff into there that had been in other places
         - Expanded section on supported platforms and software needed for
           running/building Harvest
         - Clarified some parts of the ``Querying a Broker'' section
         - Added appendix on Directory layout of installed Harvest software
         - Updated to reflect new httpd reorg
         - Updated default summarizer action list
         - Noted that glimpseserver is now part of the system
         - Added more discussion to replicator section, including a figure

 - Miscellaneous changes:
         - Reorganized Harvest's installed directory structure.
         - Integrated port to AIX 3.2 and AIX cc by greving@dv.go.dlr.de.
         - Integrated port to HP-UX A.09.03 by steff@csc.liv.ac.uk.
         - Integrated port to IRIX 5.3 by leclerc@ai.sri.com.
         - Integrated port to Linux 1.1.59 by hardy@cs.colorado.edu.
         - Integrated port/fixes to HP-UX 09.03 and HP ANSI C compiler A.09.69
           by crd@hplb.hpl.hp.com.
         - Changed all Perl scripts to work under Perl 4.x or 5.0.
         - Try to use vfork rather than fork to save memory when possible.
         - Updated Copyright.

 ##############################################################################
 Changes between release v1.0 (November 7, 1994) and v1.0-beta-1.5:

 - Changes to the Broker:
         - Upgraded Glimpse from version 1.1 to 2.0.
         - Added support for Glimpse 2.0 which allows byte-level indexing,
           limiting result set sizes, arbitrary Boolean queries, and more.
         - Made case insensitive and word matching the default for Glimpse.
         - Improved and updated queryhelp.html and adminhelp.html.
         - Added soifhelp.html to the help suite.
         - Added a reboot-broker tag to the default broker Makefile.
         - Fixed various minor bugs.

 - Changes to the Gatherer:
         - Better HTTP/1.0 support, sends User-Agent and From fields.
         - Fixed a problem with cross-site Gopher RootNode enumeration.
         - Fixed bug in HTTP RootNode enumeration.
         - Generation of unique, sorted keyword list is optional in config.h.
         - Changed Gatherer program to work around Solaris 2.3 Perl 4.036 bug.
         - Fixed various minor bugs.

 - Changes to the Cache:
         - Added support for the Netscape browser.
         - No longer caches /cgi-bin/ URLs.
         - Updated the Tcl/Tk/dpwish pointers for the Cache manager.

 - Changes to the User's Manual:
         - Added an index with over 300 entries.
         - Added a new section about Querying a Broker.
         - Added a new section about common SOIF attribute names.
         - Added a new section on periodic gathering.
         - Added a new section on tuning Glimpse.
         - Added a new section on the WWW interface to the Broker.
         - Added a new section on integrating new search/indexing subsystems
           into the Broker, and gave a detailed interface description.
         - Added more detail to SOIF appendix.
         - Improved and updated the Administrating a Broker subsection.
         - Added more explanation about manual annotations.
         - Folded in content from FAQ.
         - Noted particular usefulness of the Essence-Options variable,
           e.g., for setting --full-text.
         - Added a note to the Customizing the candidate selection step
           subsection that it's particularly useful to do selection based on
           file and URL naming heuristics when gathering remote data,
           because it can avoid retrieving lots of data.
         - Added a note in the subsection on Running a Gatherer that you can
           set MAX_ENUM in src/common/include/config.h, and that in a future
           release of Harvest we will make it possible to set this limit more
           flexibly.  Also noted the robot guidelines.
         - Added an overview of the lib and bin directories for the Gatherer,
           including the defaults and descriptions of each file.
         - Showed RunGatherer and RunGatherd scripts and added discussion
           of how to use them from cron and /etc/rc.local.
         - Added pointer to FAQ on setting up HTTPD in the Broker section.
         - Put the logo on the cover page.

 - Miscellaneous changes:
         - Updated the COPYRIGHT and added it to all appropriate source files.
         - Updated the FAQ, and converted to HTML.
         - Fixed BSD compatibility bug in src/install.sh.

 ##############################################################################
 Changes between release v1.0-beta-1.5 (October 14, 1994) and v1.0-beta-1.4:

 - Added a user manual that is intended to help both novice and advanced
   Harvest users better use the system.  It covers the following topics:

         - Introduction to Harvest (1 page)
         - Subsystem Overview (2 pages)
         - Getting and Installing the Harvest software (1 page)
         - Making Basic Use of Harvest (3 pages)
         - Advanced Features of Harvest (5 pages)
         - References (1 page)
         - Appendix on The Summary Object Interchange Format (SOIF) (3 pages)
         - Appendix on Essence Summarizer Actions (1 page)
         - Appendix on Gatherer Examples (6 pages)
         - Appendix on Broker's Query Manager and Collector Interface (2 pages)

 - Changes to the Broker:
         - Improved Broker installation, and added the CreateBroker program
           that automatically creates and configures a Harvest Broker based
           on a brief Question & Answer session with the user.
         - Improved the Mosaic interface to be more user-friendly.
         - Added support for duplicate removal based on MD5 values.
         - Made Query Manager and Administrative interface more extensible.
         - Rewrote the Broker registry to improve performance and readability.
         - Added the dumpregistry command to view the Broker's registry.
         - Added the test-broker command for simple testing of a Broker.
         - Added support for wais-8-b5, freeWAIS, commercial WAIS, and Nebula.
         - Cleaned up the admin.html and query.html files.
         - Cleaned up much of the code to make it more extensible.
         - Fixed bug in the registry garbage collection.
         - Fixed major memory leak bugs.
         - Fixed various minor bugs.

 - Changes to the Cache:
         - Started using icp version_id 2 of the protocol.
         - Improved support for OSF/1 v2.0 on 64-bit DEC Alphas.
         - Added password support for administrative interface.
         - Fixed bug with FTP "Parent Directory", and cleaned up HTML for dirs.
         - Fixed various major bugs with hierarchical caching.
         - Fixed various minor bugs.

 - Changes to the Gatherer:
         - Added support for generating a sorted, unique keyword attribute,
           based on the Description, Partial-Text, or Keywords attribute.
         - Added an "allow only these types" in the Candidate Selection step.
         - Added stub Exploder type to help users use the unnesting step.
         - Gatherer automatically creates a gatherd.cf file if needed.
         - Fixed major gatherd bug that caused.
         - Fixed various minor bugs and memory leaks.

 - Changes to the Replicator:
         - Working on instrumenting the code to measure performance.
         - Fixed various bugs.

 ##############################################################################

 ChangeLog,v 1.214 1996/02/01 06:35:49 duane Exp