
The MG system is a suite of programs for compressing and indexing text
and images. Most of the functionality implemented in the suite is as
described in the book ``Managing Gigabytes:  Compressing and Indexing
Documents and Images'', I.H. Witten, A.  Moffat, and T.C. Bell; Van
Nostrand Reinhold, New York, 1994, ISBN 0-442-01863-0; US $54.95; call
1 (800) 544-0550 to order.

These features include:

-- text compression using a Huffman-coded semi-static word-based scheme
-- two-level context-based compression of bi-level images
-- FELICS lossless compression of gray-scale images
-- combined lossy/lossless compression for textual images
-- indexing algorithms for large volumes of text in limited main memory
-- index compression
-- a retrieval system that processes Boolean and ranked queries
-- an X windows interface to the retrieval system
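
As an illustration of the first feature, the sketch below shows the
general idea of a semi-static, word-based Huffman scheme: one pass over
the text gathers token frequencies, and a second pass encodes with the
resulting code.  It is not MG's implementation.  The function names,
the single shared model for words and non-words, and the use of bit
strings instead of packed bytes are all simplifying assumptions.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    # Split into alternating word and non-word tokens; joining the
    # tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def build_code(freqs):
    # Standard Huffman construction: repeatedly merge the two least
    # frequent subtrees.  Returns {token: bit string}.
    tiebreak = count()  # keeps heap entries comparable on equal counts
    heap = [(f, next(tiebreak), {tok: ""}) for tok, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # A one-symbol alphabet still needs one bit per symbol.
        return {tok: "0" for tok in heap[0][2]}
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {t: "0" + b for t, b in c0.items()}
        merged.update({t: "1" + b for t, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def compress(text):
    tokens = tokenize(text)
    code = build_code(Counter(tokens))       # pass 1: gather statistics
    bits = "".join(code[t] for t in tokens)  # pass 2: encode
    return code, bits


def decompress(code, bits):
    decode = {b: t for t, b in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in decode:  # the code is prefix-free, so a match is final
            out.append(decode[cur])
            cur = ""
    return "".join(out)
```

Calling compress() on a string returns the code table and the encoded
bit string; decompress() with the same table restores the original
text.  The scheme is semi-static because the table is fixed after the
first pass and must be stored alongside the compressed data.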

As one example, a collection of 2 Gb of text (1,700,000 documents) can
be indexed (on a SPARC 10 Model 512) in about four hours and compressed
in a further four hours to make a database that in total occupies less
than 800 Mb, or 40% of the original size. This includes a full index to
every word and number in the original text. Boolean queries such as
``managing AND gigabytes'' run in a few seconds, and ranked queries of
30--50 terms are evaluated in 10--30 seconds.

Details of these methods and further performance results appear in the
MG book.
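
To make the Boolean query quoted above concrete, here is a minimal
sketch of how an AND query can be answered from an inverted index.  It
is only an illustration under stated assumptions: the index is an
uncompressed in-memory dictionary, whereas MG keeps a compressed index
on disk, and the function names are hypothetical rather than part of
the MG programs.

```python
import re
from collections import defaultdict


def build_index(docs):
    # Map each lower-cased word to the sorted list of document numbers
    # that contain it.
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(re.findall(r"\w+", text.lower())):
            index[term].append(doc_id)
    return index


def intersect(a, b):
    # Merge-style intersection of two sorted lists of document numbers.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index, terms):
    # Process the shortest lists first so intermediate results stay small.
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result
```

For example, with the index built from a list of document strings,
boolean_and(index, ["managing", "gigabytes"]) returns the numbers of
the documents that contain both words.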

The MG system comes with ABSOLUTELY NO WARRANTY; for details see the

Instructions on how to build and install mg are in the file INSTALL.


For copyright reasons the stemmer used in this distribution of MG is
not the same as the one illustrated in Figure 3.8 on page 108 of the MG
book.  This means that the numbers generated by the command ``mgstat
alice'' will not match those in Figure A.1 on page 394.
Another stemmer was initially written as a simple stopgap for version
1.0.  For mg-1.1 it was replaced by a stemmer based on the Lovins
stemming algorithm.

The output format of ``mgstat'' has changed since Figure A.1 (page 394)
was prepared. The same information is displayed but formatted
differently.


The current version is mg-1.2, September 1995. The changes from earlier 
versions are listed in the file MODIFICATIONS. This can be accessed
with mg by building a database using ``mgbuild mods'' and can also be
accessed from the mg web page (see below). 

The mg-1.2 extensions include:

-- Source modifications for use of GNU's autoconf.

The mg-1.1 extensions include:

-- A new highlighting mode.
   The output mode ``hilite'' will highlight the query terms in the
   retrieved text documents. The variable ``hilite_style'' can be set
   to ``bold'' or ``underline''. It works best with the pager
   ``less''.  A suitable .mgrc would include:
	.set pager less
	.set mode hilite
	.set hilite_style bold

-- A web site containing manual pages, documentation, and a
   mgquery demo page (utilising cgi scripts). 
   One of these pages, ``about_mg.html'', is included in this
   distribution.

-- A revised mg_get script which uses a .mg_getrc file to map
   specific collection names to filter types.  (Modifications by Bruce
   McKenzie). See mg_get.1 for more details.

-- Code to perform merging of existing databases. This code
   was created by Shane Hudson and is documented in the mgmerge.README
   file found in the docs subdirectory.  This code is maintained by
   Shane Hudson.

-- Revised man pages, including some new entries (thanks to Nelson
   Beebe). See mg.1, mgintro.1, mgintro++.1.

-- A real (rather than toy) stemmer.


Please refer to "README.port".

The MG development is largely the result of research collaboration
between:

        Tim C. Bell            <>
        Ian Witten             <>
        Alistair Moffat        <>
        Justin Zobel           <>

The bulk of the programming work has been carried out by:

        Stuart Inglis          <>
        Craig Nevill-Manning   <>
        Neil Sharman           <>
        Tim Shimmin            <>

In addition to these, the following people have contributed to the
development of the MG software:

        Lachlan Andrew         <>
        Gary Eddy              <>
        Hugh Emberson          <>
        Kerry Guise            <>
        Shane Hudson           <>
        Linh Huynh             <>
        Bohdan S. Majewski     <>
        Bruce McKenzie         <>

In addition to these, the following people have submitted bug reports
and suggestions/fixes:

        Rex Barzee             <>
        Tim A.H. Bell          <>
        Tim C. Bell            <>
        Nelson Beebe           <>
        Rodney Brown           <>
        Rok Sosic              <>
        Carl Staelin           <>

Development of the MG system was supported by the Australian Research
Council; the Universities of Melbourne, Waikato, Canterbury, and
Calgary; RMIT; and the Collaborative Information Technology Research
Institute (Melbourne).


Send bug reports to <> and <>.
Back-traces from gdb are always welcome but not mandatory :-)


A bibliography of MG-related research work appears in the files and
MG.Bibliography.bib on the ftp site and is accessible through the Web
page.