packages icon
Dec 18, 2015

Parallel BZIP2 v1.1.13 - by: Jeff Gilchrist <pbzip2@compression.ca>
Available at:  http://compression.ca/

This is the README for pbzip2, a parallel implementation of the
bzip2 block-sorting file compressor.  The output of this version
should be fully compatible with bzip2 v1.0.2 or newer (ie: anything
compressed with pbzip2 can be decompressed with bzip2).

pbzip2 is distributed under a BSD-style license.  For details,
see the file COPYING.


1. HOW TO BUILD -- UNIX

Type `make'.  This builds the pbzip2 program and dynamically
links to the libbzip2 library.  You should ensure that you have
at least libbzip2 1.0.5 or newer installed as it contains some
important security bug fixes.

If you do not have libbzip2 installed on your system, you should
first go to http://www.bzip.org/ and install it.

Debian users need the package "libbz2-dev".  If you want to
install a pre-built package on Debian, run the following command:
'apt-get update; apt-get install pbzip2'

If you would like to build pbzip2 with a statically linked
libbzip2 library, download the bzip2 source from the above site,
compile it, and copy the libbz2.a and bzlib.h files into the
pbzip2 source directory.  Then type `make pbzip2-static'.

Note: This software has been tested on Linux (Intel, Alpha), 
Solaris (Sparc), HP-UX, Irix (SGI), and Tru64/OSF1 (Alpha).


2. HOW TO BUILD -- Windows

On Windows, pbzip2 can be compiled using Cygwin.

If you do not have libbzip2 installed on your system, you should
first go to http://www.bzip.org/ and install it.

Cygwin can be found at:  http://www.cygwin.com/
From a Cygwin shell, go to the directory where the pbzip2 source
files are located and type `make'.  This builds the pbzip2
program and dynamically links to the libbzip2 library.

If you would like to build pbzip2 with a statically linked
libbzip2 library, download the bzip2 source from the above site,
compile it, and copy the libbz2.a file into the pbzip2 source
directory.  Then type `make pbzip2-static'.


3. DISCLAIMER

   I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
   USE OF THIS PROGRAM, HOWSOEVER CAUSED.

   DO NOT COMPRESS ANY DATA WITH THIS PROGRAM UNLESS YOU ARE
   PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER SMALL, THAT THE
   DATA WILL NOT BE RECOVERABLE.

* Portions of this README were copied directly from the bzip2 README
  written by Julian Seward.

  
4. PBZIP2 DATA FORMAT

You should be able to compress files larger than 4GB with pbzip2.

Files that are compressed with pbzip2 are broken up into pieces and
each individual piece is compressed.  This is how pbzip2 runs faster
on multiple CPUs since the pieces can be compressed simultaneously.
The final .bz2 file may be slightly larger than if it was compressed
with the regular bzip2 program due to this file splitting (usually
less than 0.2% larger).  Files that are compressed with pbzip2 will
also gain considerable speedup when decompressed using pbzip2.

Files that were compressed using bzip2 will not see speedup since
bzip2 pacakages the data into a single chunk that cannot be split
between processors.  pbzip2 will still be able to decompress these
files, but it will be slower than if the .bz2 file was created
with pbzip2.

A file compressed with bzip2 will contain one compressed stream of
data that looks like this:
[-----------------------------------------------------]

Data compressed with pbzip2 is broken into multiple streams and each
stream is bzip2 compressed looking like this:
[-----|-----|-----|-----|-----|-----|-----|-----|-----]

If you are writing software with libbzip2 to decompress data created
with pbzip2, you must take into account that the data contains multiple
bzip2 streams so you will encounter end-of-stream markers from libbzip2
after each stream and must look-ahead to see if there are any more
streams to process before quitting.  The bzip2 program itself will
automatically handle this condition.


5. USAGE

The pbzip2 program is a parallel version of bzip2 for use on shared
memory machines.  It provides near-linear speedup when used on true
multi-processor machines and 5-10% speedup on Hyperthreaded machines.
The output is fully compatible with the regular bzip2 data so any
files created with pbzip2 can be uncompressed by bzip2 and vice-versa.
The default settings for pbzip2 will work well in most cases.  The
only switch you will likely need to use is -d to decompress files and 
-p to set the # of processors for pbzip2 to use if autodetect is not
supported on your system, or you want to use a specific # of CPUs.
Note, that if you are using a large number of CPUs you may wish to
lower your default stack size setting (with the -S switch or ulimit)
to reduce the amount of memory each thread uses.

Usage:  pbzip2 [-1 .. -9] [-b#cdfhklm#p#qrS#tvVz] <filename> <filename2> <filenameN>

Switches:
 -b#    		  Where # is block size in 100k steps (default 9 = 900k)
 -c, --stdout	  Output to standard out (stdout)
 -d,--decompress  Decompress file
 -f,--force		  Force, overwrite existing output file
 -h,--help		  Print this help message
 -k,--keep		  Keep input file, do not delete
 -l,--loadavg 	  Load average determines max number processors to use
 -m#			  Where # is max memory usage in 1MB steps (default 100 = 100MB)
 -p#    		  Where # is the number of processors (default: autodetect)
 -q,--quiet		  Quiet mode (default)
 -r,--read		  Read entire input file into RAM and split between processors
 -S#              Child thread stack size in 1KB steps (default stack size if unspecified)
 -t,--test		  Test compressed file integrity
 -v,--verbose	  Verbose mode
 -V     		  Display version info for pbzip2 then exit
 -z,--compress	  Compress file (default)
 -1,--fast ... -9,--best 	Set BWT block size to 100k .. 900k (default 900k).
 --ignore-trailing-garbage=# Ignore trailing garbage flag (1 - ignored; 0 - forbidden)


Example:  pbzip2 myfile.tar

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2".  It will use the autodetected # of processors
(or 2 processors if autodetect not supported) with the default file
block size of 900k and default BWT block size of 900k.


Example:  pbzip2 -b15vk myfile.tar

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2".  It will use the autodetected # of processors
(or 2 processors if autodetect not supported) with a file block
size of 1500k and a BWT block size of 900k.  Verbose mode will be
enabled so progress and other messages will be output to the display
and the file myfile.tar will not be deleted after compression is 
finished.


Example:  pbzip2 -p4 -r -5 myfile.tar second*.txt

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2".  It will use 4 processors with a BWT block
size of 500k.  The file block size will be the size of "myfile.tar"
divided by 4 (# of processors) so that the data will be split
evenly among each processor.  This requires you have enough RAM for
pbzip2 to read the entire file into memory for compression.  pbzip2
will then use the same options to compress all other files that
match the wildcard "second*.txt" in that directory.


Example:  pbzip2 -l myfile.tar

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2".  It will use the autodetected # of processors
(or 2 processors if autodetect not supported) if the 1 minute load
average is less than 0.5, otherwise it will select the maximum # of
processors so that only idle processors are used for pbzip2.  If the
system has 4 processors and the load average is 2.00, then pbzip2
will use 2 processors to compress the data. 


Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: tar -c directory_to_compress/ | pbzip2 -c > myfile.tar.bz2

These examples will compress the data being given to pbzip2 via pipe 
from TAR into the compressed file "myfile.tar.bz2".  It will use the
autodetected # of processors (or 2 processors if autodetect not
supported) with the default file block size of 900k and default BWT
block size of 900k.  TAR is collecting all of the files from the
"directory_to_compress/" directory and passing the data to pbzip2 as
it works.


Example:  pbzip2 -d -m500 myfile.tar.bz2

This example will decompress the file "myfile.tar.bz2" into the
decompressed file "myfile.tar".  It will use the autodetected # of
processors (or 2 processors if autodetect not supported).  It will use
a maximum of 500MB of memory for decompression.
The switches -b, -r, -t, and -1..-9 are not valid for decompression.