Dec 18, 2015

Parallel BZIP2 v1.1.13 - by: Jeff Gilchrist <pbzip2@compression.ca>
Available at: http://compression.ca/

This is the README for pbzip2, a parallel implementation of the bzip2
block-sorting file compressor. The output of this version should be
fully compatible with bzip2 v1.0.2 or newer (i.e., anything compressed
with pbzip2 can be decompressed with bzip2).

pbzip2 is distributed under a BSD-style license. For details, see the
file COPYING.


1. HOW TO BUILD -- UNIX

Type `make'. This builds the pbzip2 program and dynamically links to
the libbzip2 library. You should ensure that you have libbzip2 1.0.5 or
newer installed, as it contains some important security bug fixes.

If you do not have libbzip2 installed on your system, you should first
go to http://www.bzip.org/ and install it.

Debian users need the package "libbz2-dev". If you want to install a
pre-built package on Debian, run the following command:
'apt-get update; apt-get install pbzip2'

If you would like to build pbzip2 with a statically linked libbzip2
library, download the bzip2 source from the above site, compile it, and
copy the libbz2.a and bzlib.h files into the pbzip2 source directory.
Then type `make pbzip2-static'.

Note: This software has been tested on Linux (Intel, Alpha), Solaris
(Sparc), HP-UX, Irix (SGI), and Tru64/OSF1 (Alpha).


2. HOW TO BUILD -- Windows

On Windows, pbzip2 can be compiled using Cygwin.

If you do not have libbzip2 installed on your system, you should first
go to http://www.bzip.org/ and install it.

Cygwin can be found at: http://www.cygwin.com/

From a Cygwin shell, go to the directory where the pbzip2 source files
are located and type `make'. This builds the pbzip2 program and
dynamically links to the libbzip2 library.

If you would like to build pbzip2 with a statically linked libbzip2
library, download the bzip2 source from the above site, compile it, and
copy the libbz2.a file into the pbzip2 source directory. Then type
`make pbzip2-static'.


3. DISCLAIMER

I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE USE OF
THIS PROGRAM, HOWSOEVER CAUSED.

DO NOT COMPRESS ANY DATA WITH THIS PROGRAM UNLESS YOU ARE PREPARED TO
ACCEPT THE POSSIBILITY, HOWEVER SMALL, THAT THE DATA WILL NOT BE
RECOVERABLE.

* Portions of this README were copied directly from the bzip2 README
  written by Julian Seward.


4. PBZIP2 DATA FORMAT

You should be able to compress files larger than 4GB with pbzip2.

Files that are compressed with pbzip2 are broken up into pieces and
each individual piece is compressed. This is how pbzip2 runs faster on
multiple CPUs, since the pieces can be compressed simultaneously. The
final .bz2 file may be slightly larger than if it had been compressed
with the regular bzip2 program due to this file splitting (usually less
than 0.2% larger). Files that are compressed with pbzip2 will also gain
considerable speedup when decompressed using pbzip2.

Files that were compressed using bzip2 will not see speedup, since
bzip2 packages the data into a single chunk that cannot be split
between processors. pbzip2 will still be able to decompress these
files, but it will be slower than if the .bz2 file was created with
pbzip2.

A file compressed with bzip2 will contain one compressed stream of data
that looks like this:

[-----------------------------------------------------]

Data compressed with pbzip2 is broken into multiple streams and each
stream is bzip2 compressed, looking like this:

[-----|-----|-----|-----|-----|-----|-----|-----|-----]

If you are writing software with libbzip2 to decompress data created
with pbzip2, you must take into account that the data contains multiple
bzip2 streams, so you will encounter end-of-stream markers from
libbzip2 after each stream and must look ahead to see if there are any
more streams to process before quitting. The bzip2 program itself will
automatically handle this condition.


5. USAGE

The pbzip2 program is a parallel version of bzip2 for use on shared
memory machines.
It provides near-linear speedup when used on true multi-processor
machines and 5-10% speedup on Hyperthreaded machines. The output is
fully compatible with the regular bzip2 data so any files created with
pbzip2 can be uncompressed by bzip2 and vice-versa.

The default settings for pbzip2 will work well in most cases. The only
switches you will likely need are -d to decompress files and -p to set
the # of processors for pbzip2 to use if autodetect is not supported on
your system or you want to use a specific # of CPUs. Note that if you
are using a large number of CPUs, you may wish to lower your default
stack size setting (with the -S switch or ulimit) to reduce the amount
of memory each thread uses.

Usage: pbzip2 [-1 .. -9] [-b#cdfhklm#p#qrS#tvVz] <filename> <filename2> <filenameN>

Switches:
 -b#              Where # is block size in 100k steps (default 9 = 900k)
 -c,--stdout      Output to standard out (stdout)
 -d,--decompress  Decompress file
 -f,--force       Force, overwrite existing output file
 -h,--help        Print this help message
 -k,--keep        Keep input file, do not delete
 -l,--loadavg     Load average determines max number processors to use
 -m#              Where # is max memory usage in 1MB steps (default 100 = 100MB)
 -p#              Where # is the number of processors (default: autodetect)
 -q,--quiet       Quiet mode (default)
 -r,--read        Read entire input file into RAM and split between processors
 -S#              Child thread stack size in 1KB steps (default stack size if unspecified)
 -t,--test        Test compressed file integrity
 -v,--verbose     Verbose mode
 -V               Display version info for pbzip2 then exit
 -z,--compress    Compress file (default)
 -1,--fast ... -9,--best
                  Set BWT block size to 100k .. 900k (default 900k).
 --ignore-trailing-garbage=#
                  Ignore trailing garbage flag (1 - ignored; 0 - forbidden)


Example: pbzip2 myfile.tar

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2".
It will use the autodetected # of processors (or 2 processors if
autodetect is not supported) with the default file block size of 900k
and default BWT block size of 900k.


Example: pbzip2 -b15vk myfile.tar

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2". It will use the autodetected # of processors (or
2 processors if autodetect is not supported) with a file block size of
1500k and a BWT block size of 900k. Verbose mode will be enabled so
progress and other messages will be output to the display, and the file
myfile.tar will not be deleted after compression is finished.


Example: pbzip2 -p4 -r -5 myfile.tar second*.txt

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2". It will use 4 processors with a BWT block size
of 500k. The file block size will be the size of "myfile.tar" divided
by 4 (# of processors) so that the data will be split evenly among the
processors. This requires that you have enough RAM for pbzip2 to read
the entire file into memory for compression. pbzip2 will then use the
same options to compress all other files that match the wildcard
"second*.txt" in that directory.


Example: pbzip2 -l myfile.tar

This example will compress the file "myfile.tar" into the compressed
file "myfile.tar.bz2". It will use the autodetected # of processors (or
2 processors if autodetect is not supported) if the 1 minute load
average is less than 0.5. Otherwise, it will select the maximum # of
processors so that only idle processors are used for pbzip2. If the
system has 4 processors and the load average is 2.00, then pbzip2 will
use 2 processors to compress the data.


Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: tar -c directory_to_compress/ | pbzip2 -c > myfile.tar.bz2

These examples will compress the data being given to pbzip2 via pipe
from TAR into the compressed file "myfile.tar.bz2".
It will use the autodetected # of processors (or 2 processors if
autodetect not supported) with the default file block size of 900k and
default BWT block size of 900k. TAR is collecting all of the files from
the "directory_to_compress/" directory and passing the data to pbzip2
as it works.


Example: pbzip2 -d -m500 myfile.tar.bz2

This example will decompress the file "myfile.tar.bz2" into the
decompressed file "myfile.tar". It will use the autodetected # of
processors (or 2 processors if autodetect not supported). It will use a
maximum of 500MB of memory for decompression. The switches -b, -r, -t,
and -1..-9 are not valid for decompression.
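

Example: reading multi-stream data from your own software

As a rough sketch of the multi-stream handling described in section 4
(PBZIP2 DATA FORMAT), the following Python snippet uses the standard
bz2 module in place of direct libbzip2 calls. The helper name
decompress_multistream and the sample data are illustrative assumptions
and not part of pbzip2. The input is simulated by concatenating
independently compressed streams; it is not produced by pbzip2 itself.

```python
import bz2


def decompress_multistream(data: bytes) -> bytes:
    """Decompress a buffer holding one or more concatenated bzip2 streams.

    Hypothetical helper for illustration; not part of pbzip2 or libbzip2.
    """
    output = []
    while data:
        # A decompressor object handles exactly one stream.
        decomp = bz2.BZ2Decompressor()
        output.append(decomp.decompress(data))
        if not decomp.eof:
            # Input ended before the end-of-stream marker was seen.
            raise ValueError("truncated bzip2 stream")
        # Look ahead: any bytes left after the end-of-stream marker
        # are treated as the start of the next stream.
        data = decomp.unused_data
    return b"".join(output)


# Simulate pbzip2-style output: each piece is its own bzip2 stream.
pieces = [b"first piece " * 1000, b"second piece " * 1000, b"third piece " * 1000]
multistream = b"".join(bz2.compress(piece) for piece in pieces)

restored = decompress_multistream(multistream)
print(restored == b"".join(pieces))
```

The loop stops only when no bytes remain after an end-of-stream marker,
which is the look-ahead step mentioned in section 4. Software calling
libbzip2 directly would need an equivalent loop around its own
decompression calls.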