
September 15, 1991

Source code for the Basic Local Alignment Search Tool (BLAST) family of
sequence database comparison programs, along with some support utilities
and (new) awk scripts, is posted here. A previous major distribution is
archived in its entirety beneath the "pub/blast.old" directory.

Additional source code is necessary to compile and link the BLAST
programs: the pre-release "ncbi", "gish", and "dfa" libraries. Source
code for these libraries is located at the same level in the directory
hierarchy as the "blast" distribution (currently in /pub/ncbi, /pub/gish,
and /pub/dfa).

The file blast.tar.Z is an L-Z compressed UNIX(R) tar archive containing
all of the files splayed beneath the "explode" subdirectory. FTP this
file to your local machine in binary mode, uncompress it, then untar it.
The blast.tar file is the same archive, uncompressed. VMS compress and
tar utilities are posted on this machine in the toolbox/vms_util
directory.

For installation instructions, see the INSTALL file.

Send bug reports or requests for electronic mail distribution to:

    Dr. Warren Gish, gish@ncbi.nlm.nih.gov
        or
    Dr. Stephen Altschul, altschul@ncbi.nlm.nih.gov

    National Center for Biotechnology Information
    National Library of Medicine
    Bldg. 38A Rm 8N-806
    8600 Rockville Pike
    Bethesda, MD 20894-0001
    (301) 496-2475

The people who played a role in bringing this fine software to you:

    Samuel Karlin, Dept. of Mathematics, Stanford Univ., Stanford, CA 94305
    Stephen Altschul, NCBI, NLM, Bethesda, MD 20894
    Webb Miller, Dept. of CS, Penn. State Univ., University Park, PA 16802
    Gene Myers, Dept. of CS, Univ. of Arizona, Tucson, AZ 85721
    Warren Gish, NCBI, NLM, Bethesda, MD 20894
    David Lipman, NCBI, NLM, Bethesda, MD 20894

Brief descriptions of the programs:

    blastp:         compare an amino acid query sequence against a
                    protein sequence database.

    blastn:         compare a nucleotide query sequence against a
                    nucleotide sequence database.

    blastx:         compare a nucleotide query sequence translated in
                    all 6 reading frames (3 on each strand) against a
                    protein sequence database.

    tblastn:        compare an amino acid query sequence against a
                    nucleotide sequence database translated in all 6
                    reading frames.

    blast3:         compare an amino acid query sequence against a
                    protein sequence database to identify statistically
                    significant 3-way sequence alignments (the query
                    sequence plus two database sequences) in which the
                    component pairwise alignments are statistically
                    insignificant.

    setdb:          produce a protein sequence database for use by
                    blastp, blastx, and blast3 from a multi-sequence
                    file in FASTA format.

    pressdb:        produce a nucleotide sequence database for use by
                    blastn and tblastn from a multi-sequence file in
                    FASTA format.

    pam:            generate a PAM matrix of any desired distance (from
                    2 to 511) and scale.

    pir2fasta:      produce a file in FASTA format from one in NBRF
                    PIR(R) format.

    gb2fasta:       produce a file in FASTA format from one in
                    GenBank(R) format.

    sp2fasta:       produce a file in FASTA format from one in
                    SWISS-PROT(R) format.

    memfile:        manage the loading, updating, and dropping of files
                    mapped into shared memory segments.

Other files:

    blast.1:        UNIX style manual page (using nroff's -man macros)
                    describing blastp, blastn, blastx, and tblastn.

    blast.1ps:      PostScript(R) version of blast.1

    blast3.1:       UNIX style manual page describing blast3.

    blast3.1ps:     PostScript version of blast3.1

    pir2fasta.nawk: a new awk script for converting NBRF PIR files into
                    FASTA

    gb2fasta.nawk:  a new awk script for converting GenBank files into
                    FASTA

    dxch.aa:        a sample protein sequence from the PIR in FASTA
                    format

Modification history: this information has been moved into the file
named HISTORY.
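
Illustration: the blastx and tblastn descriptions above refer to
translating a nucleotide sequence "in all 6 reading frames (3 on each
strand)". The short Python sketch below shows what that means. It is not
taken from the BLAST source code and makes no claim about how the
programs implement translation. It assumes the standard genetic code and
an A/C/G/T alphabet, and all names in it (six_frame_translations, the
"+1" to "-3" frame labels, and so on) are hypothetical, chosen only for
this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon. (Assumption: the standard code is wanted.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map hypothetical frame labels '+1'..'+3', '-1'..'-3' to peptides."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

For example, six_frame_translations("ATGGCCTAA")["+1"] is "MA*": the
codons ATG, GCC, and TAA read from the first base of the given strand.
The three "-" frames are read from the reverse complement, TTAGGCCAT.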