packages icon



 FASTA/TFASTA/FASTX/TFASTXv3(1)               FASTA/TFASTA/FASTX/TFASTXv3(1)
                                    local



 NAME
      fasta3, fasta3_t - scan a protein or DNA sequence library for similar
      sequences

      tfasta3, tfasta3_t - compare a protein sequence to a DNA sequence
      library, translating the DNA sequence library `on-the-fly'.

      fastx3, fastx3_t  - compare a DNA sequence to a protein sequence
      database, comparing the translated DNA sequence in forward and reverse
      frames.

      tfastx3, tfastx3_t  - compare a protein sequence to a DNA sequence
      database, calculating similarities with frameshifts to the forward and
      reverse orientations.

      fasty3, fasty3_t  - compare a DNA sequence to a protein sequence
      database, comparing the translated DNA sequence in forward and reverse
      frames.

      tfasty3, tfasty3_t  - compare a protein sequence to a DNA sequence
      database, calculating similarities with frameshifts to the forward and
      reverse orientations.

      ssearch3, ssearch3_t - compare a protein or DNA sequence to a sequence
      database using the Smith-Waterman algorithm.


 DESCRIPTION
      Release 3.x of the FASTA package provides a modular set of sequence
      comparison programs that can run on conventional single processor
      computers or in parallel on multiprocessor computers. Seven different
      programs - fasta3, fastx3, fasty3, tfastx3, tfasty3, tfasta3, and
      ssearch3 - are currently available.

      All of the comparison programs share a set of basic command line
      options; additional options are available for individual comparison
      functions.

      The fasta3_t, fastx3_t, fasty3_t, tfasta3_t, tfastx3_t, tfasty3_t and
      ssearch3_t programs are threaded versions that will run in parallel on
      Digital Equipment, Sun, and SGI multiprocessor computers.


 Options for all comparison functions
      These versions of the fasta programs have been modified to accept a
      query sequence from the unix "stdin" data stream.  This makes it much
      easier to use fasta3 and its relatives as part of a WWW page. To
      indicate that stdin is to be used, use "-" or "@" as the query
      sequence file name.  "@" is prefered, because one can then specify a
      subset of the query sequence to be used, e.g:




                                    - 1 -         Formatted:  April 26, 2024






 FASTA/TFASTA/FASTX/TFASTXv3(1)               FASTA/TFASTA/FASTX/TFASTXv3(1)
                                    local



      cat query.aa | fasta3 -q @:50-150 s

      would search the 's' database with residues 50-150 of query.aa.  FASTA
      cannot automatically detect the sequence type (protein vs DNA) when
      "stdin" is used, so the '-n' option is required for DNA.

      -a # "SHOWALL" option attempts to align all of both sequences in FASTA
           and SSEARCH.

      -B   Do not show bit score, show z-score instead.

      -b # number of best scores to show (must be < -E cutoff)

      -d # number of best alignments to show ( must be < -e cutoff)

      -D   turn on debugging mode.  Enables checks on sequence alphabet that
           cause problems with tfastx3, tfasty3, tfasta3.

      -E # expectation value upper limit for score and alignment display.
           Defaults are 10.0 for FASTA3 and SSEARCH3 protein searches, 5.0
           for translated DNA/protein comparisons, and 2.0 for DNA/DNA
           searches.

      -F # expectation value lower limit for score and alignment display.
           -F 1e-6 prevents library sequences with E()-values lower than
           1e-6 from being displayed. This allows the use to focus on more
           distant relationships.

      -H   turn off histogram display

      -i   (DNA only) reverse complement the query sequence. (TFASTX)
           compare against only the reverse complement of the library
           sequence.

      -L   report long sequence description in alignments

      -m 0,1,2,3,4,5,6,9,10
           alignment display options

      -n   force query to nucleotide sequence

      -N # break long library sequences into blocks of # residues.  Useful
           for bacterial genomes, which have only one sequence entry.  -N
           2000 works well for well for bacterial genomes.

      -O file
           send output to file

      -q/-Q
           quiet option; do not prompt for input




                                    - 2 -         Formatted:  April 26, 2024






 FASTA/TFASTA/FASTX/TFASTXv3(1)               FASTA/TFASTA/FASTX/TFASTXv3(1)
                                    local



      -r "+n/-m"
           values for match/mismatch for DNA comparisons. +n is used for the
           maximum positive value and -m is used for the maximum negative
           value. Values between max and min, are rescaled, but residue
           pairs having the value -1 continue to be -1.

      -R file
           save all scores to statistics file (previously -r file)

      -s name
           specify substitution matrix.  BLOSUM50 is used by default;
           PAM250, PAM120, and BLOSUM62 can be specified by setting -s P120,
           P250, or BL62.  With this version, many more scoring matrices are
           available, including BLOSUM80 (BL80), and MDM10, MDM20, MDM40
           (Jones, Taylor, and Thornton, 1992 CABIOS 8:275-282; specified as
           -s M10, -s M20, -s M40). Alternatively, BLASTP1.4 format scoring
           matrix files can be specified.  BL80, BL62, and P120 are scaled
           in 1/2 bit units; all the other matrices use 1/3 bit units.  DNA
           scoring matrices can also be specified with the "-r" option.

      -S   treat lower case letters in the query or database as low
           complexity regions that are equivalent to 'X' during the initial
           database scan, but are treated as normal residues for the final
           alignment display.  Statistical estimates are based on the 'X'ed
           out sequence used during the initial search. Protein databases
           (and query sequences) can be generated in the appropriate format
           using John Wooton's "pseg" program, available from
           ftp://ncbi.nlm.nih.gov/pub/seg/pseg.  Once you have compiled the
           "pseg" program, use the command:

           pseg database.fasta -z 1 -q  > database.lc_seg

      -t # Translation table - tfasta3, fastx3, tfastx3, fasty3, and tfasty3
           now support the BLAST tranlation tables.  See
           http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c/.

      -T # (threaded, parallel only) number of threads or workers to use
           (set by default to 4 at compile time).

      -w # line width for similarity score, sequence alignment, output.

      -W # context width (residues outside the aligned region) for fasta3,
           ssearch3 alignments

      -x # score used for matches to 'X' or 'N', overriding the value
           specified in the scoring matrix.

      -X "#,#"
           offsets query, library sequence for numbering alignments





                                    - 3 -         Formatted:  April 26, 2024






 FASTA/TFASTA/FASTX/TFASTXv3(1)               FASTA/TFASTA/FASTX/TFASTXv3(1)
                                    local



      -z # Specify statistical calculation. Default is -z 1, which uses
           regression against the length of the library sequence. -z 0
           disables statistics.  -z 2 provides maximum likelihood estimates
           for lambda and K, censoring the 250 lowest and 250 highest
           scores. -z 3 uses Altschul and Gish's statistical estimates for
           specific protein BLOSUM scoring matrices and gap penalties. -z
           4,5: an alternate regression method.  -z 6 uses a composition
           based maximum likelihood estimate based on the method of Mott
           (1992) Bull. Math. Biol. 54:59-75.  -z 11,12,14,15,16: compute
           the regression against scores of randomly shuffled copies of the
           library sequences.  Twice as many comparisons are performed, but
           accurate estimates can be generated from databases of related
           sequences. -z 11 uses the -z 1 regression strategy, etc.

      -Z db_size
           Set the apparent database size used for expectation value
           calculations (used for protein/protein FASTA and SSEARCH, and for
           FASTX, FASTY, TFASTX, and TFASTY).

 Options for FASTA3,TFASTA3,FASTX3,FASTY3,TFASTX3,TFASTY3
      -1   Sort by "init1" score.

      -3   (TFASTA3, TFASTX/Y3 only) use only forward frame translations

      -A   force Smith-Waterman alignment for output.  Smith-Waterman is the
           default for protein sequences and FASTX3, but not for TFASTA3 or
           DNA comparisons with FASTA3.

      -c # threshold for band optimization

      -f # penalty for the first residue in a gap

      -g # penalty for additional residues in a gap

      -h # (FASTX3, TFASTX3, FASTY3, TFASTY3 only) penalty for a frameshift
           between two codons.

      -j # (FASTY3, TFASTY3 only) penalty for a frameshift within a codon.

      -y # Width for band optimization; by default 16 for DNA and protein
           ktup=2; 32 for protein ktup=1;

 SSEARCH3 command line options
      -f # penalty for first residue in a gap

      -g # penalty for additional residues in a gap

 Environment variables:
      FASTLIBS
           location of library choice file (-l FASTLIBS)




                                    - 4 -         Formatted:  April 26, 2024






 FASTA/TFASTA/FASTX/TFASTXv3(1)               FASTA/TFASTA/FASTX/TFASTXv3(1)
                                    local



      SMATRIX
           default scoring matrix (-s SMATRIX)

      SRCH_URL
           the format string used to define the option to re-search the
           database.

      REF_URL
           the format string used to define the option to lookup the library
           sequence in entrez, or some other database.


 AUTHOR
      Bill Pearson
      wrp@virginia.EDU







































                                    - 5 -         Formatted:  April 26, 2024