
1. BLAST

The Basic Local Alignment Search
Tool (BLAST) finds regions of local similarity between sequences. The program
compares nucleotide or protein sequences to sequence databases and calculates
the statistical significance of matches. BLAST can be used to infer functional
and evolutionary relationships between sequences as well as help identify members
of gene families.

Example

BLAST can be used for:

• Identifying the species a sequence comes from

• Locating domains

• Establishing phylogeny

• Mapping DNA to a known chromosome

• Annotation

• Searching for homology

Working

BLAST identifies homologous sequences using a heuristic method that first finds
short matches between two sequences; the method therefore does not take the
entire sequence space into account. BLAST then attempts to build local
alignments starting from these initial matches. Because of this, BLAST does not
guarantee the optimal alignment, and some sequence hits may be missed. To find
optimal alignments, the Smith-Waterman algorithm should be used. In the
following, the BLAST algorithm is described in more detail.
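The seed-and-extend idea described above can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's actual implementation, which is considerably more elaborate: the function names, the word size of 3, the match/mismatch scores and the drop-off threshold are all assumptions chosen for the example, and the extension is ungapped.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a short word matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3, match=1, mismatch=-2, x_drop=3):
    """Extend a seed in both directions without gaps.

    Extension in a direction stops when the running score falls x_drop
    below the best score seen so far (illustrative values, not BLAST defaults).
    Returns (score, query_start, query_end, subject_start), end exclusive.
    """
    def walk(qi, sj, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_gain, right_len = walk(i + word_size, j + word_size, 1)
    left_gain, left_len = walk(i - 1, j - 1, -1)
    score = word_size * match + left_gain + right_gain
    return score, i - left_len, i + word_size + right_len, j - left_len


def best_hit(query, subject, word_size=3):
    """Return the highest-scoring extended seed, or None if no word matches."""
    hits = [extend_seed(query, subject, i, j, word_size)
            for i, j in find_seeds(query, subject, word_size)]
    return max(hits) if hits else None
```

Because only regions around exact word matches are ever examined, an alignment that contains no exact word of the chosen length is never found, which is the sense in which the heuristic can miss hits that Smith-Waterman would report.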

How to use BLAST:

The Advanced BLAST page has many parameters which you can adjust, and the
outcome of a BLAST search will depend on the parameters you use.

A) Types of BLAST programs

There are five different BLAST programs:

• BLASTP compares an amino acid query sequence against a protein sequence
database.

• BLASTN compares a nucleotide query sequence against a nucleotide sequence
database.

• BLASTX compares the six-frame conceptual translation products of a nucleotide
query sequence (both strands) against a protein sequence database.

• TBLASTN compares a protein query sequence against a nucleotide sequence
database dynamically translated in all six reading frames (both strands).

• TBLASTX compares the six-frame translations of a nucleotide query sequence
against the six-frame translations of a nucleotide sequence database.
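The six-frame translation used by BLASTX, TBLASTN and TBLASTX can be illustrated with a short Python sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names and the frame labels (+1 to +3, -1 to -3) are choices made for this example and are not taken from BLAST itself.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels to protein strings ('*' marks a stop)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For the input "ATGGCCTAA", frame +1 reads ATG GCC TAA and gives "MA*", while frame -1 reads the reverse complement TTAGGCCAT and gives "LGH".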

B) Subject Databases

There are many databases that can be used as subject databases. One of the most
commonly used is the nr database, a collection of “non-redundant” sequences
from GenBank and other sequence databanks.

C) Sequence input

BLAST accepts the query sequence in FASTA format, or as an accession number or
GI number.
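As a reminder of what FASTA format looks like, each record is a header line starting with ">" followed by one or more lines of sequence. The Python sketch below reads such text into (header, sequence) pairs; the function name and the sample records in the usage note are hypothetical and serve only as illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without the leading '>'; sequence lines
    belonging to one record are joined into a single string.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Called on the text ">seq1 example", "ACGT", "TTGA" (one item per line), the generator yields the single pair ("seq1 example", "ACGTTTGA").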

D) Parameters to adjust

2. FASTA

 

FASTA (pronounced fast-ay) is a
heuristic for finding significant matches between a query string q and a
database string d. FASTA’s general strategy is to find the most significant
diagonals in the dot-plot or dynamic programming matrix. The performance of the
algorithm is influenced by a word-size parameter k, usually 6 for DNA and 2 for
amino acids. The algorithm consists of four phases as follows
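The idea of looking for significant diagonals can be sketched in Python as below. This is only an illustration of counting exact k-word matches per diagonal of the dot-plot, under the assumption that a diagonal is identified by the offset between the database position and the query position; it is not the complete FASTA algorithm, and the function names are invented for the example.

```python
from collections import Counter, defaultdict


def diagonal_hits(q, d, k=2):
    """Count exact k-word matches on each diagonal.

    A match of the word at query position i and database position j
    lies on diagonal j - i of the dot-plot.
    """
    positions = defaultdict(list)
    for i in range(len(q) - k + 1):
        positions[q[i:i + k]].append(i)
    hits = Counter()
    for j in range(len(d) - k + 1):
        for i in positions.get(d[j:j + k], []):
            hits[j - i] += 1
    return hits


def best_diagonals(q, d, k=2, top=10):
    """Return the `top` diagonals with the most k-word matches."""
    return diagonal_hits(q, d, k).most_common(top)
```

A smaller k finds more word matches and is more sensitive but slower, while a larger k is faster and more selective, which is why the word size has such an influence on performance.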

3. ClustalW

Multiple alignments of protein sequences are important tools in studying
sequences. The basic information they provide is the identification of
conserved sequence regions. This is very useful in designing experiments to
test and modify the function of specific proteins, in predicting the function
and structure of proteins, and in identifying new members of protein families.
Sequences can be aligned across their entire length (global alignment) or only
in certain regions (local alignment). This is true for pairwise and multiple
alignments. Global alignments need to use gaps (representing
insertions/deletions), while local alignments can avoid them by aligning the
regions between gaps.

ClustalW2 is a fully automatic, general-purpose program for global multiple
alignment of DNA and protein sequences. The alignment is progressive and takes
sequence redundancy into account. Trees can also be calculated from multiple
alignments. The program has some adjustable parameters with reasonable
defaults. It produces biologically meaningful multiple sequence alignments of
divergent sequences: it calculates the best match for the selected sequences
and lines them up so that the identities, similarities and differences can be
seen. Evolutionary relationships can be seen by viewing cladograms or
phylograms.
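To make the notion of a global alignment with gaps concrete, here is a minimal Python sketch of pairwise global alignment by dynamic programming (the Needleman-Wunsch approach). It is not ClustalW's own code and covers only two sequences rather than a progressive multiple alignment; the function name and the scoring values (match 1, mismatch -1, gap -2) are assumptions for illustration.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # prefix of a aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # prefix of b aligned against gaps only

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligning "GATTACA" with "GATACA" under these scores gives a total of 4: six matching positions and one gap, with the gap representing the single-base insertion or deletion that separates the two sequences.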


4. RASMOL

 

RasMol is a molecular graphics program intended for the visualization of
proteins, nucleic acids and small molecules. The program is aimed at display,
teaching and generation of publication-quality images. RasMol runs on a wide
range of architectures and operating systems, including Microsoft Windows,
Apple Macintosh, UNIX and VMS systems. UNIX and VMS versions require an 8-, 24-
or 32-bit colour X Windows display (X11R4 or later). The X Windows version of
RasMol provides optional support for a hardware dials box and accelerated
shared memory communication (via the XInput and MIT-SHM extensions) if
available on the current X Server.

The program reads in a molecule coordinate file and interactively displays the
molecule on the screen in a variety of colour schemes and molecule
representations. Currently available representations include depth-cued
wireframes, ‘Dreiding’ sticks, spacefilling (CPK) spheres, ball and stick,
solid and strand biomolecular ribbons, atom labels and dot surfaces. Up to 5
molecules may be loaded and displayed at once. Any one or all of the molecules
may be rotated and translated.

Supported input file formats include Protein Data Bank (PDB), Tripos
Associates’ Alchemy and Sybyl Mol2 formats, Molecular Design Limited’s (MDL)
Mol file format, Minnesota Supercomputer Center’s (MSC) XYZ (XMol) format,
CHARMm format, CIF format and mmCIF format files. If connectivity information
is not contained in the file, it is calculated automatically.