1. BLAST

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.

Example uses:
• Looking for species
• Looking for domains
• Looking at phylogeny
• Mapping DNA to a known chromosome
• Annotations
• Searching for homology

Working

BLAST identifies homologous sequences using a heuristic method that first finds short matches between two sequences; thus, the method does not take the entire sequence space into account. After the initial matches are found, BLAST attempts to build local alignments starting from them.
This also means that BLAST does not guarantee the optimal alignment, so some sequence hits may be missed. To find optimal alignments, the Smith-Waterman algorithm should be used. In the following, the BLAST algorithm is described in more detail.
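The seed-and-extend idea described above can be illustrated with a small Python sketch. It is a simplification, not the actual BLAST implementation: it uses exact word matches as seeds and a plain ungapped extension, and the function names (`find_seeds`, `extend_seed`, `best_local_hit`), the word size `k=3`, the match/mismatch scores and the `x_drop` cutoff are all assumptions chosen for illustration.

```python
def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where an exact k-letter word matches."""
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Extend one seed without gaps in both directions.

    Each direction stops once the running score falls x_drop below the
    best score seen so far (illustrative values, not BLAST defaults).
    """
    def walk(qi, sj, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + k, j + k, +1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    total = k * match + left_score + right_score
    # score, query start, query end (exclusive)
    return total, i - left_len, i + k + right_len


def best_local_hit(query, subject, k=3):
    """Best ungapped hit over all seeds, or None if no word matches."""
    hits = [extend_seed(query, subject, i, j, k)
            for i, j in find_seeds(query, subject, k)]
    return max(hits, default=None)


print(best_local_hit("ACGTTGCA", "TTACGTTGCATT"))  # (8, 0, 8)
```

Because only positions that share a word are ever examined, an alignment that contains no exact word match is never found, which is why the heuristic can miss hits that Smith-Waterman would report.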
How to use BLAST:

The Advanced BLAST page has many parameters that you can adjust, and the outcome of a BLAST search will depend on the parameters you use.

A) Types of BLAST programs

There are five different BLAST programs:
• BLASTP compares an amino acid query sequence against a protein sequence database.
• BLASTN compares a nucleotide query sequence against a nucleotide sequence database.
• BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
• TBLASTN compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
• TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

B) Subject databases

There are many databases to use as subject databases. One of the most commonly used is the nr database, a collection of “non-redundant” sequences from GenBank and other sequence databanks.

C) Sequence input

BLAST accepts the sequence in FASTA format or as an accession number or GI number.

D) Parameters to adjust

2. FASTA

FASTA (pronounced "fast-ay") is a heuristic for finding significant matches between a query string q and a database string d.
FASTA’s general strategy is to find the most significant diagonals in the dot-plot or dynamic programming matrix. The performance of the algorithm is influenced by a word-size parameter k, usually 6 for DNA and 2 for amino acids. The algorithm consists of four phases.

3. ClustalW

Multiple alignments of protein sequences are important tools in studying sequences. The basic information they provide is the identification of conserved sequence regions. This is very useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins, and in identifying new members of protein families. Sequences can be aligned across their entire length (global alignment) or only in certain regions (local alignment).
This is true for pairwise and multiple alignments. Global alignments generally have to introduce gaps (representing insertions/deletions), while local alignments can often avoid them by aligning only the regions between gaps. ClustalW2 is a fully automatic program for global multiple alignment of DNA and protein sequences. The alignment is progressive and takes sequence redundancy into account. Trees can also be calculated from multiple alignments. The program has some adjustable parameters with reasonable defaults.
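The difference between global and local alignment can be made concrete with a short pairwise sketch in Python. This is not ClustalW's own code: it is a textbook dynamic programming recurrence (Needleman-Wunsch style for global, Smith-Waterman style for local), and the function name `alignment_score` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best pairwise alignment score by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, only the best-scoring sub-region counts.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # A local alignment may restart anywhere at score 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


long_seq, short_seq = "TTTGATTACATTT", "GATTACA"
print(alignment_score(long_seq, short_seq))              # -5 (six gaps are unavoidable)
print(alignment_score(long_seq, short_seq, local=True))  # 7  (the shared region only)
```

With these scores the global alignment is penalised for the six unmatched flanking letters, whereas the local alignment reports only the shared region and needs no gaps at all.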
ClustalW is a general-purpose global multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen by viewing cladograms or phylograms.

4. RASMOL

RasMol is a molecular graphics program intended for the visualization of proteins, nucleic acids and small molecules. The program is aimed at display, teaching and generation of publication-quality images. RasMol runs on a wide range of architectures and operating systems, including Microsoft Windows, Apple Macintosh, UNIX and VMS systems.
UNIX and VMS versions require an 8-, 24- or 32-bit colour X Windows display (X11R4 or later). The X Windows version of RasMol provides optional support for a hardware dials box and accelerated shared memory communication (via the XInput and MIT-SHM extensions) if available on the current X Server. The program reads in a molecule coordinate file and interactively displays the molecule on the screen in a variety of colour schemes and molecule representations. Currently available representations include depth-cued wireframes, ‘Dreiding’ sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom labels and dot surfaces. Up to 5 molecules may be loaded and displayed at once. Any one or all of the molecules may be rotated and translated.
Supported input file formats include Protein Data Bank (PDB), Tripos Associates’ Alchemy and Sybyl Mol2 formats, Molecular Design Limited’s (MDL) Mol file format, Minnesota Supercomputer Center’s (MSC) XYZ (XMol) format, CHARMm format, CIF format and mmCIF format files. If connectivity information is not contained in the file, it is calculated automatically.
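To give a feel for what "calculating connectivity" from coordinates can involve, here is a minimal Python sketch. It is not RasMol's code, and the notes above do not say which rule RasMol applies. The sketch assumes the fixed column layout of PDB ATOM/HETATM records and a single distance cutoff of 1.9 Å; the function names `parse_pdb_atoms` and `infer_bonds` and the cutoff value are hypothetical, chosen only for illustration.

```python
import math


def parse_pdb_atoms(pdb_text):
    """Read atom name and x, y, z from ATOM/HETATM records.

    Assumes the fixed PDB column layout: atom name in columns 13-16,
    x, y, z in columns 31-38, 39-46 and 47-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            atoms.append((name, (x, y, z)))
    return atoms


def infer_bonds(atoms, cutoff=1.9):
    """Pair up atoms whose centres lie within `cutoff` angstroms.

    A single cutoff is an assumption for illustration; it is too crude
    for hydrogens and for heavier elements.
    """
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            if math.dist(atoms[i][1], atoms[j][1]) <= cutoff:
                bonds.append((i, j))
    return bonds
```

The all-pairs loop is quadratic in the number of atoms, which is fine for small molecules; for large structures a spatial grid would normally be used so that only neighbouring atoms are compared.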