Wiley publishing 2007 all rights reserved
Download
1 / 22

- PowerPoint PPT Presentation


  • 213 Views
  • Updated On :

© Wiley Publishing. 2007. All Rights Reserved. Comparing Two Sequences. Learning Objectives. Get the basics about dot plots Know how to interpret the most common patterns in a dot plot Use Dotlet Use Lalign to extract local alignments. Outline. Some reasons for comparing two sequences

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - malaya


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Learning objectives
Learning Objectives

  • Get the basics about dot plots

  • Know how to interpret the most common patterns in a dot plot

  • Use Dotlet

  • Use Lalign to extract local alignments


Outline
Outline

  • Some reasons for comparing two sequences

  • Basic principles of dot-plot comparisons

  • Using Dotlet

  • Making local alignments with Lalign


Why compare two sequences
Why Compare Two Sequences?

  • Database searches are useful for finding homologues

  • Database searches don’t provide precise comparisons

  • More precise tools are needed to analyze the sequences in detail including

    • Dot plots for graphic analysis

    • Local or global alignments for residue/residue analysis

  • The alignment of two sequences is called a pairwise alignment



Some applications of pairwise alignments
Some Applications of Pairwise Alignments

  • Convince yourself two sequences are homologous

  • Identify a shared domain

  • Identify a duplicated region

  • Locate important features such as

    • Catalytic domains

    • Disulphide bridges

  • Compare a gene and its product


What is a dot plot
What Is a Dot Plot ?

  • A dot plot is a graphic representation of pairwise similarity

  • The simplicity of dot plots prevents artifacts

  • Ideal for looking for features that may come in different orders

  • Reveal complex patterns

  • Benefit from the most sophisticated statistical-analysis tool in the universe . . . your brain


Choosing your two sequences
Choosing Your Two Sequences

  • Making pairwise comparisons takes time

  • Use BLAST to rapidly select your sequences

    • More than 70% identity for DNA

    • More than 25% identity for proteins

  • If your sequences are too similar, comparing them yields no useful information


Self comparisons
Self-comparisons

  • Start comparing your sequence with itself

  • You can discover

    • Repeated domains

    • Motifs repeated many times (low complexity)

    • Mirror regions (palindromes) in nucleic acids


What can you analyze with a dot plot
What Can You Analyze with a Dot Plot ?

  • Any pair of sequences

    • DNA

    • Proteins

    • RNA

  • DNA with proteins

    • Dotlet is an appropriate tool

    • To compare full genomes, install the program locally

  • Sequences longer than 1000 symbols are hard to analyze online


Some typical dot plot comparisons
Some Typical Dot-plot Comparisons

  • Divergent sequences where only a segment is homologous

  • Long insertions and deletions

  • Tandem repeats

    • The square shape of the pattern is characteristic of these repeats


Using dotlet
Using Dotlet

  • Dotlet is one of the handiest tools for making dot plots

  • Dotlet is a Java applet

  • Open and download the applet at the following site:

    • www.isrec.isb-sib.ch/java/dotlet

  • Use Firefox or IE (if one doesn’t work, use the other)


Set dotlet parameters
Set Dotlet Parameters

  • Dotlet slides a window along each sequence

  • If the windows are more similar than the threshold, Dotlet prints a dot at their intersection

  • You can control the similarity threshold with the little window on the left

Window size

Window Size

Threshold

Threshold


The dotlet threshold
The Dotlet Threshold

  • Every dot has a score given by the window comparison

  • When the score is

    • Below threshold 1  black dot

    • Between thresholds 1 and 2  grey dot

    • Above threshold 2  white dot

  • The blue curve is the distribution of scores in the sequences

  • The peak  most common score,

    • Most common  less informative

Log curve


Getting your dot plot right
Getting Your Dot Plot Right

  • Window size and the stringency control the aspect of your dot plot

    • Very stringent = clean dot plot, little signal

    • Not stringent enough = noisy dot plot, too much signal

  • Play with the threshold until a usable signal appears


Which size for the window
Which Size for the Window?

  • Long window

    • Clean dot plots

    • Little sensitivity

  • Short window

    • Noisy dot plots

    • Very sensitive

  • The size of the window should be in the range of the elements you are looking for

    • Conserved domains: 50 amino acids

    • Transmembrane segments: 20 amino acids

  • Shorten the window to compare distantly related sequences


Looking at repeated domains with dotlet
Looking at Repeated Domains with Dotlet

  • The square shape is typical of tandem repeats

  • The repeats are not perfect because the sequences have diverged after their duplication


Comparing a gene and its product
Comparing a Gene and Its Product

  • Eukaryotic genes are transcribed into RNA

  • The RNA is then spliced to remove the introns’ sequences

  • It may be necessary to compare the gene and its product

  • Dotlet makes this comparative analysis easy


Aligning sequences
Aligning Sequences

  • Dotlet dot plots are a good way to provide an overview

  • Dot plots don’t provide residue/residue analysis

  • For this analysis you need an alignment

  • The most convenient tool for making precise local alignments is Lalign


Lalign and blast
Lalign and BLAST

  • Lalign is like a very precise BLAST

  • It works on only two sequences at a time

  • You must provide both sequences


Lalign output
Lalign Output

  • Lalign produces an output similar to the alignment section of BLAST

  • The E-value indicates the significance of each alignment

  • Low E-value  good alignment


Going farther
Going Farther

  • If you need to align coding DNA with a protein, try these sites:

    • www.tcoffee.org => protogene

    • coot.embl.de/pal2nal

  • If you need to align very large sequences, try this site:

    • www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

  • If you need a precise estimate of your alignment’s statistical significance, use PRSS

    • The program is available at fasta.bioch.virginia.edu