
Sequence Comparison


strom

Presentation Transcript


  1. Sequence Comparison • Intragenic: self to self; find internal repeating units. • Intergenic: compare two different sequences. • Dotplot: visual alignment of two sequences. • Multiple Sequence Alignment: two or more sequences.

  2. Overview • Why compare sequences • Homology vs. identity/similarity • DotPlots • Scoring • Match • Mismatch • Gap penalty • Global vs. local alignment • Do the results make biological sense?

  3. Why Align Sequences • Identify conserved sequences

  4. Why Align Sequences • Identify conserved sequences • Identify elements that repeat in a single sequence.

  5. Why Align Sequences • Identify conserved sequences • Identify elements that repeat in a single sequence. • Identify elements conserved between genes.

  6. Why Align Sequences • Identify conserved sequences • Identify elements that repeat in a single sequence. • Identify elements conserved between genes. • Identify elements conserved between species.

  7. Why Align Sequences • Identify conserved sequences • Identify elements that repeat in a single sequence. • Identify elements conserved between genes. • Identify elements conserved between species. • Regulatory elements

  8. Why Align Sequences • Identify conserved sequences • Identify elements that repeat in a single sequence. • Identify elements conserved between genes. • Identify elements conserved between species. • Regulatory elements • Functional elements

  9. Underlying Hypothesis?

  10. Underlying Hypothesis? EVOLUTION

  11. Underlying Hypothesis? EVOLUTION Based upon conservation of sequence during evolution we can infer function.

  12. Basic terms: • Similarity: a measurable quantity. • Similarity: applied to proteins using the concept of conservative substitutions. • Identity: expressed as a percentage. • Homology: a specific term indicating relationship by evolution.
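
The distinction between identity and similarity can be made concrete with a small sketch. It assumes the two sequences are already aligned to the same length with "-" marking gaps, and the conservative-substitution groups below are hypothetical, chosen only for illustration; they are not taken from the slides or from any published scoring matrix.

```python
# Hypothetical groups of amino acids treated as conservative substitutions.
# These groupings are an illustrative assumption, not a standard matrix.
CONSERVATIVE_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                       set("DE"), set("ST"), set("NQ")]


def _check_aligned(a, b):
    """Both sequences must be non-empty and aligned to the same length."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")


def percent_identity(a, b):
    """Percent of aligned columns holding the same residue (gaps never match)."""
    _check_aligned(a, b)
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


def percent_similarity(a, b):
    """Percent of columns that are identical or a conservative substitution."""
    _check_aligned(a, b)

    def similar(x, y):
        if x == "-" or y == "-":
            return False
        return x == y or any(x in g and y in g for g in CONSERVATIVE_GROUPS)

    return 100.0 * sum(1 for x, y in zip(a, b) if similar(x, y)) / len(a)


print(percent_identity("KLDE", "RIDD"))
print(percent_similarity("KLDE", "RIDD"))
```

Neither number says anything about homology: identity and similarity are measured, while homology is an inference about shared ancestry.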

  13. Basic terms: • Orthologs: homologous sequences found in two or more species that have the same function (e.g. alpha-hemoglobin).

  14. Basic terms: • Orthologs: homologous sequences found in two or more species that have the same function (e.g. alpha-hemoglobin). • Paralogs: homologous sequences found in the same species that arose by gene duplication (e.g. alpha- and beta-hemoglobin).

  15. Pairwise comparison • Dotplot • All against all comparison. • Every position is compared with every other position.

  16. Pairwise comparison • Dotplot • All against all comparison. • Every position is compared with every other position. • Nucleic acids and proteins have polarity.

  17. Pairwise comparison • Dotplot • All against all comparison. • Every position is compared with every other position. • Nucleic acids and proteins have polarity. • Typically only one direction makes biological sense.

  18. Pairwise comparison • Dotplot • All against all comparison. • Every position is compared with every other position. • Nucleic acids and proteins have polarity. • Typically only one direction makes biological sense. • 5’ to 3’ or amino terminus to carboxyl terminus.

  19. DotPlot • Dotplot: a matrix with one sequence across the top and the other down the side. Put a dot, or 1, wherever there is identity.

  20-25. DotPlot • Dotplot: a matrix with one sequence across the top and the other down the side. Put a dot, or 1, wherever there is identity. Both sequences are GATCT; the dots are added a few at a time across these slides, and the completed plot is:

          G A T C T
        G .
        A   .
        T     .   .
        C       .
        T     .   .
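
The simple dotplot described above (window = 1, exact identity) can be sketched in a few lines. The function names `dotplot` and `render` are hypothetical, introduced only for this illustration.

```python
def dotplot(top, side):
    """Return a matrix with 1 where side[i] is identical to top[j], else 0."""
    return [[1 if s == t else 0 for t in top] for s in side]


def render(top, side):
    """Draw the matrix as text: one sequence across the top, one down the side."""
    matrix = dotplot(top, side)
    lines = ["  " + " ".join(top)]
    for s, row in zip(side, matrix):
        cells = " ".join("." if cell else " " for cell in row)
        lines.append((s + " " + cells).rstrip())
    return "\n".join(lines)


# Self comparison of the sequence used on the slides.
print(render("GATCT", "GATCT"))
```

In a self comparison the main diagonal is always filled; the off-diagonal dots come from the two T residues matching each other.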

  26. Simple plot • Window: size of sequence block used for comparison. In previous example: • window = 1 • Stringency = Number of matches required to score positive. In previous example: • stringency = 1 (required exact match)

  27. Dot Plot • Compare two sequences in every register. • Vary size of window and stringency depending upon sequences being compared. • For nucleotide sequences typically start with window = 21; stringency = 14

  28. DotPlot • WINDOW = 4; STRINGENCY = 2

        GATCGTACCATGGAATCGTCCAGATCA
        GATC                          + (4/4)
         GATC                         - (0/4)
          GATC                        - (0/4)
            GATC                      + (2/4)

  29. This “match” comes from the G and the C: two identities out of the four positions in the window.
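
A minimal sketch of window and stringency scoring, using the sequence and the GATC window from the example above. The function names are hypothetical. A window scores positive when the number of identical positions is at least the stringency.

```python
def window_matches(a, b, i, j, window):
    """Count identical positions between a[i:i+window] and b[j:j+window]."""
    return sum(1 for k in range(window) if a[i + k] == b[j + k])


def windowed_dotplot(a, b, window, stringency):
    """Return (i, j, matches) for every window pair that meets the stringency."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            n = window_matches(a, b, i, j, window)
            if n >= stringency:
                hits.append((i, j, n))
    return hits


seq = "GATCGTACCATGGAATCGTCCAGATCA"
probe = "GATC"

# Offsets 0, 1, 2 and 4 are the rows discussed on the slides.
for offset in (0, 1, 2, 4):
    n = window_matches(seq, probe, offset, 0, 4)
    sign = "+" if n >= 2 else "-"
    print(offset, seq[offset:offset + 4], sign, f"({n}/4)")
```

Raising the stringency toward the window size removes chance matches such as the 2/4 hit, at the cost of missing diverged but genuinely related regions.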

  30. Top 3 Rows

  31. Intragenic Comparison • Rat Groucho Gene

  32. Intergenic Comparison • Rat and Drosophila Groucho Gene

  33. Intergenic comparison • Nucleotide sequence contains three domains.

  34. Intergenic comparison • Nucleotide sequence contains three domains. • 50 - 350 - Strong conservation • Indel places comparison out of register

  35. Intergenic comparison • Nucleotide sequence contains three domains. • 50 - 350 - Strong conservation • Indel places comparison out of register • 450 - 1300 - Slightly weaker conservation

  36. Intergenic comparison • Nucleotide sequence contains three domains. • 50 - 350 - Strong conservation • Indel places comparison out of register • 450 - 1300 - Slightly weaker conservation • 1300 - 2400 - Strong conservation

  37. Groucho • These three coding regions correspond to apparent functional domains of the encoded protein

  38. Scoring Alignments • Quality Score: • Score x for match, -y for mismatch;

  39. Scoring Alignments • Quality Score: • Score x for match, -y for mismatch; • Penalty for: • Creating Gap • Extending a gap

  40. Scoring Alignments • Quality Score: • Quality = [10(match)]

  41. Scoring Alignments • Quality Score: • Quality = [10(match)] + [-1(mismatch)]

  42. Scoring Alignments • Quality Score: • Quality = [10(match)] + [-1(mismatch)] - [(Gap Creation Penalty)(# of Gaps)]
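
The quality score can be sketched as follows, using only the three terms that appear in the formula above: 10 per match, -1 per mismatch, and a gap creation penalty per gap. The default penalty of 5 is a hypothetical value chosen for illustration, and a gap is assumed here to mean a maximal run of "-" in either aligned sequence. A gap extension term is not modeled, since the formula as transcribed does not show one.

```python
def count_gaps(seq):
    """Count maximal runs of '-' in one aligned sequence."""
    runs = 0
    in_gap = False
    for ch in seq:
        if ch == "-":
            if not in_gap:
                runs += 1
            in_gap = True
        else:
            in_gap = False
    return runs


def quality(a, b, match_score=10, mismatch_score=-1, gap_creation_penalty=5):
    """Quality = 10*matches + (-1)*mismatches - penalty*(number of gaps).

    gap_creation_penalty=5 is an assumed value, not one given in the source.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns are charged through the gap penalty only
        if x == y:
            matches += 1
        else:
            mismatches += 1
    gaps = count_gaps(a) + count_gaps(b)
    return (match_score * matches
            + mismatch_score * mismatches
            - gap_creation_penalty * gaps)


print(quality("GATCT", "GATCT"))
print(quality("GATCT", "GTTCT"))
print(quality("GATCT", "GAT-T"))
```

Because the penalty is charged once per gap rather than once per gap column, a single long gap costs the same as a single short one under this sketch.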
