Alignment of large genomic sequences

Alignment of large genomic sequences. The fragment-based alignment approach (DIALIGN) is useful for aligning genomic sequences. Possible applications: detection of regulatory elements, identification of pathogenic microorganisms, and gene prediction.

brigid

Presentation Transcript


  1. Alignment of large genomic sequences
     The fragment-based alignment approach (DIALIGN) is useful for aligning genomic sequences. Possible applications:
     • Detection of regulatory elements
     • Identification of pathogenic microorganisms
     • Gene prediction

  2. The DIALIGN approach
     atctaatagttaaactcccccgtgcttag
     cagtgcgtgtattactaacggttcaatcgcg
     caaagagtatcacccctgaattgaataa

  8. The DIALIGN approach
     atc------taatagttaaactcccccgtgcttag
     cagtgcgtgtattactaacggttcaatcgcg
     caaagagtatcacccctgaattgaataa

  9. The DIALIGN approach
     atc------taatagttaaactcccccgtgcttag
     cagtgcgtgtattactaacggttcaatcgcg
     caaa--gagtatcacccctgaattgaataa

  10. The DIALIGN approach
      atc------taatagttaaactcccccgtgcttag
      cagtgcgtgtattactaacggttcaatcgcg
      caaa--gagtatcacc----------cctgaattgaataa

  11. The DIALIGN approach
      atc------taatagttaaactcccccgtgc-ttag
      cagtgcgtgtattactaac----------gg-ttcaatcgcg
      caaa--gagtatcacc----------cctgaattgaataa

  12. The DIALIGN approach
      atc------taatagttaaactcccccgtgc-ttag
      cagtgcgtgtattactaac----------gg-ttcaatcgcg
      caaa--gagtatcacc----------cctgaattgaataa
      Consistency!

  13. The DIALIGN approach (upper-case letters mark the aligned fragments)
      atc------TAATAGTTAaactccccCGTGC-TTag
      cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
      caaa--GAGTATCAcc----------CCTGaaTTGAATaa

  14. First step in sequence comparison: alignment
      [Figure labels: S3, S1, S2]

  15. First step in sequence comparison: alignment
      For genomic sequences, neither local nor global methods are appropriate.
      [Figure labels: S3, S1, S2; S1’, S2’, S3’]

  16. First step in sequence comparison: alignment
      A local method finds the single best local similarity.
      [Same figure]

  17-20. First step in sequence comparison: alignment
      • Multiple application of local methods is possible.
      [Same figure]

  21. First step in sequence comparison: alignment
      A threshold has to be applied to filter the alignments: reduced sensitivity!
      [Same figure]

  22. First step in sequence comparison: alignment
      Alternative approach: few large-scale rearrangements occur during evolution, so the relative order of homologies is conserved. Search for a chain of local homologies.

  23-26. First step in sequence comparison: alignment
      Genomic alignment: chain of homologies
      [Figure labels: S3, S1, S2; S1’, S2’, S3’]

  27. First step in sequence comparison: alignment
      Novel approaches for genomic alignment: WABA, PipMaker, MGA, TBA, Lagan, Avid, DIALIGN

  28. Alignment of large genomic sequences
      Gene-regulatory sites identified by multiple sequence alignment (phylogenetic footprinting)

  29. Alignment of large genomic sequences

  30. Objective function for DIALIGN:
      • Weight score for every possible fragment f, based on a P-value:
        P(f) = probability of finding a fragment "like f" by chance in random sequences with the same length as the input sequences
        w(f) = -log P(f) ("weight score" of f)
        "Like f" means: at least the same number of matches (DNA, RNA) or at least the same sum of similarity values (proteins).
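The weight score on slide 30 can be illustrated with a small calculation. The slide does not give the formula for P(f), so the sketch below makes two labeled assumptions: each column of a DNA fragment matches independently with probability 0.25, and the chance of seeing such a fragment anywhere in two sequences of lengths `len1` and `len2` is bounded by multiplying the binomial tail by `len1 * len2`. The function names are hypothetical and this is not necessarily the exact formula used by DIALIGN.

```python
from math import comb, log

def match_tail_probability(length, matches, p_match=0.25):
    """Probability of at least `matches` matching columns in a gap-free
    fragment of `length` columns, assuming independent columns."""
    return sum(
        comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )

def fragment_weight(length, matches, len1, len2, p_match=0.25):
    """w(f) = -log P(f), with P(f) approximated by a union bound over all
    len1 * len2 possible start positions (an assumption, capped at 1)."""
    p = min(1.0, len1 * len2 * match_tail_probability(length, matches, p_match))
    return -log(p) if p < 1.0 else 0.0

# A perfect 12-mer match outweighs a 12-mer with only 9 matches.
print(fragment_weight(12, 12, 30, 30), fragment_weight(12, 9, 30, 30))
```

Under these assumptions a fragment that is likely to occur by chance receives weight 0, so it contributes nothing to the alignment score.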

  31. Objective function for DIALIGN: • Score of alignment: sum of weight scores of fragments – no gap penalty!

  32. Optimization problem for DIALIGN: • Find consistent collection of fragments with maximum total weight score!
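For two sequences, the optimization problem on slide 32 can be sketched as a weighted chaining problem. The sketch assumes that a pairwise collection of fragments is consistent when the fragments can be ordered so that each one ends before the next begins in both sequences. The tuple layout `(start1, start2, length, weight)` and the function name are hypothetical; the case of three or more sequences is harder and is not covered here.

```python
def best_chain(fragments):
    """Return (total_weight, chain) for the maximum-weight consistent chain.

    fragments: list of (start1, start2, length, weight) tuples with length >= 1.
    Simple O(n^2) dynamic programming over fragments sorted by end position.
    """
    frags = sorted(fragments, key=lambda f: (f[0] + f[2], f[1] + f[2]))
    best = []  # best[i] = (score of best chain ending in frags[i], predecessor)
    for i, (s1, s2, n, w) in enumerate(frags):
        score, prev = w, None
        for j in range(i):
            t1, t2, m, _ = frags[j]
            # frags[j] may precede frags[i] only if it ends first in both sequences
            if t1 + m <= s1 and t2 + m <= s2 and best[j][0] + w > score:
                score, prev = best[j][0] + w, j
        best.append((score, prev))
    if not best:
        return 0.0, []
    i = max(range(len(best)), key=lambda k: best[k][0])
    total, chain = best[i][0], []
    while i is not None:  # walk predecessors back to the start of the chain
        chain.append(frags[i])
        i = best[i][1]
    return total, chain[::-1]

example = [(0, 0, 5, 3.0), (6, 2, 4, 2.0), (3, 8, 4, 4.0), (10, 13, 3, 1.5)]
print(best_chain(example))
```

Because there is no gap penalty, the score of a chain is simply the sum of its fragment weights, regardless of how far apart the fragments lie.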

  33. Alternative fragment weight scores for genomic sequences: • Calculate fragment scores at nucleotide level and at peptide level.

  34-35. catcatatcttatcttacgttaactcccccgt
      cagtgcgtgatagcccatatccgg

  36. catcatatcttatcttacgttaactcccccgt
      cagtgcgtgatagcccatatccgg
      Standard score: consider length and number of matches, then compute the probability of random occurrence.

  37. Translation option:
      catcatatcttatcttacgttaactcccccgt
      cagtgcgtgatagcccatatccgg

  38. Translation option:
                 L   S   Y   V
      catcatatc tta tct tac gtt aactcccccgt
      cagtgcgtg ata gcc cat atc cgg
                 I   A   H   I
      DNA segments are translated to peptide segments; the fragment score is based on peptide similarity: calculate the probability of finding a fragment of the same length with (at least) the same sum of BLOSUM values.
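The translation step on slide 38 can be reproduced with the standard genetic code. The slide only says "BLOSUM values", so the sketch assumes BLOSUM62 and, to stay self-contained, hard-codes just the four matrix entries needed for the slide's example; `translate`, `peptide_similarity` and `BLOSUM62_SUBSET` are hypothetical names.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA segment codon by codon; trailing bases are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Assumption: BLOSUM62; only the pairs occurring in the example are listed.
BLOSUM62_SUBSET = {
    frozenset("LI"): 2,
    frozenset("SA"): 1,
    frozenset("YH"): 2,
    frozenset("VI"): 3,
}

def peptide_similarity(pep1, pep2):
    """Sum of substitution scores over the aligned peptide positions."""
    return sum(BLOSUM62_SUBSET[frozenset((a, b))] for a, b in zip(pep1, pep2))

pep1 = translate("ttatcttacgtt")
pep2 = translate("atagcccatatc")
print(pep1, pep2, peptide_similarity(pep1, pep2))
```

A full implementation would load a complete substitution matrix and then convert the similarity sum into a probability, as described on the slide.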

  39. P-fragment (in both orientations):
                 L   S   Y   V
      catcatatc tta tct tac gtt aactcccccgt
      cagtgcgtg ata gcc cat atc cgg
                 I   A   H   I
      N-fragment:
      catcatatc ttatcttacgttaactcccccgtgct
                 || |  |  |
      cagtgcgtg atagcccatatccg
      For each fragment f, three probability values are calculated; the score of f is based on the smallest P-value.

  40. Alternative fragment weight scores for genomic sequences: • Calculate fragment scores at nucleotide level and at peptide level.

  41. DIALIGN alignment of human and murine genomic sequences

  42. DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences

  43. Alignment of large genomic sequences
      Evaluation of signal detection methods: apply the method to data with known signals (the correct answer is known), e.g. experimentally verified genes for gene finding.
      • TP = true positives = # signals correctly predicted (i.e. signal present)
      • FP = false positives = # signals predicted but wrong (i.e. no signal present)
      • TN = true negatives = # no signal predicted, no signal present
      • FN = false negatives = # no signal predicted, signal present

  44. Alignment of large genomic sequences
      Sn = sensitivity = correctly predicted signals / present signals = TP / (TP + FN)
      Sp = specificity = correctly predicted signals / predicted signals = TP / (TP + FP)
      (Defined this way, Sp is the quantity also known as precision or positive predictive value.)
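As a quick worked example of the two ratios on slide 44, the following sketch computes Sn and Sp from raw counts. The function names and the example counts are made up for illustration; returning `None` for an undefined ratio is a convention chosen here, not something the slides specify.

```python
def sensitivity(tp, fn):
    """Sn = TP / (TP + FN); None if no signal is present at all."""
    return tp / (tp + fn) if tp + fn else None

def specificity(tp, fp):
    """Sp = TP / (TP + FP), as defined on the slide; None if nothing is predicted."""
    return tp / (tp + fp) if tp + fp else None

# Hypothetical counts: 80 signals found, 20 wrong predictions, 40 signals missed.
print(sensitivity(tp=80, fn=40), specificity(tp=80, fp=20))
```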

  45. Alignment of large genomic sequences
      Comprehensive evaluation of a signal prediction method:
      • The method assigns a score to each prediction.
      • Apply a threshold parameter.
      • High threshold -> high specificity (Sp), low sensitivity (Sn)
      • Low threshold -> high sensitivity, low specificity
      ROC curve ("receiver operating characteristic" curve): vary the threshold parameter and plot Sn against Sp.
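The threshold sweep on slide 45 can be sketched as follows, using the Sn and Sp definitions from slide 44. The input layout (a list of `(score, is_real_signal)` pairs plus the number of signals actually present) and the function name are assumptions made for this example; the scores are invented.

```python
def threshold_sweep(predictions, n_present, thresholds):
    """Return one (threshold, Sn, Sp) triple per threshold.

    predictions: list of (score, is_real_signal) pairs, one per prediction.
    n_present:   number of real signals in the data, found or not.
    """
    curve = []
    for t in thresholds:
        kept = [is_real for score, is_real in predictions if score >= t]
        tp = sum(kept)            # predictions kept and correct
        fp = len(kept) - tp       # predictions kept but wrong
        sn = tp / n_present if n_present else None
        sp = tp / (tp + fp) if kept else None  # undefined if nothing is kept
        curve.append((t, sn, sp))
    return curve

preds = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
for point in threshold_sweep(preds, n_present=4, thresholds=[0.7, 0.2]):
    print(point)
```

In this toy data set the high threshold keeps only correct predictions but misses half of the signals, while the low threshold finds more signals at the cost of wrong predictions.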

  46. Performance of long-range alignment programs for exon discovery (human - mouse comparison)

  47. DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences

  48-50. AGenDA: Alignment-based Gene Detection Algorithm
      • Bridge small gaps between DIALIGN fragments -> clusters of fragments
      • Search for conserved splice sites and start/stop codons at cluster boundaries to identify candidate exons
      • A recursive algorithm finds a biologically consistent chain of potential exons
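The first AGenDA step, bridging small gaps between fragments, can be illustrated on a single sequence. The slides do not say how large a "small" gap is or how fragments are represented, so `max_gap`, the half-open `(start, end)` intervals and the function name are hypothetical; the actual program may work differently.

```python
def cluster_fragments(fragments, max_gap):
    """Merge fragments separated by at most `max_gap` positions.

    fragments: list of half-open (start, end) intervals on one sequence.
    Returns the merged clusters as a sorted list of (start, end) tuples.
    """
    clusters = []
    for start, end in sorted(fragments):
        if clusters and start - clusters[-1][1] <= max_gap:
            # Gap to the previous cluster is small enough: extend that cluster.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(c) for c in clusters]

print(cluster_fragments([(0, 10), (12, 20), (50, 60), (63, 70)], max_gap=5))
```

The cluster boundaries produced this way would then be the places to look for conserved splice sites and start/stop codons.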
