MW 12:50-2:05pm in Beckman B302. Profs: Serafim Batzoglou & Gill Bejerano.

CS273A Lecture 10: Comparative Genomics I. MW 12:50-2:05pm in Beckman B302. Profs: Serafim Batzoglou & Gill Bejerano. TAs: Harendra Guturu & Panos Achlioptas. Announcements: HW2 is out. Halfway feedback at the end of this class: please take 5 minutes to share your thoughts with us!

Presentation Transcript


  1. CS273A Lecture 10: Comparative Genomics I. MW 12:50-2:05pm in Beckman B302. Profs: Serafim Batzoglou & Gill Bejerano. TAs: Harendra Guturu & Panos Achlioptas. http://cs273a.stanford.edu [BejeranoFall13/14]

  2. Announcements • HW2 is out • Halfway feedback at the end of this class • Please take 5 minutes to share your thoughts with us!

  3. Genome Evolution [Slide image: a long stretch of raw DNA sequence (A, C, G, T) containing many repeated segments.]

  4. Comparative Genomics. “Nothing in Biology Makes Sense Except in the Light of Evolution” (Theodosius Dobzhansky). [Figure: species tree of human, chimp, macaque, mouse, rat, cow, dog, opossum, platypus, chicken, zebrafish, tetraodon, and fugu along a time axis t.]

  5. Comparative Genomics. “Nothing in Evolution Makes Sense Except in the Light of Computation” (Yours Truly). [Figure: the same species tree along a time axis t.]

  6. Evolution = Mutation + Selection. Mistakes can happen during DNA replication, and they are oblivious to the function of the DNA segment they hit. But then selection kicks in: in functional DNA many changes are not tolerated, whereas in junk DNA “anything goes”. [Figure: the sequence ...ACGTACGACTGACTAGCATCGACTACGA... passed from chicken to egg to chicken, with changes (e.g., TT, CAT) appearing in junk and functional regions.] This has bad implications (disease) and good implications (adaptation).
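A toy simulation can make the mutation/selection split concrete. This is a minimal sketch under strong simplifying assumptions that are not from the lecture: one random substitution per generation, every substitution inside a hypothetical `functional` set of positions is rejected outright, and junk positions accept anything.

```python
import random


def evolve(seq, functional, generations, rng):
    """Toy mutation + purifying selection; `functional` is a set of protected positions."""
    seq = list(seq)
    for _ in range(generations):
        pos = rng.randrange(len(seq))  # replication mistakes ignore function
        new_base = rng.choice([b for b in "ACGT" if b != seq[pos]])
        if pos in functional:
            continue  # selection: the mutant is not tolerated, so it is lost
        seq[pos] = new_base  # junk DNA: "anything goes"
    return "".join(seq)


ancestor = "ACGTACGACTGACTAGCATCGACTACGA"
functional = set(range(10, 20))  # hypothetical functional element
descendant = evolve(ancestor, functional, 1000, random.Random(0))
print(ancestor)
print(descendant)
```

Real purifying selection removes most, not all, changes; giving functional positions a small acceptance probability would be a natural refinement of this sketch.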

  7. Mutation

  8. Chromosomal (i.e., big) Mutations. Five types exist: deletion, inversion, duplication, translocation, and nondisjunction.

  9. Deletion. Due to breakage, a piece of a chromosome is lost.

  10. Inversion. A chromosome segment breaks off, flips around backwards, and reattaches.

  11. Duplication. Occurs when a genomic region is repeated.

  12. Whole Genome Duplication at the Base of the Vertebrate Tree [Figure: vertebrate tree with whole genome duplication (WGD) events marked, including one in the Xenopus laevis lineage.]

  13. Translocation. Involves two chromosomes that aren’t homologous: part of one chromosome is transferred to the other.

  14. Nondisjunction. Failure of chromosomes to separate during meiosis, causing a gamete to have too many or too few chromosomes. Disorders: Down syndrome (three copies of chromosome 21), Turner syndrome (a single X chromosome), Klinefelter syndrome (XXY).

  15. Genomic (i.e., small) Mutations • Six types exist: • Substitution (e.g., G→T) • Deletion • Insertion • Inversion • Duplication • Translocation
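The six small mutation types can be mimicked as string operations on a DNA sequence. This is a minimal sketch; the function names and the 0-based, end-exclusive coordinates are illustrative choices, and modeling an inversion as the reverse complement of the segment reflects the fact that DNA is double-stranded.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitute(seq, i, base):
    """Replace the base at position i (e.g. G -> T)."""
    return seq[:i] + base + seq[i + 1:]


def delete(seq, i, j):
    """Remove the segment seq[i:j]."""
    return seq[:i] + seq[j:]


def insert(seq, i, piece):
    """Insert new sequence before position i."""
    return seq[:i] + piece + seq[i:]


def invert(seq, i, j):
    """Flip seq[i:j]; read on the same strand it becomes its reverse complement."""
    return seq[:i] + seq[i:j].translate(COMPLEMENT)[::-1] + seq[j:]


def duplicate(seq, i, j):
    """Tandem duplication: a copy of seq[i:j] is placed right after it."""
    return seq[:j] + seq[i:j] + seq[j:]


def translocate(src, i, j, dst, k):
    """Move src[i:j] into another sequence dst at position k."""
    return src[:i] + src[j:], dst[:k] + src[i:j] + dst[k:]


s = "AGGCCTC"
print(substitute(s, 3, "A"))
print(insert(s, 3, "G"))
print(delete(s, 3, 4))
```

The three calls at the end reproduce the edit examples used later for the scoring function (AGGACTC, AGGGCCTC, and AGGCTC).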

  16. Example: Human-Chimp Genomic Differences. Mutations kill functional elements. Mutations also give rise to new functional elements (by duplicating existing ones, or by creating new ones). Selection whittles down this constant flow of genomic innovations.

  17. Evolution = Mutation + Selection [Figure: sequence change over time under negative selection, neutral drift, and positive selection.]

  18. The Species Tree [Figure: species tree with speciation events (S) along a time axis, ending in the sampled genomes.] When we compare one individual from each of two species, most, but not all, of the mutations we see are fixed differences between the two species.

  19. Inferring Genomic Histories From Alignments of Genomes

  20. Gene Tree vs. Species Tree [Figure: a gene tree with speciation, duplication, and loss events, drawn inside a species tree.] A gene tree evolves with respect to a species tree. By “gene” we mean any piece of DNA.

  21. Terminology [Figure: gene tree descending from a single ancestral gene, with speciation, duplication, and loss events, inside the species tree.] Orthologs: genes related via speciation (e.g., C, M, H3). Paralogs: genes related through duplication (e.g., H1, H2, H3). Homologs: genes that share a common origin (e.g., C, M, H1, H2, H3).

  22. Typical Molecular Distances [Figure: the same gene tree.] If the genes were evolving at a constant rate: • To which is H1 closer in sequence, H2 or H3? • To which H is M closest? • And C? (Selection may skew distances.)
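The terminology and the distance questions both come down to the last common ancestor (LCA) of two genes: its event type (speciation or duplication) decides orthology vs. paralogy, and under a constant rate its age decides the distance. This is a minimal sketch on a hypothetical gene tree, not the slide's exact tree: it was chosen only so that C, M, H3 are orthologs and H1, H2, H3 are paralogs, and its ages are made up.

```python
# Hypothetical gene tree (NOT the slide's tree). Internal nodes are
# (event, age, left, right); event is "spec" (speciation) or "dup"
# (duplication); age is in made-up time units. Gene losses are not drawn.
TREE = ("dup", 500,
        ("spec", 300, "C", ("spec", 90, "M", "H3")),
        ("dup", 40, "H1", "H2"))


def leaves(node):
    """Set of extant gene names below a node."""
    if isinstance(node, str):
        return {node}
    return leaves(node[2]) | leaves(node[3])


def lca(node, a, b):
    """Deepest internal node whose subtree contains both genes a and b."""
    for child in node[2:]:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca(child, a, b)
    return node


def relation(a, b, tree=TREE):
    """Orthologs split at a speciation, paralogs at a duplication."""
    return "orthologs" if lca(tree, a, b)[0] == "spec" else "paralogs"


def distance(a, b, tree=TREE, rate=1.0):
    """Molecular-clock distance: both lineages evolve for `age` units since the LCA."""
    return 2 * rate * lca(tree, a, b)[1]


print(relation("M", "H3"), relation("H1", "H3"))
print({h: distance("H1", h) for h in ("H2", "H3")})
print(min(("H1", "H2", "H3"), key=lambda h: distance("M", h)))
```

In this particular tree H1 is closer to H2 than to H3, M is closest to H3, and C is equidistant from M and H3; the slide's own tree may give different answers, and selection can skew real distances.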

  23. Gene trees, and even species trees, are figments of our (scientific) imagination. [Figure: the same gene tree, distinguishing observed extant genes from inferred ancestral events.] Species trees and gene trees can be wrong. All we really have are extant observations and fossils.

  24. Gene Families

  25. What? • Compare whole genomes • Compare two genomes • Within (intra) species • Between (inter) species • Compare a genome to itself • Compare a functional element to a genome • Why? • To learn about genome evolution (and phenotype evolution!) • Homologous functional regions often have similar functions • Modification of functional regions can reveal • Neutral and functional regions • Disease susceptibility • Adaptation • And more • How?

  26. Sequence Alignment. Given x = AGGCTATCACCTGACCTCCAGGCCGATGCCC and y = TAGCTATCACGACCGCGGTCGATTTGCCCGAC, one alignment is: -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- / TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC. Definition: given two strings $x = x_1 x_2 \ldots x_M$ and $y = y_1 y_2 \ldots y_N$, an alignment is an assignment of gaps to positions $0, \ldots, M$ in x and $0, \ldots, N$ in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
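The definition can be checked mechanically: the two gapped rows must have equal length, and deleting the gaps must give back x and y. This is a minimal sketch; the function name is hypothetical, and rejecting gap-against-gap columns is the usual convention (such a column lines up no letter), stated here as an assumption.

```python
def is_valid_alignment(row_x, row_y, x, y, gap="-"):
    """Check that two gapped rows form an alignment of x and y."""
    if len(row_x) != len(row_y):
        return False  # every column pairs one symbol from each row
    if any(a == gap and b == gap for a, b in zip(row_x, row_y)):
        return False  # a gap-against-gap column lines up no letter
    # Deleting the gaps must give back the original strings, in order
    return row_x.replace(gap, "") == x and row_y.replace(gap, "") == y


x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"
print(is_valid_alignment(row_x, row_y, x, y))
```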

  27. Scoring Function. Alternative definition, minimal edit distance: “Given two strings x, y, find the minimum # of edits (insertions, deletions, mutations) to transform one string into the other.” • Sequence edits of AGGCCTC: • Mutation: AGGACTC • Insertion: AGGGCCTC • Deletion: AGG.CTC (one C deleted). Scoring function: match +m, mismatch -s, gap -d, so $F = (\#\text{matches}) \cdot m - (\#\text{mismatches}) \cdot s - (\#\text{gaps}) \cdot d$. The cost of edit operations needs to be biologically inspired (e.g., deletion length). Solve via dynamic programming.
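The dynamic program for this scoring function is the classic Needleman-Wunsch global alignment. This is a minimal sketch with a linear gap cost (each gap column costs d), which is exactly the F above; the length-dependent, biologically inspired gap costs the slide mentions (e.g. affine gaps) are not modeled. Default values for m, s, d are arbitrary.

```python
def global_align(x, y, m=1, s=1, d=2):
    """Needleman-Wunsch: maximize F = m*#matches - s*#mismatches - d*#gap columns."""
    def sub(a, b):
        return m if a == b else -s

    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -d * i  # x[:i] aligned against gaps only
    for j in range(1, N + 1):
        F[0][j] = -d * j  # y[:j] aligned against gaps only
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # (mis)match
                          F[i - 1][j] - d,   # gap in y
                          F[i][j - 1] - d)   # gap in x
    # Trace back from F[M][N] to recover one optimal alignment
    ax, ay, i, j = [], [], M, N
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))


score, row_x, row_y = global_align("AGGCCTC", "AGGGCCTC", m=1, s=1, d=1)
print(score)
print(row_x)
print(row_y)
```

Setting m=0, s=1, d=1 makes the best score equal to minus the edit distance, which ties the two definitions on the slide together.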

  28. Are two sequences homologous? [Figure: DP matrix for aligning AGGCTATCACCTGACCTCCAGGCCGATGCCC with TAGCTATCACGACCGCGGTCGATTTGCCCGAC, yielding -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- / TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC.] Given an (optimal) alignment between two genome regions, you can ask: what is the probability that they are (not) related by homology? Note that, when known, the answer is a function of the molecular distance between the two (e.g., between two species).
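The slide leaves open how to compute this probability. One common empirical approach, substituted here and not necessarily the lecture's, is a shuffle test: compare the real alignment score with scores of y shuffled (same base composition), and report the fraction of shuffles scoring at least as well. This sketch ignores the molecular distance the slide says matters; the trial count, seed, and scoring parameters are arbitrary.

```python
import random


def nw_score(x, y, m=1, s=1, d=2):
    """Global alignment score only (Needleman-Wunsch, linear gaps, two rows of memory)."""
    prev = [-d * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [-d * i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (m if x[i - 1] == y[j - 1] else -s)
            cur[j] = max(diag, prev[j] - d, cur[j - 1] - d)
        prev = cur
    return prev[-1]


def shuffle_pvalue(x, y, trials=200, seed=0):
    """Empirical P(score of x vs shuffled y >= observed), with a +1 correction."""
    rng = random.Random(seed)
    observed = nw_score(x, y)
    letters = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)  # keeps composition, destroys order
        if nw_score(x, "".join(letters)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
print(shuffle_pvalue(x, y))
```

A small value says the alignment scores better than chance for sequences of this composition; it does not by itself establish homology.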

  29. Sequence Alignment. For the alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- / TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC (of AGGCTATCACCTGACCTCCAGGCCGATGCCC and TAGCTATCACGACCGCGGTCGATTTGCCCGAC), similarity is often measured using “%id”, or percent identity: %id = (number of matching bases) / (number of alignment columns), where every alignment column is a match, a mismatch, or an indel base, and an indel is an insertion or a deletion (telling which requires an outgroup).
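The %id formula is a single pass over the alignment columns. This is a minimal sketch using the slide's definition (indel columns count in the denominator); the function names are hypothetical, and other conventions, such as dividing by gap-free columns only, also appear in practice.

```python
def column_counts(row_x, row_y, gap="-"):
    """Classify each alignment column as match, mismatch, or indel."""
    matches = mismatches = indels = 0
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            indels += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


def percent_identity(row_x, row_y, gap="-"):
    """%id = matching bases / all alignment columns."""
    matches, mismatches, indels = column_counts(row_x, row_y, gap)
    return 100.0 * matches / (matches + mismatches + indels)


row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"
print(column_counts(row_x, row_y))
print(round(percent_identity(row_x, row_y), 1))
```

For the slide's alignment this gives 24 matches, 2 mismatches, and 11 indel columns, so %id = 24/37, about 64.9%.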

  30. Note the pattern of sequence conservation/divergence [Figure: human vs. lizard alignment.] Objective: find local alignment blocks that are likely homologous (share a common origin). O(mn): examine the full matrix using DP. O(m+n): heuristics based on seeding + extension trade sensitivity for speed.
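Seeding + extension avoids filling the full m x n matrix: index short exact words (k-mers) of one sequence, look up each k-mer of the other, and extend only around the hits. This is a minimal BLAST-style sketch with exact k-mer seeds and ungapped X-drop extension; the real (B)lastz uses more elaborate seeds and gapped extension, which are not reproduced here, and all parameter defaults are arbitrary.

```python
from collections import defaultdict


def _extend(q, t, i, j, k, match, mismatch, xdrop):
    """Ungapped X-drop extension of the seed q[i:i+k] == t[j:j+k] along its diagonal."""
    score = best = k * match
    right = best_right = 0
    while i + k + right < len(q) and j + k + right < len(t):
        score += match if q[i + k + right] == t[j + k + right] else mismatch
        right += 1
        if score > best:
            best, best_right = score, right
        elif best - score > xdrop:
            break  # score fell too far below the best seen: stop extending
    score = best
    left = best_left = 0
    while i - left - 1 >= 0 and j - left - 1 >= 0:
        score += match if q[i - left - 1] == t[j - left - 1] else mismatch
        left += 1
        if score > best:
            best, best_left = score, left
        elif best - score > xdrop:
            break
    return (i - best_left, i + k + best_right, j - best_left, j + k + best_right, best)


def seed_and_extend(query, target, k=8, match=1, mismatch=-1, xdrop=3):
    """Return local blocks (q_start, q_end, t_start, t_end, score), 0-based, end-exclusive."""
    index = defaultdict(list)  # k-mer -> positions in target, built in one pass
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    blocks = set()  # seeds on the same diagonal usually extend to the same block
    for i in range(len(query) - k + 1):  # one lookup per query position
        for j in index.get(query[i:i + k], ()):
            blocks.add(_extend(query, target, i, j, k, match, mismatch, xdrop))
    return blocks


common = "ACGTTGCAATGCCATG"
query = "CCCCC" + common + "AAAAA"
target = "G" * 10 + common + "T" * 10
print(seed_and_extend(query, target))
```

Any homologous region too diverged to contain one exact k-mer is missed entirely, which is the sensitivity traded for speed.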

  31. “Raw” (B)lastz track (no longer displayed). Alignment = homologous regions. [Figure: browser view at Protease Regulatory Subunit 3.]

  32. Chaining co-linear alignment blocks [Figure: human vs. lizard.] Objective: find local alignment blocks that are likely homologous (share a common origin). Chaining strings together co-linear blocks in the target genome to which we are comparing. Double lines mean there is unalignable sequence in the other species; single lines mean there isn’t.

  33. Gap Types: Single- vs. Double-Sided [Figure: human and mouse sequences sharing blocks D and E, with a segment B’ present in only one of them, and how the resulting gap is drawn in the human browser (mouse homology) and in the mouse browser (human homology).]

  34. Did Mouse Insert or Human Delete? The Need for an Outgroup [Figure: the same human and mouse sequences (D, E, B’) plus an outgroup sequence, with the human and mouse browser views.]
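With only human and mouse, an indel is symmetric: the segment was either inserted in one or deleted in the other. An outgroup breaks the tie by parsimony. This is a minimal sketch assuming a single event on the human or mouse lineage and no independent change in the outgroup; the function name and return strings are hypothetical.

```python
def classify_indel(in_human, in_mouse, in_outgroup):
    """Parsimony call for a segment (e.g. B') present in exactly one of human/mouse."""
    if in_human == in_mouse:
        return "no human-mouse indel"
    has, lacks = ("human", "mouse") if in_human else ("mouse", "human")
    if in_outgroup:
        # The outgroup shares the segment, so the common ancestor most likely had it
        return f"deletion in {lacks}"
    # The outgroup also lacks it, so the ancestor most likely lacked it too
    return f"insertion in {has}"


print(classify_indel(in_human=False, in_mouse=True, in_outgroup=True))
print(classify_indel(in_human=False, in_mouse=True, in_outgroup=False))
```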

  35. Conservation Track Documentation

  36. Dotplots • Dotplots are a simple way of seeing alignments • We really like to see good visual demonstrations, not just tables of numbers • It’s a grid: put one sequence along the top and the other down the side, and put a dot wherever they match. • You see the alignment as a diagonal • Note that DNA dotplots are messier because the alphabet has only 4 letters…
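A dotplot is just the set of grid cells where the two sequences match. This is a minimal text-mode sketch; the word-size parameter k (dot only where length-k words match) is a common way to suppress the noise that a 4-letter alphabet produces, added here as an illustration rather than taken from the lecture.

```python
def dotplot(a, b, k=1):
    """Cells (i, j) where the length-k words a[i:i+k] and b[j:j+k] are identical."""
    return {(i, j)
            for i in range(len(a) - k + 1)
            for j in range(len(b) - k + 1)
            if a[i:i + k] == b[j:j + k]}


def render(a, b, k=1):
    """Text dotplot: b along the top, a down the side, '*' for a dot."""
    dots = dotplot(a, b, k)
    rows = ["  " + b]
    for i, ch in enumerate(a):
        rows.append(ch + " " + "".join("*" if (i, j) in dots else "." for j in range(len(b))))
    return "\n".join(rows)


print(render("GATTACA", "GATTACA", k=1))  # noisy: many off-diagonal dots
print(render("GATTACA", "GATTACA", k=3))  # word filter leaves only the diagonal
```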

  37. Chaining Alignments Chaining highlights homologous regions between genomes, bridging the gulf between syntenic blocks and base-by-base alignments. Local alignments tend to break at transposon insertions, inversions, duplications, etc. Global alignments tend to force non-homologous bases to align. Chaining is a rigorous way of joining together local alignments into larger structures.

  39. Chains & Nets: How they’re built • 1: Blastz one genome to another • Local alignment algorithm • Finds short blocks of similarity. Example: Hg18: AAAAAACCCCCAAAAA; Mm8: AAAAAAGGGGG. Blocks: Hg18.1-6 + AAAAAA ↔ Mm8.1-6 + AAAAAA; Hg18.7-11 + CCCCC ↔ Mm8.1-5 - CCCCC; Hg18.12-16 + AAAAA ↔ Mm8.1-5 + AAAAA

  40. Chains & Nets: How they’re built • 2: “Chain” alignment blocks together • Links blocks that preserve order and orientation • Not single coverage in either species. Example: Hg18: AAAAAACCCCCAAAAA; Mm8: AAAAAAGGGGGAAAAA. Mm8 chains: Mm8.1-6 +, Mm8.12-16 +, Mm8.7-11 -, Mm8.12-15 +, Mm8.1-5 +
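Chaining can be phrased as a longest-path dynamic program: a block may follow another only if it is on the same strand and lies strictly after it in both genomes. This is a simplified sketch, not UCSC's actual chainer (which also charges for the gaps between blocks). Assumptions: block score = aligned length; '-' blocks are taken to use coordinates on the reverse-complemented query, so co-linear blocks increase in both coordinates on either strand; the pairing of the three example blocks is inferred from the slide's Hg18/Mm8 example.

```python
# Block: (t_start, t_end, q_start, q_end, strand, score), 1-based inclusive.


def best_chain(blocks):
    """Highest-scoring co-linear, same-strand subset of blocks (O(n^2) DP)."""
    blocks = sorted(blocks, key=lambda b: (b[0], b[2]))
    best = [b[5] for b in blocks]  # best chain score ending at block i
    prev = [-1] * len(blocks)
    for i, b in enumerate(blocks):
        for j in range(i):
            a = blocks[j]
            colinear = a[4] == b[4] and a[1] < b[0] and a[3] < b[2]
            if colinear and best[j] + b[5] > best[i]:
                best[i], prev[i] = best[j] + b[5], j
    i = max(range(len(blocks)), key=lambda n: best[n])
    score, chain = best[i], []
    while i != -1:
        chain.append(blocks[i])
        i = prev[i]
    return score, chain[::-1]


def all_chains(blocks):
    """Greedily peel off the best chain until every block belongs to one."""
    remaining, chains = list(blocks), []
    while remaining:
        score, chain = best_chain(remaining)
        chains.append((score, chain))
        remaining = [b for b in remaining if b not in chain]
    return chains


# Hg18 (target) AAAAAACCCCCAAAAA vs Mm8 (query) AAAAAAGGGGGAAAAA
blocks = [(1, 6, 1, 6, "+", 6), (12, 16, 12, 16, "+", 5), (7, 11, 7, 11, "-", 5)]
for score, chain in all_chains(blocks):
    print(score, chain)
```

The two '+' blocks preserve order and orientation and join into one chain; the inverted '-' block forms a chain of its own.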

  41. Another Chain Example [Figure: human and mouse sequences sharing blocks A, B, C, D, E plus a segment B’; the implicit human and mouse sequences, mouse chains in the human browser, and human chains in the mouse browser.]

  42. Chains join together related local alignments. [Figure: chains at Protease Regulatory Subunit 3, labeled “likely ortholog”, “likely paralogs”, and “shared domain?”.]
