
DNA Sequencing

Learn about DNA sequencing and how it is used to obtain the sequence of nucleotides of a species. Understand which individual is taken to represent a species, polymorphism rates, and human genetic variation. Explore different sequencing strategies and the challenges involved.


Presentation Transcript


  1. DNA Sequencing

  2. DNA sequencing: how we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

  3. Which representative of the species? Which human? Answer one: Answer two: it doesn’t matter • Polymorphism rate: number of letter changes between two different members of a species • Humans: ~1/1,000 • Other organisms have much higher polymorphism rates • Population size!

  4. Why humans are so similar • Out of Africa: a small population (size N) that interbred reduced the genetic variation • Out of Africa ~40,000 years ago • Heterozygosity H: $H = 4Nu/(1 + 4Nu)$ • $u \sim 10^{-8}$, $N \sim 10^{4}$ $\Rightarrow$ $H \sim 4 \times 10^{-4}$
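
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `heterozygosity` and its parameter names are chosen here for illustration, and the inputs are the slide's order-of-magnitude values, not measured data.

```python
def heterozygosity(population_size: float, mutation_rate: float) -> float:
    """Expected heterozygosity H = 4Nu / (1 + 4Nu) from the slide."""
    four_n_u = 4.0 * population_size * mutation_rate
    return four_n_u / (1.0 + four_n_u)


# Slide values: u ~ 10^-8 per base per generation, N ~ 10^4 individuals.
h = heterozygosity(population_size=1e4, mutation_rate=1e-8)
print(f"H = {h:.2e}")  # prints "H = 4.00e-04"
```

Because 4Nu is tiny here, the denominator is almost exactly 1 and H is essentially 4Nu.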

  5. Human population migrations • Out of Africa, Replacement • “Grandma” of all humans (Eve) ~150,000 yr • Ancestor of all mtDNA • “Grandpa” of all humans (Adam) ~100,000 yr • Ancestor of all Y-chromosomes • Multiregional Evolution • Fossil records show a continuous change of morphological features • Proponents of the theory doubt mtDNA and other genetic evidence • New fossil records bury “multiregionalists” • Nice article in Economist on that http://www.economist.com/science/displaystory.cfm?story_id=9507453

  6. DNA Sequencing – Overview (timeline: 1975 to 2015) • Gel electrophoresis • Predominant, old technology by F. Sanger • Whole genome strategies • Physical mapping • Walking • Shotgun sequencing • Computational fragment assembly • The future: new sequencing technologies • Pyrosequencing, single molecule methods, … • Assembly techniques • Future variants of sequencing • Resequencing of humans • Microbial and environmental sequencing • Cancer genome sequencing

  7. DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~900 letters at a time

  8. DNA Sequencing – vectors [Figure: DNA is shaken into DNA fragments; a fragment plus a vector (circular genome: bacterium, plasmid) are joined at a known location (restriction site)]

  9. Different types of vectors

  10. DNA Sequencing – gel electrophoresis • Start at primer (restriction site) • Grow DNA chain • Include dideoxynucleoside (modified a, c, g, t) • Stops reaction at all possible points • Separate products by length, using gel electrophoresis
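
The chain-termination idea can be illustrated with a toy model. The sketch below is an idealization, not a simulation of real chemistry: it assumes one reaction per base, that termination happens at every occurrence of that base, and that fragment lengths are resolved perfectly. It also works directly with the synthesized strand and ignores complementarity to the template. The names `terminated_lengths` and `read_gel` are hypothetical.

```python
def terminated_lengths(strand: str) -> dict:
    """For each base, list the lengths of chains that stop at that base.

    Idealized assumption: the modified (dideoxy) form of a base stops
    synthesis at every position where that base occurs.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all terminated chains by length."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


strand = "ACGTGACTGAGG"
lanes = terminated_lengths(strand)
print(lanes["G"])       # chain lengths that end in G
print(read_gel(lanes))  # recovers the original strand
```

Sorting the terminated chains by length is what the gel does physically; the lane a band appears in tells which base ends the chain of that length.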

  11. Method to sequence longer regions • Cut the genomic segment many times at random (Shotgun) • Get one or two reads from each segment, ~900 bp each
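
A minimal sketch of shotgun sampling, assuming error-free reads of a fixed length whose start positions are uniformly random. The function name `shotgun_reads`, the toy genome, and the small read length are assumptions made for illustration; the slide's real reads are ~900 bp.

```python
import random


def shotgun_reads(genome: str, num_reads: int, read_length: int, seed: int = 0) -> list:
    """Sample fixed-length reads at uniformly random start positions."""
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [genome[start:start + read_length] for start in starts]


# Toy genome built from a seeded generator; not real sequence data.
toy_rng = random.Random(42)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(200))
for read in shotgun_reads(toy_genome, num_reads=5, read_length=20):
    print(read)
```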

  12. Reconstructing the Sequence (Fragment Assembly) • Cover region with reads at high redundancy • Overlap & extend reads to reconstruct the original genomic region
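
One simple way to "overlap & extend" is a greedy merge: repeatedly join the two sequences with the longest suffix-to-prefix overlap. The sketch below is one illustrative technique, not necessarily the assembler these slides have in mind. It assumes error-free reads from a single strand, and the names `overlap` and `greedy_assemble` and the `min_length` threshold are hypothetical.

```python
def overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_length: int = 3) -> list:
    """Merge the best-overlapping pair until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Drop sequences fully contained in another one.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        best_k, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    k = overlap(left, right, min_length)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


reads = ["TGCAAGCT", "ACGGTCAT", "TCATTGCA"]
print(greedy_assemble(reads))  # prints ['ACGGTCATTGCAAGCT']
```

Greedy merging works on this repeat-free toy input; on sequences with repeats it can join the wrong pair of reads.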

  13. Definition of Coverage C • Length of genomic segment: G • Number of reads: N • Length of each read: L • Definition: Coverage $C = NL/G$ • How much coverage is enough? Lander-Waterman model: $\Pr[\text{bp not covered}] = e^{-C}$ • Assuming uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
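
The coverage definition and the per-base probability from the Lander-Waterman model are easy to evaluate numerically. In this sketch the function names and the example values for G and L are assumptions chosen for illustration.

```python
import math


def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Coverage C = N * L / G."""
    return num_reads * read_length / genome_length


def prob_base_uncovered(c: float) -> float:
    """Probability that a given base is covered by no read: e^(-C)."""
    return math.exp(-c)


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest N such that N * L / G reaches the target coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


G = 1_000_000  # example segment length (assumed)
L = 900        # read length, roughly as on the earlier slides
N = reads_needed(10, L, G)
print(N, coverage(N, L, G))
print(f"{prob_base_uncovered(10):.2e}")  # prints "4.54e-05"
```

Note that $e^{-C}$ is a per-base probability; the slide's figure of one gapped region per million nucleotides counts gaps (runs of uncovered bases), which is a different quantity.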

  14. Repeats • Bacterial genomes: 5% • Mammals: 50% • Repeat types: • Low-Complexity DNA (e.g. ATATATATACATA…) • Microsatellite repeats $(a_1 \ldots a_k)^N$ where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) • Transposons • SINE (Short Interspersed Nuclear Elements), e.g. ALU: ~300-long, $10^6$ copies • LINE (Long Interspersed Nuclear Elements): ~4000-long, 200,000 copies • LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV • Gene Families: genes duplicate & then diverge (paralogs) • Recent duplications: ~100,000-long, very similar copies

  15. Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT • $3 \times 10^9$ nucleotides • 50% of human DNA is composed of repeats • Error! Glued together two distant regions

  16. What can we do about repeats? Two main approaches: • Cluster the reads • Link the reads

  19. Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT • $3 \times 10^9$ nucleotides • Flanks A, B, C, D around two copies of repeat R: ARB, CRD or ARD, CRB?
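
The ambiguity on this slide can be demonstrated directly. The sketch below assumes an idealized sequencer that returns every error-free read of one fixed length, and it uses short made-up strings for the flanks A, B, C, D and the repeat R. When reads are no longer than the repeat, both arrangements produce exactly the same reads; reads that span the whole repeat plus one base on each side tell them apart.

```python
from collections import Counter


def read_multiset(regions: list, read_length: int) -> Counter:
    """Count every substring of the given length across all regions."""
    reads = Counter()
    for region in regions:
        for start in range(len(region) - read_length + 1):
            reads[region[start:start + read_length]] += 1
    return reads


# Made-up flanks and repeat, for illustration only.
A, B, C, D = "AAAC", "CTTT", "GGGA", "TCCC"
R = "ACGTACGT"

arrangement_1 = [A + R + B, C + R + D]  # ARB, CRD
arrangement_2 = [A + R + D, C + R + B]  # ARD, CRB

short = len(R)      # reads no longer than the repeat
long = len(R) + 2   # reads that span the repeat and both flanks

print(read_multiset(arrangement_1, short) == read_multiset(arrangement_2, short))  # True
print(read_multiset(arrangement_1, long) == read_multiset(arrangement_2, long))    # False
```

This is why reads shorter than a repeat cannot, on their own, decide which flanks belong together.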

  20. Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT • $3 \times 10^9$ nucleotides

  21. Strategies for whole-genome sequencing
      1. Hierarchical – Clone-by-clone • Break genome into many long pieces • Map each long piece onto the genome • Sequence each piece with shotgun • Example: Yeast, Worm, Human, Rat
      2. Online version of (1) – Walking • Break genome into many long pieces • Start sequencing each piece with shotgun • Construct map as you go • Example: Rice genome
      3. Whole genome shotgun • One large shotgun pass on the whole genome • Example: Drosophila, Human (Celera), Neurospora, Mouse, Rat, Dog
