
DNA sequencing and genome architecture


edwarde


Presentation Transcript


  1. DNA sequencing and genome architecture

  2. Steps in Genetic Analysis • Knowing how many genes determine a phenotype (Mendelian and/or QTL analysis), and where the genes are located (linkage mapping) is a first step in understanding the genetic basis of a phenotype • A second step is determining the sequence of the gene (or genes)

  3. Steps in Genetic Analysis • Subsequent steps involve…. • Understanding gene regulation • Understanding the context of the gene in the sequence of the whole genome • Analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways, how these pathways interact with the environment

  4. Genome sequence of a diploid plant (2n = 2x = 14) • 5,300,000,000 base pairs • 165% of human genome • Enough characters for 11,000 large novels • 60,000,000 base pairs of expressed sequence • ~ 1% of total sequence, like humans • 125 large novels
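
The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes a human genome of about 3.2 Gb, which is not stated on the slide, and derives the implied size of a "large novel" from the slide's own numbers.

```python
# Figures quoted on the slide.
GENOME_BP = 5_300_000_000       # total genome size in base pairs
EXPRESSED_BP = 60_000_000       # expressed sequence in base pairs
NOVELS_FOR_GENOME = 11_000      # "large novels" needed to hold the genome

# Assumption (not on the slide): human haploid genome of ~3.2 Gb.
HUMAN_GENOME_BP = 3_200_000_000


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Implied characters per novel, derived from the slide's own figures.
chars_per_novel = GENOME_BP / NOVELS_FOR_GENOME
relative_to_human = percent(GENOME_BP, HUMAN_GENOME_BP)
expressed_percent = percent(EXPRESSED_BP, GENOME_BP)
novels_for_expressed = EXPRESSED_BP / chars_per_novel

print(f"Relative to human genome: {relative_to_human:.0f}%")
print(f"Expressed fraction: {expressed_percent:.1f}%")
print(f"Characters per novel (implied): {chars_per_novel:,.0f}")
print(f"Novels for expressed sequence: {novels_for_expressed:.0f}")
```

Under these assumptions the expressed sequence comes out at slightly over 1% of the genome, which agrees with the rounded values on the slide.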

  5. Molecular tools for determining DNA (or RNA) sequence • Genomic DNA (or RNA) extraction • (RNA → cDNA) • Manipulate DNA with restriction enzymes to reduce complexity and/or facilitate further manipulation • Manage and/or maintain DNA in vectors and/or libraries • Select DNA targets via amplification and/or hybridization • Determine the nucleotide sequence of the targeted DNA

  6. Extracting DNA (or RNA) • Genomic DNA (or RNA): • Leaf segments or target tissues • Key considerations are • Concentration • Purity • Fragment size

  7. mRNA to cDNA Reverse transcriptase

  8. Restriction Enzymes • Restriction enzymes make cuts at defined recognition sites in DNA • In bacteria they act as a defense system, degrading the DNA of invading bacteriophages • The restriction enzymes are named for the organism from which they were isolated • Harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays • Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence

  9. Restriction Enzymes • Recognition sites and fragment size: in random sequence, a four-base cutter cuts on average once every ~256 bp (4^4), more frequently than a six-base cutter (4^6 = 4,096 bp), which in turn cuts more often than an eight-base cutter (4^8 = 65,536 bp) • Methylation sensitive restriction enzymes • Target the epigenome
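
The relationship between recognition-site length and fragment size can be sketched as follows. This is a minimal illustration that assumes the four bases are equally frequent and independent, which real genomes only approximate. The helper names are hypothetical, and the six-base site GAATTC (the EcoRI site) is used only as an example.

```python
def expected_fragment_length(site_length: int) -> int:
    """Average spacing between recognition sites in random sequence.

    Assumes A, C, G and T are equally frequent and independent, so a
    specific site of n bases occurs once every 4**n bp on average.
    """
    return 4 ** site_length


def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    return positions


for n in (4, 6, 8):
    print(f"{n}-base cutter: one site per {expected_fragment_length(n):,} bp")

print(find_sites("AAGAATTCTTGAATTC", "GAATTC"))
```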

  10. Restriction Enzymes • Palindrome recognition sites – the same sequence is read 5’ to 3’ on each of the two strands of the double helix, which run in opposite directions. • Sit on a potato pan, Otis • Cigar? Toss it in a can, it is so tragic • UFO. tofu • Golf? No sir, prefer prison flog • Flee to me remote elf • Gnu dung • Lager, Sir, is regal • Tuna nut • CRISPR: Clustered regularly interspaced short palindromic repeats
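
Unlike the word palindromes above, a DNA palindrome is a sequence that equals its own reverse complement. A minimal check, using the EcoRI (GAATTC) and BamHI (GGATCC) recognition sites as examples that are not taken from the slides, might look like this:

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """True if both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


for site in ("GAATTC", "GGATCC", "GATTACA"):
    print(site, reverse_complement(site), is_palindromic_site(site))
```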

  11. Vectors • Propagate and maintain DNA fragments generated by the restriction digestion • Efficiency and simplicity of inserting and retrieving the inserted DNA fragments • Key feature of the cloning vector: size of the DNA insert • Plasmid ~ 1 kb • BAC ~ 200 kb

  12. Libraries • Repositories of DNA fragments cloned in their vectors or attached to platform-specific oligonucleotide adapters • Classified in terms of • cloning vector: e.g. plasmid, BAC • cloned DNA fragment source: e.g. genomic, cDNA • intended use: e.g. next generation sequencing (NGS)

  13. Genomic DNA libraries • Total genomic DNA digested and the fragments cloned into an appropriate vector or system • Representative sample of all the genomic DNA present in the organism, including both coding and non-coding sequences • Enrichment strategies: target specific types of sequences • Unique sequences • Methylated sequences

  14. cDNA libraries • Generated from mRNA transcripts, using reverse transcriptase • The cDNA library represents only the genes that are expressed in the tissue and/or developmental stage that was sampled  

  15. DNA Amplification: Polymerase Chain Reaction (PCR) • K. B. Mullis, 1983 • in vitro amplification of ANY DNA sequence https://www.youtube.com/watch?v=2KoLnIwoZKU

  16. Synthetic DNA: oligonucleotides • Primers, adapters, and more …~$0.010 per bp... • < ~ 100 bases

  17. DNA Amplification: Polymerase Chain Reaction (PCR) Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.

  18. DNA Amplification - PCR A polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.

  19. PCR Principles • The PCR reaction consists of: • Buffer • DNA polymerase (thermostable) • Deoxyribonucleotide triphosphates (dNTPs) • Two primers (oligonucleotides) • Template DNA

  20. PCR Principles • Each cycle ideally doubles the number of copies, so the number of DNA fragments that are identical copies of the original DNA between the two primer binding sites grows exponentially.
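
The exponential growth can be made concrete with an idealized model, N = N0 × (1 + E)^n, where E is the per-cycle efficiency (1.0 means perfect doubling). This is a simplification: it ignores reagent depletion in late cycles and the fact that products bounded by both primers only appear after the first few cycles. The function name and the example efficiency of 0.9 are illustrative assumptions.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of copies after a given number of PCR cycles.

    efficiency is the fraction of templates copied per cycle:
    1.0 means perfect doubling, 0.0 means no amplification.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for n in (10, 20, 30):
    print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies (perfect doubling), "
          f"{pcr_copies(1, n, 0.9):,.0f} copies (90% efficiency)")
```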

  21. PCR Principles • The choice of what DNA will be amplified by the polymerase is determined by the primers • The DNA between the primers is amplified by the polymerase: in subsequent cycles the original template, plus the newly amplified fragments, serve as templates • Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single stranded oligonucleotides, hybridization of the primers to the template, and primer extension

  22. PCR Applications • Amplify a target sequence from a pool of DNA (your favorite gene, forensics, fossil DNA) • Start the process of genome sequencing • Generate abundant molecular markers for linkage map construction
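
How the two primers define the amplified region can be sketched as a simple "in silico PCR". This toy version assumes exact primer matches and a single binding site for each primer; the sequences and function names are invented for illustration. The forward primer matches the given strand, while the reverse primer anneals to that strand, so its reverse complement is what appears in the given sequence.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the product bounded by the two primers, or None if either fails to bind.

    template is one strand written 5' to 3'. Exact matching only:
    real primers tolerate some mismatches, which this sketch ignores.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    # The reverse primer site must lie downstream of the forward primer.
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented example sequences.
template = "TTTTATGCGTAAAAAAGGATCACCCC"
print(amplicon(template, forward="ATGCGT", reverse="TGATCC"))
```

Only the stretch from the first base of the forward primer to the last base of the reverse primer site is returned, which mirrors the point that the primers, not the polymerase, choose the target.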

  23. DNA Hybridization • Single strand nucleic acids find and pair with other single strand nucleic acids with a complementary sequence • An application of this affinity is to label one single strand and then to use this probe to find complementary sequences in a population of single stranded nucleic acids • For example, if you have a cloned gene – either a cDNA or a genomic clone - you could use this as a probe to look for a homologous sequence in another DNA sample  • The microarray concept: • https://www.dnalc.org/resources/3d/26-microarray.html

  24. DNA Hybridization The principle of hybridization can be applied to pairing events involving • DNA:DNA (Southern blot) • DNA:RNA (Northern blot) • protein:antibody (Western blot)

  25. Sanger DNA Sequencing - classic but not obsolete The gold standard for accurate sequencing of short (~ 650 bp) DNA sequences https://www.dnalc.org/view/15479-Sanger-method-of-DNA-sequencing-3D-animation-with-narration.html

  26. Next Generation Sequencing - Illumina A tremendous amount of short read data in a short time, at a good price https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8

  27. Sequencing - PacBio Long reads! https://www.youtube.com/watch?v=v8p4ph2MAvI

  28. Sequencing - Nanopore Potentially, cheap, fast, portable

  29. Sequencing considerations • Read length • Accuracy • Speed • Cost • Assembly

  30. Genome sizes and whole genome sequencing Credit: Karl Kristensen, Denmark

  31. Sequencing a plant genome

  32. Sequencing a plant genome: Fragaria vesca • Herbaceous, perennial • 2n=2x=14 • 240 Mb • Reference species for Rosaceae • Genetic resources • Credit: commons.Wikimedia.org • Fragaria x ananassa: 2n=8x=56. Domesticated 250 years ago

  33. Sequencing a plant genome • Short reads • No physical reference • De novo assembly • Open source

  34. Sequencing a plant genome • Roche 454, Illumina • 39x coverage (average number of reads including a given nucleotide) • Contigs (overlapping reads) assembled into scaffolds (contigs + gaps) • ~ 3,200 scaffolds with an N50 of 1.3 Mb (the length at which scaffolds of that size or longer contain half of the assembled sequence) • Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds

  35. Sequencing a plant genome: anchoring the genome sequence to the genetic map • 94% of scaffolds anchored to the diploid Fragaria reference linkage map using 390 genetic markers • Pseudochromosomes ~ linkage groups
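
Coverage and N50 are both easy to compute. The sketch below uses made-up scaffold lengths, not the F. vesca assembly, and hypothetical function names. N50 is computed as the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly, so it generally differs from the plain mean.

```python
def mean_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average number of reads covering each nucleotide."""
    return total_read_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one length")


# Toy scaffold lengths (arbitrary units), not real assembly data.
scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", n50(scaffolds))
print("Mean:", sum(scaffolds) / len(scaffolds))

# Read bases needed for 39x coverage of a 240 Mb genome.
print("Bases for 39x:", 39 * 240_000_000)
print("Coverage check:", mean_coverage(39 * 240_000_000, 240_000_000))
```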

  36. Sequencing a plant genome • Synteny • Homologs • Orthologs • Paralogs • Credit: Biology stackexchange.com

  37. Sequencing a plant genome: the small genome size (240 Mb) • Absence of large genome duplications • Limited numbers of transposable elements, compared to other angiosperms

  38. Sequencing a plant genome: the transcriptome • Fruits and roots – different types of genes

  39. Sequencing a plant genome: gene prediction • 34,809 nuclear genes • flavor, nutritional value, and flowering time • 1,616 transcription factors • RNA genes: 569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, 76 micro RNA and 24 other RNAs • Chloroplast genome: 155,691 bp encodes 78 proteins, 30 tRNAs and 4 rRNA genes • Evidence of DNA transfer from plastid genome to the nuclear genome

  40. Genome architecture and evolution • Key considerations: • Genes • Chromosomes • C value paradox • Gene regulation • Epigenetics • Transposable elements

  41. Genome architecture and evolution • S. Wessler on Transposable elements: https://www.youtube.com/watch?v=_cJfsWYR42M • Review of retroviruses: https://www.youtube.com/watch?v=OxwKuFZbDzc • Review of cut and paste transposition: https://www.youtube.com/watch?v=XYZHMGUGq6o

  42. Genes • DNA → (transcription) → mRNA → (translation) → Protein • tRNA, rRNA • Other RNAs?

  43. Genes (classically..) • DNA specifying a protein • 200 – 2,000,000 nt (bp) • Figure labels: basal promoter; +1; 5’UTR; start codon; coding region (exon, intron, exon, intron, exon); stop codon; 3’UTR; termination signal; ORF; mRNA; CDS
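
The DNA to mRNA to protein flow can be illustrated with a short sketch. It assumes the standard genetic code, a coding-strand DNA input, and a reading frame that starts at the first base; it ignores introns, UTRs and start-codon searching. The example sequence is invented.

```python
from itertools import product

# Standard genetic code, with codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA corresponding to a coding (sense) DNA strand."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon or the end of the sequence."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGAAATGGTGA"  # invented example: start codon, two codons, stop codon
mrna = transcribe(gene)
print(mrna)
print(translate(mrna))
```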

  44. Chromosomes F. vesca: 35,000 genes/7 chromosomes = 5,000 genes/chromosome?

  45. Genes, chromosomes, and genomes • F. vesca: 2n = 2x = 14 • Genome = 240 Mb • Average gene ~ 3 kb • If the genome were all genes: ~80,000 genes? ~11,400 genes/chromosome? • Predicted: 35,000 genes, ~ 5,000 genes/chromosome • What’s the rest of the genome?

  46. C-value paradox “Organisms of similar evolutionary complexity differ vastly in DNA content” Fedoroff, N. 2012. Science. 338:758-767. 1 pg = 978 Mb
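
The back-of-the-envelope reasoning on this slide can be written out explicitly. The sketch below uses the slide's round figures (240 Mb genome, ~3 kb average gene, 7 chromosomes, ~35,000 predicted genes); the variable names are illustrative. Dividing 240 Mb by 3 kb gives about 80,000 genes if the genome were wall-to-wall genes.

```python
GENOME_BP = 240_000_000     # F. vesca genome size
AVERAGE_GENE_BP = 3_000     # approximate average gene length
CHROMOSOMES = 7             # haploid chromosome number (2n = 2x = 14)
PREDICTED_GENES = 35_000    # rounded number of predicted genes

# Upper bound: how many average-sized genes would fill the whole genome?
max_genes = GENOME_BP // AVERAGE_GENE_BP
max_genes_per_chromosome = max_genes / CHROMOSOMES

# What the gene predictions suggest instead.
genes_per_chromosome = PREDICTED_GENES / CHROMOSOMES
genic_fraction = PREDICTED_GENES * AVERAGE_GENE_BP / GENOME_BP

print(f"Genes if the genome were all genes: {max_genes:,}")
print(f"  per chromosome: {max_genes_per_chromosome:,.0f}")
print(f"Predicted genes per chromosome: {genes_per_chromosome:,.0f}")
print(f"Fraction of genome in genes: {genic_fraction:.0%}")
```

With these round numbers, genes account for less than half of the genome, which is the gap the closing question on the slide points at.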

  47. The C-value paradox Fedoroff Science 2012;338:758-767

  48. C-value paradox

  49. C-value paradox Junk??????? Shining a Light on the Genome’s ‘Dark Matter’

  50. Gene regulation • Pennisi, E. 2010. Science 330:1614. 40% of all human disease-related SNPs are OUTSIDE of genes • The dark matter is conserved and therefore must have a function • DNA sequences in the dark matter are involved in gene regulation • ~80% of the genome is transcribed but “genes” account for ~2% • RNAs of all shapes and sizes: • RNAi • lncRNA
