610 likes | 618 Views
DNA sequencing and genome architecture. Steps in Genetic Analysis. Knowing how many genes determine a phenotype (Mendelian and/or QTL analysis), and where the genes are located (linkage mapping) is a first step in understanding the genetic basis of a phenotype
E N D
Steps in Genetic Analysis • Knowing how many genes determine a phenotype (Mendelian and/or QTL analysis), and where the genes are located (linkage mapping) is a first step in understanding the genetic basis of a phenotype • A second step is determining the sequence of the gene (or genes)
Steps in Genetic Analysis • Subsequent steps involve…. • Understanding gene regulation • Understanding the context of the gene in the sequence of the whole genome • Analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways, how these pathways interact with the environment
Genome sequence of a diploid plant (2n = 2x = 14) • 5,300,000,000 base pairs • 165% of human genome • Enough characters for 11,000 large novels • 60,000,000 base pairs of expressed sequence • ~ 1% of total sequence, like humans • 125 large novels
Molecular tools for determining DNA (or RNA) sequence • Genomic DNA (or RNA) extraction • (RNA cDNA) • Manipulate DNA with restriction enzymes to reduce complexity and/or facilitate further manipulation • Manage and/or maintain DNA in vectors and/or libraries • Selecting DNA targets via amplification and/or hybridization • Determining nucleotide sequence of the targeted DNA
Extracting DNA (orRNA) • Genomic DNA (or RNA): • Leaf segments or target tissues • Key considerations are • Concentration • Purity • Fragment size
mRNA to cDNA Reverse transcriptase
Restriction Enzymes • Restriction enzymes make cuts at defined recognition sites in DNA • A defense system for bacteria, where they attack and degrade the DNA of attacking bacteriophages • The restriction enzymes are named for the organism from which they were isolated • Harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays • Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence
Restriction Enzymes • Recognition sites and fragment size: a four-base cutter ~ 256 bp (44); more frequently than a six-base cutter, which in turn will cut more often than one with an eight base-cutter • Methylation sensitive restriction enzymes • Target the epigenome
Restriction Enzymes • Palindrome recognition sites – the same sequence is specified when each strand of the double helix is read in the opposite direction. • Sit on a potato pan, Otis • Cigar? Toss it in a can, it is so tragic • UFO. tofu • Golf? No sir, prefer prison flog • Flee to me remote elf • Gnu dung • Lager, Sir, is regal • Tuna nut • CRISPR: Clustered regularly interspaced palindromic repeats
Vectors • Propagate and maintain DNA fragments generated by the restriction digestion • Efficiency and simplicity of inserting and retrieving the inserted DNA fragments • Key feature of the cloning vector: size of the DNA insert • Plasmid ~ 1 kb • BAC ~ 200 kb
Libraries Repositories of DNA fragments cloned in their vectors or attached to platform-specific oligonucleotide adapters Classified in terms of • cloning vector: e.g. plasmid, BAC • in terms of cloned DNA fragment source: e.g. genomic, cDNA • In terms of intended use: e.g. next generation sequencing (NGS)
Genomic DNA libraries • Total genomic DNA digested and the fragments cloned into an appropriate vector or system • Representative sample all the genomic DNA present in the organism, including both coding and non-coding sequences • Enrichment strategies: target specific types of sequences • unique sequences • Methylated sequences
cDNA libraries • Generated from mRNA transcripts, using reverse transcriptase • The cDNA library represents only the genes that are expressed in the tissue and/or developmental stage that was sampled
DNA Amplification: Polymerase Chain Reaction (PCR) • K.B Mullis, 1983 • in vitro amplification of ANY DNA sequence https://www.youtube.com/watch?v=2KoLnIwoZKU
Synthetic DNA: oligonucleotides • Primers, adapters, and more …~$0.010 per bp... • < ~ 100 bases
DNA Amplification: Polymerase Chain Reaction (PCR) Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.
DNA Amplification - PCR A Polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.
PCR Principles • The PCR reaction consists of: • Buffer • DNA polymerase (thermostable) • Deoxyribonucleotide triphosphates (dNTPs) • Two primers (oligonucleotides) • Template DNA
PCR Principles • Each cycle generates exponential numbers of DNA fragments that are identical copies of the original DNA strand between the two binding sites.
PCR Principles • The choice of what DNA will be amplified by the polymerase is determined by the primers • The DNA between the primers is amplified by the polymerase: in subsequent reactions the original template, plus the newly amplified fragments, serve as templates • Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single stranded oligonucleotides, hybridization of the primers to the template, and primer extension
PCR Applications • Amplify a target sequence from a pool of DNA (your favorite gene, forensics, fossil DNA) • Start the process of genome sequencing • Generate abundant markers for linkage map construction molecular markers
DNA Hybridization • Single strand nucleic acids find and pair with other single strand nucleic acids with a complementary sequence • An application of this affinity is to label one single strand and then to use this probe to find complementary sequences in a population of single stranded nucleic acids • For example, if you have a cloned gene – either a cDNA or a genomic clone - you could use this as a probe to look for a homologous sequence in another DNA sample • The microarray concept: • https://www.dnalc.org/resources/3d/26-microarray.html
DNA Hybridization The principle of hybridization can be applied to pairing events involving DNA: DNA; DNA: RNA; and protein: antibody Southern blot Northern blot Western blot
Sanger DNA Sequencing - classic but not obsolete The gold standard for accurate sequencing of short (~ 650 bp) DNA sequences https://www.dnalc.org/view/15479-Sanger-method-of-DNA-sequencing-3D-animation-with-narration.html
Next Generation Sequencing - Illumina A tremendous amount of short read data in a short time, at a good price https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8
Sequencing - PAC Bio Long reads! https://www.youtube.com/watch?v=v8p4ph2MAvI
Sequencing - Nanopore Potentially, cheap, fast, portable
Sequencing considerations Read length Accuracy Speed Cost Assembly
Genome sizes and whole genome sequencing Credit: Karl Kristensen, Denmark
Fragariavesca Sequencing a plant genome • Herbaceous, perennial • 2n=2x=14 • 240 Mb • Reference species for Rosaceae • Genetic resources Credit: commons.Wikimedia.org Fragaria x ananassa: 2n=8x=56. Domesticated 250 years ago
Sequencing a plant genome • Short reads • No physical reference • De novo assembly • Open source
Sequencing a plant genome • Roche 454, Illumina • X39 coverage (number of reads including a given nucleotide) • Contigs (overlapping reads) assembled into scaffolds (contigs + gaps) • ~ 3,200 scaffolds N50 of 1.3 Mb (weighted average length) • Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds
Anchoring the genome sequence to the genetic map Sequencing a plant genome • 94% of scaffolds anchored to the diploid Fragariareference linkage map using 390 genetic markers • Pseudochromosomes ~ linkage groups
Sequencing a plant genome Synteny Homologs Orthologs Paralogs Credit: Biology stackexchange.com
Sequencing a plant genome The small genome size (240 Mb) • Absence of large genome duplications • Limited numbers of transposable elements, compared to other angiosperms
Sequencing a plant genome the transcriptome • Fruits and roots – different types of genes
Sequencing a plant genome Gene prediction • 34,809 nuclear genes • flavor, nutritional value, and flowering time • 1,616 transcription factors RNA genes • 569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, • 76 micro RNA and 24 other RNAs Chloroplast genome • 155,691 bp encodes 78 proteins, 30 tRNAs and 4 rRNA genes • Evidence of DNA transfer from plastid genome to the nuclear genome
Genome architecture and evolution • Key considerations: • Genes • Chromosomes • C value paradox • Gene regulation • Epigenetics • Transposable elements
Genome architecture and evolution S. Wessler on Transposable elements: https://www.youtube.com/watch?v=_cJfsWYR42M Review of retroviruses: https://www.youtube.com/watch?v=OxwKuFZbDzc Review of cut and past transposition: https://www.youtube.com/watch?v=XYZHMGUGq6o
Genes DNA ……..……..mRNA……….…….Protein Transcription tRNA rRNA Translation Other RNAs?
Genes (classically..) DNA specifying a protein 200 – 2,000,000 nt (bp) promoter Coding region • Exon • Intron • Exon • Intron Exon • Start • codon • Stop • codon • 3’UTR • 5’UTR • Basal • promoter • +1 • Termination • signal • ORF • mRNA • CDS
Chromosomes F. vesca: 35,000 genes/7 chromosomes = 5,000 genes/chromosome?
Genes, chromosomes, and genomes F. vesca: 2n = 2x = 14 genome = 240 Mb average gene ~ 3kb 79,333 genes? 11,333 genes/chromosome? 35,000 genes….. ~ 5,000 genes/chromosome What’s the rest of the genome???????
C-value paradox “Organisms of similar evolutionary complexity differ vastly in DNA content” Federoff, N. 2012. Science. 338:758-767. 1 pg = 978 Mb
The C-value paradox Fedoroff Science 2012;338:758-767
C-value paradox Junk??????? Shining a Light on the Genome’s ‘Dark Matter’
Gene regulation • Pennisi, E. 2010. Science 330:1614. 40% of all human disease-related SNPs are OUTSIDE of genes • The dark matter is conserved and therefore must have a function • DNA sequences in the dark matter are involved in gene regulation • ~80% of the genome is transcribed but “genes” account for ~2% • RNAs of all shapes and sizes: • RNAi • lncRNA