
Spring 2007




  1. Bioinformatics Spring 2007 Ch. 6 - Genomics

  2. Completed genomes Bioinformatics Spring 2009 • http://www.genomesonline.org

  3. Bioinformatics Spring 2009
     • Avg. genome = 5 Mb
     • Typical sequence coverage = 20X, therefore approx. 100 Mb of DNA
     • Avg. English word size = 5 letters
     • Avg. words per page = 250, therefore 1,250 letters per page
     • Avg. book size = 200 pages, therefore 250,000 letters per book
     • Approximately 400 books per genome
     • 958 completed genomes as of January 1, 2009
     • Approximately 383,200 books' worth of genomic information
     • MSU library holdings: 182,000
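The book arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the constant names are illustrative and not from the slide, and one "letter" is treated as one base.

```python
# Figures are taken from the slide; the names are illustrative.
AVG_GENOME_BP = 5_000_000      # average genome = 5 Mb
COVERAGE = 20                  # typical sequence coverage = 20X
LETTERS_PER_WORD = 5
WORDS_PER_PAGE = 250
PAGES_PER_BOOK = 200
COMPLETED_GENOMES = 958        # as of January 1, 2009

sequenced_bp = AVG_GENOME_BP * COVERAGE                # 100 Mb of DNA
letters_per_page = LETTERS_PER_WORD * WORDS_PER_PAGE   # 1,250
letters_per_book = letters_per_page * PAGES_PER_BOOK   # 250,000
books_per_genome = sequenced_bp // letters_per_book
total_books = books_per_genome * COMPLETED_GENOMES

print(books_per_genome, total_books)
```

Note that the 400-book figure counts the 20X raw sequence, not the 5 Mb finished genome.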

  4. Approaches to Genome Sequencing • Whole Genome Sequencing • Shotgun Sequencing • Expressed Sequence Tags • Comparative Genomics • Metagenomics

  5. Overview of Genome Sequencing • Isolate Genomic DNA • Genomic DNA • Create Genomic Library • BAC Clones • Construction of Genome Map • DNA Sequencing and Assembly

  6. Isolating Genomic DNA (à la Qiagen’s DNeasy kit)
     • Lysis:
       • Proteinase K digestion
       • Lysis by chaotropic salt
     • Purification:
       • DNA negatively charged
       • Bind positively charged column
       • Wash (EtOH) away impurities
     • Elution:
       • Removal of DNA
       • Disrupt ionic interaction with high salt buffer
     • Preservation:
       • Store at -20°C to -160°C
       • Tris-EDTA buffer [pH 8.0]

  7. Sephadex Structure

  8. Creating a Genomic Library
     • Cut Genomic DNA:
       • Partial Restriction Digest
       • EcoRI & EcoRI methylase
       • Mechanical Shearing
       • Determine Avg. fragment size
     • Clone Fragments into BAC vectors (BAC Clones):
       • Properties of BACs
     • Transform E. coli:
       • Electroporation

  9. Pulsed-Field Gel Electrophoresis

  10. Average Insert Size by Pulsed-Field Gel Electrophoresis

  11. Average Insert Size in Human BACs

  12. Creating a Genomic Library
      • Cut Genomic DNA:
        • Partial Restriction Digest
        • EcoRI & EcoRI methylase
        • Mechanical Shearing
        • Determine Avg. fragment size
      • Clone Fragments into BAC vectors (BAC Clones):
        • Properties of BACs
      • Transform E. coli:
        • Electroporation

  13. Bacterial Artificial Chromosome
      • Derived from F plasmids
      • Multiple cloning site
      • Selectable Marker
        • Antibiotic Resistance Gene, e.g., Cm (chloramphenicol)
      • Ori S - unidirectional
      • Par genes
        • Partitioning genes
        • Maintain single copy of BAC

  14. Creating a Genomic Library
      • Cut Genomic DNA:
        • Partial Restriction Digest
        • EcoRI & EcoRI methylase
        • Mechanical Shearing
        • Determine Avg. fragment size
      • Clone Fragments into BAC vectors (BAC Clones):
        • Properties of BACs
      • Transform E. coli:
        • Electroporation

  15. Construction of Genome Map
      • Transformed E. coli: Plasmid Miniprep
      • BAC Clones
      • Construction of Genome Map
        • BAC end sequencing
        • Identify overlapping BACs
        • Subclone BACs into plasmids
      • DNA Sequencing and Assembly

  16. Genome Assembly and Annotation

  17. Overview of Shotgun Sequencing • Isolate Genomic DNA • Genomic DNA • Create Genomic Library • Plasmid Clones • DNA Sequencing and Assembly • Construction of Genome Map
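The slide names "DNA Sequencing and Assembly" as a step without describing an algorithm. As a toy illustration only, the sketch below repeatedly merges the two reads with the longest suffix/prefix overlap. The function names, the `min_len` threshold, and the example reads are assumptions made for this sketch. Real assemblers must also cope with sequencing errors, repeats, and reads from both strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlap of at least min_len remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads covering a made-up 15-base sequence.
contigs = greedy_assemble(["ATGGCGTA", "GCGTACCTG", "ACCTGAAT"])
print(contigs)
```

Reads that share no sufficient overlap are returned as separate contigs, which is where a genome map helps order them.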

  18. Overview of EST Sequencing • Isolate mRNA • Create cDNA • Create Genomic Library • DNA Sequencing

  19. Comparative Genomics • Isolate mRNA and create cDNA • Create Genomic Library • BAC Clones • Construction of Genome Map • DNA Sequencing and Assembly • Synteny: same gene order preserved between species

  20. Comparative Genomics BAC array

  21. Comparative Genome Hybridization

  22. Bordetella phylogeny

  23. Comparative Genome Hybridization

  24. Comparative Genome Hybridization

  25. Metagenomic analysis
      • What is metagenomics?
        • Metagenomics is the genomic analysis of the collective genomes of an assemblage of organisms from a defined environment (Handelsman et al., 2002).
        • a.k.a. community genomics, environmental genomics
        • Derived from tools, techniques and models used in genomics.
      • Why do metagenomic analysis?
        • Genomic content of all eukaryotes, bacteria, archaea and viruses in an environment.
        • Provides a picture of the genetic/functional potential of the community.

  26. Metagenomics

  27. Venter’s Trip

  28. Yooseph et al., PLoS Biology, 2007

  29. Yooseph et al., PLoS Biology, 2007

  30. Creation of Fosmid Libraries

  31. Preliminary Categorization of 263 ORFs from a Fosmid Library of Subgingival Plaque

  32. Genome Annotation

  33. Genome Assembly and Annotation RefSeq db

  34. Caveats
      • Finding genes involves computational methods as well as experimental validation
      • Computational methods are often inadequate, and often generate erroneous ‘gene’ (false positive) sequences which:
        • Are missing exons
        • Have incorrect exons
        • Overpredict genes
        • Are missing the 5' and 3' UTRs

  35. Things we are looking to annotate • CDS • mRNA • Alternative RNA • Promoter and Poly-A Signal • Pseudogenes • ncRNA • Repeat elements • G+C content
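Of the items on this list, G+C content is the simplest to compute directly from sequence. A minimal sketch follows; the function name is illustrative, and ignoring ambiguous bases such as N is an assumption about how they should be handled.

```python
def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


print(gc_content("atgc"))   # two of the four bases are G or C
```

In practice G+C content is often reported over sliding windows along a genome rather than as a single number; that extension is left out here.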

  36. Pseudogenes
      • As many as 20-30% of all genomic sequence predictions could be pseudogenes
      • Non-functional copy of a gene
      • Processed pseudogene
        • Retrotransposon derived
        • No 5' promoters
        • No introns
        • Often includes poly-A tail
      • Non-processed pseudogene
        • Gene duplication derived
      • Both include events that make the gene non-functional
        • Frameshift
        • Stop codons
      • We assume pseudogenes have no function, but we really don’t know!
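The two disabling events named on this slide, frameshifts and stop codons, suggest a crude screen for a candidate coding sequence. The sketch below is an assumption-laden illustration, not a pseudogene caller: the function names are hypothetical, it uses the standard stop codons, and it treats "length not a multiple of three" as a rough stand-in for a frameshift. Real detection compares the candidate with its functional parent gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds):
    """0-based codon indices of in-frame stop codons that occur before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_disrupted(cds):
    """True if the length is not a multiple of 3 or an early in-frame stop codon is present."""
    return len(cds) % 3 != 0 or bool(premature_stops(cds))


print(premature_stops("ATGTAAAAATAA"))
print(looks_disrupted("ATGAAATAA"))
```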

  37. Noncoding RNA (ncRNA)
      • tRNA - transfer RNA: involved in translation
      • rRNA - ribosomal RNA: structural component of ribosome, where translation takes place
      • snRNA - small nuclear RNA: functional/catalytic in RNA maturation
      • Antisense RNA - gene regulation
      • siRNA - gene silencing

  38. Noncoding RNA (ncRNA)
      • ncRNAs represent 80-98% of all transcripts in a cell
      • ncRNAs have not been taken into account in gene counts based on:
        • cDNA
        • ORF computational prediction
        • Comparative genomics looking at ORFs
      • ncRNA can be:
        • Structural
        • Catalytic
        • Regulatory
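ORF computational prediction searches for a start codon followed by an in-frame stop codon. A gene whose product is RNA has no such signal to find. A naive forward-strand scan might look like the sketch below; the function name, the `min_codons` cutoff, and the example sequence are assumptions, and real gene finders use far richer models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) half-open intervals of ATG...stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF in this frame
    return orfs


example = "CCATGAAATGATT"   # made-up sequence
print([example[s:e] for s, e in find_orfs(example)])
```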

  39. GenBank Features -10_signal -35_signal 3'clip 3'UTR 5'clip 5'UTR attenuator CAAT_signal CDS conflict C_region D-loop D_segment enhancer exon GC_signal gene iDNA intron J_segment LTR mat_peptide misc_binding misc_difference misc_feature misc_recomb misc_RNA misc_signal misc_structure modified_base mRNA N_region old_sequence polyA_signal polyA_site precursor_RNA primer_bind prim_transcript promoter protein_bind RBS repeat_region repeat_unit rep_origin rRNA satellite scRNA sig_peptide snoRNA snRNA S_region stem_loop STS TATA_signal terminator transit_peptide tRNA unsure variation V_region V_segment

  40. LOCUS       NG_005487    1850 bp    DNA    linear    ROD 14-FEB-2006
      DEFINITION  Mus musculus ubiquitin-conjugating enzyme E2 variant 2 pseudogene
                  (LOC625221) on chromosome 6.
      ACCESSION   NG_005487
      VERSION     NG_005487.1  GI:87239965
      KEYWORDS    .
      SOURCE      Mus musculus (house mouse)
        ORGANISM  Mus musculus
                  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                  Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
                  Sciurognathi; Muroidea; Muridae; Murinae; Mus.
      REFERENCE   1  (bases 1 to 1850)
        AUTHORS   Wilson,R.
        TITLE     Mus musculus BAC clone RP24-201D17 from 6
        JOURNAL   Unpublished (2003)
      COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
                  NCBI review. The reference sequence was derived from AC121925.2.
      FEATURES             Location/Qualifiers
           source          1..1850
                           /organism="Mus musculus"
                           /mol_type="genomic DNA"
                           /db_xref="taxon:10090"
                           /chromosome="6"
                           /note="AC121925.2 32277..34126"
           gene            101..1750
                           /gene="LOC625221"
                           /pseudo
                           /db_xref="GeneID:625221"
           repeat_region   1792..1827
                           /rpt_family="ID"
      ORIGIN
              1 tcttctgcct caattcctca agtgctagta tcatatgccc atgccattat ttttaactcc
             61 cctttttcat gctaagaatt gaacacacgg ccctgcgtgc ggtggtgcgt ctggtagcag
            121 gagaagatgg cggtctccac aggagttaaa gttcctcgta attttcgctt gttggaagaa
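The FEATURES table in this record gives each feature a key and a location. As a rough illustration of reading it, the sketch below pulls out the three simple `start..end` locations shown above and computes their lengths. The regular expression and function name are assumptions made for this sketch. It assumes the standard flat-file layout with feature keys indented five spaces, and it does not handle compound locations such as `complement(...)` or `join(...)`. An established parser is the better choice for real records.

```python
import re

# Feature lines copied from the record above (keys indented five spaces).
FEATURE_TABLE = """\
     source          1..1850
     gene            101..1750
     repeat_region   1792..1827
"""

# Matches only the simplest location form: key, then start..end.
FEATURE_RE = re.compile(r"^\s{5}(\S+)\s+(\d+)\.\.(\d+)\s*$")


def parse_simple_features(text):
    """Return (key, start, end, length) for each 'key start..end' feature line."""
    features = []
    for line in text.splitlines():
        match = FEATURE_RE.match(line)
        if match:
            key = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            features.append((key, start, end, end - start + 1))  # 1-based, inclusive
    return features


features = parse_simple_features(FEATURE_TABLE)
for key, start, end, length in features:
    print(key, start, end, length)
```

GenBank coordinates are 1-based and inclusive, so the pseudogene annotated at 101..1750 spans 1,650 bp.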

  41. The ideal annotation of “MyGene”
      • All clones
      • All SNPs
      • Promoter(s)
      • All mRNAs
      • All proteins
      • All protein modifications
      • Ontologies
      • Interactions (complexes, pathways, networks)
      • Expression (where and when, and how much)
      • Evolutionary relationships
      • All structures
