GENERAL STUFF




  1. GENERAL STUFF. Subject: Genome-based Functional Annotation (bacteria). Workload: 14 hrs (2 hrs lecture, 12 hrs assignment in 4 parts, so on average 3 hrs per part; not ready yet). Hand in: rtf-file, pdf-file or ppt-file before 8 November (later: -1 point per day). Christof Francke (Post-Doc/Scientist; TI Food and Nutrition)

  2. Genome sequence annotation From DNA to function Bioinformatics Seminar, Nijmegen 16 10 2007 Christof Francke (Jos Boekhorst/ Michiel Wels)

  3. Promised you a miracle promises, promises

  4. Answering biological questions. Why does Bacillus anthracis kill humans? (anthrax; Dutch: miltvuur) We have the genomes, so now we know ...?

  5. When we have the genome sequenced, what do we know then / what can we do then? Inventory: - predict functionality of encoded proteins - defects in genes (disease) - lineage

  6. The quest for an appropriate translation of sequence to knowledge DNA sequencing (assembly) identifying genes Part I protein function prediction function reconstruction modeling biology

  7. Bacterial Genomics in Nijmegen Biological questions in the interest of Dutch Food Industry How can we improve the cell as a factory? - produce compounds - improve taste How can we prevent spoilage? - spores, biofilms, fungi How can we improve health? - interaction between bacteria and host (probiotics)

  8. Genome organization

  9. The organization of genetic information in bacteria. Most Open Reading Frames are preceded by regulatory elements (cis-acting elements). RNA polymerase binding is affected by regulatory proteins (trans-acting elements; Activation, Repression). [Figure: promoter and ORF on an example DNA sequence; regulatory proteins A (+) and R (-) acting on RNA polymerase; transcription to mRNA]

  10. The organization of genetic information in bacteria. Operon: Gene 1, Gene 2 and Gene 3 on one mRNA (with translation starts), giving Protein 1, Protein 2 and Protein 3. Multiple operons regulated by the same transcription factor: regulon.

  11. DNA sequencing

  12. Whole genome shotgun sequencing Fraser et al, Nature 2000 406: 799-803.

  13. I) The sequencing and assembly process. Wet lab: raw data production, 4 x ABI 3700 sequencer, >1.5 million nucleotides per day. Data transfer. Bio-informatics: genome assembly, automated genome annotation, in-house database, >5000 Blasts / day.

  14. Genome assembly: initially there are a lot of gaps

  15. Methods for mapping contigs. Figure 3: Sources of linking information between contigs. (A) overlaps, (B) clone mates, (C) alignments to reference genome, (D) alignments to physical maps, (E) conservation of gene synteny.
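
Linking source (A), overlaps, can be illustrated with a small Python sketch. It is only a toy: it looks for an exact suffix/prefix overlap between two contigs and joins them. Real assemblers tolerate sequencing errors and use read quality. The function names and the `min_overlap` default are assumptions made for this example.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate first so the first match is the best one.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_contigs(left: str, right: str, min_overlap: int = 3):
    """Join two contigs on their exact overlap, or return None if there is none."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


if __name__ == "__main__":
    contig_a = "AACGTTGACTGACG"   # toy contigs, not real data
    contig_b = "GACGTGTCACG"
    print(overlap_length(contig_a, contig_b))
    print(merge_contigs(contig_a, contig_b))
```

When no overlap is found, the other linking sources in the figure (clone mates, a reference genome, physical maps, gene synteny) are what allow the contigs to be ordered.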

  16. The first Dutch bacterial genome-sequence (2003) Proc Natl Acad Sci USA 100: 1990

  17. New technology: 454 sequencing. Advantage: relatively fast, reliable and no sequence preference. Disadvantage: short reads, difficult assembly. Nowadays most sequencing efforts are hybrid.

  18. Identifying genes AGCGGTGTCGATCGGCGCTATAGCGCATGCGTATAGCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCGATATGCTATAGC

  19. The identification of Open Reading Frames AGCGGTGTCGATCGGCGCTATAGCGCATGCGTATAGCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCGATATGCTATAGC TGTCGATCGGCGCTATAGCGCATGCGTATAGCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCATATGCTATAGCACGTTTG Different visualization: look at possible reading frames

  20. Coding sequences characterized by: a) the lack of stop codons
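
The idea of looking at all possible reading frames for stretches without stop codons can be sketched in Python as below. This is a minimal illustration, not a gene finder: it assumes an upper-case A/C/G/T sequence and the standard stop codons, and it reports coordinates on the strand as read (for the minus strand, on the reverse complement). The function names and the `min_codons` cut-off are assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_stretches(seq: str, min_codons: int = 10):
    """List stretches without stop codons in all six reading frames.

    Each result is (strand, frame, start, end), with `end` exclusive and
    coordinates measured on the strand that was read.
    """
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - run_start) // 3 >= min_codons:
                        results.append((strand, frame, run_start, pos))
                    run_start = pos + 3
            # A stretch may still be open at the end of the sequence.
            end = frame + ((len(s) - frame) // 3) * 3
            if (end - run_start) // 3 >= min_codons:
                results.append((strand, frame, run_start, end))
    return results


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 5 + "TAA"   # toy sequence, not real data
    for hit in stop_free_stretches(toy, min_codons=5):
        print(hit)
```

The `min_codons` cut-off matters in practice: a low value yields many small putative coding sequences, most of which are not real genes.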

  21. Characteristics of coding sequences: b) Codon usage. Leu : Ala : Trp is 6 : 4 : 1 in random sequence and 7 : 7 : 1 in coding sequence. In addition: codon bias!
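
The 6 : 4 : 1 ratio for random sequence follows from the number of codons per amino acid in the standard genetic code, assuming all 64 codons are equally likely. The Python sketch below rebuilds that count and also tallies the amino acids actually used in a coding sequence, so the two can be compared. The function names are assumptions made for this example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_per_amino_acid():
    """Number of codons that encode each amino acid ('*' marks stop)."""
    return Counter(CODON_TABLE.values())


def amino_acid_usage(cds: str):
    """Count the amino acids encoded by an in-frame, upper-case coding sequence."""
    usable = len(cds) - len(cds) % 3
    return Counter(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    degeneracy = codons_per_amino_acid()
    print("random expectation Leu:Ala:Trp =",
          degeneracy["L"], ":", degeneracy["A"], ":", degeneracy["W"])
    print(amino_acid_usage("CTGGCGGCGTGG"))   # toy sequence: Leu, Ala, Ala, Trp
```

A window of sequence whose amino acid ratios sit closer to the coding values than to the random expectation is more likely to be part of a gene.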

  22. Coding sequences characterized by: c) Signals in the promoter region. Translation start: ATG (GTG, CTG). Ribosome Binding Site: GGGAAGG
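
A minimal Python sketch of how these two signals could be combined: it looks for a start codon a short distance downstream of an exact match to the ribosome binding site motif. The spacing window (`min_gap`, `max_gap`) and the exact-match requirement are assumptions made for this example; real binding sites vary and are usually scored rather than matched literally.

```python
import re

START_CODONS = ("ATG", "GTG", "CTG")


def candidate_starts(seq: str, rbs: str = "GGGAAGG", min_gap: int = 4, max_gap: int = 12):
    """Find start codons that follow an exact RBS match.

    Returns a list of (rbs_position, start_position, start_codon). The gap is
    the number of bases between the end of the RBS and the start codon.
    """
    hits = []
    # The lookahead lets overlapping RBS matches be reported as well.
    for match in re.finditer(f"(?={re.escape(rbs)})", seq):
        rbs_end = match.start() + len(rbs)
        for gap in range(min_gap, max_gap + 1):
            pos = rbs_end + gap
            codon = seq[pos:pos + 3]
            if codon in START_CODONS:
                hits.append((match.start(), pos, codon))
    return hits


if __name__ == "__main__":
    toy = "TT" + "GGGAAGG" + "ACTAGT" + "ATGAAA"   # toy sequence, not real data
    print(candidate_starts(toy))
```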

  23. Problems associated with coding sequence recognition: - many small putative CDS (cut-off) - deviations in start site - sequencing errors (frameshifts). [Figure labels: GI_000001, GI_000002]

  24. Strategies to find Coding sequences. In practice, most gene finding programs use HMMs to predict protein encoding genes. • Train on a set of known genes: genes with a good database hit; large genes with no overlap; experimentally identified genes; …

  25. Strategies to find Coding sequences Many different tools available: Glimmer2, GeneMark, EasyGene, FrameD, …… “Protein-coding regions in the genome sequence were identified using a combination of software tools including EasyGene [42], Glimmer [43] and FrameD [44].”

  26. Functional Annotation

  27. What is function? Inventory: - What can it do? - which conversions are catalyzed - which metabolites are transported - relates to physiology - depends on environment - with which component can it interact

  28. The attribute function is ambiguous. Context independent (molecular function or properties; chemistry/physics): - catalyze certain reactions - interact with certain proteins - bind to a specific DNA sequence. Context dependent (role; biology/physiology): - act in a certain pathway - be a member of a certain protein complex(es) - act as a transcription factor

  29. Annotation using a controlled vocabulary (ontologies): Gene Ontology, Biopax. Descriptors of molecular function: enzymatic conversions: EC-number (IUBMB); transport: TC-number (Saier). In library and information science, a controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search.

  30. Genome Sequence and how it relates to function There are several properties of the translated and non-translated genome sequence that are identifiers of the function/role of a protein • Evolutionary conservation of sequence • Operon composition • Regulatory connections • Connections in the cellular network (molecular function) (biological role)

  31. Evolutionary conservation of sequence. Homology as an indicator of functional similarity. Orthologs: supposed identical molecular function. Paralogs: supposed similar molecular function. In-Paralogs: diverged (similar molecular function) homologs. [Figure labels: A1, B1, C1, A2, B2, C2a, C2b]

  32. Evolutionary conservation of sequence Strategy: to transfer annotation from experimentally verified ortholog/equivalent -> identify orthologs/equivalents

  33. Determining evolutionary relations: Retrieving homologs. BLAST: will yield similar sequences from database. Example: map2 of L. plantarum. In a simple case: one good hit per genome.
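
The "one good hit per genome" check can be sketched in Python as below. The sketch assumes BLAST results in the common 12-column tab-separated format (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and a lookup table that says which genome each subject sequence belongs to. All identifiers and numbers in the example are made up for illustration.

```python
def best_hit_per_genome(blast_tabular: str, genome_of: dict) -> dict:
    """Keep the highest-scoring hit for every genome.

    `blast_tabular` holds 12-column tab-separated BLAST lines.
    `genome_of` maps a subject identifier to a genome name (hypothetical table).
    Returns {genome: (subject_id, bit_score)}.
    """
    best = {}
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject, bit_score = fields[1], float(fields[11])
        genome = genome_of.get(subject)
        if genome is None:
            continue  # subject not assigned to any genome in the table
        if genome not in best or bit_score > best[genome][1]:
            best[genome] = (subject, bit_score)
    return best


if __name__ == "__main__":
    # Made-up rows and identifiers, only to show the shape of the input.
    rows = [
        ["query1", "gA_0001", "62.1", "740", "270", "5", "3", "742", "1", "738", "0.0", "910"],
        ["query1", "gA_0002", "35.0", "700", "420", "12", "10", "709", "8", "700", "1e-90", "330"],
        ["query1", "gB_0100", "58.3", "745", "300", "6", "1", "745", "2", "744", "0.0", "860"],
    ]
    table = "\n".join("\t".join(r) for r in rows)
    genomes = {"gA_0001": "genome A", "gA_0002": "genome A", "gB_0100": "genome B"}
    for genome, (subject, score) in best_hit_per_genome(table, genomes).items():
        print(genome, subject, score)
```

When a genome shows several hits of similar quality, as for genome A here if the scores were closer, the simple case no longer applies and a tree is needed to sort out which homolog is the equivalent one.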

  34. Determining evolutionary relations. Procedure: #Collect sequences and make multiple sequence alignment. MUSCLE: muscle -in FASTA.txt -out FASTA.aln

  35. Determining evolutionary relations: Alignments and Trees. #Visualize multiple sequence alignment in CLUSTAL-X and check homogeneity (conserved features, few gaps). #Create bootstrapped NJ-tree (corrected for multiple substitutions)
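
Both checks can be approximated outside a viewer with a short Python sketch: the fraction of gaps per alignment column as a rough measure of homogeneity, and a pairwise distance corrected for multiple substitutions. The slide does not say which correction is meant; the Poisson correction d = -ln(1 - p) used here is an assumption, chosen because it is a simple and common choice for protein alignments. The function names are also assumptions made for this example.

```python
import math


def read_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def gap_fraction_per_column(aligned: dict) -> list:
    """Fraction of sequences that have a gap ('-') in each alignment column."""
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [sum(s[i] == "-" for s in seqs) / len(seqs) for i in range(len(seqs[0]))]


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance between two aligned sequences.

    Columns with a gap in either sequence are ignored; p is the fraction of
    differing residues among the remaining columns.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


if __name__ == "__main__":
    toy = ">a\nMKV-L\n>b\nMRVAL\n"   # toy alignment, not real data
    aln = read_fasta(toy)
    print(gap_fraction_per_column(aln))
    print(round(poisson_distance(aln["a"], aln["b"]), 4))
```

A matrix of such corrected distances is the input a neighbour-joining method needs; bootstrapping then repeats the calculation on resampled alignment columns.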

  36. Determining evolutionary relations: Use tree and gene context to infer orthology/equivalency. Example: Lactobacillus plantarum has 4 maltose phosphorylase homologs. Characterized homologs: kojibiose (Chaen et al. J. appl Glycosci 1999); trehalose (Inoue et al. Biosci. Biotechnol. Biochem 2002); maltose (Huwel et al. Enzyme Microb. Techn. 1997); maltose (Inoue et al. Biosci. Biotechnol. Biochem. 2001). LOFT: R. vd Heijden et al. BMC Bioinformatics

  37. Evolutionary conservation of sequence. Gene order conservation to identify functional equivalents. [Figure: gene neighbourhoods around map2, map3 and map2/3 in Lactobacillus plantarum, Lactobacillus gasseri, Bacillus subtilis, Bacillus licheniformis, Lactobacillus brevis, Pediococcus pentosaceus and Leuconostoc mesenteroides; further gene labels include lacI and PGPH]
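
Gene order conservation can be made concrete with a small Python sketch that compares the genes flanking a homolog in two genomes. It assumes each neighbourhood is given as an ordered list of gene family labels, with the same label for equivalent genes. The labels, the `window` size and the function names are hypothetical and not taken from the figure.

```python
def flanking_families(neighbourhood: list, anchor: str, window: int = 2) -> set:
    """Gene families within `window` positions on either side of `anchor`."""
    i = neighbourhood.index(anchor)
    left = neighbourhood[max(0, i - window):i]
    right = neighbourhood[i + 1:i + 1 + window]
    return set(left) | set(right)


def conserved_context(neigh_a: list, neigh_b: list, anchor: str, window: int = 2) -> set:
    """Gene families that flank `anchor` in both neighbourhoods."""
    return (flanking_families(neigh_a, anchor, window)
            & flanking_families(neigh_b, anchor, window))


if __name__ == "__main__":
    # Hypothetical neighbourhoods, written as ordered gene family labels.
    genome_1 = ["regulator", "transporter", "phosphorylase", "mutase"]
    genome_2 = ["kinase", "transporter", "phosphorylase", "mutase", "permease"]
    print(conserved_context(genome_1, genome_2, "phosphorylase"))
```

Homologs that share flanking gene families across genomes are better candidates for functional equivalents than homologs that sit in unrelated contexts.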

  38. Molecular function versus Biological role. Map2 and Map3: identical molecular function but distinct biological roles

  39. Coffee Break DNA sequencing (assembly) identifying genes Part I protein function prediction function reconstruction modeling biology
