
slide1

GENERAL STUFF

subject: Genome-based Functional Annotation (bacteria)

workload: 14 hrs

- 2 hrs lecture

- 12 hrs assignment

(in 4 parts; so on average 3 hrs per part; not ready yet)

hand in: rtf-file, pdf-file or ppt-file before 8 November

(later -1 point per day)

Christof Francke (Post-Doc/Scientist; TI Food and Nutrition)

slide2

Genome sequence annotation

From DNA to function

Bioinformatics Seminar, Nijmegen 16 10 2007

Christof Francke

(Jos Boekhorst/ Michiel Wels)

slide3

Promised you a miracle

promises, promises

slide4

Answering biological questions

Why does Bacillus anthracis kill humans? (anthrax; Dutch: miltvuur)

B. anthracis

We have the genomes, so now we know............?

slide5

Once the genome is sequenced, what do we know and what can we do?

Inventory:

- predict functionality of encoded proteins

- defects in genes (disease)

- lineage


slide6

The quest for an appropriate translation of sequence to knowledge

DNA

sequencing (assembly)

identifying genes

Part I

protein

function prediction

function

reconstruction / modeling

biology

slide7

Bacterial Genomics in Nijmegen

Biological questions in the interest of Dutch Food Industry

How can we improve the cell as a factory?

- produce compounds

- improve taste

How can we prevent spoilage?

- spores, biofilms, fungi

How can we improve health?

- interaction between bacteria and host (probiotics)

slide9

The organization of genetic information in bacteria

Most Open Reading Frames are preceded by regulatory elements

(cis-acting elements).

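[Figure: a promoter followed by an ORF on the DNA sequence below; an activator (A, +) and a repressor (R, -) act at the promoter; RNA polymerase transcribes the ORF into mRNA]

AACGTTGACTGACGTGTCACGTCCCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCGATAGC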

RNA polymerase binding is affected by regulatory proteins

(trans-acting elements; Activation, Repression).

slide10

The organization of genetic information in bacteria

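[Figure: an operon of Gene 1, Gene 2 and Gene 3, transcribed into a single mRNA with marked translation starts and translated into Protein 1, Protein 2 and Protein 3]

Operon: Gene 1, Gene 2, Gene 3 on one mRNA

Regulon: multiple operons regulated by the same transcription factor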

slide12

Whole genome shotgun sequencing

Fraser et al., Nature 2000, 406: 799-803.

slide13

I) The sequencing and assembly process

Wet lab: raw data production

- 4 x ABI 3700 sequencer

- >1.5 million nucleotides per day

Data transfer

Bio-informatics:

- genome assembly

- automated genome annotation

- in-house database, >5000 BLASTs / day

slide14

Genome assembly

initially there are a lot of gaps

slide15

Methods for mapping contigs

Figure 3 Sources of linking information between contigs. (A) overlaps, (B) clone mates, (C) alignments to reference genome, (D) alignments to physical maps, (E) conservation of gene synteny.
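
Source (A), overlaps between contigs, can be illustrated with a small sketch. It assumes exact suffix/prefix matches and uses hypothetical function names and a hypothetical minimum overlap length; real assemblers tolerate sequencing errors and combine this with the other sources of linking information.

```python
def suffix_prefix_overlap(left, right, min_len=5):
    """Return the length of the longest suffix of `left` that equals a prefix of `right`.

    Only exact matches are considered (an assumption of this sketch).
    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_contigs(left, right, min_len=5):
    """Join two contigs on their overlap, or return None if they do not overlap."""
    k = suffix_prefix_overlap(left, right, min_len)
    return left + right[k:] if k else None
```

With `min_len=3`, the contigs `ACGTTGACT` and `TGACTGGA` share the five bases `TGACT` and merge into `ACGTTGACTGGA`.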

slide16

The first Dutch bacterial genome-sequence

(2003) Proc Natl Acad Sci USA 100,1990

slide17

New technology: 454 sequencing

Advantage: relatively fast, reliable and no sequence preference

Disadvantage: short reads, difficult assembly

Nowadays most sequencing efforts are hybrid

slide18

Identifying genes

AGCGGTGTCGATCGGCGCTATAGCGCATGCGTATAGCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCGATATGCTATAGC

slide19

The identification of Open Reading Frames

AGCGGTGTCGATCGGCGCTATAGCGCATGCGTATAGCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCGATATGCTATAGC

TGTCGATCGGCGCTATAGCGCATGCGTATAGCGTATATCGATGTCGTAGCTGATGGCGCGAAATCGATCGGTCGATATAGCGGCCGGATATCGCATATGCTATAGCACGTTTG

Different visualization: look at possible reading frames

slide20

Coding sequences characterized by:

a) the lack of stop codons
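
A minimal sketch of this criterion: scan all six reading frames and report stretches without a stop codon. The function names and the `min_codons` cut-off are illustrative assumptions, not part of the lecture.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_free_stretches(seq, min_codons=10):
    """Yield (strand, frame, start, end) for runs of codons without a stop.

    Coordinates are 0-based and refer to the strand being read;
    `end` is exclusive. `min_codons` is an arbitrary illustrative cut-off.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - run_start) // 3 >= min_codons:
                        yield strand, frame, run_start, pos
                    run_start = pos + 3  # next run starts after the stop
                pos += 3
            # run that reaches the end of the sequence without a stop
            if (pos - run_start) // 3 >= min_codons:
                yield strand, frame, run_start, pos
```

A stop-free stretch is only a candidate: the other characteristics (codon usage, start signals) are needed to decide whether it is a gene.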

slide21

Characteristics of coding sequences:

b) Codon usage

         Leu : Ala : Trp
random     6 :   4 :   1
coding     7 :   7 :   1

In addition: codon bias!
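
The "random" ratio 6 : 4 : 1 equals the number of codons that encode Leu, Ala and Trp in the standard genetic code. The sketch below derives those numbers and counts codons and amino acids in a given in-frame sequence; the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons_per_amino_acid():
    """Number of codons per amino acid: the expectation for random sequence."""
    return Counter(CODON_TABLE.values())


def codon_usage(cds):
    """Count the codons of an in-frame coding sequence (trailing bases ignored)."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def amino_acid_counts(cds):
    """Count encoded amino acids; codons with unknown bases are counted as 'X'."""
    counts = Counter()
    for codon, n in codon_usage(cds).items():
        counts[CODON_TABLE.get(codon, "X")] += n
    return counts
```

Comparing `amino_acid_counts` over a set of trusted genes with `codons_per_amino_acid` shows the deviation from the random expectation; `codon_usage` on the same set shows the codon bias among synonymous codons.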

slide22

Coding sequences characterized by:

c) Signals in the promoter region

Translation start:

ATG (GTG, CTG)

Ribosome Binding Site:

GGGAAGG
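
These two signals can be combined in a simple scan: look for a start codon that has a ribosome binding site a short distance upstream. The sketch uses the motif from the slide; the allowed spacing (4 to 12 bases) and the single tolerated mismatch are assumptions chosen for illustration.

```python
START_CODONS = ("ATG", "GTG", "CTG")
RBS_MOTIF = "GGGAAGG"  # motif as given on the slide


def mismatches(a, b):
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_starts(seq, motif=RBS_MOTIF, max_mismatch=1, min_gap=4, max_gap=12):
    """Return (start_pos, start_codon, rbs_pos, rbs_seq) for supported start codons.

    A start codon is reported when a motif-like window ends `min_gap` to
    `max_gap` bases upstream of it. Gap range and mismatch tolerance are
    illustrative assumptions, not values from the lecture.
    """
    hits = []
    for pos in range(len(seq) - 2):
        codon = seq[pos:pos + 3]
        if codon not in START_CODONS:
            continue
        for gap in range(min_gap, max_gap + 1):
            rbs_pos = pos - gap - len(motif)
            if rbs_pos < 0:
                break  # larger gaps would start before the sequence
            window = seq[rbs_pos:rbs_pos + len(motif)]
            if mismatches(window, motif) <= max_mismatch:
                hits.append((pos, codon, rbs_pos, window))
                break
    return hits
```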

slide23

Problems associated with coding sequence recognition

[Figure: predicted genes GI_000001 and GI_000002]

Problems:

- many small putative CDS (cut-off)

- deviations in start site

- sequencing errors (frameshifts)

slide24

Strategies to find Coding sequences

In practice, most gene finding programs use HMMs to predict protein encoding genes.

  • Train on a set of known genes:
  • Genes with a good database hit
  • Large genes with no overlap
  • Experimentally identified genes
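
The idea of training on known genes can be illustrated with something much simpler than an HMM: a codon-frequency model scored as log-odds against a uniform background. This is a deliberately simplified stand-in, not the method used by the gene finders named in the lecture; the function names and the pseudocount are assumptions.

```python
import math
from collections import Counter

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codons(seq):
    """Split an in-frame sequence into codons (trailing bases ignored)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_model(training_genes, pseudocount=1.0):
    """Estimate codon probabilities from a trusted set of genes."""
    counts = Counter()
    for gene in training_genes:
        counts.update(codons(gene))
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def log_odds(seq, model):
    """Sum of log(P(codon | coding) / P(codon | uniform)) over the sequence.

    Codons containing characters other than A, C, G, T are skipped.
    """
    background = 1.0 / len(ALL_CODONS)
    return sum(math.log(model[c] / background) for c in codons(seq) if c in model)
```

A candidate ORF that scores well above zero uses codons the way the training genes do; the quality of the training set therefore directly limits the quality of the predictions.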
slide25

Strategies to find Coding sequences

Many different tools available:

Glimmer2, GeneMark, EasyGene, FrameD, ……

“Protein-coding regions in the genome sequence were identified using a combination of software tools including EasyGene [42], Glimmer [43] and FrameD [44].”

slide27

What is function?

Inventory:

- What can it do?

- which conversions are catalyzed

- which metabolites are transported

- relates to physiology

- depends on environment

- with which component can it interact


slide28

The attribute function is ambiguous

context independent (molecular function or properties; chemistry/physics)

- catalyze certain reactions

- interact with certain proteins

- bind to a specific DNA sequence

context dependent (role; biology/physiology)

- act in a certain pathway

- be a member of certain protein complex(es)

- act as a transcription factor

slide29

Annotation using a controlled vocabulary (ontologies)

In library and information science, a controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search.

Descriptors of molecular function

Enzymatic conversions: EC-number (IUPAC)

Transport: TC-number (Saier)

Gene Ontology

BioPAX

slide30

Genome Sequence and how it relates to function

There are several properties of the translated and non-translated genome sequence that are identifiers of the function/role of a protein

  • Evolutionary conservation of sequence
  • Operon composition
  • Regulatory connections
  • Connections in the cellular network

(molecular function)

(biological role)

slide31

Evolutionary conservation of sequence

Homology as an indicator of functional similarity

[Figure: tree of the homologs A1, B1, C1, A2, B2, C2a and C2b]

Orthologs: supposed identical molecular function

Paralogs: supposed similar molecular function

In-Paralogs: diverged (similar molecular function)

slide32

Evolutionary conservation of sequence

Strategy: to transfer annotation from experimentally verified ortholog/equivalent

-> identify orthologs/equivalents

slide33

Determining evolutionary relations:

Retrieving homologs

BLAST will yield similar sequences from the database

Example:

map2 of L. plantarum

In a simple case: one good hit per genome
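
The simple case can be checked mechanically: keep the best hit per genome and flag genomes with several good hits, which point to paralogs and call for a tree. The sketch assumes hits are given as (query, subject, bit score) tuples and that a subject-to-genome mapping is available; all names and the score threshold are hypothetical.

```python
def best_hit_per_genome(hits, genome_of):
    """Keep the highest-scoring subject per genome.

    hits: iterable of (query_id, subject_id, bitscore) tuples.
    genome_of: dict mapping subject_id to a genome name (assumed available).
    """
    best = {}
    for _query, subject, bitscore in hits:
        genome = genome_of[subject]
        if genome not in best or bitscore > best[genome][1]:
            best[genome] = (subject, bitscore)
    return best


def genomes_with_several_hits(hits, genome_of, min_bitscore):
    """Return the genomes with more than one hit at or above `min_bitscore`."""
    counts = {}
    for _query, subject, bitscore in hits:
        if bitscore >= min_bitscore:
            genome = genome_of[subject]
            counts[genome] = counts.get(genome, 0) + 1
    return sorted(g for g, n in counts.items() if n > 1)
```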

slide34

Determining evolutionary relations

Procedure:

#Collect sequences and make multiple sequence alignment

MUSCLE: muscle -in FASTA.txt -out FASTA.aln

slide35

Determining evolutionary relations:

Alignments and Trees

#Visualize multiple sequence alignment in CLUSTAL-X

And check homogeneity (conserved features, few gaps)
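
Besides visual inspection, homogeneity can be summarized numerically. The sketch below reads an aligned FASTA text and reports, per column, the fraction of gaps and the fraction of sequences carrying the most common residue. The function names are illustrative, and it assumes "-" is the only gap character.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (names cut at first space)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def column_stats(alignment):
    """Return a list of (gap_fraction, conservation) tuples, one per column.

    conservation is the share of all sequences that carry the most
    common non-gap residue in that column.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences in an alignment must have equal length")
    stats = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        gap_fraction = 1 - len(residues) / len(column)
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        stats.append((gap_fraction, top / len(column)))
    return stats
```

Columns with a high gap fraction or low conservation across the whole alignment suggest that some of the collected sequences do not belong to the same family.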

#Create bootstrapped NJ-tree (corrected for multiple substitutions)
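
The correction for multiple substitutions converts the observed fraction of differing sites p into an estimated number of substitutions per site. A minimal sketch of two common corrections follows: the Poisson correction d = -ln(1 - p) and Kimura's empirical formula for proteins, d = -ln(1 - p - 0.2 p^2). Which correction a given tree-building tool applies is not stated on the slide, so the choice here is an assumption.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ("-") in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_corrected(p):
    """Poisson correction: d = -ln(1 - p)."""
    if p >= 1:
        raise ValueError("distance undefined for p >= 1")
    return -math.log(1 - p)


def kimura_corrected(p):
    """Kimura's empirical protein correction: d = -ln(1 - p - 0.2 p^2)."""
    inner = 1 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("distance undefined for this p (sequences too divergent)")
    return -math.log(inner)
```

Both corrections leave small distances almost unchanged and stretch large ones, which matters most for the deep branches of the tree.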

slide36

Determining evolutionary relations:

Use tree and gene context to infer orthology/equivalency

Example: Lactobacillus plantarum has 4 maltose phosphorylase homologs

kojibiose (Chaen et al. J. appl Glycosci 1999)

trehalose (Inoue et al. Biosci. Biotechnol. Biochem 2002)

maltose (Huwel et al. Enzyme Microb. Techn. 1997)

maltose (Inoue et al. Biosci. Biotechnol. Biochem. 2001)

LOFT: R. van der Heijden et al., BMC Bioinformatics

slide37

Evolutionary conservation of sequence

Gene order conservation to identify functional equivalents

[Figure: gene context of the map2, map2/3 and map3 homologs in Lactobacillus plantarum, Lactobacillus gasseri, Bacillus subtilis, Bacillus licheniformis, Lactobacillus brevis, Pediococcus pentosaceus and Leuconostoc mesenteroides; neighbouring genes include lacI and PGPH]
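
Gene order conservation can be quantified by comparing the neighbours of two homologs. The sketch assumes each genome is given as an ordered list of gene identifiers and that neighbouring genes have already been assigned to families; the identifiers, family labels and window size are hypothetical.

```python
def neighbour_families(gene_order, gene, family_of, window=2):
    """Families of the genes within `window` positions of `gene`.

    gene_order: list of gene ids in chromosomal order.
    family_of: dict mapping gene id to a family label (assumed precomputed).
    """
    i = gene_order.index(gene)
    flanking = gene_order[max(0, i - window):i] + gene_order[i + 1:i + 1 + window]
    return {family_of[g] for g in flanking if g in family_of}


def shared_context(order_a, gene_a, order_b, gene_b, family_of, window=2):
    """Families found next to both genes; more shared families means more conserved context."""
    return (neighbour_families(order_a, gene_a, family_of, window)
            & neighbour_families(order_b, gene_b, family_of, window))
```

When a query has several homologs in another genome, the homolog with the larger shared context is the more likely functional equivalent.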

slide38

Molecular function versus Biological role

Map2 and Map3 have an identical molecular function

but distinct biological roles

slide39

Coffee Break

DNA

sequencing (assembly)

identifying genes

Part I

protein

function prediction

function

reconstruction / modeling

biology