Protein Function

Bologna Winter School 2007


Basic questions

How do proteins evolve changed or novel functions?

Given the amino acid sequences of proteins inferred from genomic sequences, how can we assign functions to them?



Genomics gives us many new protein sequences

  • Often there is little experimental information about the proteins themselves

  • What can we deduce about proteins from their amino acid sequences?

    … from the amino acid sequence of one protein alone?

    … from comparisons of amino acid sequences of related proteins from different species?


What properties of proteins do we want to learn about and how do we measure and analyse them?

  • amino acid sequence

  • three-dimensional structure

  • FUNCTION

  • expression pattern

  • regulation


Can we learn these properties by studying purified proteins in isolation?

  • amino acid sequence – yes, in principle

  • three-dimensional structure -- certainly

  • FUNCTION -- ??????

  • expression pattern – yes if we had to

  • regulation – probably not


How do we learn these?

  • amino acid sequence – genomic sequences

  • three-dimensional structure – X-ray, NMR, ... modelling

  • FUNCTION – experiment? inference?

  • expression pattern -- microarrays

  • regulation – ChIP-chip experiments


Does knowledge about related proteins help?

  • amino acid sequence – possibly

  • three-dimensional structure – MR (molecular replacement), modelling

  • FUNCTION – YES! BUT, HOW??

  • expression pattern – maybe

  • regulation -- maybe


Function is difficult

  • Sequence determines structure determines function

  • From knowing sequence and structure of one protein alone, can we deduce its function?

    • Identify binding site?

    • Identify catalytic residues?

    • Identify ligand?

  • Analogy to drug-design problem.


Given a protein structure, can we predict function directly?

  • Sometimes… To some extent …

  • What are reasonable goals?

  • Sometimes structure gives general idea, guiding laboratory work to pin it down

  • Some examples from H. influenzae structural genomics project


HI1679

  • α/β-hydrolase fold, putative remote homology to L-2-haloacid dehalogenases

  • Several substrates tried.

  • HI1679 cleaved 6-phosphogluconate, phosphotyrosine


HI1434

  • related to a region in tRNA synthetases.

  • contains putative binding site, likely to bind nucleotide

  • no specific ligand has yet been identified


Nuclear Transport Factor-2

  • Protein known to be involved in trafficking across the nuclear membrane

  • Crystal structure determined

  • Mechanism of function not obvious

  • ???


NTF-2 homologous to scytalone dehydratase

  • Alexei Murzin spotted a similarity of fold between NTF-2 and scytalone dehydratase

  • This structure shows scytalone dehydratase binding an inhibitor


Scytalone dehydratase

Scytalone dehydratase is an enzyme in the pathway for melanin synthesis


NTF-2 superposition


Search for ligands

  • On the basis of the structural similarity, many ligands were designed and tested

  • So far, none has shown any binding or catalytic activity

  • Conclusion: structural similarity is a useful guide to hypotheses about function, but it does not always work …


But many similar proteins have similar functions, don't they?

  • In many cases closely-related proteins have closely-related functions.

  • Example: human and horse haemoglobin

  • 43 residue differences out of 287 positions (α + β chains, 141 + 146 residues)

  • about 85% residue identity

  • SAME FUNCTION


Function assignment from homology?

  • OK, if the sequences differ greatly then the function may differ

  • But if the sequences are similar, the functions will be the same – WON'T THEY?

  • Well, sometimes ...


'Homology modelling' of function?

  • Sequence determines structure determines function

  • Small changes in sequence produce small changes in structure

  • BUT:

    the dependence of function on sequence (and even on structure) does not have a simple ‘topology’



Recruitment

  • In many cases, similar proteins retain similar functions (example: mammalian globins)

  • Distantly-related proteins can retain function or diverge in function

  • But closely-related proteins can have very different functions

  • Even identical proteins can carry out different functions


Avian eye-lens proteins

  • In the duck, crystallins have identical sequences to liver enolase and lactate dehydrogenase

  • They never see the substrates in the eye

  • In other birds, the sequences have changed enough to lose catalytic activity. This shows that enzymatic activity is not necessary in the eye


Proteinase Do = DegP

  • Chaperone at low temperatures

  • Proteinase at high temperatures

  • Logic: moderate stress – try to rescue proteins

  • more extreme stress – give up and recycle


Function annotation in databases

  • Proteins appear in databases when their sequences are known

  • Annotation of function?

    • Experimental evidence for function

    • Transfer of function from homologue

      • How well does this work?

      • How can we tell?

      • Requires measure of distance between functions


Two goals of this kind of work

  • To study how protein function diverges as amino acid sequence diverges

  • To evaluate the accuracy of transfer of annotation among homologous proteins

    Problems associated with goal 2 make goal 1 harder


How do proteins change function as their sequences diverge?

  • Divergence v. recruitment

  • Divergence:

    • Change in specificity (chymotrypsin, trypsin)

    • Change in regulation (myoglobin, haemoglobin)

    • Related functions with similar mechanisms (adaptation of catalytic site) (Gerlt & Babbitt)


Gene duplication and divergence

  • General way to develop new functions

  • Very old theory about how metabolic pathways developed – new protein developed to provide substrate for current initial step:

    • Now growing on B (BCD…ATP)

    • Medium runs out of B.

    • BC enzyme duplicates, diverges to catalyze AB

    • Now you can grow on A (ABCD…ATP)

  • Attractive because:

    • BC enzyme has binding site for B

    • explains gene organization in operon

      WRONG: mechanism of AB in general different from BC, needs different structure, catalytic residues


Derivation of function from coordinates: analysis of sequence and structure

  • Homologous proteins may have diverged in sequence and function (leave aside recruitment)

  • Assume no strong sequence similarity to protein of known function

  • Align sequences

  • Use structure to get better alignments

  • Check for conservation of binding site, catalytic residues


Structure-based function assignment

  • Extract functional residues from structures of known function

  • Residues contributing to function of entire homologous family conserved in whole family

  • Residues contributing to specific function of subfamily conserved only in subfamily
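One simple way to turn the two conservation rules above into a procedure, close in spirit to the Evolutionary Trace idea but not the published method, is sketched below. It assumes a multiple alignment that has already been split into subfamilies; the subfamily names and sequences are hypothetical, and "conserved" here means strictly identical, with a gap never counting as conserved.

```python
from typing import Dict, List


def conserved(column: List[str]) -> bool:
    """True if every sequence has the same residue (a gap never counts as conserved)."""
    return len(set(column)) == 1 and column[0] != "-"


def classify_positions(subfamilies: Dict[str, List[str]]) -> Dict[int, str]:
    """Label alignment columns as 'family' (conserved in every sequence) or
    'subfamily' (conserved within each subfamily, but not across the whole family)."""
    all_seqs = [s for seqs in subfamilies.values() for s in seqs]
    length = len(all_seqs[0])
    if any(len(s) != length for s in all_seqs):
        raise ValueError("all sequences must come from one alignment")
    labels: Dict[int, str] = {}
    for i in range(length):
        if conserved([s[i] for s in all_seqs]):
            labels[i] = "family"
        elif all(conserved([s[i] for s in seqs]) for seqs in subfamilies.values()):
            labels[i] = "subfamily"
    return labels


# Toy alignment; subfamily names and sequences are hypothetical.
toy = {"A": ["HDSG", "HDSA"], "B": ["HESG", "HESC"]}
print(classify_positions(toy))  # {0: 'family', 1: 'subfamily', 2: 'family'}
```

Column 3 gets no label: it varies even within subfamily A, so it is a poor candidate for either kind of functional residue.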


Several groups have applied these ideas

  • Cohen & Lichtarge, ‘Evolutionary Trace Method’ (J. Mol. Biol. 1996)

  • Irving, Whisstock, Lesk (Proteins 2001)

  • Hannenhalli & Russell (J. Mol. Biol. 2000)

  • Sternberg and coworkers (PNAS 2004, Phil. Trans. Roy. Soc. 2006)

  • See also: Automated Function Prediction, ISMB Special Interest Group Meeting, 2005



How to measure distance between functions?

  • For sequences and structures, there are natural measures of divergence

  • Sequence: count identical residues

  • Structures: r.m.s.d. of well-fitting parts

    (Specialists may argue about details, or propose alternatives, but basically the answers aren't too different.)

  • Function: no natural measure of difference
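The two natural measures for sequences and structures can be made concrete in a few lines. This is a minimal sketch, assuming the sequences are already aligned (gaps written as "-") and the structures already superposed with atoms paired one-to-one; it performs neither alignment nor superposition. Dividing by the number of gap-free aligned positions is one convention among several.

```python
import math
from typing import List, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, gap-free positions that carry identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def rmsd(coords_a: List[Tuple[float, float, float]],
         coords_b: List[Tuple[float, float, float]]) -> float:
    """Root-mean-square deviation of already-superposed, paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need equal, non-empty lists of paired coordinates")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


print(percent_identity("MVLSPADKTN", "MVLSAADKTN"))  # 90.0
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```

No comparably simple function can be written for "distance between functions", which is the point of the last bullet.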


Enzyme Commission / EC numbers

  • (EC numbers NOT European Commission)

  • Authorized by International Union of Biochemistry and Commission on Enzyme Nomenclature

  • EC set up by International Union of Biochemistry in 1955.

  • Report in 1961, modified 1964, several supplements since then.

  • Published as book, now available on web


What does EC classify?

  • Enzyme nomenclature

  • Classification of reactions catalysed by enzymes

  • NOT an assignment of functions to proteins – that is a different task

  • (Note that Gene Ontology – another classification scheme – also does not assign functions to proteins)


Enzyme Commission numbers

  • Four-level hierarchy

  • Example: isopentenyl-diphosphate ∆-isomerase EC number 5.3.3.2:

    • 5 = general category (of isomerases)

    • 5.3 = intramolecular isomerases

    • 5.3.3 = enzymes that transpose C=C bonds

    • 5.3.3.2 = specific reaction

  • EC classifies reactions, names enzymes that catalyse reactions, does not name proteins.
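Because the EC number is a strict four-level hierarchy, each level is simply a prefix of the full number. A minimal sketch (the helper name `ec_levels` is hypothetical):

```python
from typing import List


def ec_levels(ec: str) -> List[str]:
    """Split an EC number into its four nested levels, most general first."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    return [".".join(parts[: i + 1]) for i in range(4)]


print(ec_levels("5.3.3.2"))  # ['5', '5.3', '5.3.3', '5.3.3.2']
```

Comparing these prefixes between two enzymes tells how many levels of the classification their reactions share.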


Gene Ontology

  • EC limited to enzymes

  • Gene Ontology consortium produced new, more general classification of protein function

  • Three independent categories:

    • Molecular function (overlaps EC)

    • Biological process

    • Cellular component (subcellular location)

  • GO: not a tree structure, but a directed acyclic graph
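The difference between a tree and a directed acyclic graph matters when collecting the more general terms above a given term: a term may have several parents, so its ancestors form a set reached along several paths. The toy graph below is hypothetical; the term names are chosen for readability and do not reproduce GO's actual structure or IDs.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each term maps to its parent terms.
# Unlike a tree, a term may have more than one parent.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "catalytic activity": ["root"],
    "isomerase activity": ["catalytic activity"],
    "binding": ["root"],
    "metal ion binding": ["binding"],
    "metal-dependent isomerase": ["isomerase activity", "metal ion binding"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following parent links (excluding the term itself)."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        t = stack.pop()
        if t not in seen:        # a term reached by two paths is recorded once
            seen.add(t)
            stack.extend(parents[t])
    return seen


print(sorted(ancestors("metal-dependent isomerase", PARENTS)))
```

In a tree each term would have exactly one path to the root; here "root" is reached along two paths but appears only once in the result.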


Gene Ontology project

  • Initiated by Michael Ashburner (early 1990’s).

  • Has since grown, become de facto standard

  • References:

    • Lewis, S.E. (2004). Gene Ontology: looking backwards and forwards. Genome Biology 6:103.

    • Ashburner, M. (2006). Won for All: How the Drosophila Genome was Sequenced. Cold Spring Harbor Laboratory Press.


What is an ontology?

  • Specification of how to describe a body of knowledge

    • Nomenclature (fixed vocabulary)

    • Rules of syntax of terms

    • Types of relationships among entities:

      • ‘Is a’: for instance: ‘A cat is a mammal.’

      • ‘Part of’: for instance: ‘A tail is part of a cat.’


What is an ontology?

  • Types of relationships among entities:

    • ‘Is a’: for instance: ‘A cat is a mammal.’

    • ‘Part of’: for instance: ‘A tail is part of a cat.’

    • Note that ‘A cat is a mammal. A mammal is an animal’ implies that ‘A cat is an animal’

    • But ‘A tail is part of a cat. A cat is a mammal.’ does NOT imply that a tail is a mammal.
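The two inference rules above can be checked mechanically. The sketch below is a minimal illustration with hypothetical facts stored as (subject, relation, object) triples; it closes only the "is a" relation under transitivity and deliberately derives nothing from "part of".

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def is_a_closure(facts: Set[Fact]) -> Set[Fact]:
    """Add every 'is_a' fact implied by transitivity; other relations are left alone."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        is_a = [(s, o) for s, r, o in closed if r == "is_a"]
        for a, b in is_a:
            for c, d in is_a:
                if b == c and (a, "is_a", d) not in closed:
                    closed.add((a, "is_a", d))
                    changed = True
    return closed


facts = {
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
    ("tail", "part_of", "cat"),
}
inferred = is_a_closure(facts)
print(("cat", "is_a", "animal") in inferred)   # True
print(("tail", "is_a", "mammal") in inferred)  # False
```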


GO classification of isopentenyl-diphosphate ∆-isomerase


Several groups have measured the relationship between sequence divergence and functional divergence using the EC classification

  • Example: Todd, Orengo & Thornton, JMB 2001

  • For enzymes with sequence identity > 40%, all four levels of the EC number are conserved

  • For sequence identity > 30%, the first three levels of the EC number are conserved for 70% of pairs

  • How can this work be extended to GO classification?
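A sketch of how such a tabulation could be organised, assuming one already has pairs of enzymes with their percent sequence identity and EC numbers. The bin edges, helper names and toy pairs are illustrative assumptions; the output is not a result from Todd, Orengo & Thornton.

```python
import bisect
from typing import Dict, List, Tuple

EDGES = (30.0, 40.0)                    # identity thresholds (percent), assumed
LABELS = ("<30%", "30-40%", ">=40%")    # one label per bin


def shared_ec_levels(ec1: str, ec2: str) -> int:
    """Number of leading EC levels (0-4) that two EC numbers have in common."""
    n = 0
    for a, b in zip(ec1.split("."), ec2.split(".")):
        if a != b:
            break
        n += 1
    return n


def fraction_conserving(pairs: List[Tuple[float, str, str]], levels: int) -> Dict[str, float]:
    """For each identity bin, the fraction of pairs sharing at least `levels` EC levels."""
    counts = {label: [0, 0] for label in LABELS}  # label -> [conserving, total]
    for identity, ec1, ec2 in pairs:
        label = LABELS[bisect.bisect_right(EDGES, identity)]
        counts[label][1] += 1
        if shared_ec_levels(ec1, ec2) >= levels:
            counts[label][0] += 1
    return {label: c / t for label, (c, t) in counts.items() if t}


# Hypothetical (identity %, EC, EC) pairs, not measured data.
toy_pairs = [
    (85.0, "5.3.3.2", "5.3.3.2"),
    (45.0, "1.1.1.27", "1.1.1.37"),
    (35.0, "3.1.3.1", "3.1.3.2"),
    (32.0, "3.1.3.1", "3.1.4.1"),
    (20.0, "2.7.1.71", "2.7.1.39"),
]
print(fraction_conserving(toy_pairs, levels=3))
```

Extending this to GO is exactly where it breaks down: `shared_ec_levels` relies on a strict hierarchy, which GO does not have.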



How to define a metric on functions?


Distal GO-IDs


How to measure distance between SETS of GO-IDs?
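The slides do not state which measure was used. One possible choice, offered here only as an assumption, is to expand each set of GO terms with all of its ancestors and take the Jaccard distance between the expanded sets, so that two specific terms sharing a general parent come out closer than two terms meeting only at the root. The toy graph and term names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy DAG (child -> parents); names are illustrative, not real GO IDs.
PARENTS: Dict[str, List[str]] = {
    "F": [], "F1": ["F"], "F1a": ["F1"], "F1b": ["F1"], "F2": ["F"], "F2a": ["F2"],
}


def closure(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """The terms themselves plus all of their ancestors in the DAG."""
    result: Set[str] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(parents[t])
    return result


def go_set_distance(a: Iterable[str], b: Iterable[str],
                    parents: Dict[str, List[str]]) -> float:
    """Jaccard distance between ancestor-closed sets: 0 = identical, 1 = disjoint."""
    ca, cb = closure(a, parents), closure(b, parents)
    union = ca | cb
    if not union:
        return 0.0
    return 1.0 - len(ca & cb) / len(union)


print(go_set_distance({"F1a"}, {"F1b"}, PARENTS))  # siblings under F1
print(go_set_distance({"F1a"}, {"F2a"}, PARENTS))  # meet only at the root
```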


Dependence of function divergence on sequence divergence: the EF-hand family

[Figure: fraction of pairs vs. GO distance]


GO: Sources of annotation

  • GO categories of sources of annotation:

    IDA: Inferred from direct assay

    TAS: Traceable author statement

    IMP: Inferred from mutant phenotype

    IGI: Inferred from genetic interaction

    IPI: Inferred from physical interaction

    ISS: Inferred from sequence similarity

    IEA: Inferred from electronic annotation

    NAS: Non-traceable author statement


Sources of annotation: experiment / inferred

From: Thomas, P.D., Mi, H. & Lewis, S. (2007). Curr. Opin. Chem. Biol. 11, 4-11.


To study accuracy of annotation transfer, use experimental annotation only?

  • Obviously.

  • But there are problems.

  • Many fewer data

  • Inconsistencies

  • Sometimes annotation correct, but source of annotation incorrect
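Restricting to experimental annotation amounts to filtering on the evidence code. In this minimal sketch the set of codes treated as experimental is an assumption; it also includes EXP and IEP, GO experimental codes not listed on the earlier slide. Proteins and terms are hypothetical.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    protein: str
    term: str
    evidence: str  # GO evidence code, e.g. "IDA" or "IEA"


# Assumption: these codes count as experimental. Author statements (TAS, NAS)
# and similarity-based or electronic codes (ISS, IEA) are excluded.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def experimental_only(annotations: List[Annotation]) -> List[Annotation]:
    """Keep only annotations backed by an experimental evidence code."""
    return [a for a in annotations if a.evidence in EXPERIMENTAL]


# Hypothetical proteins and terms.
toy = [
    Annotation("P1", "T1", "IDA"),
    Annotation("P1", "T2", "IEA"),
    Annotation("P2", "T1", "ISS"),
    Annotation("P2", "T3", "IMP"),
    Annotation("P3", "T1", "TAS"),
]
kept = experimental_only(toy)
print(f"{len(kept)} of {len(toy)} annotations kept")  # 2 of 5 annotations kept
```

Even in this toy case the filter discards most of the data, which is the "many fewer data" problem; it also cannot catch an annotation whose evidence code is itself wrong.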


Conclusions

  • It is possible to define statistical distribution describing relationship between divergence of sequence and divergence of function

  • General rule: as sequences diverge, function diverges. But exceptions exist

  • Threshold at about 50% sequence identity, below which function starts to diverge more radically

  • Databases contain many errors and gaps; annotation is still a human, labour-intensive activity


Errors in databases

1. Keep them out – But how?

2. Natural language processing by computer? (Automatic: literature → database)???

3. If you find them, correct them (you = WHO?)

4. Correct them where?

  • Master copy of database?

  • What about copies? Errors propagate?

  • How to propagate corrections?


Correction of errors in databases?

  • Eternal vigilance at each installation?????

  • Community involvement – curation by experts?

  • Open source idea – bulletin board?

  • ‘Knowbots’ running around web? Security?

  • Distribute programs for ‘health checks’?


Inconsistencies

  • Different databases use different versions of GO

  • Different versions of different databases

  • Downloaded versions of different databases may not be updated to reflect changes in parent databases

  • What can be done?


Distributed updating of databases

Park, Park & Kim (2004). Bioinformatics Appl. Note.

  • Gene Ontology classification provides basis for database annotations

  • Updates to GO include:

    • new terms

    • new obsoletions

    • term name changes

    • new definitions

    • new term merges

    • term movements

  • Require updating of annotations
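A minimal "health check" in the spirit of the GOChase web tools might look like the sketch below. It is not GOChase's code: the history tables and IDs are hypothetical placeholders. In GO's OBO release files, merged terms survive as `alt_id` entries on the primary term, and obsolete terms may carry `replaced_by` or `consider` tags; that is the kind of information such tables would be built from.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpts of ontology history (IDs are placeholders, not real GO IDs).
MERGED_INTO: Dict[str, str] = {"T:0003": "T:0001"}   # secondary ID -> primary ID
OBSOLETE: Dict[str, Optional[str]] = {               # obsolete ID -> suggested replacement
    "T:0005": "T:0002",
    "T:0007": None,                                   # no replacement: needs a curator
}


def check_term(term: str) -> Tuple[str, Optional[str]]:
    """Return (status, suggested_term) for one annotated term."""
    if term in MERGED_INTO:
        return "merged", MERGED_INTO[term]
    if term in OBSOLETE:
        replacement = OBSOLETE[term]
        if replacement is None:
            return "obsolete-no-replacement", None
        return "obsolete", replacement
    return "ok", term


def health_check(annotated_terms: List[str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Flag every annotation whose term is no longer current; suggest, do not edit."""
    report = {}
    for term in annotated_terms:
        status, suggestion = check_term(term)
        if status != "ok":
            report[term] = (status, suggestion)
    return report


print(health_check(["T:0001", "T:0003", "T:0005", "T:0007"]))
```

The function only reports suggestions, matching the slide's point that changes to local files are left to the database owner.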


GOChase (Park, Park & Kim)

  • Recommend updates (security considerations require local file changes)

  • Web-based interfaces:

    • GOChase-History: evolution of GO ID

    • GOChase-Correct: suggests change

    • Health check of your database: flag problems

    • Submit GO ID: report its use in annotation in a list of common databases

      http://www.strubi.org/software/GOChase/



What are we looking for?

  • We might try to identify proteins that have similar functions in same or different species

    • Human and horse haemoglobin

    • We may be able to find these if they are homologues

  • We might try to identify proteins that have coordinated functions in same or different species

    • Two or more proteins in same metabolic pathway, or part of same macromolecular complex

    • These may in general NOT be homologues


Various clues that proteins have coordinated activities

  • Linked on genome? (Best for bacteria, not for archaea; occasionally for eukaryotes)

  • Appear as separate (monomeric) proteins in one species, and as single multidomain protein in other species

  • Often separate proteins in prokaryotes are fused in eukaryotes (but some examples of opposite are known)
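The fusion clue can be sketched as follows: if two proteins that are separate in one genome have homologues that appear as domains of a single protein in another genome, flag the pair as possibly functionally linked (an approach often called the "Rosetta Stone" method). The sketch assumes domain families have already been assigned by homology; all protein and family names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def fusion_links(separate: Dict[str, str],
                 fused: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Pairs of single-domain proteins (genome X) whose domain families
    occur together in one multidomain protein (genome Y)."""
    by_family: Dict[str, List[str]] = {}
    for protein, family in separate.items():
        by_family.setdefault(family, []).append(protein)
    links: Set[Tuple[str, str]] = set()
    for domains in fused.values():
        for fam_a, fam_b in combinations(sorted(set(domains)), 2):
            for pa in by_family.get(fam_a, []):
                for pb in by_family.get(fam_b, []):
                    if pa != pb:
                        links.add(tuple(sorted((pa, pb))))
    return links


# Genome X: three separate proteins; genome Y: one protein fusing famA and famB.
separate = {"x1": "famA", "x2": "famB", "x3": "famC"}
fused = {"y1": ["famA", "famB"]}
print(fusion_links(separate, fused))  # {('x1', 'x2')}
```

The fused and separate roles can come from either kind of organism, which covers the "opposite" examples mentioned above.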



Shikimate kinase in Methanococcus jannaschii

  • In E. coli, shikimate kinase is an enzyme in the pathway of synthesis of chorismate from erythrose-4-phosphate

  • chorismate is a branch compound for the synthesis of aromatic amino acids

  • The tryptophan biosynthesis pathway is one of the best worked-out in E. coli, in terms of enzymology and regulation


Pathway of synthesis of shikimate from erythrose-4-P in E. coli

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.


Cross-table of metabolic steps and genes

  • Match up known genes and known metabolic steps

  • No recognized protein for metabolic step?

    • Maybe metabolic step is missing from that organism

  • No recognized function for some gene?

  • Maybe we can match up a missing function with a gene that lacks a function assignment
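The cross-table bookkeeping is simple set arithmetic. A minimal sketch, assuming each gene is assigned at most one step and an empty string marks a gene without a function; the gene names are hypothetical, and the step names are shikimate-pathway enzymes used only as labels.

```python
from typing import Dict, Set, Tuple


def cross_table(steps: Set[str], gene_to_step: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (steps with no recognized gene, genes with no recognized function).
    An empty string in gene_to_step means 'no function assigned'."""
    covered = {step for step in gene_to_step.values() if step}
    missing_steps = steps - covered
    orphan_genes = {gene for gene, step in gene_to_step.items() if not step}
    return missing_steps, orphan_genes


steps = {"shikimate dehydrogenase", "shikimate kinase", "EPSP synthase"}
genes = {"gene_01": "shikimate dehydrogenase", "gene_02": "EPSP synthase", "gene_03": ""}
missing, orphans = cross_table(steps, genes)
print(missing, orphans)  # {'shikimate kinase'} {'gene_03'}
```

The output only proposes candidates to match up (here gene_03 for the missing shikimate kinase step); other evidence is still needed to confirm the assignment.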


Matching gene with function

  • Check for homologues

    • Maybe find several

    • Maybe find none

  • Look in genome for operons containing succession of genes for steps in pathway

  • Usually works in bacteria

  • Less common in archaea


Aromatic amino acid biosynthesis

R. Boyer


E. coli trp operon

Note collinearity of genes with order of reactions in pathway

From: Garrett, R.H. & Grisham, C.M. (1999) Biochemistry. 2nd ed. (Thomson Higher Education, Belmont, CA)


Shikimate kinase in Methanococcus jannaschii

  • In M. jannaschii, the steps of the shikimate pathway are NOT catalysed by enzymes whose genes are consecutive in the genome in an operon

  • Sequence similarity identified most enzymes but not shikimate kinase

  • In another archaeon, A. pernix, the genes in this pathway ARE collinear.

  • From this it was possible to identify the A. pernix shikimate kinase, and from that the M. jannaschii homologue. Reference: Daugherty et al., J. Bacteriology (2001). 183, 292–300.


Mapping of genes in shikimate synthesis pathway in several prokaryotic genomes

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.


Mapping of genes for shikimate synthesis in several prokaryotic genomes

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.


From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.


Why didn’t homology search work?

  • Archaeal shikimate kinase is NOT related to bacterial or eukaryotic shikimate kinases.

  • It is distantly related to homoserine kinases of the GHMP kinase superfamily.

  • M. jannaschii homoserine kinase IS identifiable by homology

  • The two enzymes are substrate-specific


Phylogenetic profiles

  • Clues to function from genes shared among different organisms

  • Different groups of organisms need different sets of genes

  • For instance, some bacteria have flagella

  • Genes found in bacteria that have flagella, but not in other bacteria or other groups of organisms: likely involved in flagellar function


Phylogenetic profiles

  • Developed by Marcotte, Eisenberg et al. (PNAS 96, 4285-4288, 1999, and elsewhere)

  • Tabulate homologues of E. coli proteins in 16 other genomes

  • (Note: assume homologues share function – this is input to method, not result)

  • Table: column = organism, row = gene

  • Put a mark in the cell if the organism has the gene



Phylogenetic profile

  • Pattern of row = barcode of which organisms a gene occurs in

  • Result: Genes that share patterns are ‘functionally linked’

  • Functionally linked = participate in some coordinated way in some structure or process

  • Note: proteins can be functionally linked even if they are not homologous
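A minimal sketch of building profiles and comparing them. Here "functionally linked" is approximated by profiles that differ in at most `max_diff` genomes (Hamming distance); that threshold rule is an assumption for illustration, and the genomes and genes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def profile(gene: str, genomes: Dict[str, Set[str]], order: List[str]) -> str:
    """Barcode of presence ('1') or absence ('0') of the gene in each genome."""
    return "".join("1" if gene in genomes[g] else "0" for g in order)


def hamming(p: str, q: str) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_genes(genes: List[str], genomes: Dict[str, Set[str]],
                 order: List[str], max_diff: int = 0) -> List[Tuple[str, str]]:
    """Pairs of genes whose profiles differ in at most `max_diff` genomes."""
    profiles = {g: profile(g, genomes, order) for g in genes}
    names = sorted(profiles)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if hamming(profiles[a], profiles[b]) <= max_diff]


# Hypothetical genomes, each listed with the gene families it contains.
order = ["org1", "org2", "org3", "org4"]
genomes = {
    "org1": {"geneA", "geneB", "geneC"},
    "org2": {"geneA", "geneB", "geneC"},
    "org3": {"geneC"},
    "org4": {"geneC", "geneD"},
}
print(linked_genes(["geneA", "geneB", "geneC", "geneD"], genomes, order))
# [('geneA', 'geneB')]
```

geneA and geneB share the barcode 1100 without any requirement that they be homologous, which is the point of the last bullet.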


Example: ribosomal proteins

  • Homologues of E. coli protein RL7 are found in 10 bacterial genomes and yeast, but not in archaea

  • Those that match phylogenetic profile have functions associated with ribosome

  • Sets of ribosomal proteins have been pulled out on the basis of phylogenetic profile

  • Linked proteins need not be homologues nor be localized in genome


Combine phylogenetic profiling with matching ‘orphans’

  • Create metabolic network for an organism

  • Assign functions by homology when possible

  • Missing enzymes in pathway?

  • Genes that lack assignment?

  • Try to match these up (recall archaeal shikimate kinase)

  • Phylogenetic profiles can assist in this



Phylogenetic profiles / orphan assignment

Chen & Vitkup (2006). Genome Biol. 7, R17

  • Phylogenetic profiles can link proteins in a metabolic pathway

  • Moreover, a better fit of profiles implies closer proximity in the metabolic network

  • Test, using yeast:

    • remove gene from network

    • try to recover it from pool of ~6000 genes

    • results: 22.8% top prediction correct

      (37.3% correct answer in top 10)
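The leave-one-out test can be sketched as ranking every candidate gene by how well its profile matches the profiles of the removed gene's network neighbours, then asking whether the true gene lands in the top k. The similarity score (fraction of genomes in which two 0/1 profiles agree) and all names are assumptions; this is not the scoring used by Chen & Vitkup.

```python
from typing import Dict, List


def similarity(p: str, q: str) -> float:
    """Fraction of genomes in which two 0/1 profiles agree (a simple stand-in score)."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def rank_candidates(neighbour_profiles: List[str], candidates: Dict[str, str]) -> List[str]:
    """Order candidate genes by mean similarity to the network neighbours' profiles."""
    def score(gene: str) -> float:
        return sum(similarity(candidates[gene], n) for n in neighbour_profiles) / len(neighbour_profiles)
    return sorted(candidates, key=lambda g: (-score(g), g))  # ties broken by name


def hit_at_k(true_gene: str, ranking: List[str], k: int) -> bool:
    """Did the removed gene come back within the top k predictions?"""
    return true_gene in ranking[:k]


# Hypothetical: gX was removed from the network; its neighbours have these profiles.
neighbours = ["11001", "11011"]
pool = {"gX": "11001", "gY": "00110", "gZ": "11111"}
ranking = rank_candidates(neighbours, pool)
print(ranking, hit_at_k("gX", ranking, 1))  # ['gX', 'gZ', 'gY'] True
```

Repeating this for every gene in turn and counting hits at k = 1 and k = 10 gives figures of the kind quoted above.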


Conclusion

  • Inferring protein function from knowledge of the function of a close relative is like solving a clue in an American crossword puzzle. Finding the precise word may be difficult, but the task is in principle straightforward

  • Inferring function a priori from structure is like solving a British crossword puzzle. Which clues are real? Which clues are misleading?


State of the art in function assignment

  • We have a ‘bag of tricks’ – that is, many methods, all of which work sometimes and fail sometimes.

  • In some cases, no method works: we must go back to the lab and work it out.

  • We do not have a unified framework or a systematic approach to function assignment

