Intracellular Networks

Presentation Transcript
slide1
CENTER FOR INTEGRATIVE BIOINFORMATICS VU

Intracellular Networks

(2) Intracellular Network Behaviour

Protein Function Prediction

networks
Networks

"The thousands of components of a living cell are dynamically interconnected, so that the cell’s functional properties are ultimately encoded into a complex intracellular web [network] of molecular interactions."

"This is perhaps most evident with cellular metabolism, a fully connected biochemical network in which hundreds of metabolic substrates are densely integrated through biochemical reactions." (Ravasz E, et al.)

slide4
TF

Ribosomal proteins

slide6
Network Evolution

Human

Yeast

This pathway diagram compares pathways in (left) Homo sapiens (human) and (right) Saccharomyces cerevisiae (baker’s yeast). Changes have occurred both in the controlling enzymes (square boxes in red) and in the pathway itself: yeast has one altered (‘overtaking’) path in the graph.

the citric acid cycle
The citric-acid cycle

http://en.wikipedia.org/wiki/Krebs_cycle

the citric acid cycle8
The citric-acid cycle

Fig. 1. (a) A graphical representation of the reactions of the citric-acid cycle (CAC), including the connections with pyruvate and phosphoenolpyruvate, and the glyoxylate shunt. When there are two enzymes that are not homologous to each other but that catalyse the same reaction (non-homologous gene displacement), one is marked with a solid line and the other with a dashed line. The oxidative direction is clockwise. The enzymes with their EC numbers are as follows: 1, citrate synthase (4.1.3.7); 2, aconitase (4.2.1.3); 3, isocitrate dehydrogenase (1.1.1.42); 4, 2-ketoglutarate dehydrogenase (solid line; 1.2.4.2 and 2.3.1.61) and 2-ketoglutarate ferredoxin oxidoreductase (dashed line; 1.2.7.3); 5, succinyl-CoA synthetase (solid line; 6.2.1.5) or succinyl-CoA–acetoacetate-CoA transferase (dashed line; 2.8.3.5); 6, succinate dehydrogenase or fumarate reductase (1.3.99.1); 7, fumarase (4.2.1.2) class I (dashed line) and class II (solid line); 8, bacterial-type malate dehydrogenase (solid line) or archaeal-type malate dehydrogenase (dashed line) (1.1.1.37); 9, isocitrate lyase (4.1.3.1); 10, malate synthase (4.1.3.2); 11, phosphoenolpyruvate carboxykinase (4.1.1.49) or phosphoenolpyruvate carboxylase (4.1.1.32); 12, malic enzyme (1.1.1.40 or 1.1.1.38); 13, pyruvate carboxylase or oxaloacetate decarboxylase (6.4.1.1); 14, pyruvate dehydrogenase (solid line; 1.2.4.1 and 2.3.1.12) and pyruvate ferredoxin oxidoreductase (dashed line; 1.2.7.1).

M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol. 7, 281-291 (1999)

the citric acid cycle9
The citric-acid cycle

(b) Individual species might not have a complete CAC. This diagram shows the genes for the CAC for each unicellular species for which a genome sequence has been published, together with the phylogeny of the species. The distance-based phylogeny was constructed using the fraction of genes shared between genomes as a similarity criterion. The major kingdoms of life are indicated in red (Archaea), blue (Bacteria) and yellow (Eukarya). Question marks represent reactions for which there is biochemical evidence in the species itself or in a related species but for which no genes could be found. Genes that lie in a single operon are shown in the same color. Genes were assumed to be located in a single operon when they were transcribed in the same direction and the stretches of non-coding DNA separating them were less than 50 nucleotides in length.

M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol. 7, 281-291 (1999)

slide12
Small-world networks

A recent paper, Collective dynamics of "small-world" networks, by Duncan J. Watts and Steven H. Strogatz, which appeared in Nature volume 393, pp. 440-442 (4 June 1998), has attracted considerable attention.

One can consider two extremes of networks:

The first are regular networks, where "nearby" nodes have large numbers of interconnections, but "distant" nodes have few.

The second are random networks, where the nodes are connected at random.

Regular networks are highly clustered, i.e., there is a high density of connections between nearby nodes, but have long path lengths, i.e., to go from one distant node to another one must pass through many intermediate nodes.

Random networks are highly un-clustered but have short path lengths. This is because the randomness makes it less likely that nearby nodes will have lots of connections, but introduces more links that connect one part of the network to another.

regular and random networks
Regular and random networks

random

regular

regular complete

regular small world and random networks rewiring experiments watts and strogatz 1998
Regular, small-world and random networks: rewiring experiments (Watts and Strogatz, 1998)

p is the probability that a randomly chosen connection will be randomly redirected elsewhere (i.e., p = 0 means nothing is changed, leaving the network regular; p = 1 means every connection is changed and randomly reconnected, yielding complete randomness).

For example, for p = 0.01 (so that only 1% of the edges in the graph have been randomly changed), the "clustering coefficient" is over 95% of what it would be for a regular graph, but the "characteristic path length" is less than 20% of what it would be for a regular graph.
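
The rewiring procedure can be sketched in a few lines of Python. This is a minimal illustration rather than Watts and Strogatz's own code: the function names `ring_lattice` and `rewire`, the graph size and the random seed are all assumptions made for the example.

```python
import random

def ring_lattice(n, k):
    """Regular ring: every node is linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, rng):
    """Return a copy in which each original link is, with probability p,
    detached at one end and reattached to a randomly chosen node."""
    n = len(adj)
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    edges = [(i, j) for i in adj for j in adj[i] if i < j]
    for i, j in edges:
        if rng.random() < p:
            # avoid self-links and duplicate links
            candidates = [t for t in range(n) if t != i and t not in new[i]]
            if not candidates:
                continue
            t = rng.choice(candidates)
            new[i].discard(j)
            new[j].discard(i)
            new[i].add(t)
            new[t].add(i)
    return new

rng = random.Random(42)            # fixed seed, chosen arbitrarily
regular = ring_lattice(20, 4)      # p = 0: the regular extreme
small_world = rewire(regular, 0.1, rng)
fully_random = rewire(regular, 1.0, rng)
```

With p = 0 the lattice comes back unchanged, and with p = 1 every link has been redirected, which corresponds to the two extremes described above. The number of links stays the same for every p.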

small world and networks
Small-world and networks

A small-world network can be generated from a regular one by randomly disconnecting a few points and randomly reconnecting them elsewhere.

Another way to think of a small world network is that some so-called 'shortcut' links are added to a regular network as shown here:

The added links are shortcuts because they allow travel from node (a) to node (b) in only 3 steps, instead of the 5 needed without the shortcuts.

small world networks
Small-world networks
  • Network characterisation:
  • L = characteristic path length
  • C = clustering coefficient
      • A small-world network is much more highly clustered than an equally sparse random graph (C >> Crandom), and its characteristic path length L is close to the theoretical minimum shown by a random graph (L ~ Lrandom).
        • The reason a graph can have small L despite being highly clustered is that a few nodes connecting distant clusters are sufficient to lower L.
        • Because C changes little as small-worldliness develops, it follows that small-worldliness is a global graph property that cannot be found by studying local graph properties.
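
The two quantities L and C can be computed directly from an adjacency list. The sketch below is an illustration under stated assumptions: the function names are invented here, C is taken as the average over nodes of the fraction of neighbour pairs that are themselves linked, and L as the mean shortest-path length over all connected pairs.

```python
from collections import deque

def ring_lattice(n, k):
    """Regular ring: every node is linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def clustering_coefficient(adj):
    """Average, over nodes, of (links among neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all connected pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

regular = ring_lattice(20, 4)
C_regular = clustering_coefficient(regular)
L_regular = characteristic_path_length(regular)

# Add a single shortcut between two distant nodes.
shortcut = {i: set(nbrs) for i, nbrs in regular.items()}
shortcut[0].add(10)
shortcut[10].add(0)
C_shortcut = clustering_coefficient(shortcut)
L_shortcut = characteristic_path_length(shortcut)
```

Comparing the two pairs of values shows the effect named in the bullets: one shortcut lowers L for the whole graph while C, a local quantity, changes only at the two nodes that received the new link.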
small world networks17
Small-world networks

A network or order (0

These can be calculated from mathematical simulations and yield the following behavior (Watts and Strogatz):

slide19
Small-world networks

Part of the reason for the interest in the results of Watts and Strogatz is that small-world networks seem to be good models for a wide variety of physical situations. They showed that the power grid for the western U.S. (nodes are power stations, and there is an edge joining two nodes if the power stations are joined by high-voltage transmission lines), the neural network of a nematode worm (nodes are neurons and there is an edge joining two nodes if the neurons are joined by a synapse or gap junction), and the Internet Movie Database (nodes are actors and there is an edge joining two nodes if the actors have appeared in the same movie) all have the characteristics (high clustering coefficient but low characteristic path length) of small-world networks.

Intuitively, one can see why small-world networks might provide a good model for a number of situations. For example, people tend to form tight clusters of friends and colleagues (a regular network), but then one person might move from New York to Los Angeles, say, introducing a random edge. The results of Watts and Strogatz then provide an explanation for the empirically observed phenomenon that there often seem to be surprisingly short connections between unrelated people (e.g., you meet a complete stranger on an airplane and soon discover that your sister's best friend went to college with his boss's wife).

small world example metabolism
Small world example: metabolism.
  • Wagner and Fell (2001) modeled the known reactions of 287 substrates that represent the central routes of energy metabolism and small-molecule building block synthesis in E. coli. This included metabolic sub-pathways such as:
  • glycolysis
  • pentose phosphate and Entner-Doudoroff pathways
  • glycogen metabolism
  • acetate production
  • glyoxylate and anaplerotic reactions
  • tricarboxylic acid cycle
  • oxidative phosphorylation
  • amino acid and polyamine biosynthesis
  • nucleotide and nucleoside biosynthesis
  • folate synthesis and 1-carbon metabolism
  • glycerol 3-phosphate and membrane lipids
  • riboflavin
  • coenzyme A
  • NAD(P)
  • porphyrins, haem and sirohaem
  • lipopolysaccharides and murein
  • pyrophosphate metabolism
  • transport reactions
  • glycerol 3-phosphate production
  • isoprenoid biosynthesis and quinone biosynthesis
  • These sub-pathways form a network because some compounds are part of more than one pathway and because most of them include common components such as ATP and NADP.
  • The graphs on the left show that, whether one considers reactants or substrates, the clustering coefficient C >> Crandom and the characteristic path length L is close to Lrandom, the characteristics of a small-world system.

random

Wagner A, Fell D (2001) The small world inside large metabolic networks. Proc. R. Soc. London Ser. B 268, 1803-1810.

slide21
Scale-free Networks

Using a Web crawler, physicist Albert-Laszlo Barabasi and his colleagues at the University of Notre Dame in Indiana in 1998 mapped the connectedness of the Web. They were surprised to find that the structure of the Web didn't conform to the then-accepted model of random connectivity. Instead, their experiment yielded a connectivity map that they christened "scale-free."

  • Often small-world networks are also scale-free.
  • In a scale-free network the characteristic clustering is maintained even as the networks themselves grow arbitrarily large.
slide22
Scale-free Networks
  • In any real network some nodes are more highly connected than others.
    • P(k) is the proportion of nodes that have k-links.
    • For large, random graphs only a few nodes have a very small k and only a few have a very large k, leading to a bell-shaped Poisson distribution.

Scale-free networks fall off more slowly and are more highly skewed than random ones due to the combination of small-world local highly connected neighborhoods and more 'shortcuts' than would be expected by chance.

Scale-free networks are governed by a power law of the form:

P(k) ~ k^{-γ}

slide23
Scale-free Networks

Because of the P(k) ~ k^{-γ} power-law relationship, a log-log plot of P(k) versus k gives a straight line of slope -γ.

Some networks, especially small-world networks of modest size, do not follow a power law but have an exponential degree distribution. This point can be significant when trying to understand the rules that underlie the network.
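
The straight-line relationship on a log-log plot can be checked numerically. The sketch below is an illustration only: the function names are invented here, the exponent 2.5 is an arbitrary choice, and an ordinary least-squares fit on log-log values is a crude estimator that serves for demonstration rather than for careful analysis of real data.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """P(k): the proportion of nodes that have exactly k links."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in counts.items()}

def loglog_slope(pk):
    """Least-squares slope of log P(k) against log k."""
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return sxy / sxx

# An idealised power law with an arbitrarily chosen exponent of 2.5.
exponent = 2.5
ideal = {k: k ** -exponent for k in range(1, 101)}
slope = loglog_slope(ideal)   # recovers minus the exponent
```

For an exponential distribution the same fit gives points that curve away from a straight line, which is one simple way to tell the two cases apart.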

slide24
Comparing Random and Scale-Free Distribution

In the random network (right), the five nodes with the most links (in red) are connected to only 27% of all nodes (green). In the scale-free network (left), the five most connected nodes (red), often called hubs, are connected to 60% of all nodes (green).
slide25
Scale-free Networks

Before discovering scale-free networks, Barabasi and his team had been doing work that modeled surfaces in terms of fractals, which are also scale-free.

Their discoveries about networks have been found to have implications well beyond the Internet; the notion of scale-free networks has turned the study of a number of fields upside down. Scale-free networks have been used to explain behaviors as diverse as those of power grids, the stock market and cancerous cells, as well as the dispersal of sexually transmitted diseases.

slide26
Scale-free Networks

Put simply, the nodes of a scale-free network aren't randomly or evenly connected. Scale-free networks include many "very connected" nodes, hubs of connectivity that shape the way the network operates. The ratio of very connected nodes to the number of nodes in the rest of the network remains constant as the network changes in size.

In contrast, random connectivity distributions—the kinds of models used to study networks like the Internet before Barabasi and his team made their observation—predicted that there would be no well-connected nodes, or that there would be so few that they would be statistically insignificant. Although not all nodes in that kind of network would be connected to the same degree, most would have a number of connections hovering around a small, average value. Also, as a randomly distributed network grows, the relative number of very connected nodes decreases.

slide27
Scale-free Networks

The ramifications of this difference between the two types of networks are significant, but it's worth pointing out that both scale-free and randomly distributed networks can be what are called "small world" networks. That means it doesn't take many hops to get from one node to another—the science behind the notion that there are only six degrees of separation between any two people in the world. So, in both scale-free and randomly distributed networks, with or without very connected nodes, it may not take many hops for a node to make a connection with another node. There's a good chance, though, that in a scale-free network, many transactions would be funneled through one of the well-connected hub nodes - one like Google’s Web portal.

Because of these differences, the two types of networks behave differently as they break down. The connectedness of a randomly distributed network decays steadily as nodes fail, slowly breaking into smaller, separate domains that are unable to communicate.

slide28
Scale-free Networks

Resists Random Failure

Scale-free networks, on the other hand, may show almost no degradation as random nodes fail. With their very connected nodes, which are statistically unlikely to fail under random conditions, connectivity in the network is maintained. It takes quite a lot of random failure before the hubs are wiped out, and only then does the network stop working. (Of course, there's always the possibility that the very connected nodes would be the first to go.)

In a targeted attack, in which failures aren't random but are the result of mischief, or worse, directed at hubs, the scale-free network fails catastrophically. Take out the very connected nodes, and the whole network stops functioning. In these days of concern about cyber attacks on the critical infrastructure, whether the nodes on the network in question are randomly distributed or are scale-free makes a big difference.
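
The contrast between random failure and targeted attack can be illustrated on a toy network. Everything in this sketch is an assumption made for the example: the two-hub network, the node names and the use of the largest connected cluster as a measure of whether the network still works.

```python
import random
from collections import deque

def largest_component(adj, removed=frozenset()):
    """Size of the largest connected cluster once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best

def two_hub_network(leaves_per_hub=10):
    """Hypothetical toy network: two linked hubs, each with its own leaf nodes."""
    adj = {"hub0": {"hub1"}, "hub1": {"hub0"}}
    for hub in ("hub0", "hub1"):
        for i in range(leaves_per_hub):
            leaf = f"{hub}_leaf{i}"
            adj[leaf] = {hub}
            adj[hub].add(leaf)
    return adj

def targeted_attack(adj, n_remove):
    """Remove the n_remove most connected nodes."""
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:n_remove]
    return largest_component(adj, frozenset(hubs))

def random_failure(adj, n_remove, rng):
    """Remove n_remove nodes chosen uniformly at random."""
    failed = rng.sample(sorted(adj), n_remove)
    return largest_component(adj, frozenset(failed))

net = two_hub_network()
after_attack = targeted_attack(net, 2)
after_failure = random_failure(net, 2, random.Random(0))
```

Removing the two hubs leaves only isolated nodes, whereas two random failures usually hit leaves and leave almost the whole network connected.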

slide29
Scale-free Networks

Epidemiologists are also pondering the significance of scale-free connectivity.

Until now, it has been accepted that stopping sexually transmitted diseases requires reaching or immunizing a large proportion of the population; most contacts will be safe, and the disease will no longer spread. But if societies of people include the very connected individuals of scale-free networks—individuals who have sex lives that are quantitatively different from those of their peers—then health offensives will fail unless they target these individuals. These individuals will propagate the disease no matter how many of their more subdued neighbors are immunized.

Now consider the following: Geographic connectivity of Internet nodes is scale-free, the number of links on Web pages is scale-free, Web users belong to interest groups that are connected in a scale-free way, and e-mails propagate in a scale-free way. Barabasi's model of the Internet tells us that stopping a computer virus from spreading requires that we focus on protecting the hubs.

slide31
Scale Free Network

• Hubs, highly connected nodes, bring together different parts of the network

• Robustness: removing random nodes has little effect

• Low attack resistance: removing a hub is lethal (PPI: centrality-lethality rule).

Random Network

• No hubs

• Low robustness

• Low attack resistance

slide34
14-3-3 subtypes (paralogs)

Schematic representation of co-immunoprecipitation studies performed with anti-MARK (microtubule affinity-regulating kinase) antibodies. The strength of the interactions is indicated by the thickness of the arrows (after (2)).

slide36
Preferential attachment
  • Hub protein characteristics:
  • Multiple binding sites
  • Promiscuous binding
  • Non-specific binding

…connect preferentially to a hub
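
Preferential attachment can be sketched as a simple growth rule in which each new node links to existing nodes with probability proportional to their current number of links. The slide does not specify a particular model, so the code below is a generic growth sketch in the style of Barabasi's model; the function name, the parameters n and m and the seed are assumptions.

```python
import random

def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their degree."""
    # Start from a small complete graph on m + 1 nodes.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(m + 1):
            if i != j:
                adj[i].add(j)
    # Each node appears in `stubs` once per link end, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    stubs = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj

network = preferential_attachment(200, 2, random.Random(1))
degrees = sorted((len(nbrs) for nbrs in network.values()), reverse=True)
top_five = degrees[:5]   # the best-connected nodes, i.e. the hubs
```

Nodes that are already well connected keep collecting new links, so a few hubs emerge while most nodes stay close to the minimum degree m.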

hub proteins in yeast
Hub proteins in yeast

“[..] network analysis suggests that the centrality-lethality rule is unrelated to the network architecture, but is explained by the simple fact that hubs have large numbers of PPIs, therefore high probabilities of engaging in essential PPIs”

He X, Zhang J (2006) Why do hubs tend to be essential in protein networks? PLoS Genet 2(6):e88

slide39
Network motifs
  • Different motifs in different processes
  • More interconnected motifs are more conserved
network dynamics
Network Dynamics
  • Party hubs: always the same partners (same time and space)
  • Date hubs: different partners in different conditions (different time and/or space)
  • Difference is important for inter-process communication

Date hubs: large binding surfaces / Party hubs: small binding surfaces

slide44
A network example from Meta-genomics Ecogenomics – soil ecosystems

A virtual network where species are nodes and (groups of) chemical compounds are exchanged between the nodes

slide47
Preferential attachment in biodegradation networks

New degradable compounds are observed to attach preferentially to hubs close to (or in) the Central Metabolism

Valencia and co-workers

slide48
The “Matchmaker” 14-3-3 family
  • Massively interacting protein family (the PPI champions) by means of various binding modes
  • Involved in many essential cell processes
  • Occurs throughout the kingdoms of life
  • Various numbers of isoforms in different organisms (7 in human)
slide51
Janus-faced character of 14-3-3s

Identified (co)-targets fall in opposing classes.

Clear color: actin growth, pro-apoptotic, stimulation of transcription, nuclear import, neuron development.

Hatched: opposing functions. 100% = 56 proteins (De Boer & Jimenez, unpubl. data).

slide52
Targets of 14-3-3 proteins implicated in tumor development.

Arrows indicate positive effects while sticks represent inhibitory effects. Targets involved in primary apoptosis and cell cycle control are not shown due to space limitations.

slide53
Role of 14-3-3 proteins in apoptosis

14-3-3 proteins inhibit apoptosis through multiple mechanisms: sequestration and control of subcellular localization of phosphorylated and nonphosphorylated pro- and anti-apoptotic proteins.

What is the role of the subtypes? Modularity?

slide54
14-3-3 subtypes (paralogs)

Different subtypes display different binding modes, reflecting pronounced divergent evolution after duplication

14-3-3- subtypes ,, and 

Schematic representation of co-immunoprecipitation studies performed with anti-MARK (microtubule affinity-regulating kinase) antibodies. The strength of the interactions is indicated by the thickness of the arrows.

slide55
Protein Function Prediction

How can we predict protein function and, more specifically, protein interaction partners (PPI)?

From network behaviour to cellular component-based function prediction

We do not know the function of many cellular components

slide56
Protein Function Prediction

The deluge of genomic information begs the following question: what do all these genes do?

Many genes are not annotated, and many more are partially or erroneously annotated. Given a genome which is partially annotated at best, how do we fill in the blanks?

For each sequenced genome, the functions of 20%-50% of the encoded proteins remain unknown!

slide57
Protein Function Prediction

We are faced with the problem of predicting protein function from sequence, expression, interaction and structural data.

For all these reasons and many more, automated protein function prediction is rapidly gaining interest among bioinformaticians and computational biologists.

outline of protein function interaction prediction methods and databases
Outline of protein function/interaction prediction methods and databases

These techniques are designed more for general function prediction than for PPI prediction

  • Sequence-based function prediction
  • Structure-based function prediction
    • Sequence-structure comparison
    • Structure-structure comparison
  • Motif-based function prediction
  • Phylogenetic profile analysis
  • Protein interaction prediction and databases
  • Functional inference at systems level
classes of function prediction methods
Classes of function prediction methods
  • Sequence based approaches
    • protein A has function X, and protein B is a homolog (ortholog) of protein A; hence B has function X
  • Structure-based approaches
    • protein A has structure X, and X has so-and-so structural features; hence A’s functional sites are ….
  • Motif-based approaches
    • a group of genes have function X and they all have motif Y; protein A has motif Y; hence protein A’s function might be related to X
  • Function prediction based on “guilt-by-association”
    • gene A has function X and gene B is often “associated” with gene A; hence B might have a function related to X
sequence based function prediction homology searching
Sequence-based function prediction: homology searching
  • Sequence comparison is a powerful tool for detecting homologous genes, but it is limited to genomes that are not too distantly related

Query: 2   LSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDL 61
           LSD +   V  +W K+       G + L R+   +P+T   F  +      D    S ++
Sbjct: 3   LSDKDKAAVRALWSKIGKSSDAIGNDALSRMIVVYPQTKIYFSHWP-----DVTPGSPNI 57

Query: 62  KKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKHPG 121
           K HG  V+  +   + K    +  +  L++ HA K ++     + ++ CI+ V+ +  P
Sbjct: 58  KAHGKKVMGGIALAVSKIDDLKTGLMELSEQHAYKLRVDPSNFKILNHCILVVISTMFPK 117

Query: 122 DFGADAQGAMNKALELFRKDMASNYK 147
           +F  +A  +++K L      +A  Y+
Sbjct: 118 EFTPEAHVSLDKFLSGVALALAERYR 143
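
How distant two aligned sequences are is often summarised as percent identity. The sketch below computes it for the first block of the alignment shown here; the function name and the convention of dividing by the full alignment length, gap columns included, are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length.
    Gap characters ('-') never count as identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)

# First block of the alignment (query residues 2-61, subject residues 3-57).
query = "LSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASEDL"
sbjct = "LSDKDKAAVRALWSKIGKSSDAIGNDALSRMIVVYPQTKIYFSHWP-----DVTPGSPNI"
identity = percent_identity(query, sbjct)
```

A low value for a pair that is nonetheless clearly alignable is typical of distant homologs, which is where pure sequence comparison starts to lose power.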

structure based function prediction
Structure-based function prediction
  • Structure-based methods can potentially detect remote homologues that are not detectable by sequence-based methods
    • using structural information in addition to sequence information
    • protein threading (sequence-structure alignment) is a popular method

Structure-based methods could provide more than just “homology” information

slide62
Structure-based: Threading

Template sequence

+

Compatibility score

Query sequence

Template structure

slide64
Structure-based function prediction

Threading

  • Scoring function for measuring to what extent the query sequence fits into the template structure
  • For scoring we have to map an amino acid (query sequence) onto a local environment (template structure)
  • We can use structural features for this:
    • Secondary structure
    • Is environment inside or outside? – Residue accessible surface area (ASA)
    • Polarity of environment
  • The best (highest scoring) “thread” through the structure gives a so-called structural alignment. This looks exactly the same as a sequence alignment but is based on structure.
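
The scoring idea can be illustrated with a deliberately simplified sketch. It is not a real threading program: it uses a single environment feature (inside or outside), a made-up +1/-1 compatibility score and no gaps, whereas practical methods combine several features and align with gaps. All names and values below are hypothetical.

```python
# Residues treated as hydrophobic in this toy example (an assumption).
HYDROPHOBIC = set("AVILMFWC")

def residue_score(amino_acid, environment):
    """Toy compatibility score: +1 if a hydrophobic residue lands in an
    inside position ('i') or a polar residue in an outside position ('o'),
    otherwise -1."""
    inside = environment == "i"
    return 1 if (amino_acid in HYDROPHOBIC) == inside else -1

def best_ungapped_thread(query, template_env):
    """Slide the query along the template environments and return
    (best score, offset); None if the query is longer than the template."""
    best = None
    for offset in range(len(template_env) - len(query) + 1):
        score = sum(
            residue_score(aa, template_env[offset + i])
            for i, aa in enumerate(query)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best

# Hypothetical template: one environment label per structural position.
template_env = "ooioiooo"
result = best_ungapped_thread("LKVE", template_env)
```

Here the query fits best where its two hydrophobic residues fall on the two inside positions of the template, which is the compromise over all environments that the scoring function is meant to find.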
slide65
Threading – inverse folding

Map sequence to structural environments

Query

Template

?

What is the optimal thread for each local environment?

Find the best compromise over all environments

environment

  • Secondary structure
  • ASA
  • Polarity of environment

C

N

hydrophobic

hydrophilic

slide66
Fold recognition by threading

Fold 1

Fold 2

Fold 3

Fold N

Query sequence

What is the most compatible structure?

Compatibility scores

structure based function prediction67
Structure-based function prediction
  • SCOP (http://scop.berkeley.edu/) is a protein structure classification database where proteins are grouped into a hierarchy of families, superfamilies, folds and classes, based on their structural and functional similarities
  • SCOP is one of the most widely-used protein structure archives that serves as a standard-of-truth for evaluating prediction methods
structure based function prediction68
Structure-based function prediction
  • SCOP hierarchy – the top level: 11 classes
structure based function prediction69
Structure-based function prediction

All-alpha protein

membrane protein

Alpha-beta protein

Coiled-coil protein

All-beta protein

structure based function prediction70
Structure-based function prediction
  • SCOP hierarchy – the second level: 800 folds
structure based function prediction71
Structure-based function prediction
  • SCOP hierarchy - third level: 1294 superfamilies
structure based function prediction72
Structure-based function prediction
  • SCOP hierarchy - fourth level: 2327 families
structure based function prediction73
Structure-based function prediction
  • Using a sequence-structure alignment method, one can predict that a protein belongs to a SCOP family, superfamily or fold
  • Proteins predicted to be in the same SCOP family are orthologous
  • Proteins predicted to be in the same SCOP superfamily are homologous
  • Proteins predicted to be in the same SCOP fold are structurally analogous

folds

superfamilies

families

structure based function prediction74
Structure-based function prediction
  • Prediction of ligand binding sites
    • For ~85% of ligand-binding proteins, the largest cleft is the ligand-binding site
    • For an additional ~10% of ligand-binding proteins, the second largest cleft is the ligand-binding site
structure based function prediction75
Structure-based function prediction
  • Prediction of macromolecular binding sites
    • there is a strong correlation between macromolecular binding sites (for protein, DNA and RNA) and disordered protein regions
    • disordered regions in a protein sequence can be predicted using computational methods

http://www.pondr.com/

motif based function prediction
Motif-based function prediction
  • Prediction of protein functions based on identified sequence motifs
  • PROSITE contains patterns specific for more than a thousand protein families.
  • ScanPROSITE scans a protein sequence for occurrences of the patterns and profiles stored in PROSITE
motif based function prediction77
Motif-based function prediction
  • Search PROSITE using ScanPROSITE
  • The sequence has ASN_GLYCOSYLATION N-glycosylation site: 242 - 245 NETL

MSEGSDNNGDPQQQGAEGEAVGENKMKSRLRKGALKKKNVFNVKDHCFIARFFKQPTFCSHCKDFICGYQSGYAWMGFGKQGFQCQVCSYVVHKRCHEYVTFICPGKDKG NETLIDSDSPKTQH ……..

regular expressions
Regular expressions

Alignment

ADLGAVFALCDRYFQ

SDVGPRSCFCERFYQ

ADLGRTQNRCDRYYQ

ADIGQPHSLCERYFQ

Regular expression

[AS]-D-[IVL]-G-x4-{PG}-C-[DE]-R-[FY]2-Q

{PG} = not (P or G)

For short sequence stretches, regular expressions are often more suitable to describe the information than alignments (or profiles)
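
The pattern above can be translated mechanically into a regular expression for Python's `re` module. The converter below is a sketch that handles only the shorthand used on this slide (`x4`, `[FY]2`, `{PG}`); official PROSITE patterns write repeats as `x(4)`, which this function does not parse. The function name is an assumption.

```python
import re

def pattern_to_regex(pattern):
    """Translate the slide's PROSITE-like notation into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|x|[A-Z])(\d*)", element)
        if m is None:
            raise ValueError(f"unrecognised element: {element!r}")
        token, repeat = m.groups()
        if token == "x":
            token = "."                        # any residue
        elif token.startswith("{"):
            token = "[^" + token[1:-1] + "]"   # any residue except these
        parts.append(token + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)

regex = pattern_to_regex("[AS]-D-[IVL]-G-x4-{PG}-C-[DE]-R-[FY]2-Q")

# The four aligned sequences from the slide.
aligned = [
    "ADLGAVFALCDRYFQ",
    "SDVGPRSCFCERFYQ",
    "ADLGRTQNRCDRYYQ",
    "ADIGQPHSLCERYFQ",
]
matches = [bool(re.fullmatch(regex, seq)) for seq in aligned]
```

All four sequences of the alignment match the resulting expression, while a sequence with P or G at the excluded position does not.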

regular expressions79
Regular expressions

Regular expression              No. of exact matches in DB
D-A-V-I-D                       71
D-A-V-I-[DENQ]                  252
[DENQ]-A-V-I-[DENQ]             925
[DENQ]-A-[VLI]-I-[DENQ]         2739
[DENQ]-[AG]-[VLI]2-[DENQ]       51506
D-A-V-E                         1088

phylogenetic profile analysis
Phylogenetic profile analysis
  • Function prediction of genes based on “guilt-by-association” – a non-homologous approach
  • The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every sequenced genome
  • Because proteins that participate in a common structural complex or metabolic pathway are likely to co-evolve, the phylogenetic profiles of such proteins are often “similar”
phylogenetic profile analysis81
Phylogenetic profile analysis
  • Evolution suppresses unnecessary proteins
  • Once a member of an interaction is lost, the partner is likely to be lost as well
phylogenetic profile analysis82
Phylogenetic profile analysis
  • Phylogenetic profile (against N genomes)
    • For each gene X in a target genome (e.g., E coli), build a phylogenetic profile as follows
    • If gene X has a homolog in genome #i, the ith bit of X’s phylogenetic profile is “1”, otherwise it is “0”
phylogenetic profile analysis83
Phylogenetic profile analysis
  • Example – phylogenetic profiles based on 60 genomes

genome

gene

orf1034:1110110110010111110100010100000000111100011111110110111010101

orf1036:1011110001000001010000010010000000010111101110011011010000101

orf1037:1101100110000001110010000111111001101111101011101111000010100

orf1038:1110100110010010110010011100000101110101101111111111110000101

orf1039:1111111111111111111111111111111111111111101111111111111111101

orf104: 1000101000000000000000101000000000110000000000000100101000100

orf1040:1110111111111101111101111100000111111100111111110110111111101

orf1041:1111111111111111110111111111111101111111101111111111111111101

orf1042:1110100101010010010110000100001001111110111110101101100010101

orf1043:1110100110010000010100111100100001111110101111011101000010101

orf1044:1111100111110010010111010111111001111111111111101101100010101

orf1045:1111110110110011111111111111111101111111101111111111110010101

orf1046:0101100000010001011000000111110000010100000001010010100000000

orf1047:0000000000000001000010000001000100000000000000010000000000000

orf105: 0110110110100010111101101010111001101100101111100010000010001

orf1054:0100100110000001100001000100000000100100100001000100100000000

By correlating the rows (open reading frames (ORF) or genes) you find out about joint presence or absence of genes: this is a signal for a functional connection

Genes with similar phylogenetic profiles have related functions or are functionally linked (D. Eisenberg and colleagues, 1999)
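
Correlating the rows can be sketched as follows, using three of the profiles listed above. The similarity measure (fraction of genomes in which two genes are both present or both absent) is only one simple choice made for this example; the function names are likewise assumptions.

```python
from itertools import combinations

def build_profile(genomes_with_homolog, genome_order):
    """Bit string with '1' where the gene has a homolog in that genome."""
    return "".join("1" if g in genomes_with_homolog else "0" for g in genome_order)

def profile_similarity(a, b):
    """Fraction of genomes in which both genes are present or both absent."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Three rows copied from the example profiles.
profiles = {
    "orf1034": "1110110110010111110100010100000000111100011111110110111010101",
    "orf1036": "1011110001000001010000010010000000010111101110011011010000101",
    "orf1037": "1101100110000001110010000111111001101111101011101111000010100",
}

# Rank all gene pairs from most to least similar profile.
ranked = sorted(
    ((profile_similarity(profiles[a], profiles[b]), a, b)
     for a, b in combinations(sorted(profiles), 2)),
    reverse=True,
)
```

Pairs at the top of such a ranking are the candidates for a functional link. Genes present in nearly every genome match almost everything, so in practice an informative measure has to account for that.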

phylogenetic profile analysis84
Phylogenetic profile analysis
  • Phylogenetic profiles contain a great amount of functional information
  • Phylogenetic profile analysis can be used to distinguish orthologous genes from paralogous genes
  • Subcellular localization: 361 yeast nucleus-encoded mitochondrial proteins are identified at 50% accuracy with 58% coverage through phylogenetic profile analysis
  • Functional complementarity: By examining inverse phylogenetic profiles, one can find functionally complementary genes that have evolved through one of several mechanisms of convergent evolution.
prediction of protein protein interactions rosetta stone
Prediction of protein-protein interactions: Rosetta stone
  • Gene fusion is an effective method for prediction of protein-protein interactions
    • If proteins A and B are homologous to two domains of a protein C, A and B are predicted to interact

[Figure: proteins A and B each match one domain of the two-domain protein C]

Although the gene-fusion method has low prediction coverage, its false-positive rate is low (high specificity).
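The Rosetta stone rule can be sketched in a few lines. The sketch assumes that a homology search has already been run and that each hit is given as a hypothetical tuple (query, composite, start, end), where start and end delimit the region of the composite protein matched by the query. Two queries are predicted to interact when they match non-overlapping regions of the same composite protein.

```python
from collections import defaultdict
from itertools import combinations

def rosetta_stone_pairs(hits):
    """Predict interacting pairs from homology hits on composite proteins.

    hits: iterable of (query, composite, start, end) with start <= end.
    Returns a set of (query1, query2, composite) tuples.
    """
    by_composite = defaultdict(list)
    for query, composite, start, end in hits:
        by_composite[composite].append((query, start, end))

    predictions = set()
    for composite, regions in by_composite.items():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Non-overlapping regions: the queries match different domains.
            if e1 < s2 or e2 < s1:
                a, b = sorted((q1, q2))
                predictions.add((a, b, composite))
    return predictions

# Hypothetical hits: A and B cover separate domains of C, D overlaps both.
hits = [
    ("A", "C", 1, 120),
    ("B", "C", 150, 300),
    ("D", "C", 100, 200),
]
print(rosetta_stone_pairs(hits))
```

Requiring non-overlapping regions is one way to avoid pairing two proteins that are simply homologous to the same domain.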

Gene (domain) fusion example
  • Vertebrates have a multi-enzyme protein (GARs-AIRs-GARt) comprising the enzymes GAR synthetase (GARs), AIR synthetase (AIRs), and GAR transformylase (GARt).
  • In insects, the polypeptide appears as GARs-(AIRs)2-GARt.
  • In yeast, GARs-AIRs is encoded separately from GARt
  • In bacteria each domain is encoded separately (Henikoff et al., 1997).

GAR: glycinamide ribonucleotide

AIR: aminoimidazole ribonucleotide

Protein interaction prediction through co-evolution
  • FALSE NEGATIVES:
    • need many organisms
    • relies on known orthologous relationships
  • FALSE POSITIVES:
    • phylogenetic signals at the organism level
    • functional interaction may not mean physical interaction
Protein interaction database
  • There are numerous databases of protein-protein interactions
  • DIP is a popular protein-protein interaction database

The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions.

Protein interaction databases

BIND - Biomolecular Interaction Network Database

DIP - Database of Interacting Proteins

PIM – Hybrigenics

PathCalling Yeast Interaction Database

MINT - a Molecular Interactions Database

GRID - The General Repository for Interaction Datasets

InterPreTS - protein interaction prediction through tertiary structure

STRING - predicted functional associations among genes/proteins

Mammalian protein-protein interaction database (PPI)

InterDom - database of putative interacting protein domains

FusionDB - database of bacterial and archaeal gene fusion events

IntAct Project

The Human Protein Interaction Database (HPID)

ADVICE - Automated Detection and Validation of Interaction by Co-evolution

InterWeaver - protein interaction reports with online evidence

PathBLAST - alignment of protein interaction networks

ClusPro - a fully automated algorithm for protein-protein docking

HPRD - Human Protein Reference Database

Network of protein interactions and predicted functional links involving silencing information regulator (SIR) proteins. Filled circles represent proteins of known function; open circles represent proteins of unknown function, represented only by their Saccharomyces genome sequence numbers (http://genome-www.stanford.edu/Saccharomyces). Solid lines show experimentally determined interactions, as summarized in the Database of Interacting Proteins (ref. 19; http://dip.doe-mbi.ucla.edu). Dashed lines show functional links predicted by the Rosetta Stone method (ref. 12). Dotted lines show functional links predicted by phylogenetic profiles (ref. 16). Some predicted links are omitted for clarity.
Network of predicted functional linkages involving the yeast prion protein Sup35 (ref. 20). The dashed line shows the only experimentally determined interaction. The other functional links were calculated from genome and expression data (ref. 11) by a combination of methods, including phylogenetic profiles, Rosetta Stone linkages and mRNA expression. Linkages predicted by more than one method, and hence particularly reliable, are shown by heavy lines. Adapted from ref. 11.
STRING - predicted functional associations among genes/proteins
  • STRING is a database of predicted functional associations among genes/proteins.
  • Genes of similar function tend to be maintained in close neighborhood, tend to be present or absent together, i.e. to have the same phylogenetic occurrence, and can sometimes be found fused into a single gene encoding a combined polypeptide.
  • STRING integrates this information from as many genomes as possible to predict functional links between proteins.

Berend Snel and Martijn Huynen (RUN) and the group of Peer Bork (EMBL, Heidelberg)

STRING - predicted functional associations among genes/proteins

STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources:

  • Genomic Context (Synteny)
  • High-throughput Experiments 
  • (Conserved) Co-expression 
  • Previous Knowledge

STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently contains 736,429 proteins in 179 species.
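The slide does not say how the evidence is integrated. As an illustration only, one common way to combine independent confidence scores is a "noisy-OR": the combined score is the probability that at least one source is correct. The function below is a sketch under that independence assumption and is not claimed to be the exact formula used by STRING.

```python
from math import prod

def combine_scores(scores):
    """Noisy-OR combination of per-source confidence scores in [0, 1].

    Assumes the evidence sources are independent (an assumption of this sketch).
    """
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return 1.0 - prod(1.0 - s for s in scores)

# Hypothetical scores for one protein pair from two sources,
# e.g. genomic context and co-expression.
print(combine_scores([0.5, 0.5]))
```

Under this scheme each additional supporting source can only raise the combined score, which matches the intuition that links found by several methods are more reliable.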

STRING - predicted functional associations among genes/proteins

Conserved Neighborhood

This view shows runs of genes that occur repeatedly in close neighborhood in (prokaryotic) genomes. Genes located together in a run are linked with a black line (maximum allowed intergenic distance is 300 bp). Note that if there are multiple runs for a given species, these are separated by white space. If there are other genes in the run that are below the current score threshold, they are drawn as small white triangles. Gene fusion occurrences are also drawn, but only if they are present in a run.
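Grouping genes into runs with a maximum intergenic distance can be sketched as below. The gene names and coordinates are hypothetical, strand is ignored, and the gap is approximated as the next start minus the previous end; only the 300 bp limit comes from the text.

```python
def gene_runs(genes, max_gap=300):
    """Group genes on one chromosome into runs of close neighbors.

    genes: list of (name, start, end) tuples in bp.
    Consecutive genes stay in the same run if the gap between them
    is at most max_gap bp. Returns a list of runs (lists of names).
    """
    ordered = sorted(genes, key=lambda g: g[1])
    runs = []
    current = []
    prev_end = None
    for name, start, end in ordered:
        if prev_end is not None and start - prev_end > max_gap:
            runs.append(current)
            current = []
        current.append(name)
        prev_end = end if prev_end is None else max(prev_end, end)
    if current:
        runs.append(current)
    return runs

# Hypothetical coordinates: g1-g2 are 100 bp apart, g2-g3 are 700 bp apart.
genes = [("g1", 100, 900), ("g2", 1000, 1800), ("g3", 2500, 3000), ("g4", 3200, 4000)]
print(gene_runs(genes))
```

A run that recurs across many genomes is the signal of interest; a single genome's runs, as computed here, are only the first step.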

STRING - predicted functional associations among genes/proteins
  • Gene clusters in a genomic region are likely to interact
  • co-ordinated expression
  • co-ordinated gene gains/losses
Functional inference at systems level
  • Function prediction of individual genes could be made in the context of biological pathways/networks
  • Example – phoB is predicted to be a transcription regulator and it regulates all the genes in the pho-regulon (a group of co-regulated operons); and within this regulon, gene A is interacting with gene B, etc.

phoB

Functional inference at systems level
  • KEGG is a database of biological pathways and networks
Consequence of evolution
  • Notion of comparative analysis (Darwin)
  • What you know about one species might be transferable to another, for example from mouse to human
  • Provides a framework for multi-level, large-scale analysis of the plethora of genomics data
Functional inference at systems level
  • Through homology searches, one can map a known biological pathway from one organism to another, and hence predict gene functions in the context of biological pathways/networks
  • Mapping networks of multiple organisms and looking at the evolutionary conservation allows the delineation of modules and essential parts of the networks
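Mapping a pathway across organisms can be sketched as a lookup through an ortholog table. In the sketch below the pathway steps, gene names and ortholog table are all hypothetical; in practice the table would come from a homology search. Steps without an ortholog are reported separately, since they mark where the pathway may differ in the target organism.

```python
def map_pathway(pathway, orthologs):
    """Transfer a reference pathway to a target organism.

    pathway: dict mapping pathway step -> gene in the reference organism.
    orthologs: dict mapping reference gene -> target gene.
    Returns (mapped, missing): the steps with a target gene, and the
    steps for which no ortholog was found.
    """
    mapped = {}
    missing = []
    for step, gene in pathway.items():
        target = orthologs.get(gene)
        if target is None:
            missing.append(step)
        else:
            mapped[step] = target
    return mapped, missing

# Hypothetical three-step pathway and ortholog table.
pathway = {"step1": "refGene1", "step2": "refGene2", "step3": "refGene3"}
orthologs = {"refGene1": "tgtGeneA", "refGene3": "tgtGeneC"}
mapped, missing = map_pathway(pathway, orthologs)
print(mapped, missing)
```

Steps conserved across many organisms are candidates for the essential core of a network, while the missing steps point to altered or replaced routes.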
Human

Yeast

We need to be able to do automatic pathway comparison (pathway alignment)

This pathway diagram shows a comparison of pathways in (left) Homo sapiens (human) and (right) Saccharomyces cerevisiae (baker’s yeast). Changes in controlling enzymes (square boxes in red) and in the pathway itself have occurred (yeast has one altered (‘overtaking’) path in the graph).

Wrapping up
  • Prim’s algorithm for MST and derived clustering protocol
  • Regular, random, small-world and scale-free networks
  • Evolution of topology and dynamics of biological networks, e.g. duplication, preferential attachment, party/date hub proteins, etc.
  • We have seen a number of ways to infer a putative function for a protein sequence (e.g. guilt by association): PPI prediction is a special case and you should know the related methods
  • Phylogenetic signal to predict PPI (co-evolution)
  • To gain confidence, it is important to combine as many different prediction protocols as possible (the STRING server is an example of this)
  • Comparing and overlaying various networks (e.g. regulation, signalling, metabolic, PPI) and studying conservation at these network levels is one of the current grand challenges, and will be crucially important for a systems-based approach to (intra)cellular behaviour.