
aMAZE - Protein Function and Biochemical Processes

Graph-based analysis of biochemical networks

Jacques van Helden


Contents

  • Mapping metabolic networks onto a graph

  • Traversal rules for metabolic graphs

  • Path finding

  • Path finding in weighted graphs

  • Pathway reconstruction by reaction clustering

  • From gene expression data to pathways

  • Recurrent modules


Graph-based analysis of biochemical networks

Mapping metabolic networks onto graphs


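Metabolic network

Figure: metabolic network from L-homoserine to L-methionine. E. coli branch: L-homoserine + succinyl-CoA --> 2.3.1.46 --> alpha-succinyl-L-homoserine + CoA; alpha-succinyl-L-homoserine + L-cysteine --> 4.2.99.9 --> cystathionine + succinate; cystathionine + H2O --> 4.4.1.8 --> homocysteine + NH4+ + pyruvate. S. cerevisiae branch: L-homoserine + acetyl-CoA --> 2.3.1.31 --> O-acetyl-homoserine + CoA; O-acetyl-homoserine + sulfide --> 4.2.99.10 --> homocysteine. Common step: homocysteine + 5-methylTHF --> 2.1.1.14 --> L-methionine + THF.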


One node per compound

Figure: the same methionine network drawn with compounds as the only nodes; each reaction (e.g. 2.3.1.46, 4.2.99.9) appears as a label repeated on several arcs.

  • vertices = compounds

  • arcs = reactions

  • problem: no representation of cross-point reactions


One node per reaction

Figure: the same methionine network drawn with reactions (2.3.1.46, 2.3.1.31, 4.2.99.9, 4.4.1.8, 4.2.99.10, 2.1.1.14) as the only nodes; intermediate compounds (alpha-succinyl-L-homoserine, O-acetyl-homoserine, cystathionine, homocysteine) appear as arc labels.

  • vertices = reactions

  • arcs = intermediate compounds

  • problem: no representation of cross-point compounds


One node per compound and per reaction

Figure: the same methionine network drawn with one node per compound and one node per reaction.

  • 2 types of vertices

    • compounds and reactions

  • arcs

    • from substrate to reaction

    • from reaction to product

    • arc labels can be used to represent stoichiometry


Reactions and compounds: directed bipartite graph

A bipartite graph is a graph G whose vertex set V can be partitioned into two subsets U and W, such that each edge of G has one endpoint in U and one endpoint in W.

  • arcs never go from compound to compound

  • arcs never go from reaction to reaction

  • 5,871 compounds

  • 5,223 reactions

  • 21,194 arcs
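As a minimal sketch of this representation (Python, standard library only), two reactions of the methionine example can be stored as a directed bipartite graph. The dictionary layout and the function names `build_bipartite_graph` and `is_bipartite` are assumptions made for illustration, not the aMAZE data model; stoichiometry labels are left out.

```python
from collections import defaultdict

# Toy data: reaction id -> (substrates, products). Two reactions from the
# methionine example; the data layout is an assumption of this sketch.
REACTIONS = {
    "2.3.1.46": (["L-homoserine", "succinyl-CoA"],
                 ["alpha-succinyl-L-homoserine", "CoA"]),
    "4.2.99.9": (["alpha-succinyl-L-homoserine", "L-cysteine"],
                 ["cystathionine", "succinate"]),
}

def build_bipartite_graph(reactions):
    """Return (arcs, compounds, reaction_nodes).

    arcs maps each source node to the set of its target nodes:
    substrate -> reaction and reaction -> product.
    """
    arcs = defaultdict(set)
    compounds = set()
    for rid, (substrates, products) in reactions.items():
        for substrate in substrates:
            arcs[substrate].add(rid)
            compounds.add(substrate)
        for product in products:
            arcs[rid].add(product)
            compounds.add(product)
    return dict(arcs), compounds, set(reactions)

def is_bipartite(arcs, compounds, reaction_nodes):
    """True if every arc joins a compound and a reaction, in either order."""
    for source, targets in arcs.items():
        for target in targets:
            compound_to_reaction = source in compounds and target in reaction_nodes
            reaction_to_compound = source in reaction_nodes and target in compounds
            if not (compound_to_reaction or reaction_to_compound):
                return False
    return True

arcs, compounds, reaction_nodes = build_bipartite_graph(REACTIONS)
```

Storing arcs in both directions of the compound/reaction alternation keeps cross-point compounds and cross-point reactions visible, which neither of the single-node-type representations does.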


Extending the graph to full biochemical networks

  • The concept can be extended to include additional types of vertices:

    • biochemical entities: compounds, genes, proteins, …

    • biochemical interactions: reaction, catalysis, transcription, regulation, translocation, transport catalysis, …

  • This makes it possible to represent metabolism, regulation, transport, signal transduction, compartments, …

  • Warning: with this extension, the graph is no longer bipartite, because some interactions have other interactions as output (e.g. a catalysis acts on a reaction).

  • van Helden et al. (2000) Biol Chem, 381(9-10), 921-35.

  • van Helden et al. (2001) Briefings in Bioinformatics, 2(1), 81-93.

  • van Helden et al. (2002) In Bioinformatics and Genome Analysis. Springer-Verlag, Berlin Heidelberg, Vol. 38.


Graph-based analysis of metabolic networks

Traversal rules for metabolic graphs


Ubiquitous compounds

Figure: two reactions that share H2O. Reaction 4.2.1.52: L-aspartic semialdehyde + pyruvate --> dihydrodipicolinic acid + H2O. Reaction 3.5.1.18: succinyl diaminopimelate + H2O --> succinate + LL-diaminopimelic acid.

Invalid pathway: L-aspartic semialdehyde --> 4.2.1.52 --> H2O --> 3.5.1.18 --> LL-diaminopimelic acid






Invalid intermediates

  • Where to set the limit?

    • Seems obvious for H2O (1615), NADH (569), ...

    • What about ATP (435)?

    • And pyruvate?

    • And NH3?

  • Depends on the reaction/pathway considered

    • e.g. ATP is a valid intermediate in nucleotide biosynthesis

  • Depends on the atoms being transferred during the reaction

    • e.g. NADH gives one proton

  • Depends on the focus of the question

    • e.g. in an analysis of energy metabolism, ATP and NAD will matter
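The connectivity figures quoted above can be illustrated with a small sketch that counts, for each compound, the number of reactions it takes part in, and flags compounds above a cut-off. The toy data (reactions 4.2.1.52 and 3.5.1.18), the function names and the threshold are assumptions for illustration; as the slide stresses, no single threshold is right for every question.

```python
from collections import Counter

# Toy data: reaction id -> (substrates, products).
REACTIONS = {
    "4.2.1.52": (["L-aspartic semialdehyde", "pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"]),
    "3.5.1.18": (["succinyl diaminopimelate", "H2O"],
                 ["succinate", "LL-diaminopimelic acid"]),
}

def compound_connectivity(reactions):
    """Count the reactions in which each compound appears (either side)."""
    counts = Counter()
    for substrates, products in reactions.values():
        # A compound present on both sides of one reaction is counted once.
        counts.update(set(substrates) | set(products))
    return counts

def ubiquitous(counts, threshold):
    """Compounds taking part in at least `threshold` reactions.

    The threshold is an arbitrary, question-dependent choice.
    """
    return {compound for compound, n in counts.items() if n >= threshold}

counts = compound_connectivity(REACTIONS)
print(ubiquitous(counts, threshold=2))
```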


Ubiquitous compounds

  • Jeong et al. (Nature 2000; 407: 651-654)

    • Calculate the network diameter, i.e. the average length of the shortest path between two compounds.

    • Show that when ubiquitous compounds ("hubs" in their terminology) are removed, the diameter increases.

    • Compare the metabolic network diameter between different organisms.

    • "Surprising" result: the network diameter does not depend on the number of enzymes found in the organism.

    • But: for this comparison, all compounds were considered, including H2O.
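The effect of hubs on this diameter can be sketched on a toy network. The code below links each substrate directly to each product of the same reaction, computes the average shortest-path length over all reachable ordered pairs by breadth-first search, and repeats the computation with H2O excluded. The reaction ids `R1` to `R3`, the compounds `A` to `D` and the function names are hypothetical; this is not the procedure of Jeong et al., only an illustration of the definition quoted above.

```python
from collections import deque

# Hypothetical toy network: reaction id -> (substrates, products).
REACTIONS = {
    "R1": (["A"], ["B", "H2O"]),
    "R2": (["B"], ["C"]),
    "R3": (["C", "H2O"], ["D"]),
}

def compound_graph(reactions, excluded=frozenset()):
    """Substrate -> product adjacency, skipping excluded compounds."""
    adjacency = {}
    for substrates, products in reactions.values():
        for compound in list(substrates) + list(products):
            if compound not in excluded:
                adjacency.setdefault(compound, set())
        for substrate in substrates:
            if substrate in excluded:
                continue
            for product in products:
                if product not in excluded:
                    adjacency[substrate].add(product)
    return adjacency

def average_path_length(adjacency):
    """Mean shortest-path length over all reachable ordered pairs."""
    total = pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

with_hub = average_path_length(compound_graph(REACTIONS))
without_hub = average_path_length(compound_graph(REACTIONS, excluded={"H2O"}))
```

In this toy case the shortcut A --> H2O --> D disappears once H2O is excluded, so the average path length goes up.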


Direct traversal of reversible reactions

Figure: reaction 4.2.1.52, with L-aspartic semialdehyde and pyruvate on one side and dihydrodipicolinic acid and H2O on the other.

Valid pathways

  • L-aspartic semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid

  • dihydrodipicolinic acid --> 4.2.1.52 --> L-aspartic semialdehyde

Invalid pathway

  • L-aspartic semialdehyde --> 4.2.1.52 --> pyruvate


Mutual exclusion of reverse reactions

Figure: reaction 4.2.1.52 represented as two separate nodes. Forward: L-aspartic semialdehyde + pyruvate --> dihydrodipicolinic acid + H2O. Reverse: dihydrodipicolinic acid + H2O --> L-aspartic semialdehyde + pyruvate.

Invalid pathway

  • L-aspartic semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid --> 4.2.1.52 (reverse) --> pyruvate


Traversal of reversible reactions

  • Fell & Wagner (Nat Biotechnol 2000; 18: 1121-2)

    • Select a sub-network (energy metabolism and small molecule biosynthesis in E. coli).

    • Discard ubiquitous compounds.

    • Identify the "center" of the network: glutamate, followed by pyruvate.

    • But: reactions can be traversed from substrate to substrate or from product to product.

  • Jeong et al. (Nature 2000; 407: 651-654)

    • Calculate network diameter.

    • But: reactions can be traversed from substrate to substrate or from product to product.
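The traversal rules of this section can be combined in a short path-finding sketch: a reaction is only entered from a substrate and left through a product, a reversible reaction gets a separate reverse step, the forward and reverse steps of one reaction never occur in the same path, and selected compounds can be excluded. The data layout, the `reversible` flags and the function names are assumptions for illustration; the search enumerates paths breadth-first, which is only practical for small networks.

```python
from collections import deque

# Toy network: reaction id -> (substrates, products, reversible).
# The reversibility flags are assumptions of this sketch.
REACTIONS = {
    "4.2.1.52": (["L-aspartic semialdehyde", "pyruvate"],
                 ["dihydrodipicolinic acid", "H2O"], True),
    "3.5.1.18": (["succinyl diaminopimelate", "H2O"],
                 ["succinate", "LL-diaminopimelic acid"], False),
}

def directed_steps(reactions):
    """Expand reactions into (id, inputs, outputs) steps, one per direction."""
    steps = []
    for rid, (substrates, products, reversible) in reactions.items():
        steps.append((rid, substrates, products))
        if reversible:
            steps.append((rid, products, substrates))
    return steps

def shortest_path(reactions, source, target, excluded=frozenset(), max_reactions=10):
    """Shortest path as [compound, reaction, compound, ...], or None."""
    steps = directed_steps(reactions)
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        used = set(path[1::2])  # reaction ids already on this path
        if len(used) >= max_reactions:
            continue
        for rid, inputs, outputs in steps:
            # Mutual exclusion: a reaction is used once, in one direction only.
            if rid in used or path[-1] not in inputs:
                continue
            for compound in outputs:
                if compound not in excluded and compound not in path:
                    queue.append(path + [rid, compound])
    return None
```

With this toy data, L-aspartic semialdehyde reaches dihydrodipicolinic acid in one step, never reaches pyruvate, and reaches LL-diaminopimelic acid only through H2O, a path that disappears once H2O is passed in `excluded`.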



Applications of path finding to biochemical networks

  • metabolic pathways from compound A to compound B (2-ends path finding)

  • genes regulated by a membrane receptor via a signal transduction pathway (1-end path-finding)

  • proteins and compounds regulating directly or indirectly the expression of a given gene (1-end path finding, reverse)

  • feed-back loops (cycle finding)

  • functional distance between two enzymes, in terms of the minimal number of steps between the reactions they catalyze


A graph of compounds and reactions

Reactions from KEGG

  • Compound nodes

    • 10,166 compounds (only 4,302 used by one reaction)

  • Reaction nodes

    • 5,283 reactions

  • Arcs

    • 10,685 substrate --> reaction (7,297 non-trivial)

    • 10,621 reaction --> product (6,828 non-trivial)


Metabolic pathways as subgraphs

Escherichia coli

4,219 genes (Blattner)

967 enzymes (Swissprot)

159 pathways (EcoCyc)


Functional distance between enzymes

  • The length of the shortest path between two reactions can be considered as a measure of their functional distance.

  • By extension, one can estimate the functional distance between two enzymes as the length of the shortest path between the reactions they catalyze.

  • Example of application: interpretation of pairs of fused genes

    • Two enzymatic functions can be carried by a single gene in one genome, and by two separate genes in another genome, as the result of a gene fusion event.

    • Are such fusion events preferentially observed between functionally related enzymes?
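A minimal sketch of this distance, assuming that two reactions are adjacent when a product of the first is a substrate of the second, and that the distance between two enzymes is the smallest distance over the reactions they catalyze, in either direction. The E. coli methionine branch serves as toy data; the enzyme labels, the `CATALYSED` mapping and the function names are hypothetical.

```python
from collections import deque

# Toy data: reaction id -> (substrates, products).
REACTIONS = {
    "2.3.1.46": (["L-homoserine", "succinyl-CoA"],
                 ["alpha-succinyl-L-homoserine", "CoA"]),
    "4.2.99.9": (["alpha-succinyl-L-homoserine", "L-cysteine"],
                 ["cystathionine", "succinate"]),
    "4.4.1.8": (["cystathionine", "H2O"],
                ["homocysteine", "NH4+", "pyruvate"]),
    "2.1.1.14": (["homocysteine", "5-methylTHF"],
                 ["L-methionine", "THF"]),
}

# Hypothetical mapping from enzymes to the reactions they catalyze.
CATALYSED = {"enzyme A": ["2.3.1.46"], "enzyme B": ["4.4.1.8"]}

def reaction_distance(reactions, start, goal, excluded=frozenset()):
    """Number of reaction-to-reaction steps from start to goal, or None."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        rid = queue.popleft()
        if rid == goal:
            return dist[rid]
        products = set(reactions[rid][1]) - set(excluded)
        for other, (substrates, _) in reactions.items():
            if other not in dist and products & set(substrates):
                dist[other] = dist[rid] + 1
                queue.append(other)
    return None

def enzyme_distance(reactions, catalysed, enzyme_a, enzyme_b, excluded=frozenset()):
    """Smallest reaction distance between two enzymes, in either direction."""
    candidates = []
    for ra in catalysed[enzyme_a]:
        for rb in catalysed[enzyme_b]:
            for start, goal in ((ra, rb), (rb, ra)):
                d = reaction_distance(reactions, start, goal, excluded)
                if d is not None:
                    candidates.append(d)
    return min(candidates) if candidates else None
```

On a full network the `excluded` argument would carry the ubiquitous compounds, otherwise most pairs of reactions end up one or two steps apart.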


Shortest path finding with gene fusion pairs

  • Fusion pairs

    • Tsoka and Ouzounis (Nat Genet 2000; 26: 141-2)

  • Shortest path analysis

    • van Helden et al. (2002) In Bioinformatics and Genome Analysis. Springer-Verlag, Berlin Heidelberg, Vol. 38.

Figure: workflow. Shortest path finding on the graph of reactions and compounds gives the functional distance between enzyme A and enzyme B; distances are compared between fusion pairs and random pairs.


Pathway enumeration

  • Kuffner et al. (Bioinformatics 2000; 16: 825-836).

  • All possible paths from glucose to pyruvate, with maximal length 9 --> 500,000 possible paths.

  • Adding constraints

    • Selecting "complete" pathways, i.e. where all side reactants are ubiquitous

    • Constraint on pathway width

      • Width 2 --> 541 pathways

      • Width 1 --> 170 pathways

Figure: workflow. Path finding between a source compound and a target compound, on the graph of reactions and compounds, yields potential metabolic pathways.
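Plain path enumeration with a length limit can be sketched as a depth-first search. This is not the method of Kuffner et al.; it only shows why the number of candidate paths grows quickly. The toy data keeps the main compounds of the two methionine branches and omits side compounds; the function name and the `max_reactions` parameter are assumptions.

```python
# Toy data: reaction id -> (substrates, products), main compounds only.
REACTIONS = {
    "2.3.1.46": (["L-homoserine"], ["alpha-succinyl-L-homoserine"]),
    "4.2.99.9": (["alpha-succinyl-L-homoserine"], ["cystathionine"]),
    "4.4.1.8": (["cystathionine"], ["homocysteine"]),
    "2.3.1.31": (["L-homoserine"], ["O-acetyl-homoserine"]),
    "4.2.99.10": (["O-acetyl-homoserine"], ["homocysteine"]),
    "2.1.1.14": (["homocysteine"], ["L-methionine"]),
}

def enumerate_paths(reactions, source, target, max_reactions):
    """Yield every simple path [compound, reaction, compound, ...] from
    source to target that uses at most max_reactions reactions."""
    def extend(path, used):
        current = path[-1]
        if current == target:
            yield list(path)
            return
        if len(used) == max_reactions:
            return
        for rid, (substrates, products) in reactions.items():
            if rid in used or current not in substrates:
                continue
            for product in products:
                if product not in path:
                    yield from extend(path + [rid, product], used | {rid})
    yield from extend([source], frozenset())

all_paths = list(enumerate_paths(REACTIONS, "L-homoserine", "L-methionine", 4))
short_paths = list(enumerate_paths(REACTIONS, "L-homoserine", "L-methionine", 3))
```

With a limit of 4 reactions both branches are returned; with a limit of 3 only the shorter branch through O-acetyl-homoserine remains.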


Scoring pathways with gene expression data

Figure: workflow. Path finding between a source compound and a target compound, on the graph of reactions and compounds, yields potential metabolic pathways. For each pathway separately, the set of reactions is selected, the enzymes and the enzyme-coding genes are identified, and the resulting gene cluster is scored against gene expression data (covariance of the response) to select the most likely relevant pathways.


Scoring pathways with gene expression data

Figure: pathway score distribution for random pathways, the control (glycolysis) and the pathways found.

Zien, A., Kuffner, et al. (2000). ISMB 8, 407-17.
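To make the idea of scoring a gene cluster by the covariance of its response concrete, here is a small sketch that scores a candidate pathway by the mean pairwise covariance of the expression profiles of its genes. This is a stand-in chosen for illustration, not the scoring function of Zien et al.; the gene names, the expression values and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical expression profiles: gene -> values over three conditions.
EXPRESSION = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [2.0, 4.0, 6.0],
    "g3": [3.0, 2.0, 1.0],
}

def covariance(x, y):
    """Sample covariance of two profiles of equal length."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pathway_score(pathway_genes, expression):
    """Mean pairwise covariance of the genes that have expression data."""
    profiles = [expression[g] for g in pathway_genes if g in expression]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(covariance(x, y) for x, y in pairs) / len(pairs)

coherent = pathway_score(["g1", "g2"], EXPRESSION)
incoherent = pathway_score(["g1", "g3"], EXPRESSION)
```

A pathway whose genes respond together scores higher than one whose genes respond in opposite directions; comparing scores with those of random pathways, as in the distribution above, is what gives them meaning.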


Path finding - summary

  • Metabolic pathways are organism-dependent.

  • The shortest path is generally not the most relevant.

  • Simple path enumeration returns innumerable false positives.

  • Adding consistency rules (complete pathways) reduces the number of returned pathways.

  • Pathway scoring makes it possible to select the most relevant pathways for a given organism.

  • Requirements

    • Gene expression data

    • Specification of the source and target compounds


Graph-based analysis of biochemical networks

Pathway building by reaction clustering


Reconstructing a pathway from a subset of reactions

  • Input:

    • a set of reactions (the seed reactions)

  • Output:

    • a metabolic pathway including

      • the seed reactions, together with their substrates and products

      • optionally, some additional reactions, intercalated to improve the pathway connectivity

    • the pathway can either be connected, or contain several unconnected components


Seed nodes

Figure: graph of compound nodes and reaction nodes, with the seed reactions highlighted.


Linking seed nodes

Figure: the same graph, with direct links drawn between seed reactions.


Enhance linking by intercalating reactions

Figure: the same graph, with intercalated reactions added to connect seed reactions that have no direct link.
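The linking step can be sketched as follows, assuming that two reactions are directly linked when a product of one is a substrate of the other, and that a link between two seeds is accepted when it needs no more than a given number of intercalated reactions (the gap size). The functions `successors`, `chain` and `build_pathway` are hypothetical names; unlike the program described here, this sketch works on directed reactions only and does not infer reaction orientation.

```python
from collections import deque
from itertools import permutations

# Toy data: reaction id -> (substrates, products).
REACTIONS = {
    "2.3.1.46": (["L-homoserine", "succinyl-CoA"],
                 ["alpha-succinyl-L-homoserine", "CoA"]),
    "4.2.99.9": (["alpha-succinyl-L-homoserine", "L-cysteine"],
                 ["cystathionine", "succinate"]),
    "4.4.1.8": (["cystathionine", "H2O"],
                ["homocysteine", "NH4+", "pyruvate"]),
    "2.1.1.14": (["homocysteine", "5-methylTHF"],
                 ["L-methionine", "THF"]),
}

def successors(reactions, rid, excluded):
    """Reactions that consume a non-excluded product of rid."""
    products = set(reactions[rid][1]) - set(excluded)
    return [other for other, (substrates, _) in reactions.items()
            if other != rid and products & set(substrates)]

def chain(reactions, start, goal, max_intercalated, excluded=frozenset()):
    """Shortest reaction chain from start to goal, or None if it would
    need more than max_intercalated reactions in between."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 > max_intercalated:
            continue
        for nxt in successors(reactions, path[-1], excluded):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def build_pathway(reactions, seeds, max_intercalated=0, excluded=frozenset()):
    """Return (reactions in the pathway, set of reaction-to-reaction links)."""
    included = set(seeds)
    links = set()
    for a, b in permutations(seeds, 2):
        path = chain(reactions, a, b, max_intercalated, excluded)
        if path:
            included.update(path)
            links.update(zip(path, path[1:]))
    return included, links

direct_only = build_pathway(REACTIONS, ["2.3.1.46", "4.4.1.8"], max_intercalated=0)
with_gap = build_pathway(REACTIONS, ["2.3.1.46", "4.4.1.8"], max_intercalated=1)
```

With a gap size of 0 the two seeds stay unconnected; with a gap size of 1 reaction 4.2.99.9 is intercalated and the three reactions form one chain.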



Validation of the method

  • Take a known pathway (e.g. Lysine biosynthesis in Escherichia coli: 9 reactions).

  • Provide the program with a subset of reactions.

  • See if the program is able to reconstruct the whole pathway on the basis of this subset.


Validation of the method

  • Take a set of experimentally characterized pathways, and for each one

    • Select a subset of enzymes

    • Use the reactions catalysed by these enzymes as seed nodes

    • Extract the subgraph

    • Compare with known pathway


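Lysine biosynthesis in E. coli

Figure: lysine biosynthesis pathway in E. coli, fed by aspartate biosynthesis, with branches towards methionine biosynthesis and threonine biosynthesis at L-aspartic semialdehyde. Steps shown (EC number, enzyme, gene, side compounds):

  • L-aspartate --> 2.7.2.4 (aspartate kinase III, lysC; ATP --> ADP) --> L-aspartyl-4-P

  • L-aspartyl-4-P --> 1.2.1.11 (aspartate semialdehyde dehydrogenase, asd; NADPH, H+ --> NADP+, Pi) --> L-aspartic semialdehyde

  • L-aspartic semialdehyde + pyruvate --> 4.2.1.52 (dihydrodipicolinate synthase, dapA; releases 2 H2O) --> dihydrodipicolinic acid

  • dihydrodipicolinic acid --> 1.3.1.26 (dihydrodipicolinate reductase, dapB; NADPH or NADH, H+ --> NADP+ or NAD+) --> tetrahydrodipicolinate

  • tetrahydrodipicolinate + succinyl-CoA --> 2.3.1.117 (tetrahydrodipicolinate N-succinyltransferase, dapD; releases CoA) --> N-succinyl-epsilon-keto-L-alpha-aminopimelic acid

  • N-succinyl-epsilon-keto-L-alpha-aminopimelic acid + glutamate --> 2.6.1.17 (succinyl diaminopimelate aminotransferase, dapC; releases alpha-ketoglutarate) --> succinyl diaminopimelate

  • succinyl diaminopimelate + H2O --> 3.5.1.18 (N-succinyldiaminopimelate desuccinylase, dapE; releases succinate) --> LL-diaminopimelic acid

  • LL-diaminopimelic acid --> 5.1.1.7 (diaminopimelate epimerase, dapF) --> meso-diaminopimelic acid

  • meso-diaminopimelic acid --> 4.1.1.20 (diaminopimelate decarboxylase, lysA; releases CO2) --> L-lysine

The figure also shows the lysR protein (gene lysR) next to lysA.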


Example: reconstitution of lysine pathway

  • Gap size: 0

    • all EC numbers from the original pathway are provided as seeds

  • Seeds

    • 1.2.1.11

    • 1.3.1.26

    • 2.3.1.117

    • 2.6.1.17

    • 2.7.2.4

    • 3.5.1.18

    • 4.1.1.20

    • 4.2.1.52

    • 5.1.1.7

  • Result:

    • Inferring reaction orientation (reverse or forward)

    • Ordering


Example: reconstitution of lysine pathway

  • Gap size: 1

  • 5 seed reactions

  • Result

    • Inferring missing steps

    • Inferring reaction orientation

    • Ordering


Example: reconstitution of lysine pathway

  • Gap size: 2

  • 4 seed reactions

  • Result

    • E.coli pathway found

    • Alternative pathways also returned


Example: reconstitution of lysine pathway

  • Gap size: 3

  • 3 seed reactions

  • Result

    • E.coli pathway is not found, because the program finds shortcuts between the seed reactions


Building pathways from operons

  • Pathways obtained with the pathway builder, using the genes from the His operon as seeds


Applications of pathway reconstruction

  • We have the complete genome for more than 100 bacteria

  • For these genomes,

    • there is almost no experimental characterization of metabolism

    • enzymes have been predicted by sequence similarity.

    • gene expression data will in some cases be available, in most cases not.

  • In some cases, one expects to find the same pathways as in model organisms, but in other cases, variants or completely distinct pathways


Strategy 1: starting from annotated pathways

  • For each known pathway from model organisms

    • Select the subset of reactions for which an enzyme exists in the target organism

    • If a reasonable number of reactions are present

      • Using these as seeds, reconstruct a pathway

  • This strategy is likely to detect some variants of the annotated pathways, but is not able to predict novel pathways.


Strategy 2: starting from predicted functional groups

  • Comparative genomics provides us with clues about functional modularity

    • operons can be predicted with different methods, and reveal some level of modular organisation.

    • groups of synteny can also reveal functional modules.

    • phylogenetic profiles reveal groups of co-evolving genes, which are generally involved in the same process or pathway.

  • Strategy

    • predict operons, groups of synteny, and groups of co-evolving genes

    • with each of these groups

      • select enzyme-coding genes

      • identify the reactions catalyzed by their products

      • use these reactions as seeds for the pathway builder
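The last three steps of this strategy amount to a lookup from gene groups to seed reactions, sketched below. The group names and the function name are hypothetical; the gene-to-reaction pairs (dapA with 4.2.1.52, dapB with 1.3.1.26) come from the lysine pathway, and lysR stands for a gene that codes for no enzyme.

```python
# Gene -> catalyzed reactions; genes that code for no enzyme are absent.
GENE_TO_REACTIONS = {
    "dapA": ["4.2.1.52"],
    "dapB": ["1.3.1.26"],
}

# Hypothetical predicted groups (operons, synteny groups, co-evolving genes).
GROUPS = {
    "group 1": ["dapA", "dapB", "lysR"],
    "group 2": ["lysR"],
}

def seeds_from_groups(groups, gene_to_reactions):
    """Map each group to its sorted seed reactions.

    Groups without any enzyme-coding gene are dropped, since they give
    the pathway builder nothing to start from.
    """
    seeds = {}
    for name, genes in groups.items():
        reactions = sorted({rid
                            for gene in genes
                            for rid in gene_to_reactions.get(gene, [])})
        if reactions:
            seeds[name] = reactions
    return seeds

seeds = seeds_from_groups(GROUPS, GENE_TO_REACTIONS)
```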


Graph-based analysis of biochemical networks

Path finding in weighted graphs


Path finding in a weighted graph

  • Assign a higher weight to highly connected compounds. This makes it possible to work with the whole graph while reducing the probability of using a pool metabolite as an intermediate between two successive reactions.

  • Assign a smaller weight to reactions for which an enzyme has been identified in the genome. This favours organism-specific pathways, without ruling out spontaneous reactions or reactions catalysed by an enzyme not yet identified in this organism.

  • When gene expression data is available, assign a weight to reactions according to the level of expression of the corresponding enzymes. This favours context-specific pathways.
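The first two weighting rules can be sketched with Dijkstra's algorithm, taking the cost of a path as the sum of the weights of the compounds it enters plus the weights of the reactions it uses. The toy network, the reaction ids `R1` to `R9`, the default reaction weight of 1 and the function names are all assumptions for illustration; mutual exclusion of reverse reactions is left out to keep the sketch short.

```python
import heapq
from collections import Counter

# Hypothetical toy network: reaction id -> (substrates, products).
# H2O takes part in six reactions and offers a shortcut from A to B.
REACTIONS = {
    "R1": (["A"], ["H2O", "P"]),
    "R2": (["H2O", "Q"], ["B"]),
    "R3": (["A"], ["X"]),
    "R4": (["X"], ["Y"]),
    "R5": (["Y"], ["B"]),
    "R6": (["H2O", "S1"], ["T1"]),
    "R7": (["H2O", "S2"], ["T2"]),
    "R8": (["H2O", "S3"], ["T3"]),
    "R9": (["H2O", "S4"], ["T4"]),
}

def connectivity(reactions):
    """Number of reactions in which each compound takes part."""
    counts = Counter()
    for substrates, products in reactions.values():
        counts.update(set(substrates) | set(products))
    return counts

def lightest_path(reactions, source, target, compound_weight, reaction_weight=None):
    """Dijkstra search; returns (cost, path) or None.

    compound_weight and reaction_weight are dicts; missing entries weigh 1.
    """
    reaction_weight = reaction_weight or {}
    heap = [(0, [source])]
    settled = set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for rid, (substrates, products) in reactions.items():
            if node not in substrates:
                continue
            step = reaction_weight.get(rid, 1)
            for product in products:
                if product not in settled:
                    new_cost = cost + step + compound_weight.get(product, 1)
                    heapq.heappush(heap, (new_cost, path + [rid, product]))
    return None

unweighted = lightest_path(REACTIONS, "A", "B", compound_weight={})
weighted = lightest_path(REACTIONS, "A", "B", compound_weight=connectivity(REACTIONS))
```

With uniform weights the search takes the shortcut through H2O; with connectivity weights the longer route through X and Y becomes cheaper. Passing a `reaction_weight` below 1 for reactions with an identified enzyme would implement the second rule.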


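Test case: methionine biosynthesis

Figure: methionine biosynthesis from L-aspartate. Common trunk: L-aspartate --> 2.7.2.4 --> L-aspartyl-4-P --> 1.2.1.11 --> L-aspartic semialdehyde --> 1.1.1.3 --> L-homoserine. E. coli branch: L-homoserine --> 2.3.1.46 --> alpha-succinyl-L-homoserine --> 4.2.99.9 --> cystathionine --> 4.4.1.8 --> homocysteine. S. cerevisiae branch: L-homoserine --> 2.3.1.31 --> O-acetyl-homoserine --> 4.2.99.10 --> homocysteine. Common end: homocysteine --> 2.1.1.14 --> L-methionine --> 2.5.1.6 --> S-adenosyl-L-methionine.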


Unweighted graphs: methionine biosynthesis

  • Search of the 5 shortest paths from L-aspartate to L-methionine

  • Unweighted graph, all compounds

    • L-aspartic acid --> 6.3.5.4 --> AMP --> 6.1.1.10 --> L-methionine

    • L-aspartic acid --> 3.5.1.15 --> water --> 3.4.13.12 --> L-methionine

    • L-aspartic acid --> 3.5.1.15 --> water --> 3.4.13.12 --> L-methionine

    • L-aspartic acid --> 4.3.1.1 --> NH3 --> 4.4.1.11 --> L-methionine

    • L-aspartic acid --> 3.5.1.15 --> water --> 3.5.1.31 --> L-methionine

  • Unweighted graph, selection of excluded compounds

    • L-aspartic acid --> 2.6.1.35 --> glycine --> 2.6.1.73 --> L-methionine

    • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine

    • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.41 --> d-methionine --> 5.1.1.2 --> L-methionine

    • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.2 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

    • L-aspartic acid --> 4.1.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine


Weighted graph: methionine biosynthesis

  • Search of the 5 shortest paths from L-aspartate to L-methionine

  • Weighted graph (compound weight = connectivity)

    • L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

    • L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

    • L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

    • L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

    • L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.46 --> o-succinyl-L-homoserine --> 2.5.1.48 --> L-cystathionine --> 2.5.1.49 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine





Graph-based analysis of biochemical networks

From gene expression data to pathways


From gene expression data to pathways

Figure: workflow. Gene expression profiles are classified into a cluster of co-regulated genes. Each gene (gene 1 to gene 9) is linked by expression to its protein (protein 1 to protein 9), and the enzymes among them are linked by catalysis (cat 1 to cat 6) to reactions (react 1 to react 4). Pathway reconstruction on these reactions yields a putative pathway.


Singular value decomposition

Figure: singular value decomposition of expression data. Labels: Alpha, cdc15, cdc28, Elu; MCM, CLB2, SIC1, MAT, CLN2, Y', MET.

Spellman et al. (1998). Mol Biol Cell 9(12), 3273-97.

Gilbert et al. (2000). Trends Biotech. 18(Dec), 487-495.


Pathway found in Spellman's "MET" cluster

Figure: pathway reconstructed from the genes of the "MET" cluster. Steps shown (EC number, enzyme, gene, side compounds):

  • sulfate --> 2.7.7.4 (sulfate adenylyltransferase, MET3; ATP --> PPi) --> adenylyl sulfate (APS)

  • adenylyl sulfate (APS) --> 2.7.1.25 (adenylyl sulfate kinase, MET14; ATP --> ADP) --> 3'-phosphoadenylylsulfate (PAPS)

  • 3'-phosphoadenylylsulfate (PAPS) --> 1.8.99.4 (3'-phosphoadenylylsulfate reductase, MET16; NADPH --> NADP+; AMP; 3'-phosphate (PAP); H+) --> sulfite

  • sulfite --> 1.8.1.2 (putative sulfite reductase, MET5, and sulfite reductase (NADPH), MET10; 3 NADPH; 5 H+ --> 3 NADP+; 3 H2O) --> sulfide

  • sulfide + O-acetyl-homoserine --> 4.2.99.10 (O-acetylhomoserine (thiol)-lyase, MET17) --> homocysteine

  • homocysteine + 5-methyltetrahydropteroyltri-L-glutamate --> 2.1.1.14 (methionine synthase, vit B12-independent, MET6) --> L-methionine + 5-tetrahydropteroyltri-L-glutamate


Cell-cycle regulated genes involved in methionine biosynthesis

Figure: methionine biosynthesis with its regulators. Pathway, fed by aspartate biosynthesis: L-aspartate --> 2.7.2.4 (aspartate kinase, HOM3; ATP --> ADP) --> L-aspartyl-4-P --> 1.2.1.11 (aspartate semialdehyde dehydrogenase, HOM2; NADPH --> NADP+; Pi) --> L-aspartic semialdehyde --> 1.1.1.3 (homoserine dehydrogenase, HOM6; NADPH --> NADP+) --> L-homoserine --> 2.3.1.31 (homoserine O-acetyltransferase, MET2; acetyl-CoA --> CoA) --> O-acetyl-homoserine --> 4.2.99.10 (O-acetylhomoserine (thiol)-lyase, MET17; sulfide from sulfur assimilation) --> homocysteine --> 2.1.1.14 (methionine synthase, vit B12-independent, MET6; 5-methyltetrahydropteroyltri-L-glutamate --> 5-tetrahydropteroyltri-L-glutamate) --> L-methionine --> 2.5.1.6 (S-adenosyl-methionine synthetase I and II, SAM1 and SAM2; H2O; ATP --> Pi, PPi) --> S-adenosyl-L-methionine. Connected pathways shown: threonine biosynthesis and cysteine biosynthesis. Regulators shown: Met31p and Met32p (MET31, MET32), the Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28), Gcn4p (GCN4) and Met30p (MET30).


Cell-cycle regulated genes involved in sulfur assimilation

Figure: sulfur assimilation with its regulators. Pathway: sulfate (extracellular) --> sulfate transport (sulfate transporters, SUL1 and SUL2) --> sulfate (intracellular) --> 2.7.7.4 (sulfate adenylyltransferase, MET3; ATP --> PPi) --> adenylyl sulfate (APS) --> 2.7.1.25 (adenylyl sulfate kinase, MET14; ATP --> ADP) --> 3'-phosphoadenylylsulfate (PAPS) --> 1.8.99.4 (3'-phosphoadenylylsulfate reductase, MET16; NADPH --> NADP+; AMP; H+; 3'-phosphate (PAP)) --> sulfite --> 1.8.1.2 (putative sulfite reductase, MET5, and sulfite reductase (NADPH), MET10; 3 NADPH; 5 H+ --> 3 NADP+; 3 H2O) --> sulfide, which feeds methionine biosynthesis. Regulators shown: Met31p and Met32p (MET31, MET32), the Cbf1p/Met4p/Met28p complex (CBF1, MET4, MET28), Gcn4p (GCN4) and MET30.


Analysis of data from Gasch et al.

  • Gasch et al. (2000). Molecular Biology of the Cell, 11: 4241-4257

  • 6,152 yeast genes

  • 142 DNA chips

  • Various conditions

    • Stress (heat shock, osmotic shock, peroxide, amino acid starvation, nitrogen depletion)

    • Steady-state growth on alternative carbon sources

    • Overexpression studies


Selected experiments

MSN2 overexpression

MSN4 overexpression

YAP1 overexpression

ethanol

galactose

glucose

mannose

raffinose

sucrose

ethanol vs reference

fructose vs reference

galactose vs reference

glucose vs reference

mannose vs reference

raffinose vs reference

sucrose vs reference





Matching clusters against annotated pathways

Strengths

  • Simple

  • Based on experimentally characterized pathways

Weaknesses

  • Restricted to current knowledge: no chance to discover alternative or new pathways

  • A cluster of co-expressed enzymes generally overlaps multiple pathways: how should such "transversal" clusters be interpreted?


Repressed by mannose (at least 3-fold)

Figure: reconstructed network with regions labelled galactose utilization, citrate cycle with shunt (redundancy in the database?) and gluconeogenesis; some reactions are marked as inferred.

Remark: arrows should be displayed as bi-directional


Repressed by mannose (at least 2-fold)

Figure: reconstructed network with regions labelled citrate cycle with shunt (redundancy in the database?), galactose utilization and gluconeogenesis.

Remark: arrows should be displayed as bi-directional


Induced by galactose (at least 2-fold)

Figure: reconstructed network labelled galactose utilization.

Remark: arrows should be displayed as bi-directional


Repressed by glucose (at least 2-fold)

Figure: reconstructed network with regions labelled galactose utilization and gluconeogenesis (redundancy in the database?).


Alternative criteria for gene clustering

  • Functional clusters can be obtained by other methods as well:

    • 2-hybrids

      • clusters of physically interacting proteins

    • Proteomics

      • changes in protein concentration

      • changes in protein states (phosphorylation)

    • Gene fusion analysis

    • "Magpie signature", phylogenetic profiles

    • Gene knock-out analysis

      • genes with similar mutant phenotypes

    • ...



Summary

  • Starting from an unordered cluster of genes, one gets a pathway.

  • This method does not rely on pre-annotated pathways: it only requires a description of all known reactions and their substrates/products.

  • It should permit the discovery of novel pathways that are not yet stored in any pathway database.

  • Interpretation of intercalated reactions

    • enzyme is not regulated

    • DNA chip defect for that gene

    • gene was not on the DNA chip

    • enzyme remains to be identified in that organism


Applications

  • Discovery of alternative pathways

    • known pathways have been established on a restricted number of model organisms; alternative pathways remain to be discovered

  • Drug discovery

    • Liver extracts in presence/absence of a drug --> which metabolic pathways are affected?

  • Xenobiotics-degrading bacteria in presence/absence of the xenobiotic compound

    • through which succession of reactions is the compound degraded?


Pathway building

Strengths

  • Not restricted to previously characterized pathways

  • Not constrained by pre-defined pathway boundaries

Weaknesses

  • Inference intrinsically means false positives

    • LIGAND contains many reactions which are not catalyzed in the considered organism

    • when the number of intercalated reactions is too large, some links have no biological relevance

  • Sensitivity to the choice of excluded compounds

Current research

  • Refining the pathway representation with weighted graphs

    • Pool metabolites are no longer excluded, but they cost more

    • Enzymes can be weighted according to

      • the evidence for their presence in the selected organism

      • their level of expression in the considered experiment



Recurrent modules

Figure: recurrent modules, shown over four slides (module D; module C; module H; modules A, B and F).


Module regulation

Figure: regulation of modules by carbon source. Carbon sources shown: glucose, fructose, ethanol, sucrose, mannose, raffinose, galactose. Modules shown: E, G, C, H, D.



Module regulation and function

Figure: links between three columns. Carbon source: glucose, fructose, ethanol, sucrose, mannose, galactose, raffinose. Module: A, B, C, D, E, F, G, H. Pathway (from KEGG): pyrimidine metabolism; TCA cycle; glyoxylate and dicarboxylate metabolism; oxidative phosphorylation; reductive carboxylate cycle (CO2 fixation); nucleotide sugars metabolism; galactose metabolism; glycolysis/gluconeogenesis; fructose and mannose metabolism; arginine and proline metabolism; lysine degradation.


Ethanol activation

Figure 4A: ethanol activation, linking distance = 1 step (no intercalated reactions), threshold 2-fold change. Labelled regions: fatty acid metabolism; phospholipid degradation; glyoxylate and TCA cycle.


Ethanol activation

Figure 4B: ethanol activation, linking distance = 2 steps (1 intercalated reaction), threshold 2-fold change. Labelled regions: glyoxylate and TCA cycle; glycerolipid metabolism; starch and sucrose metabolism; phospholipid degradation.


Glucose repression - linking distance = 1

Figure 5A: glucose repression, linking distance = 1 step (no intercalated reactions), threshold 2-fold change. Labelled regions: starch and sucrose metabolism; galactose metabolism; aminosugars metabolism; glycolysis/gluconeogenesis; oxidative phosphorylation; fructose and mannose metabolism; arginine and proline metabolism; TCA and glyoxylate cycles.


Glucose repression - linking distance = 2

Figure 5B: glucose repression, linking distance = 2 steps (1 intercalated reaction), threshold 3-fold change. Labelled regions: fructose and mannose metabolism; nucleotide sugars metabolism; glycolysis/gluconeogenesis; aminosugars metabolism.


Clustering reactions

Figure 6: clustering (Ward method) of reactions from modules of connected reactions (linking distance = 2 steps, i.e. 1 intercalated reaction); clusters were obtained by pruning with k=10. Labelled clusters: Module 3, Module 4, Module 10.


Recurring modules

Figure 7: recurring modules of connected reactions are activated (red arrows) or repressed (green arrows) by different carbon sources. Only the reactions shown in blue inside boxes are regulated. Carbon sources shown: ethanol, galactose, mannose, raffinose, sucrose. Modules shown: TCA and glyoxylate cycles; nucleotide sugars and galactose metabolisms; arginine and proline metabolism.

