Biological Network Analysis: Introduction to Metabolic Networks. Tomer Shlomi, Winter 2008
Lecture Outline 1. Cellular metabolism 2. Metabolic network models 3. Constraint-based modeling 4. Optimization methods
Metabolism (I) Metabolism is the totality of all the chemical reactions that operate in a living organism. • Catabolic reactions: break down compounds and release energy • Anabolic reactions: use energy to build up essential cell components
Metabolism (II) “Metabolism is the process involved in the maintenance of life. It is comprised of a vast repertoire of enzymatic reactions and transport processes used to convert thousands of organic compounds into the various molecules necessary to support cellular life” Kenneth et al. 2003
Why study metabolism? (I) 1. Basic science: it’s the essence of life. 2. Tremendous importance in medicine: • Inborn errors of metabolism cause acute symptoms and even death at an early age • Metabolic diseases (obesity, diabetes) are major sources of morbidity and mortality • Metabolic enzymes and their regulators are gradually becoming viable drug targets
Why study metabolism? (II) 3. Bioengineering applications • Design strains for production of biological products of interest • Generation of biofuels 4. Probably the best understood of all cellular networks (metabolic, PPI, regulatory, signaling)
Metabolites and Biochemical Reactions • Metabolite - an organic substance: • Sugars – glucose, galactose, lactose, etc. • Carbohydrates – glycogen, glucan, etc. • Amino acids – histidine, proline, methionine, etc. • Nucleotides – cytosine, guanine, etc. • Lipids • Chemical energy carriers – ATP, NADH, etc. • Atoms – oxygen, hydrogen • Biochemical reaction: the process in which one or more substrate molecules are converted (usually with the help of an enzyme) to product molecules, e.g. Glucose + ATP -> Glucose-6-Phosphate + ADP (catalyzed by glucokinase)
Metabolic Networks • A set of reactions and the corresponding metabolites • A directed hyper-graph representation • Nodes - represent metabolites • Edges - represent biochemical reactions
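A minimal sketch of the hyper-graph view, assuming a plain dictionary encoding: each reaction is one hyperedge that joins a set of substrates to a set of products. The reaction labels and the helper names `metabolites` and `consumers` are illustrative and not part of the lecture.

```python
# Each reaction is one hyperedge: it links a set of substrates to a set of products.
# Reaction labels are illustrative; the two reactions are standard glycolysis steps.
reactions = {
    "glucokinase": {
        "substrates": {"glucose", "ATP"},
        "products": {"glucose-6-phosphate", "ADP"},
    },
    "G6P_isomerase": {
        "substrates": {"glucose-6-phosphate"},
        "products": {"fructose-6-phosphate"},
    },
}


def metabolites(network):
    """Return the node set: every metabolite that appears in any reaction."""
    nodes = set()
    for rxn in network.values():
        nodes |= rxn["substrates"] | rxn["products"]
    return nodes


def consumers(network, metabolite):
    """Return the sorted names of reactions (hyperedges) that consume a metabolite."""
    return sorted(
        name for name, rxn in network.items() if metabolite in rxn["substrates"]
    )


print(sorted(metabolites(reactions)))
print(consumers(reactions, "glucose-6-phosphate"))
```

An ordinary directed graph would need several edges per reaction (glucose to G6P, ATP to G6P, and so on) and would lose the fact that glucose and ATP are consumed together; the hyperedge keeps them as one unit.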
Metabolites (I) The 744 reactions of E. coli small-molecule metabolism involve a total of 791 different substrates. On average, each reaction contains 4.0 substrates. Figure: number of reactions containing varying numbers of substrates (reactants plus products).
Metabolites (II) Each distinct substrate occurs in an average of 2.1 reactions. Bioinformatics III
Reactions Catalyzed by More Than One Enzyme Diagram showing the number of reactions that are catalyzed by one or more enzymes. Most reactions are catalyzed by one enzyme, some by two, and very few by more than two enzymes. For 84 reactions, the corresponding enzyme is not yet encoded in EcoCyc. What may be the reasons for isozyme redundancy? (1) The enzymes that catalyze the same reaction are homologs and have duplicated (or were obtained by horizontal gene transfer), acquiring some specificity but retaining the same mechanism (divergence). (2) The reaction is easily “invented”; therefore, there is more than one protein family that is independently able to perform the catalysis (convergence).
Enzymes that catalyze more than one reaction Genome predictions usually assign a single enzymatic function. However, E. coli is known to contain many multifunctional enzymes. Of the 607 E. coli enzymes, 100 are multifunctional, either having the same active site and different substrate specificities or different active sites. Figure: number of enzymes that catalyze one or more reactions. Most enzymes catalyze one reaction; some are multifunctional. The enzymes that catalyze 7 and 9 reactions are purine nucleoside phosphorylase and nucleoside diphosphate kinase, respectively.
Pathways (I) EcoCyc describes 131 pathways: energy metabolism, nucleotide and amino acid biosynthesis, secondary metabolism. Pathways vary in length from a single reaction step to 16 steps, with an average of 5.4 steps. Figure: length distribution of EcoCyc pathways. Ouzounis, Karp, Genome Res. 10, 568 (2000)
Pathways (II) However, there is no precise biological definition of a pathway. The partitioning of the metabolic network into pathways (including the well-known examples of biochemical pathways) is somewhat arbitrary. These decisions also affect the distribution of pathway lengths.
Reactions participating in more than one pathway The 99 reactions belonging to multiple pathways appear to be the intersection points in the complex network of chemical processes in the cell. E.g. the reaction present in 6 pathways corresponds to the reaction catalyzed by malate dehydrogenase, a central enzyme in cellular metabolism.
Metabolic Network Models • Applying computational methods to predict network behavior usually requires data beyond the network topology • A genome-scale (‘GS’) metabolic network model is a collection of such data: • Reaction stoichiometry • Reaction directionality • Cellular localization • Transport and exchange reactions • Gene-protein-reaction association
Metabolic Network Model: Reaction Stoichiometry • Stoichiometry - the quantitative relationships of the reactants and products in reactions 1 Glucose + 1 ATP <-> 1 Glucose-6-Phosphate + 1 ADP
Metabolic Network Model: Reaction Directionality • Biochemical studies may test the reversibility of enzymatic reactions • But the directionality can differ between in vitro and in vivo due to different temperature, pH, ionic strength, and metabolite concentrations. • A subset of the reactions in a model is uni-directional and the remaining reactions are bi-directional 1 Glucose + 1 ATP -> 1 Glucose-6-Phosphate + 1 ADP
Metabolic Network Model: Cellular Localization (II) • Algorithms: PSORT and SubLoc predict the cellular localization of proteins based on nucleotide or amino acid sequences • High-throughput experimental approaches such as immunofluorescence and GFP tagging of individual proteins. Cytoplasm: 1 Glucose + 1 ATP -> 1 Glucose-6-Phosphate + 1 ADP
Metabolic Network Model: Transport and Exchange Reactions • An extra-cellular compartment is also included in the model • Transport reactions move metabolites between compartments (across membrane boundaries) • Glucose[c] <-> Glucose[e] • Exchange reactions move metabolites across the model boundary • Glucose[e] <-> • Uptake = in • Secretion = out
Gene-Protein-Reaction (GPR) Association (I) • Formulated via Boolean logic • Sdh protein made up of 4 peptides, catalyzes 2 reactions
Gene-Protein-Reaction (GPR) Association (II) • A protein complex made up of 3 proteins catalyzes a single reaction
Gene-Protein-Reaction (GPR) Association (III) • Isozymes – alternative enzymes that catalyze the same reaction
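The Boolean formulation of GPR associations can be sketched as a small recursive evaluator: a protein complex is an AND over its subunit genes, and isozymes are an OR over alternative genes. The tuple encoding and the function name `evaluate_gpr` are assumptions made for illustration; `sdhA` to `sdhD` are the E. coli succinate dehydrogenase subunit genes, while `geneX` and `geneY` are hypothetical placeholders for a pair of isozymes.

```python
def evaluate_gpr(rule, active_genes):
    """Return True if the reaction can be catalyzed given the functional genes.

    A rule is either a gene name (str) or a tuple ("and" | "or", sub_rule, ...).
    """
    if isinstance(rule, str):
        return rule in active_genes
    op, *operands = rule
    results = (evaluate_gpr(sub_rule, active_genes) for sub_rule in operands)
    if op == "and":
        return all(results)   # complex: every subunit is required
    if op == "or":
        return any(results)   # isozymes: any one alternative suffices
    raise ValueError(f"unknown operator: {op}")


# Protein complex: all four peptides are needed.
sdh_complex = ("and", "sdhA", "sdhB", "sdhC", "sdhD")
# Isozymes: either enzyme alone can catalyze the reaction (hypothetical genes).
isozymes = ("or", "geneX", "geneY")

all_genes = {"sdhA", "sdhB", "sdhC", "sdhD", "geneX", "geneY"}
print(evaluate_gpr(sdh_complex, all_genes - {"sdhB"}))  # False: complex is broken
print(evaluate_gpr(isozymes, all_genes - {"geneX"}))    # True: geneY still covers it
```

Rules can be nested, for example `("or", ("and", "a", "b"), "c")`, which is how a reaction catalyzed by either a two-subunit complex or a single alternative enzyme would be written in this encoding.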
Metabolic Network Models • A ‘GS metabolic network model’ is a collection of: • A metabolic network • Reaction stoichiometry • Reaction directionality • Cellular localization • Transport and exchange reactions • Gene-protein-reaction association
Model Reconstruction Process (II) • Performed mainly in Bernhard Palsson’s lab at UCSD. • Model naming convention:
Stoichiometric Matrix (I) • Stoichiometric matrix – network topology with stoichiometry of biochemical reactions (denoted S) • A metabolite that exists in multiple compartments is represented with multiple rows in the matrix • How would transport and exchange reactions be represented?
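One possible answer to the question above, as a sketch: a transport reaction is a column with entries in two rows (the same metabolite in two compartments), and an exchange reaction is a column with a single non-zero entry. The reaction labels (`EX_glc`, `GLCt`, `HEX1`), the metabolite abbreviations, and the helper `stoichiometric_matrix` are illustrative assumptions; rows are metabolites, columns are reactions, negative coefficients mean consumption.

```python
# Reactions as {metabolite: coefficient}; negative = consumed, positive = produced.
# [c] = cytoplasm, [e] = extra-cellular compartment.
reactions = {
    "EX_glc": {"glc[e]": -1},                 # exchange:  Glucose[e] <->
    "GLCt": {"glc[c]": -1, "glc[e]": 1},      # transport: Glucose[c] <-> Glucose[e]
    "HEX1": {"glc[c]": -1, "atp[c]": -1, "g6p[c]": 1, "adp[c]": 1},
}


def stoichiometric_matrix(reactions):
    """Build S as a list of rows (metabolites) over columns (reactions)."""
    mets = sorted({m for coeffs in reactions.values() for m in coeffs})
    rxns = list(reactions)
    S = [[reactions[r].get(m, 0) for r in rxns] for m in mets]
    return mets, rxns, S


mets, rxns, S = stoichiometric_matrix(reactions)
print("columns:", rxns)
for met, row in zip(mets, S):
    print(f"{met:8s}", row)
```

Glucose appears in two rows, `glc[c]` and `glc[e]`, one per compartment. With the reactions written in the direction shown, a negative flux through the exchange column corresponds to uptake and a positive flux to secretion.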
Kinetic Modeling: Definition • Predict changes in metabolite concentrations over time using a set of Ordinary Differential Equations (ODEs): dm/dt = S·v • m – metabolite concentrations vector - mol/mg • S – stoichiometric matrix • v – reaction rates vector - mol/(mg*h) • Reaction rate equation: v = f(m, k), where k denotes the kinetic parameters • Requires knowledge of m, f and k!
Kinetic Modeling: Reaction Rate Equations (I) • Consider the reaction: S->P • A simple rate equation (Michaelis-Menten) is: v = vmax·[S] / (Km + [S]) • In this case, we have only 2 kinetic parameters – vmax and Km
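To make the kinetic approach concrete, here is a rough sketch that simulates S -> P using the standard irreversible Michaelis-Menten form v = vmax·[S] / (Km + [S]) and a forward Euler integrator. The parameter values, step size, and function names are arbitrary assumptions chosen for illustration, not measured kinetics.

```python
def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate: v = vmax * s / (km + s)."""
    return vmax * s / (km + s)


def simulate(s0, p0, vmax, km, dt=0.01, steps=1000):
    """Integrate ds/dt = -v and dp/dt = +v with the forward Euler method."""
    s, p = s0, p0
    for _ in range(steps):
        v = michaelis_menten(s, vmax, km)
        s, p = s - v * dt, p + v * dt
    return s, p


# Illustrative values only: 10 units of substrate, no product at t = 0.
s_end, p_end = simulate(s0=10.0, p0=0.0, vmax=2.0, km=1.0)
print(f"substrate left: {s_end:.3f}, product formed: {p_end:.3f}")
```

Even this one-reaction model needs vmax, Km, and the initial concentrations; a genome-scale network would need such values for every reaction. Forward Euler is used only to keep the sketch dependency-free; a real study would more likely use an adaptive ODE solver.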
Kinetic Modeling: Reaction Rate Equations (II) • Consider the reaction: S + E <-> P + E • A more complex (reversible) Michaelis-Menten equation: v = (vmax+·[S]/KmS − vmax-·[P]/KmP) / (1 + [S]/KmS + [P]/KmP) • In this case, we have 4 kinetic parameters – vmax+, vmax-, KmS, and KmP
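As an illustration, the commonly used reversible Michaelis-Menten form, v = (vmax+·[S]/KmS − vmax-·[P]/KmP) / (1 + [S]/KmS + [P]/KmP), can be written as a short function. This textbook form is assumed here; the function and argument names are made up for the sketch.

```python
def reversible_mm(s, p, vmax_f, vmax_r, km_s, km_p):
    """Net rate of S <-> P under the reversible Michaelis-Menten form.

    vmax_f / vmax_r are the forward / reverse maximal rates (vmax+ / vmax-),
    km_s / km_p are the Michaelis constants for substrate and product.
    """
    numerator = vmax_f * s / km_s - vmax_r * p / km_p
    denominator = 1.0 + s / km_s + p / km_p
    return numerator / denominator


# With no product present, the net rate equals the irreversible form.
print(reversible_mm(s=2.0, p=0.0, vmax_f=3.0, vmax_r=1.0, km_s=2.0, km_p=1.0))
# The net rate vanishes when forward and reverse terms balance.
print(reversible_mm(s=1.0, p=2.0, vmax_f=2.0, vmax_r=1.0, km_s=1.0, km_p=1.0))
```

The sign of the returned value gives the net direction: positive for S to P, negative for P to S.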
Kinetic Modeling: Reaction Rate Equations (III) • Reaction rate equations also depend (via k) on: • Regulation: effectors, inhibitors • Enzyme concentration • Surrounding reactions and molecules • pH, ion balance, molecule gradients, energy potentials • Kinetics are problematic • Obtained from test-tube experiments on purified enzymes • Measurements do not necessarily apply to the cellular environment • Most of these parameters are unknown!
Constraint-based modeling (CBM) (I) • Assumes a quasi steady-state • No changes in metabolite concentrations (within the system) • Metabolite production and consumption rates are equal • Representing the ‘average’ flow in the network over a long enough period of time • The reaction rate vector v is referred to as a ‘steady-state flux distribution’ • No need for information on metabolite concentrations, reaction rate equations, and kinetic parameters
CBM (II) • In most cases, the system is underdetermined, and there exists a space of possible flux distributions v that satisfy: S·v = 0 • The idea in CBM is to employ a set of constraints to limit the space of possible solutions to those more likely/correct • Mass balance is enforced by the above equation • Thermodynamic: irreversibility of reactions • Enzymatic capacity: bounds on enzyme rates • Availability of nutrients (Figure labels: solution space; correct solutions)
CBM (III) • The solution space decreases with the addition of more constraints • Mass balance: S·v = 0 (subspace of R^n) • Thermodynamic: vi > 0 (convex cone) • Capacity: vi < vmax (bounded convex cone)
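The three layers of constraints can be sketched as a feasibility check on a candidate flux vector. The toy three-reaction chain, the bounds, and the name `is_feasible` are assumptions for illustration; the sketch also uses non-strict inequalities with a small tolerance, where the slide writes strict ones.

```python
def is_feasible(S, v, lower, upper, tol=1e-9):
    """Check mass balance (S·v = 0) and per-reaction bounds lower <= v <= upper."""
    mass_balanced = all(
        abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol for row in S
    )
    within_bounds = all(
        lo - tol <= v_j <= hi + tol for v_j, lo, hi in zip(v, lower, upper)
    )
    return mass_balanced and within_bounds


# Toy chain (hypothetical): R1: -> A,  R2: A -> B,  R3: B ->
S = [
    [1, -1, 0],   # A
    [0, 1, -1],   # B
]
lower = [0.0, 0.0, 0.0]      # thermodynamic: all reactions irreversible
upper = [10.0, 10.0, 10.0]   # capacity: vmax for each reaction

print(is_feasible(S, [1.0, 1.0, 1.0], lower, upper))     # True
print(is_feasible(S, [1.0, 2.0, 1.0], lower, upper))     # False: A is not balanced
print(is_feasible(S, [20.0, 20.0, 20.0], lower, upper))  # False: exceeds capacity
```

Every flux vector of the form (x, x, x) with 0 <= x <= 10 passes the check, so even this tiny network has a whole line segment of feasible solutions rather than a single one.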
Determination of Likely Flux Distributions • In most cases the available constraints are not sufficient to single out one solution, leaving a space of feasible solutions • How to identify plausible solutions within this space? • Optimization methods (next lesson) • Maximal biomass production rate • Minimal ATP production rate • Minimal nutrient uptake rate • Exploring the solution space (the following lesson) • Extreme pathways • Elementary modes
Flux Balance Analysis (I) • An optimization method for finding a feasible flux distribution that enables the maximal growth rate of the organism • Based on the assumption that evolution optimizes microbial growth rate • To enable maximal growth, the essential biomass precursors (metabolites) should be synthesized at the maximal rate • Add to the model a pseudo ‘growth reaction’ representing the metabolites required for producing 1g of the organism’s biomass • These precursors are removed from the metabolic network in the corresponding ratios: 41.1 ATP + 18.2 NADH + 0.2 G6P… -> biomass
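A minimal sketch of FBA as a linear program, assuming NumPy and SciPy are available: maximize the flux through a pseudo growth reaction subject to S·v = 0 and flux bounds. The four-reaction toy network, its stoichiometry, and the uptake limit of 10 are invented for illustration and are unrelated to the biomass coefficients quoted above.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical): rows are metabolites, columns are reactions.
#   uptake:  -> A           (nutrient availability: at most 10 units)
#   r1:      A -> B
#   r2:      A -> C
#   growth:  1 B + 2 C ->   (pseudo 'growth reaction' draining the precursors)
reaction_names = ["uptake", "r1", "r2", "growth"]
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
    [0,  0,  1, -2],   # C
])

# All reactions irreversible; only the uptake has a finite upper bound.
bounds = [(0, 10), (0, None), (0, None), (0, None)]

# linprog minimizes, so maximize growth by minimizing its negative.
c = np.zeros(len(reaction_names))
c[reaction_names.index("growth")] = -1.0

result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds)
growth_rate = -result.fun

print("status:", result.message)
for name, flux in zip(reaction_names, result.x):
    print(f"{name:7s} {flux:.3f}")
print(f"maximal growth flux: {growth_rate:.3f}")
```

In this toy case each unit of growth needs one B and two C, hence three units of A, so the uptake limit of 10 caps the growth flux at 10/3. In a genome-scale model the same formulation applies, with S holding thousands of reactions and the growth column holding the measured biomass composition.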