
V8: Structure of Cellular Networks



Presentation Transcript


  1. V8: Structure of Cellular Networks
     • Today: Dynamic Simulation of Protein Complex Formation on a Genomic Scale
     • Most cellular functions are conducted or regulated by protein complexes of varying size.
     • Organization into complexes may contribute substantially to an organism's complexity.
     • E.g. 6000 different proteins (yeast) may form $18 \times 10^6$ different pairs of interacting proteins, but already $10^{11}$ different complexes of size 3.
     • → A mechanism by which evolution could significantly increase the regulatory and metabolic complexity of organisms without substantially increasing the genome size.
     • Only a very small subset of the many possible complexes is actually realized.
     Beyer, Wilhelm, Bioinformatics
     Cell Simulations
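
The combinatorial argument on this slide can be checked with a few lines of Python. This is only a sketch: it counts unordered sets of distinct proteins, which is an assumption about how the slide counts. Under that assumption the pair count matches the slide's $18 \times 10^6$, while the triple count comes out at about $3.6 \times 10^{10}$, so the slide's $10^{11}$ is best read as an order-of-magnitude figure.

```python
from math import comb

n_proteins = 6000  # approximate number of distinct yeast proteins (from the slide)

# Unordered sets of distinct proteins (an assumption about the counting convention).
pairs = comb(n_proteins, 2)
triples = comb(n_proteins, 3)

print(f"possible pairs:   {pairs:.2e}")
print(f"possible triples: {triples:.2e}")
```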

  2. Review: Yeast protein interaction network: first example of a scale-free network
     A map of protein–protein interactions in Saccharomyces cerevisiae, which is based on early yeast two-hybrid measurements, illustrates that a few highly connected nodes (which are also known as hubs) hold the network together. The largest cluster, which contains 78% of all proteins, is shown. The colour of a node indicates the phenotypic effect of removing the corresponding protein (red = lethal, green = non-lethal, orange = slow growth, yellow = unknown).
     Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)

  3. Review: Systematic identification of large protein complexes by tandem affinity purification
     The yeast two-hybrid method can only identify binary complexes. Cellzome company: attach an additional protein P to a particular protein $P_i$; P binds to the matrix of the purification column. → This yields $P_i$ and the proteins $P_k$ bound to $P_i$. Identify the proteins by mass spectrometry (MALDI-TOF).
     Gavin et al. Nature 415, 141 (2002)

  4. BDIM models: birth, death and innovation
     Manually curated set of 229 biologically meaningful ‘TAP complexes’ from yeast with sizes ranging from 2 to 88 different proteins per complex. “Cumulative” means that there are 229 complexes of size 2 that may also be parts of larger complexes.

  5. Frequency of complexes in experiments
     Large-scale experiments → the size-frequency distribution of complexes has common characteristics: the number of complexes of a given size decreases exponentially with complex size. Does the shape of this distribution reflect the nature of the underlying cellular dynamics that creates the protein complexes? → Test by simulation model.

  6. Protein Abundance Data
     Abundance of 6200 yeast proteins: ....
     Beyer et al. (2004) compiled a protein abundance data set for yeast under standard conditions in YPD medium. Based on this data set we derived a distribution of protein abundances that resembles the characteristics of the measured data in the upper range (Figure S2). For approximately 2000 proteins no abundance values are available. We assume that the undetected proteins primarily belong to the low-abundance classes, which gives rise to the hypothetical distribution.

  7. Dynamic Complex Formation Model
     Three variants of the protein complex association-dissociation model (PAD model) are tested with the following features:
     (i) In all three versions the composition of the proteome does not change with time. Degradation of proteins is always balanced by an equal production of the same kind of proteins.
     (ii) The cell consists of either one (PAD A and B) or several (PAD C) compartments in which proteins and protein complexes can freely interact with each other. Thus, all proteins can potentially bind to all other proteins in their compartment.
     (iii) Association and dissociation rate constants are the same for all proteins. In PAD models A and C association and dissociation are independent of complex size and complex structure.

  8. Dynamic Complex Formation Model
     (iv) At each time step a set of complexes is randomly selected to undergo association and dissociation. Association is simulated as the creation of new complexes by the binding of two smaller complexes, and dissociation is simulated as the reverse process, i.e. the decay of a complex into two smaller complexes. The numbers of associations and dissociations per time step are $k_a N_C^2$ and $k_d N_C$, respectively, where $N_C$ is the total number of complexes in the cell, $k_a$ [1/(#complexes · time)] is the association rate constant, and $k_d$ [1/time] is the dissociation rate constant. $k_a$ and $k_d$ are mathematically equivalent to biochemical rates of a reversible reaction.
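
One time step of such a scheme could be sketched as follows. This is a minimal illustration, not the authors' implementation: complexes are tracked only by their size, and the function names (`split_size`, `pad_a_step`), the uniform choice among bipartitions, and the rule that a selected monomer is simply skipped are all assumptions made for the example.

```python
import random

def split_size(size, rng):
    """Split a complex of `size` distinct proteins into two non-empty parts.

    Each protein is assigned to one of the two parts with probability 1/2 and
    empty parts are rejected, so all bipartitions are equally likely (assumption).
    """
    while True:
        k = sum(rng.random() < 0.5 for _ in range(size))
        if 0 < k < size:
            return k, size - k

def pad_a_step(sizes, ka, kd, rng):
    """Apply one time step to `sizes` (list of complex sizes, modified in place)."""
    n_c = len(sizes)
    n_assoc = round(ka * n_c ** 2)   # ka * NC^2 associations per step
    n_dissoc = round(kd * n_c)       # kd * NC dissociations per step

    for _ in range(n_assoc):
        if len(sizes) < 2:
            break
        i, j = rng.sample(range(len(sizes)), 2)   # two distinct complexes
        merged = sizes[i] + sizes[j]
        for idx in sorted((i, j), reverse=True):  # swap-remove, larger index first
            sizes[idx] = sizes[-1]
            sizes.pop()
        sizes.append(merged)

    for _ in range(n_dissoc):
        idx = rng.randrange(len(sizes))
        if sizes[idx] < 2:
            continue                               # monomers cannot dissociate
        a, b = split_size(sizes[idx], rng)
        sizes[idx] = a
        sizes.append(b)
    return sizes

rng = random.Random(0)
sizes = [1] * 1000                # start from 1000 free proteins
for _ in range(50):
    pad_a_step(sizes, ka=1e-4, kd=0.05, rng=rng)
print(len(sizes), max(sizes))
```

The total number of proteins, `sum(sizes)`, stays constant across steps, which mirrors feature (i) of the model.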

  9. Protein Association/Dissociation Models
     • PAD A: the simplest model, in which all proteins can interact with each other (no partitioning); it assumes that association and dissociation are independent of complex size.
     • PAD B: equivalent to PAD A, except for the assumption that larger complexes are more likely to bind (preferential attachment). In this case we assume that the binding probability is proportional to $i \cdot j$, where $i$ and $j$ are the sizes of two potentially interacting complexes.
     • PAD C: extends PAD A by assuming that proteins can interact only within groups of proteins (with partitioning). The sizes of these protein groups are based on the sizes of first-level functional modules according to the yeast database. PAD C assumes 16 modules, each containing between 100 and 1000 different ORFs. Hence, the protein groups do not represent physical compartments, but rather resemble functional modules of interacting proteins.
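
The preferential attachment rule of PAD B can be illustrated with a small sampling sketch. It assumes one simple way to obtain a pair probability proportional to $i \cdot j$: draw both partners independently with probability proportional to their size and reject draws that pick the same complex twice. The function name `pick_pair_preferential` is hypothetical.

```python
import random

def pick_pair_preferential(sizes, rng):
    """Return indices of two distinct complexes, chosen with probability
    proportional to the product of their sizes (needs at least two complexes)."""
    indices = range(len(sizes))
    while True:
        i, j = rng.choices(indices, weights=sizes, k=2)  # size-weighted draws
        if i != j:                                       # reject self-pairing
            return i, j

rng = random.Random(1)
sizes = [1, 1, 10, 10]
draws = 20000
hits = sum(set(pick_pair_preferential(sizes, rng)) == {2, 3} for _ in range(draws))
print(hits / draws)  # fraction of draws that pair the two large complexes
```

For the four complexes in the usage example, the two large complexes should be paired with probability 100/141, i.e. in roughly 71% of the draws.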

  10. Mathematical Description
      Since explicit simulation of an entire cell (50 million protein molecules were simulated) is too time-consuming for many applications of the model, we also developed a mathematical description of the PAD model, which allows us to assess different scenarios and parameter combinations more quickly. The change of the number of complexes of size $i$, $x_i$, during one time step $\Delta t$ can be described as
      $\Delta x_i / \Delta t = G_i^a + G_i^d - L_i^a - L_i^d$ (1)
      where $G_i^a$ and $G_i^d$ are the gains due to association and dissociation, and $L_i^a$ and $L_i^d$ are the losses due to association and dissociation.

  11. Mathematical Description
      Given a total number of $N_C$ complexes, the total numbers of associations and dissociations per time step are $k_a N_C^2$ and $k_d N_C$, respectively. We assume throughout that the mean numbers of associating and dissociating complexes of size $i$ per time step are $2 k_a x_i N_C$ and $k_d x_i$. The probability that complexes of size $j$ and $i-j$ get selected for one association is [equation not preserved]. → Deduce the number of complexes of size $i$ that get created during each time step via association of smaller complexes by summing over all complex sizes that can create a complex of size $i$: [equation not preserved].

  12. Mathematical Description
      When $j$ is equal to $i/2$ (which is possible only for even $i$) both interaction partners have the same size. The size of the pool $x_{i-j}$ is therefore reduced by 1 after the first interaction partner has been selected, which slightly reduces the probability of selecting a second complex from that pool. We account for this effect with a correction term that applies only to even $i$: [equation not preserved]. This correction is usually very small. The loss of complexes of size $i$ due to association is simply proportional to the probability of selecting them for association, i.e. $L_i^a = 2 k_a x_i N_C$.

  13. Mathematical Description
      Complexes of size $i$ get created by dissociation of larger complexes. A complex of size $j$ has [expression not preserved] possible ways of dissociation, and the number of possible fragments of size $i$ is [expression not preserved]. The probability that a dissociating complex of size $j > i$ creates a fragment of size $i$ is hence [expression not preserved]. The number of new complexes follows by summing over all possible parent sizes: [equation not preserved]. The respective loss term becomes $L_i^d = k_d x_i$.
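
The structure of equation (1) can be sketched in code, with the caveat that the gain terms are not preserved in this transcript. The forms below are therefore assumptions chosen to be consistent with the stated totals ($k_a N_C^2$ associations, $k_d x_i$ dissociating complexes of size $i$): the association gain sums $x_j x_{i-j}$ over all $j$, all bipartitions of a complex of distinct proteins are taken to be equally likely, monomers do not dissociate, and the small correction for equal-sized partners is ignored. The function name `dx_dt` is hypothetical.

```python
from math import comb

def dx_dt(x, ka, kd):
    """Mean-field rate of change of x, where x[i-1] is the number of complexes
    of size i for i = 1..n. Complexes larger than n are not tracked."""
    n = len(x)
    n_c = sum(x)
    rates = []
    for i in range(1, n + 1):
        # Gain by association: sizes j and i-j combine to size i (assumed form).
        gain_a = ka * sum(x[j - 1] * x[i - j - 1] for j in range(1, i))
        # Gain by dissociation: a parent of size j > i yields on average
        # C(j, i) / (2^(j-1) - 1) fragments of size i (assumed uniform splits).
        gain_d = kd * sum(
            x[j - 1] * comb(j, i) / (2 ** (j - 1) - 1) for j in range(i + 1, n + 1)
        )
        loss_a = 2.0 * ka * x[i - 1] * n_c            # stated in the slides
        loss_d = kd * x[i - 1] if i > 1 else 0.0      # monomers cannot split
        rates.append(gain_a + gain_d - loss_a - loss_d)
    return rates

x = [100.0, 20.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
rates = dx_dt(x, ka=1e-3, kd=0.1)
mass_rate = sum(i * r for i, r in enumerate(rates, start=1))
print(mass_rate)  # total protein mass should be conserved (close to zero)
```

A useful sanity check for any choice of gain and loss terms is that the total number of proteins, $\sum_i i \, x_i$, does not change, as long as no complex grows beyond the largest tracked size.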

  14. Mathematical Description
      Figure S1 shows a comparison of a numerical solution of equation (1) with a stochastic simulation of the association-dissociation process.

  15. Mathematical Description
      After a transient period a steady state is reached. We are mainly interested in this steady-state distribution of frequencies $x_i$. → Find a set of $x_i$ solving $\Delta x_i / \Delta t = 0$. The solution of this non-linear equation system is obtained by numerically minimizing all $\Delta x_i / \Delta t$. Dividing equation (1) by $k_d$ shows that the steady-state distribution is independent of the absolute values of $k_a$ and $k_d$ and depends only on the ratio of the two parameters, $R_{ad} = k_a / k_d$. Hence, only two parameters affect the $x_i$ at steady state:
      - the total number of proteins $N_P$ (which indirectly determines $N_C$) and
      - the ratio of the two rate constants $R_{ad}$.

  16. Mathematical Description
      For PAD model B the dissociation terms remain unchanged, whereas the association terms have to be modified. In case of PAD C we calculated weighted averages of results obtained with PAD A. For PAD B, assume that association is proportional to the product of the sizes of the participating complexes. This assumption changes equation (2) to [equation not preserved], where $n$ is the maximum complex size and [definition not preserved].

  17. Measurable Size Distribution and Bait Selection
      Based on the distribution resulting from equation (1) at steady state we derive two further distributions: (i) the ‘measurable size distribution’ and (ii) the ‘bait distribution’. The former is defined as the frequency distribution of the measurable complex sizes. The measurable complex size is the number of different proteins in a protein complex (as opposed to the total number of proteins). For the measurable size distribution we only count the number of complexes with distinct protein compositions.
      Figure: Measurable versus ‘actual’ complex size distribution. Diamonds show frequencies of actual complex sizes and triangles are frequencies of measurable complexes. Filled diamonds and triangles reflect simulation without partitioning (PAD A) and open diamonds and triangles are simulation results assuming binding only within certain modules (PAD C).
      The difference between the original and the measurable complex size distribution is comparably small, because most of the simulated complexes are unique. However, in case of PAD C smaller complexes occur at higher copy numbers and larger complexes are often counted as smaller measurable complexes because they contain some proteins more than once.

  18. Computation of a Dissociation Constant $K_D$
      Mathematically our model describes a reversible (bio-)chemical reaction. → Calculate an equilibrium dissociation constant $K_D$, which quantifies the fraction of free subcomplexes A and B compared to the bound complex AB. This equilibrium is complex-size dependent, because a large complex AB is less likely to randomly dissociate exactly into the two specific subunits A and B than a small complex. (A and B can be ensembles of several proteins.) We get for any given complex of size $i$ the following $K_D$:
      $K_D(i) = [A][B] / [AB] = (R_{ad} \cdot N_i \cdot V)^{-1}$ (4)
      where $N_i$ is the number of possible fragments of a complex of size $i$ and $V$ is the cell volume. Cell-wide averages of $K_D$ values are estimated by computing a weighted average, $\langle K_D \rangle = \frac{1}{N_C} \sum_i x_i K_D(i)$, with $N_C$ being the total number of complexes and $x_i$ being the number of complexes of size $i$.
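
Equation (4) and the cell-wide average can be written as two small helper functions. This is a sketch in model units: the names `kd_of_size` and `average_kd` are hypothetical, the fragment counts $N_i$ are passed in by the caller rather than computed, the average is assumed to be $\frac{1}{N_C} \sum_i x_i K_D(i)$, and any conversion to molar units is left out.

```python
def kd_of_size(r_ad, n_fragments, volume):
    """K_D(i) = 1 / (R_ad * N_i * V), following equation (4)."""
    return 1.0 / (r_ad * n_fragments * volume)

def average_kd(counts, fragments, r_ad, volume, n_c):
    """Weighted average of K_D(i) over complex sizes.

    counts:    dict mapping size i -> number of complexes x_i
    fragments: dict mapping size i -> N_i (supplied by the caller)
    n_c:       total number of complexes used for normalisation
    """
    total = sum(
        x_i * kd_of_size(r_ad, fragments[i], volume) for i, x_i in counts.items()
    )
    return total / n_c

# Toy numbers, chosen only to make the arithmetic easy to follow.
counts = {2: 3, 3: 1}
fragments = {2: 1, 3: 3}
print(average_kd(counts, fragments, r_ad=2.0, volume=0.5, n_c=4))
```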

  19. Biochemical Interpretation of the Rate Constants
      The process of forming a protein complex AB from the two subcomplexes A and B, and its dissociation, can be described as a reversible reaction, $A + B \rightleftharpoons AB$, with constants $k_{on}$ [L/(mol s)] and $k_{off}$ [1/s] quantifying the forward and backward reactions: $d[AB]/dt = k_{on} [A][B] - k_{off} [AB]$. In our model the concentration $[A]$ can be calculated as $[A] = f_A N_C / V$, with $f_A$ being the fraction of species A among all $N_C$ complexes in the system and $V$ being the cell volume.

  20. Biochemical Interpretation of the Rate Constants
      The number of associations of two complex species A and B per time step becomes [equation not preserved], since we assume $k_a N_C^2$ associations per time step. Here, $n_A$ and $n_B$ are the numbers of complexes of the respective species. Division by the cell volume $V$ yields units of ‘concentration per time’. Thus, $k_{on}$ in a biochemical reaction approximately equals $k_a \cdot V$, since the total number of complexes $N_C$ is very large in all scenarios that we have simulated.

  21. Biochemical Interpretation of the Rate Constants
      When looking for an equivalent expression for $k_{off}$ we have to quantify the specific dissociation of a complex AB into the subcomplexes A and B. The unspecific dissociation of AB is simply $k_d \cdot [AB]$, where $k_d$ is the dissociation rate constant. Since AB may consist of more than 2 proteins, it can also be split into subcomplexes other than A and B. For the specific dissociation rate, one has to know how often AB actually dissociates into the subcomplexes A and B. The total number of dissociations per time step is $k_d N_C$. The probability that a complex AB with size $i$ breaks into the specific subcomplexes A and B is $1/N_i$, where $N_i$ is the number of possible fragments of a complex of size $i$. This holds under the assumption that all proteins in AB are distinct, which is approximately true for the simulations conducted here.

  22. Biochemical Interpretation of the Rate Constants
      $n_{AB}/N_C$: fraction of complexes AB among all complexes. → Size-specific dissociation rate: $N_{AB}^{dissoc}(i) = k_d N_C \cdot (n_{AB}/N_C) \cdot (1/N_i) = k_d \, n_{AB} / N_i$, from which the complex-size-dependent rate constant $k_{off}(i) = k_d / N_i$ results. Taking into account that certain proteins may be in the complex more than once we get $k_{off} = k_d / N_i$. One can calculate an apparent equilibrium constant $K_D$, which describes the equilibrium between the independent species A and B and the bound species AB: $K_D(i) = k_{off}(i) / k_{on}$, where $i$ is the size of the complex AB. Since $N_i$ increases exponentially with $i$, $K_D$ decreases exponentially with complex size.
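
For reference, the steps of the last four slides can be chained into one short derivation. It uses only quantities named in the slides ($k_{on} \approx k_a V$, $k_{off}(i) = k_d / N_i$, $R_{ad} = k_a / k_d$) and is meant as a consistency check against equation (4), not as an additional result.

```latex
\begin{align}
  k_{\mathrm{on}} &\approx k_a V, \qquad
  k_{\mathrm{off}}(i) = \frac{k_d}{N_i}, \\
  K_D(i) &= \frac{k_{\mathrm{off}}(i)}{k_{\mathrm{on}}}
          \approx \frac{k_d}{N_i \, k_a \, V}
          = \frac{1}{R_{ad} \, N_i \, V},
  \qquad R_{ad} = \frac{k_a}{k_d}.
\end{align}
```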

  23. Results
      We dynamically simulated the association and dissociation of 6200 different protein types, yielding a set of about 50 million protein molecules. Subsequently we analyzed the resulting steady-state size distribution of protein complexes. This steady state is thought to reflect the log-growth conditions under which the yeast cells were held when the protein complexes were measured by TAP (Gavin et al., 2002). Based on measured protein complex data (Gavin et al., 2002) we calculated a protein complex size distribution to which we can compare the simulation results (Figure 1).

  24. Results
      The TAP measurements do not provide concentrations of the measured complexes; they only demonstrate the presence of a certain protein complex in yeast cells. In addition, the number of proteins of a certain type inside such a complex could not be measured. Hence, the complex size from Figure 1 does not represent real complex sizes (i.e. the total number of proteins in the complex), but refers to the number of different proteins in a complex. The measured data reflect the characteristics of only 229 different protein complexes of size two and larger, which is just a small subset of the ‘complexosome’. These peculiarities have to be taken into account when comparing simulation results to the observed complex size distribution. We refer to the ‘measurable complex size’ as the number of distinct proteins in a protein complex (Figure 2). When comparing our simulation results to the measurements, we always select a random subset of 229 different complexes from the simulated pool of complexes. This results in a complex size distribution comparable to the measured distribution from Figure 1 (‘bait distribution’).

  25. Effect of preferential attachment
      Figure: Cumulative number of distinct protein complexes versus their size, resulting from simulations without (diamonds) and with (squares) preferential attachment to larger complexes. Both simulations are performed with the best-fit parameters for PAD A. In case of preferential attachment the best regression result (solid line) is obtained with a power law, while the simulation without preferential attachment is best fitted assuming an exponentially decreasing curve.
      The original, measurable and bait distributions are always close to exponential in case of PAD A and power-law-like in case of PAD B, independent of the parameters chosen. → The PAD B model gives a power-law distribution, which is not in agreement with the experimental observation.
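
One common way to tell an exponential from a power-law decay is to compare straight-line fits on semi-log and log-log axes. The sketch below does this with plain least squares. The regression procedure actually used for the figure is not described in the slides, so this is a generic heuristic, and the function names `linear_fit_r2` and `compare_fits` are hypothetical.

```python
import math

def linear_fit_r2(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, R^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

def compare_fits(sizes, counts):
    """Return (R^2 of exponential fit, R^2 of power-law fit).

    Exponential: log(count) is linear in size.
    Power law:   log(count) is linear in log(size).
    All counts must be positive.
    """
    log_counts = [math.log(c) for c in counts]
    _, r2_exp = linear_fit_r2(sizes, log_counts)
    _, r2_pow = linear_fit_r2([math.log(s) for s in sizes], log_counts)
    return r2_exp, r2_pow

# Synthetic data: an exponentially decreasing cumulative count.
sizes = list(range(2, 21))
counts = [229 * math.exp(-0.3 * (s - 2)) for s in sizes]
print(compare_fits(sizes, counts))
```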

  26. Conclusions
      A very simple, dynamic model can reproduce the observed complex size distribution. Given the small number of input parameters, the very good fit of the observed data is astonishing.
      Conclusion 1: preferential attachment does not take place in yeast cells under the investigated conditions. This is biologically plausible: specific and strong binding can be just as important for small protein complexes as for large complexes. → The dissociation should on average be independent of the complex size. The interpretation of the simulated association and dissociation in terms of $K_D$ values suggests that larger complexes bind more strongly than smaller complexes. However, the size dependence of $K_D$ is compensated by the higher number of possible dissociations in larger complexes. We always assume that all possible dissociations happen with the same probability. In reality large complexes may break into specific subcomplexes, which subsequently can be re-used for a different purpose. → Improved versions of the model should account for specificity of association and for specific dissociation.

  27. Conclusions
      Conclusion 2: the number of complexes that were missed during the TAP measurements is potentially large. Simulations give an upper limit of the number of different complexes in cells. At first glance, the number of different complexes in PAD A (> 3.5 million) and PAD C (~ 2 million) may appear to be far too large. Even PAD C may overestimate the true number of different complexes, because association within the groups is unrestricted. However, the PAD models do not only simulate functional, mature complexes; they also consider all intermediate steps. Each of these steps is counted as a different protein complex. The large difference between the number of measured complexes and the (potential) number of existing complexes may partly explain the very small overlap that has been observed between different large-scale measurements of protein complexes.
      A correct interpretation of the kinetic parameters is important. First of all, $k_a$ and $k_d$ cannot be compared to real numbers, because the model does not define a length of the time steps for interpreting $k_a$ and $k_d$ as actual rate constants. In addition, the association-to-dissociation ratio $R_{ad}$ is not identical to a physical $K_D$ value obtained by in vitro measurements of protein binding in water solutions.

  28. Discussion
      Several reasons do not allow for this simple interpretation:
      (i) In vivo diffusion rates are below those in water due to the high concentration of proteins and other large molecules in the cytosol.
      (ii) Most proteins either are synthesized where they are needed or get transported directly to the site where the complex is assembled. Hence, transport to the site of action is on average faster than random diffusion.
      (iii) Protein concentrations are often above the cell average due to the compartmentalization of the cell.
      All these processes (protein production, transport, and degradation) are not explicitly described in the PAD model, but they are lumped in our assumptions. The $R_{ad}$ (and the $K_D$ derived from it) must therefore be interpreted as an operationally defined property. It characterizes the overall, cell-averaged complex assembly process, which includes all steps necessary to synthesize a protein complex.

  29. Discussion
      However, even the model-derived $K_D$ values allow for some conclusions regarding complex formation. We calculated weighted averages ($\langle K_D \rangle$) of the size-dependent $K_D$ values by using the steady-state complex size distribution of the best fit. This yields average $K_D$ values of 4.7 nM and 0.18 nM for the best fits of PAD A and PAD C, respectively. First, the fact that the $K_D$ for PAD C is below that of PAD A underlines the notion that more specific binding is reflected by smaller $K_D$ values. Second, typical in vitro $K_D$ values are above 1 nM, thus the average $K_D$ of PAD C is comparably low. The model therefore confirms that protein complex formation in vivo gets accelerated due to directed protein transport and due to the compartmentalization of eukaryotes. It is a surprising finding, though, that important aspects of these highly regulated protein synthesis and transport processes can on average be described by a simple compartment model assuming random association and dissociation.
      Large-scale protein-protein interaction data sets are subject to substantial error, resulting in a potentially large number of false positives and false negatives.

  30. Possible Limitations
      In order to get a correct picture of the protein complex size distribution it is necessary to have an unbiased, random subset of all complexes in the cells. TAP data are biased, e.g. they contain too few membrane proteins. However, compared to other data sets such as MIPS complexes, the TAP complexes constitute a fairly random selection of all protein complexes in yeast. Uncertainties in the TAP data do not affect our conclusions as long as they are not strongly biased with respect to the resulting complex size distribution. Since Gavin et al. (2002) have measured long-term interactions, our results apply to permanent complexes. Yet the model is applicable to future protein complex data that take account of transient binding.

  31. Discussion
      The simulated complex size distribution is almost independent of the assumed protein abundance distribution. $P_P$ is a valuable summarizing property that can be used to characterize proteomes of different species. A decreasing $P_P$ increases the number of different large complexes (the slope in Table 1 gets more shallow), because it is less likely that a large complex contains the same protein twice. Thus, $P_P$ is a measure of complexity that relates not only to the diversity of the proteome but also to the composition of protein complexes.
      Probably the most severe simplification in our model is the assumption that all proteins can potentially interact with each other. The PAD model C is a first step towards more biological realism. By restricting the number of potential interaction partners it more closely maps functional modules and cell compartments, which both restrict the interaction among proteins.

  32. Further improvements
      The partitioning in PAD C implies that proteins within one group exhibit very strong binding, whereas binding between protein groups is set to zero. This again is a simplification, since cross-talk between different modules or compartments is possible. Future extensions of the model could incorporate increasingly detailed information about the binding specificity of proteins. Assuming even more specific binding will further reduce the number of different complexes, whereas the frequency of the complexes will increase. High binding specificity potentially lowers the complex sizes, so $R_{ad}$ has to be increased in order to fit the experimentally observed protein complex size distribution. On the other hand, cross-talk gives rise to larger complexes. Taking both counteracting refinements into account, it is impossible to generally predict the best-fit $R_{ad}$, since it depends on the quantitative details.

  33. Further improvements
      A first additional refinement of PAD C could account for the observed clustering of protein interaction networks. In a second step one could simulate protein associations and dissociations according to predefined binary protein interactions. The most detailed model could additionally account for individual association/dissociation rates between individual proteins. Such extensions will yield more realistic figures for the number of different protein complexes created in yeast cells. However, starting model development with the simplest assumptions reveals the most important characteristics of the system for reproducing the observations. The very good match that we already obtain with the simplest model, PAD A, is striking.

  34. Cell Simulations

  35. Measurable Size Distribution and Bait Selection
      In order to determine the distribution of measurable complex sizes corresponding to the steady-state distribution, we create a set of complexes according to the original steady-state size distribution by randomly ‘filling’ the complexes with proteins from the protein abundance distribution. We then compute the resulting measurable size distribution. Results shown are averages of several such random sets.
      The bait distribution is used to compare the simulation to the actual TAP measurements. It is obtained by randomly selecting and subsequently analyzing a subset of all simulated complexes. We call that distribution the ‘bait distribution’, because the process of selecting a subset of all complexes corresponds to selecting bait proteins for pulling out the complexes during the measurements. We always select 229 different complexes, which is the number of TAP complexes to which we compare the simulations.
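
The filling and bait-selection procedure can be sketched as follows. Several details are assumptions made for the example: proteins are drawn with replacement in proportion to their abundance (reasonable only if the protein pool is much larger than any complex), two complexes count as the same if they contain the same set of protein types, and all function names are hypothetical.

```python
import random
from collections import Counter

def fill_complexes(actual_sizes, abundances, rng):
    """Fill each complex with protein types drawn in proportion to abundance."""
    types = range(len(abundances))
    return [rng.choices(types, weights=abundances, k=size) for size in actual_sizes]

def distinct_compositions(complexes):
    """Unique complexes, each represented by its sorted set of protein types."""
    return sorted({tuple(sorted(set(members))) for members in complexes})

def measurable_distribution(complexes):
    """Count distinct complexes by measurable size (number of different proteins)."""
    return Counter(len(comp) for comp in distinct_compositions(complexes))

def bait_distribution(complexes, n_baits, rng):
    """Measurable sizes of a random subset of distinct complexes (the 'baits')."""
    distinct = distinct_compositions(complexes)
    chosen = rng.sample(distinct, min(n_baits, len(distinct)))
    return Counter(len(comp) for comp in chosen)

rng = random.Random(42)
abundances = [1] * 50                   # toy proteome: 50 equally abundant types
actual_sizes = [2] * 100 + [5] * 20     # toy steady-state size distribution
complexes = fill_complexes(actual_sizes, abundances, rng)
print(measurable_distribution(complexes))
print(bait_distribution(complexes, 10, rng))
```

In a comparison with the TAP data one would set `n_baits` to 229 and average over several random sets, as described in the slide.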

  36. Only the Probability of Selecting the Same Protein Twice Matters
      In a real protein abundance distribution each gene can potentially have a different expression level. However, for our model it is sufficient to use averaged properties of the protein abundance distribution. Each protein complex can be viewed as a set of proteins that are independently drawn from the pool of available proteins, because we assume that association is independent of the particular proteins. The measurable complex size is the actual complex size reduced by the number of proteins that are drawn twice or more often. The measurable complex size is on average only affected by the probability $P_P$ of pulling the same protein twice out of the pool of available proteins. This probability can be calculated by using the fact that $(n_i - 1)/(N_P - 1)$ is the probability of selecting a second protein of the specific type $i$:
      $P_P = \sum_{i=1}^{k} \frac{n_i}{N_P} \cdot \frac{n_i - 1}{N_P - 1}$
      where $N_P$ is the total number of proteins in the cell, $k$ is the number of protein types (or genes), and $n_i$ is the number of molecules of the respective protein type.
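
As a small worked example, $P_P$ can be computed directly from a list of abundances. The sketch assumes that $P_P$ is the sum over all protein types of the probability $n_i / N_P$ of drawing type $i$ first, times the probability $(n_i - 1)/(N_P - 1)$ of drawing the same type again. The function name is hypothetical.

```python
def prob_same_protein_twice(abundances):
    """Probability that two proteins drawn without replacement from the pool
    are of the same type. `abundances` holds the molecule count n_i per type."""
    n_p = sum(abundances)  # total number of protein molecules N_P
    same_type_pairs = sum(n * (n - 1) for n in abundances)
    return same_type_pairs / (n_p * (n_p - 1))

# Two types with two molecules each: after the first draw, one of the
# three remaining molecules has the same type.
print(prob_same_protein_twice([2, 2]))
```

A proteome in which every type is present exactly once gives $P_P = 0$, and a more uneven abundance distribution gives a larger $P_P$.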

  37. Computation of a Dissociation Constant $K_D$
      Theoretically one would have to separately account for proteins that are present three times or more often in one complex. However, the likelihood that a protein gets selected for a third time is orders of magnitude below the probability of selecting a protein twice. Therefore, the few cases in which a protein gets selected more often than twice are negligible. We have verified this statement by respective simulations. Additionally, the measurable complex size distribution counts only distinct complexes, i.e. we have to reduce the complex frequency by those complexes that appear more than once. In model PAD A, which assumes one large compartment, it is very unlikely that a complex of size 3 appears more than once due to the heterogeneity of the protein pool. Hence, in this case mainly the frequency of complexes with size 2 ($x_2$) is affected by counting only distinct complexes. This $x_2$ in turn depends on $P_P$. Since model PAD C assumes more homogeneous interacting groups, the likelihood of finding larger complexes more than once increases. However, the larger homogeneity is reflected by a larger $P_P$. In summary, for the scenarios that we have simulated $P_P$ proved to be a very good descriptor of the protein abundance distribution.
