Explore cellular network models, learn inference and probabilistic models, study subnetworks, and understand Bayesian networks and Bayesian inference in cellular networks. Discover the lac Operon and representation techniques.
Representation, Learning and Inference in Models of Cellular Networks BMI/CS 576 www.biostat.wisc.edu/bmi576/ Colin Dewey cdewey@biostat.wisc.edu Fall 2010
Various Subnetworks within Cells • metabolic: describe reactions through which enzymes convert substrates to products • regulatory (genetic): describe interactions that control expression of particular genes • signaling: describe interactions among proteins and (sometimes) small molecules that relay signals from outside the cell to the nucleus • note: these networks are linked together and the boundaries among them are not crisp
[Figure from KEGG database; legend: gene products, other molecules]
Part of the E. coli Regulatory Network Figure from Wei et al., Biochemical Journal 2004
A Signaling Network Figure from Sachs et al., Science 2005
Two Key Tasks • learning: given background knowledge and high-throughput data, try to infer the (partial) structure/parameters of a network • inference: given a (partial) network model, use it to predict an outcome of biological interest (e.g. will the cells grow faster in medium x or medium y?) • both of these are challenging tasks because typically • data are noisy • data are incomplete – characterize a limited range of conditions • important aspects of the system not measured – some unknown structure and/or parameters
Transcriptional Regulation Example: the lac Operon in E. coli E. coli can use lactose as an energy source, but it prefers glucose. How does it switch on its lactose-metabolizing genes?
The lac Operon: Repression by LacI • lactose absent → protein encoded by lacI represses transcription of the lac operon
The lac Operon: Induction by LacI • lactose present → protein encoded by lacI won’t bind to the operator (O) region
The lac Operon: Activation by Glucose • glucose absent → CAP protein promotes binding by RNA polymerase; increases transcription
Network Model Representations • directed graphs • Boolean networks • differential equations • Bayesian networks and related graphical models • etc.
Probabilistic Model of lac Operon • suppose we represent the system by the following discrete variables: L (lactose): present, absent; G (glucose): present, absent; I (lacI): present, absent; C (CAP): present, absent; lacI-unbound: true, false; CAP-bound: true, false; Z (lacZ): high, low, absent • suppose (realistically) the system is not completely deterministic • the joint distribution of the variables could be specified by 2^6 × 3 - 1 = 191 parameters
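The parameter count on this slide can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary name `cardinalities` and the function name are illustrative choices, and the value counts are taken from the variable list above.

```python
from math import prod

# Number of values each discrete variable can take, as listed on the slide.
cardinalities = {
    "L": 2, "G": 2, "I": 2, "C": 2,
    "lacI-unbound": 2, "CAP-bound": 2,
    "Z": 3,  # high, low, absent
}

def joint_table_parameters(cards):
    """Free parameters in a full joint table: one probability per
    configuration, minus one because they must sum to 1."""
    return prod(cards.values()) - 1

print(joint_table_parameters(cardinalities))  # 2^6 * 3 - 1 = 191
```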
Motivation for Bayesian Networks • Explicitly state (conditional) independencies between random variables • Provide a more compact model (fewer parameters) • Use directed graphs to specify model • Take advantage of graph algorithms/theory • Provide intuitive visualizations of models
A Bayesian Network for the lac System [Figure: directed graph over the nodes L, I, G, C, lacI-unbound, CAP-bound, Z, annotated with example CPDs Pr(L), Pr(lacI-unbound | L, I), Pr(Z | lacI-unbound, CAP-bound)]
Bayesian Networks • Also known as Directed Graphical Models • a BN is a Directed Acyclic Graph (DAG) in which • the nodes denote random variables • each node X has a conditional probability distribution (CPD) representing P(X | Parents(X)) • the intuitive meaning of an arc from X to Y is that X directly influences Y • formally: each variable X is independent of its non-descendants given its parents
Bayesian Networks • a BN provides a factored representation of the joint probability distribution • this representation of the joint distribution can be specified with 20 parameters (vs. 191 for the unfactored representation) [Figure: Bayesian network for the lac system]
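One way to see where the figure of 20 comes from is to count free parameters node by node. The sketch below assumes a parent structure: the parents of lacI-unbound (L, I) and of Z (lacI-unbound, CAP-bound) appear in the CPDs shown in the slides, while the parents of CAP-bound (G, C) are an assumption that is consistent with the stated total. Names such as `parents` and `cpd_parameters` are illustrative.

```python
from math import prod

cardinalities = {
    "L": 2, "G": 2, "I": 2, "C": 2,
    "lacI-unbound": 2, "CAP-bound": 2,
    "Z": 3,
}

# Parent sets. ASSUMPTION: CAP-bound has parents G and C; the other
# entries follow the CPDs named in the slides.
parents = {
    "L": [], "I": [], "G": [], "C": [],
    "lacI-unbound": ["L", "I"],
    "CAP-bound": ["G", "C"],
    "Z": ["lacI-unbound", "CAP-bound"],
}

def cpd_parameters(node):
    """Free parameters in Pr(node | parents): one distribution per parent
    configuration, each with (number of values - 1) free entries."""
    parent_configs = prod(cardinalities[p] for p in parents[node])
    return parent_configs * (cardinalities[node] - 1)

total = sum(cpd_parameters(node) for node in parents)
print(total)  # 4 roots * 1 + 4 + 4 + 4 * 2 = 20
```

Under this assumed structure the joint distribution factors as the product of the seven CPDs, which is why the per-node counts simply add up.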
Representing CPDs for Discrete Variables • CPDs can be represented using tables or trees • consider the following case with Boolean variables A, B, C, D [Figure: table and tree representations of Pr( D | A, B, C ); the tree has internal nodes A, B, C with F/T branches and leaves Pr(D = T) = 0.9, 0.5, 0.8, 0.5]
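As a rough illustration of the table-versus-tree point, the sketch below encodes a tree CPD as nested tests and expands it into the equivalent full table. The leaf values (0.9, 0.5, 0.8, 0.5) come from the slide, but which branch (F or T) leads to which leaf did not survive in this transcript, so the orientation used here is an assumption.

```python
from itertools import product

def pr_d_true(a, b, c):
    """Tree CPD for Pr(D = T | A, B, C).

    ASSUMED branch orientation (not recoverable from the transcript):
      A = T            -> 0.9
      A = F, B = F     -> 0.5
      A = F, B = T     -> 0.8 if C = T else 0.5
    """
    if a:
        return 0.9
    if not b:
        return 0.5
    return 0.8 if c else 0.5

# Expanding the tree gives the equivalent table with 2^3 = 8 rows,
# even though the tree itself only needs 4 leaves.
table = {
    (a, b, c): pr_d_true(a, b, c)
    for a, b, c in product([True, False], repeat=3)
}

for (a, b, c), p in table.items():
    print(f"A={a!s:5} B={b!s:5} C={c!s:5}  Pr(D=T)={p}")
```

The tree is more compact because it shares one leaf among all rows that agree on the variables tested along a path.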
Representing CPDs for Continuous Variables • we can also model the distribution of continuous variables in Bayesian networks • one approach: linear Gaussian models • X is normally distributed around a mean that depends linearly on the values u_i of its parents [Figure: node X with parents U1, U2, …, Uk]
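A minimal sketch of a linear Gaussian CPD, using only the Python standard library. The parameter names (`weights`, `intercept`, `sigma`) and the numeric values are hypothetical; the slide only states that the mean of X depends linearly on the parent values.

```python
import random

def linear_gaussian_mean(parent_values, weights, intercept):
    """Mean of X given its parents: intercept + sum_i weights[i] * u_i."""
    return intercept + sum(w * u for w, u in zip(weights, parent_values))

def sample_linear_gaussian(parent_values, weights, intercept, sigma, rng=random):
    """Draw X ~ Normal(mean, sigma^2), where the mean is linear in the parents."""
    mu = linear_gaussian_mean(parent_values, weights, intercept)
    return rng.gauss(mu, sigma)

# Hypothetical parameters for a node X with two parents U1 and U2.
weights = [0.5, -1.0]
intercept = 3.0
sigma = 1.0

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
u = [1.0, 2.0]
samples = [sample_linear_gaussian(u, weights, intercept, sigma, rng) for _ in range(10000)]
print(linear_gaussian_mean(u, weights, intercept))  # 1.5
print(sum(samples) / len(samples))                  # close to the mean above
```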
The Inference Task in Bayesian Networks • Given: values for some variables in the network (evidence), and a set of query variables • Do: compute the posterior distribution over the query variables • variables that are neither evidence variables nor query variables are hidden variables • the BN representation is flexible enough that any set can be the evidence variables and any set can be the query variables [Figure: Bayesian network for the lac system]
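To make the inference task concrete, here is a sketch of exact inference by enumeration on a lac-style network, restricted to the query variable Z for brevity. Every probability below is a made-up illustrative number, not a value from the slides, and the parent set of CAP-bound (G, C) is an assumption. `U` and `B` are shorthand for lacI-unbound and CAP-bound.

```python
from itertools import product

# ALL NUMBERS BELOW ARE HYPOTHETICAL, chosen only for illustration.
PRIOR = {"L": 0.5, "I": 0.9, "G": 0.5, "C": 0.9}  # Pr(variable = present)

def p_bool(p_true, value):
    return p_true if value else 1.0 - p_true

def p_laci_unbound(value, L, I):
    # Operator is likely free if lactose is present or there is no repressor.
    return p_bool(0.95 if (L or not I) else 0.05, value)

def p_cap_bound(value, G, C):
    # ASSUMED parents G and C: CAP likely binds if present and glucose is absent.
    return p_bool(0.9 if (C and not G) else 0.05, value)

Z_CPD = {  # (lacI-unbound, CAP-bound) -> distribution over Z
    (True, True): {"high": 0.85, "low": 0.10, "absent": 0.05},
    (True, False): {"high": 0.10, "low": 0.80, "absent": 0.10},
    (False, True): {"high": 0.02, "low": 0.18, "absent": 0.80},
    (False, False): {"high": 0.01, "low": 0.09, "absent": 0.90},
}

def joint(L, I, G, C, U, B, Z):
    """Joint probability as the product of the CPDs."""
    p = 1.0
    for name, val in zip("LIGC", (L, I, G, C)):
        p *= p_bool(PRIOR[name], val)
    return p * p_laci_unbound(U, L, I) * p_cap_bound(B, G, C) * Z_CPD[(U, B)][Z]

def posterior_Z(evidence):
    """Pr(Z | evidence): sum the joint over every assignment of the
    non-query variables that agrees with the evidence, then normalize."""
    names = ["L", "I", "G", "C", "U", "B"]
    scores = {z: 0.0 for z in ("high", "low", "absent")}
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        for z in scores:
            scores[z] += joint(*values, z)
    total = sum(scores.values())
    return {z: s / total for z, s in scores.items()}

print(posterior_Z({"L": True, "G": False}))   # lactose present, glucose absent
print(posterior_Z({"L": False, "G": True}))   # lactose absent, glucose present
```

Enumeration touches every assignment of the hidden variables, so it is only practical for small networks like this one.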
The Parameter Learning Task • Given: a set of training instances, the graph structure of a BN • Do: infer the parameters of the CPDs • this is straightforward when there are no missing values or hidden variables [Figure: Bayesian network for the lac system]
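With complete data, one standard way to estimate a discrete CPD is maximum likelihood by counting: for each parent configuration, the estimate is the fraction of instances taking each child value. The sketch below assumes that approach (the slides do not name a specific estimator), and the training instances are invented toy data.

```python
from collections import Counter

def estimate_cpd(instances, child, parents):
    """Maximum-likelihood estimate of Pr(child | parents) from complete data.

    Returns a dict mapping (parent_values, child_value) to a probability.
    Parent configurations never seen in the data get no entry.
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for row in instances:
        pa = tuple(row[p] for p in parents)
        joint_counts[(pa, row[child])] += 1
        parent_counts[pa] += 1
    return {
        (pa, x): n / parent_counts[pa]
        for (pa, x), n in joint_counts.items()
    }

# Toy, made-up training instances (fully observed).
data = (
    [{"L": True, "I": True, "lacI-unbound": True}] * 3
    + [{"L": True, "I": True, "lacI-unbound": False}]
    + [{"L": False, "I": True, "lacI-unbound": False}] * 2
)

cpd = estimate_cpd(data, child="lacI-unbound", parents=["L", "I"])
print(cpd[((True, True), True)])    # 3 of 4 instances -> 0.75
print(cpd[((False, True), False)])  # 2 of 2 instances -> 1.0
```

With missing values or hidden variables the counts are no longer directly observable, which is why that case is harder.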
The Structure Learning Task • Given: a set of training instances • Do: infer the graph structure (and perhaps the parameters of the CPDs too)