
reviewed by PhilHyoun Lee BioInformatics System Laboratory Department of BioSystems, KAIST, KOREA

CS 774B Topics in AI, Machine Learning: Theory and Practice. Learning Genetic Regulation Networks from mRNA Expression Data.





Presentation Transcript


  1. CS 774B Topics in AI, Machine Learning: Theory and Practice. Learning Genetic Regulation Networks from mRNA Expression Data. Reviewed by PhilHyoun Lee, BioInformatics System Laboratory, Department of BioSystems, KAIST, Korea

  2. Outline
     • Introduction
     • Overview of Previous Approaches
       • Clustering
       • Boolean Network
       • Differential Equation
     • Bayesian Network Approaches
       • Bayesian network
       • Friedman et al. 2000: first Bayesian network modeling
       • Pe’er et al. 2001: infer the causality
       • Hartemink et al. 2002: combine location information
     • Summary

  3. Part 1. Introduction: Genetic Regulatory Network; mRNA Gene Expression Data

  4. A central goal of biology is to understand the regulation of protein synthesis and its reactions to external and internal signals. (Figure: muscle cell, neuron cell, blood cell)

  5. Genetic Regulatory Network: the set of mutually activating and repressing genes and gene products and their interactions. (Figure labels: extracellular space, ligands, receptors, ion channels, electrophysiology, cytoplasm, intracellular signaling, transcription factors, nucleus, cis sites, genetic regulatory network, mRNA, translation + processing)

  6. Genetic Regulatory Network

  7. Gene Expression: the overall process by which genetic information flows from genes to proteins. (Figure labels: mRNA data from cytoplasm; genetic regulatory junctions)

  8. mRNA Expression Data Format
     From cDNA microarray: N X P matrix
     • 0 < ratio < +Inf
     • -Inf < log2(ratio) < +Inf
     • where log2(ratio) > 0 means increase and log2(ratio) < 0 means decrease
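
The log-ratio convention above can be illustrated with a minimal sketch. The function name `log2_ratio` and the intensity values are hypothetical, chosen only to show how the sign of log2(ratio) encodes increase and decrease.

```python
import math

def log2_ratio(sample_intensity, reference_intensity):
    """Return log2(sample / reference) for one microarray spot."""
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)

# Hypothetical (sample, reference) intensities for three genes.
spots = {
    "geneA": (400.0, 100.0),  # ratio 4    -> positive log ratio (increase)
    "geneB": (100.0, 100.0),  # ratio 1    -> zero (no change)
    "geneC": (50.0, 200.0),   # ratio 0.25 -> negative log ratio (decrease)
}
log_ratios = {gene: log2_ratio(s, r) for gene, (s, r) in spots.items()}
```

The log transform makes a 4-fold increase and a 4-fold decrease symmetric around zero, which the raw ratio does not.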

  9. Problem Definition
     Difficulty in reconstructing a genetic regulatory network from microarray data:
     1. mRNA expression is only a partial picture
     2. the number of samples is much smaller than the number of genes
     3. high noise
     (Figure: microarray data mapped to a genetic regulation network over Gene 1 to Gene 6)

  10. Part 2. Previous Approaches: I. Clustering; II. Boolean Network; III. Differential Equation

  11. I. Clustering
      • Grouping genes with similar patterns of expression
        Genes with a common role are clustered together
        Regulation and interaction patterns are inferred
        The function of uncharacterized genes is guessed
      • Similarity measure: standard correlation coefficient, ..
      • Method: hierarchical clustering, K-means, SOM, ..
      • Cannot reveal the inner interaction structure!
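
As a rough illustration of correlation-based grouping, the sketch below computes the Pearson correlation coefficient and merges genes by single linkage at a fixed cutoff. The profiles, the 0.9 threshold, and the function names are assumptions made for the example; the slide does not specify any of them, and constant profiles (zero variance) are not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def single_linkage(profiles, threshold):
    """Merge clusters while any cross-cluster pair correlates >= threshold."""
    clusters = [{gene} for gene in profiles]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(pearson(profiles[a], profiles[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical expression profiles over four experiments.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 1.5],
}
clusters = single_linkage(profiles, threshold=0.9)
```

The result groups co-varying genes but, as the slide notes, says nothing about which gene regulates which.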

  12. II. Boolean Network
      • A Boolean network is a graph G(V, F)
        V is a set of nodes (genes) x1, x2, …, xn
        F is a list of Boolean functions f(x1, x2, …, xn)
        Example function: X2 = X1 ∧ X2
      • Gene expression data is quantized to 1 (active) and 0 (inactive)

                Ex1 Ex2 Ex3 Ex4 Ex5 Ex6
        Gene 1   1   1   0   0   0   0
        Gene 2   1   1   1   0   0   0
        Gene 3   0   1   1   1   0   0

        States (columns read top to bottom): 110 111 011 001 000
      • Trajectory: a series of state transitions
      • Attractor: a set of states that repeats itself in a fixed sequence
      • Similar semantics to biological phenomena such as the cell cycle and differentiation
      • But the assumptions are too unrealistic!
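
The notions of trajectory and attractor can be sketched in a few lines. The update rules in `step` are an assumption: they are not given on the slide and were chosen only because they reproduce the state sequence 110, 111, 011, 001, 000 from the table.

```python
def step(state):
    """One synchronous update of a 3-gene network (hypothetical rules)."""
    x1, x2, x3 = state
    # Assumed rules: x1' = x1 AND NOT x3, x2' = x1, x3' = x2
    return (int(x1 and not x3), x1, x2)

def simulate(start, update):
    """Follow the trajectory from `start` until a state repeats.

    Returns (trajectory, attractor), where the attractor is the cycle of
    states the network ends up repeating (a single state if it is a
    fixed point).
    """
    trajectory = [start]
    first_seen = {start: 0}
    state = start
    while True:
        state = update(state)
        if state in first_seen:
            return trajectory, trajectory[first_seen[state]:]
        first_seen[state] = len(trajectory)
        trajectory.append(state)

trajectory, attractor = simulate((1, 1, 0), step)
```

Because the state space is finite (2^n states for n genes), every trajectory must eventually enter an attractor.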

  13. III. Differential Equation
      Dynamic System for Gene Expression
        n     the number of genes in the genome
        r     mRNA concentration (n-dim vector)
        p     protein concentration (n-dim vector)
        f(p)  transcription function
        L     translational constants (n X n non-degenerate diagonal matrix)
        V     degradation rates of mRNA (n X n non-degenerate diagonal matrix)
        U     degradation rates of proteins (n X n non-degenerate diagonal matrix)
      (Equations on the slide: change in mRNA concentration; change in protein concentration)
      • Linear Transcription Model
      • Assume the transcription function f(p) is a linear function of p: f(p) = Cp
      • Let x = (r, p)^T and let M be a 2n X 2n transition matrix; gene expression can then be modeled as a linear system in x
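
The two rate equations did not survive in the transcript. The form below is the model of Ting Chen et al. 1999 (listed in the references), written with the symbols defined above; treat it as an assumption about what the slide displayed rather than a verbatim recovery.

```latex
\begin{aligned}
\frac{dr}{dt} &= f(p) - V r, \\
\frac{dp}{dt} &= L r - U p,
\end{aligned}
\qquad \text{and, with } f(p) = C p \text{ and } x = (r, p)^{T}: \qquad
\frac{dx}{dt} = M x, \quad
M = \begin{pmatrix} -V & C \\ L & -U \end{pmatrix}.
```

Here the first equation says mRNA is produced by transcription and lost by degradation, and the second says protein is produced by translation of mRNA and lost by degradation.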

  14. Part 3. Bayesian Network: I. Bayesian Network; II. Friedman et al. 2000; III. Pe’er et al. 2001; IV. Hartemink et al. 2002

  15. I. Bayesian Network - Definition
      Probabilistic framework for robust inference of interactions in the presence of noise
      • G: a directed acyclic graph structure
      • θ: a set of parameters for the conditional distribution of each variable
      P(A, B, C, D, E) = ∏ P(Xi | Parent(Xi)) = P(A) P(B) P(C|A,B) P(D|B) P(E|D)
      (Figure: network over Gene A, Gene B, Gene C, Gene D, Gene E)
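
The factorization can be checked numerically with a small sketch. The graph is the one in the formula above; the binary variables and all probability values in `p_one` are hypothetical numbers chosen for illustration.

```python
from itertools import product

# Parents of each variable, matching P(A) P(B) P(C|A,B) P(D|B) P(E|D).
parents = {"A": (), "B": (), "C": ("A", "B"), "D": ("B",), "E": ("D",)}

# P(X = 1 | parent values) for binary variables; the numbers are made up.
p_one = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "D": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.1, (1,): 0.8},
}

def joint(assignment):
    """Joint probability as the product of the local conditionals."""
    prob = 1.0
    for var, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        p1 = p_one[var][pa_values]
        prob *= p1 if assignment[var] == 1 else 1.0 - p1
    return prob

# Summing the factorized joint over all 2^5 assignments should give 1.
total = sum(joint(dict(zip(parents, values)))
            for values in product((0, 1), repeat=len(parents)))
all_on = joint({"A": 1, "B": 1, "C": 1, "D": 1, "E": 1})
```

The network needs 1 + 1 + 4 + 2 + 2 = 10 parameters here instead of the 31 required for an unrestricted joint distribution over five binary variables.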

  16. I. Bayesian Network - Process
      Data: expression data (N X P matrix)
      Structure learning: learn a network over A, B, C from the data
      Independence: P(A,B,C) = P(A) P(B) P(C|A)

  17. I. Bayesian Network - Structure Learning
      • Heuristic search approaches
        • greedy hill climbing, simulated annealing, etc.
      • Model selection
        • select a good model
      • Selective model averaging
        • select a number of good models and pretend these models are exhaustive
      (Figure: six candidate structures over A, B, C with scores S(G:D) = 79, 56, 86, 76, 64, 56)
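
Greedy hill climbing over structures can be sketched as follows. The search moves (add, delete, reverse one edge) are the usual ones, but `toy_score` is only a stand-in that rewards closeness to a hypothetical target graph; a real application would plug in a data-driven score S(G:D) instead.

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Repeatedly strip nodes with no incoming edges; a cycle blocks this."""
    remaining = set(nodes)
    while remaining:
        roots = [n for n in remaining
                 if not any(v == n and u in remaining for u, v in edges)]
        if not roots:
            return False
        remaining -= set(roots)
    return True

def neighbors(nodes, edges):
    """All edge sets one addition, deletion, or reversal away."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                # delete u -> v
            yield (edges - {(u, v)}) | {(v, u)}   # reverse u -> v
        elif (v, u) not in edges:
            yield edges | {(u, v)}                # add u -> v

def hill_climb(nodes, score, start=frozenset()):
    """Move to the best-scoring acyclic neighbor until none improves."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(nodes, current):
            candidate = frozenset(candidate)
            if is_acyclic(nodes, candidate):
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
        if best is None:
            return current, current_score
        current, current_score = best, best_score

nodes = ["A", "B", "C"]
target = frozenset({("A", "C"), ("B", "C")})  # hypothetical "true" graph

def toy_score(edges):
    # Stand-in score: minus the number of edges that differ from the target.
    return -len(edges ^ target)

found, found_score = hill_climb(nodes, toy_score)
```

With a real score the search can stop in a local optimum, which is one reason random restarts or simulated annealing are used as alternatives.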

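  18. I. Bayesian Network - Structure Learning
      • Independence equivalence class: two graphs are equivalent iff they have the same skeleton (the graph obtained by ignoring arc direction) and the same v-structures
      • V-structure: an ordered tuple (X, Y, Z) such that there is an arc from X to Y and from Z to Y, but no arc between X and Z
      • An equivalence class is represented by a partially directed graph (PDAG)
      • We cannot distinguish between equivalent graphs
      (Figure: three equivalent graphs over X, Y, Z in which X and Z are conditionally independent given Y, and the v-structure X → Y ← Z, which is not equivalent to them)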
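
The equivalence criterion can be tested directly on small graphs. This is a minimal sketch: graphs are sets of (parent, child) pairs, node names are assumed to be comparable strings, and the four example graphs are the standard chain, fork, and collider over X, Y, Z.

```python
def skeleton(edges):
    """Undirected version of the graph: each arc as an unordered pair."""
    return {frozenset(edge) for edge in edges}

def v_structures(edges):
    """All (X, Y, Z) with X -> Y <- Z and no arc between X and Z."""
    skel = skeleton(edges)
    found = set()
    for x, y in edges:
        for z, y2 in edges:
            # x < z avoids listing the same collider twice.
            if y2 == y and x < z and frozenset((x, z)) not in skel:
                found.add((x, y, z))
    return found

def equivalent(edges_a, edges_b):
    """Same skeleton and same v-structures."""
    return (skeleton(edges_a) == skeleton(edges_b)
            and v_structures(edges_a) == v_structures(edges_b))

chain = {("X", "Y"), ("Y", "Z")}           # X -> Y -> Z
reverse_chain = {("Z", "Y"), ("Y", "X")}   # X <- Y <- Z
fork = {("Y", "X"), ("Y", "Z")}            # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}        # X -> Y <- Z
```

The chain, reversed chain, and fork all encode "X and Z are independent given Y", so no amount of observational data can tell them apart; only the collider stands alone.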

  19. I. Bayesian Network - Structure Learning
      Get the score for each network with respect to the training data:
      S(G:D) = log p(D, Sh) = log p(Sh) + log p(D|Sh)
      where log p(Sh) is the prior term and log p(D|Sh) is the likelihood term
      From the chain rule of probability, the likelihood is log p(D|Sh) = ∑ log p(xi | pa(xi), Sh)
      • Assuming equal priors on structures:
        • I. The model with the highest log likelihood is the model that is the best predictor of the data D
        • II. The score can use local criteria, ∑ Slocal(Xi, Pa(Xi), D), and is the same for members of an equivalence class
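
The decomposition into local terms can be illustrated with a plain maximum-likelihood log-likelihood. This is a simplification and an assumption: it ignores the structure prior and the parameter averaging of a full Bayesian score, and the four-row data set is made up. Note that this simplified score never decreases when edges are added, so on its own it cannot be used for model selection.

```python
import math
from collections import Counter

def local_score(data, child, parents):
    """Log-likelihood of one family (child given parents) at the ML estimate."""
    family = Counter((tuple(row[p] for p in parents), row[child])
                     for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / parent_counts[pa])
               for (pa, _), n in family.items())

def score(data, structure):
    """Sum of local scores; `structure` maps each variable to its parents."""
    return sum(local_score(data, child, pa) for child, pa in structure.items())

# Hypothetical discretized data in which B always copies A.
data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]

a_to_b = score(data, {"A": (), "B": ("A",)})
b_to_a = score(data, {"B": (), "A": ("B",)})
independent = score(data, {"A": (), "B": ()})
```

The two edge directions receive the same score, which is the equivalence-class property stated above, while the structure without the edge scores lower because it cannot explain the dependence.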

  20. II. Using Bayesian Networks to Analyze Expression Data by Friedman et al. 2000.
      Pipeline: S. cerevisiae cell-cycle data by Spellman (1998) → Discretization → Bayesian Network Structure Learning → Feature Estimation (bootstrap method) → Feature Analysis

  21. II. Using Bayesian Networks to Analyze Expression Data by Friedman et al. 2000.
      • Data
        800 X 76 data varied over the different cell-cycle stages (mainly 250 cell-cycle regulated genes, trans-acting factors)
      • Discretization
        3 categories: -1, 0, and 1 (threshold value of 0.5 in log2 scale)
      • Structure learning algorithm
        Sparse Candidate Algorithm (identify a small number of candidate parents for each gene based on local statistics such as correlation)
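
The three-way discretization can be written as a short function. The threshold of 0.5 on the log2 scale comes from the slide; how values exactly at the threshold are treated is an assumption here (they are mapped to 0).

```python
def discretize(log_ratio, threshold=0.5):
    """Map a log2 expression ratio to -1 (under), 0 (normal), or 1 (over)."""
    if log_ratio > threshold:
        return 1
    if log_ratio < -threshold:
        return -1
    return 0  # includes values exactly at +/- threshold (assumed convention)

# Hypothetical log2 ratios for one gene across four experiments.
levels = [discretize(v) for v in [1.2, 0.3, -0.7, 0.5]]
```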

  22. II. Using Bayesian Networks to Analyze Expression Data by Friedman et al. 2000.
      • Feature estimation: extract useful features
        • Markov relation: holds between two variables iff there is an edge between them or both are parents of another variable; suggests that the two genes are related in some joint biological interaction or process
        • Order relation: A is an ancestor of B in all the equivalent Bayesian networks learned; suggests that transcription of one gene is a cause of the transcription of another gene
        • Dominant gene: a potential causal source of the cell-cycle process
      • 200-fold bootstrap approach: generate “perturbed” versions of the original data set and learn from them
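
The bootstrap estimate of feature confidence can be sketched as below. Only the resampling loop reflects the slide; `toy_learner` is a hypothetical stand-in for structure learning plus feature extraction, and the data rows are made up.

```python
import random

def bootstrap_confidence(data, learn_features, n_boot=200, seed=0):
    """Fraction of bootstrap resamples in which each feature is recovered."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        # Resample the rows with replacement to get a perturbed data set.
        resample = [rng.choice(data) for _ in data]
        for feature in learn_features(resample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: c / n_boot for feature, c in counts.items()}

def toy_learner(rows):
    """Stand-in learner: report an A-B relation if A and B usually agree."""
    agree = sum(1 for a, b in rows if a == b) / len(rows)
    return {("A", "B")} if agree > 0.8 else set()

# Hypothetical discretized (A, B) pairs; one of ten rows disagrees.
data = [(1, 1), (0, 0), (-1, -1), (1, 1), (0, 0),
        (1, 1), (0, 0), (-1, -1), (1, 1), (1, 0)]
confidence = bootstrap_confidence(data, toy_learner)
```

A feature that appears in most resamples is robust to perturbation of the data; one that appears rarely is likely an artifact of a few samples.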

  23. II. Using Bayesian Networks to Analyze Expression Data by Friedman et al. 2000.
      • Summary
        • The first Bayesian network model
        • Focus is on extracting features, not the whole network structure
      • Future work
        • Deal with continuous data
        • Incorporate biological knowledge
        • Apply Dynamic Bayesian Networks to temporal expression data
        • Improve the discretization method
        • Discover causal patterns

  24. III. Inferring Subnetworks from Perturbed Expression Profiles by Pe’er et al. 2001.
      Pipeline: S. cerevisiae perturbation data by Hughes et al. (2000) → Discretization → Bayesian Network Structure Learning → Feature Estimation (bootstrap method) → Subnetwork Analysis

  25. III. Inferring Subnetworks from Perturbed Expression Profiles by Pe’er et al. 2001.
      • Data
        565 X 300 (including mutated genes and genes with significant change in at least 4 profiles)
      • Ideal intervention: the gene is assigned a specific value => gene deletion, over-expression mutants
      • Other perturbations are modeled as indicator variables, constrained to be roots => temperature-sensitive and kinetic mutations, external treatments
      (Figure: graphs over A and B, labeled "no effect on gene A" and "effect on gene B")

  26. III. Inferring Subnetworks from Perturbed Expression Profiles by Pe’er et al. 2001.
      • New features: activation, inhibition
        Let X be one of Pa(Y) and U = Pa(Y) - {X}
        X activates Y if P(Y=1 | X, U) increases when X increases and U is fixed
        X inhibits Y if P(Y=-1 | X, U) increases when X increases and U is fixed
      • Sub-network construction
        select a threshold ts (= 0.75) of significant confidence
        find maximal connected components
        each component (more than 3 variables) is a seed
        expand each seed with variables at a lower confidence t' (t' < ts, t' = 0.5)
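
The sub-network construction steps can be sketched as follows. The thresholds 0.75 and 0.5 and the "more than 3 variables" rule come from the slide; the confidence values, the gene names, and the choice to expand each seed by a single hop are assumptions made for the example.

```python
def connected_components(nodes, edges):
    """Connected components of an undirected graph via depth-first search."""
    adjacent = {n: set() for n in nodes}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, components = set(), []
    for n in nodes:
        if n in seen:
            continue
        component, stack = set(), [n]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacent[current] - component)
        seen |= component
        components.append(component)
    return components

def build_subnetworks(confidence, t_s=0.75, t_exp=0.5, min_vars=3):
    """Seeds from high-confidence features, expanded at a lower threshold."""
    nodes = sorted({n for pair in confidence for n in pair})
    strong = [pair for pair, c in confidence.items() if c >= t_s]
    seeds = [c for c in connected_components(nodes, strong)
             if len(c) > min_vars]
    subnetworks = []
    for seed in seeds:
        # One-hop expansion: variables tied to a seed member at >= t_exp.
        extra = {n for (u, v), c in confidence.items() if c >= t_exp
                 for n, m in ((u, v), (v, u))
                 if m in seed and n not in seed}
        subnetworks.append(seed | extra)
    return subnetworks

# Hypothetical pairwise feature confidences from a bootstrap run.
confidence = {
    ("g1", "g2"): 0.90, ("g2", "g3"): 0.80, ("g3", "g4"): 0.85,
    ("g4", "g5"): 0.60, ("g5", "g8"): 0.55, ("g6", "g7"): 0.95,
}
subnetworks = build_subnetworks(confidence)
```

In this example g6 and g7 are strongly linked but form a component that is too small to be a seed, and g8 is not added because it touches only g5, which is outside the original seed.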

  27. III. Inferring Subnetworks from Perturbed Expression Profiles by Pe’er et al. 2001.
      • Summary
        • Can identify causal relationships from data
        • New types of features
        • Focus is on extracting sub-networks, not the whole network
      • Future work
        • Incorporate biological knowledge
        • Identify latent factors that interact with several observed genes

  28. IV. Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models by Hartemink et al. 2002.
      Pipeline: S. cerevisiae mating data by Hartemink et al. (2002) → Discretization → Structure Learning with Prior Knowledge → Feature Estimation (model averaging) → Comparison with the unconstrained model

  29. IV. Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models by Hartemink et al. 2002.
      • Expression data
        32 * 320 under a diversity of experimental conditions => pheromone response signaling or mating related genes; data is discretized into 4 stages
      • Location data
        find the upstream regions where a specific transcription factor binds, using a chromatin immunoprecipitation assay
      • Incorporate genomic location data to guide the model structure
        non-uniform prior over structures that gives zero weight to models in which required edges are absent
      (Figure labels: STE12, FUS1, MCM1)

  30. IV. Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models by Hartemink et al. 2002.
      • Model averaging
        gather the 500 highest-scoring models
        compute the probability of each edge using a weighted-average approximation
      • Draw the resulting network
        edges are included if their posterior probability > 0.5
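
The weighted-average approximation can be sketched as below, assuming each model is weighted in proportion to exp(log score) and the weights are normalized over the retained models. The three toy models, their scores, and the gene names are hypothetical; the slide uses the 500 best models.

```python
import math

def edge_posteriors(models):
    """Approximate edge posteriors from (edge_set, log_score) pairs."""
    max_log = max(log_score for _, log_score in models)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(log_score - max_log) for _, log_score in models]
    total = sum(weights)
    posterior = {}
    for (edges, _), weight in zip(models, weights):
        for edge in edges:
            posterior[edge] = posterior.get(edge, 0.0) + weight / total
    return posterior

# Three hypothetical high-scoring models with normalized weights 0.6, 0.3, 0.1.
models = [
    ({("g1", "g2"), ("g3", "g2")}, math.log(0.6)),
    ({("g1", "g2")}, math.log(0.3)),
    ({("g3", "g1")}, math.log(0.1)),
]
posterior = edge_posteriors(models)
kept_edges = {edge for edge, p in posterior.items() if p > 0.5}
```

Treating the retained models as if they were exhaustive is the "selective model averaging" idea from the structure-learning slide.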

  31. IV. Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models by Hartemink et al. 2002.
      • Summary
        the model cannot be reconstructed from expression data alone
        edges indicate a statistical dependence between the transcript levels of genes, but not necessarily the form or presence of a physical dependence => location data provide evidence of a direct physical edge
      • Future work
        location data could also be noisy => relax the model prior from zero weight to a small but positive weight

  32. Part 4. Summary
      • Bayesian networks are suitable for genetic network reconstruction
        • Can deal with stochastic nature
        • Ideal for sparse domains (useful for locally interacting components)
        • Can handle noisy data
        • Missing data
        • Hidden variables (protein levels, other molecules)
        • Inference and reasoning
      • More research needed
        • To solve the dimensionality problem => incorporation of more biological information
        • To model feedback processes => Dynamic Bayesian networks

  33. Reference
      • General
        • A Bibliography on Learning Causal Networks of Gene Interactions, by Florian Markowetz, 2003.
        • Modeling and Simulation of Genetic Regulatory Systems: A Literature Review, by Hidde De Jong, 2002.
        • Genetic Network Analysis: From Bench to Computers and Back, by Zoltan Szallasi, 2001.
        • Modeling Transcriptional Control in Gene Networks: Methods, Recent Results, and Future Directions, by Paul Smolen et al., 2000.
      • Boolean Network
        • Identification of Genetic Networks from a Small Number of Gene Expression Patterns under the Boolean Network Model, by Tatsuya Akutsu et al., 1999.
        • REVEAL, a General Reverse Engineering Algorithm for Inference of Genetic Network Architectures, by Shoudan Liang, 1998.

  34. Reference
      • Differential Equation
        • Inferring Gene Regulatory Networks from Time-Ordered Gene Expression Data Using Differential Equations, by Michiel de Hoon et al., 2002.
        • Stability of Genetic Regulatory Networks with Time Delay, by Luonan Chen et al., 2002.
        • Modeling Gene Expression with Differential Equations, by Ting Chen et al., 1999.
      • Bayesian Network
        • Estimating Gene Networks from Gene Expression Data by Combining Bayesian Network Model with Promoter Element Detection, by Yoshinori Tamada et al., 2003.
        • Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Network Models, by Hartemink et al., 2002.
        • Inferring Subnetworks from Perturbed Expression Profiles, by Pe’er et al., 2001.
        • Using Bayesian Networks to Analyze Expression Data, by Friedman et al., 2000.

  35. Reference
      • Dynamic Bayesian Network
        • Sensitivity and Specificity of Inferring Genetic Regulatory Interactions from Microarray Experiments with Dynamic Bayesian Networks, by Dirk Husmeier et al., 2003.
        • Modeling Regulatory Pathways in E. coli from Time Series Expression Profiles, by Irene M. Ong et al., 2002.
        • Evaluating Functional Network Inference Using Simulations of Complex Biological Systems, by V. Anne Smith et al., 2002.
        • Modeling Gene Expression Data Using Dynamic Bayesian Networks, by Kevin Murphy et al., 2000.
      • Etc.
        • Inference of Gene Regulatory Model by Genetic Algorithm, by Shin Ando, 2001.
        • Gene Network Reconstruction Using a Distributed GA with a Backprop Local Search, by Mark Cumiskey et al., 2002.
