
2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Master of Science thesis. Inference of Gene Regulatory Network Using Regression Approach and Improvement of Boolean Network Algorithm Using Chi-square Tests. 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong.



  1. Master of Science thesis: Inference of Gene Regulatory Network Using Regression Approach and Improvement of Boolean Network Algorithm Using Chi-square Tests. 2005. 12. 16, Interdisciplinary Program in Bioinformatics, Kim Ha Seong

  2. Contents • Introduction (Background and Motivation) • Variable Selection Method in Boolean Networks (Overview, Method, Result) • Regression Based Gene Regulatory Network Method (Overview, Method, Result) • Discussion

  3. INTRODUCTION

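  4. Background and Motivation [Figure: time-course cDNA chip experiment comparing treatment and control at time points T1, T2, T3, …, Tm; the resulting log(R/G) ratios are analyzed with a Boolean network and a regression based network to obtain a gene regulatory network]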

  5. Boolean Networks with Variable Selection Objective: Introduce a variable selection method to improve the computing time of Boolean networks

  6. Boolean Networks [Figure: nodes X1, X2, X3] • G(V,F) • V = {X1, X2, …, Xn} : set of nodes • Xi = 1 (on) if the ith gene is expressed • Xi = 0 (off) otherwise • F = {f1, f2, …, fn} : set of functions • fi(X1, X2, …, Xk) : Boolean function for the ith gene • k : indegree (number of input genes) • Wiring diagram, state transition graph

  7. Boolean Networks Example • V = {X1, X2, X3} • F = {f1, f2, f3} • f1 = X3 • f2 = X1 and X3 • f3 = not X2 [Figures: truth table; wiring diagram of X1, X2, X3; state transition graph showing the states 001, 101, 000, 110, 111 and a cyclic attractor]
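The example network on this slide can be simulated directly. The following is a minimal sketch, assuming synchronous updates (all genes updated at once) and states written as tuples (X1, X2, X3); the function names `step` and `attractor` are illustrative and do not come from the thesis.

```python
from itertools import product

def step(state):
    """One synchronous update: f1 = X3, f2 = X1 and X3, f3 = not X2."""
    x1, x2, x3 = state
    return (x3, x1 & x3, 1 - x2)

def attractor(start):
    """Follow the trajectory from `start` until a state repeats; return the cycle."""
    seen = []
    state = start
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

# State transition table for all 8 states (the state transition graph as a dict).
transitions = {s: step(s) for s in product((0, 1), repeat=3)}

cycle = attractor((0, 0, 0))
print(cycle)
```

Under these assumptions the trajectory from 000 visits 001, 101, 111 and 110 before returning to 000, which matches the states listed for the cyclic attractor on the slide.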

  8. Advantages of Boolean Networks • Simple to use • Binarization reduces the noise level in experimental data (Pfahringer, 1995; Dougherty et al., 1995; Shmulevich and Zhang, 2002) • Can represent realistic, complex biological phenomena • Cell differentiation, apoptosis, cell cycle (Huang, 1999) • Logical analysis of data (Boros et al., 1997) • Human glioma (Shmulevich et al., 2003) • Yeast transcriptional network (Kauffman et al., 2003)

  9. Boolean Network Algorithms • Infer Boolean functions from binary data • REVEAL (reverse engineering) algorithm (Liang et al., 1998) • Mutual information • Simple networks can be calculated quickly • Identification (Consistency) problem (Akutsu et al., 1998) • Best-Fit Extension problem (Boros et al., 1998; Shmulevich et al., 2002)

  10. Construction of Boolean Networks [Figure: time-course cDNA chip experiment, treatment and control at time points T1, T2, T3, …, Tm] Use log(R/G) ratio data

  11. Construction of Boolean Networks Microarray ratio data → Binarization → Binary data (1 : gene is expressed, 0 : gene is not expressed)
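The binarization step can be sketched as follows. This is an illustration only: it assumes a per-gene median threshold (the yeast slide later names the median as the binarization rule) and assumes that values equal to the median map to 0; the function name `binarize` and the sample values are hypothetical.

```python
from statistics import median

def binarize(values):
    """Map one gene's log-ratio series to 0/1 using its own median as threshold.

    Assumption: values strictly above the median are 'expressed' (1).
    """
    threshold = median(values)
    return [1 if v > threshold else 0 for v in values]

# Hypothetical log(R/G) values for a single gene over five time points.
series = [-1.2, 0.4, 0.1, 2.0, -0.3]
print(binarize(series))
```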

  12. Construction of Boolean Networks (cont.) Binary data (n=4) → Variable selection + Boolean network algorithms → Boolean networks • Boolean network algorithms: REVEAL algorithm (Somogyi, 1998), Identification problem (Akutsu, 1999), Consistency problem (Akutsu, 1998), Best-Fit Extension problem (Boros, 1996) • Resulting Boolean network: V = {X1, X2, X3, X4}, F = {f1, f2, f3, f4}, f1 = X4, f2 = X1, f3 = not X2, f4 = X2 and not X3 [Figure: wiring diagram of X1, X2, X3, X4]

  13. Best Fit Extension Problem • Boros, 1998; Probabilistic Boolean networks (Shmulevich et al., 2002) • From the binary data, take one of all possible input combinations (n*nCk) and one of all possible Boolean functions (2^(2^k) − 1), e.g. …, f1(X2,X3) = X2 or X3, f1(X2,X3) = X2 and X3, f1(X2,X3) = X2 and not X3, … • Compare the observed X1 values at time t with the output values calculated from f1 on the inputs at time t-1 • Error size = # of errors
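The exhaustive search described on this slide can be sketched as below. This is not the thesis program (which the slides say was written in C); it is a minimal illustration that assumes Boolean functions are represented as truth tables and that every one of the 2^(2^k) truth tables is tried. The names `best_fit`, `inputs` and `table` are illustrative.

```python
from itertools import combinations, product

def best_fit(data, target, k):
    """Return (errors, inputs, truth_table) with the fewest mismatches.

    data   : list of binary state tuples ordered in time
    target : index of the gene whose function is being inferred
    k      : indegree (number of input genes)
    """
    n = len(data[0])
    pairs = list(zip(data[:-1], data[1:]))  # (state at t-1, state at t)
    best = None
    for inputs in combinations(range(n), k):
        for table in product((0, 1), repeat=2 ** k):
            errors = 0
            for prev, curr in pairs:
                # Read the input genes at t-1 as a binary index into the table.
                idx = 0
                for g in inputs:
                    idx = idx * 2 + prev[g]
                if table[idx] != curr[target]:
                    errors += 1
            if best is None or errors < best[0]:
                best = (errors, inputs, table)
    return best

# Trajectory of the three-gene example (f1 = X3, f2 = X1 and X3, f3 = not X2).
trajectory = [(0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 0), (0, 0, 0)]
print(best_fit(trajectory, target=0, k=1))
```

For gene X1 with k = 1 the search recovers input X3 (index 2) with truth table (0, 1), that is f1 = X3, with zero errors. The nested loops make the cost grow with both the number of input combinations and the number of functions, which is the motivation for the variable selection step.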

  14. Computing Times of Boolean Network Total 4 genes, indegree k is 2 [Figure: nodes X1, X2, X3, X4] • Number of combinations • Total time complexity of Boolean network algorithm (k : indegree, n : total genes, m : total time points)

  15. BOOLEAN NETWORKS WITH VARIABLE SELECTION METHOD

  16. Chi-square Test for Variable Selection • Chi-square test • Binarize the continuous gene expression values into {0 (not expressed), 1 (expressed)} • Produce two-way contingency tables • Perform the chi-square test for variable selection • Continuity correction (Agresti, 1994) • Add an arbitrary small number a to each observed frequency to prevent any expected value from being zero

  17. Chi-square Test • Chi-square test between every pair of genes at time t and time t-1 using a two-way contingency table [Figure: binary data at time t-1 and time t] • Total number of tests

  18. Test Statistic and Variable Selection Criteria • Chi-square statistic • Selection criteria: c is the criterion for variable selection
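Since the formulas on these slides did not survive extraction, the following sketch shows one plausible reading rather than the thesis's exact procedure. It assumes the standard Pearson chi-square statistic on a 2 x 2 table with 1 degree of freedom, a constant `a = 0.5` added to every cell (the slides only say "an arbitrary small number a"), and that a gene is kept as a candidate input when its p-value is below c. The function names are illustrative.

```python
import math

def chi_square_2x2(x_prev, y_curr, a=0.5):
    """Pearson chi-square between a gene at t-1 and a gene at t (binary lists)."""
    table = [[a, a], [a, a]]          # a keeps every expected count positive
    for x, y in zip(x_prev, y_curr):
        table[x][y] += 1
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def select_inputs(data, target, c):
    """Indices of genes at t-1 whose test against `target` at t has p < c."""
    prev, curr = data[:-1], data[1:]
    y = [row[target] for row in curr]
    n = len(data[0])
    return [g for g in range(n)
            if chi_square_2x2([row[g] for row in prev], y)[1] < c]

# Hypothetical two-gene series in which gene 0 copies gene 1 with a lag of one.
g1 = [0, 0, 1, 1] * 5 + [0]
g0 = [0] + g1[:-1]
data = list(zip(g0, g1))
print(select_inputs(data, target=0, c=0.01))
```

Only the genes returned by `select_inputs` would then be passed to the Boolean function search, which is where the reduction in computing time comes from.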

  19. Reduction of Searching Space Total 4 genes, indegree k=2; consider finding functions for node X1 • Original Boolean network: all input combinations for X1 are searched • Variable selection: select the X1, X2, X3 nodes at time t-1, which reduces the number of input combinations for X1
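The size of the reduction can be illustrated with a short calculation. The counts on the slide were lost in extraction, so this sketch assumes that the number of input combinations for one target gene is the binomial coefficient "candidates choose k"; `input_combinations` is an illustrative name.

```python
from math import comb

def input_combinations(n_candidates, k):
    """Number of ways to choose k input genes from the candidate genes."""
    return comb(n_candidates, k)

original = input_combinations(4, 2)  # all 4 genes are candidate inputs
reduced = input_combinations(3, 2)   # only X1, X2, X3 pass variable selection
print(original, reduced)
```

Under this assumption the search for X1 drops from 6 input combinations to 3, and the saving grows quickly as n increases.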

  20. BOOLEAN NETWORKS WITH VARIABLE SELECTION RESULT

  21. Simulation Data n=8, c=0.01, k=2, 10 time points, no noise (Error size=0), 4 experiments • f1 = not X8 • f2 = X1 • f3 = not X2 • f4 = X3 • f5 = X3 and X4 • f6 = not X2 and X5 • f7 = X6 • f8 = X7 [Figure: wiring diagram of X1 to X8] Binary time series, one table per experiment (rows: time points 1 to 10):

         t   X1 X2 X3 X4 X5 X6 X7 X8
         1    0  0  0  0  1  1  1  1
         2    0  0  1  0  0  1  1  1
         3    0  0  1  1  0  0  1  1
         4    0  0  1  1  1  0  0  1
         5    0  0  1  1  1  1  0  0
         6    1  0  1  1  1  1  1  0
         7    1  1  0  1  1  1  1  1
         8    0  1  0  0  0  0  1  1
         9    0  0  1  0  0  0  0  1
         10   0  0  1  1  0  0  0  0

         t   X1 X2 X3 X4 X5 X6 X7 X8
         1    0  1  1  0  0  1  0  0
         2    1  0  1  1  0  0  1  0
         3    1  1  0  1  1  0  0  1
         4    0  1  0  0  0  0  0  0
         5    1  0  1  0  0  0  0  0
         6    1  1  0  1  0  0  0  0
         7    1  1  0  0  0  0  0  0
         8    1  1  0  0  0  0  0  0
         9    1  1  0  0  0  0  0  0
         10   1  1  0  0  0  0  0  0

         t   X1 X2 X3 X4 X5 X6 X7 X8
         1    0  0  0  1  0  1  0  0
         2    1  0  1  0  0  0  1  0
         3    1  1  0  1  0  0  0  1
         4    0  1  0  0  0  0  0  0
         5    1  0  1  0  0  0  0  0
         6    1  1  0  1  0  0  0  0
         7    1  1  0  0  0  0  0  0
         8    1  1  0  0  0  0  0  0
         9    1  1  0  0  0  0  0  0
         10   1  1  0  0  0  0  0  0

         t   X1 X2 X3 X4 X5 X6 X7 X8
         1    0  1  1  1  1  1  1  1
         2    0  0  1  1  1  0  1  1
         3    0  0  1  1  1  1  0  1
         4    0  0  1  1  1  1  1  0
         5    1  0  1  1  1  1  1  1
         6    0  1  0  1  1  1  1  1
         7    0  0  1  0  0  0  1  1
         8    0  0  1  1  0  0  0  1
         9    0  0  1  1  1  0  0  0
         10   1  0  1  1  1  1  0  0

  22. Simulation Data [Table: variable selection p-values between genes at time t-1 and genes at time t] Computing time: about 20 times faster

  23. Yeast Cell Cycle data • Data set • Yeast cell cycle (Spellman et al., 1998) • 18 time points • Randomly selected 50, 60 and 70 genes • Binarization : median • Boolean network program • C language, Best-Fit extension (Shmulevich, 2002) • Indegree k=3 and k=4 • Error size is 1, 2 • Variable selection • c = 0.1, 0.5

  24. Accuracy of Variable Selection Method • BF_OBN is the set of Boolean functions found by the original Boolean network algorithm • BF_VSBN is the set of Boolean functions found by the Boolean network algorithm with variable selection [Figure panels: c=0.1, c=0.5, c=0.1, c=0.5]

  25. Comparison of Computing Times [Figure labels: 312.6h, 41.1h, 0.62h] • The Boolean network algorithm with variable selection is 502.61 times faster than the original Boolean network algorithm when n=120, Error size=1, c=0.1 • The Boolean network algorithm with variable selection is 7.5 times faster than the original Boolean network algorithm when n=120, Error size=2, c=0.5

  26. Regression Based Network Method Objective: Infer gene regulatory network structure using a linear regression approach

  27. Previous Works for Gene Regulatory Networks • Boolean networks • Kauffman 1969; Akutsu et al. 1998; Liang et al., 1998; Shmulevich et al., 2001 • Bayesian networks • Murphy, 1999; Friedman et al., 1999, 2000; Hartemink et al., 2001; Imoto et al., 2002 • Linear modeling • D'Haeseleer 1999; van Someren 2000 • Differential equations • Chen et al., 1999; D’Haeseleer et al., 1999; Von Dassow et al., 2000 • Structural equation model • Xiong et al., 2003; Xie and Bentler, 2003

  28. Drawbacks of Previous Works • Boolean networks • Loss of information without proper binarization • Bayesian networks • DAG : impossible to express autoregulation or cyclic relationships (feedback) • High computing time • Linear modeling • Number of parameters exceeds the number of time points • Differential equations • Number of parameters exceeds the number of time points • Requires previously known relationships • Structural equation model • Autoregulation, cyclic relationship (feedback) • Requires previously known relationships

  29. Causality of Gene Expression • Time lag • Caulobacter crescentus (Laub et al., 2000) • 11 time points at 15 min intervals • Correlation of all 1444 genes (a) • Correlation of the 382 cell cycle related genes (b) • Time lag = 1 (in this study) [Figures: (a) correlations between slides; (b) correlations between the selected 382 genes]

  30. Representation of Gene Regulatory Networks using Multiple Regression [Figures: regression models; path diagram of X1, X2, X3 with path coefficients b12, b13, b23, b32 and error terms e1, e2, e3]

  31. Network Motifs • Break down the networks into basic building blocks (Shen-Orr et al., 2002; Milo et al., 2002) • E. coli, S. cerevisiae : feedforward and bi-fan motifs appear more than 10 SD above their mean number of appearances in randomized networks • (Nreal – Nrand) / SD [Figure: panels (a) and (b) with diagrams of the feedforward loop, single input module, and dense overlapping regulons motifs]
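The score (Nreal – Nrand) / SD on this slide can be computed as in the sketch below. It assumes that Nrand is the mean motif count over a set of randomized networks and that SD is the sample standard deviation of those counts; the function name and the counts are hypothetical.

```python
from statistics import mean, stdev

def motif_score(n_real, random_counts):
    """(Nreal - mean of random counts) / SD of random counts."""
    return (n_real - mean(random_counts)) / stdev(random_counts)

# Hypothetical counts: 40 occurrences in the real network,
# and 8, 10, 12 occurrences in three randomized networks.
print(motif_score(40, [8, 10, 12]))
```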

  32. REGRESSION BASED GENE REGULATORY NETWORKS METHOD

  33. Simple Example [Figure: CLB1, CLB2, SWI5, SWI6 placed along the cell cycle phases G1, S, G2, M, G1] • SWI6 is a transcription cofactor that regulates transcription at the G1/S transition (Horak CE et al., 2002). • CLB1 and CLB2 are B-type cyclins that activate Cdc28p to promote the transition from G2 to M phase of the cell cycle (Lew DJ et al., 1997). • SWI5 is a transcription factor that activates transcription of genes expressed in G1 phase and at the M/G1 boundary (Moll T et al., 1991).

  34. Step 1. Variable Definition [Figure: CLB1, CLB2, SWI5, SWI6 placed along the cell cycle phases G1, S, G2, M, G1] Expression matrix Z (rows: 18 time points; columns: genes):

           CLB2    CLB1    SWI5    SWI6
         -2.360   -1.88  -1.290   -0.06
         -0.273   -0.95  -0.700   -0.18
         -1.960   -1.22  -0.330   -0.14
         -2.290   -1.10  -0.880   -0.13
         -1.360   -0.91  -0.190    0.34
          0.400   -0.06   0.050    0.13
          1.090    0.50   0.020    0.28
          1.540    1.20   0.680   -0.03
          1.500    1.11   0.750   -0.23
          0.920    0.22   0.640    0.10
          0.050    0.47   0.420   -0.35
         -0.230   -0.02  -0.070    0.11
         -0.420   -0.12  -0.790    0.08
         -0.290   -0.12  -0.314   -0.16
          0.120    0.42  -0.190    0.14
          0.730    0.98   0.730    0.04
          1.350    0.70   0.640    0.17
          1.200    0.78   0.510   -0.09

  35. Step 1. Variable Definition (cont.) (a) Time t-1 matrix X:

         Time     CLB2    CLB1    SWI5    SWI6
         t0     -2.360   -1.88  -1.290   -0.06
         t7     -0.273   -0.95  -0.700   -0.18
         t14    -1.960   -1.22  -0.330   -0.14
         t21    -2.290   -1.10  -0.880   -0.13
         t28    -1.360   -0.91  -0.190    0.34
         t35     0.400   -0.06   0.050    0.13
         t42     1.090    0.50   0.020    0.28
         t49     1.540    1.20   0.680   -0.03
         t56     1.500    1.11   0.750   -0.23
         t63     0.920    0.22   0.640    0.10
         t70     0.050    0.47   0.420   -0.35
         t77    -0.230   -0.02  -0.070    0.11
         t84    -0.420   -0.12  -0.790    0.08
         t91    -0.290   -0.12  -0.314   -0.16
         t98     0.120    0.42  -0.190    0.14
         t105    0.730    0.98   0.730    0.04
         t112    1.350    0.70   0.640    0.17

     (b) Time t matrix Y:

         Time     CLB2    CLB1    SWI5    SWI6
         t7     -0.273   -0.95  -0.700   -0.18
         t14    -1.960   -1.22  -0.330   -0.14
         t21    -2.290   -1.10  -0.880   -0.13
         t28    -1.360   -0.91  -0.190    0.34
         t35     0.400   -0.06   0.050    0.13
         t42     1.090    0.50   0.020    0.28
         t49     1.540    1.20   0.680   -0.03
         t56     1.500    1.11   0.750   -0.23
         t63     0.920    0.22   0.640    0.10
         t70     0.050    0.47   0.420   -0.35
         t77    -0.230   -0.02  -0.070    0.11
         t84    -0.420   -0.12  -0.790    0.08
         t91    -0.290   -0.12  -0.314   -0.16
         t98     0.120    0.42  -0.190    0.14
         t105    0.730    0.98   0.730    0.04
         t112    1.350    0.70   0.640    0.17
         t119    1.200    0.78   0.510   -0.09
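The construction of X and Y from Z is a shift by one time point. The sketch below shows this for the first four rows of Z (t0 to t21) taken from the slide; the function name `lag_matrices` is illustrative, and a time lag of 1 is assumed, as stated earlier in the talk.

```python
GENES = ["CLB2", "CLB1", "SWI5", "SWI6"]

# First four rows of Z from the slide (time points t0, t7, t14, t21).
Z = [
    [-2.360, -1.88, -1.290, -0.06],
    [-0.273, -0.95, -0.700, -0.18],
    [-1.960, -1.22, -0.330, -0.14],
    [-2.290, -1.10, -0.880, -0.13],
]

def lag_matrices(z):
    """Return (X, Y): X holds rows for time t-1, Y holds rows for time t."""
    return z[:-1], z[1:]

X, Y = lag_matrices(Z)
for x_row, y_row in zip(X, Y):
    print(x_row, "->", y_row)
```

Each row of X is paired with the row of Y one time step later, so a series of m time points yields m - 1 observations for the regression.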

  36. Step 1. Variable Definition (cont.) [Two 4 × 4 matrices indexed by the genes CLB2, CLB1, SWI5, SWI6 at time t-1 and at time t: the transition probability matrix and the strength matrix]

  37. Step 2. Fit Regression Models to Every Combination of Columns in Matrix X • Regression models for CLB2 (# of models : 4 + 4C2 = 10) • Total : 4 x (4 + 4C2) = 40

  38. Step 3. Model Selection • Adjusted R-square > 0.5 • b1 and b2 are both significant (significance level : 0.05) [Table: selected regression models]
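Steps 2 and 3 can be sketched as below. This is an illustration under stated assumptions, not the thesis implementation: models are ordinary least squares fits with an intercept, predictors are all subsets of one or two columns of X, and only the adjusted R-square > 0.5 criterion is applied (the significance test on the coefficients is left out to keep the example dependency-free). All function names are illustrative.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(columns, y):
    """OLS with intercept; returns (coefficients, adjusted R-square)."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[u] * r[v] for r in design) for v in range(p + 1)]
           for u in range(p + 1)]
    xty = [sum(r[u] * y[i] for i, r in enumerate(design)) for u in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return beta, 1 - (1 - r2) * (n - 1) / (n - p - 1)

def candidate_models(n_genes, kmax=2):
    """All predictor subsets of size 1..kmax (4 + 4C2 = 10 for 4 genes)."""
    return [c for k in range(1, kmax + 1)
            for c in combinations(range(n_genes), k)]

def select_models(x_cols, y, threshold=0.5, kmax=2):
    """Keep the models whose adjusted R-square exceeds the threshold."""
    selected = []
    for subset in candidate_models(len(x_cols), kmax):
        beta, adj = fit_ols([x_cols[g] for g in subset], y)
        if adj > threshold:
            selected.append((subset, beta, adj))
    return selected

# Hypothetical data: y depends linearly on the first predictor only.
x_cols = [[0, 1, 2, 3, 4], [1, 0, 1, 0, 1]]
y = [1, 3, 5, 7, 9]
print([m[0] for m in select_models(x_cols, y)])
```

Here `x_cols` holds the columns of X (one list per gene) and `y` is one column of Y; repeating the call for every target gene gives the 4 x 10 = 40 fits counted on the previous slide.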

  39. Step 4. Update Matrix N [Matrix N indexed by CLB2, CLB1, SWI5, SWI6]

  40. Step 4. Update Matrix S [Matrix S indexed by CLB2, CLB1, SWI5, SWI6]

  41. Step 5. Build Gene Regulatory Network [Diagram: two edge types from xi to yi, each labeled Nij; one is drawn when Nij is not 0 and Sij < 0, the other when Nij is not 0 and Sij > 0] kmax=3
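The edge rule on this slide can be sketched as follows. The slides do not show how N and S are filled, so the matrices below are made-up values for illustration only. The sketch also assumes that row i is the gene at time t-1 and column j is the gene at time t, and it records only the sign of Sij without interpreting it biologically.

```python
def build_edges(genes, N, S):
    """Return (source, target, sign) for every pair with Nij != 0 and Sij != 0."""
    edges = []
    for i, src in enumerate(genes):
        for j, dst in enumerate(genes):
            if N[i][j] != 0 and S[i][j] != 0:
                sign = "+" if S[i][j] > 0 else "-"
                edges.append((src, dst, sign))
    return edges

# Hypothetical matrices for two genes (values are NOT from the thesis).
genes = ["CLB2", "SWI5"]
N = [[0, 1],
     [2, 0]]
S = [[0.0, 0.8],
     [-0.5, 0.0]]
print(build_edges(genes, N, S))
```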

  42. REGRESSION BASED GENE REGULATORY NETWORKS RESULT

  43. Yeast Cell Cycle • Time series microarray (Spellman et al., 1998) • kmax=4 • SWI6 is a transcription cofactor that forms complexes with the DNA-binding proteins Swi4p and Mbp1p to regulate transcription at the G1/S transition • CLB1 and CLB2 both promote cell cycle progression into mitosis • SWI5 is a transcription factor that activates transcription of genes expressed in G1 phase and at the M/G1 boundary • A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p (Feldman RM et al., 1997) • CDC20 is required for the metaphase/anaphase transition; directs ubiquitination of mitotic cyclins, Pds1p (Zachariae W and Nasmyth K, 1999) • PDS1 : securin that inhibits anaphase by binding separin Esp1p, also blocks cyclin destruction and mitotic exit (Cohen-Fix O et al., 1996) • ESP1 : separase with cysteine protease activity (related to caspases) that promotes sister chromatid separation by mediating dissociation of the cohesin Scc1p from chromatin; inhibited by Pds1p (Ciosk R et al., 1998) • CLN3 activates CLN1, CLN2 • CLB3, CLB4, CLB5, CLB6 • Both CLB5 and CLB6 promoters contain MCB (MluI cell cycle box) motifs, which are elements found in several DNA synthesis genes. The transcriptional activator MBF (MCB-binding factor), which is composed of the Mbp1 and Swi6 proteins, binds to the MCB elements to activate transcription (Lew DJ et al., 1997).

  44. Caulobacter crescentus Cell Cycle • Time series microarray (Laub et al., 2000) • 553 identified cell cycle-regulated genes • Genes clustered by function • 11 time points Laub, M.T., McAdams, H.H., Feldblyum, T., Fraser, C.M., and Shapiro, L. (2000) Global analysis of the genetic network controlling a bacterial cell cycle. _Science_, *290*, 2144-2148.

  45. Caulobacter crescentus Cell Cycle • CtrA controls the expression of many cell cycle-regulated genes (Wu et al., 1998; 1999; Jacobs et al., 1999; Quon et al., 1996; 1998; Kelly et al., 1998; Reisenauer et al., 1999; Skerker and Shapiro, 2000; Laub et al., 2002) • The mechanisms of signalling pathways that affect CtrA activity are not completely understood (Jacobs et al., 2004) • ccrM inhibits mRNA transcription by methylation of the GAnTC sequence (Reisenauer and Shapiro, 2002)

  46. Flagella Biogenesis kmax=3

  47. DNA methylation kmax=3

  48. Cell division kmax=3

  49. Chemotaxis machinery kmax=3

  50. Chemotaxis machinery kmax=3
