
Prof. Dr. Tim Beissbarth University of Göttingen Christian Bender

Prof. Dr. Tim Beissbarth, University of Göttingen; Christian Bender, German Cancer Research Center (DKFZ), Department Molecular Genome Analysis. CAMDA 2008: Extending pathways with inferred regulatory interactions from microarray data and protein domain signatures. DKFZ-B050, INF 580


Presentation Transcript


  1. Prof. Dr. Tim Beissbarth, University of Göttingen; Christian Bender, German Cancer Research Center (DKFZ), Department Molecular Genome Analysis. CAMDA 2008: Extending pathways with inferred regulatory interactions from microarray data and protein domain signatures. DKFZ-B050, INF 580, D-69120 Heidelberg, +49 6221 42 4716, t.beissbarth@dkfz.de, c.bender@dkfz.de

  2. Contents • Introduction • The CAMDA Dataset: Apoptosis time-course of Affara 2007 • Our goals • Methods • KEGG pathway prediction using InterPro domain signatures • Overrepresentation of predicted pathways • Dynamical Bayesian Networks • Verification of inferred regulations by Ingenuity • Results • Analysis of differential expression and prediction of KEGG pathway memberships with InterPro domain signatures • Overrepresentation test of pathways • Reconstruction of gene regulatory networks with G1DBN • Comparison to Ingenuity based literature network • Summary/Outlook

  3. 1a) Introduction: The CAMDA Dataset • Questions: • What are the kinetics of the transcriptome change during endothelial cell (EC) apoptosis? • What is the role of EC apoptosis in blood vessel development? • Experimental setup: as described in Affara et al. 2007 • Expose primary human umbilical vein EC (HUVEC) to partial survival factor deprivation (SFD) • 3 replicates

  4. 1b) Introduction: Our goals • Extract genes that are differentially expressed during the time course • Predict membership of genes in the corresponding KEGG pathways via InterPro domain signatures and find overrepresented pathways • Reconstruct gene regulatory networks of the differentially expressed genes in these pathways • Compare the learned networks to literature knowledge

  5. 2a) Motivation of pathway prediction: Functional Characterization of Genes Pathway annotation often not available • Only ~25% of all genes (~4,000 / ~20,000) have an annotation in the KEGG database (Kanehisa et al., 2008) • Molecular function (e.g. kinase) • Cellular localization (e.g. nucleus) • Biological processes (e.g. apoptosis) • Pathways (e.g. MAPK signaling) • ...

  6. 2a) Idea: predict pathway depending on the domain signature • Proteins in the same pathway should be similar in some respect, e.g. share common domains • Domain information of proteins can be retrieved from the InterPro database • Information available for ~90% • Use InterPro domains to predict pathway membership gene-wise: Gene X → InterPro domain signature → classification model → KEGG pathway
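
The "domain signature" used as classifier input can be pictured as a 0/1 vector over a fixed list of domains. The following is a minimal sketch under that assumption; the function name, the vocabulary, and the domain labels are hypothetical placeholders, not InterPro identifiers or the gene2pathway interface.

```python
def domain_signature(gene_domains, vocabulary):
    """Encode a gene as a 0/1 vector over a fixed, ordered domain vocabulary."""
    present = set(gene_domains)
    return [1 if domain in present else 0 for domain in vocabulary]


# Hypothetical vocabulary and annotation, for illustration only.
VOCABULARY = ["DOMAIN_A", "DOMAIN_B", "DOMAIN_C", "DOMAIN_D", "DOMAIN_E"]
GENE_X_DOMAINS = ["DOMAIN_A", "DOMAIN_C", "DOMAIN_E"]

signature_x = domain_signature(GENE_X_DOMAINS, VOCABULARY)
print(signature_x)  # [1, 0, 1, 0, 1]
```

A vector of this shape corresponds to the "x = 1 0 1 0 1 ..." notation that appears on the classification slides.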

  7. 2a) Why should this work?

  8. 2a) The KEGG Ontology [Figure: KEGG hierarchy. Top level: 01100 Metabolism, 01200 Genetic Inf. Processing, 01300 Envir. Inf. Processing, 01400 Cellular Processes. Below 01300: 01310 Membrane Transport, 01320 Signal Transduction, ... Leaf: 04010 MAPK pathway, to which Gene X is assigned.]

  9. 2a) Hierarchical Classification Model • The classifier should take the KEGG hierarchy into account • A wrong prediction at the top of the hierarchy (e.g. Metabolism vs. Envir. Inf. Proc.) is worse than confusing e.g. the MAPK with the WNT pathway • A specific gene can play a role in multiple pathways! • => Solution: encoding into a specific loss function, with y = 0 0 1 0 1 ... 1 and u = 1 0 1 0 1 ... 0 [Figure: hierarchy nodes 01100, 01200, 01300, 01400, 01310, ..., 04010]
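
One way to picture a loss that punishes top-level mistakes more than leaf-level ones is a depth-weighted disagreement count. The sketch below is an illustrative assumption, not the loss function used by the authors: the weighting `0.5 ** depth`, the function names, and the inclusion of a WNT node (ID 04310) in the toy hierarchy are all choices made here.

```python
# Toy slice of the KEGG hierarchy from the slides: node -> parent.
# The WNT pathway entry is an assumption added for illustration.
PARENT = {
    "01100": None,     # Metabolism
    "01300": None,     # Envir. Inf. Processing
    "01320": "01300",  # Signal Transduction
    "04010": "01320",  # MAPK pathway
    "04310": "01320",  # WNT pathway
}


def depth(node):
    """Number of ancestors above a node (0 for top-level branches)."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d


def hierarchical_loss(true_nodes, predicted_nodes):
    """Sum a penalty of 0.5**depth over every node where y and u disagree."""
    disagreements = set(true_nodes) ^ set(predicted_nodes)
    return sum(0.5 ** depth(node) for node in disagreements)


truth = {"01300", "01320", "04010"}      # gene in the MAPK pathway
near_miss = {"01300", "01320", "04310"}  # predicted WNT instead of MAPK
far_miss = {"01100"}                     # predicted Metabolism instead

print(hierarchical_loss(truth, near_miss))  # 0.5
print(hierarchical_loss(truth, far_miss))   # 2.75
```

Under this weighting, confusing two sibling pathways costs far less than predicting the wrong top-level branch, which is the ordering the slide asks for.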

  10. 2a) Hierarchical Classification Model [Figure: the domain signature x = 1 0 1 0 1 ... 1 is passed down the KEGG hierarchy. At each level a "Here or other?" decision is made: first among the top-level branches (01100 Metabolism, 01200 Genetic Inf. Processing, 01300 Envir. Inf. Processing, 01400 Cellular Processes), then among e.g. 01310 Membrane Transport and 01320 Signal Transduction, then among pathways such as 04010 MAPK pathway.]

  11. 2b) Overrepresentation test • Fisher's exact test • Groups: • Differential / not differential • Predicted as pathway p / not predicted as pathway p • Example: • Test for the probability of seeing 22 or more differential genes in pathway p by chance • Done for two gene sets: • Pathways already annotated in KEGG • Pathways already annotated in KEGG or predicted by domain signatures (DS)
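
The one-sided test described on this slide can be sketched as a hypergeometric tail probability. This is a minimal illustration using only the Python standard library; the function name is made up here, and the pathway size of 100 genes is a hypothetical value. The totals of 18310 genes and 1002 differential genes, and the threshold of 22, are taken from the slides.

```python
from math import comb


def fisher_right_tail(k, n_path, n_diff, n_total):
    """P(X >= k) when n_diff genes are drawn from n_total genes,
    of which n_path belong to the pathway (one-sided Fisher's exact test)."""
    upper = min(n_path, n_diff)
    tail = sum(
        comb(n_path, x) * comb(n_total - n_path, n_diff - x)
        for x in range(k, upper + 1)
    )
    return tail / comb(n_total, n_diff)


# Hypothetical pathway of 100 genes; 22 of them are differential.
p_value = fisher_right_tail(k=22, n_path=100, n_diff=1002, n_total=18310)
print(p_value < 0.05)  # True
```

With these numbers about 5.5 differential genes would be expected in the pathway by chance (100 × 1002 / 18310), so observing 22 or more gives a very small p-value. When many pathways are tested, a multiple-testing correction would normally follow; the slides do not say which one was used.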

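  12. 2c) Dynamical Bayesian networks (DBN) [Figure: microarray data for genes G1, G2, G3 at time points t-1, t, t+1, i.e. variables X_{t-1}^1, X_{t-1}^2, X_{t-1}^3, X_t^1, ..., X_{t+1}^3, corresponds (≙) to a DBN G built from the regulation motif.] Lebre et al. 2008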

  13. 2d) Finding interactions with Ingenuity • Given a list of genes, which interactions are already known? • Example: What is known about Cyclins D and E, CDK 4 and 6 and Rb? • We use this information to verify the reconstruction quality of our network [Figure: Ingenuity network]
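
Checking an inferred network against literature interactions can be sketched as a set comparison of edges. The gene names and edges below are placeholders, not results from Ingenuity or from the talk, and ignoring edge direction is an assumption made here because literature interactions are often undirected.

```python
def compare_networks(inferred_edges, literature_edges):
    """Compare two edge lists, ignoring direction, and report the overlap."""
    inferred = {frozenset(edge) for edge in inferred_edges}
    known = {frozenset(edge) for edge in literature_edges}
    shared = inferred & known
    precision = len(shared) / len(inferred) if inferred else 0.0
    recall = len(shared) / len(known) if known else 0.0
    return shared, precision, recall


# Placeholder edges, for illustration only.
inferred = [("A", "B"), ("A", "C"), ("D", "E")]
literature = [("B", "A"), ("D", "E"), ("C", "E"), ("A", "E")]

shared, precision, recall = compare_networks(inferred, literature)
print(len(shared), recall)  # 2 0.5
```

Inferred edges that are missing from the literature set are not necessarily wrong; they may be candidate interactions that are simply not yet documented.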

  14. 3a. Workflow for the analysis • 18310 informative genes • Limma analysis (Smyth et al. 2004): 1002 differentially expressed • Pathway prediction via DS (Fröhlich et al. 2004): 10630 genes to predict • 3385 in KEGG, 268 differential • 7591 via KEGG and DS, 651 differential • Fisher's exact test for each pathway, once per gene set • Construct regulatory network for genes in significantly overrepresented pathways by DBN

  15. 3b) Overrepresentation results • The pathways Cell Cycle, Metabolism, and Cell Growth and Death are significantly overrepresented within the DS prediction group at the 0.05 level • Since we were analyzing an apoptosis study, we chose the cell cycle pathway for further experiments

  16. 3. Selected gene profiles for DBN reconstruction • Genes predicted in pathway cell cycle • Only those that were found as differentially expressed

  17. 2c) For verification: Ingenuity network for our selected genes

  18. 4. Inferred network

  19. 4) Cell cycle pathway from KEGG

  20. 4) How to integrate the inferred network into the pathway

  21. 4. Some details on inferred interactions • NASP -> PLK1 • NASP: H1 histone binding protein that is involved in transporting histones into the nucleus of dividing cells • NASP -> MCM • MCM: highly conserved mini-chromosome maintenance proteins that are essential for the initiation of eukaryotic genome replication • UACA -> BUB1 • UACA: regulates morphological alterations required for cell growth and motility • BUB1: gene encodes a kinase involved in spindle checkpoint function

  22. 5. Summary • Combine pathway prediction and network inference • Find differential genes, predict pathways and find overrepresented pathways • Compute regulatory network from the expression data • Integrate the inferred interactions into known pathway maps • Uses two freely available R packages • gene2pathway • G1DBN

  23. Acknowledgements • Molecular Genome Analysis • Biostatistics & Modelling • Christian Bender • Holger Fröhlich • Marc Johannes • Annemarie Poustka (MGA department head, died May 2008) • Stefan Wiemann (acting head of division MGA)

  24. 3b. Overrepresentation of pathways given: List of genes with corresponding pathway annotation or prediction • test for overrepresentation of all pathways: • Fisher's exact test • once for genes being annotated in KEGG • once for genes with additional pathway membership prediction by DS

  25. 3a. Differential expression and pathway prediction • Use the R package limma (Smyth et al. 2004) to find differential genes • Results in 1002 differentially expressed genes during the time course • Map all genes on the UniSet array to KEGG pathways via InterPro domain signatures (DS) • For 10630 genes an InterPro DS could be found • 3385 already annotated in KEGG, 268 differential • 4206 genes were assigned to KEGG pathways via DS, 353 differential

  26. 2b) Approach used in this work • Example: test whether X_12 ⊥ X_31 | pa(X_12) => yes • But: consider only first-order dependencies: X_12 ⊥ X_31 | pa^(1)(X_12), i.e. X_12 ⊥ X_31 | X_11 or X_12 ⊥ X_31 | X_21 => no • So spurious edges can occur in G(1) which disappear in G̃ (G-tilde) • Idea: [Figure: variables X11, X12, X21, X22, X31, X32] Lebre et al. 2008

  27. 2a) Hierarchical Classification Model • Domain signature x = 1 0 1 0 1 ... 1 • SVMs • Decision values f • Taken from dictionary of possible position vectors • Ranking perceptron (loss function l), Melvin et al., 2007 • Trainable weight vector • Most probable pathways y = 0 0 1 0 1 ... 1

  28. 2a) Hierarchical Classification Model: Training Procedure • Data set of domain signatures for each gene • Train SVMs • Position-labeled data set • Train ranking perceptron using loss function l • Weight vector z

  29. 2b) Dynamical Bayesian networks [Figure: DBN G over Gene 1, Gene 2, Gene 3, ..., Gene p] Kim et al. 2003

  30. 2b) Dynamical Bayesian networks, Lebre et al. 2008 • Define the DAG Gfull as the DAG that contains all edges between successive variables: [formula not preserved] • Then there exists a minimal DAG describing a BN containing all conditional dependencies between variables at successive time points: [formula not preserved] • Any pair of successive variables (X_{t-1}^j, X_t^i) not adjacent in this minimal DAG are conditionally independent given the parents: [formula not preserved] • This minimal BN description factorizes according to [formula not preserved] • It will be used to define low-order conditional dependence DAGs for the inference of [symbol not preserved]
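
The formulas on this slide did not survive in the transcript. The following is a sketch of how these statements are commonly written for a first-order DBN with p genes and n time points. The notation (G-tilde for the minimal DAG, pa(·, G) for the parent set in a graph G) is an assumption and may differ from the original slide.

```latex
% G_full: all edges between variables at successive time points
G_{\mathrm{full}} = \bigl\{\, X_{t-1}^{j} \to X_{t}^{i} \;:\; i, j \in \{1,\dots,p\},\ t > 1 \,\bigr\}

% Pairs not adjacent in the minimal DAG \tilde{G} are conditionally
% independent given the parents of X_t^i in \tilde{G}
X_{t}^{i} \perp X_{t-1}^{j} \;\bigm|\; \mathrm{pa}\bigl(X_{t}^{i}, \tilde{G}\bigr)

% Factorization of the joint density along \tilde{G}
f(X) = f(X_{1}) \prod_{t=2}^{n} \prod_{i=1}^{p}
       f\bigl(X_{t}^{i} \mid \mathrm{pa}(X_{t}^{i}, \tilde{G})\bigr)
```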

  31. [Figure: DBN variables X_{t-1}^1, X_{t-1}^2, X_{t-1}^3, X_t^1, X_t^2, X_t^3, X_{t+1}^1, X_{t+1}^2, X_{t+1}^3]

  32. 2c) Approach used in this work • Goal: infer DAG G´, containing the full conditional dependencies • Infer DAG G(1), containing all 1st-order dependencies • Key observation: G´ ⊆ G(1) • Derive G´ from G(1) [Figure: DBN variables X_{t-1}^1, ..., X_{t+1}^3] Lebre et al. 2008
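
The first-order step can be illustrated with a small score: an edge from gene j to gene i is kept only if the lag-1 dependence survives conditioning on every single other gene at the previous time point. The sketch below is not the G1DBN implementation. It swaps in partial correlations as the dependence measure, and the function names and toy time series are made up for illustration.

```python
from math import sqrt


def corr(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) * (x - mean_a) for x in a)
    var_b = sum((y - mean_b) * (y - mean_b) for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    denom_sq = (1 - r_ac * r_ac) * (1 - r_bc * r_bc)
    if denom_sq <= 1e-12:  # c already explains a or b completely
        return 0.0
    return (r_ab - r_ac * r_bc) / sqrt(denom_sq)


def first_order_score(series, j, i):
    """Weakest lag-1 dependence of gene i on gene j given one other gene."""
    target = series[i][1:]    # X_t^i
    source = series[j][:-1]   # X_{t-1}^j
    others = [k for k in range(len(series)) if k != j]
    return min(abs(partial_corr(target, source, series[k][:-1])) for k in others)


# Toy data: gene 1 copies gene 0 with a lag of one time point.
g0 = [0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 2.0, 1.0, 0.0]
g1 = [0.5] + g0[:-1]
g2 = [1.0, 1.0, 2.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0, 1.0]
series = [g0, g1, g2]

score_0_to_1 = first_order_score(series, j=0, i=1)
score_2_to_1 = first_order_score(series, j=2, i=1)
print(score_0_to_1 > score_2_to_1)  # True
```

In the toy data the edge 0 → 1 keeps a high score under every conditioning gene, while the candidate edge 2 → 1 drops to zero once gene 0 is conditioned on. Edges that survive this step form a graph in the spirit of G(1), from which a smaller graph would be derived in a second step.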

