Reconstructing Biological Networks

Brian Haynes

Presentation Transcript


  1. Reconstructing Biological Networks Brian Haynes

  2. Biological Networks • Multiple layers: metabolic, protein-protein, protein-DNA • Other considerations: environment, tissue (Figure: metabolic, protein-protein, and protein-DNA network layers)

  3. Network Reconstruction

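  4. An analogy • Read assembly: overlapping reads (ATTACGTGTGC, TGCCGATT, GTGCTGCCA) are assembled into a genomic sequence (TGCCGATTACGTGTGCTGCCA) • Pathway inference: analogously, a pathway is assembled from fragmentary data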
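
The sequence half of the analogy can be checked directly. Below is a minimal greedy-overlap assembler, a simple technique chosen for illustration rather than how production assemblers work; the function names and the 3-base minimum overlap are assumptions. Applied to the three reads on the slide, it recovers the assembled sequence.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the ordered pair of contigs with the largest overlap.

    Returns the list of contigs left when no pair overlaps by at least min_len.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATTACGTGTGC", "TGCCGATT", "GTGCTGCCA"]))
# ['TGCCGATTACGTGTGCTGCCA']
```

The reads also share a spurious 3-base overlap (ATTACGTGTGC ends in TGC, which begins TGCCGATT); merging the largest overlap first avoids it in this example, but greedy assembly offers no general guarantee.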

  5. Data Integration • Genome: promoter sequence, peptide homology • Transcriptome: microarrays, RNA-seq • Proteome: 2D gel / MALDI-MS, tandem mass spec • Metabolome: LC / CE MS (Figure: data types ordered from high throughput to low throughput)

  6. Model Considerations • Interpretability • Data requirements: steady state, time course • Complexity • Descriptive or predictive

  7. Correlation-based models • Measure expression over many conditions and genetic perturbations • Find correlations between genes that persist across samples • Filter out indirect interactions

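  8. Calculating Expression Correlation • Pearson correlation: $r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$ • Mutual information: $I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$ (Figure: a nonlinear relationship between x and y with low correlation but high mutual information)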
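
The figure's point, low correlation yet high mutual information, can be reproduced with y = x² on a symmetric grid. The sketch below assumes a simple plug-in MI estimator on equal-width bins (5 bins here); published methods often use other estimators (kernel or B-spline based), and all names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def discretize(values, bins):
    """Map each value to one of `bins` equal-width bins (0 .. bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=5):
    """Plug-in MI estimate in nats from a 2-D histogram of the binned data."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# y = x^2 on a symmetric grid: a deterministic but non-monotonic relationship.
xs = [i / 50 - 1 for i in range(101)]
ys = [x * x for x in xs]
print(round(pearson(xs, ys), 6))         # close to 0: almost no linear correlation
print(mutual_information(xs, ys) > 0.3)  # True: strong statistical dependence
```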

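  9. Filtering Indirect Interactions • DPI (data processing inequality): if g1 and g3 interact only through g2, then $I(g_1; g_3) \le \min\left(I(g_1; g_2), I(g_2; g_3)\right)$ • Used in the ARACNE algorithm • Filters weak alternative paths between nodes by removing the weakest edge of each three-gene loop • In the strict application no three-gene loops remain, so the graph becomes tree-like (Figure: genes g1, g2, g3 connected by edges a, b, c) (Margolin, et al, BMC Bioinformatics, 2006)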
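
A minimal sketch of DPI pruning on a symmetric mutual-information matrix. The function name, the list-of-lists matrix format, and the relative `tolerance` parameter are assumptions for illustration; ARACNE itself also estimates MI and applies an MI threshold first, which is omitted here. Edges are marked across all triplets before any are removed, so the result does not depend on the order in which triplets are visited.

```python
from itertools import combinations


def dpi_filter(mi, tolerance=0.0):
    """Return the undirected edges (i, j), i < j, that survive DPI pruning.

    For each fully connected triplet, the weakest edge is marked for removal if its
    MI is below (1 - tolerance) times the second-weakest edge, i.e. below both others.
    tolerance = 0 is the strict application.
    """
    n = len(mi)
    edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i][j] > 0}
    to_remove = set()
    for i, j, k in combinations(range(n), 3):
        tri = sorted([(mi[i][j], (i, j)), (mi[i][k], (i, k)), (mi[j][k], (j, k))])
        if all(edge in edges for _, edge in tri):
            (weakest_mi, weakest), (second_mi, _) = tri[0], tri[1]
            if weakest_mi < second_mi * (1 - tolerance):
                to_remove.add(weakest)
    return edges - to_remove


# Hypothetical MI values for genes g1, g2, g3 (indices 0, 1, 2).
mi = [[0.0, 0.8, 0.3],
      [0.8, 0.0, 0.7],
      [0.3, 0.7, 0.0]]
print(sorted(dpi_filter(mi)))  # [(0, 1), (1, 2)]: the indirect g1-g3 edge is removed
```

With a loose tolerance (for example 0.6) the g1-g3 edge is kept, which is why ARACNE-style tools expose a tolerance rather than always applying the strict rule.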

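  10. Filtering Indirect Interactions • Background correction • CLR (context likelihood of relatedness) • Z-score of each MI value against gene i's background distribution of MI values (mean $\mu_i$, standard deviation $\sigma_i$): $z_i = \max\left(0, \frac{I_{ij} - \mu_i}{\sigma_i}\right)$ • Combined score for the pair: $f_{ij} = \sqrt{z_i^2 + z_j^2}$ (Faith, et al, PLoS Biol, 2007)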
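
A sketch of the CLR background correction on a precomputed MI matrix. Computing each gene's background from its off-diagonal MI values with a population standard deviation is an assumption here, as are the function name and the toy matrix.

```python
import math


def clr_scores(mi):
    """CLR-style scores f_ij = sqrt(z_i^2 + z_j^2) for a symmetric MI matrix."""
    n = len(mi)
    background = []
    for i in range(n):
        row = [mi[i][k] for k in range(n) if k != i]  # assumption: exclude the diagonal
        mu = sum(row) / len(row)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in row) / len(row))
        background.append((mu, sigma))

    def z(i, j):
        # z-score of I_ij against gene i's background, clipped at 0
        mu, sigma = background[i]
        return max(0.0, (mi[i][j] - mu) / sigma) if sigma > 0 else 0.0

    return [[0.0 if i == j else math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
             for j in range(n)] for i in range(n)]


mi = [[0.0, 0.8, 0.3],
      [0.8, 0.0, 0.7],
      [0.3, 0.7, 0.0]]
scores = clr_scores(mi)
print([[round(v, 3) for v in row] for row in scores])
```

On this toy matrix the weak pair (genes 0 and 2) scores 0, because its MI of 0.3 is below the background mean of both genes.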

  11. Filtering Indirect Interactions • Background correction • KNN filtering (used in the Symmetric-N algorithm) • Filtered structures are scale-free given a sufficiently large K • Algorithm: • Rank the correlations for each node • Keep an edge only if it is among the top K edges of both of its nodes (Figure: example with K = 2) (Chen, et al, BMC Bioinformatics, 2008)
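
The two-step rule above can be sketched as a mutual top-K filter. Ranking by absolute correlation, the function name, and the example matrix are assumptions; the published Symmetric-N method may differ in details such as tie handling.

```python
def symmetric_knn_filter(corr, k):
    """Keep edge (i, j), i < j, only if each node ranks the other among its top-k partners."""
    n = len(corr)
    top = []
    for i in range(n):
        # Rank the other nodes by absolute correlation with node i, strongest first.
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(corr[i][j]), reverse=True)
        top.append(set(ranked[:k]))
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if j in top[i] and i in top[j]}


# Hypothetical correlation matrix for four genes.
corr = [[1.0, 0.9, 0.8, 0.1],
        [0.9, 1.0, 0.2, 0.7],
        [0.8, 0.2, 1.0, 0.6],
        [0.1, 0.7, 0.6, 1.0]]
print(sorted(symmetric_knn_filter(corr, 2)))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```

Raising K keeps more edges; with K = 1 only the single mutually strongest pair, (0, 1), survives in this example.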

  12. Assessing performance • Evaluated in E. coli on 445 microarrays, with RegulonDB as the gold standard • Precision: TP / (TP + FP) • Recall: TP / (TP + FN) (Faith, et al, PLoS Biol, 2007)
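
Computing the two metrics for a predicted edge set is straightforward. This sketch treats edges as unordered pairs, an assumption that suits undirected correlation networks but ignores the direction of gold-standard regulatory links; the function name and gene labels are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted edges against gold-standard edges (unordered pairs)."""
    pred = {frozenset(e) for e in predicted}
    true = {frozenset(e) for e in gold}
    tp = len(pred & true)   # predicted and in the gold standard
    fp = len(pred - true)   # predicted but not in the gold standard
    fn = len(true - pred)   # in the gold standard but missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if true else 0.0
    return precision, recall


predicted = [("a", "b"), ("b", "c"), ("a", "c")]
gold = [("a", "b"), ("c", "b"), ("c", "d"), ("d", "e")]
print(precision_recall(predicted, gold))  # (0.6666666666666666, 0.5)
```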

  13. Correlation-based models • Strengths • Fast to compute: under 1 minute (polynomial time) • Low complexity • Weaknesses • Undirected structure (no causal information) • Descriptive, not predictive • Difficult to validate / interpret the model • Can only be tested at the structural level

  14. Probabilistic Graphical Models • Graph • Vertices represent variables • Edges encode conditional dependence • Example (Figure: A → C ← B, C → E ← D): Pr(A,B,C,D,E) = P(A) P(B) P(C | A,B) P(D) P(E | C,D) • Given C, E is independent of its indirect ancestors A and B: Pr(E | C,A,B) = Pr(E | C)
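
The factorization and the conditional-independence claim can be checked numerically by brute force. The sketch assumes all five variables are binary and uses made-up CPT values; only the graph structure comes from the slide.

```python
from itertools import product

# Hypothetical CPTs: probability that each binary variable equals 1.
p_a, p_b, p_d = 0.3, 0.6, 0.5
p_c_given_ab = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}
p_e_given_cd = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.95}


def bern(p, value):
    return p if value == 1 else 1.0 - p


def joint(a, b, c, d, e):
    """Pr(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D) P(E|C,D)."""
    return (bern(p_a, a) * bern(p_b, b) * bern(p_c_given_ab[(a, b)], c)
            * bern(p_d, d) * bern(p_e_given_cd[(c, d)], e))


def conditional_e(e, **given):
    """Pr(E=e | given), by summing the joint over all 32 assignments."""
    num = den = 0.0
    for values in product((0, 1), repeat=5):
        assignment = dict(zip("abcde", values))
        if any(assignment[k] != v for k, v in given.items()):
            continue
        p = joint(*values)
        den += p
        if assignment["e"] == e:
            num += p
    return num / den


# Given C, knowing A and B changes nothing; without C, A does matter.
print(conditional_e(1, c=1, a=0, b=1), conditional_e(1, c=1))
print(conditional_e(1, a=0), conditional_e(1, a=1))
```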

  15. Bayesian Networks • Directed graphical models • Acyclic • Each node has a conditional probability table, e.g. P(C | A,B) for node C with parents A and B

  16. Dynamic Bayesian Networks • Intuitively, an unrolling of the static BN through time • Parameterization is time-invariant (Figure: nodes B, C, E repeated across time slices T0 to T5)
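
To make "unrolling with time-invariant parameters" concrete, the sketch below reuses one transition function at every slice. The three-gene chain (B sustains itself, B activates C, C activates E) and all probabilities are hypothetical, since the slide's figure does not specify the edges; inference is exact, by enumerating the 8 joint states.

```python
from itertools import product

# Hypothetical time-invariant CPTs: probability that a gene is ON in slice t+1.
P_B_NEXT = {0: 0.2, 1: 0.9}    # P(B[t+1]=1 | B[t]): B is self-sustaining
P_C_NEXT = {0: 0.1, 1: 0.8}    # P(C[t+1]=1 | B[t]): B activates C
P_E_NEXT = {0: 0.05, 1: 0.7}   # P(E[t+1]=1 | C[t]): C activates E

STATES = list(product((0, 1), repeat=3))  # joint states (B, C, E)


def bern(p, value):
    return p if value == 1 else 1.0 - p


def transition(prev, nxt):
    """P(next slice | previous slice); the same function is reused at every step."""
    b, c, _ = prev
    b2, c2, e2 = nxt
    return bern(P_B_NEXT[b], b2) * bern(P_C_NEXT[b], c2) * bern(P_E_NEXT[c], e2)


def unroll(initial, steps):
    """Exact forward propagation of the joint distribution through `steps` slices."""
    dists = [initial]
    for _ in range(steps):
        prev = dists[-1]
        dists.append({s2: sum(prev[s] * transition(s, s2) for s in STATES)
                      for s2 in STATES})
    return dists


def marginal_on(dist, index):
    """P(gene at position `index` is ON) under a joint distribution over STATES."""
    return sum(p for s, p in dist.items() if s[index] == 1)


initial = {s: (1.0 if s == (1, 0, 0) else 0.0) for s in STATES}  # T0: only B is ON
dists = unroll(initial, steps=5)                                  # slices T0 .. T5
print([round(marginal_on(d, 2), 3) for d in dists])               # P(E ON) per slice
```

Because the transition is fixed, the model implicitly assumes equally spaced slices, which is the limitation raised later for unevenly sampled time courses.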

  17. Learning Bayesian Networks • Given observations and possibly prior knowledge, learn the structure and parameters of the BN • Goal: maximize the posterior P(M | D) ∝ P(D | M) P(M) • Two subproblems: structure learning and parameter learning

  18. Structure Learning • MCMC sampling • Propose a network structure • Find the optimal parameters, e.g. for P(x | a,b,c) • Accept or reject the structure according to the Metropolis-Hastings criterion
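
A compact sketch of the three steps, under several assumptions: binary data; BIC, computed from the maximum-likelihood CPT entries (the "optimal parameters"), standing in for the log posterior with a uniform structure prior; and a proposal that toggles one directed edge. Because that proposal is symmetric, the Metropolis-Hastings ratio reduces to the score difference. All names and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def family_bic(data, child, parents):
    """BIC of one binary node given its parents, using maximum-likelihood CPT entries."""
    n = len(data)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_cfg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_cfg[cfg]) for (cfg, _), c in counts.items())
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * n_params * math.log(n)


def is_acyclic(edges, n_vars):
    """Kahn's algorithm: a DAG can be emptied by repeatedly removing source nodes."""
    indegree = [0] * n_vars
    for _, v in edges:
        indegree[v] += 1
    sources = [v for v in range(n_vars) if indegree[v] == 0]
    visited = 0
    while sources:
        u = sources.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    sources.append(b)
    return visited == n_vars


def network_score(edges, data, n_vars):
    return sum(family_bic(data, v, [u for u, w in edges if w == v]) for v in range(n_vars))


def structure_mcmc(data, n_vars, n_iter=1000, seed=0):
    """Metropolis-Hastings over DAGs; returns the sampled frequency of each directed edge."""
    rng = random.Random(seed)
    pairs = [(u, v) for u in range(n_vars) for v in range(n_vars) if u != v]
    edges = frozenset()
    current = network_score(edges, data, n_vars)
    counts = Counter()
    for _ in range(n_iter):
        proposal = edges ^ {rng.choice(pairs)}  # propose: toggle one directed edge
        if is_acyclic(proposal, n_vars):        # cyclic proposals are rejected
            new = network_score(proposal, data, n_vars)
            if rng.random() < math.exp(min(0.0, new - current)):
                edges, current = proposal, new
        counts.update(edges)
    return {e: counts[e] / n_iter for e in pairs}


# Synthetic data: B copies A 90% of the time; C is independent noise.
gen = random.Random(1)
data = []
for _ in range(200):
    a = gen.randint(0, 1)
    b = a if gen.random() < 0.9 else 1 - a
    data.append((a, b, gen.randint(0, 1)))
freqs = structure_mcmc(data, n_vars=3)
print(freqs)
```

Because A → B and B → A have identical BIC scores, the sampler settles on one direction arbitrarily; observational data alone often cannot orient such edges.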

  19. Probabilistic Graphical Models • Strengths • Inherently handle noise • Low complexity (under certain CDFs) • Can support time dependencies (DBN) • Can deal with hidden variables • Weaknesses • With DBNs, time invariance doesn’t support all time-course data (i.e., data with unevenly spaced samples) • Computationally intensive to learn the optimal structure (NP-complete) • Continuous-time models are underdeveloped

  20. Differential Equation Models • Attempt to reconstruct the dynamical system that produced the gene expression data • Reduce the dimensionality of the data • Approximate the dynamics • Modeled using ordinary differential equations • Restrict model complexity • Example system: the Inferelator

  21. Dimensionality Reduction • Regulators (genes and environment) • Limited to transcription factors • Factors with correlated profiles are merged • Genes • Clustered based on putative coregulation • Used cMonkey to form biclusters across genes and conditions [Bonneau, 2006] • Correlated expression • Shared regulatory sequence motifs (Bonneau, et al, Genome Biology, 2006)
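
A minimal sketch of the "correlated regulators are merged" step. The greedy grouping rule, the 0.9 threshold, naming merged groups with "+", and averaging member profiles are all assumptions for illustration, not the Inferelator's documented procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def merge_correlated(profiles, threshold=0.9):
    """Greedily group regulators whose profile correlates above `threshold` with a
    group's first member; each group is replaced by the mean of its members' profiles."""
    groups = []  # lists of regulator names, in insertion order
    for name, prof in profiles.items():
        for group in groups:
            if pearson(profiles[group[0]], prof) > threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    merged = {}
    for group in groups:
        columns = [profiles[g] for g in group]
        merged["+".join(group)] = [sum(vals) / len(vals) for vals in zip(*columns)]
    return merged


# Hypothetical regulator profiles across four conditions.
profiles = {"tfA": [1, 2, 3, 4], "tfB": [2, 4, 6, 8.5], "tfC": [4, 3, 2, 1]}
print(merge_correlated(profiles))
# {'tfA+tfB': [1.5, 3.0, 4.5, 6.25], 'tfC': [4.0, 3.0, 2.0, 1.0]}
```

Only positively correlated profiles are merged here; tfC, which is perfectly anti-correlated with tfA, stays separate.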

  22. Model Details • Expression of y (gene or bicluster mean) is influenced by the expression of N regulators: X = (x1, x2, …, xN) (Bonneau, et al, Genome Biology, 2006)

  23. Model Details • Choice of squashing function (Bonneau, et al, Genome Biology, 2006)

  24. Model Details • Choice of Z (Bonneau, et al, Genome Biology, 2006)

  25. Model Details • Steady state • Time course (Bonneau, et al, Genome Biology, 2006)
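
The equations on these slides did not survive extraction. As we understand Bonneau et al. (2006), the model has the form $\tau \frac{dy}{dt} = -y + g(\beta \cdot Z(X))$. At steady state this reduces to $y = g(\beta \cdot Z(X))$, and for a time course it can be discretized between samples $t_k$ and $t_{k+1}$ as $\tau \frac{y_{k+1} - y_k}{t_{k+1} - t_k} + y_k = g(\beta \cdot Z(X(t_k)))$. The sketch below assumes that form, a logistic squashing function g, and a hypothetical Z built from two regulators plus min(x1, x2) as an AND-like term; check these choices against the paper before relying on them.

```python
import math


def g(u):
    """Squashing function; a logistic sigmoid is assumed here."""
    return 1.0 / (1.0 + math.exp(-u))


def features(x):
    """Hypothetical Z(X) for two regulators: x1, x2 and min(x1, x2) as an AND-like term."""
    return [x[0], x[1], min(x[0], x[1])]


def drive(beta, x):
    """g(beta . Z(X)): the steady-state expression level implied by regulator levels x."""
    return g(sum(b * z for b, z in zip(beta, features(x))))


def simulate(y0, xs, times, beta, tau):
    """Forward-Euler integration of tau * dy/dt = -y + g(beta . Z(X)) on the sample times."""
    ys = [y0]
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        ys.append(ys[-1] + dt / tau * (-ys[-1] + drive(beta, xs[k])))
    return ys


def time_course_response(ys, times, tau):
    """Finite-difference left-hand side tau*(y[k+1]-y[k])/dt_k + y[k]; dt_k may vary."""
    return [tau * (ys[k + 1] - ys[k]) / (times[k + 1] - times[k]) + ys[k]
            for k in range(len(ys) - 1)]


times = [0.0, 0.5, 1.5, 2.0, 4.0]  # unevenly spaced samples
xs = [[0.2, 0.9], [0.5, 0.8], [0.9, 0.4], [0.7, 0.7], [0.1, 0.3]]
beta, tau = [1.5, -0.5, 2.0], 1.0
ys = simulate(0.1, xs, times, beta, tau)
lhs = time_course_response(ys, times, tau)
# Each lhs[k] equals drive(beta, xs[k]) up to rounding, so the pairs
# (lhs[k], features(xs[k])) can serve as training data for learning beta.
```

Because each interval carries its own $t_{k+1} - t_k$, uneven sampling needs no special handling, which is one of the strengths listed for these models.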

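  26. Model Learning with LASSO • LASSO, a.k.a. L1 shrinkage: minimize the squared prediction error subject to (S.T.) a bound t on the L1 norm of the coefficients β: $\min_{\beta} \sum_k \left(y_k - \beta \cdot Z_k\right)^2 \ \text{s.t.}\ \sum_j |\beta_j| \le t$ (Bonneau, et al, Genome Biology, 2006)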
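
LASSO is commonly solved in its equivalent penalized form, minimizing (1/2n)·||y − Zβ||² + λ·||β||₁; each constraint bound t corresponds to some λ ≥ 0. The sketch below uses cyclic coordinate descent with soft-thresholding, a standard solver chosen for illustration (not necessarily the one the Inferelator used). It fits no intercept, and the function names and data are hypothetical.

```python
def soft_threshold(v, lam):
    """Shrink v toward zero by lam, returning exactly 0 inside [-lam, lam]."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def lasso_cd(Z, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Z beta||^2 + lam * ||beta||_1 (no intercept)."""
    n, p = len(Z), len(Z[0])
    beta = [0.0] * p
    col_sq = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Z beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual, adding back j's own contribution.
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
    return beta


Z = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]  # orthogonal columns
y = [3.2, 2.8, -3.2, -2.8]                              # = 3 * column 0 + 0.2 * column 2
print(lasso_cd(Z, y, lam=0.5))  # approximately [2.5, 0.0, 0.0]
print(lasso_cd(Z, y, lam=0.1))  # approximately [2.9, 0.0, 0.1]
```

With orthogonal columns the solution is just the least-squares coefficients shrunk toward zero by λ, so the small 0.2 coefficient is zeroed at λ = 0.5 but survives (as 0.1) at λ = 0.1. This sparsity is what keeps each learned regulatory function parsimonious.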

  27. Results • Measured Halobacterium NRC-1 under 24 novel conditions and compared the measurements with the predicted expression response (Figure: training data vs. test data, 24 novel conditions) (Bonneau, et al, Genome Biology, 2006)

  28. Differential Equation Models • Strengths • Predictive and hypothesis-generating • Biological interpretation • Supports time course and steady state • Allows uneven sampling of the time course • Weaknesses • Arbitrary choice of the regulatory functions G and Z • Computationally intensive • Non-linearity causes problems in numerical optimization • Handling hidden variables

  29. Final Thoughts • Unifying the probabilistic and dynamical systems approaches • Handling genes by type • Moving from descriptive to predictive • Sorting hypotheses by information gain