Network inference from repeated observations of node sets

Network inference from repeated observations of node sets. Neil Clark, Avi Ma'ayan. Network Inference. Protein-Protein interaction network. Cell signaling network. Overview. Network inference - the deduction of an underlying network of interactions from indirect data.



  1. Network inference from repeated observations of node sets Neil Clark, Avi Ma'ayan

  2. Network Inference Protein-Protein interaction network Cell signaling network

  3. Overview • Network inference - the deduction of an underlying network of interactions from indirect data. • A general class of network inference problem • Network inference approach • Application: • Inference of physical interactions: PPI • Inference of gene associations: stem cell genes • Inference of statistical interactions: drug/side-effect network

  4. GMT files

  5. The inference problem • Input: a collection of sets of entities (genes or proteins or ...) in the form of a GMT file - the results of experiments, or sampling more generally. • Assumptions: • 1. An underlying network exists which describes the interactions between the entities in the GMT file • 2. Each line of the GMT file contains information on the connectivity of the underlying network • The problem: Given a GMT file, can we extract enough information to resolve the underlying network?
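As a minimal sketch of the input side of the problem, the following reads GMT-formatted text into named node sets. It assumes the usual GMT layout (tab-delimited, set name in the first column, a description in the second, members in the remaining columns); the function name `parse_gmt` and the example entities are illustrative only and do not come from the slides.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {set_name: [members]}.

    Assumes one set per line: name <TAB> description <TAB> member1 <TAB> ...
    """
    node_sets = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank lines and lines with no members
        name, _description, *members = fields
        node_sets[name] = [m for m in members if m]
    return node_sets


# Hypothetical two-line GMT file: each line is one observed node set.
example = "exp1\tna\tA\tB\tC\nexp2\tna\tB\tD\n"
node_sets = parse_gmt(example)
```

Each value in `node_sets` is one observation of a node set, which is the only information the inference step is allowed to use.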

  6. A synthetic example

  7. Approach... • Forget for the moment that we know the underlying network and pretend we only have the GMT file. • Attempt to use the accumulation of our coarse data to infer the fine details of the underlying network. • Consider the set of all networks that are consistent with our data - there are likely to be many. • Use an algorithm to sample this ensemble of networks randomly. • The mean adjacency matrix gives the probability of each link being present within the ensemble.
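The slides do not spell out the sampling algorithm, so the following is only a sketch of the general idea under a stated assumption: a network counts as "consistent" with the data if the members of every GMT line form a connected subgraph. Under that assumption, one random consistent network can be drawn by wiring each line's members into a random tree, and the mean over many draws estimates the probability of each link. The function names and the tree-based sampler are hypothetical stand-ins, not the authors' method.

```python
import random
from collections import defaultdict


def sample_network(node_sets, rng):
    """Draw one network in which every node set is connected.

    Assumption: each set is wired as a random tree (each node attaches
    to a randomly chosen, already-placed member of the same set).
    """
    edges = set()
    for members in node_sets:
        members = list(dict.fromkeys(members))  # drop duplicates, keep order
        rng.shuffle(members)
        for i in range(1, len(members)):
            partner = members[rng.randrange(i)]
            edges.add(frozenset((members[i], partner)))
    return edges


def edge_probabilities(node_sets, n_samples=1000, seed=0):
    """Estimate the mean adjacency: fraction of samples containing each edge."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(n_samples):
        for edge in sample_network(node_sets, rng):
            counts[edge] += 1
    return {tuple(sorted(edge)): c / n_samples for edge, c in counts.items()}


# Toy data: the pair (A, B) is observed alone, so that link is forced.
probs = edge_probabilities([["A", "B"], ["A", "B", "C"]])
```

In the toy data the link A-B appears in every sample, while A-C and B-C each appear in only some samples, which is the sense in which repeated coarse observations sharpen the fine detail.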

  8. Inference live!

  9. Information content

  10. Analytic Approximation • When applying this approach to real data typically there are large numbers of nodes • Sample space of networks can be very large -> computationally demanding • Write a simple analytical approximation which mimics the action of the algorithm.

  11. Compare analytic approximation

  12. Correction for sampling bias • Destroy the information in the GMT file by randomly permuting it, then compare each actual edge weight to the distribution of edge weights obtained from the randomly permuted GMT files.
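One way to make this comparison concrete is a z-score against a permutation null, sketched below. Two assumptions are made purely for illustration: the edge weight is taken to be a simple co-occurrence count (a stand-in for the inferred edge weight), and "permuting the GMT file" is taken to mean shuffling all entries across lines while keeping each line's length. Neither choice is confirmed by the slides.

```python
import random
import statistics


def cooccurrence(node_sets, a, b):
    """Stand-in edge weight: number of lines containing both a and b."""
    return sum(1 for members in node_sets if a in members and b in members)


def permute_gmt(node_sets, rng):
    """Shuffle all entries across lines, preserving each line's length."""
    pool = [node for members in node_sets for node in members]
    rng.shuffle(pool)
    permuted, start = [], 0
    for members in node_sets:
        permuted.append(pool[start:start + len(members)])
        start += len(members)
    return permuted


def edge_z_score(node_sets, a, b, n_permutations=500, seed=0):
    """Compare the observed weight with its permutation-null distribution."""
    rng = random.Random(seed)
    observed = cooccurrence(node_sets, a, b)
    null = [cooccurrence(permute_gmt(node_sets, rng), a, b)
            for _ in range(n_permutations)]
    spread = statistics.pstdev(null)
    if spread == 0:
        return 0.0  # null carries no variation; no evidence either way
    return (observed - statistics.mean(null)) / spread


toy_sets = [["A", "B"], ["A", "B"], ["A", "B"], ["C", "D"], ["E", "F"],
            ["G", "H"], ["C", "E"], ["D", "G"], ["F", "H"]]
z_ab = edge_z_score(toy_sets, "A", "B")
```

Because the shuffle keeps how often each node appears, frequently sampled nodes no longer look strongly linked merely because they are frequent.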

  13. Application to Infer PPIs Malovannaya A et al. Analysis of the human endogenous coregulator complexome. Cell. 2011 May 27;145(5):787-99

  14. PPI network

  15. Validation • Compare inferred PPI network to the following databases: • BioCarta • HPRD PPI InnateDB • IntAct • KEGG • MINT mammalia • MIPS • BioGRID
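A comparison of this kind can be summarised by how many inferred links also appear in a reference set. The sketch below is an assumption about how such a check might be scored (precision and recall over undirected edges); the slides do not state the metric used, and the protein labels are placeholders rather than real database entries.

```python
def normalize(edges):
    """Treat edges as undirected and drop self-loops."""
    return {frozenset(pair) for pair in edges if pair[0] != pair[1]}


def validate(inferred_edges, known_edges):
    """Return (precision, recall) of inferred edges against a reference set."""
    inferred = normalize(inferred_edges)
    known = normalize(known_edges)
    hits = inferred & known
    precision = len(hits) / len(inferred) if inferred else 0.0
    recall = len(hits) / len(known) if known else 0.0
    return precision, recall


# Placeholder edge lists; a real check would load edges from the databases.
inferred = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
known = [("P2", "P1"), ("P3", "P4"), ("P5", "P6"), ("P1", "P6")]
precision, recall = validate(inferred, known)
```

Edge direction is ignored here, so ("P2", "P1") in the reference matches ("P1", "P2") in the inferred network.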

  16. Comparison

  17. Validation

  18. Validation

  19. Application to stem cells • We used two types of high-throughput data from the ESCAPE database (www.maayanlab.net/ESCAPE). • Chip X data: from Chip-Chip and Chip-seq experiments. • 203,190 protein-DNA binding interactions in the proximity of coding regions from 48 ESC-relevant source proteins. • Logof followed by microarray data: A manually compiled database of protein-mRNA regulatory interactions derived from loss-of-function or gain-of-function experiments followed by microarray profiling. • 154,170 interactions from 16 ESC-relevant regulatory proteins from loss-of-function studies, and 54 from gain-of-function studies.

  20. Chip X network

  21. Logof network

  22. Combining networks • Each data source gives a different perspective on the associations between the genes • New insights may be gained by combining the different perspectives, e.g. small but consistent associations across different perspectives will be revealed by the enhanced signal-to-noise ratio.
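The signal-to-noise argument can be illustrated with a simple combination rule. The sketch below assumes each network scores its edges as z-scores and combines them with Stouffer's method (sum of z-scores divided by the square root of the number of networks), treating the networks as independent and a missing edge as a score of zero. This rule is a swapped-in illustration; the slides do not say how the networks were actually combined.

```python
import math


def combine_networks(networks):
    """Combine per-edge z-scores across networks with Stouffer's method.

    networks: list of {edge: z_score} dicts; an edge absent from a
    network is assumed to contribute a z-score of 0.
    """
    all_edges = set().union(*networks) if networks else set()
    k = len(networks)
    return {
        edge: sum(net.get(edge, 0.0) for net in networks) / math.sqrt(k)
        for edge in all_edges
    }


# Hypothetical scores: a modest association seen in both perspectives.
chipx_like = {("g1", "g2"): 1.5, ("g1", "g3"): 2.0}
logof_like = {("g1", "g2"): 1.5}
combined = combine_networks([chipx_like, logof_like])
```

With these toy numbers the edge seen modestly in both networks ends up with a higher combined score than either individual score, while the edge seen in only one network is down-weighted.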

  23. Combination of Chip X and Logof

  24. An extension of the approach...

  25. Application II: Inference of a network of statistical relationships in the AERS database • The Adverse Event Reporting System (AERS) database contains records of .... • AERS Record 1: Drug 1, Drug 2, ...; Side-effect 1, Side-effect 2, ... • AERS Record 2: Drug 3, Drug 4, ...; Side-effect 3, Side-effect 4, ... ……
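Since each record lists drugs and side-effects together, a record can be treated as one observed node set, in the same spirit as a GMT line. The sketch below assumes a hypothetical in-memory record layout (a dict with `drugs` and `side_effects` lists) and prefixes each name with its type so that a drug and a side-effect with the same label stay distinct; the real AERS file format is not described in the slides.

```python
def records_to_node_sets(records):
    """Flatten drug/side-effect records into typed node sets."""
    node_sets = []
    for record in records:
        drugs = [f"drug:{name}" for name in record["drugs"]]
        effects = [f"effect:{name}" for name in record["side_effects"]]
        node_sets.append(drugs + effects)
    return node_sets


# Placeholder records mirroring the layout shown on the slide.
records = [
    {"drugs": ["Drug 1", "Drug 2"], "side_effects": ["Side-effect 1"]},
    {"drugs": ["Drug 3"], "side_effects": ["Side-effect 3", "Side-effect 4"]},
]
aers_sets = records_to_node_sets(records)
```

The resulting node sets can then be passed to the same kind of set-based inference used for the gene and protein examples.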

  26. AERS sub network

  27. AERS Large-scale Adjacency Matrix

  28. And finally…

  29. Summary • We described a general class of problem in network inference. • A network of physical interactions between proteins is inferred based on high-throughput IP/MS experiments • The method has been applied to examine associations between stem-cell genes from multiple perspectives • We have begun to apply the approach to the inference of statistical interactions between drugs and side-effects based on the AERS database • More details can be found on the website www.maayanlab.net/S2N
