Merging network patterns: a general framework to summarize biomedical network data

Merging network patterns: a general framework to summarize biomedical network data. Yang Xiang, David Fuhry, Kamer Kaya, Ruoming Jin, Umit V. Catalyurek, Kun Huang. Network Modeling Analysis in Health Informatics and Bioinformatics.



  1. Merging network patterns: a general framework to summarize biomedical network data • Yang Xiang, David Fuhry, Kamer Kaya, Ruoming Jin, Umit V. Catalyurek, Kun Huang • Network Modeling Analysis in Health Informatics and Bioinformatics

  2. Introduction • Identifying network patterns is an important task in bioinformatics. • Summarization is frequently needed because we often find more patterns than we can focus on. • In this work, we propose merging network patterns, a method that achieves both goals.

  3. Related work: Network Partition • Clustering • Coclustering (biclustering) • Graph partitioning (Figure source: Ronhovde et al., Detection of hidden structures for arbitrary scales in complex physical systems, Scientific Reports 2, 2012 http://www.nature.com/srep/2012/120329/srep00329/fig_tab/srep00329_F1.html)

  4. Related work: Clique or Biclique generation • Listing all maximal cliques: Bron-Kerbosch algorithm • Listing all maximal bicliques: Frequent closed itemsets + Supporting transactions • Problem: Too many patterns
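
The Bron-Kerbosch algorithm named on this slide can be sketched in a few lines. The version below is a minimal illustration with pivoting, not the implementation used in the study; the adjacency-dictionary input and the toy graph are assumptions made for the example.

```python
def bron_kerbosch(adj):
    """Yield every maximal clique of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    """
    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy graph (hypothetical): a triangle a-b-c plus a pendant edge c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = sorted(sorted(c) for c in bron_kerbosch(graph))
print(cliques)
```

Even on small inputs the number of maximal cliques can grow quickly, which is the "too many" problem the slide points to.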

  5. Related work: Pattern summarization • Hyper and Hyper+ • Problems: • Still not compact enough • No guarantee on the quality of each discovered pattern

  6. Related work: Pattern growing • QCM, eQCM, and other variations • Problems: • Theoretically cannot guarantee covering important patterns. • Running time is too long (O(n^5)) for large datasets

  7. NP-hard Problems

  8. NP-hard Problems

  9. Algorithm Framework • Minimize q: maximum matching • Minimize q and then maximize overall density: maximum weighted maximum cardinality matching • Maximize overall density with q = p - 1: choose the pair that obtains the maximum density after the merge operation
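
The last strategy on this slide (merge the one pair that gives the highest density, so the pattern count drops by one each step) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: a pattern is taken to be a vertex set, merging is taken to be set union, density is taken to be the fraction of vertex pairs joined by an edge, and β is assumed to be a minimum density a merged pattern must reach. The names `density` and `greedy_merge` are hypothetical.

```python
from itertools import combinations


def density(vertices, edges):
    """Fraction of vertex pairs in `vertices` joined by an edge (assumed definition)."""
    n = len(vertices)
    if n < 2:
        return 1.0
    present = sum(1 for u, v in combinations(vertices, 2)
                  if frozenset((u, v)) in edges)
    return present / (n * (n - 1) / 2)


def greedy_merge(patterns, edges, beta):
    """Repeatedly merge the pair whose union is densest, while density >= beta."""
    patterns = [frozenset(p) for p in patterns]
    while len(patterns) > 1:
        # Pick the pair of pattern indices with the densest union.
        best = max(combinations(range(len(patterns)), 2),
                   key=lambda ij: density(patterns[ij[0]] | patterns[ij[1]], edges))
        merged = patterns[best[0]] | patterns[best[1]]
        if density(merged, edges) < beta:
            break  # no remaining merge satisfies the (assumed) threshold
        patterns = [p for k, p in enumerate(patterns) if k not in best] + [merged]
    return patterns


# Hypothetical toy network: two triangles sharing edge b-c, plus a separate edge e-f.
edges = {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"),
                                ("b", "d"), ("c", "d"), ("e", "f")]}
patterns = [{"a", "b", "c"}, {"b", "c", "d"}, {"e", "f"}]
result = greedy_merge(patterns, edges, beta=0.8)
print(sorted(sorted(p) for p in result))
```

In this toy case the two triangles merge into one four-vertex pattern with five of six possible edges, while the separate edge stays on its own because any merge with it would fall below β.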

  10. MultiMerge (Maximal Matching)
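
The slide body did not survive extraction, so the following is only a sketch of the general idea the title suggests: choose a maximal matching over candidate pattern pairs and merge all matched pairs in one round. The greedy best-score-first rule, the candidate scores, and the function names are assumptions, not details taken from the presentation.

```python
def greedy_maximal_matching(candidates):
    """Pick disjoint pairs greedily, best score first.

    candidates: iterable of (score, i, j) for pattern indices i != j.
    The result is maximal: no remaining candidate has both ends unmatched.
    """
    matched = set()
    matching = []
    for score, i, j in sorted(candidates, reverse=True):
        if i not in matched and j not in matched:
            matching.append((i, j))
            matched.update((i, j))
    return matching


def merge_round(patterns, matching):
    """Merge every matched pair at once; unmatched patterns pass through."""
    used = {k for pair in matching for k in pair}
    merged = [patterns[i] | patterns[j] for i, j in matching]
    return merged + [p for k, p in enumerate(patterns) if k not in used]


# Hypothetical patterns and merge scores (for example, density after merging).
patterns = [{1, 2}, {2, 3}, {3, 4}, {7, 8}]
candidates = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 2, 3)]
matching = greedy_maximal_matching(candidates)
merged_patterns = merge_round(patterns, matching)
print(matching, merged_patterns)
```

Because each pattern takes part in at most one merge per round, a single round can remove many patterns at once, in contrast to merging one pair at a time.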

  11. SingleMerge

  12. Fast merging operation

  13. Performance study on merging unweighted bipartite network patterns • gene-phenotype dataset • All maximal bicliques • Compare the performance of MultiMerge and SingleMerge • Implemented in C++

  14. Running time of MultiMerge and SingleMerge algorithms for summarizing 1,000 to 10,000 patterns

  15. Number of summarized (output) network patterns produced by the MultiMerge and SingleMerge algorithms under various β values.

  16. Merging a large number of network patterns • When the number of network patterns to be merged increases, SingleMerge and MultiMerge reach memory limits before the running time becomes unacceptably long. • What shall we do to handle millions of patterns? The answer is: Batch Processing

  17. Partial Results of merging gene-phenotype datasets

  18. Application study on merging weighted network patterns • Gene coexpression network (Spearman correlation) built on the microarray dataset GSE2034. • Backbone threshold: 0.6

  19. Backbone and merge (figure: example weighted network with threshold = 3, showing the backbone step followed by the merge step)
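
The backbone step can be illustrated with a short sketch. It assumes that the backbone keeps every edge whose weight is at least the threshold and then discards the weights; whether the comparison is inclusive is not stated on the slide. The function name and the edge weights below are hypothetical, since the figure's layout is not recoverable.

```python
def backbone(weighted_edges, threshold):
    """Keep only edges whose weight reaches the threshold; drop the weights.

    weighted_edges maps a (u, v) pair to its weight.
    Returns a set of unweighted edges, each stored as a frozenset {u, v}.
    """
    return {frozenset((u, v))
            for (u, v), w in weighted_edges.items()
            if w >= threshold}  # inclusive comparison is an assumption


# Hypothetical weights, thresholded at 3 as in the slide's example.
weights = {("a", "b"): 4, ("a", "c"): 1, ("b", "c"): 3,
           ("c", "d"): 5, ("b", "d"): 2}
kept = backbone(weights, threshold=3)
print(sorted(sorted(e) for e in kept))
```

The resulting unweighted graph is the kind of input that clique mining and merging operate on.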

  20. Results • Clique mining results: 633,725 cliques • Merging results (β=0.7): 1,130 networks • Passing survival tests (GSE2034, GSE1456, NKI, NKI ER-Neg, NKI LN-Pos): 242 networks

  21. Survival test: LN-Positive

  22. Survival test: ER-Negative

  23. Macro patterns • Obtained either by setting a low β or by specifying the number of macro patterns desired.

  24. Macro patterns: toppgene enrichment

  25. Workflow of network merging (flowchart labels, in order: weighted graphs, thresholding, unweighted graphs, clique or biclique mining algorithms, cliques or bicliques, merging, micro patterns, survival tests, enrichment, merging, macro patterns)

  26. Thanks Questions?
