
DEMON: A Local-first Discovery Method For Overlapping Communities



Presentation Transcript


  1. DEMON: A Local-first Discovery Method For Overlapping Communities. Michele Coscia 1, Giulio Rossetti 2,3, Fosca Giannotti 2, Dino Pedreschi 2,3. 1 Harvard Kennedy School, Cambridge, MA, US, michele_coscia@hks.harvard.edu; 2 ISTI - CNR KDDLab, Pisa, Italy, {fosca.giannotti, giulio.rossetti}@isti.cnr.it; 3 Computer Science Dep., University of Pisa, Italy, pedre@di.unipi.it. KDD 2012, Beijing, August 14th 2012

  2. Outline • Communities and complex networks • A matter of perspective • DEMON Algorithm • Properties • Experiments • Evaluation • Future Works & Conclusions

  3. Communities • Communities can be seen as the basic bricks of a network • In simple, small networks it is easy to identify them by looking at the structure...

  4. …but real-world networks are not “simple” • We can’t easily identify different communities • Too many nodes and edges

  5. Are they two different phenomena? No!

  6. A Matter of Perspective • The only difference is in the scale • Locally, for each node the structure makes sense • Globally, we are tangled in the complex overlap Idea: a bottom-up approach!

  7. Outline • Communities and complex networks • A matter of perspective • DEMON Algorithm • Properties • Experiments • Evaluation • Future Works & Conclusions

  8. Reducing the complexity • Real networks are complex objects • Can we make them “simpler”? • Ego-networks: networks built upon a focal node (the “ego”) and the nodes to which the ego is directly connected (the “alters”), plus the ties, if any, among the alters
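A minimal sketch of the ego-network definition on this slide, assuming an undirected graph without self-loops stored as a dict that maps each node to its set of neighbours. The function name `ego_network` and the toy graph are illustrative and do not come from the slides.

```python
def ego_network(adj, ego):
    """Return the ego network of `ego` as an adjacency dict.

    It contains the ego, its direct neighbours (the alters), and every
    edge of the original graph whose two endpoints are both in that set.
    """
    nodes = {ego} | adj[ego]
    return {u: adj[u] & nodes for u in nodes}


# Toy undirected graph (hypothetical data): node -> set of neighbours.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

# Ego network of "c": nodes a, b, c, d; node "e" and the edge d-e are left out.
ego_c = ego_network(graph, "c")
```

The ego network only sees one hop around the focal node, which is what keeps each local problem small.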

  9. DEMON Algorithm
     • For each node n:
       • Extract the ego network of n
       • Remove n from the ego network
       • Perform a Label Propagation [1]
       • Insert n in each community found
       • Update the raw community set C
     • For each raw community c in C:
       • Merge with “similar” ones in the set, given a threshold (i.e. merge iff at most ε% of the smaller one is not included in the bigger one)
     [1] Usha N. Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E
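The steps on this slide can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation (their code is linked on the last slide). Assumptions: the graph is an undirected dict of neighbour sets with sortable node names, `epsilon` is a fraction in [0, 1] rather than a percentage, merging happens greedily as each local community is produced, and the names `label_propagation`, `merge`, `demon` and `seed` are hypothetical.

```python
import random


def label_propagation(adj, rng):
    """Asynchronous label propagation; returns a list of node sets."""
    labels = {u: u for u in adj}
    nodes = sorted(adj)
    changed = True
    while changed:
        changed = False
        rng.shuffle(nodes)
        for u in nodes:
            if not adj[u]:
                continue  # an isolated node keeps its own label
            counts = {}
            for v in adj[u]:
                counts[labels[v]] = counts.get(labels[v], 0) + 1
            best = max(counts.values())
            if counts.get(labels[u], 0) == best:
                continue  # current label is already among the most frequent
            top = sorted(l for l, c in counts.items() if c == best)
            labels[u] = rng.choice(top)
            changed = True
    groups = {}
    for u, label in labels.items():
        groups.setdefault(label, set()).add(u)
    return list(groups.values())


def merge(communities, candidate, epsilon):
    """Merge iff at most `epsilon` of the smaller set is outside the bigger."""
    merged, kept = set(candidate), []
    for existing in communities:
        small, big = sorted((merged, existing), key=len)
        if len(small - big) <= epsilon * len(small):
            merged |= existing
        else:
            kept.append(existing)
    kept.append(merged)
    return kept


def demon(adj, epsilon=0.25, seed=0):
    rng = random.Random(seed)
    communities = []
    for ego in sorted(adj):
        alters = adj[ego]
        # Ego network of `ego` with the ego itself removed.
        ego_minus_ego = {u: adj[u] & alters for u in alters}
        for local in label_propagation(ego_minus_ego, rng):
            # Put the ego back into every local community, then merge.
            communities = merge(communities, local | {ego}, epsilon)
    return communities


# Two triangles sharing node "c" (hypothetical data).
graph = {
    "a": {"b", "c"}, "b": {"a", "c"},
    "c": {"a", "b", "d", "e"},
    "d": {"c", "e"}, "e": {"c", "d"},
}
result = demon(graph)
```

On this toy graph the sketch finds the two triangles as communities, with the shared node "c" belonging to both, which is the overlap the method is meant to capture. Removing the ego before label propagation matters: with "c" left in, its links to every alter would pull the two groups together.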

  10. Two nice properties
     • Incrementality: Given a graph G, an initial set of communities C and an incremental update ∆G consisting of new nodes and new edges added to G, where ∆G contains the entire ego networks of all new nodes and of all the preexisting nodes reached by new links, then DEMON(∆G ∪ G, C) = DEMON(∆G, DEMON(G, C))
     • Compositionality: Consider any partition of a graph G into two subgraphs G1, G2 such that, for any node v of G, the entire ego network of v in G is fully contained either in G1 or G2. Then, given an initial set of communities C: DEMON(G1 ∪ G2, C) = Max(DEMON(G1, C), DEMON(G2, C))
     These properties make the algorithm highly parallelizable: it can run independently on different fragments of the overall network, with a relatively small amount of combination work

  11. Experiments
     • Networks (with metadata):
       • Congress (nodes: US politicians, connected if they co-sponsor the same bills)
       • IMDb (nodes: actors, connected if they play in the same movies)
       • Amazon (nodes: products, connected if they were purchased together)
     • Compared algorithms:
       • Infomap, non-overlapping state-of-the-art: Rosvall and Bergstrom, “Maps of random walks on complex networks reveal community structure”, PNAS, 2008
       • HLC, overlapping state-of-the-art: Ahn, Bagrow and Lehmann, “Link communities reveal multiscale complexity in networks”, Nature, 2010

  12. Quality Evaluation – Community size • Number of communities • Average community size • [Plot: Amazon]

  13. Quality Evaluation - Label Prediction • Multilabel classifier (BRL, Binary Relevance Learner) • Community memberships of a node are the known attributes; real-world labels (qualitative attributes) are the target to be predicted • [Plots: Congress, IMDb]
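To make the setup on this slide concrete, here is a small sketch of the data layout a binary relevance approach would train on: one binary feature per community, and one separate binary target column per real-world label. It stops before the classifier itself, which the slide does not specify. The function name `binary_relevance_tables` and the node labels are hypothetical.

```python
def binary_relevance_tables(communities, node_labels):
    """Build features (community memberships) and one binary target per label."""
    nodes = sorted(node_labels)
    # Feature vector of a node: 1 if it belongs to community i, else 0.
    features = {u: [int(u in c) for c in communities] for u in nodes}
    label_names = sorted({l for ls in node_labels.values() for l in ls})
    # Binary relevance: each label becomes its own yes/no prediction task.
    targets = {
        l: [int(l in node_labels[u]) for u in nodes] for l in label_names
    }
    return nodes, features, targets


# Hypothetical communities and labels, for illustration only.
communities = [{"a", "b", "c"}, {"c", "d"}]
node_labels = {
    "a": {"dem"},
    "b": {"dem"},
    "c": {"dem", "rep"},
    "d": {"rep"},
}
nodes, features, targets = binary_relevance_tables(communities, node_labels)
```

Each entry of `targets` can then be handed to any binary classifier together with the shared feature vectors; overlapping communities simply show up as nodes with more than one feature set to 1.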

  14. Quality Evaluation - Community Cohesion • How good is our community partition in describing real-world knowledge about the clustered entities? • “Similar nodes share more qualitative attributes than dissimilar nodes” • Iff CQ(P) > 1 we are grouping together similar nodes
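The slide does not give the formula for CQ(P). One plausible reading, offered purely as an assumption, is a ratio: the average attribute similarity of node pairs inside the same community, divided by the average similarity over all node pairs. The sketch below uses Jaccard similarity on attribute sets; the similarity measure, the averaging scheme and every name in it are hypothetical choices.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pair_similarity(nodes, attrs):
    """Average Jaccard similarity over all unordered pairs of `nodes`."""
    pairs = list(combinations(sorted(nodes), 2))
    return sum(jaccard(attrs[u], attrs[v]) for u, v in pairs) / len(pairs)


def community_quality(communities, attrs):
    """Assumed CQ(P): within-community similarity over the global baseline."""
    within = [mean_pair_similarity(c, attrs) for c in communities if len(c) > 1]
    baseline = mean_pair_similarity(attrs, attrs)
    # Assumes at least one community with two or more nodes and a
    # non-zero baseline; a real implementation would guard both cases.
    return (sum(within) / len(within)) / baseline


# Hypothetical attributes: communities that match the attributes exactly.
attrs = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"}}
cq = community_quality([{"a", "b"}, {"c", "d"}], attrs)
```

Here the score comes out above 1, because pairs inside a community share all their attributes while most pairs in the whole graph share none. Grouping nodes at random would push the ratio toward 1.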

  15. Outline • Communities and complex networks • A matter of perspective • DEMON Algorithm • Properties • Experiments • Evaluation • Future Works & Conclusions

  16. Future works
     • Extension to weighted and directed networks (completed)
     • Parallel implementation
     • Modify the merging strategy (in progress)
       • Hierarchical merging
       • …
     • Framework structure
       • i.e. different hosted algorithms that can be used in place of LP to extract communities (according to different definitions)

  17. Conclusions • DEMON approaches the community discovery problem through the analysis of simpler structures (ego-networks) • The proposed algorithm outperforms state-of-the-art methodologies • Possible parallel implementation: high scalability

  18. Thanks! Questions? Code available @ http://kdd.isti.cnr.it/~giulio/demon/
