The Community-search Problem and How to Plan a Successful Cocktail Party

Mauro Sozio and Aristides Gionis. Presented by: Raghu Rangan, Jialiang Bao, Ge Wang.


Presentation Transcript


  1. The Community-search Problem and How to Plan a Successful Cocktail Party • Mauro Sozio and Aristides Gionis • Presented by: Raghu Rangan, Jialiang Bao, Ge Wang

  2. Introduction • Graphs are one of the most popular data representations • They have a wide range of applications • Communities and social networks modeled as graphs have gained attention • People are represented as nodes • Connections between people are edges • This paper focuses on the query-dependent variant of the community search problem

  3. Planning a Cocktail Party • Participants should be “close” to the organizers (e.g. a friend of a friend). • Everybody should know some of the participants. • The graph should be connected. • The number of participants should not be too small • Not too large either • This is difficult [Figure: Bob, Alice, Charlie, David]

  4. Community Search Problem • We need to find the community that a given set of users belongs to. • Given a graph and a set of query nodes, find a densely connected subgraph that contains all of the query nodes.

  5. Related Work • Connectivity Subgraphs • Work has been done on finding a subgraph that connects a set of query nodes • This is not enough • We need to extract the best community that the query nodes define • Community Detection • Finding communities in large graphs and social networks • The typical approach optimizes a modularity measure • The problem is that most methods consider only the static community detection problem

  6. Related Work • Team Formation • Lappas et al. studied this problem • Given a network where nodes are labeled with a set of skills • Find a subgraph in which all skills are present and the communication cost is small • A variant of this problem arises in cocktail party planning

  7. Problem definition • Problem 1: • Given an undirected (connected) graph G(V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (V_H, E_H) of G such that: • V_H contains Q (all query nodes must be included) • H is connected • f(H) is maximized among all feasible choices of H (the larger the better)

  8. Query node and goodness function? • Problem 1: • Given an undirected (connected) graph G(V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (V_H, E_H) of G such that: • V_H contains Q (all query nodes must be included) • H is connected • f(H) is maximized among all feasible choices of H (the larger the better) • What is a query node? • Query nodes are the input nodes that the community must contain. • What is a goodness function? • It measures how dense the subgraph is. Candidates: • Average degree • Minimum degree

  9. Why not choose the average degree function? • It leads to unintuitive results • It is easy to increase the average degree by adding an unrelated but dense part of the graph
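
To make the drawback concrete, here is a minimal Python sketch on a hypothetical toy graph (not taken from the slides): a 4-clique of friends connected through a single node `p` to an unrelated 6-clique. All function and node names are illustrative assumptions.

```python
def build_graph(edges):
    """Build an undirected graph as {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the edges between them."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}

def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def minimum_degree(adj):
    return min(len(nbrs) for nbrs in adj.values())

def clique(names):
    """All pairwise edges among the given node names."""
    return [(u, v) for i, u in enumerate(names) for v in names[i + 1:]]

# Hypothetical toy data: friends form the intended community,
# strangers form an unrelated dense part reachable only through p.
friends = ["a", "b", "c", "d"]
strangers = ["x0", "x1", "x2", "x3", "x4", "x5"]
toy = build_graph(clique(friends) + clique(strangers) + [("c", "p"), ("p", "x0")])
community = induced_subgraph(toy, friends)

print("community:", average_degree(community), minimum_degree(community))
print("whole graph:", average_degree(toy), minimum_degree(toy))
```

In this toy graph the whole graph has the higher average degree (46/11 versus 3), so an average-degree objective would pull in the unrelated clique, while the minimum degree drops from 3 to 2 because of the connector node `p`.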

  10. Problem definition • Problem 2: • Given an undirected (connected) graph G(V, E), a set of query nodes Q, a goodness function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that: • V_H contains Q (all query nodes must be included) • H is connected • D_Q(H) ≤ d • f(H) is maximized among all feasible choices of H (the larger the better) • We now have a distance constraint.
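
The slides do not spell out how D_Q(H) is computed. The sketch below assumes one plausible definition: for each node v, sum the squared shortest-path distances from v to every query node, and take the maximum over all nodes of H. Both this definition and the function names are assumptions made for illustration, and H is assumed to be connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def query_distance(adj, query_nodes):
    """Assumed definition: max over nodes v of sum over q in Q of dist(v, q)^2.

    Requires a connected graph; an unreachable node would raise KeyError.
    """
    per_query = [bfs_distances(adj, q) for q in query_nodes]
    return max(sum(d[v] ** 2 for d in per_query) for v in adj)

def satisfies_distance_bound(adj, query_nodes, d):
    """Check the constraint D_Q(H) <= d for a candidate subgraph H."""
    return query_distance(adj, query_nodes) <= d

# Hypothetical example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(query_distance(path, ["b", "c"]))
```

With query nodes `b` and `c`, the end nodes `a` and `d` are the farthest (1² + 2² = 5), so any bound d below 5 would rule out the full path as a community.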

  11. Maximizing the minimum degree • Greedy algorithm: • Steps: (1) Set G_0 = G. (2) Delete the minimum-degree node and all its edges; repeat step 2 until a termination condition holds. • Termination condition, either: • At least one of the query nodes in Q has minimum degree • The query nodes in Q are no longer connected
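
The peeling steps above can be sketched in Python as follows. This is a deliberately simple, non-linear-time version under two assumptions that the slide does not state: nodes cut off from the query nodes are discarded immediately, and the function returns the intermediate subgraph with the largest minimum degree seen during peeling. Names such as `greedy_min_degree` are illustrative.

```python
from collections import deque

def component_of(adj, alive, start):
    """Nodes reachable from start using only nodes in alive."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in alive and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def greedy_min_degree(adj, query_nodes):
    """Peel minimum-degree nodes; return (best node set, its minimum degree)."""
    query = set(query_nodes)
    alive = set(adj)
    best_nodes, best_score = None, -1
    while True:
        comp = component_of(adj, alive, next(iter(query)))
        if not query <= comp:
            break  # termination: the query nodes are no longer connected
        deg = {u: sum(1 for v in adj[u] if v in comp) for u in comp}
        u_min = min(comp, key=deg.get)
        if deg[u_min] > best_score:
            best_nodes, best_score = set(comp), deg[u_min]
        if deg[u_min] == min(deg[q] for q in query):
            break  # termination: a query node has minimum degree
        # Assumption: nodes outside comp can never rejoin, so drop them too.
        alive = comp - {u_min}
    return best_nodes, best_score

def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clique(names):
    return [(u, v) for i, u in enumerate(names) for v in names[i + 1:]]

# Hypothetical toy graph: a 4-clique and a 6-clique joined through node p.
toy = build_graph(clique(["a", "b", "c", "d"])
                  + clique(["x0", "x1", "x2", "x3", "x4", "x5"])
                  + [("c", "p"), ("p", "x0")])
print(greedy_min_degree(toy, ["a", "b"]))
```

On this toy graph the first deletion removes `p`, which disconnects the unrelated clique, and the 4-clique containing the query nodes is returned with minimum degree 3.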

  12. Time complexity? • Greedy can be implemented in linear time. • Idea: • Keep a separate list of the nodes with degree d, for d = 1, …, n • When a node u is removed from G, each neighbor of u with degree d is moved from list d to list d – 1, so the total number of moves is O(m) (m is the number of edges) • We can locate the minimum-degree node in O(1) time, so the running time is O(n + m)
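
The bucket idea can be sketched as below. This sketch shows only the degree-bucket bookkeeping and peels the whole graph; the query-node termination checks are left out for brevity. It uses Python sets as the per-degree lists, and the minimum is found with a pointer that steps back by at most one after each deletion, which is constant time in the amortized sense. The function name `peel_order` is an assumption.

```python
def peel_order(adj):
    """Return [(node, degree at deletion)] in min-degree peeling order.

    adj is {node: set(neighbors)} for a simple undirected graph.
    """
    n = len(adj)
    degree = {u: len(adj[u]) for u in adj}
    buckets = [set() for _ in range(n)]  # the maximum degree is n - 1
    for u, d in degree.items():
        buckets[d].add(u)

    removed = set()
    order = []
    d = 0  # pointer to the smallest possibly non-empty bucket
    for _ in range(n):
        while not buckets[d]:
            d += 1
        u = buckets[d].pop()
        removed.add(u)
        order.append((u, d))
        for v in adj[u]:
            if v not in removed:
                # Move neighbor v from bucket degree[v] to bucket degree[v] - 1.
                buckets[degree[v]].discard(v)
                degree[v] -= 1
                buckets[degree[v]].add(v)
        # Remaining degrees dropped by at most one, so step back one bucket.
        d = max(d - 1, 0)
    return order

# Hypothetical example: a 4-clique a, b, c, d with a pendant node e attached to a.
example = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
print(peel_order(example))
```

Each edge causes at most one bucket move per endpoint deletion, which matches the O(m) bound on moves stated above.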

  13. Generalization to monotone functions • The minimum degree function is one member of the family of node-monotone functions. • Sometimes we want a different function from this family to define density.

  14. Problem definition • Problem 3: • Given an undirected (connected) graph G(V, E), a set of query nodes Q, a node-monotone function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that: • V_H contains Q (all query nodes must be included) • H is connected • D_Q(H) ≤ d • f(H) is maximized among all feasible choices of H (the larger the better) • We now have a node-monotone function.

  15. GreedyGen • Greedy algorithm: • Steps: (1) Set G_0 = G. (2) Delete the minimum-degree node. (3) Delete the node v for which f(G, v) is minimum, together with all its edges; go to 3. • Termination condition, either: • At least one of the query nodes in Q has the minimum f(G, v) • The query nodes in Q are no longer connected

  16. Communities with Size Restriction • Drawback of the previous algorithms • They may return very large subgraphs.

  17. Complexity • Formal definition of the minimum degree problem with an upper bound on the size • An integer k (size constraint) • Subgraph H has at most k nodes • The problem is NP-hard

  18. Algorithm • Two heuristics that can be used to find communities with bounded size • Inspired by the Greedy algorithm for maximizing the minimum degree • GreedyDist, GreedyFast

  19. Algorithm • GreedyDist • The tighter the distance constraint is, the smaller the communities are

  20. Algorithm • GreedyDist • Invoke GreedyGen • If the query nodes are connected but the size constraint is not satisfied, re-execute GreedyGen with a tighter distance constraint • Repeat until the size constraint is satisfied or the query nodes are disconnected

  21. Algorithm • GreedyFast • Preprocess: the input graph is restricted to the k′ nodes closest to the query nodes • Execute Greedy on the restricted graph • Rationale: the closer a node is to the query nodes, the more related it is to them, and the more likely it is to belong to their community
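
The preprocessing step can be sketched as a truncated breadth-first search. The slide does not say how closeness to a set of query nodes is measured; this sketch assumes hop distance to the nearest query node, with ties broken arbitrarily. The function name `restrict_to_closest` and the parameter `k_prime` (standing in for k′) are illustrative.

```python
from collections import deque

def restrict_to_closest(adj, query_nodes, k_prime):
    """Induced subgraph on the k_prime nodes closest to the query nodes.

    Assumption: closeness is hop distance to the nearest query node.
    Query nodes are always kept, even if k_prime is smaller than |Q|.
    adj is {node: set(neighbors)}.
    """
    kept = list(dict.fromkeys(query_nodes))  # de-duplicate, keep order
    seen = set(kept)
    queue = deque(kept)
    # Multi-source BFS visits nodes in non-decreasing distance order,
    # so stopping early keeps exactly the closest nodes.
    while queue and len(kept) < k_prime:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                kept.append(v)
                queue.append(v)
                if len(kept) >= k_prime:
                    break
    keep = set(kept)
    return {u: adj[u] & keep for u in keep}

# Hypothetical example: the path a - b - c - d - e with query node c.
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(restrict_to_closest(path, ["c"], 3))
```

The restricted graph would then be handed to Greedy, which runs on at most k′ nodes regardless of the size of the original graph.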

  22. Experimental Evaluation • DBLP • A coauthorship graph extracted from a recent snapshot of the DBLP database • 226K nodes, 1.4M edges • Tag • A tag graph extracted from the Flickr photo-sharing portal • 38K nodes, 1.3M edges • BIOMINE • A graph extracted from the database of the Biomine project • 16K nodes, 491K edges

  23. Quantitative Results • BASELINE: a simple and natural baseline algorithm • |Q|: the number of query nodes • d: distance bound • k: size bound • l: inter-distance between query nodes

  24. Quantitative Results

  25. Conclusion • Aim: find a compact, densely connected community that contains the given query nodes • Measurement based on constraints • Minimum degree • Distance • Size • Heuristics • GreedyGen • GreedyDist • GreedyFast

  26. Questions?
