
Measure Proximity on Graphs with Side Information

Measure Proximity on Graphs with Side Information. Joint Work by Hanghang Tong, Huiming Qu, Hani Jamjoom. Speaker: Mary McGlohon. 15-19 December, 2008. ICDM 2008, Pisa, Italy. Cyano: Process Collaboration Wiki. Q: How to enable social recommendation?


Presentation Transcript


  1. Measure Proximity on Graphs with Side Information. Joint Work by Hanghang Tong, Huiming Qu, Hani Jamjoom. Speaker: Mary McGlohon. 15-19 December, 2008, ICDM 2008, Pisa, Italy

  2. Cyano: Process Collaboration Wiki • Q: How to enable social recommendation in Cyano?

  3. Scoop: current recommendation system [Qu+ SCC 2008] • Given a node in a graph (e.g., a user node in a user-to-process graph), find: • 1. [Ranking List] a list of recommended nodes that are most related to the query node (what to recommend) • 2. [Connection Subgraph] a connection subgraph that best interprets the relationship between the query node and the recommended node(s) (why to recommend) • Proximity is the core of Scoop!

  4. Challenges in Scoop • How to incorporate users’ feedback (like/dislike)? • Feedback on ranking list: how to automatically adjust the ranking for the query node 1? • Feedback on conn-graph: how to modify our subgraph to weaken the links between 1 and 10 that involve node 5? • Q: How to incorporate such side information in measuring node proximity on graphs? [Figure: current subgraph between 1 and 10, with nodes 1, 2, 3, 4, 5, 10]

  5. Isomorphic Settings of Scoop • Proximity is the Main Tool for • Neighborhood search • Anomaly detection • Pattern matching • Image captioning • … • Source of Side Information is Rich • Ratings in recommendation system • Opinion/sentiment in blog analysis • Clickthrough data • …

  6. Roadmap • Motivations • Proximity w/o Side Information • Proximity w/ Side Information • ProSIN: Method • Fast-ProSIN: Fast Solution • Experimental Results • Conclusion

  7. Proximity on Graph: What? a.k.a Relevance, Closeness, ‘Similarity’…

  8. What is a “good” Proximity? • Multiple Connections • Quality of connection • Direct & indirect connections • Length, Degree, Weight…

  9. Sol: Random walk with restart [Pan+ KDD 2004] • Ranking vector: nearby nodes, higher scores [Figure: 12-node example graph annotated with ranking scores; more red, more relevant]

  10. Why is RWR a good score? • Notation: adjacency matrix; c: damping factor; i: query node; j: target node • The score combines: all paths from i to j with length 1 • all paths from i to j with length 2 • all paths from i to j with length 3 • …
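The equation on this slide did not survive extraction. As a hedged reconstruction, the standard random-walk-with-restart fixed point can be expanded into a sum over path lengths. The symbols below (\tilde{W} for the normalized adjacency matrix, e_i for the starting vector of query i, r_i for the ranking vector) are assumed notation, not taken from the slide; only c (damping factor) is named in the transcript. The expansion assumes 0 < c < 1 and that the columns of \tilde{W} sum to at most 1, so the series converges.

```latex
% Assumed notation: \tilde{W} = normalized adjacency matrix,
% e_i = starting (indicator) vector of query i, r_i = ranking vector,
% c = damping factor with 0 < c < 1.
\begin{align}
  r_i &= c\,\tilde{W}\, r_i + (1-c)\, e_i \\
      &= (1-c)\,\bigl(I - c\,\tilde{W}\bigr)^{-1} e_i \\
      &= (1-c) \sum_{k=0}^{\infty} c^{k}\, \tilde{W}^{k} e_i
\end{align}
```

Entry j of \tilde{W}^{k} e_i aggregates the weighted paths of length k from i to j, so the score rewards many short, high-weight connections and discounts longer ones by c^k.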

  11. Proximity in Current Scoop • Ranking List: initial result: P2, P3, P1 [Figure: user-process graph with users U1-U4 and processes P1-P5] • Conn-Subgraph [Figure: connection subgraph between nodes 1 and 10 within a 10-node graph]

  12. Roadmap • Motivations • Proximity w/o Side Information • Proximity w/ Side Information • ProSIN: Method • Fast-ProSIN: Fast Solution • Experimental Results • Conclusion

  13. ProSIN: Challenges • We want to • Boost the neighbor of 4 • Penalize the neighbor of 6 [Figure: example graph with the query node marked]

  14. ProSIN: How to • Use side information to refine the graph! [Figure: refined example graph with the query node marked]

  15. ProSIN: Detailed Algorithm • Input: • A weighted directed graph A • Source node s and target node t • Side information: the positive set P and the negative set N • Output: • Proximity score from the source to the target • Method: • Add a link from the source node to each of the positive nodes x • Introduce a sink node into the graph • For each of the negative nodes y: • Find its neighboring nodes • Add a link from node y to the sink • Add a link from each neighboring node of node y to the sink • Perform random walk with restart for the source node s on the refined graph • Output the proximity score as the steady-state probability that the random particle will finally stay at the target node t
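The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide does not say what weight the added links carry, so `boost` and `penalty` are hypothetical parameters, the sink is modeled as a node with no outgoing links (walk mass that reaches it is lost), and the small user-process graph is invented for the example.

```python
def rwr(weights, nodes, source, c=0.5, iters=200):
    """RWR scores from `source`; weights[u][v] is the weight of edge u -> v."""
    out_sum = {u: sum(weights.get(u, {}).values()) for u in nodes}
    r = {u: 1.0 if u == source else 0.0 for u in nodes}
    for _ in range(iters):
        # Restart mass goes to the source; the rest follows out-links.
        nxt = {u: (1 - c) if u == source else 0.0 for u in nodes}
        for u in nodes:
            for v, w in weights.get(u, {}).items():
                nxt[v] += c * r[u] * w / out_sum[u]
        r = nxt
    return r


def prosin(weights, nodes, source, target, positive, negative,
           c=0.5, boost=1.0, penalty=1.0):
    """Refine the graph with side information, then run RWR from `source`.

    `boost` and `penalty` are assumed link weights (not given on the slide).
    """
    refined = {u: dict(weights.get(u, {})) for u in nodes}
    sink = "__sink__"  # hypothetical name for the added sink node
    for x in positive:
        # Step 1: link the source to each positive node.
        refined[source][x] = refined[source].get(x, 0.0) + boost
    for y in negative:
        # Step 3: link each negative node and its neighbors to the sink.
        neighbors = [u for u in nodes if u != y
                     and (y in weights.get(u, {}) or u in weights.get(y, {}))]
        for u in [y] + neighbors:
            refined[u][sink] = refined[u].get(sink, 0.0) + penalty
    refined[sink] = {}  # the sink has no out-links
    return rwr(refined, list(nodes) + [sink], source, c)[target]


def undirected(edges):
    """Build a symmetric unit-weight adjacency dict from an edge list."""
    w = {}
    for u, v in edges:
        w.setdefault(u, {})[v] = 1.0
        w.setdefault(v, {})[u] = 1.0
    return w


# Invented user-process graph; U2 is the query.
graph = undirected([("U2", "P1"), ("U2", "P2"), ("U1", "P2"), ("U1", "P3")])
nodes = ["U1", "U2", "P1", "P2", "P3"]

before_p2 = prosin(graph, nodes, "U2", "P2", positive=[], negative=[])
after_no_p2 = prosin(graph, nodes, "U2", "P2", positive=[], negative=["P2"])
before_p3 = prosin(graph, nodes, "U2", "P3", positive=[], negative=[])
after_yes_p3 = prosin(graph, nodes, "U2", "P3", positive=["P3"], negative=[])
```

In this toy graph, saying "no" to P2 lowers P2's score for U2, and saying "yes" to P3 raises P3's score, which is the qualitative behavior the slide describes.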

  16. Process management • Given a user-process graph with ‘U2’ as the query, which are the top 3 most related processes? • Initial result (no feedback): P2, P3, P1 • Updated result (‘no’ to ‘P2’): P3, P4, P5 [Figure: user-process graph with users U1-U4 and processes P1-P5]

  17. Roadmap • Motivations • Proximity w/o Side Information • Proximity w/ Side Information • ProSIN: Method • Fast-ProSIN: Fast Solution • Experimental Results • Conclusion

  18. Computing RWR • Equation terms and sizes: ranking vector (n x 1), adjacency matrix (n x n), starting vector (n x 1), restart probability [Figure: 12-node example graph]

  19. Q: Given query i, how to solve it? [Figure: the RWR equation for the query, with the ranking vector unknown; labels: starting vector, ranking vector, adjacency matrix]

  20. OntheFly [Figure: 12-node example graph with the scores still unknown]

  21. OntheFly • No pre-computation / light storage • Slow on-line response: O(mE) [Figure: 12-node example graph before and after the scores are computed]
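As a rough illustration of the on-the-fly approach, the ranking vector can be obtained by repeating the update r ← c·W·r + (1 − c)·e until it stops changing. The sketch below is an assumption-laden toy, not the code behind the slides: it uses dense pure-Python lists, column normalization, c = 0.5, and an invented 4-node path graph. Each sweep touches every edge, which is consistent with the O(mE) on-line cost if m is read as the number of iterations and E as the number of edges.

```python
def column_normalize(adj):
    """Scale each column of a dense matrix to sum to 1 (zero columns stay zero)."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def rwr_on_the_fly(adj, query, c=0.5, tol=1e-12, max_iter=1000):
    """Iterate r <- c * W r + (1 - c) * e until the L1 change drops below tol."""
    n = len(adj)
    w = column_normalize(adj)
    e = [1.0 if i == query else 0.0 for i in range(n)]  # starting vector
    r = e[:]
    for _ in range(max_iter):
        new_r = [c * sum(w[i][j] * r[j] for j in range(n)) + (1 - c) * e[i]
                 for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Invented example: undirected path graph 0 - 1 - 2 - 3 with unit weights.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = rwr_on_the_fly(path, query=0)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

With these settings the scores decrease with distance from the query node, so `ranking` lists the nodes in path order starting from node 0.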

  22. NB_Lin [Tong+ ICDM06] • Pre-Compute Stage • Step 1: • Step 2: • On-Line Stage • 2 matrix-vector multiplications • Fast response if … • The desired graph is unknown [Figure: 12-node example graph grouped into C1, C2, C3; matrices U, S, V and W~]
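The two pre-compute steps are blank in the transcript, so the following is only a plausible reading, not a recovery of the slide. If the normalized adjacency matrix is approximated by a low-rank product (an assumption suggested by the U, S, V labels), the Sherman-Morrison-Woodbury identity turns the n x n inverse in the RWR closed form into a small t x t inverse that can be computed once, leaving only a couple of thin matrix-vector products per query. The symbols \tilde{W}, e_i, r_i, \Lambda and t are assumed notation.

```latex
% Assumption: \tilde{W} \approx U S V, with U (n x t), S (t x t, invertible),
% V (t x n) and t << n. Woodbury identity:
%   (I - c\,U S V)^{-1} = I + c\,U\,(S^{-1} - c\,V U)^{-1}\,V
\begin{align}
  \Lambda &= \bigl(S^{-1} - c\,V U\bigr)^{-1}
      && \text{(pre-computed once, } t \times t\text{)} \\
  r_i &\approx (1-c)\,\bigl(e_i + c\,U\,\Lambda\,V e_i\bigr)
      && \text{(on-line, per query)}
\end{align}
```

The on-line step needs one product with V and one with U, which agrees with the "2 matrix-vector multiplications" noted on the slide. It also shows why a graph that is unknown ahead of time is a problem: U, S, V and \Lambda all depend on the graph.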

  23. How to rescue: Fast-ProSIN • The graphs before and after feedback have a lot of overlap! • Pre-compute on the original graph • Update in the on-line stage

  24. Roadmap • Motivations • Proximity w/o Side Information • Proximity w/ Side Information • ProSIN: Method • Fast-ProSIN: Fast Solution • Experimental Results • Conclusion

  25. Experimental Setup • Data Sets • DBLP-AC • Author-Conference bipartite graph; 400K authors; 3.5K conferences; 1M edges • DBLP-ML • Co-authorship graph from ICML and NIPS; 4.5K nodes, 20K edges • Coral • Image-Region-Keyword graph, 52K nodes, 350K edges • We want to check • The effectiveness of ProSIN • The efficiency of Fast-ProSIN

  26. Interactive Neighborhood Search What are most related conferences wrt KDD? (DBLP author-conference bipartite graph)

  27. Interactive Neighborhood Search What are most related conferences wrt KDD? (DBLP author-conference bipartite graph)

  28. Interactive Neighborhood Search What are most related conferences wrt KDD? (DBLP author-conference bipartite graph)

  29. Connection Subgraph: Initial Result (between “Andrew McCallum” and “Yiming Yang”) • There are two main connections between “McCallum” and “Yang” [Figure: connection subgraph with nodes Andrew McCallum, Yiming Yang, Tom M. Mitchell, Seán Slattery, Rebecca Hutchinson, Rayid Ghani, Xuerui Wang, Jian Zhang, John D. Lafferty, Zoubin Ghahramani; region labels: Text Mining, Information Retrieval, Statistics; edge weights omitted]

  30. Connection Subgraph: After Feedback (between “Andrew McCallum” and “Yiming Yang”, but avoid “Tom M. Mitchell”) • The feedback steers the result away from the entire ‘Text’ connection and brings in more connections on ‘Statistics’ [Figure: connection subgraph with nodes Andrew McCallum, Yiming Yang, Andrew Ng, Michael I. Jordan, Rong Jin, John D. Lafferty, Zoubin Ghahramani, Fernando C.N. Pereira, Xiaojin Zhu, Jian Zhang; edge weights omitted]

  31. Automatic Image Caption • Q: How to assign keywords to the test image? [Figure: Image-Region-Keyword graph with a test image; keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]

  32. Semi-automatic image caption (precision) • The 5 keywords most relevant to the test image are returned for users’ yes/no confirmation [Plot: precision vs. predict length; curves: Our method, Linear Combination, Baseline, Remove Negative Nodes]

  33. Semi-automatic image caption (recall) [Plot: recall vs. predict length; curves: Our method, Baseline, Linear Combination, Remove Negative Nodes]
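The slides do not spell out how precision and recall are computed for the predicted captions. A minimal sketch, assuming the usual top-k definitions (precision: fraction of the k returned keywords that are correct; recall: fraction of the true keywords that were returned), with an invented ranked list and ground truth:

```python
def precision_recall_at_k(ranked_keywords, true_keywords, k=5):
    """Top-k precision and recall under the standard (assumed) definitions."""
    top_k = ranked_keywords[:k]
    hits = sum(1 for kw in top_k if kw in true_keywords)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_keywords) if true_keywords else 0.0
    return precision, recall


# Invented example: keywords ranked by proximity to a test image.
ranked = ["Sea", "Sun", "Sky", "Cat", "Wave", "Grass"]
truth = {"Sea", "Sky", "Wave"}
p, r = precision_recall_at_k(ranked, truth, k=5)
```

Here k plays the role of the "predict length" on the plots' horizontal axis: sweeping k from 1 upward traces out one precision curve and one recall curve per method.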

  34. Fast-ProSIN: Quality-Speed Trade-off • 93.0%+ quality preserving • Up to 49x speed-up [Plots: precision, recall, time]

  35. Conclusion • Goal: Incorporate Users’ Feedback (Like/Dislike) in Proximity Measurement on Graphs • Q: How to customize Tom’s applications? • A: ProSIN • Basic Idea: Bias Random Walk • Wide Applicability, Easy to Use • Q: How to reflect Tom’s real-time interest? • A: Fast-ProSIN • Basic Idea: Explore smoothness • Significant speedup (minutes to seconds)

  36. Q & A Thank you! htong@cs.cmu.edu hqu@us.ibm.com jamjoom@us.ibm.com
