Cross-domain Collaboration Recommendation

Presentation Transcript


  1. Cross-domain Collaboration Recommendation Jie Tang Tsinghua University

  2. Networked World • 1.3 billion users • 700 billion minutes/month • 280 million users • 80% of users were born in the 1980s or 1990s • 555 million users • 0.5 billion tweets/day • 560 million users • influencing our daily life • 79 million users per month • >10 billion items/year • 800 million users • ~50% of revenue from network life • 500 million users • 57 billion on 11/11

  3. Cross-domain Collaboration • Interdisciplinary collaborations have generated huge impact. For example, 51 (>1/3) of the KDD 2012 papers are the result of cross-domain collaborations between graph theory, visualization, economics, medical informatics, DB, NLP, and IR • Research field evolution: Biology + Computer Science → Bioinformatics • [1] Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. Cross-domain Collaboration Recommendation. In KDD'12, pages 1285-1293, 2012.

  4. Collaborative Development • It is nearly impossible to create any substantial piece of software alone, particularly a large system • The collaborative software development model began to see widespread adoption with the Linux kernel in 1991

  5. Cross-domain Collaboration (cont.) • Increasing trend of cross-domain collaborations among Data Mining (DM), Medical Informatics (MI), Theory (TH), and Visualization (VIS)

  6. Challenges (illustrated with Data Mining vs. Theory: large graphs, heterogeneous networks, and social networks on one side; automata theory, graph theory, and complexity theory on the other) 1. Sparse connection: cross-domain collaborations are rare (<1%) 2. Complementary expertise: cross-domain collaborators often have different expertise and interests 3. Topic skewness: cross-domain collaboration topics concentrate on a small subset (~9%) of topics

  7. Related Work - Collaboration Recommendation • Collaborative topic modeling for recommending papers. C. Wang and D. M. Blei. [2011] • On social networks and collaborative recommendation. I. Konstas, V. Stathopoulos, and J. M. Jose. [2009] • CollabSeer: a search engine for collaboration discovery. H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles. [2007] • Referral Web: combining social networks and collaborative filtering. H. Kautz, B. Selman, and M. Shah. [1997] • Fab: content-based, collaborative recommendation. M. Balabanović and Y. Shoham. [1997]

  8. Related Work - Expert Finding and Matching • Topic level expertise search over heterogeneous networks. J. Tang, J. Zhang, R. Jin, Z. Yang, K. Cai, L. Zhang, and Z. Su. [2011] • Formal models for expert finding in enterprise corpora. K. Balog, L. Azzopardi, and M. de Rijke. [2006] • Expertise modeling for matching papers with reviewers. D. Mimno and A. McCallum. [2007] • On optimization of expertise matching with various constraints. W. Tang, J. Tang, T. Lei, C. Tan, B. Gao, and T. Li. [2012]

  9. Approach Framework: Cross-domain Topic Learning (CTL)

  10. Author Matching [Figure: the source domain GS (Data Mining, authors v1 ... vN including the query user vq) and the target domain GT (Medical Informatics, authors v'1 ... v'N'), connected by coauthorships within each domain and by the sparse cross-domain coauthorships between them]
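
The Author Matching step corresponds to the Author baseline on slide 21: candidates in the target domain are ranked by a random walk with restart from the query author over the coauthorship graph. Below is a minimal illustrative sketch, assuming the graph is given as an adjacency matrix; the function name, the restart parameter value, and the toy graph are placeholders, not details from the paper.

```python
import numpy as np

def random_walk_with_restart(adj, query_idx, tau=0.15, max_iter=100, tol=1e-8):
    """Rank authors by a random walk with restart from the query author v_q.

    adj: (N, N) weighted adjacency matrix of the coauthorship graph.
    query_idx: index of the query author.
    tau: restart probability (the parameter varied on slide 23).
    Returns a length-N vector of stationary visiting probabilities.
    """
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated authors
    P = adj / col_sums                     # column-normalized transition matrix
    restart = np.zeros(n)
    restart[query_idx] = 1.0
    r = restart.copy()
    for _ in range(max_iter):
        r_next = (1 - tau) * P @ r + tau * restart
        delta = np.abs(r_next - r).sum()
        r = r_next
        if delta < tol:
            break
    return r

# Toy usage: 4 authors, author 0 is the query; candidates are ranked by score.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
print(random_walk_with_restart(A, query_idx=0))
```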

  11. Topic Matching [Figure: topics are first extracted separately in each domain, z1 ... zT for the source domain GS (Data Mining) and z'1 ... z'T for the target domain GT (Medical Informatics); topic correlations link the two sides, addressing the complementary-expertise and topic-skewness challenges]
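
To make the topic-correlation idea concrete, here is a minimal sketch of one plausible correlation measure: cosine similarity between topic-word distributions over a shared vocabulary. This is an illustrative choice, not necessarily the measure used in the paper.

```python
import numpy as np

def topic_correlations(phi_src, phi_tgt):
    """Pairwise correlation between source topics z_1..z_T and target topics z'_1..z'_T'.

    phi_src: (T, V) topic-word distributions for the source domain (rows sum to 1).
    phi_tgt: (T', V) topic-word distributions for the target domain, same vocabulary.
    Returns a (T, T') matrix of cosine similarities.
    """
    src = phi_src / np.linalg.norm(phi_src, axis=1, keepdims=True)
    tgt = phi_tgt / np.linalg.norm(phi_tgt, axis=1, keepdims=True)
    return src @ tgt.T

# Toy usage over a 5-word shared vocabulary (values are made up).
phi_dm = np.array([[0.4, 0.3, 0.1, 0.1, 0.1],    # e.g., a "graph mining" topic
                   [0.1, 0.1, 0.2, 0.3, 0.3]])   # e.g., a "text mining" topic
phi_mi = np.array([[0.3, 0.4, 0.1, 0.1, 0.1]])   # e.g., a "clinical records" topic
print(topic_correlations(phi_dm, phi_mi))
```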

  12. Topic Matching

  13. Cross-domain Topic Learning [Figure: CTL identifies "cross-domain" topics z1 ... zK shared between the Data Mining authors v1 ... vN (including the query user vq) and the Medical Informatics authors v'1 ... v'N'] • [1] Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. Cross-domain Collaboration Recommendation. In KDD'12, pages 1285-1293, 2012.

  14. Collaboration Topics Extraction (Step 1 and Step 2; the model details are given in [1]) • [1] Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. Cross-domain Collaboration Recommendation. In KDD'12, pages 1285-1293, 2012.

  15. Intuitive explanation of Step 2 in CTL [Figure: collaboration topics]

  16. Model Learning • Model learning with Gibbs sampling: we sample z and s, and then use the sampled z and s to infer the unknown distributions.

  17. Model Learning (cont.) • We sample s to determine whether a word is generated by the collaboration or by the author alone.

  18. Model Learning (cont.) • If s = 0, we sample a pair of collaborators (v, v'), construct a new topic distribution for the two collaborators, and then sample the topic from that new distribution.
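
Slides 16-18 describe the sampler only in words. The following is a minimal, illustrative sketch of the per-word sampling step with the switch variable s, assuming the current per-author topic distributions are available; the prior lam, the 50/50 mixing of the pair's distributions, and the function name are placeholders, not the exact conditional distributions of the CTL Gibbs sampler (see [1] for those).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_word_assignment(theta, author, collaborators, lam=0.5):
    """One simplified per-word sampling step mirroring slides 16-18.

    theta: dict mapping author id -> length-K topic distribution (sums to 1),
           the current per-author topic estimates.
    author: id of the author v who wrote the word.
    collaborators: list of cross-domain collaborators v' of v.
    lam: illustrative prior probability that a word is generated by the author alone.
    Returns (s, z, pair): the switch value, the sampled topic, and the collaborator
    pair (or None).
    """
    # Sample the switch s: 1 = the author alone, 0 = a collaboration (as on slide 18).
    s = rng.binomial(1, lam) if collaborators else 1
    if s == 1:
        z = rng.choice(len(theta[author]), p=theta[author])
        return s, z, None
    # s == 0: pick a collaborator v', build a new topic distribution for the pair,
    # and sample the topic from that distribution.
    v_prime = rng.choice(collaborators)
    mixed = 0.5 * (theta[author] + theta[v_prime])
    z = rng.choice(len(mixed), p=mixed)
    return s, z, (author, v_prime)

# Toy usage with K = 3 topics and two authors.
theta = {"v1": np.array([0.7, 0.2, 0.1]), "v2": np.array([0.1, 0.2, 0.7])}
print(sample_word_assignment(theta, "v1", ["v2"]))
```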

  19. Experiments

  20. Data Set and Baselines • ArnetMiner (available at http://arnetminer.org/collaboration) • Baselines • Content Similarity (Content) • Collaborative Filtering (CF) • Hybrid • Katz • Author Matching (Author), Topic Matching (Topic)

  21. Performance Analysis Training: collaborations before 2001; validation: 2001-2005. Content Similarity (Content): based on the similarity between authors' publications. Collaborative Filtering (CF): based on existing collaborations. Hybrid: a linear combination of the scores obtained by the Content and CF methods. Katz: the best link predictor in the link-prediction problem for social networks. Author Matching (Author): based on a random walk with restart on the collaboration graph. Topic Matching (Topic): incorporates the extracted topics into the random walk algorithm.
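
For reference, the Katz baseline scores a candidate pair by a damped sum over the numbers of paths of each length between them in the coauthorship graph. A minimal sketch follows, truncating the sum at a fixed maximum path length; the decay parameter beta and max_len are illustrative values, not those used in the experiments.

```python
import numpy as np

def katz_scores(adj, beta=0.05, max_len=4):
    """Katz link-prediction scores: sum over l of beta^l * (number of paths of length l).

    adj: (N, N) adjacency matrix of the coauthorship graph.
    Returns an (N, N) matrix of pairwise scores.
    """
    n = adj.shape[0]
    scores = np.zeros((n, n))
    power = np.eye(n)
    for l in range(1, max_len + 1):
        power = power @ adj           # adj^l: path counts of length l
        scores += (beta ** l) * power
    return scores

# Toy usage: scores of all authors with respect to author 0.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
print(katz_scores(A)[0])
```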

  22. Performance on New Collaboration Prediction CTL still maintains a MAP of about 0.3, significantly higher than the baselines.

  23. Parameter Analysis (a) varying the number of topics T; (b) varying the α parameter; (c) varying the restart parameter τ in the random walk; (d) convergence analysis

  24. Prototype System http://arnetminer.org/collaborator Treemap: represents subtopics in the target domain. Recommends collaborators and their relevant publications.

  25. From Peer Collaboration to Team Collaboration

  26. Motivation: Task-Collaborator Assignment [Figure: tasks tagged with topics such as Security, Classification, Text Mining, Social Network, Graph Mining, and Visualization are matched to the most relevant experts] Constraints: 1. Each task should be handled by k members 2. Workload balance 3. Authoritative/expertise balance: at least one senior expert 4. Topic coverage 5. Conflict-of-interest (COI) avoidance 6. etc. Challenge: how to find the optimal assignment under these constraints, rather than finding experts for each task independently? • [1] Wenbin Tang, Jie Tang, Tao Lei, Chenhao Tan, Bo Gao, and Tian Li. On Optimization of Expertise Matching with Various Constraints. Neurocomputing, Volume 76, Issue 1, 15 January 2012, Pages 71-83.

  27. Constraint-based Optimization • Objective • Maximize the relevance between experts and tasks • Satisfy the given constraints • Definitions • V(qj): the set of experts who are able to do task qj • Q(vi): the set of tasks assigned to expert vi • Rij: matching score between task qj and expert vi • Basic objective: maximize the total matching score over the assignment, i.e., max Σj Σ_{vi ∈ V(qj)} Rij

  28. Various Constraints 1. Each task should be assigned to m experts 2. Load balance 3. Authoritative balance (with strict and soft variants)

  29. Various Constraints (con’t) 4. Topic Coverage 5. COI avoidance Employ a binary matrix Optimization Framework: Relevance & COI Authoritative Balance Topic Coverage Load Balance : weight of load balance : weight of authoritative balance : weight of topic coverage • [1] Wenbin Tang, Jie Tang, Tao Lei, Chenhao Tan, Bo Gao, and Tian Li. On Optimization of Expertise Matching with Various Constraints. Neurocomputing , Volume 76, Issue 1, 15 January 2012, Pages 71-83.

  30. Workflow Modeling multiple topics: associate each expert and each query with a topic distribution • Remaining problems • How to define the topic distributions? • How to calculate the pairwise matching score Rij? (see the sketch below) • How to optimize the framework? Workflow: constraint-based optimization framework (combining the various constraints) → generating pairwise matching scores → optimization solving
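
One of the open questions above is how to compute the pairwise matching score Rij between task qj and expert vi. As a minimal sketch, the snippet below represents both by topic distributions and scores them with cosine similarity; the actual score in [1] is derived from the learned topic model, so treat this, along with the names and toy values, as an illustrative stand-in.

```python
import numpy as np

def matching_score(task_topics, expert_topics):
    """Cosine similarity between a task's and an expert's topic distributions.

    task_topics, expert_topics: length-K vectors over the same K topics.
    """
    num = float(np.dot(task_topics, expert_topics))
    den = float(np.linalg.norm(task_topics) * np.linalg.norm(expert_topics))
    return num / den if den > 0 else 0.0

# Toy usage: R_ij for one task and two experts over K = 4 topics.
q_j = np.array([0.6, 0.3, 0.05, 0.05])
experts = {"v1": np.array([0.5, 0.4, 0.05, 0.05]),
           "v2": np.array([0.05, 0.05, 0.4, 0.5])}
R = {name: matching_score(q_j, theta) for name, theta in experts.items()}
print(R)   # v1 should score much higher than v2 for this task
```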

  31. Optimization Solving Idea: transform the problem into a convex-cost network flow problem. [Figure: a flow network with a source, task nodes, expert nodes, and a sink; edge costs encode relevance & topic coverage, load variance, and authoritative balance; a min-cost max-flow corresponds to an optimal matching]
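
To illustrate the network-flow view, here is a minimal sketch of the basic version (relevance plus a per-expert load cap) using networkx's min_cost_flow: a source feeds the task nodes, tasks connect to eligible experts with costs derived from relevance, and experts drain into a sink with capacities bounding their load. The convex load-variance and authoritative-balance terms of [1] are omitted, the function and parameter names are illustrative, and costs are scaled to integers because the solver expects integer weights.

```python
import networkx as nx

def assign_tasks(relevance, experts_per_task=2, max_load=2):
    """Simplified task-expert assignment via min-cost max-flow.

    relevance: dict {(task, expert): score in [0, 1]}; only eligible pairs listed.
    experts_per_task: number of experts each task must receive.
    max_load: maximum number of tasks per expert.
    Returns the set of (task, expert) pairs in the optimal assignment.
    """
    tasks = sorted({t for t, _ in relevance})
    experts = sorted({e for _, e in relevance})
    total = experts_per_task * len(tasks)

    G = nx.DiGraph()
    G.add_node("source", demand=-total)
    G.add_node("sink", demand=total)
    for t in tasks:
        G.add_edge("source", t, capacity=experts_per_task, weight=0)
    for (t, e), r in relevance.items():
        # Higher relevance -> lower cost; scale to integers for the solver.
        G.add_edge(t, e, capacity=1, weight=int(round((1.0 - r) * 1000)))
    for e in experts:
        G.add_edge(e, "sink", capacity=max_load, weight=0)

    flow = nx.min_cost_flow(G)
    return {(t, e) for t in tasks for e, f in flow[t].items() if f > 0}

# Toy usage: 2 tasks, 3 experts (scores are made up).
relevance = {("q1", "v1"): 0.9, ("q1", "v2"): 0.6, ("q1", "v3"): 0.2,
             ("q2", "v1"): 0.3, ("q2", "v2"): 0.8, ("q2", "v3"): 0.7}
print(assign_tasks(relevance))
```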

  32. Online Matching • User feedback • Pointing out a mistaken match • Specifying a new match

  33. Experimental Setting • Paper-reviewer data set • 338 papers (KDD'08, KDD'09, ICDM'09) • 354 reviewers (PC members of KDD'09) • COI matrix: coauthor relationships in the last five years • Course-teacher data set • 609 graduate courses (CMU, UIUC, Stanford, MIT) • Intuition: a teacher's graduate courses often match his/her research interests

  34. Experiment Setting(con’t) • Evaluation measures • Matching Score (MS): • Load Variance (LV): • Expertise Variance (EV) • Precision (in course-teacher assignment expr.) • Baseline: Greedy Algorithm

  35. Paper-Reviewer Experiment [Results figure: performance as a function of the weight of load balance and the weight of authoritative balance]

  36. Paper-Reviewer Experiment (cont.) [Results figure: performance as a function of the weight of load balance and the weight of authoritative balance]

  37. Paper-reviewer Case Study

  38. Course-Teacher Experiment

  39. Online System • http://review.arnetminer.org

  40. Conclusion • Studied the problems of cross-domain collaboration recommendation and team collaboration • Proposed the cross-domain topic model (CTL) for recommending collaborators • Transformed the team collaboration problem into a constraint-based optimization problem solved as a convex-cost network flow problem • Experimental results on a coauthor network demonstrate the effectiveness and efficiency of the proposed approaches

  41. Future Work • Connect cross-domain collaborative relationships with social theories (e.g., social balance, social status, structural holes) • Apply the proposed methods to other networks

  42. Thanks! Collaborators: Sen Wu (Stanford), Jimeng Sun, Hang Su (Gatech), Wenbin Tang (Face++), Chenhao Tan (Cornell), Tao Lei (MIT), Bo Gao (THU). System: http://arnetminer.org/collaborator Code & Data: http://arnetminer.org/collaboration

  43. Challenges always come with opportunities! • Sparse connection: cross-domain collaborations are rare • Complementary expertise: cross-domain collaborators often have different expertise and interests • Topic skewness: cross-domain collaboration topics are focused on a subset of topics How can these patterns be handled?

  44. Performance Analysis Content Similarity (Content): based on the similarity between authors' publications. Collaborative Filtering (CF): based on existing collaborations. Hybrid: a linear combination of the scores obtained by the Content and CF methods. Katz: the best link predictor in the link-prediction problem for social networks. Author Matching (Author): based on a random walk with restart on the collaboration graph. Topic Matching (Topic): incorporates the extracted topics into the random walk algorithm.

