Information Search in Social Networks (Informationssuche in sozialen Netzen)

Presentation Transcript

  1. Ralf Schenkel: Information Search in Social Networks (Informationssuche in sozialen Netzen). Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

  2. Social Tagging Networks
     Common examples:
     • Flickr (images)
     • YouTube (videos)
     • del.icio.us (bookmarks)
     • Librarything (books)
     • Discogs (CDs)
     • CiteULike (papers)
     • Facebook
     • Myspace (media)
     Definition: a social tagging network is a website where people
     • publish + tag information
     • review + rate information
     • publish their interests
     • maintain a network of friends
     • interact with friends
     Perspektivenvorlesung

  3. Some Statistics
     Flickr (as of Nov 2008):
     • 3+ billion photos, 3 million new photos per day
     Facebook (as of Nov 2008):
     • 10+ billion photos, 30+ million new photos per day
     • 120 million active users
     • 150,000 new users per day
     Myspace (as of Apr 2007):
     • 135 million users (would be the 6th largest country on Earth)
     • 2+ billion images (150,000 req/s), millions added daily
     • 25 million songs
     • 60 TB of videos
     StudiVZ.net (as of Nov 2008):
     • 11 million users
     • 300 million images, 1 million added daily
     Huge volume of highly dynamic data

  4. Showcase: librarything.com [Screenshot with labeled regions: Tags, Ratings, Others, Books]

  5. librarything.com: Social Interaction [Screenshot with labeled regions: Similar Users, Comments, Explicit Friends]

  6. librarything.com: Tag Clouds

  7. librarything.com: Search. Search results are independent of the querying user (and the social context)

  8. librarything.com: Search. Search is automatically expanded with similar tags (synonyms)

  9. librarything.com: Recommendations. Recommendations depend on user and tags (but not on social context)

  10. librarything.com: Recommendations. Explanation for the recommendation

  11. librarything.com: Explanations

  12. librarything.com: Explanations

  13. Outline
     • Search in Social Tagging Networks
       • Graph Model
       • Different Information Needs
     • Effective Query Scoring
     • Efficient Query Evaluation
     • Summary & Further Challenges

  14. Querying Social Tagging Networks [Figure: users and the items they tagged; tag sets shown: "travel, norway" and "travel, vldb"]

  15. Querying Social Tagging Networks [Figure: network of users and tagged items; tag sets shown include "travel", "travel, norway", "travel, vldb", "travel, mexico", "travel, trip", "travel, icde", "harry, potter", and "probability, data mining, foundations"]

  16. Information Need 1: Globally Popular [Figure: same network, query "harry potter", choice between two candidate items] Most frequently tagged items are "best". Tags by all users are equally important.

  17. Information Need 2: Similar Users [Figure: same network, query "travel", choice between two candidate items]

  18. Information Need 2: Similar Users [Figure: same network, query "travel"] Tags by users with similar tags/items ("brothers in spirit") are more important.

  19. Information Need 3: Trusted Friends [Figure: same network with additional items tagged "probability, selling"; query "probability", choice between two candidate items]

  20. Information Need 3: Trusted Friends [Figure: same network, query "probability"] Tags by closely related and well-known users are more important.

  21. Towards Social-Aware Social Search
     Search results may depend on
     • global popularity of items
     • spiritual context of the querying user (users with similar books and/or tags)
     • social context of the querying user (known and trusted friends)

  22. Outline
     • Search in Social Tagging Networks
     • Effective Query Scoring
       • Quantifying Friendship Strengths
       • User-specific Scoring Functions
       • Experimental Evaluation
     • Efficient Query Evaluation
     • Summary & Further Challenges

  23. Notation
     U: set of users
     T: set of tags
     I: set of items
     tags(u): tags used by user u
     items(u): items tagged by user u
     items(t): items tagged with tag t by at least one user
     df(t): number of items tagged with tag t
     tf_u(i,t): number of times user u tagged item i with tag t
     tf(i,t): number of times item i was tagged with tag t
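
All quantities in this notation can be derived from a log of tagging events. The following is a minimal Python sketch under the assumption that the log is a list of (user, item, tag) triples; the triple format and the name `build_statistics` are illustrative and do not come from the slides.

```python
from collections import defaultdict


def build_statistics(taggings):
    """Derive the notation's quantities from (user, item, tag) triples.

    The triple-based input format is an assumption made for this sketch.
    """
    tags = defaultdict(set)           # tags(u)
    items_of_user = defaultdict(set)  # items(u)
    items_of_tag = defaultdict(set)   # items(t)
    tf_u = defaultdict(int)           # tf_u(i,t), keyed by (u, i, t)
    tf = defaultdict(int)             # tf(i,t), keyed by (i, t)

    for user, item, tag in taggings:
        tags[user].add(tag)
        items_of_user[user].add(item)
        items_of_tag[tag].add(item)
        tf_u[(user, item, tag)] += 1
        tf[(item, tag)] += 1

    # df(t): number of distinct items that carry tag t
    df = {tag: len(items) for tag, items in items_of_tag.items()}
    return {
        "tags": dict(tags),
        "items_of_user": dict(items_of_user),
        "items_of_tag": dict(items_of_tag),
        "tf_u": dict(tf_u),
        "tf": dict(tf),
        "df": df,
    }
```

Note that tf(i,t) is the sum of tf_u(i,t) over all users u, which is what later allows a user-independent and a user-specific part of the score to be separated.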

  24. Quantifying Friendship Strengths
     • Global "friendship" strength: [formula not preserved in the transcript]
     • Spiritual friendship strength
     • Social friendship strength
     • Integrated friendship strength

  25. Spiritual Friendship Strength
     P_spirit(u,u'): overlap in interests of u and u'
     Several alternatives:
     • based on overlap of tag usage [formula not preserved in the transcript; example tags in the figure: harrypotter, wizard, deathlyhallows, philosopherstone]
     • based on overlap of tagged items [formula not preserved in the transcript]
     • overlap of behavior (tagging, searching, rating, …)
     For all:
     • P_spirit(u,u) := 0
     • normalization such that [condition not preserved in the transcript]
     Reminder: tags(u) are the tags used by user u, items(u) the items tagged by user u
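
The overlap formulas themselves are missing from the transcript, so the following Python sketch only illustrates the idea of a tag-based spiritual strength. It assumes Jaccard overlap of tag sets as the overlap measure and assumes that the normalization makes the strengths of one user sum to 1; both choices, and the name `spiritual_strengths`, are assumptions and not taken from the slides.

```python
def spiritual_strengths(u, tags_by_user):
    """P_spirit(u, .) from tag-set overlap.

    Assumptions of this sketch: Jaccard overlap as the measure,
    and normalization so that the values for user u sum to 1.
    """
    own_tags = tags_by_user[u]
    raw = {}
    for other, other_tags in tags_by_user.items():
        if other == u:
            raw[other] = 0.0  # P_spirit(u,u) := 0, as on the slide
            continue
        union = own_tags | other_tags
        raw[other] = len(own_tags & other_tags) / len(union) if union else 0.0

    total = sum(raw.values())
    if total == 0:
        return raw  # no overlap with anybody: all strengths stay 0
    return {other: value / total for other, value in raw.items()}
```

The same function works for the item-based alternative if it is called with items(u) instead of tags(u).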

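  26. Graph-Based Friendship Strength
     P_social(u,u') is based on the distance of u and u' in the user network [formula not preserved in the transcript]
     • set P_social(u,u) := 0
     • normalization such that [condition not preserved in the transcript]
     [Figure: user network with nodes u1 … u7 and a table of P_social(·,u') for u' = u2 … u7]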
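
As an illustration of a distance-based strength, the sketch below computes hop distances with a breadth-first search and lets the strength decay geometrically with the distance. The decay function, the `decay` parameter, the normalization to sum 1, and the name `social_strengths` are assumptions of this sketch; the slide's actual formula is not preserved.

```python
from collections import deque


def social_strengths(u, friends, decay=0.5):
    """P_social(u, .) from hop distance in the friendship graph.

    friends: {user: iterable of direct friends}.
    Assumption: strength is proportional to decay ** distance and
    normalized so that the values for user u sum to 1.
    """
    # Breadth-first search gives the shortest hop distance from u.
    distance = {u: 0}
    queue = deque([u])
    while queue:
        current = queue.popleft()
        for neighbor in friends.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)

    raw = {v: decay ** d for v, d in distance.items() if v != u}
    total = sum(raw.values())
    strengths = {v: value / total for v, value in raw.items()} if total else {}
    strengths[u] = 0.0  # P_social(u,u) := 0, as on the slide
    return strengths
```

Users that are not reachable from u do not appear in the result, which corresponds to a strength of 0.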

  27. Integrated Friendship Strength
     P_int(u,u'): query-dependent mixture of
     • spiritual friendship strength
     • social friendship strength
     • background model (global)
     with parameters 0 ≤ α, β ≤ 1 and α+β ≤ 1 [mixture formula not preserved in the transcript]
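
A mixture of this kind can be sketched as a convex combination. The code assumes that α weights the social strength, β the spiritual strength, and the remaining mass 1−α−β goes to a uniform background model 1/|U|; this assignment and the uniform background are assumptions, since the formula is not in the transcript.

```python
def integrated_strength(p_social, p_spirit, num_users, alpha, beta):
    """P_int(u,u') as a convex mixture (assumed form).

    alpha weights the social strength, beta the spiritual strength,
    and the rest goes to a uniform background model 1/|U|.
    """
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    background = 1.0 / num_users
    return alpha * p_social + beta * p_spirit + (1 - alpha - beta) * background
```

With α = β = 0 every user counts equally, which corresponds to the "globally popular" information need; α = 1 uses only the social context and β = 1 only the spiritual context.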

  28. Excursion: Scoring in Text Retrieval
     General scoring framework [formula not preserved in the transcript], combining
     • importance of t for item i (the more frequent, the better)
     • importance of t in the collection (the less frequent, the better)
     Linear combination for query scores
     Hand-tuned instance: Okapi BM25
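
Okapi BM25 is named on the slide, but its formula is not in the transcript. The sketch below uses the commonly cited textbook form; the defaults k1 = 1.2 and b = 0.75 are conventional choices, and the function and argument names are illustrative assumptions.

```python
import math


def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 score of one term for one document (assumed variant)."""
    # Collection importance: the rarer the term, the larger the idf.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Document importance: grows with tf but saturates; long documents are damped.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)


def bm25_query_score(query_terms, tf_in_doc, df, n_docs, doc_len, avg_doc_len):
    """Query score as the sum of the per-term scores."""
    return sum(
        bm25_term_score(tf_in_doc.get(t, 0), df[t], n_docs, doc_len, avg_doc_len)
        for t in query_terms
    )
```

The two factors map directly onto the two bullet points of the framework: the tf-based factor rewards frequent terms in the item, the idf factor rewards terms that are rare in the collection.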

  29. Towards a User-specific Score
     • Convert into user-specific social frequency [formula not preserved in the transcript; one term is annotated "global friendship strength"]
     • Compute user-specific social score [SIGIR 2008]
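
One way to read the two steps: each tagging by a user u' is weighted with the friendship strength between the querying user u and u', and the resulting social frequency replaces the plain tf in a BM25-like score. The sketch below follows this reading; the saturation form in `social_score` and all names are assumptions of this sketch, not the formulas of the cited paper.

```python
def social_frequency(strengths, tf_by_user):
    """sf_u(i,t): taggings of item i with tag t, weighted by friendship strength.

    strengths: {u': F(u,u')} for the querying user u (assumed precomputed).
    tf_by_user: {u': tf_u'(i,t)} for one fixed item i and tag t.
    """
    return sum(strengths.get(v, 0.0) * count for v, count in tf_by_user.items())


def social_score(sf, idf, k1=1.2):
    """BM25-style saturation applied to the social frequency (assumed form)."""
    return idf * (k1 + 1) * sf / (k1 + sf)
```

If all strengths are equal to the global strength, the social frequency is proportional to tf(i,t), and the ranking falls back to the globally popular items.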

  30. Including Tag Expansion
     Problem: users use different tags for similar things → poor recall (missing relevant results)
     Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, MMCI, UdS, Saarland University, …
     Solution:
     1. Define notion of similar tags
     2. Expand queries with similar tags
     3. Modify scoring function for expanded queries

  31. Heuristics for finding similar tags
     Specialization heuristics: tag t2 is a specialization of t1 if t1 occurs (almost) whenever t2 occurs. Example: t1=Europe, t2=Germany
     Co-occurrence heuristics: tags t1 and t2 are similar if they (almost) always occur together
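
Both heuristics can be expressed through the fraction of items on which one tag is accompanied by the other. The following sketch assumes item-level co-occurrence and uses the minimum of the two directions as the co-occurrence similarity; the function names and this particular formalization are assumptions.

```python
def specialization_degree(general, specific, items_by_tag):
    """Fraction of the items tagged `specific` that are also tagged `general`.

    A value near 1 means `general` occurs (almost) whenever `specific` occurs.
    """
    specific_items = items_by_tag.get(specific, set())
    if not specific_items:
        return 0.0
    shared = items_by_tag.get(general, set()) & specific_items
    return len(shared) / len(specific_items)


def cooccurrence_similarity(t1, t2, items_by_tag):
    """High only if each tag occurs (almost) whenever the other does (assumed form)."""
    return min(
        specialization_degree(t1, t2, items_by_tag),
        specialization_degree(t2, t1, items_by_tag),
    )
```

In the slide's example, Germany is a specialization of Europe because almost every item tagged Germany is also tagged Europe, while the reverse does not hold, so the two tags are not similar in the co-occurrence sense.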

  32. Scoring Expanded Queries
     Naive approach: for query tag t, add similar tags t' with sim(t,t') > δ to the query
     But:
     • "transportation disaster" expanded by "train car bus plane …"
     • "international crime" expanded by "mafia camorra yakuza …"
     Result quality drops due to topic drift
     Better: auto-tuning incremental expansion. For query tag t, consider only the expansion with the highest combined score per item
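
The "highest combined score per item" rule can be sketched as a maximum over the original tag and its expansions. The code assumes that the combined score of an expansion t' is sim(t,t') times the item's score for t'; this product form, the `base_score` callback and the function names are assumptions.

```python
def expanded_term_score(item, tag, similar_tags, base_score):
    """Score of `item` for `tag`, using only the best-scoring expansion.

    similar_tags: {tag: {expansion: sim(tag, expansion)}}.
    base_score: callable (item, tag) -> score without expansion.
    Assumption: combined score = sim(t, t') * base_score(item, t').
    """
    best = base_score(item, tag)  # the original tag counts with similarity 1
    for expansion, sim in similar_tags.get(tag, {}).items():
        best = max(best, sim * base_score(item, expansion))
    return best


def expanded_query_score(item, query_tags, similar_tags, base_score):
    """Sum of per-tag scores; every query tag contributes at most one expansion."""
    return sum(
        expanded_term_score(item, tag, similar_tags, base_score)
        for tag in query_tags
    )
```

Because each query tag contributes a single term to the sum, an item that matches many expansions of "transportation" cannot outrank items that match both "transportation" and "disaster", which limits topic drift.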

  33. Experimental Evaluation: Effectiveness
     Systematic evaluation of result quality is difficult. Three possible setups:
     • Manual queries + human assessments
     • Queries + assessments derived from external info (ex: DMOZ categories)
     • Automated assessments from context of user
       • Items tagged by friends
       • Items tagged in the future

  34. Prototype [VLDB/SIGIR 2008 demo]

  35. Preliminary User Study
     LibraryThing user study: [Data Engineering Bulletin, June 2008]
     • 6 librarything users with reasonably large library and friend sets
     • Overall 49 queries like "mystery magic", "wizard", "yakuza"
     • Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
     • Measured NDCG[10] for different settings of β (spiritual) and α (social) [results table not preserved in the transcript]
     • Result quality generally very high
     • Combination of spiritual and social friends is best
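
For reference, NDCG at rank 10 can be computed as below. The sketch uses the common logarithmic discount and derives the ideal ranking from the assessed result list itself; which gain and discount variant the study used is not stated on the slide, so this is an assumption.

```python
import math


def dcg(relevances, k=10):
    """Discounted cumulative gain over the first k results (log2 discount)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering.

    Assumption: the ideal ordering is built from the same assessed list.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

The input is the list of graded relevance assessments in result order, for example 2 for highly relevant, 1 for relevant and 0 for not relevant.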

  36. Outline
     • Search in Social Tagging Networks
     • Effective Query Scoring
     • Efficient Query Evaluation
       • Threshold Algorithms
       • ContextMerge
       • Experimental Evaluation
     • Summary & Further Challenges

  37. Algorithmic Overview
     • Input: query q={t1…tn} for user u, α, β
     • Output: k items with highest scores
     • Goals:
       • Avoid computing all results
       • Minimize disk I/O and CPU load
       • Utilize precomputed information on disk
     [Figure: example query "harry potter"]

  38. Excursion: Threshold Algorithms for Text IR
     Input:
     • query q={t1…tn}
     • lists L(tp) with pairs <i,score(i,tp)>, sorted by score(i,tp)↓
     Output: k items with highest aggregated score
     Family of Threshold Algorithms:
     • scan lists in parallel
     • maintain partial candidate results with score bounds
     • terminate as soon as top-k results are stable
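
The three steps of the family can be sketched with an NRA-style algorithm (sorted access only). The code below is a simplified illustration that assumes summation as the aggregation function and strict round-robin scanning; the name `nra_top_k` and the return format are illustrative.

```python
def nra_top_k(lists, k):
    """NRA-style top-k over score-sorted lists, using sorted access only.

    lists: one list per query term with (item, score) pairs, sorted by
    descending score. Returns ([(item, lower score bound)], scan depth).
    Assumptions: scores are summed, lists are scanned round-robin.
    """
    seen = {}          # item -> {list index: score}
    worst, top = {}, []
    depth = 0
    max_depth = max((len(lst) for lst in lists), default=0)

    while depth < max_depth:
        # Scan all lists in parallel: read one more entry from each.
        for idx, lst in enumerate(lists):
            if depth < len(lst):
                item, score = lst[depth]
                seen.setdefault(item, {})[idx] = score
        depth += 1

        # Upper bound for any entry not yet read from each list.
        high = [lst[depth - 1][1] if depth < len(lst) else 0.0 for lst in lists]

        # Score bounds for all partial candidates.
        worst = {item: sum(s.values()) for item, s in seen.items()}
        best = {
            item: worst[item] + sum(h for idx, h in enumerate(high) if idx not in s)
            for item, s in seen.items()
        }

        ranked = sorted(worst, key=worst.get, reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            min_k = worst[top[-1]]
            # Stable: neither a candidate nor an unseen item can still overtake.
            if sum(high) <= min_k and all(best[i] <= min_k for i in rest):
                break

    return [(item, worst[item]) for item in top], depth
```

The returned scores are lower bounds; the top-k set is guaranteed to be correct, but the exact scores of its members may still be incomplete when the scan stops early.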

  39. Example: Top-1 for 2-term query (NRA) [Figure: two score-sorted lists L1 and L2, the current top-1 item, the min-k threshold, and the candidate set; nothing read yet]

  40. Example: Top-1 for 2-term query (NRA) [Step: A is read from L1 with score 0.9. Top-1 item: A with score bounds [0.9;1.9]; unseen items: [0.0;1.9]; min-k: 0.9]

  41. Example: Top-1 for 2-term query (NRA) [Step: D is read from L2 with score 1.0. Top-1 item: D with [1.0;1.9]; candidates: A [0.9;1.9], unseen items [0.0;1.9]; min-k rises from 0.9 to 1.0]

  42. Example: Top-1 for 2-term query (NRA) [Step: G is read from L1 with score 0.3. Top-1 item: D with [1.0;1.3]; candidates: A [0.9;1.9], G [0.3;1.3], unseen items [0.0;1.3]; min-k: 1.0]

  43. Example: Top-1 for 2-term query (NRA) [Step: the next entry of L2 lowers the upper bounds. Top-1 item: D with [1.0;1.3]; candidates: A [0.9;1.6], G [0.3;1.0], unseen items [0.0;1.0]; min-k: 1.0] No more new candidates considered

  44. Example: Top-1 for 2-term query (NRA) [Further steps: the bounds of A tighten from [0.9;1.6] via [0.9;1.55] to [0.9;1.5], those of D from [1.0;1.3] via [1.0;1.25] to [1.0;1.2]; then A is read from L2 with score 0.4, which gives A [1.3;1.3]. Top-1 item: A; min-k: 1.3] Algorithm safely terminates

  45. Can we reuse this here? No, scores are specific to the querying user and the parameter setting! [Figure: separate precomputed score lists for the tags "harry" and "travel", one per user and parameter setting, e.g. (α=0.2, β=0.5), (α=0.0, β=0.8), (α=1.0, β=0.0), (α=0.0, β=1.0), (α=0.5, β=0.5)] Number of lists to precompute would explode! (#tags × #users × parameter space)

  46. Revisiting the Social Frequency
     [Formula not preserved in the transcript: sf_u(i,t) is written as the sum of a part independent of user u and a part dependent on user u]
     Compute sf_u(i,t) on the fly from tf(i,t), friends of u and their tagged documents
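
The split can be checked numerically. The sketch assumes that the strength of every user u' consists of a uniform background share plus a personal share that is non-zero only for friends of u; under that assumption the background share multiplies the global tf(i,t), and only the friends' taggings need to be looked up per query. The names and the exact form of the split are assumptions.

```python
def sf_direct(users, personal, tf_by_user, background_weight):
    """sf_u(i,t) computed naively by iterating over all users.

    personal: {u': personal strength of u' for the querying user u}.
    background_weight: total weight of the uniform background model.
    """
    share = background_weight / len(users)
    return sum(
        (share + personal.get(v, 0.0)) * tf_by_user.get(v, 0) for v in users
    )


def sf_split(num_users, personal, tf_by_user, tf_total, background_weight):
    """Same value from a user-independent and a user-dependent part."""
    independent = background_weight / num_users * tf_total  # needs only tf(i,t)
    dependent = sum(w * tf_by_user.get(v, 0) for v, w in personal.items())
    return independent + dependent
```

The user-independent part can come from a precomputed list sorted by tf(i,t), so only the friends of u and their tagged items have to be read at query time.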

  47. Top-K in Social Networks: ContextMerge
     Precomputed lists:
     • ITEMS(t): pairs <i,tf(i,t)>, sorted by tf(i,t)↓ (already exist in systems)
     • USERITEMS(u',t): pairs <i,tf_u'(i,t)>, unsorted
     • FRIENDS(u): pairs <u',F(u,u')>, sorted by F(u,u')↓
     [Figure: example lists ITEMS(harry) with tf values 32, 26, 47, …; USERITEMS(·, harry); FRIENDS(·) with strengths 0.085, 0.12, 0.10, …]

  48. ContextMerge
     Adapted Threshold Algorithm for query u,t:
     • Scan ITEMS(t) and FRIENDS(u) in parallel
       • pick "best" list
       • If ITEMS(t): read next entry
       • If FRIENDS(u): read USERITEMS(u',t) for next friend u'
     • Maintain candidates with bounds for min and max score and current results
     [Figure: ITEMS(harry) with tf values 47, 32, 26, …; FRIENDS(·) with strengths 0.12, 0.10, 0.085, …]

  49. ContextMerge [Animation step on the lists of slide 48: compute the min score bound and the max score bound of a candidate from the user-independent part and the user-specific part of sf; visible values: 47, 0.12, |U|]

  50. ContextMerge [Animation step on the lists of slide 48: bounds for the user-independent part and the user-specific part of sf; visible values include 47, 0.12·|U|, 0.88·|U|]
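
The following is a strongly simplified single-tag sketch in the spirit of ContextMerge, not the published algorithm. It assumes the score sf_u(i,t) = w·tf(i,t) + Σ F(u,u')·tf_u'(i,t) with a background weight w per tagging, alternates between the two lists instead of picking the "best" one, and derives its own simple score bounds; the function name, the argument layout and the bounds are all assumptions.

```python
def context_merge(items, friends, user_items, bg_weight, k):
    """Top-k items for one tag and one querying user (simplified sketch).

    items:      ITEMS(t) as [(item, tf)], sorted by tf descending.
    friends:    FRIENDS(u) as [(friend, strength)], sorted by strength descending.
    user_items: {friend: {item: tf_friend}}, i.e. USERITEMS(friend, t).
    Assumption: FRIENDS(u) lists every user with non-zero strength.
    """
    tf_seen, friend_tf, friend_part = {}, {}, {}
    pos_i = pos_f = 0
    tf_high = items[0][1] if items else 0       # bound for unread ITEMS entries
    f_high = friends[0][1] if friends else 0.0  # bound for unread FRIENDS entries
    worst, top = {}, []

    while pos_i < len(items) or pos_f < len(friends):
        # Simplification: alternate between the lists (no "best list" choice).
        if pos_i < len(items) and (pos_i <= pos_f or pos_f >= len(friends)):
            item, tf = items[pos_i]
            pos_i += 1
            tf_seen[item] = tf
            tf_high = tf if pos_i < len(items) else 0
        else:
            friend, strength = friends[pos_f]
            pos_f += 1
            for item, tf_u in user_items.get(friend, {}).items():
                friend_tf[item] = friend_tf.get(item, 0) + tf_u
                friend_part[item] = friend_part.get(item, 0.0) + strength * tf_u
            f_high = strength if pos_f < len(friends) else 0.0

        candidates = set(tf_seen) | set(friend_tf)
        worst, best = {}, {}
        for item in candidates:
            known = friend_tf.get(item, 0)   # taggings by friends read so far
            lo = tf_seen.get(item, known)    # tf(i,t) is at least `known`
            hi = tf_seen.get(item, max(tf_high, known))
            part = friend_part.get(item, 0.0)
            worst[item] = bg_weight * lo + part
            # Each remaining tagging can gain at most the next friend strength.
            best[item] = bg_weight * hi + part + f_high * (hi - known)

        ranked = sorted(candidates, key=worst.get, reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            min_k = worst[top[-1]]
            unseen_best = (bg_weight + f_high) * tf_high
            if unseen_best <= min_k and all(best[i] <= min_k for i in rest):
                break

    return [(item, worst[item]) for item in top]
```

In this sketch a friend with high strength can lift an item with moderate global tf above the globally most popular item, which is the behavior the user-specific score is meant to produce.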