
How Could We All Get Along on the Web 2.0? The Power of Structured Data on the Web

Sihem Amer Yahia, Yahoo! Research


Presentation Transcript


  1. Sihem Amer Yahia Yahoo! Research How Could We All Get Along on the Web 2.0? The Power of Structured Data on the Web

  2. Outline • Web search and web 2.0 search • Why should we all get along? • How could we all get along? • Related work • Conclusion

  3. Web search • Access to “heterogeneous”, distributed information • Heterogeneous in creation • Heterogeneous in motives • Keyword search very effective in connecting people to information [Diagram: search over web pages]

  4. Web search vs web 2.0 search? [Diagram labels: content consumers, content aggregators, content creators; web search: anonymous; web 2.0 search: subscribers, feeds]

  5. Web 2.0 a generation of internet-based services that • let people form online communities • in order to collaborate • and share information in previously unavailable ways

  6. Online communities • Subscribers join communities where they • exchange content: emails, comments, tags • rate content from other subscribers • exhibit common behavior • About 500M unique Y! visitors per month, about 200M subscribers (login visitors) to more than 130 Y! services

  7. Web 2.0 search Connecting people to people Web 2.0 Flickr Y!Answers YouTube Y!Groups

  8. Web 2.0 search examples • Mary is a professional photographer and is looking for aerial photos of the Hoggar desert • She is also an amateur Jazz dancer and wants to ask about dance schools w/flexible schedules in SF • She is also looking for the latest video on bird migration in Central Park, NY • She has heart problems but loves biking and is interested in finding email discussions on biking trails in northern California

  9. Outline • Web search and web 2.0 search • Why should we all get along? • How could we all get along? • Related work • Conclusion

  10. Improving users’ experience • Keyword search should be maintained: simple and intuitive • Keyword queries are usually short • They only express a small fraction of the user's true intent • Users' interactions within community-based systems can be used to infer a lot more about intent and return better answers

  11. Why should we all get along? • Contributed content is structured • This is what DB community knows how to do best • Relevance to query keywords is key • This is what IR community knows how to do best

  12. Searching online communities [Diagram: data table; tags, ratings, reviews table; community relationship table]

  13. Searching online communities • Search for most relevant data on some topic • Querying data: selection over data table • Querying annotations: selection over annotation table + join w/data table • Personalizing answers: join w/subscribers table • Relevance: use data relevance + annotation table
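The selections and joins listed on this slide can be sketched with an in-memory SQLite database. This is only an illustration: the table names (`data`, `annotation`, `community`), their columns, the sample rows, and the score `strength * rating` are all assumptions made for the example, not the schema or ranking used in any Yahoo! system.

```python
import sqlite3

# Hypothetical schema: all names and rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data(item_id INTEGER PRIMARY KEY, owner TEXT, content TEXT);
CREATE TABLE annotation(item_id INTEGER, author TEXT, tag TEXT, rating INTEGER);
CREATE TABLE community(sub_a TEXT, sub_b TEXT, strength REAL);
INSERT INTO data VALUES (1, 'ann', 'biking trails near Tahoe'),
                        (2, 'bob', 'jazz dance schools in SF'),
                        (3, 'cat', 'steep biking trails in Marin');
INSERT INTO annotation VALUES (1, 'bob', 'biking', 4),
                              (3, 'ann', 'biking', 5),
                              (2, 'cat', 'dance', 3);
INSERT INTO community VALUES ('mary', 'ann', 0.9), ('mary', 'bob', 0.2);
""")

def query_data(keyword):
    """Querying data: a selection over the data table."""
    rows = conn.execute(
        "SELECT item_id FROM data WHERE content LIKE ? ORDER BY item_id",
        (f"%{keyword}%",))
    return [r[0] for r in rows]

def query_annotations(tag):
    """Querying annotations: selection over annotations, joined with data."""
    rows = conn.execute(
        """SELECT DISTINCT d.item_id
           FROM annotation a JOIN data d ON d.item_id = a.item_id
           WHERE a.tag = ? ORDER BY d.item_id""",
        (tag,))
    return [r[0] for r in rows]

def personalized(tag, subscriber):
    """Personalizing answers: additionally join with the community table,
    weighting each rating by the subscriber's tie to the annotation author."""
    rows = conn.execute(
        """SELECT d.item_id, c.strength * a.rating AS score
           FROM annotation a
           JOIN data d ON d.item_id = a.item_id
           JOIN community c ON c.sub_b = a.author
           WHERE a.tag = ? AND c.sub_a = ?
           ORDER BY score DESC""",
        (tag, subscriber))
    return [(r[0], r[1]) for r in rows]
```

For the sample rows, `personalized("biking", "mary")` ranks item 3 ahead of item 1, because the annotation on item 3 comes from the subscriber Mary is most strongly related to.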

  14. Why should we all get along? • Query interpretation depends on subscriber’s interest at the time of querying • Data annotations are dynamic • Precompute all (sub,sub,trust) for each topic? • Need for dynamic query generation

  15. DB and IR • Shared interactions help focus search • User-input, community-input, extraction • Personalizing answers with community information • Ranking as a combination of • Relevance • Relationship strengths between people in the same community
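One possible reading of "ranking as a combination of relevance and relationship strengths" is a weighted sum. The sketch below assumes both quantities are already normalized to [0, 1]; the linear form, the weight `alpha`, and the function names are hypothetical choices for illustration, since the slides do not specify how the two signals are combined.

```python
def combined_score(relevance, strength, alpha=0.7):
    """Linear mix of content relevance and relationship strength (assumed form)."""
    return alpha * relevance + (1 - alpha) * strength

def rank(candidates, strengths, alpha=0.7):
    """Rank items for one subscriber.

    candidates: list of (item, author, relevance) tuples.
    strengths:  {author: relationship strength to the querying subscriber};
                authors missing from the map count as strength 0.
    """
    scored = [
        (combined_score(rel, strengths.get(author, 0.0), alpha), item)
        for item, author, rel in candidates
    ]
    return [item for _, item in sorted(scored, key=lambda p: p[0], reverse=True)]

# Invented sample data.
candidates = [("p1", "ann", 0.6), ("p2", "bob", 0.8), ("p3", "zed", 0.7)]
strengths = {"ann": 0.9, "bob": 0.1}
```

With `alpha=1.0` the ranking reduces to pure relevance (`p2`, `p3`, `p1`); with `alpha=0.5` the strong tie to `ann` moves `p1` to the top.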

  16. Outline • Web search and web 2.0 search • Why should we all get along? • How could we all get along? • Applications • Technical challenges • Related work • Conclusion

  17. Applications • Flickr enables sharing and tagging photos • Y! Answersenables asking and answering questions in natural language • YouTube enables sharing videos, rating videos, commenting on videos and subscribing to new videos from favorite users • Y! Groups enables creating groups, joining existing groups, posting in a group

  18. Flickr • Acquired by Y! in 2005 • Tag search • Photos grouped into categories. • Set privacy levels on each photo

  19. The new inputs to Flickr search • Users tag and rate photos • Combine tag-based search with community knowledge • Combine photo rating with relationship strength • Users tagging same photos with similar tags form a community of interest
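The idea that users tagging the same photos with similar tags form a community of interest could be approximated by a set-overlap measure. The sketch below uses Jaccard similarity over each user's (photo, tag) pairs and treats "similar tags" as exact matches, which is a simplification; the function names and sample data are made up, and nothing here is claimed to be how Flickr derives communities.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def tag_affinity(taggings):
    """taggings: iterable of (user, photo, tag) triples.

    Returns {(u, v): similarity} for every unordered user pair (u < v),
    comparing the sets of (photo, tag) pairs each user contributed.
    """
    by_user = {}
    for user, photo, tag in taggings:
        by_user.setdefault(user, set()).add((photo, tag))
    return {
        (u, v): jaccard(by_user[u], by_user[v])
        for u, v in combinations(sorted(by_user), 2)
    }

# Invented sample data.
taggings = [
    ("ann", "p1", "desert"), ("ann", "p2", "aerial"),
    ("bob", "p1", "desert"), ("bob", "p2", "aerial"),
    ("cat", "p9", "jazz"),
]
affinity = tag_affinity(taggings)
```

Pairs whose affinity exceeds some threshold could then be treated as belonging to the same community of interest; the resulting values play the role of the relationship strength mentioned on the slide.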

  20. Y! Answers • Launched in second half of 2005 • Incentive system based on points and voting for best answers • Questions grouped by category • Some statistics: • over 60 million users • over 120 million answers, available in 18 countries and in 6 languages

  21. Y! Answers

  22. Y! Answers

  23. The new inputs to Y!Answers search • Users provide questions/answers • Combine community information with answer rating • Voting information reflects communities of interest

  24. YouTube • Founded in February 2005 • Tag search • Videos grouped by category • Some statistics: • 100 million views/day • 65,000 new videos/day

  25. The new inputs to YouTube search • Users provide videos, tags, ratings, comments • Combine community information with video rating • Similar tags on same videos imply communities of interest

  26. Yahoo! Groups • Yahoo! acquired eGroups in 2000 • Group moderators • Groups belong to categories • Public and private groups • Some statistics: • over 7M groups • over 190M subscribers • over 100K new subscribers/day • over 12M emails/day

  27. Alternative query interpretations • Return all group postings relevant to a query. • Return only postings by subscribers sharing the same interests: women with heart disease interested in steep slopes

  28. The new inputs to Group search • Users participate in many groups • Combine community information with postings relevance • Group membership and postings imply communities of interest

  29. Outline • Web search and web 2.0 search • Why should we all get along? • How could we all get along? • Applications • Technical challenges • Related work • Conclusion

  30. So, how can we all get along? • Augment keyword query with conditions on structure to focus and personalize search (DB) • Flickr: tags • Answers: points • YouTube: reviews and ratings • Groups: emails • Combine it with relevance (IR)

  31. Search architecture [Diagram labels: subscriber; search terms; find relevant community of interest; query tightening; structured query; query evaluation; ranking (content relevance + relationship)]

  32. Example: query tightening • Keyword query: “biking trails northern california” • Tightened query: message contains “…” and from = “s1” or “s2” • Message structure: From, To, Date, Subject, Content • Relationships ( si, sj, cij ) among subscribers S1 to S7 • Many such relationships depending on subscriber’s interests

  33. Can we really all get along? • IR may think that user weights are enough to target communities of interest and personalize queries • DB thinks expressiveness of query languages cannot all be captured by ranking functions

  34. Query rewriting Content-Only Content in context Loose interpretation of context

  35. Query relaxation • Primitive operations for dropping query predicates • Answers to the relaxed query contain the answers to the exact one • The score of a relaxed answer is no higher than the score of an exact one
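The two properties of relaxation stated on this slide (containment of answers and non-increasing scores) can be checked on a toy model. The sketch below represents a query as a list of Python predicates and scores an answer by the fraction of the original predicates it satisfies; both choices, and all names and sample messages, are assumptions for illustration rather than the operators the talk has in mind.

```python
def evaluate(predicates, items):
    """Return the items that satisfy every predicate (conjunctive query)."""
    return [it for it in items if all(p(it) for p in predicates)]

def relax(predicates, index):
    """Relaxation primitive: drop the predicate at position `index`."""
    return predicates[:index] + predicates[index + 1:]

def score(item, original_predicates):
    """Fraction of the ORIGINAL predicates satisfied, so exact answers get 1.0."""
    satisfied = sum(1 for p in original_predicates if p(item))
    return satisfied / len(original_predicates)

# Invented sample messages and query.
messages = [
    {"id": 1, "from": "s1", "content": "biking trails"},
    {"id": 2, "from": "s3", "content": "biking gear"},
    {"id": 3, "from": "s1", "content": "jazz"},
]
query = [
    lambda m: "biking" in m["content"],
    lambda m: m["from"] == "s1",
]

exact = evaluate(query, messages)
relaxed = evaluate(relax(query, 1), messages)  # drop the sender condition
```

Here `relaxed` contains message 1 (the exact answer) plus message 2, and message 2 scores 0.5 against the original query while message 1 scores 1.0.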

  36. Query tightening • Primitive operations for adding query predicates • Tighter answers are found but looser answers should be maintained • The score of a tighter answer is no lower than the scores of other answers
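Tightening, as described here, adds predicates without discarding the looser answers. A minimal sketch under stated assumptions: every answer to the base query gets a base score of 1.0, and each added predicate it satisfies contributes a fixed `boost`. The scoring scheme, the names, and the sample messages (loosely modeled on the "from = s1 or s2" example of slide 32) are hypothetical.

```python
def rank_with_tightening(base, added, items, boost=1.0):
    """Evaluate the base query, then boost items matching the added predicates.

    Looser answers (base only) are kept; tighter answers (base + added)
    never score lower than them. Returns (score, id) pairs, best first.
    """
    results = []
    for it in items:
        if all(p(it) for p in base):
            bonus = boost * sum(1 for p in added if p(it))
            results.append((1.0 + bonus, it["id"]))
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(results, key=lambda r: (-r[0], r[1]))

# Invented sample messages.
messages = [
    {"id": 1, "from": "s1", "content": "biking trails in northern california"},
    {"id": 2, "from": "s5", "content": "biking trails near the coast"},
    {"id": 3, "from": "s2", "content": "jazz dance schools"},
]
base = [lambda m: "biking" in m["content"]]
added = [lambda m: m["from"] in {"s1", "s2"}]  # senders in the community of interest

ranked = rank_with_tightening(base, added, messages)
```

Message 1 satisfies the tightened query and ranks first; message 2 only satisfies the base query but is still returned; message 3 fails the keyword condition and is excluded.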

  37. More technical challenges • Query tightening primitives to focus search • Each subscriber has a different profile/community of interest • Top-k processing needs to enforce user profiles

  38. Outline • Web search and web 2.0 search • Why should we all get along? • How could we all get along? • Applications • Technical challenges • Related work • Conclusion

  39. Related Work • Language models: Ask Bruce Croft • Web search personalization • Search behavior • HARD track at TREC • Building relationship graphs: • Collaborative filtering • Clustering • Unsupervised learning

  40. Tempting conclusion • A little information gathered on users could greatly improve new-generation search • IR and DB views both needed

  41. More technical challenges • Subscriber belongs to different communities of interest • Should subscriber turn off personalization? • How is efficiency affected? (revisiting top-k processing) • Back from community search to web search?

  42. Beyond search in online communities • Are online communities a way to build more accurate user profiles or more? • display relevant groups when user is asking a question on Y! Answers: mashups?
