
„IP“ is not always „Internet Protocol“: A long and a very short example for IP problems in Web 2.0 research

Ralf Schenkel. Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum.


Presentation Transcript


  1. Ralf Schenkel: „IP“ is not always „Internet Protocol“: A long and a very short example for IP problems in Web 2.0 research. Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

  2. Social Tagging Networks
     Common examples:
     • Flickr (images)
     • YouTube (videos)
     • del.icio.us (bookmarks)
     • Librarything (books)
     • Discogs (CDs)
     • CiteULike (papers)
     • Facebook
     • Myspace (media)
     Definition: a Social Tagging Network is a website where people
     • publish + tag information
     • review + rate information
     • publish their interests
     • maintain a network of friends
     • interact with friends
     Dagstuhl Perspectives Workshop Web 2.0

  3. (long) Part 1: Search in Social Tagging Networks

  4. Some Statistics
     Flickr (as of Nov 2007):
     • 2+ billion photos
     Facebook (as of Apr 2007):
     • 1.8 billion photos
     • 31 million active users
     • 100,000 new users per day
     Myspace (as of Apr 2007):
     • 135 million users (6th largest country on Earth)
     • 2+ billion images (150,000 requests/s), millions added daily
     • 25 million songs
     • 60 TB of videos
     Huge volume of highly dynamic data

  5. Showcase: librarything.com [Screenshot with callouts: Tags, Ratings, Others, Books]

  6. librarything.com: Social Interaction [Screenshot with callouts: Similar Users, Comments, Explicit Friends]

  7. librarything.com: Tag Clouds [Screenshot]

  8. librarything.com: Search
     Search results are independent of the querying user (and of the social context)

  9. Outline
     • Introduction
     • Modelling Social Tagging Networks
     • Graph Model
     • Different Information Needs
     • Effective Query Scoring
     • Efficient Query Evaluation
     • Summary & Further Challenges

  10. Social Network Model [Figure: graph with three node types (USERS, TAGS, ITEMS); example tags concern travel (China, Norway) and queueing theory]

  11. Social Network Model [Figure: graph with three node types (USERS, TAGS, ITEMS); example tags concern travel (China, Norway) and queueing theory]

  12. Social Network Model [Figure: graph of USERS, TAGS and ITEMS; example tag labels such as travel, queues, probability, harry potter]
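The slides only show the graph model as a picture. As a rough illustration, the sketch below stores it as a set of (user, tag, item) triples plus friendship edges. The class and method names, the symmetric friendship relation and the sample data are assumptions made for this example, not the authors' data structures.

```python
from collections import defaultdict
from typing import NamedTuple


class Tagging(NamedTuple):
    """One tagging action: `user` attached `tag` to `item`."""
    user: str
    tag: str
    item: str


class TaggingNetwork:
    """Hypothetical in-memory model: tagging triples plus friendship edges."""

    def __init__(self):
        self.taggings = set()
        # Assumption: friendship is symmetric; a real network may be directed.
        self.friends = defaultdict(set)

    def add_tagging(self, user, tag, item):
        self.taggings.add(Tagging(user, tag, item))

    def add_friendship(self, a, b):
        self.friends[a].add(b)
        self.friends[b].add(a)

    def tags_of(self, user):
        """All tags that `user` has assigned to any item."""
        return {t.tag for t in self.taggings if t.user == user}

    def items_tagged(self, tag):
        """All items that carry `tag`, regardless of who assigned it."""
        return {t.item for t in self.taggings if t.tag == tag}


# Made-up sample data, loosely following the labels visible on the slide.
network = TaggingNetwork()
network.add_tagging("alice", "travel", "China guide")
network.add_tagging("bob", "travel", "Norway guide")
network.add_tagging("bob", "probability", "Queueing book")
network.add_friendship("alice", "bob")
```

A linear scan over all triples is enough for a toy example; a system at the scale quoted on slide 4 would need indexes per tag and per user.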

  13. Information Need 1: Global [Figure: social network graph (USERS, TAGS, ITEMS); example label: "harry potter"]
      Tags by all users are equally important

  14. Information Need 2: Similar Users [Figure: social network graph (USERS, TAGS, ITEMS); example label: "travel"]
      Tags by users with similar tags/items are more important

  15. Information Need 3: Trusted Friends [Figure: social network graph (USERS, TAGS, ITEMS); example label: "probability"]
      Tags by closely related users are more important
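One way to read the three information needs is as three different weightings of the users whose tags are counted. The sketch below is only an illustration under stated assumptions: Jaccard overlap of tag sets stands in for "similar users", direct friendship stands in for "trusted friends", and all names and data are hypothetical. It is not the scoring model from the SIGIR 2008 paper.

```python
from collections import defaultdict

# Made-up taggings: (user, tag, item).
TAGGINGS = [
    ("alice", "travel", "China guide"),
    ("carol", "travel", "China guide"),
    ("bob", "travel", "Norway guide"),
    ("bob", "probability", "Queueing book"),
    ("dave", "probability", "Queueing book"),
]
# Assumed symmetric friendship relation.
FRIENDS = {"dave": {"bob"}, "bob": {"dave"}}


def tags_of(user):
    return {tag for u, tag, _ in TAGGINGS if u == user}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_weight(querier, other, need):
    """Weight of `other`'s tags for `querier` under one information need."""
    if need == "global":      # tags by all users count equally
        return 1.0
    if need == "similar":     # assumption: similarity = tag-set overlap
        return jaccard(tags_of(querier), tags_of(other))
    if need == "friends":     # assumption: only direct friends count
        return 1.0 if other in FRIENDS.get(querier, set()) else 0.0
    raise ValueError(f"unknown information need: {need}")


def score_items(querier, tag, need):
    """Sum the weights of all other users who put `tag` on each item."""
    scores = defaultdict(float)
    for user, t, item in TAGGINGS:
        if t == tag and user != querier:
            scores[item] += user_weight(querier, user, need)
    return dict(scores)


def rank(querier, tag, need):
    """Items by descending score; ties are broken by item name."""
    scores = score_items(querier, tag, need)
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With this toy data, `rank("dave", "travel", "global")` puts the China guide first because two users tagged it, while the "similar" and "friends" variants put the Norway guide first because it was tagged by bob, who shares a tag with dave and is his friend.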

  16. Wishlist for Social-Aware Social Search
      • Search results depend on
        • Global popularity of items
        • Collection context of the querying user (books, tags)
        • Social context of the querying user (trusted friends)
      • Automatic tag expansion (beyond synonyms)
      • Scalable query processing
      • Explanation of results
      (similar wishlist for social recommendations)

  17. Fast Forward…
      Imagine a 20-minute talk about quantified friendship measures, personalized scoring models, dynamic tag expansion, scalable query processing, …
      Essence:
      • Context-aware personalized search
      • Tags from closely related users are more important
      • Different kinds of „relatedness“ are possible
      [SIGIR 2008]

  18. Experimental Evaluation: Effectiveness
      Systematic evaluation of result quality is difficult. Three possible setups:
      • Manual queries + human assessments
      • Queries + assessments derived from external information (e.g. DMOZ categories)
      • Automated assessments from the context of the user
        • Items tagged by friends
        • Items tagged in the future
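The "items tagged in the future" setup can be pictured as a temporal hold-out: the search system only sees taggings before a cutoff date, and items the user tags with the query tag afterwards are treated as relevant. The sketch below assumes a timestamped tagging log and a single cutoff; the function names, the log layout and the dates are hypothetical, and the slides do not say how the authors would build such assessments.

```python
from datetime import date

# Made-up timestamped tagging log: (user, tag, item, day of tagging).
LOG = [
    ("dave", "travel", "Norway guide", date(2008, 1, 10)),
    ("bob", "travel", "China guide", date(2008, 2, 1)),
    ("dave", "travel", "China guide", date(2008, 6, 2)),
    ("dave", "probability", "Queueing book", date(2008, 7, 15)),
]


def split_log(log, cutoff):
    """Split the log into taggings before the cutoff and from the cutoff on."""
    past = [entry for entry in log if entry[3] < cutoff]
    future = [entry for entry in log if entry[3] >= cutoff]
    return past, future


def future_relevant(future, user, tag):
    """Items that `user` tags with `tag` in the held-out future part."""
    return {item for u, t, item, _ in future if u == user and t == tag}


# The search system would be built from `past` only; `relevant` then serves
# as the automatically derived assessment for the query (dave, "travel").
past, future = split_log(LOG, date(2008, 5, 1))
relevant = future_relevant(future, "dave", "travel")
```

Assessments derived this way are incomplete by construction: an item the user never tags is counted as non-relevant even if the user would have liked it.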

  19. Prototype Implementation: not on the Web! [SIGIR Demo 2008], [VLDB Demo 2008]

  20. Preliminary User Study
      LibraryThing user study [Data Engineering Bulletin, June 2008]:
      • 6 librarything users with reasonably large library and friend sets
      • 49 queries overall
      • Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
      • Measured NDCG[10]
      [Figure: results chart; recoverable labels: "Authors of the paper", "(1-α) (content)", "(1-α) (graph)"]
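For readers unfamiliar with the measure, NDCG[10] is read here as normalized discounted cumulative gain over the top 10 results. The sketch below uses the common variant with gain equal to the relevance grade and a log2 rank discount. Which variant the study used is not stated on the slide, so the formula choice, the function names and the sample grades are assumptions.

```python
import math


def dcg(gains, k=10):
    """Discounted cumulative gain of the first k relevance grades."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg(ranked_gains, judged_gains=None, k=10):
    """NDCG at cutoff k.

    ranked_gains: relevance grades of the results in ranked order.
    judged_gains: grades of all judged items for the query; if omitted,
                  the ideal ranking is built from ranked_gains alone.
    """
    pool = ranked_gains if judged_gains is None else judged_gains
    ideal = dcg(sorted(pool, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


# Made-up relevance grades for two result lists.
perfect = ndcg([3, 2, 1, 0])   # already in ideal order
swapped = ndcg([0, 1])         # the only relevant item sits at rank 2
```

Passing `judged_gains` matters when relevant items are missing from the result list; without it, a ranking that misses them is not penalized.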

  21. We need a benchmark collection, but…
      • Everybody „has“ data from Flickr, librarything
      • Data contains private information by definition
      • Data cannot be successfully anonymized (AOL)
      • Data must not be anonymized (we need the users to assess results)
      • Data must be large scale (a few volunteers are not enough)
      • Collection must be completely available offline for stability of results (including images, …)

  22. (very short) Part 2: Web Archiving

  23. Online Information is Volatile
      • A huge amount of information is available online only today
      • Easily lost (hardware failure, software failure, human failure, deletion, attack, …)
      • Easily becomes inaccessible (does anybody know Interleaf?)
      • Easily manipulated
      • How will historians learn about the 21st century?
      Strong need for long-term preservation of the evolving Web
