
Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails

Soumen Chakrabarti, Sandeep Srivastava, Mallela Subramanyam, Mitul Tiwari. Indian Institute of Technology Bombay.


Presentation Transcript


  1. Memex: A Browsing Assistant for Collaborative Archiving and Mining of Surf Trails. Soumen Chakrabarti, Sandeep Srivastava, Mallela Subramanyam, Mitul Tiwari. Indian Institute of Technology Bombay

  2. Sources of Web information
     • Sources already exploited
       • Text on pages (keyword search)
       • Link between pages (popularity rating)
       • Topic taxonomies (query expansion)
     • Sources not exploited enough yet
       • Public surfing history
       • Public bookmarks
     • Collaboration is central to hypertext
     • Lack of trust limits collaboration on Web

  3. Our goals
     • Infrastructure to support spontaneous formation of topic-based collaborative Web communities
       • Browsing assistant client
       • Community server
     • Mining algorithms for personal and community level topic management and collaborative resource discovery
     • Extensible API for plugging in additional hypertext analysis tools

  4. Step 1: Create a Memex account (password sent by email). Step 2: Install the Memex applet signing certificate and visit the applet page. Step 3: Allow the Memex client to attach to your Web browser. Step 4: Log on to the Memex server.

  5. Memex client applet attaches to browser. Screenshot callouts: function tabs; privacy choice.

  6. Preparing to import initial bookmarks

  7. Bookmarks imported

  8. For Memex to suggest an initial topic organization, select all bookmarks…

  9. …and send them to the clustering tab

  10. Switch to the clustering tab. URLs to be clustered appear here.

  11. Submit the URLs to the server-side Memex clustering demon

  12. Check later if the server has completed the clustering task

  13. Two top-level clusters about software and music
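A minimal sketch of how a set of bookmarks could be split into topic clusters such as "software" and "music". The slides do not say which algorithm the Memex clustering demon uses; the greedy cosine-similarity rule, the function names, the threshold and the sample bookmarks below are all assumptions made for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_bookmarks(bookmarks, threshold=0.2):
    """Greedy single pass: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "urls": [...]}
    for url, text in bookmarks:
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "urls": [url]})
        else:
            best["centroid"].update(vec)  # fold the new page into the centroid
            best["urls"].append(url)
    return [c["urls"] for c in clusters]

# Hypothetical bookmarks: (URL, page title or extracted text).
bookmarks = [
    ("http://example.org/linux", "linux kernel software download"),
    ("http://example.org/jazz", "jazz music albums"),
    ("http://example.org/gcc", "gcc compiler software download"),
    ("http://example.org/rock", "rock music concerts"),
]
clusters = cluster_bookmarks(bookmarks)
```

With this sample input the two software pages end up in one cluster and the two music pages in another. A single greedy pass depends on input order, so a real service would more likely use a hierarchical or iterative method.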

  14. Expanding the software cluster to study it in more detail

  15-17. User can freely reorganize URL placement using cut-and-paste

  18. Moving an entire folder from the cluster tab…

  19-20. …to the folder tab together with example URLs

  21. Folder names can be edited as per taste; this also gives Memex additional clues about the folder’s contents

  22-23. New folders can be created to hold clusters found in the cluster tab

  24-25. A topic hierarchy which is too detailed for the user can be flattened

  26-27. Groups of closely related URLs can be moved back to folders in the folder tab

  28. Memex helps the user derive a starting topic hierarchy from unstructured bookmarks

  29. The user then continues browsing in multiple sessions. Relevant pages found by other members of the community and made public are available for collaborative surfing

  30-31. If permission is granted, the Memex applet monitors the trail that the surfer follows and uploads it to the server for further analysis and mining

  32. Such surf trails together with page contents are valuable inputs to the Memex server-side hypertext mining and resource discovery demons
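The permission-gated trail capture described above can be sketched roughly as follows. This is not the Memex applet's API: `TrailRecorder`, `Visit`, `on_page_visit` and `flush` are hypothetical names, and the upload step is a plain callback standing in for whatever client-server protocol the real system uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Visit:
    url: str
    referrer: Optional[str]  # page the surfer came from, if any
    timestamp: float

@dataclass
class TrailRecorder:
    """Hypothetical client-side recorder; not the actual Memex applet API."""
    sharing_allowed: bool = False
    pending: List[Visit] = field(default_factory=list)

    def on_page_visit(self, url, referrer, timestamp):
        # Record only while the user has granted permission.
        if self.sharing_allowed:
            self.pending.append(Visit(url, referrer, timestamp))

    def flush(self, upload):
        """Hand all pending visits to `upload` (a stand-in for the server call)."""
        batch, self.pending = self.pending, []
        if batch:
            upload(batch)
        return len(batch)

server_log = []  # stands in for the server-side archive
rec = TrailRecorder(sharing_allowed=True)
rec.on_page_visit("http://a.example/", None, 1.0)
rec.on_page_visit("http://a.example/b", "http://a.example/", 2.0)
rec.sharing_allowed = False  # user switches to private browsing
rec.on_page_visit("http://private.example/", None, 3.0)
sent = rec.flush(server_log.extend)
```

Keeping the referrer with each visit is what turns a flat list of URLs into a trail: the server can rebuild the path the surfer followed.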

  33. In the background, the Memex classifier finds the most suitable folder to assign to each history item. When the user refreshes the view, her own surf history and that of others appear categorized into her familiar topic tree. ‘?’ indicates that Memex is not sure about the folder assignment. Users can easily correct mistakes, and each correction becomes additional valuable training data. History is never deleted (disk is cheap).
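One way to picture the ‘?’ marker is a classifier that reports a confidence along with its folder choice. The sketch below assumes a small multinomial naive Bayes model and a fixed confidence threshold; the slides do not name the classifier Memex uses, so `FolderClassifier`, `min_confidence` and the training texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

class FolderClassifier:
    """Tiny multinomial naive Bayes over folder examples (an assumed model)."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)  # folder -> term frequencies
        self.doc_counts = Counter()              # folder -> number of examples
        self.vocab = set()

    def train(self, folder, text):
        terms = text.lower().split()
        self.term_counts[folder].update(terms)
        self.doc_counts[folder] += 1
        self.vocab.update(terms)

    def posteriors(self, text):
        terms = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for folder, counts in self.term_counts.items():
            total_terms = sum(counts.values())
            score = math.log(self.doc_counts[folder] / total_docs)
            for t in terms:
                # Laplace smoothing so unseen terms do not zero out a folder.
                score += math.log((counts[t] + 1) / (total_terms + len(self.vocab)))
            log_scores[folder] = score
        top = max(log_scores.values())
        exp = {f: math.exp(s - top) for f, s in log_scores.items()}
        z = sum(exp.values())
        return {f: v / z for f, v in exp.items()}

    def assign(self, text, min_confidence=0.7):
        """Return (folder, flag); flag is '?' when the best posterior is below the threshold."""
        post = self.posteriors(text)
        folder = max(post, key=post.get)
        return folder, ("" if post[folder] >= min_confidence else "?")

clf = FolderClassifier()
clf.train("Software", "linux kernel compiler download")
clf.train("Music", "jazz rock albums concerts")
sure = clf.assign("linux compiler")      # terms seen in one folder only
unsure = clf.assign("weather forecast")  # terms seen in neither folder
```

A user correction would simply be another `train` call with the folder the user picked, which is how corrections could feed back as training data.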

  34. Automatic collaborative classification also lets users return to a topic-restricted surfing context quickly, and replay the last few surfing actions within that topic of interest.

  35. Personalized topic-based history management is far superior to the one-dimensional history list provided by popular browsers

  36-37. Users can switch topics with a single click, and browsing is not limited by the linear “back and forward” paradigm supported by browsers.

  38-39. A flexible interactive search lets the user locate any page ever visited from anywhere using this account, combining content with popularity, site selections and timeliness
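A rough sketch of how content, popularity, site selection and timeliness might be combined into one ranking. The slides do not give the actual scoring formula; the multiplicative score, the 30-day half-life and the `Page` fields below are assumptions chosen only to make the idea concrete.

```python
import math
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    text: str
    visits: int        # number of logged visits (popularity)
    last_visit: float  # day number of the most recent visit

def score(page, query, now, site=None, half_life_days=30.0):
    """Hypothetical ranking: content match x popularity x recency, with an optional site filter."""
    if site and site not in page.url:
        return 0.0
    words = page.text.lower().split()
    content = sum(words.count(t) for t in query.lower().split())
    if content == 0:
        return 0.0
    popularity = math.log1p(page.visits)
    recency = 0.5 ** ((now - page.last_visit) / half_life_days)
    return content * popularity * recency

def search(pages, query, now, site=None):
    """Return URLs with a positive score, best first."""
    scored = [(score(p, query, now, site), p.url) for p in pages]
    return [url for s, url in sorted(scored, reverse=True) if s > 0]

pages = [
    Page("http://a.example/old", "memex surf trails", visits=3, last_visit=0.0),
    Page("http://a.example/new", "memex surf trails", visits=3, last_visit=90.0),
    Page("http://b.example/other", "cooking recipes", visits=50, last_visit=90.0),
]
results = search(pages, "surf trails", now=90.0)
filtered = search(pages, "memex", now=90.0, site="b.example")
```

Here the two matching pages have the same text and popularity, so the more recently visited one ranks first, and the site filter excludes both when restricted to a site that has no matching page.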

  40. Close integration of the Memex client with the browser is non-trivial to implement but adds greatly to comfort and ease of use

  41. Memex system diagram. Diagram labels: Browser (visit, attach, download client JAR, running client applet); Memex server (event-handler servlets; search, recommendation, folder, context, archive); mining demons (taxonomy synthesis, resource discovery, classification, clustering); storage (relational metadata, text index, topic models); Memex client-server protocol and workload sharing negotiations.

  42. Document workflow. Diagram labels: page visit and bookmarking events logged; NODE table; browser; Memex client; crawler; per-document version queue (push new version, pop and discard old version); demon registry; search indexer; classifier service; clustering service; garbage collector.
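The per-document version queue in this workflow can be sketched as below. The slide only names the operations (push new version, pop and discard old version) and the participants; the rule that an old version is discarded once every registered demon has processed it is an assumption, as are the class and method names.

```python
from collections import deque

class VersionQueue:
    """Hypothetical per-document queue of fetched versions, oldest first."""

    def __init__(self, demons):
        self.demons = list(demons)  # names taken from a demon registry
        self.versions = deque()     # entries: [version_id, content, seen_by]

    def push(self, version_id, content):
        """Crawler side: append a newly fetched version."""
        self.versions.append([version_id, content, set()])

    def next_for(self, demon):
        """Return the oldest version this demon has not processed yet, or None."""
        for version_id, content, seen_by in self.versions:
            if demon not in seen_by:
                seen_by.add(demon)
                return version_id, content
        return None

    def collect_garbage(self):
        """Pop old versions that every demon has seen, always keeping the newest."""
        discarded = []
        while len(self.versions) > 1 and self.versions[0][2] >= set(self.demons):
            discarded.append(self.versions.popleft()[0])
        return discarded

q = VersionQueue(["indexer", "classifier"])
q.push(1, "v1 text")
q.push(2, "v2 text")
q.next_for("indexer")               # indexer processes version 1
first_gc = q.collect_garbage()      # classifier has not seen version 1 yet
q.next_for("classifier")            # classifier processes version 1
second_gc = q.collect_garbage()     # version 1 can now be discarded
```

Letting each demon consume versions at its own pace keeps a slow service, such as clustering, from blocking a fast one, such as the search indexer.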

  43. Autonomous topic organization
      • Bookmarks often collected into topics
      • Surfers use personal topic organization
      • One-size-fits-all taxonomy inadequate
        • Many topics over-developed for most of us
          • http://dmoz.org/Sports/Hockey/Underwater_Hockey/
        • But deeper interests often underdeveloped
        • Structure reorganization also desirable
      • Best taxonomy depends on community behavior as well as page content

  44. Autonomy and collaboration
      • Personalization ≠ picking Yahoo nodes
      • Complex relations between topics
      • Need “simplest common ground”
        • Coalesce similar topics where possible…
        • …without sacrificing individual taste
      Diagram labels: User1, User2, User3, Yahoo; Cycling, Sports, Biz, Sports, Sports, Shops, Hiking, Cycling, Bikeshops, Bikeshops; relations shown: subsumption, tree ‘inversion’

  45. Taxonomy synthesis example
      • Generating themes makes map simpler
      • But distorts contents of original folders
      • Joint optimization gives best themes
      Diagram labels: themes ‘Radio’, ‘Television’, ‘Movies’; share document, share folder, share terms; Media, kpfa.org, bbc.co.uk, kron.com, Broadcasting, channel4.com, kcbs.com, Entertainment, foxmovies.com, miramax.com, Studios, lucasfilms.com
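To make the idea of coalescing similar folders into shared themes concrete, here is a minimal sketch that merges folders whose URL sets overlap. It is a greedy heuristic, not the joint optimization the slide refers to, and the per-user folder contents are hypothetical (only the site names are borrowed from the slide).

```python
def jaccard(a, b):
    """Overlap between two URL sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 0.0

def synthesize_themes(folders, min_overlap=0.3):
    """Greedily merge folders whose URL sets overlap; each merged group is a theme."""
    themes = []  # each theme: {"folders": [names], "urls": set}
    for name, urls in folders.items():
        urls = set(urls)
        for theme in themes:
            if jaccard(urls, theme["urls"]) >= min_overlap:
                theme["folders"].append(name)
                theme["urls"] |= urls
                break
        else:
            # No existing theme is close enough, so this folder starts its own.
            themes.append({"folders": [name], "urls": urls})
    return themes

# Hypothetical folders from three users.
folders = {
    "user1/Radio": ["kpfa.org", "kcbs.com"],
    "user2/Broadcasting": ["kpfa.org", "kcbs.com", "bbc.co.uk"],
    "user3/Movies": ["miramax.com", "foxmovies.com"],
}
themes = synthesize_themes(folders)
theme_folders = [t["folders"] for t in themes]
```

The merged theme is simpler than the two original folders, but it now contains bbc.co.uk, which user1 never filed under Radio. That is the distortion the slide warns about, and it is why a joint optimization over all folders is preferable to a greedy merge.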

  46. Summary and project status
      • Collaborative resource discovery and topic management system
      • Testbed for hypertext mining research
      • Signed Java2 client
        • Netscape 4.5+ available
        • IE5+ planned
      • Server for Unix and Windows
        • IBM UDB, Berkeley DB, servlets
        • Non-trivial to install and manage
        • Simple-to-use RPMs being planned
      • http://www.cse.iitb.ernet.in/~soumen
