Automatic Classification of Bookmarked Web Pages
Presentation Transcript

  1. Automatic Classification of Bookmarked Web Pages
     Chris Staff
     First Talk, February 2007

  2. Overview
     • General Principles
     • Reading List
     • Tasks involved
     • Schedule

  3. General Principles
     • Email: cstaff@cs.um.edu.mt
     • Web site: http://www.cs.um.edu.mt/~cstaff
     • Plagiarism
     • Referencing
     • ACM Digital Library: Membership for students from Malta

  4. Reading List
     • Abrams, D., Baecker, R.: How people use WWW bookmarks. In: CHI ’97 Extended Abstracts on Human Factors in Computing Systems, New York, NY, USA, ACM Press (1997) 341-342
     • Bugeja, I.: Managing WWW browser’s bookmarks and history (a Firefox extension). Final year project report, Department of Computer Science & AI, University of Malta (2006). http://hyper.iannet.org/hyperBkreport.pdf
     • Cockburn, A., McKenzie, B.: What do web users do? An empirical analysis of web use. Int. J. Hum.-Comput. Stud. 54(6) (2001) 903-922
     • Staff, C.: Automatic Classification of Web Pages into Bookmark Categories. Submitted to UM’07 (2007)
     • Staff, C.: CSA3200 User Adaptive Systems Lecture Notes (2006). Follow link from http://www.cs.um.edu.mt/~cstaff/
     • Mozilla Development Center: Building an Extension (2006). http://developer.mozilla.org/en/docs/Building_an_Extension

  5. Classifying Bookmarks
     • When a user bookmarks a page (or adds a page to Favorites) we want to recommend the best existing category
     • An improvement over simply recommending the last category saved to
     • An improvement over simply offering the ‘category root’

  6. Tasks
     • Representation of bookmark categories
     • Two clustering/similarity algorithms
     • Extra utility
     • User interface
     • Evaluation
     • Write up report

  7. Tasks Overview
     • We are going to implement a number of algorithms to help with the overall task
     • Some of these will be used while the user is browsing
     • Others will be used to classify pages ‘off-line’ (especially for the existing bookmark files)
     • We’re going to have a ‘standard test bed’ for conducting the evaluation

  8. Tasks Overview
     • Represent bookmark categories
       • We’re starting with populated bookmark files, so use the ‘How Did I Find That?’ approach
       • Plus another, individual approach
     • When a page is to be bookmarked
       • If the referrer page is available, identify the topic of the page
       • Otherwise, identify the page topic using the ‘How Did I Find That?’ approach
       • Compare the current topic to the bookmark category representations

  9. Tasks Overview
     • User Interface
       • To replace the built-in ‘Bookmark this Page’ menu item and keyboard command
       • To display a new dialog box that offers the user a choice of the recommended category or the last category used, and allows the user to select some other category or create a new one

  10. Tasks Overview
     • Evaluation
       • Will be standard and automated
       • For testing purposes, download test_eval.zip from home page
       • Contains 2x8 bookmark files (.html) and one URL file (.txt)
       • Bookmark files are ‘real’ files collected one year ago
       • URL file contains a number of lines with the following format:
         • Bk file ID, URL of bookmarked page, home category, exact entry from bookmark file (with date created, etc.)
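
A minimal sketch of how one line of the URL file might be read. The slide gives only the field order, so the comma delimiter, the `UrlRecord` and `parse_url_line` names, and the sample line are all assumptions made for illustration; check them against the real file in test_eval.zip. The split is limited to three commas so that the raw bookmark entry, which may itself contain commas, stays whole. A URL that contains a comma would break this simple split.

```python
from dataclasses import dataclass


@dataclass
class UrlRecord:
    bookmark_file_id: str  # which bookmark file the entry came from
    url: str               # URL of the bookmarked page
    home_category: str     # category the user originally filed it under
    raw_entry: str         # exact entry text copied from the bookmark file


def parse_url_line(line: str) -> UrlRecord:
    """Split one line into its four fields (assumed comma-separated)."""
    # Split on the first three commas only; the rest is the raw entry.
    parts = [p.strip() for p in line.rstrip("\n").split(",", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    return UrlRecord(*parts)


# Hypothetical sample line, invented for illustration only.
sample = (
    'bk01, http://example.org/page, Research/IR, '
    '<DT><A HREF="http://example.org/page" ADD_DATE="1139900000">'
    'A page, with a comma</A>'
)
record = parse_url_line(sample)
print(record.bookmark_file_id, record.home_category)
```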

  11. Tasks Overview
     • Evaluation (continued)
       • Challenge to also ‘re-create’ the bookmark file in the order that it was created by users
       • Eventually, close to the end of the APT, the evaluation test data sets will be made available
         • About 20 unseen bookmark files and one URL file
         • Same format as before
       • You’ll get bookmark files early to prepare representations, but the classification run will be part of a demo session
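
One possible starting point for the ‘re-create’ challenge is to order entries by their creation date. The sketch below assumes the bookmark files are Netscape-style HTML in which each link carries an `ADD_DATE` attribute (seconds since 1970) placed after `HREF`; the `creation_order` function and the sample entries are hypothetical. It recovers only the order in which pages were added, not which category each was filed under.

```python
import re
from typing import List

# Matches anchors of the form <A HREF="..." ADD_DATE="1234567890" ...>.
# Assumes HREF appears before ADD_DATE inside the tag.
ENTRY_RE = re.compile(
    r'<A\s+[^>]*?HREF="(?P<url>[^"]*)"[^>]*?ADD_DATE="(?P<added>\d+)"[^>]*>',
    re.IGNORECASE,
)


def creation_order(bookmark_html: str) -> List[str]:
    """Return bookmarked URLs sorted by ADD_DATE, oldest first."""
    entries = [
        (int(m.group("added")), m.group("url"))
        for m in ENTRY_RE.finditer(bookmark_html)
    ]
    # Tuples sort by timestamp first; equal timestamps fall back to URL order.
    return [url for _, url in sorted(entries)]


# Invented sample entries, for illustration only.
sample = """
<DT><A HREF="http://b.example/" ADD_DATE="300">B</A>
<DT><A HREF="http://a.example/" ADD_DATE="100">A</A>
<DT><A HREF="http://c.example/" ADD_DATE="200">C</A>
"""
print(creation_order(sample))
```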

  12. Tasks Overview
     • Write up report
       • We’ll spend some time looking at the structure of a scientific report, how to write a literature review, how to present evaluation results, etc.

  13. Task: Representing Bookmark Categories
     • We need to identify what a category or collection of bookmarks is about so that we can check if a new page could belong to that category
     • Ideally, we find out what is similar between the different documents in the category (especially if we know which link a user followed to reach child!)
     • In the absence of this information, use two algorithms:
       • One based on ‘How Did I Find That?’
       • A second that is up to you
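
As a rough illustration of what a category representation could look like for the second, self-chosen algorithm, the sketch below averages term frequencies over the documents in a category to form a centroid. This is a plain bag-of-words stand-in, not the ‘How Did I Find That?’ approach, and the `tokenize` and `category_centroid` names and the sample texts are assumptions.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def category_centroid(documents: Iterable[str]) -> Dict[str, float]:
    """Average the term counts of all documents in one category."""
    totals: Counter = Counter()
    count = 0
    for doc in documents:
        totals.update(tokenize(doc))
        count += 1
    if count == 0:
        return {}  # an empty category has no representation
    return {term: freq / count for term, freq in totals.items()}


# Invented page texts standing in for the documents of one category.
docs = ["Python web framework", "Python testing tools"]
centroid = category_centroid(docs)
print(centroid["python"], centroid["web"])
```

A term that appears in every document ends up with the highest weight, which is one simple way of capturing ‘what is similar between the different documents’.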

  14. Task: Two clustering/similarity algorithms
     • Once we have represented the categories, we can ‘send’ the page to be bookmarked to the best category
     • Similar to ‘information filtering’ or ‘clustering’
     • What similarity measure or clustering algorithm to use?
     • One way of representing the page to be classified will be based on ‘How Did I Find That?’
     • Other way researched/developed by you
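
One candidate answer to the similarity-measure question is cosine similarity between term-weight vectors. The sketch below is only an example of that choice, not the measure the course prescribes; the `cosine` and `best_category` names, the vectors, and the category names are invented for illustration.

```python
import math
from typing import Dict, Optional, Tuple

Vector = Dict[str, float]  # term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_category(page: Vector,
                  categories: Dict[str, Vector]) -> Tuple[Optional[str], float]:
    """Return the most similar category, or (None, 0.0) if nothing overlaps."""
    best_name, best_score = None, 0.0
    for name, vec in categories.items():
        score = cosine(page, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical category representations and page vector.
categories = {
    "Programming": {"python": 1.0, "code": 1.0},
    "Cooking": {"recipe": 1.0, "oven": 1.0},
}
page = {"python": 2.0, "tutorial": 1.0}
print(best_category(page, categories))
```

When no category shares a term with the page the function returns `None`, at which point the interface could fall back to the last category used or the category root.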

  15. Task: Extra Utility
     • How can the classification of web pages to be bookmarked be improved?
     • What particular interests do you have, and how can they be used to improve classification?
     • E.g., synonym detection, automatic reorganisation of bookmarks, …

  16. Task: User Interface
     • Can use XUL to ‘extend’ Mozilla Firefox
       • http://www.xulplanet.com/tutorials/xultu/
     • Use Ian Bugeja’s HyperBK as a framework (with due referencing and acknowledgement, of course): https://addons.mozilla.org/firefox/2539/
     • Programs are likely to be JavaScript
     • Your extension will then be portable

  17. Task: User Interface
     • You can use Ian’s interface, but it may need some tweaking:
       • To support some of the new functionality that you’re adding (e.g. choice of algorithms)
       • And to fix some of the usability problems with the dialog box

  18. Task: Evaluation
     • ACofBWP will be evaluated!
     • But you must build a version of the program that:
       • Can be called in batch mode
       • Will accept a directory containing bookmark files and a URL file
       • Will run in two modes (classify and reconstruct)
       • Will report faithfully on its performance
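
A minimal sketch of what the batch-mode entry point could look like, using Python's standard `argparse` module. The argument names (`bookmark_dir`, `url_file`, `--mode`) and the returned summary string are assumptions; the slide fixes only the inputs and the two modes. The actual classification or reconstruction work is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command line for a batch evaluation run (argument names are assumed)."""
    parser = argparse.ArgumentParser(description="Batch evaluation runner (sketch)")
    parser.add_argument("bookmark_dir", help="directory containing bookmark .html files")
    parser.add_argument("url_file", help="text file listing the pages to process")
    parser.add_argument("--mode", choices=["classify", "reconstruct"], required=True,
                        help="which of the two evaluation modes to run")
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    # The real classify/reconstruct work would be dispatched here.
    return f"{args.mode}: {args.url_file} against {args.bookmark_dir}"


# Example invocation with made-up paths.
print(main(["bookmarks/", "urls.txt", "--mode", "classify"]))
```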

  19. Task: Write Up Report
     • At least one tutorial will be dedicated to good report writing practice: how to write a literature review, how to build and write references, and how to present evaluation results

  20. Grading Structure
     • 10% for obtaining an average of at least 0.8 precision on evaluation (for random bookmark classification, using either implemented approach)
     • 10% for incurring a maximum 2 second overhead on average to classify a page (must faithfully report time overhead)
     • Max. 10% for extra utility
     • 40% Report
     • 15% Presentation
     • 15% Artifact Design/Implementation
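
To make the two measured criteria concrete, the sketch below times a classifier over a set of test cases and reports the fraction it gets right together with the mean time per page. It assumes ‘precision’ here means the share of pages assigned to their original home category, which the slide does not spell out; the `evaluate` function, the dummy classifier, and the test cases are hypothetical.

```python
import time
from typing import Callable, Iterable, Tuple


def evaluate(classify: Callable[[str], str],
             cases: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (precision, mean seconds per page) over (url, home_category) pairs."""
    correct = 0
    total = 0
    elapsed = 0.0
    for url, expected in cases:
        start = time.perf_counter()
        predicted = classify(url)
        elapsed += time.perf_counter() - start  # time only the classifier call
        total += 1
        if predicted == expected:
            correct += 1
    if total == 0:
        return 0.0, 0.0
    return correct / total, elapsed / total


def always_news(url: str) -> str:
    """Dummy classifier used only to exercise the harness."""
    return "News"


cases = [
    ("http://a.example/", "News"),
    ("http://b.example/", "News"),
    ("http://c.example/", "Sport"),
]
precision, mean_seconds = evaluate(always_news, cases)
print(f"precision={precision:.2f}, meets 0.8 target: {precision >= 0.8}")
print(f"within 2 s average overhead: {mean_seconds <= 2.0}")
```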

  21. Future Opportunities
     • FYP supervision
     • Opportunity to co-author a research paper that will be submitted to a leading IR/AH/UM conference (irrespective of FYP)

  22. Pitfalls
     • Utilities must be lightweight
       • Mostly those that are interactive, or that are invoked while the user is browsing
     • Should all of a document be used to contribute to a category representation or be used in a similarity measure?

  23. Schedule
     • Until w.c. 6th March inc.: Discussion, talks once/week
     • w.c. 19th March: Submit TOC/chapter overview for feedback (optional)
     • w.c. 23rd Apr: Demo 1 (optional)
     • 23rd Apr-7th May: Submit one chapter of your choice for feedback (optional)
     • w.c. 7th May: Demo 2 (optional)
     • 14th May: Evaluation collection will be made available
     • 25th May: Submit APT report
     • June: Demo and evaluation under exam conditions