
Information Management on the World-Wide Web


Presentation Transcript


  1. Information Management on the World-Wide Web Junghoo “John” Cho UCLA Computer Science

  2. The Web and Information Galore

  3. 10 Years Ago • Reading papers for research • Stacks of papers • Long wait

  4. With Web

  5. Challenges (1) • Information overload • Too much information, too little time

  6. Information Overload • “XML” to Google • 14 Million matching documents! • “XML” to Amazon • 464 matching books! • Which one to read?

  7. Challenges (2) • Hidden Web • Not indexed by Search Engines • “Hidden” from an average user • Browse every site manually? …

  8. Challenges (3) • Transience

  9. Challenges (4) • Scattered & unstructured data • All Computer Science faculty members and graduate students in the US?

  10. Projects In Our Group • Web Archive • Hidden Web Integration • Page Ranking Algorithm • User Recommendation System

  11. User Recommendation System • 464 books on XML • Which one to read? • The one that my colleagues and friends recommend?

  12. Amazon’s Recommendation System • 1 – 5 star rating by individual users • Books can be sorted by “average user rating”

  13. My Typical Scenario • Sort books by their average user rating • Browse top 20 books to decide what to read

  14. Questions • Is “5 star” by one user better than “4.9 star” by 100 users? • Intuitively, I prefer 4.9 star by 100 users • More “reliable” rating • How much can I trust the rating of a particular person? • How do I know that the person’s rating is reliable?

  15. Our Approach • “Inherent quality” or “rating” of a book • How many users would recommend the book (i.e., give a high rating) if all users had read it? • More user ratings → more information on the “quality” of the book • An average user is likely to give a high rating to a high-quality book

  16. Probabilistic Rating Model • How likely is it that the book has a “4 star rating”? • Rating probability distribution [Figure: probability density plotted against book rating/quality]

  17. Update of Rating Probability • As more users provide ratings, we update our probability distribution [Figure: probability density plotted against book rating/quality]

  18. Update of Rating Probability • As more users provide ratings, we update our probability distribution [Figure: after a five-star rating by a user; probability density plotted against book rating/quality]

  19. Update of Rating Probability • As more users provide ratings, we update our probability distribution [Figure: after a one-star rating by a user; probability density plotted against book rating/quality]

  20. Update of Rating Probability • As more users provide ratings, we update our probability distribution [Figure: after many ratings; probability density plotted against book rating/quality]

  21. Bayesian Inference Theory • Given a user rating UR, what is the inherent rating IR? • P(UR | IR) P(IR) = P(IR | UR) P(UR) • P(IR): probability of book rating BEFORE user rating • P(IR | UR): probability of book rating AFTER user rating
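
The update on slide 21 can be sketched in a few lines. This is a minimal illustration, not the model used in the talk: it assumes the inherent rating takes one of five discrete values, and the `likelihood` function (ratings close to the inherent rating are more probable) and its `noise` parameter are hypothetical choices made only so that the example runs.

```python
import math

RATINGS = [1, 2, 3, 4, 5]  # candidate inherent ratings (IR) and user ratings (UR)

def likelihood(ur, ir, noise=1.0):
    """Assumed P(UR | IR): a user rating near the inherent rating is more likely."""
    normalizer = sum(math.exp(-abs(r - ir) / noise) for r in RATINGS)
    return math.exp(-abs(ur - ir) / noise) / normalizer

def update(prior, ur):
    """Bayes' rule: P(IR | UR) = P(UR | IR) P(IR) / P(UR)."""
    unnormalized = {ir: likelihood(ur, ir) * p for ir, p in prior.items()}
    evidence = sum(unnormalized.values())  # this sum is P(UR)
    return {ir: v / evidence for ir, v in unnormalized.items()}

def expected_rating(dist):
    """Mean of a distribution over inherent ratings."""
    return sum(ir * p for ir, p in dist.items())

# Before any user rating: a uniform distribution (nothing is known yet).
prior = {ir: 1 / len(RATINGS) for ir in RATINGS}
after_five = update(prior, 5)  # distribution after one five-star rating
after_one = update(prior, 1)   # distribution after one one-star rating
```

Calling `update` repeatedly, once per incoming rating, corresponds to the sequence of distributions on slides 17 to 20: each new rating moves probability mass toward the values that best explain it.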

  22. User Model • The characteristics of a user • Sensitivity: slope of the curve • +1: good, –1: bad, 0: not useful [Figure: two panels, “Good” and “Bad”, each with axes user rating and book quality]

  23. User Model • The characteristics of a user • Bias: average “height” of the curve [Figure: two panels, “Positive bias” and “Negative bias”, each with axes user rating and book quality]
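
One possible reading of slides 22 and 23 in code: fit a straight line through a user's ratings as a function of book quality, take the slope as the sensitivity, and take the average offset between rating and quality as the bias. The function name `user_profile` and the least-squares fit are assumptions made for illustration; the slides do not say how the curve is estimated.

```python
from statistics import mean

def user_profile(quality, ratings):
    """Return (sensitivity, bias) for one user.

    quality[i] is the estimated quality of book i, ratings[i] is the
    user's rating of the same book. Both names are hypothetical.
    """
    q_bar, r_bar = mean(quality), mean(ratings)
    var_q = sum((q - q_bar) ** 2 for q in quality)
    cov = sum((q - q_bar) * (r - r_bar) for q, r in zip(quality, ratings))
    # Least-squares slope of rating against quality; 0 if quality never varies.
    sensitivity = cov / var_q if var_q else 0.0
    # Average height of the user's curve relative to the quality estimates.
    bias = r_bar - q_bar
    return sensitivity, bias

# A user who tracks quality but rates one star too high.
generous = user_profile([1, 2, 3, 4], [2, 3, 4, 5])
# A user whose ratings run opposite to quality.
contrarian = user_profile([1, 2, 3, 4], [4, 3, 2, 1])
# A user who gives the same rating to everything.
indifferent = user_profile([1, 2, 3, 4], [3, 3, 3, 3])
```

Under this reading, a user with sensitivity near 0 carries no information about quality, which matches the "0: not useful" label on slide 22.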

  24. Iterative Model Refinement • As more users rate a book, we get a better estimate of the book’s quality • As we estimate a book’s quality better, we get a better idea of a user’s sensitivity and bias

  25. Iterative Model Refinement [Diagram labels: User-provided Rating, Book Rating Estimate, User Characteristics]
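
The alternation described on slide 24 can be sketched as a simple loop. This is a deliberately simplified illustration, not the procedure from the talk: it models only the bias (sensitivity is left out), uses plain averages instead of probability distributions, and the function name `refine` and the `(user, book)` dictionary layout are hypothetical.

```python
from statistics import mean

def refine(ratings, rounds=10):
    """Alternate between book-quality and user-bias estimates.

    ratings maps (user, book) to a star rating.
    Returns (quality per book, bias per user).
    """
    users = {u for u, _ in ratings}
    books = {b for _, b in ratings}
    bias = {u: 0.0 for u in users}  # start by assuming nobody is biased
    quality = {}
    for _ in range(rounds):
        # Step 1: estimate each book's quality from bias-corrected ratings.
        for b in books:
            quality[b] = mean(s - bias[u]
                              for (u, bb), s in ratings.items() if bb == b)
        # Step 2: re-estimate each user's bias against the new quality estimates.
        for u in users:
            bias[u] = mean(s - quality[b]
                           for (uu, b), s in ratings.items() if uu == u)
    return quality, bias

# Bob rates every book one star higher than Alice and Carol do.
ratings = {
    ("alice", "A"): 4, ("alice", "B"): 2,
    ("bob", "A"): 5, ("bob", "B"): 3,
    ("carol", "A"): 4, ("carol", "B"): 2,
}
quality, bias = refine(ratings)
```

In this toy data set the loop separates the two effects: book A comes out two stars above book B, and Bob's bias comes out one star above Alice's.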

  26. Final Recommendation • Recommend the book with the highest expected rating

  27. Initial Results • Our system prefers a 4.9-star book by 100 people to a 5-star book by 1 user • If a user gives random ratings, the system ignores the user’s rating • More thorough evaluation on the way
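
A small sketch of why "highest expected rating" (slide 26) can favor the 4.9-star book rated by 100 users over the 5-star book rated by one. The prior here is an assumption chosen for illustration, not the one from the talk: each star level starts with one pseudo-rating (a symmetric Dirichlet prior), and the expected rating is the mean under the resulting posterior. The names `expected_rating`, `recommend`, and `prior_count` are hypothetical.

```python
def expected_rating(star_counts, prior_count=1.0):
    """Posterior-mean rating under an assumed symmetric Dirichlet prior.

    star_counts[k] is the number of (k + 1)-star ratings the book received;
    prior_count pseudo-ratings are added to every star level.
    """
    total = sum(star_counts) + prior_count * len(star_counts)
    weighted = sum((k + 1) * (n + prior_count)
                   for k, n in enumerate(star_counts))
    return weighted / total

def recommend(books):
    """Return the title with the highest expected rating."""
    return max(books, key=lambda title: expected_rating(books[title]))

books = {
    "one 5-star rating": [0, 0, 0, 0, 1],
    "100 ratings, 4.9 average": [0, 0, 0, 10, 90],
}
best = recommend(books)
```

With a single rating the prior still dominates, so the expected rating stays close to the middle of the scale; with 100 ratings the data dominates and the estimate sits close to the observed average.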

  28. Other Projects • Web Archive • Hidden Web Integration • Page Ranking Algorithm

  29. Ph.D. Students on the Projects Alex Ntoulas Rob Adams Victor Liu • In Dr Chu’s group

  30. Thank You • Questions?
