Information Management on the World-Wide Web

Junghoo “John” Cho

UCLA Computer Science


The Web and Information Galore


10 Years Ago

  • Reading papers for research

    • Stacks of papers

    • Long wait


With Web


Challenges (1)

  • Information overload

    • Too much information, too little time


Information Overload

  • “XML” to Google

    • 14 Million matching documents!

  • “XML” to Amazon

    • 464 matching books!

  • Which one to read?


Challenges (2)

  • Hidden Web

    • Not indexed by Search Engines

    • “Hidden” from an average user

    • Browse every site manually?


Challenges (3)

  • Transience


Challenges (4)

  • Scattered & unstructured data

    • All Computer Science faculty members and graduate students in the US?


Projects In Our Group

  • Web Archive

  • Hidden Web Integration

  • Page Ranking Algorithm

  • User Recommendation System


User Recommendation System

  • 464 books on XML

  • Which one to read?

    • The one that my colleagues and friends recommend?


Amazon’s Recommendation System

  • 1 – 5 star rating by individual users

  • Books can be sorted by “average user rating”


My Typical Scenario

  • Sort books by their average user rating

  • Browse top 20 books to decide what to read


Questions

  • Is “5 star” by one user better than “4.9 star” by 100 users?

    • Intuitively, I prefer 4.9 stars from 100 users

    • More “reliable” rating

  • How much can I trust the rating of a particular person?

    • How do I know that the person’s rating is reliable?


Our Approach

  • “Inherent quality” or “rating” of a book

    • How many users would recommend the book (i.e., give it a high rating) if all users had read it?

  • More user ratings → more information on the “quality” of the book

    • An average user is likely to give a high rating to a high-quality book
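
One way to make "more ratings, more information" concrete is sketched below. It is an illustration, not the model from the talk: it assumes each rating is reduced to "recommend" or "not recommend", that the inherent quality is the fraction of users who would recommend the book, and that this fraction starts with a uniform Beta(1, 1) prior. The function name `beta_posterior` and the 98/2 split are made up for the example.

```python
import math


def beta_posterior(recommend, not_recommend, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of the Beta posterior over the fraction of
    users who would recommend a book.

    Assumption: a uniform Beta(1, 1) prior unless told otherwise.
    """
    a = prior_a + recommend
    b = prior_b + not_recommend
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# One user who recommends the book.
few_mean, few_std = beta_posterior(recommend=1, not_recommend=0)

# 100 users, 98 of whom recommend the book (hypothetical split).
many_mean, many_std = beta_posterior(recommend=98, not_recommend=2)

print(f"1 rating:    mean={few_mean:.3f}, std={few_std:.3f}")
print(f"100 ratings: mean={many_mean:.3f}, std={many_std:.3f}")
```

Under these assumptions the estimate from 100 ratings has a much smaller standard deviation than the estimate from a single rating, which is the sense in which the larger sample is more "reliable".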


Probabilistic Rating Model

  • How likely is it that the book has a “4-star” rating?

    • Rating probability distribution

[Figure: rating probability distribution; probability density plotted against book rating/quality]


Update of Rating Probability

  • As more users provide rating, we update our probability distribution

[Figure: probability density plotted against book rating/quality]


[Figure: probability density plotted against book rating/quality, after a five-star rating by a user]


[Figure: probability density plotted against book rating/quality, after a one-star rating by a user]


[Figure: probability density plotted against book rating/quality, after many ratings]


Bayesian Inference Theory

  • Given a user rating UR, what is the inherent rating IR?

\[ P(UR \mid IR)\, P(IR) = P(IR \mid UR)\, P(UR) \]

  • P(IR): probability of book rating BEFORE user rating

  • P(IR | UR): probability of book rating AFTER user rating
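
The update described above can be sketched numerically. Solving Bayes' rule for the posterior gives P(IR | UR) = P(UR | IR) P(IR) / P(UR), and the code below applies it on a discrete grid of candidate inherent ratings. Everything specific here is an assumption for illustration: the grid `IR_GRID`, the flat prior, the Gaussian-shaped likelihood in `likelihood`, and the 100 made-up ratings. The talk does not say which likelihood its system uses.

```python
import numpy as np

# Hypothetical grid of candidate inherent ratings (IR), 1.0 to 5.0 stars.
IR_GRID = np.linspace(1.0, 5.0, 41)
STARS = np.arange(1, 6)  # possible user ratings (UR)


def likelihood(star, noise=1.0):
    """P(UR = star | IR) for every IR on the grid.

    Assumption: users scatter around the inherent rating with a
    Gaussian-shaped weight, renormalized over the five possible stars.
    """
    weights = np.exp(-0.5 * ((STARS[:, None] - IR_GRID[None, :]) / noise) ** 2)
    weights /= weights.sum(axis=0)
    return weights[star - 1]


def update(prior, star):
    """One application of Bayes' rule on the grid."""
    unnormalized = likelihood(star) * prior
    return unnormalized / unnormalized.sum()  # the sum plays the role of P(UR)


def mean_and_std(dist):
    """Mean and standard deviation of a distribution over IR_GRID."""
    mean = float(np.sum(IR_GRID * dist))
    std = float(np.sqrt(np.sum((IR_GRID - mean) ** 2 * dist)))
    return mean, std


prior = np.full(IR_GRID.size, 1.0 / IR_GRID.size)  # flat prior (assumed)
after_five = update(prior, 5)
after_one = update(prior, 1)

after_many = prior
for star in [5, 4, 5, 5, 4] * 20:  # 100 made-up ratings
    after_many = update(after_many, star)

for label, dist in [("prior", prior), ("after 5 stars", after_five),
                    ("after 1 star", after_one), ("after 100 ratings", after_many)]:
    print(label, mean_and_std(dist))
```

With these assumptions, a five-star rating shifts the distribution toward higher ratings, a one-star rating shifts it toward lower ratings, and many ratings narrow it.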


User Model

  • The characteristics of a user

  • Sensitivity: Slope of the curve

    • +1: good, –1: bad, 0: not useful

[Figure: two plots of user rating against book quality, labeled “Good” and “Bad”]


User Model

  • The characteristics of a user

  • Bias: Average “height” of the curve

[Figure: two plots of user rating against book quality, labeled “Positive bias” and “Negative bias”]
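
The two user characteristics can be estimated from data in several ways. The sketch below is one possibility, not necessarily the definition used in the talk: it assumes sensitivity is the least-squares slope of a user's ratings against book quality, and bias is the mean gap between the user's rating and the quality. The function `user_characteristics` and all the numbers are hypothetical.

```python
import numpy as np


def user_characteristics(book_quality, user_ratings):
    """Return (sensitivity, bias) for one user.

    Assumptions: sensitivity is the least-squares slope of user rating
    against book quality; bias is the mean of (rating - quality).
    """
    quality = np.asarray(book_quality, dtype=float)
    ratings = np.asarray(user_ratings, dtype=float)
    slope, _intercept = np.polyfit(quality, ratings, 1)
    bias = float(np.mean(ratings - quality))
    return float(slope), bias


quality = [1.0, 2.0, 3.0, 4.0]       # made-up quality estimates for four books

generous = [2.0, 3.0, 4.0, 5.0]      # tracks quality, always one star higher
contrarian = [4.0, 3.0, 2.0, 1.0]    # rates good books low and bad books high
indifferent = [3.0, 3.0, 3.0, 3.0]   # same rating regardless of quality

for name, ratings in [("generous", generous), ("contrarian", contrarian),
                      ("indifferent", indifferent)]:
    sensitivity, bias = user_characteristics(quality, ratings)
    print(f"{name}: sensitivity={sensitivity:+.2f}, bias={bias:+.2f}")
```

In this toy data the generous user has slope +1 with a positive bias, the contrarian has slope –1, and the indifferent user has slope 0, matching the "good", "bad", and "not useful" cases.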


Iterative Model Refinement

  • As more users rate a book, we get a better estimate of its quality

  • As we estimate a book’s quality better, we get a better idea of each user’s sensitivity and bias


[Diagram: User-provided Rating, Book Rating Estimate, User Characteristics]
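
The alternation between book estimates and user characteristics can be sketched as a simple loop. This is a simplified stand-in, not the Bayesian procedure from the talk: it assumes a dense user-by-book rating matrix, uses the Pearson correlation between a user's ratings and the current quality estimates as the sensitivity, and weights each user's bias-corrected ratings by that sensitivity (negative values are clipped to zero). The function names and ratings are invented for the example.

```python
import numpy as np


def correlation(x, y):
    """Pearson correlation, defined as 0 when either input is constant."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])


def refine(ratings, iterations=10):
    """Alternate between estimating book quality and user characteristics.

    ratings: 2-D array, rows are users, columns are books (dense).
    Returns (quality, sensitivity, bias).
    """
    ratings = np.asarray(ratings, dtype=float)
    n_users = ratings.shape[0]
    sensitivity = np.ones(n_users)  # start by trusting everyone equally
    bias = np.zeros(n_users)
    for _ in range(iterations):
        # Step 1: book quality from bias-corrected, sensitivity-weighted ratings.
        weights = np.clip(sensitivity, 0.0, None)
        if weights.sum() == 0:
            weights = np.ones(n_users)
        quality = weights @ (ratings - bias[:, None]) / weights.sum()
        # Step 2: user characteristics from the new quality estimates.
        sensitivity = np.array([correlation(quality, row) for row in ratings])
        bias = ratings.mean(axis=1) - quality.mean()
    return quality, sensitivity, bias


ratings = [
    [1, 2, 3, 4],  # consistent user
    [2, 3, 4, 5],  # consistent user with a +1 bias
    [1, 2, 3, 4],  # consistent user
    [3, 1, 4, 2],  # user whose ratings are unrelated to the others
]
quality, sensitivity, bias = refine(ratings)
print("quality:    ", np.round(quality, 2))
print("sensitivity:", np.round(sensitivity, 2))
print("bias:       ", np.round(bias, 2))
```

In this toy run the fourth user's sensitivity shrinks toward zero over the iterations, so that user's ratings stop influencing the quality estimates, while the second user's offset shows up as a larger bias.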


Final Recommendation

  • Recommend the book with the highest expected rating
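
As an illustration of ranking by expected rating, the sketch below smooths each book's star counts with one pseudo-rating per star level and recommends the book with the highest resulting mean. The smoothing scheme, the function `expected_rating`, the book names, and the rating counts are all assumptions for the example; the talk does not specify how its expected rating is computed.

```python
def expected_rating(star_counts, pseudo_count=1.0):
    """Expected star rating after adding `pseudo_count` imaginary
    ratings at each star level (an assumed smoothing prior).

    star_counts[i] is the number of users who gave (i + 1) stars.
    """
    smoothed = [count + pseudo_count for count in star_counts]
    total = sum(smoothed)
    return sum((i + 1) * weight for i, weight in enumerate(smoothed)) / total


books = {
    "book_a": [0, 0, 0, 0, 1],    # a single 5-star rating
    "book_b": [0, 0, 0, 10, 90],  # 100 ratings averaging 4.9 stars
}

scores = {title: expected_rating(counts) for title, counts in books.items()}
best = max(scores, key=scores.get)

for title, score in scores.items():
    print(f"{title}: expected rating {score:.2f}")
print("recommend:", best)
```

Sorting by the raw average would put `book_a` first. With the smoothing prior, the single rating carries little weight and `book_b` is recommended.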


Initial Results

  • Our system prefers a 4.9-star book rated by 100 users to a 5-star book rated by 1 user

  • If a user gives random ratings, the system ignores the user’s rating

  • More thorough evaluation is under way


Other Projects

  • Web Archive

  • Hidden Web Integration

  • Page Ranking Algorithm


Ph.D. Students on the Projects

Alex Ntoulas

Rob Adams

Victor Liu

  • In Dr Chu’s group


Thank You

  • Questions?
