Information Filtering

Rong Jin


Outline

  • Brief introduction to information filtering

  • Collaborative filtering

  • Adaptive filtering


Short vs. Long Term Info. Need

  • Short-term information need (Ad hoc retrieval)

    • “Temporary need”, e.g., info about used cars

    • Information source is relatively static

    • User “pulls” information

    • Application example: library search, Web search

  • Long-term information need (Filtering)

    • “Stable need”, e.g., new data mining algorithms

    • Information source is dynamic

    • System “pushes” information to user

    • Applications: news filter




Examples of Information Filtering

  • News filtering

  • Email filtering

  • Movie/book/product recommenders

  • Literature recommenders

  • And many others …


Information Filtering

  • Basic filtering question: Will user U like item X?

  • Two different ways of answering it

  • Look at what U likes

    → characterize X → content-based filtering

  • Look at who likes X

    → characterize U → collaborative filtering

  • Combine content-based filtering and collaborative filtering → unified filtering (open research topic)


Other Names for Information Filtering

  • Content-based filtering is also called

    • “Adaptive Information Filtering” in TREC

    • “Selective Dissemination of Information” (SDI) in Library & Information Science

  • Collaborative filtering is also called

    • Recommender systems



Example: Collaborative Filtering

[Figure: a user-movie rating matrix; the task is to fill in user 3's missing rating for the movie "15 minutes".]

  • User 3 is more similar to user 1 than to user 2

  • → predict rating 5 for movie "15 minutes" for user 3


Collaborative Filtering (CF) vs. Content-based Filtering (CBF)

  • CF does not need the content of items, while CBF relies on it

  • CF is useful when the contents of items

    • are not available or difficult to acquire

    • are difficult to analyze

  • Problems with CF

    • Privacy issues



Why Collaborative Filtering?

  • Because it is worth a million dollars! (the Netflix Prize)


Collaborative Filtering

  • Goal: making filtering decisions for an individual user based on the judgments of other users

  Users U rate objects O; "?" marks an unknown rating to predict:

              o1  o2  …   oj  oj+1  …  on
    u1         3   1  …    4    2       ?
    u2         2   5   ?   4    3
    …
    um         ?   3   ?   1    2
    u_test     3   4   …   ?            1




Collaborative Filtering

  • Goal: making filtering decisions for an individual user based on the judgments of other users

  • Memory-based approaches

    • Given a test user u, find similar users {u1, …, ul}

    • Predict u's rating based on the ratings of u1, …, ul (see the sketch below)
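
A minimal sketch of the memory-based scheme in Python, assuming Pearson correlation as the similarity measure and a similarity-weighted average of mean-centered neighbor ratings as the prediction rule. The toy ratings mirror the deck's running example, but the item names and user 1's/user 2's ratings for "15 minutes" are made up for illustration:

```python
import math

def pearson(a, b):
    """Pearson correlation between two users over their co-rated items.

    a, b: dicts mapping item -> rating.
    """
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * \
          math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(test, neighbors, item):
    """Predict test user's rating: own mean + similarity-weighted rating deviations."""
    mean_test = sum(test.values()) / len(test)
    num = den = 0.0
    for u in neighbors:
        if item in u:
            w = pearson(test, u)
            mean_u = sum(u.values()) / len(u)
            num += w * (u[item] - mean_u)
            den += abs(w)
    return (mean_test + num / den) if den else mean_test

# Toy data: users 1 and 2 have rated "15 minutes"; user 3 (the test user) has not.
u1 = {"A": 1, "B": 3, "C": 4, "D": 3, "15 minutes": 5}   # illustrative
u2 = {"A": 4, "B": 5, "C": 2, "D": 5, "15 minutes": 1}   # illustrative
u3 = {"A": 2, "B": 3, "C": 5, "D": 4}

print(round(pearson(u3, u1), 2))   # 0.92 (the deck rounds this to 0.93)
print(round(pearson(u3, u2), 2))   # -0.55
print(round(predict(u3, [u1, u2], "15 minutes"), 2))  # 5.52 -> clamp to the 1-5 scale
```

On this data the prediction leans on user 1 (high positive correlation) and is pushed up further because the dissimilar user 2 disliked the movie; clamping to the rating scale gives the 5 from the slides.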


Example: Collaborative Filtering

  • User 3 is more similar to user 1 than to user 2

  • → predict rating 5 for movie "15 minutes" for user 3


Important Issues with CF

  • How to determine the similarity between different users?

  • How to aggregate ratings from similar users to form the predictions?


Pearson Correlation for CF

  • V1 = (1, 3, 4, 3), va1 = 2.75

  • V3 = (2, 3, 5, 4), va3 = 3.5

  • Pearson correlation measures the linear correlation between two vectors:

    $\mathrm{Pearson}(V_a, V_b) = \dfrac{\sum_i (V_{a,i} - \bar{V}_a)(V_{b,i} - \bar{V}_b)}{\sqrt{\sum_i (V_{a,i} - \bar{V}_a)^2}\,\sqrt{\sum_i (V_{b,i} - \bar{V}_b)^2}}$

  • Pearson(V1, V3) = 0.93
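
Worked out with the slide's numbers: the mean-centered vectors are $(-1.75, 0.25, 1.25, 0.25)$ and $(-1.5, -0.5, 1.5, 0.5)$, giving

$\mathrm{Pearson}(V_1, V_3) = \dfrac{4.5}{\sqrt{4.75}\,\sqrt{5}} = 0.923$

which the deck reports as 0.93.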


Pearson Correlation for CF

  • V2 = (4, 5, 2, 5), va2 = 4

  • V3 = (2, 3, 5, 4), va3 = 3.5

  • Pearson(V1, V3) = 0.93, Pearson(V2, V3) = −0.55


Aggregate Ratings

  • va1 = 2.75, va2 = 4, va3 = 3.5; Pearson(V1, V3) = 0.93, Pearson(V2, V3) = −0.55

  [Figure: the prediction adds each neighbor's similarity-weighted Estimated Relative Rating (deviation from that neighbor's Average Rating) to user 3's average, then rounds up (Roundup Rating).]
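
The three labels fit together as follows, assuming the standard memory-based prediction rule (the slide's own formula is an image and not recoverable from this transcript): each neighbor u contributes its Estimated Relative Rating $r(u, x) - \bar{v}_u$, weighted by similarity, on top of the test user's Average Rating:

$\hat{r}(u_3, x) = \bar{v}_3 + \dfrac{\sum_{u} \mathrm{Pearson}(V_3, V_u)\,\big(r(u, x) - \bar{v}_u\big)}{\sum_{u} \big|\mathrm{Pearson}(V_3, V_u)\big|}$

with the result rounded up to a valid rating (Roundup Rating).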


Pearson Correlation for CF

  • V1 = (1, 3, 3), va1 = 2.33

  • V3 = (2, 3, 4), va3 = 3

  • Pearson(V1, V3) = 0.87


Pearson Correlation for CF

  • V2 = (4, 5, 2), va2 = 3.67

  • V3 = (2, 3, 5), va3 = 3.33

  • Pearson(V1, V3) = 0.87, Pearson(V2, V3) = −0.79

  • (V3's entries differ from the previous slide because only the items co-rated with each user enter the computation)


Aggregate Ratings

  • va1 = 2.33, va2 = 3.67, va3 = 3.33; Pearson(V1, V3) = 0.87, Pearson(V2, V3) = −0.79

  [Figure: same aggregation — similarity-weighted Estimated Relative Ratings added to the Average Rating, then the Roundup Rating.]


Problems with Memory-based Approaches

  • Most users only rate a few items

    • Two similar users may not rate the same set of items

    • → cluster users and items


Flexible Mixture Model (FMM)

  • Cluster both users and items simultaneously

  • User clustering and item clustering are correlated!
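
In probabilistic terms, two-sided clustering is usually written as the FMM joint model (Si & Jin 2003); the latent variables $z_u$ and $z_o$ below are the cluster assignments for users and objects:

$P(u, o, r) = \sum_{z_u} \sum_{z_o} P(z_u)\,P(z_o)\,P(u \mid z_u)\,P(o \mid z_o)\,P(r \mid z_u, z_o)$

The rating model $P(r \mid z_u, z_o)$ depends on both assignments at once, which is why the two clusterings are correlated.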


Evaluation: Datasets

  • EachMovie: http://www.grouplens.org/node/76 (no longer available)

  • MovieRating: http://www.grouplens.org/node/73

  • Netflix prize: http://archive.ics.uci.edu/ml/datasets/Netflix+Prize


Evaluation Metric

  • Mean Absolute Error (MAE): average absolute deviation of the predicted ratings from the actual ratings on items:

    $\mathrm{MAE} = \dfrac{1}{T} \sum_{i=1}^{T} \left| \hat{r}_i - r_i \right|$

    where T is the number of predicted items, $\hat{r}_i$ the predicted rating, and $r_i$ the true rating

  • The smaller the MAE, the better the performance
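
The same metric as a few lines of Python (names and sample ratings illustrative):

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation between predicted and true ratings."""
    assert predicted and len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 1]))  # 0.5
```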



Adaptive Information Filtering

  • Stable & long-term interest, dynamic info source

  • System must make a delivery decision immediately as a document "arrives"

  [Figure: a stream of documents flows into a filtering system, which matches each one against "my interest".]


A Typical AIF System

[Figure: a doc source feeds a binary classifier, which scores each document against the user interest profile under a utility function; an initialization module builds the initial profile from the user's profile text; accepted docs go to the user, and the user's feedback, together with the accumulated docs, drives a learning module that updates the profile.]


Evaluation

  • Typically evaluated with a utility function

    • Each delivered doc gets a utility value

    • Good doc gets a positive value

    • Bad doc gets a negative value

    • E.g., Utility = 3*#good − 2*#bad (linear utility): delivering 10 good and 3 bad docs gives 3·10 − 2·3 = 24


Three Basic Problems in AIF

  • Making the filtering decision (binary classifier)

    • Doc text, profile text → yes/no

  • Initialization

    • Initialize the filter based on only the profile text or very few examples

  • Learning from

    • Limited relevance judgments (only on "yes" docs)

    • Accumulated documents

  • All trying to maximize the utility




AIF vs. Retrieval & Categorization

  • Adaptive filtering as information retrieval

    • Rank the incoming documents

    • Only return the top-k ranked ones to users

  • Adaptive filtering as categorization

    • Classify documents into the categories "interesting" and "not interesting"

    • Only return the ones classified as interesting


AIF vs. Retrieval & Categorization

  • Like retrieval over a dynamic stream of docs, but ranking is impossible

  • Like online binary categorization, but with no initial training data and with limited feedback


Major Approaches to AIF

  • “Extended” retrieval systems

    • “Reuse” retrieval techniques to score documents

    • Use a score threshold for filtering decision

    • Learn to improve scoring with traditional feedback

    • New approaches to threshold setting and learning

  • “Modified” categorization systems (not covered)

    • Adapt to binary, unbalanced categorization

    • New approaches to initialization

    • Train with “censored” training examples


A General Vector-Space Approach

[Figure: each arriving doc is converted to a doc vector and scored against the profile vector; thresholding the score against the current threshold yields yes (deliver) or no (discard); feedback information from delivered docs drives vector learning (profile updates) and threshold learning, and utility evaluation tracks the outcome.]


Difficulties in Threshold Learning

  • Little/no labeled data

  • Correlation between threshold and profile vector

  • Exploration vs. exploitation (related to utility function)

  Example: with threshold θ = 30.0, only delivered docs receive judgments (R = relevant, N = non-relevant); everything below stays unjudged:

    36.5  R
    33.4  N
    32.1  R
    ---------- θ = 30.0
    29.9  ?
    27.3  ?
    ...


Threshold Setting in Extended Retrieval Systems

  • Utility-independent approaches (generally not working well, not covered in this lecture)

  • Indirect (linear) utility optimization

    • Logistic regression (score->prob. of relevance)

  • Direct utility optimization

    • Empirical utility optimization

    • Expected utility optimization given score distributions

  • All try to learn the optimal threshold


Logistic Regression (Robertson & Walker 00)

  • General idea: convert the score of D to p(R|D)

  • Fit the model using feedback data

  • Linear utility is optimized with a fixed prob. cutoff: with Utility = 3*#good − 2*#bad, deliver when 3p − 2(1 − p) > 0, i.e., when p(R|D) > 0.4

  • But,

    • Possibly incorrect parametric assumptions

    • No positive examples initially

    • Limited positive feedback

    • Doesn't address the issue of exploration
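
A minimal sketch of this pipeline, assuming a one-feature logistic model over retrieval scores; the scores, labels, and utility weights are illustrative, and Robertson & Walker's actual estimation details differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Judged feedback: scores of previously delivered docs, 1 = relevant, 0 = not.
scores = np.array([[36.5], [33.4], [32.1], [31.0], [30.4], [30.1]])
labels = np.array([1, 0, 1, 1, 0, 0])

# p(R|D) = sigmoid(a * score + b), fit to the feedback data.
model = LogisticRegression().fit(scores, labels)

def deliver(score, good=3.0, bad=2.0):
    """Deliver iff expected utility 3p - 2(1 - p) is positive, i.e., p > 2/5."""
    p_rel = model.predict_proba([[score]])[0, 1]
    return p_rel > bad / (good + bad)

print(deliver(35.0), deliver(28.0))
```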


Score Distribution Approaches (Arampatzis & van Hameren 01; Zhang & Callan 01)

  • Assume generative model of scores p(s|R), p(s|N)

  • Estimate the model with training data

  • Find the threshold by optimizing the expected utility under the estimated model

  • Specific methods differ in the way of defining and estimating the scoring distributions


Gaussian-Exponential Distributions

  • $p(s \mid R) \sim N(\mu, \sigma^2)$,  $p(s - s_0 \mid N) \sim E(\lambda)$

(From Zhang & Callan 2001)
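
A sketch of threshold selection under this model, assuming already-estimated parameters and the linear utility from earlier; every number here is illustrative, and the cited papers estimate these densities from training data rather than assume them:

```python
import numpy as np
from scipy.stats import norm, expon

p_rel = 0.1                       # estimated fraction of relevant docs
rel = norm(loc=35.0, scale=3.0)   # p(s|R) ~ N(mu, sigma^2)
non = expon(loc=25.0, scale=2.0)  # p(s - s0|N) ~ E(lambda), with s0 = 25

def expected_utility(theta, good=3.0, bad=2.0):
    """Per-doc expected utility of delivering every doc scoring above theta."""
    return good * p_rel * rel.sf(theta) - bad * (1 - p_rel) * non.sf(theta)

# Pick the threshold maximizing expected utility on a dense grid.
grid = np.linspace(20.0, 45.0, 2501)
theta = grid[np.argmax(expected_utility(grid))]
print(round(float(theta), 2))
```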


Score Distribution Approaches (cont.)

  • Pros

    • Principled approach

    • Arbitrary utility

    • Empirically effective

  • Cons

    • May be sensitive to the scoring function

    • Exploration not addressed


Direct Utility Optimization

  • Given

    • A utility function $U(C_{R+}, C_{R-}, C_{N+}, C_{N-})$

    • Training data $D = \{\langle s_i, y_i \rangle\}$, $y_i \in \{R, N\}$

  • Formulate utility as a function of the threshold and the training data: $U = F(\theta, D)$

  • Choose the threshold by optimizing $F(\theta, D)$, i.e., $\theta^* = \arg\max_\theta F(\theta, D)$


Empirical Utility Optimization

  • Basic idea

    • Compute the utility on the training data for each candidate threshold (score of a training doc)

    • Choose the threshold that gives the maximum utility (see the sketch below)

  • Difficulty: biased training sample!

    • We can only get an upper bound for the true optimal threshold

  • Solutions:

    • Heuristic adjustment (lowering) of the threshold

    • Leads to "beta-gamma threshold learning"
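
A minimal sketch of the basic idea, using the linear utility from earlier; the judged scores are illustrative:

```python
def empirical_best_threshold(judged, good=3.0, bad=2.0):
    """Try each observed score as threshold; keep the one with max training utility.

    judged: (score, is_relevant) pairs for docs delivered so far.
    """
    def utility(theta):
        delivered = [rel for s, rel in judged if s >= theta]
        return good * sum(delivered) - bad * (len(delivered) - sum(delivered))

    return max(sorted({s for s, _ in judged}), key=utility)

judged = [(36.5, True), (33.4, False), (32.1, True), (31.0, True), (30.4, False)]
print(empirical_best_threshold(judged))  # 31.0 maximizes utility on this sample
```

Because only docs scoring above past thresholds were ever judged, this estimate is biased high, which is exactly why the heuristic lowering above is needed.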


Illustration of Beta-Gamma Threshold Learning

[Figure: utility plotted against cutoff position 0, 1, 2, 3, …, K, … in the ranked training docs; the learned threshold interpolates between the zero-utility cutoff (encouraging exploration, down to the zero-utility point) and the empirically optimal cutoff, controlled by parameters β, γ ∈ [0, 1] and the number of training examples N; the more examples, the less exploration (closer to optimal).]
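
The interpolation behind the picture is commonly written as follows (reconstructed from the standard statement of beta-gamma threshold learning, since the slide's own equations are images):

$\theta = \alpha \, \theta_{zero} + (1 - \alpha) \, \theta_{optimal}, \qquad \alpha = \beta + (1 - \beta) \, e^{-N \gamma}, \qquad \beta, \gamma \in [0, 1]$

With no training examples, α = 1 and the threshold sits at the safe exploratory end θ_zero; as N grows, α shrinks toward β and the threshold moves toward the empirically optimal one.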


Beta-Gamma Threshold Learning (cont.)

  • Pros

    • Explicitly addresses exploration-exploitation tradeoff (“Safe” exploration)

    • Arbitrary utility (with appropriate lower bound)

    • Empirically effective

  • Cons

    • Purely heuristic

    • Zero utility lower bound often too conservative

