Information Filtering

Rong Jin

Presentation Transcript
Outline
  • Brief introduction to information filtering
  • Collaborative filtering
  • Adaptive filtering
Short vs. Long Term Info. Need
  • Short-term information need (ad hoc retrieval)
    • “Temporary need”, e.g., info about used cars
    • Information source is relatively static
    • User “pulls” information
    • Application examples: library search, Web search
  • Long-term information need (filtering)
    • “Stable need”, e.g., new data mining algorithms
    • Information source is dynamic
    • System “pushes” information to the user
    • Applications: news filters
Examples of Information Filtering
  • News filtering
  • Email filtering
  • Movie/book/product recommenders
  • Literature recommenders
  • And many others …
Information Filtering
  • Basic filtering question: Will user U like item X?
  • Two different ways of answering it
    • Look at what U likes → characterize X: content-based filtering
    • Look at who likes X → characterize U: collaborative filtering
  • Combining content-based filtering and collaborative filtering → unified filtering (an open research topic)
Other Names for Information Filtering
  • Content-based filtering is also called
    • “Adaptive Information Filtering” in TREC
    • “Selective Dissemination of Information” (SDI) in Library & Information Science
  • Collaborative filtering is also called
    • Recommender systems
Example: Collaborative Filtering
  • [Rating table figure] User 3 is more similar to user 1 than to user 2
  • → Predict a rating of 5 for the movie “15 minutes” for user 3
Collaborative Filtering (CF) vs. Content-based Filtering (CBF)
  • CF does not need the content of items, while CBF relies on the content of items
  • CF is useful when the contents of items
    • are not available or difficult to acquire
    • are difficult to analyze
  • Problems with CF
    • Privacy issues
Why Collaborative Filtering?
  • Because it is worth a million dollars!
Collaborative Filtering
  • Goal: Making filtering decisions for an individual user based on the judgments of other users
  • [Figure: a user–object rating matrix. Users u1, u2, …, um rate objects o1, o2, …, on; many entries are unknown (“?”). A test user utest has rated some objects, and the missing ratings are to be predicted.]
Collaborative Filtering
  • Goal: Making filtering decisions for an individual user based on the judgments of other users
  • Memory-based approaches
    • Given a test user u, find similar users {u1, …, ul}
    • Predict u’s rating based on the ratings of u1, …, ul
Example: Collaborative Filtering
  • [Rating table figure] User 3 is more similar to user 2 than to user 1
  • → Predict a rating of 5 for the movie “15 minutes” for user 3
Important Issues with CF
  • How to determine the similarity between different users?
  • How to aggregate ratings from similar users to form the predictions?
Pearson Correlation for CF
  • V1 = (1, 3, 4, 3), va1 = 2.75 (user 1’s average rating)
  • V3 = (2, 3, 5, 4), va3 = 3.5
  • Pearson correlation measures the linear correlation between two vectors
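The correlation formula itself did not survive in the transcript; a standard form of the Pearson correlation between the rating vectors of users a and u over their commonly rated items (notation mine: v_{u,i} is user u's rating of item i and \bar v_u the user's average, written “va” on these slides) is:

$$
w(a, u) = \frac{\sum_{i} (v_{a,i} - \bar v_a)(v_{u,i} - \bar v_u)}{\sqrt{\sum_{i} (v_{a,i} - \bar v_a)^2}\,\sqrt{\sum_{i} (v_{u,i} - \bar v_u)^2}}
$$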
Pearson Correlation for CF
  • V1 = (1, 3, 4, 3), va1 = 2.75
  • V3 = (2, 3, 5, 4), va3 = 3.5
  • Pearson(V1, V3) = 0.93
Pearson Correlation for CF
  • V2 = (4, 5, 2, 5), va2 = 4
  • V3 = (2, 3, 5, 4), va3 = 3.5
  • Pearson(V1, V3) = 0.93, Pearson(V2, V3) = −0.55

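As a quick check of these numbers, here is a minimal Python sketch of the same computation (the vectors are the ones from the slides):

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)) * math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return num / den

v1, v2, v3 = [1, 3, 4, 3], [4, 5, 2, 5], [2, 3, 5, 4]
print(round(pearson(v1, v3), 2))  # 0.92 (the slides report 0.93)
print(round(pearson(v2, v3), 2))  # -0.55
```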

Aggregate Ratings
  • va1 = 2.75, va2 = 4, va3 = 3.5
  • Correlations with user 3: 0.93 (user 1), −0.55 (user 2)
  • [Table: estimated relative rating, average rating, rounded-up rating for user 3]
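The aggregation table itself did not survive in the transcript. A common memory-based rule consistent with the row labels above (my assumption, not necessarily the slide's exact formula) adds to the test user's average rating a correlation-weighted average of the neighbours' deviations from their own averages, then rounds up:

```python
import math

def predict_rating(test_avg, neighbor_ratings, neighbor_avgs, weights):
    """Test user's average plus the correlation-weighted average deviation
    of the neighbours' ratings from their own averages."""
    relative = sum(w * (r - a) for w, r, a in zip(weights, neighbor_ratings, neighbor_avgs))
    relative /= sum(abs(w) for w in weights)      # estimated relative rating
    return test_avg + relative

# va3 = 3.5 and the correlations 0.93 / -0.55 are from the slides; r1 and r2 are
# hypothetical ratings of the target movie by users 1 and 2 (the table is missing).
r1, r2 = 3, 5
raw = predict_rating(3.5, [r1, r2], [2.75, 4.0], [0.93, -0.55])
print(round(raw, 2), math.ceil(raw))              # predicted rating, then rounded up
```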

Pearson Correlation for CF
  • V1 = (1, 3, 3), va1 = 2.33
  • V3 = (2, 3, 4), va3 = 3
  • Pearson(V1, V3) = 0.87
Pearson Correlation for CF
  • V2 = (4, 5, 2), va2 = 3.67
  • V3 = (2, 3, 5), va3 = 3.33
  • Pearson(V1, V3) = 0.87, Pearson(V2, V3) = −0.79
Aggregate Ratings
  • va1 = 2.33, va2 = 3.67, va3 = 3.33
  • Correlations with user 3: 0.87 (user 1), −0.79 (user 2)
  • [Table: estimated relative rating, average rating, rounded-up rating for user 3]
Problems with Memory-based Approaches
  • Most users only rate a few items
    • Two similar users may not rate the same set of items
  • → Solution: cluster users and items
Flexible Mixture Model (FMM)
  • Cluster both users and items simultaneously
  • User clustering and item clustering are correlated!
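How the two clusterings are tied together is not spelled out in the transcript; one common way to write such a joint model, as a mixture over latent user clusters Z_u and item clusters Z_o (my reconstruction of the idea, with my notation), is:

$$
P(u, o, r) = \sum_{Z_u} \sum_{Z_o} P(Z_u)\,P(Z_o)\,P(u \mid Z_u)\,P(o \mid Z_o)\,P(r \mid Z_u, Z_o)
$$

Because the rating depends on both the user cluster and the item cluster, two users can be related through items neither of them rated directly, which addresses the sparsity problem raised on the previous slide.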

Evaluation: Datasets
  • EachMovie: http://www.grouplens.org/node/76 (no longer available)
  • MovieRating: http://www.grouplens.org/node/73
  • Netflix prize: http://archive.ics.uci.edu/ml/datasets/Netflix+Prize
Evaluation Metric
  • Mean Absolute Error (MAE): average absolute deviation of the predicted ratings from the actual ratings on items
  • The smaller the MAE, the better the performance

    MAE = \frac{1}{T} \sum_{i=1}^{T} | \hat{r}_i - r_i |

    where T is the number of predicted items, \hat{r}_i the predicted rating, and r_i the true rating
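A minimal sketch of the metric in code (the ratings here are made-up illustrations, not taken from the datasets above):

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute deviation of predicted ratings from true ratings."""
    assert len(predicted) == len(actual) and predicted
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

print(round(mean_absolute_error([3.3, 4.1, 2.0], [3, 5, 2]), 2))  # 0.4
```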

Adaptive Information Filtering
  • Stable & long-term interest, dynamic info source
  • System must make a delivery decision immediately as a document “arrives”
  • [Figure: a stream of arriving documents passes through the filtering system, which matches them against “my interest”]
A Typical AIF System
  • [System diagram with components: Doc Source, Binary Classifier, User Interest Profile (initialized from the user’s profile text), Accepted Docs, User, Feedback, Accumulated Docs, Learning, and a utility function guiding the learning]
Evaluation
  • Typically evaluated with a utility function
    • Each delivered doc gets a utility value
    • Good doc gets a positive value
    • Bad doc gets a negative value
    • E.g., Utility = 3 × #good − 2 × #bad (linear utility)
Three Basic Problems in AIF
  • Making filtering decision (Binary classifier)
    • Doc text + profile text → yes/no
  • Initialization
    • Initialize the filter based on only the profile text or very few examples
  • Learning from
    • Limited relevance judgments (only on “yes” docs)
    • Accumulated documents
  • All trying to maximize the utility
AIF vs. Retrieval & Categorization
  • Adaptive filtering as information retrieval
    • Rank the incoming documents
    • Only return the top-k ranked ones to users
  • Adaptive filtering as a categorization problem
    • Classify documents into the categories “interested” and “not interested”
    • Only return the ones classified as “interested”
AIF vs. Retrieval & Categorization
  • Like retrieval over a dynamic stream of docs, but ranking is impossible
  • Like online binary categorization, but with no initial training data and with limited feedback
Major Approaches to AIF
  • “Extended” retrieval systems
    • “Reuse” retrieval techniques to score documents
    • Use a score threshold for filtering decision
    • Learn to improve scoring with traditional feedback
    • New approaches to threshold setting and learning
  • “Modified” categorization systems (not covered)
    • Adapt to binary, unbalanced categorization
    • New approaches to initialization
    • Train with “censored” training examples
A General Vector-Space Approach
  • [Flow diagram: an arriving doc vector is scored against the profile vector (Scoring), the score is compared with the threshold (Thresholding), and the doc is delivered (“yes”) or not (“no”); feedback information drives Utility Evaluation, Vector Learning, and Threshold Learning]
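A compressed sketch of that loop, assuming sparse term-weight dictionaries and dot-product scoring (the slides do not fix the exact scoring or learning rules, so the updates below are placeholders):

```python
def filter_stream(doc_stream, profile, threshold, get_feedback):
    """Score each arriving doc vector against the profile vector; deliver it if the
    score clears the threshold; use feedback on delivered docs to adapt both."""
    accepted = []
    for doc in doc_stream:                                    # doc: {term: weight}
        score = sum(profile.get(t, 0.0) * w for t, w in doc.items())  # dot product
        if score >= threshold:                                # thresholding -> yes/no
            accepted.append(doc)
            relevant = get_feedback(doc)                      # judgment only on "yes" docs
            sign = 1.0 if relevant else -1.0
            for t, w in doc.items():                          # placeholder vector learning
                profile[t] = profile.get(t, 0.0) + 0.1 * sign * w
            threshold += -0.5 if relevant else 0.5            # placeholder threshold learning
    return accepted, profile, threshold
```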

Difficulties in Threshold Learning
  • Little or no labeled data
  • Correlation between threshold and profile vector
  • Exploration vs. Exploitation (related to utility function)
  • Example (scores with relevance judgments, threshold θ = 30.0):
    36.5 R,  33.4 N,  32.1 R  |  29.9 ?,  27.3 ?,  …  (docs below the threshold receive no judgment)

Threshold Setting in Extended Retrieval Systems
  • Utility-independent approaches (generally not working well, not covered in this lecture)
  • Indirect (linear) utility optimization
    • Logistic regression (score->prob. of relevance)
  • Direct utility optimization
    • Empirical utility optimization
    • Expected utility optimization given score distributions
  • All try to learn the optimal threshold
Logistic Regression (Robertson & Walker 00)
  • General idea: convert the score of D to p(R|D)
  • Fit the model using feedback data
  • Linear utility is optimized with a fixed probability cutoff
  • But,
    • Possibly incorrect parametric assumptions
    • No positive examples initially
    • Limited positive feedback
    • Doesn’t address the issue of exploration
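A sketch of the idea, assuming the calibration has the simple form p(R|D) = sigmoid(a·s + b) fitted to (score, relevance) feedback pairs by gradient ascent; the cited paper's exact fitting procedure is not reproduced here. For the linear utility 3·#good − 2·#bad, delivering a document with relevance probability p has expected utility 3p − 2(1 − p), which is positive exactly when p > 0.4, giving the fixed probability cutoff:

```python
import math

def _sigmoid(z):
    z = max(-30.0, min(30.0, z))                 # clamp for numerical safety
    return 1.0 / (1.0 + math.exp(-z))

def fit_calibration(scores, labels, lr=0.01, epochs=2000):
    """Fit p(R|score) = sigmoid(a*score + b) by gradient ascent on the log-likelihood;
    labels are 1 (relevant) / 0 (non-relevant) from feedback on delivered docs."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        for s, y in zip(scores, labels):
            p = _sigmoid(a * s + b)
            a += lr * (y - p) * s
            b += lr * (y - p)
    return a, b

def deliver(score, a, b, prob_cutoff=0.4):
    """0.4 is the cutoff that makes the expected linear utility 3p - 2(1-p) positive."""
    return _sigmoid(a * score + b) > prob_cutoff
```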
Score Distribution Approaches (Arampatzis & van Hameren 01; Zhang & Callan 01)
  • Assume generative model of scores p(s|R), p(s|N)
  • Estimate the model with training data
  • Find the threshold by optimizing the expected utility under the estimated model
  • Specific methods differ in the way of defining and estimating the scoring distributions
Gaussian-Exponential Distributions
  • p(s|R) ~ N(μ, σ²),  p(s − s₀|N) ~ E(λ)

(From Zhang & Callan 2001)
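A sketch under those assumptions: fit the Gaussian to the scores of relevant training documents and the shifted exponential to the non-relevant ones, then pick the threshold with the highest expected utility under the fitted densities. The prior p_rel, the score offset s0, the candidate grid, and the 3/−2 utility are illustrative choices; the cited papers estimate these quantities more carefully:

```python
import math

def fit_score_distributions(rel_scores, nonrel_scores, s0):
    """p(s|R) ~ Normal(mu, sigma^2); p(s - s0 | N) ~ Exponential(lam), for s >= s0."""
    mu = sum(rel_scores) / len(rel_scores)
    sigma = math.sqrt(sum((s - mu) ** 2 for s in rel_scores) / len(rel_scores)) or 1e-6
    lam = 1.0 / (sum(s - s0 for s in nonrel_scores) / len(nonrel_scores))
    return mu, sigma, lam

def expected_utility(theta, mu, sigma, lam, s0, p_rel, u_good=3.0, u_bad=-2.0):
    """Expected per-document utility of delivering every doc scoring above theta."""
    p_above_rel = 0.5 * math.erfc((theta - mu) / (sigma * math.sqrt(2.0)))  # P(s>theta|R)
    p_above_non = math.exp(-lam * max(theta - s0, 0.0))                     # P(s>theta|N)
    return p_rel * p_above_rel * u_good + (1.0 - p_rel) * p_above_non * u_bad

def best_threshold(mu, sigma, lam, s0, p_rel, candidate_grid):
    return max(candidate_grid, key=lambda t: expected_utility(t, mu, sigma, lam, s0, p_rel))
```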

Score Distribution Approaches (cont.)
  • Pros
    • Principled approach
    • Arbitrary utility
    • Empirically effective
  • Cons
    • May be sensitive to the scoring function
    • Exploration not addressed
Direct Utility Optimization
  • Given
    • A utility function U(C_{R+}, C_{R−}, C_{N+}, C_{N−})
    • Training data D = { … }
  • Formulate utility as a function of the threshold and the training data: U = F(θ, D)
  • Choose the threshold by optimizing F(θ, D), i.e., θ* = arg max_θ F(θ, D)
Empirical Utility Optimization
  • Basic idea (see the sketch below)
    • Compute the utility on the training data for each candidate threshold (score of a training doc)
    • Choose the threshold that gives the maximum utility
  • Difficulty: biased training sample!
    • We can only get an upper bound for the true optimal threshold
  • Solution:
    • Heuristic adjustment (lowering) of the threshold
    • Leads to “beta-gamma threshold learning”
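A minimal sketch of the basic idea, reusing the 3/−2 linear utility from earlier slides; because only delivered (hence biased) training documents carry judgments, the value found this way is the upper bound mentioned above and is then lowered heuristically:

```python
def empirical_best_threshold(scores, relevant, u_good=3, u_bad=-2):
    """Try each training-document score as a candidate threshold and return the
    one with the highest utility on the training data (relevant[i] is 0 or 1)."""
    def utility(theta):
        delivered = [r for s, r in zip(scores, relevant) if s >= theta]
        return u_good * sum(delivered) + u_bad * (len(delivered) - sum(delivered))
    return max(sorted(set(scores)), key=utility)
```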
Illustration of Beta-Gamma Threshold Learning
  • [Figure: utility plotted against cutoff position 0, 1, 2, 3, …, K, …; parameters β, γ ∈ [0, 1]; N = number of training examples]
  • Encourage exploration by lowering the threshold, at most down to the zero-utility cutoff
  • The more examples, the less exploration (closer to optimal)
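The interpolation rule usually associated with beta-gamma threshold learning, reconstructed here from the labels on the figure (θ_zero is the cutoff where the training utility drops to zero, θ_opt the empirically optimal cutoff, N the number of training examples); treat this as my reading of the slide rather than its own formula:

$$
\theta = \alpha\,\theta_{zero} + (1 - \alpha)\,\theta_{opt}, \qquad \alpha = \beta + (1 - \beta)\,e^{-N\cdot\gamma}, \qquad \beta, \gamma \in [0, 1]
$$

With few examples α stays near 1, so the threshold sits close to the zero-utility cutoff (more exploration); as N grows, α shrinks toward β and the threshold approaches the optimal cutoff.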
Beta-Gamma Threshold Learning (cont.)
  • Pros
    • Explicitly addresses exploration-exploitation tradeoff (“Safe” exploration)
    • Arbitrary utility (with appropriate lower bound)
    • Empirically effective
  • Cons
    • Purely heuristic
    • Zero utility lower bound often too conservative