Modeling Preference: Integrating Content-based and Collaborative Preference on Documents Sung-Young Jung Intelligent Systems Program chopin@cs.pitt.edu http://www.isp.pitt.edu/~chopin/ Presentation at AI forum April 7, 2006
Contents • Introduction • Concept of Preference • A Statistical Approach • Coping with Data Sparseness • Combining Joint Features • Data and Experimental Environments • Integrating Content-based and Collaborative Preference • User similarity
The General Setting of Preference Modeling • General • There are many users with different behaviors. • A user's behavior depends on the user's preference. • The goal is to predict future behavior based on the user's preference. • Applications • Recommendation systems, personalization systems, prediction systems. • Item: document, product, TV program
An Item and Features • We want to estimate the preference of an item, Pref(x), given its feature set. • Item x: document, product, TV program • Features w of a document: words in title & content, category • Features w of a product: name, color, words in description, price • Features w of a TV program: words in title, words in synopsis, genre, channel, time
Problem Description • We want to estimate • Preference of an item: Pref(x) • Preference of a feature: Pref(wi) • Data are collected from natural human behaviors • Navigation, browsing, purchasing, etc. • Data can contain noise • Only positive examples are given • Negative examples are hidden • However, we want to find items with negative preference. • Common machine learning algorithms do not work well here. • Notation: the set of all items contains every document, product, or TV program; V is the set of items the user selected.
Previous Research on Preference • Recommendation algorithms • Collaborative filtering, social filtering • Recommendation based on information from similar users. • Content-based filtering • Similar idea to document similarity. • Uses the content information in an item. • Information retrieval • Document similarity. • Vector similarity. • Association-rule mining • Mining associated item sets. • In these previous studies, the model was built without a formalized concept of "preference". • Utility theory • Preference is given by a human designer, not estimated from data.
Problems of Vector Similarity • Vector similarity is widely used in collaborative filtering and information retrieval • Cosine, inner product, Pearson correlation • The resulting preference value does not provide a good interpretation of how much the user likes or dislikes an item • Preference 0.5 -> how much does he like it? • The inability to interpret the preference value causes serious problems. • Which one is better: Pref(w1)=0.4 and Pref(w2)=0.5, or Pref(w1,w3)=0.6 and Pref(w2,w5)=0.2?
Problems of Probability Representing Preference • Bayesian networks are often used for preference modeling. • Probability is not preference. • Probability represents how often a user selects a given item. • Preference is only one of the factors affecting selection probability.
Problems of Probability: Examples • Someone living in the US selects Hollywood movies often. • This does not mean that he has a high preference for Hollywood movies. • It will be disappointing if you recommend a movie to him only because it is a Hollywood movie. • Someone living in the US selects Korean movies with low frequency. • He may have a high preference for Korean movies even though the probability of choosing them is low.
The Relation of Probability and Preference • (a) When the selection probability is low even though the availability is high: negative preference. When the selection probability is high even though the availability is low: positive preference. • (b) Preference is positive if and only if the selection probability is higher than the availability. • Because of this property, mutual information has been proposed as a preference measure.
An Issue of Negative Preference • How can negative preference be estimated from positive examples only? • Previous approaches have no good solution to this problem. • Disliked items can still be selected by a user • Data can contain noise. • Estimating negative preference provides an advantage in recommendation. • (Figure: the set of all items, the selected item set V, and the items the user dislikes; disliked items can be included in the user's selection.)
An Issue of Negative Preference - Features • How can negative preference be estimated from positive examples only? • Previous approaches have no good solution to this problem. • Disliked items can still be selected by a user • Data can contain noise. • Estimating negative preference provides an advantage in recommendation. • Features with negative preference are often included in selected items. • Example document: "All the drawings will be unframed, and will sell for relatively low prices – a couple of bucks is standard. So for less than $20, a fan of art could virtually wallpaper a room with locally drawn – and locally modeled – art. Lewis attempts some quick math in his head, then says, 'It's been an awful long time.'" • A user likes "art" but dislikes "math". • However, disliked words can be included in a document that the user selected.
Contents • Introduction • Concept of Preference • A Statistical Approach • Coping with Data Sparseness • Combining Joint Features • Data and Experimental Environments • Integrating Content-based and Collaborative Preference • User similarity
The Basic Idea – Neutral Preference • w: a feature; X(w): the set of items containing the feature w • Availability: P(X(w)) = 4/10 = 0.4 • Selection probability: P(X(w)|V) = 2/5 = 0.4 • Selection probability equal to availability -> neutral preference
The Basic Idea – Positive Preference • w: a feature; X(w): the set of items containing the feature w • Availability: P(X(w)) = 4/10 = 0.4 • Selection probability: P(X(w)|V) = 3/6 = 0.5 • Selection probability higher than availability -> positive preference
The Basic Idea – Negative Preference • w: a feature; X(w): the set of items containing the feature w • Availability: P(X(w)) = 4/10 = 0.4 • Selection probability: P(X(w)|V) = 1/4 = 0.25 • Selection probability lower than availability -> negative preference
Preference of a Single Feature • Mutual information has been proposed as the preference of a single feature. • X(w) is the set of all items that contain the feature w; V is the set of items the user selected. • I(X(w);V) is the mutual information between X(w) and V: Pref(w) = I(X(w);V) = log( P(X(w)|V) / P(X(w)) )   (3) • The numerator is the selection probability; the denominator is the availability. • It satisfies the property relating preference to selection probability and availability. • It gives an intuitive interpretation of the preference value.
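A minimal sketch of equation (3) in the log-ratio form shown above, applied to the toy counts from the neutral/positive/negative slides; log base 2 is an assumption for illustration:

```python
import math

def preference(selected_with_w, selected_total, all_with_w, all_total):
    """Pointwise mutual information between feature w and the selected set V:
    log2( selection probability / availability )."""
    selection_prob = selected_with_w / selected_total   # P(X(w) | V)
    availability = all_with_w / all_total               # P(X(w))
    return math.log2(selection_prob / availability)

# Toy numbers from the neutral / positive / negative slides:
print(preference(2, 5, 4, 10))   #  0.0   -> neutral
print(preference(3, 6, 4, 10))   # +0.32  -> positive preference
print(preference(1, 4, 4, 10))   # -0.68  -> negative preference
```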
Mutual Information of a Single Feature • The mutual information of a feature represents the similarity between the set of items containing the feature, X(w), and the set of selected items, V. • w: feature • X(w): the set of items that contain feature w • V: the set of items the user selected as "like" • U: the whole item set
Intuitive Interpretation from the Preference Value (1) • The preference value represents how the feature's item set X(w) and the user's selected set V are correlated: • (a) dissimilar: negatively correlated -> negative (-) preference -> dislike • (b) neutral: independent -> zero (0) preference -> indifferent • (c) similar: positively correlated -> positive (+) preference -> like
Intuitive Interpretation from the Preference Value (3) • You can get a clear meaning from the resulting preference value. • For example, Pref(w) = +1 means the selection probability is 2^(+1) = 2 times the availability; Pref(w) = -1 means it is 2^(-1) = half the availability.
Item Preference: Feature-based • The preference of a given item x is defined as the normalized sum of the preferences of all its features, assuming the features are independent: Pref(x) = ( 1 / M(x) ) * sum over w in x of Pref(w)   (2) • M(x) is the normalization term, defined as the number of features that appear in the item x. • The preference of an item x can therefore be interpreted as the average preference value of its features.
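A minimal sketch of equation (2); the feature set and the Pref(w) values are hypothetical, and M(x) here counts only the features with a known preference value, which is a simplification:

```python
def item_preference(item_features, feature_pref):
    """Average the preferences of the features that appear in the item."""
    prefs = [feature_pref[w] for w in item_features if w in feature_pref]
    return sum(prefs) / len(prefs) if prefs else 0.0

feature_pref = {"art": 0.6, "math": -0.4, "gallery": 0.3}   # hypothetical values
print(item_preference({"art", "gallery", "opening"}, feature_pref))  # 0.45
```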
Contents • Introduction • Concept of Preference • A Statistical Approach • Coping with Data Sparseness • Combining Joint Features • Data and Experimental Environments • Integrating Content-based and Collaborative Preference • User similarity
The Problem of Mutual Information • Mutual information (MI) is sensitive to low-frequency events -> noise • Most events have high MI values because of low frequency • (Graph: MI value vs. feature (word))
A Feature Preference Model • True mutual information is defined as the observed mutual information minus the part of the information caused by random effects: I_true(X(w);V) = ( 1 - P_rand(w) ) * I(X(w);V)   (13) • I_true is the true mutual information, I is the observed mutual information (from the observed probabilities), and P_rand(w) is the random occurrence probability. • This is used to cope with the data sparseness problem. • The random occurrence probability is introduced to remove the information caused by random effects.
Random Occurrence Probability from the Random Frequency Distribution • The random occurrence probability represents • how many events with a given frequency can occur in a random experiment • the ratio of information provided by randomness • The probability of a given frequency, P(freq(w)), is adopted as the random occurrence probability (equations (8), (9)): it represents how many events with the same frequency occur in a random experiment. • freq(w) is the frequency of the event w; N(W') is the number of events in the random experiment.
Pareto Distribution for the Random Occurrence Probability • Fortunately, we do not have to run random experiments to obtain the random occurrence probability. • The Pareto distribution is • often used for modeling incomes in economics • a kind of power-law distribution • The Pareto distribution represents how many events with a given frequency can occur in a random experiment (equation (12)). • For example, low-frequency events have a high Pareto probability value, since there are many low-frequency events in random experiments.
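A sketch of two ways the random occurrence probability could be obtained: empirically from frequency-of-frequency counts, or from a Pareto-style tail probability. The exact parameterization of equations (8), (9), and (12) is not shown on the slides, so the shape parameter and the tail-probability form below are illustrative assumptions:

```python
from collections import Counter

def empirical_random_occurrence(freq_of_event):
    """P(freq(w)): the fraction of events that share w's frequency,
    estimated from the observed frequency-of-frequency counts."""
    freq_counts = Counter(freq_of_event.values())
    n_events = len(freq_of_event)
    return {w: freq_counts[f] / n_events for w, f in freq_of_event.items()}

def pareto_tail_probability(freq, alpha=1.5, f_min=1.0):
    """Pareto-style stand-in: low-frequency events are easy to produce by chance
    (value near 1), high-frequency events are not (value near 0).
    alpha and f_min are illustrative, not fitted parameters."""
    return (f_min / freq) ** alpha

freqs = {"gulf": 1, "war": 1, "art": 12, "math": 3}
print(empirical_random_occurrence(freqs))                        # rare frequencies dominate
print(pareto_tail_probability(1), pareto_tail_probability(12))   # 1.0 vs ~0.024
```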
Positive Preference Model for a Feature • The random occurrence probability for positive preference is determined by the size of the overlap between the feature's item set X(w) and the user's behavior history V (equation (15)). • The overlap of the set X(w2) is very small, so it can easily occur by chance. • Figure 2. Examples of positive preference. The size of the universal set U is far larger than the picture shows.
A Distribution Graph – Direct Mutual Information • Mutual information (MI) is sensitive to low-frequency events • Most features have high MI values because of noise • (Graph: preference vs. feature (word))
A Distribution Graph – Applying the Random Occurrence Probability • After applying the random occurrence probability, the preference intensity is lowered for most features. • The graph has an 'S' shape. • Too many features have negative preferences, which is not correct. • (Graph: preference vs. feature (word); too many features have negative preferences.)
Disparity of Positive and Negative Preferences • Items containing feature w1 are selected more frequently: a positively preferred feature. • Items containing feature w2 are selected rarely or never by the user. • This does not necessarily mean it is negatively preferred: • the user may be indifferent to the item, • the user may not know the item for lack of exposure, • the user may never have encountered the item. • Figure 1. Positive and negative preference examples.
Negative Preference Model for a Feature • The assumption of negative preference: only when a user did not select a feature even though it occurred frequently enough can the feature have a strongly negative preference. • The random occurrence probability for negative preference is defined by the unconditional probability of the given feature (equation (16)). • Figure 3. Examples of negative preference.
A Distribution Graph – The Negative Preference Model • Most features with negative preferences are moved toward neutral. • Some features with negative preference are strengthened when their normal occurrences are frequent enough. • (Graph: preference vs. feature (word); a small number of features have strongly negative preferences.)
The Overall Feature Preference Model • The preference model for a feature (equation (17)) combines • positive/negative preference • random occurrence probability • mutual information
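A hedged sketch of how the positive and negative branches might be combined; equation (17) is not reproduced on the slide, so the (1 - P_rand) discounting form and the choice of a separate random-occurrence term per branch are assumptions based on the surrounding text, and the numbers are hypothetical:

```python
import math

def feature_preference(selection_prob, availability, p_rand_pos, p_rand_neg):
    """Observed pointwise MI between a feature and the user's selections,
    discounted by the random occurrence probability of the matching sign."""
    mi = math.log2(selection_prob / availability)
    # Positive branch: p_rand from the overlap/frequency model.
    # Negative branch: p_rand from the feature's unconditional probability.
    p_rand = p_rand_pos if mi >= 0 else p_rand_neg
    return (1.0 - p_rand) * mi

print(feature_preference(0.5, 0.4, p_rand_pos=0.1, p_rand_neg=0.0))   # ~ +0.29
print(feature_preference(0.1, 0.4, p_rand_pos=0.0, p_rand_neg=0.2))   # ~ -1.60
```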
Contents • Introduction • Concept of Preference • A Statistical Approach • Coping with Data Sparseness • Combining Joint Features • Data and Experimental Environments • Integrating Content-based and Collaborative Preference • User similarity
Joint Features • A k-th-order joint feature is a composite of k single features. • A joint feature can provide more accurate information if the information source is reliable. • Example: Pref("Gulf") < 0 and Pref("war") < 0, but Pref("Gulf war") > 0 • Joint features suffer from a severe data sparseness problem • Most counts of joint features (k >= 3) are 0 or 1.
Combining Information of Joint Features • Policy: if the joint preference information is reliable, it is used; otherwise, the single-feature preferences are used instead. • The feature-combining weight for the k-th joint feature controls which one is used (equation (5)).
Combining Joint Features Using the Random Occurrence Probability • The feature-combining weight represents the randomness of all the superset features. • The feature-combining weight function can be defined by recursion, as in equation (6). • Equivalently, it can be obtained as the product of the random occurrence probabilities, as in equation (7). • The random occurrence probability represents the ratio of information provided by randomness.
Worked Examples of the Joint Feature Model • Example 1: Pref("Gulf") = -0.2 (negative), Pref("war") = -0.2 (negative), Pref("Gulf" and "war") = 0.3 (positive), P_rand("Gulf" and "war") = 0.8. The final joint preference is Pref("Gulf war") = (1 - P_rand("Gulf" and "war")) * Pref("Gulf" and "war") + P_rand("Gulf" and "war") * (Pref("Gulf") + Pref("war")) = (1 - 0.8) * 0.3 + 0.8 * (-0.2 - 0.2) = -0.26 (negative). If the joint preference information is not reliable, the single preferences are used instead. • Example 2: the same as Example 1, but P_rand("Gulf" and "war") = 0.1. Pref("Gulf war") = (1 - 0.1) * 0.3 + 0.1 * (-0.2 - 0.2) = 0.23 (positive). If the joint preference information is reliable, the joint preference is used.
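A minimal sketch of the combination rule as used in the worked examples above, where the combining weight is the random occurrence probability of the joint feature (equations (5)-(7)):

```python
def combine_joint(pref_joint, p_rand_joint, single_prefs):
    """Back off from the joint preference to the sum of single-feature
    preferences in proportion to how 'random' the joint observation is."""
    return (1.0 - p_rand_joint) * pref_joint + p_rand_joint * sum(single_prefs)

# Example 1: the joint count is probably noise (p_rand = 0.8)
print(combine_joint(0.3, 0.8, [-0.2, -0.2]))   # ~ -0.26 -> trust the singles
# Example 2: the joint count is reliable (p_rand = 0.1)
print(combine_joint(0.3, 0.1, [-0.2, -0.2]))   # ~ +0.23 -> trust the joint
```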
Contents • Introduction • Concept of Preference • A Statistical Approach • Coping with Data Sparseness • Combining Joint Features • Data and Experimental Environments • Integrating Content-based and Collaborative Preference • User similarity
The Gathering Method for TV Audience Measurement • A set-top box is installed on the TV in each home, and homes are sampled according to a balanced distribution. • The selected-channel information is automatically transmitted to the company (AC Nielsen Inc.).
Experimental Environment • Task: TV program recommendation; recommend preferred TV programs. • Nouns in the title and synopsis were used as input features. • Data from AC Nielsen Korea*: three months of data starting June 15th, 2001. • Program data: broadcasting time, channel, etc.; 189,653 programs on 74 channels. • User behavior history data: 200 users; user id, channel-changing time, and corresponding channel id. • TV program data: titles and synopses were collected from TV program websites by a web robot; nouns were extracted with regular expressions. • Training and test: training data is everything except the last week; test data is the last week only. • Item: each TV program. Features: words, time, channel, genre, etc. • (Timeline: 3 months of data, split into training data and the final 1 week of test data; program data provides the input features.) • *AC Nielsen Korea: the authorized audience measurement company in Korea.
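A small sketch of the chronological split described above (the event field names are assumptions, not the actual AC Nielsen schema):

```python
from datetime import datetime, timedelta

def split_history(events, cutoff):
    """Training data: everything before the cutoff; test data: the final week."""
    train = [e for e in events if e["timestamp"] < cutoff]
    test = [e for e in events if e["timestamp"] >= cutoff]
    return train, test

# Illustrative viewing events; the last observed day minus 7 days gives the cutoff.
events = [
    {"user": 1, "channel": 7, "timestamp": datetime(2001, 7, 1, 20, 0)},
    {"user": 1, "channel": 11, "timestamp": datetime(2001, 9, 13, 21, 0)},
]
last_day = max(e["timestamp"] for e in events)
train, test = split_history(events, cutoff=last_day - timedelta(days=7))
print(len(train), len(test))   # 1 1
```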
Experimental Results • The random occurrence probability works well (accuracy 0.598 -> 0.709). • Combining joint features works well (accuracy 0.692 -> 0.773). • Table 3. Accuracy results for each preference model.
Comparisons with Other Models • There are comparable well-known models in personalization, information retrieval, and association-rule mining areas. • We implemented these models to compare with the proposed model.
Comparison with Association Measures • The proposed models achieve higher accuracy than the other models. • Reasons? • Only positive examples are given. • Robust to data sparseness. • Prediction is based on the estimated preference. • (Evaluation scenario: 7-day extension, title match, 10 candidates.)