
Modeling Preference:Integrating Content-based and Collaborative Preference on Documents

Sung-Young Jung

Intelligent Systems Program

chopin@cs.pitt.edu

http://www.isp.pitt.edu/~chopin/

Presentation at AI forum

April 7, 2006


Contents

  • Introduction

  • Concept of Preference

  • A Statistical Approach

  • Coping with Data Sparseness

  • Combining Joint Features

  • Data and Experimental Environments

  • Integrating Content-based and Collaborative Preference

    • User similarity


Introduction of Preference Modeling


Generalities of Preference Modeling

  • General

    • There are many users with different behaviors.

    • A user's behavior depends on the user's preferences.

    • The goal is to predict future behavior based on the user's preferences.

  • Applications

    • Recommendation systems, Personalization systems, Prediction systems.

(Item examples: document, product, TV program.)


An Item and Features

  • We want to estimate the preference of an item, Pref(x), given its feature set.

Item x and its features w, by item type:

  • Document: words in title & content; category

  • Product: name; color; words in description; price

  • TV program: words in title; words in synopsis; genre; channel; time


Problem Description

  • We want to estimate

    • Preference of an item: Pref(x)

    • Preference of a feature: Pref(wi)

  • Data are collected from natural human behaviors

    • Navigation, browsing, purchasing, etc.

    • Data can contain noise

  • Only positive examples are given

    • Negative examples are hidden

    • However, we want to find items with negative preference.

    • Common machine learning algorithms do not work well.

(Figure: the set of all items, an item x (documents, products, TV programs), and the selected item set V.)


Previous Research on Preference

  • Recommendation Algorithms

    • Collaborative filtering, social filtering.

      • recommendation based on information of similar users.

    • Content-based filtering.

      • Similar idea to document similarity.

      • Using content information in an item.

  • Information Retrieval.

    • Document similarity.

      • Vector similarity.

  • Association-rule mining.

    • Mining associated item set

  • In these previous studies, models were built without a formalized concept of “preference”.

  • Utility Theory

    • Preference is given by a human designer, not estimated from data.


Problems of Vector Similarity

  • Vector similarity is generally used in the collaborative filtering and information retrieval areas

    • cosine, inner product, Pearson correlation

  • The resulting preference value does not provide a good interpretation of how much the user likes or dislikes an item

    • A preference of 0.5 → how much does the user like it?

  • This inability to interpret the preference value causes serious problems.

Example: Pref(w1) = 0.4 vs. Pref(w2) = 0.5, and Pref(w1,w3) = 0.6 vs. Pref(w2,w5) = 0.2. Which one is better?


Problems of Probability Representing Preference

  • Bayesian Network is often used for preference modeling.

  • Probability is not preference.

    • Probability represents how often a user selects a given item.

    • Preference is one of the factors affecting selection probability.


Problems of Probability: Examples

  • Someone living in the US selected Hollywood movies often.

    • It does not mean that he has a high preference for Hollywood movies.

    • It will be disappointing if you recommend a movie to him only because it is a Hollywood movie.

  • Someone living in the US selected a Korean movie with low frequency.

    • He may have a high preference for Korean movies even though the probability of choosing them is low.



The Relation of Probability and Preference

  • (a) When the selection probability is low even though the availability is high: negative preference.

    When the selection probability is high even though the availability is low: positive preference.

  • (b) Preference is positive if and only if the selection probability is higher than the availability.

  • Because of this property, mutual information has been proposed as a preference measure.


An Issue of Negative Preference

  • How can negative preference be estimated only from positive examples?

  • The previous approaches do not have a good solution for this problem.

  • Disliked items can be selected by a user

    • Data can contain noise.

  • Estimating negative preference provides an advantage in recommendation.

(Figure: the set of all items, the items the user dislikes, and the selected item set V; disliked items can be included in the user's selection.)


An Issue of Negative Preference - Feature

  • Features with negative preference are often included in selected items.

(Figure: disliked items, and hence disliked features, can sometimes be included in the user's selected item set V.)

Example document: “All the drawings will be unframed, and will sell for relatively low prices – a couple of bucks is standard. So for less than $20, a fan of art could virtually wallpaper a room with locally drawn – and locally modeled – art.” “Lewis attempts some quick math in his head, then says, ‘It’s been an awful long time.’”

  • A user likes “art”, but

    dislikes “math”.

  • However, disliked words can be included in a document that a user selected.





The Basic Idea – Neutral Preference

  • X(w): the set of items with the feature w

  • Availability: P(X(w)) = 4/10 = 0.4

  • Selection probability: P(X(w)|V) = 2/5 = 0.4

  • Equal → neutral preference

(Figure: 4 of the 10 items in the whole set contain w; 2 of the 5 selected items in V contain w.)

The Basic Idea – Positive Preference

  • X(w): the set of items with the feature w

  • Availability: P(X(w)) = 4/10 = 0.4

  • Selection probability: P(X(w)|V) = 3/6 = 0.5

  • Selection probability > availability → positive preference

(Figure: 4 of the 10 items contain w; 3 of the 6 selected items contain w.)

The Basic Idea – Negative Preference

  • X(w): the set of items with the feature w

  • Availability: P(X(w)) = 4/10 = 0.4

  • Selection probability: P(X(w)|V) = 1/4 = 0.25

  • Selection probability < availability → negative preference

(Figure: 4 of the 10 items contain w; 1 of the 4 selected items contains w.)
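The three cases above can be reproduced with a short sketch. The pointwise form log2(P(X(w)|V) / P(X(w))) is an assumption here, chosen to be consistent with the mutual-information measure introduced on the following slides:

```python
import math

def feature_preference(n_w_in_all, n_all, n_w_in_selected, n_selected):
    """Compare selection probability P(X(w)|V) against availability P(X(w)).

    Returns > 0 for positive preference, 0 for neutral, < 0 for negative.
    """
    availability = n_w_in_all / n_all
    selection_probability = n_w_in_selected / n_selected
    return math.log2(selection_probability / availability)

print(feature_preference(4, 10, 2, 5))  # neutral:  0.0
print(feature_preference(4, 10, 3, 6))  # positive: log2(0.5/0.4) > 0
print(feature_preference(4, 10, 1, 4))  # negative: log2(0.25/0.4) < 0
```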


Preference of a Single Feature

  • Mutual information has been proposed as the preference of a single feature.

    • X(w) is the set of all items that contain the feature w.

    • V is the set of items the user selected.

    • I(X(w);V) is the mutual information.

    • It satisfies the property of preference relating selection probability and availability.

    • It gives an intuitive interpretation of the preference value.

    Pref(w) = I(X(w);V) = log2( P(X(w)|V) / P(X(w)) )   (3)

    where the numerator is the selection probability and the denominator is the availability.


Mutual Information of a Single Feature

(Figure: Venn diagram of the whole item set U, the selected set V, and the feature set X(w).)

  • The mutual information of a feature represents the similarity between the set of items with the feature, X(w), and the set of selected items, V.

    • w : a feature

    • X(w) : the set of items that contain feature w

    • V : the set of items the user selected as “like”

    • U : the whole item set


Intuitive Interpretation from the Preference Value (1)

  • The preference value represents how a given set and the user's preferred set are correlated:

    • (a) dissimilar: negative (−), negatively correlated, dislike

    • (b) neutral: zero (0), independent, indifferent

    • (c) similar: positive (+), positively correlated, like


Intuitive Interpretation from the Preference Value (3)

  • You can get a clear meaning from the resulting preference value.


Item Preference: Feature-based

  • The preference of a given item x is defined as the normalized sum of the preferences of all its features:

    Pref(x) = (1 / M(x)) Σ_i Pref(w_i)   (2)

    • under the assumption that all the features are independent.

    • M(x) is the normalization term, defined as the number of features appearing in the item x.

  • The preference of a given item x can thus be interpreted as the average preference value of its features.
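A minimal sketch of the item-level average in equation (2); the dictionary-based interface is an assumption for illustration:

```python
def item_preference(item_features, feature_pref):
    """Pref(x): average of the feature preferences Pref(w) over the
    M(x) features appearing in item x (features assumed independent).
    Features without an estimate are treated as neutral (0.0)."""
    if not item_features:
        return 0.0
    total = sum(feature_pref.get(w, 0.0) for w in item_features)
    return total / len(item_features)

print(item_preference(["gulf", "war", "news"],
                      {"gulf": -0.2, "war": -0.2, "news": 0.7}))  # ~0.1
```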





The Problem of Mutual Information

  • Mutual information (MI) is sensitive to low-frequency events, and hence to noise.

  • Most features get high MI values only because of their low frequency.

(Figure: MI value per feature (word); many features spike because of low frequency.)


A Feature Preference Model

  • The true mutual information is defined as the observed mutual information minus the part of the information caused by random effects:

    I_true(X(w);V) = I_obs(X(w);V) − P_rand(w) · I_obs(X(w);V) = (1 − P_rand(w)) · I_obs(X(w);V)   (13)

  • This is used to cope with the data-sparseness problem.

  • P_rand(w), the random occurrence probability, is introduced to remove the information caused by random effects.

Random Occurrence Probability by the Random Frequency Distribution

  • The random occurrence probability represents

    • how many events with a given frequency can occur in a random experiment.

    • the ratio of the information that is provided by randomness.

  • The probability of a given frequency, P(freq(w)), is adopted as the random occurrence probability (equations 8 and 9)

    • to represent how many events with the same frequency occur in a random experiment.

      • freq(w) is the frequency of the event w.

      • N(W′) is the number of events in the random experiment.


Pareto Distribution for the Random Occurrence Distribution

  • Fortunately, we do not have to run random experiments to get the random occurrence probability.

  • The Pareto distribution is

    • often used for modeling incomes in economics.

    • a kind of power-law distribution.

  • The Pareto distribution represents

    • how many events with a given frequency can occur in a random experiment (equation 12).

    • for example, low-frequency events have a high Pareto probability, since there are many low-frequency events in random experiments.
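The idea behind the random occurrence probability can also be sketched empirically, without the fitted Pareto form; estimating P(freq(w)) from frequency-of-frequencies counts is an assumption used here purely for illustration:

```python
from collections import Counter

def random_occurrence_probability(freq):
    """Estimate P(freq(w)): the fraction of events that share the
    frequency of event w. Low-frequency events are numerous, so they
    receive a high random-occurrence probability."""
    count_of_counts = Counter(freq.values())
    n_events = len(freq)
    return {w: count_of_counts[f] / n_events for w, f in freq.items()}

p = random_occurrence_probability({"a": 1, "b": 1, "c": 1, "d": 9})
print(p["a"])  # 0.75 (three of the four events occur once)
print(p["d"])  # 0.25
```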


Positive Preference Model for a Feature

  • The random occurrence probability in the positive preference model is determined by the size of the overlap between the feature set X(w) and the user behavior history V (equation 15).

    • The overlap of the set X(w2) is very small, so it can easily arise by random chance.

Figure 2. Examples of positive preference.

The size of a universal set U is far larger than this picture shows.


A Distribution Graph – Direct Mutual Information

  • Mutual information (MI) is sensitive to low-frequency events.

  • Most features have high MI values because of noise.

(Figure: preference value per feature (word) under direct MI.)


A Distribution Graph – Applying the Random Occurrence Probability

  • After applying the random occurrence probability, the preference intensity is lowered for most features.

  • The graph has an ‘S’ shape.

  • Too many features have negative preferences, which is not correct.

(Figure: S-shaped preference distribution per feature (word); too many features fall below zero.)


Disparity of Positive and Negative Preferences

  • Items containing feature w1 are selected more frequently:

    • a positively preferred feature.

  • Items containing feature w2 are selected rarely or never by the user.

    • It cannot be concluded that it is negatively preferred:

      • the user may be indifferent to the item,

      • the user may not know the item for lack of exposure,

      • the user may never have encountered the item.

Figure 1. Positive and Negative Preference Examples


Negative Preference Model for a Feature

The assumption of negative preference:

  • Only when a user did not select a feature even though it occurred frequently enough can the feature have a strongly negative preference.

  • The random occurrence probability for negative preference is therefore defined by the unconditional probability of the given feature (equation 16).

Figure 3. Examples of negative preference.




A Distribution Graph – The Negative Preference Model

  • Most of the features with negative preferences are brought back to neutral.

  • Some features with negative preference are strengthened when their normal occurrence counts are large enough.

(Figure: preference per feature (word); only a small number of features keep strongly negative preferences.)


The Overall Feature Preference Model

  • The preference model for a feature (equation 17) combines

    • the positive/negative preference models,

    • the random occurrence probability,

    • and mutual information.





Joint Feature

  • A k-th-order joint feature is a composite of k single features.

  • A joint feature can provide more accurate information if the information source is reliable:

    Pref(“Gulf”) < 0 and Pref(“war”) < 0,

    but

    Pref(“Gulf war”) > 0.

  • Joint features suffer from severe data-sparseness problems:

    • most count values of joint features (k ≥ 3) are 0 or 1.
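Enumerating candidate joint features is straightforward; representing a joint feature as a sorted tuple of single features is an assumption made here for illustration:

```python
from itertools import combinations

def joint_features(features, k):
    """All k-th-order joint features: unordered composites of k distinct
    single features. Counts for k >= 3 are usually 0 or 1, which is the
    data-sparseness problem described above."""
    return list(combinations(sorted(set(features)), k))

print(joint_features(["gulf", "war", "news"], 2))
# [('gulf', 'news'), ('gulf', 'war'), ('news', 'war')]
```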


Combining Information of Joint Features

  • Policy: if the joint preference information is reliable, it is used; otherwise, the single-feature preferences are used instead.

  • The feature-combining weight for the k-th joint feature controls which one is used:

    Pref = (1 − weight) · (joint preference) + weight · (sum of single preferences)   (5)


Combining Joint Features Using the Random Occurrence Probability

  • The feature-combining weight represents the randomness of all the superset features.

    • The feature-combining weight function can be defined by recursion (equation 6).

    • It can be obtained as the product of all the random occurrence probabilities (equation 7).

      • The random occurrence probability represents the ratio of the information provided by randomness.


Applying Examples of the Joint Feature Model

  • Example 1:

    Pref(“Gulf”) = −0.2 (negative),

    Pref(“war”) = −0.2 (negative),

    Pref(“Gulf” and “war”) = 0.3 (positive), P_rand(“Gulf” and “war”) = 0.8.

    Then the final joint preference is

    Pref(“Gulf war”) = (1 − P_rand(“Gulf” and “war”)) · Pref(“Gulf” and “war”) + P_rand(“Gulf” and “war”) · (Pref(“Gulf”) + Pref(“war”))

    = (1 − 0.8) · 0.3 + 0.8 · (−0.2 − 0.2)

    = −0.26 (negative).

    If the joint preference information is not reliable (high P_rand), the single preferences are used instead.

  • Example 2:

    Same as Example 1, but P_rand(“Gulf” and “war”) = 0.1.

    Pref(“Gulf war”) = (1 − 0.1) · 0.3 + 0.1 · (−0.2 − 0.2)

    = 0.23 (positive).

    If the joint preference information is reliable (low P_rand), the joint preference is used.
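The two examples can be checked with a short sketch of the combination rule (function name hypothetical):

```python
def combine_joint(pref_joint, p_rand, single_prefs):
    """Blend the joint preference with the single-feature fallback:
    a high random-occurrence probability p_rand means the joint count
    is unreliable, so the sum of single preferences dominates."""
    return (1 - p_rand) * pref_joint + p_rand * sum(single_prefs)

print(combine_joint(0.3, 0.8, [-0.2, -0.2]))  # ~ -0.26 (Example 1)
print(combine_joint(0.3, 0.1, [-0.2, -0.2]))  # ~  0.23 (Example 2)
```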





The Gathering Method of TV Audience Measurement

  • A set-top box is installed on the TV in each sampled home (homes are sampled according to a balanced distribution).

  • The selected-channel information is automatically transferred to the measurement company.
AC Nielsen Inc.


Experimental Environment

(Figure: three months of data; all but the last week is training data, the last week is test data; program data supply the input features.)

  • The TV program recommendation task was performed.

    • Nouns in title and synopsis were used as input features.

    • Recommend preferred TV programs.

  • Data from AC Nielsen Korea*

    • Three months of data from June 15th, 2001.

    • Program data

      • broadcasting time, channel, etc.

      • 189,653 programs on 74 channels.

    • User behavior history data

      • 200 users

      • user id, channel-changing time, and the corresponding channel id

  • TV program data

    • Title and synopsis were collected from TV program websites by a web robot.

      • Nouns were extracted by regular expressions.

  • Training and Test

    • Training data : all but the last one week

    • Test data : the last week only.

AC Nielsen Korea* : the authorized audience measurement company in Korea

(Item: each TV program. Features: words, time, channel, genre, etc.)


Experimental Results

  • The random occurrence probability works well (accuracy 0.598 → 0.709).

  • Combining joint features works well (0.692 → 0.773).

Table 3. Accuracy results for each preference model.


Comparisons with Other Models

  • There are comparable well-known models in personalization, information retrieval, and association-rule mining areas.

  • We implemented these models to compare with the proposed model.


Comparison with Association Measures

  • The proposed models have higher accuracy than the other models.

  • Reasons?

    • Only positive examples are given.

    • The models are robust to data sparseness.

    • Prediction is based on estimated preference.

(Scenario: 7-day extension, title match, 10 candidates.)


Why Can Prediction Based on Preference Be Better?

  • The preferred set and the observed set are different.

    • The observed set can contain dis-preferred items.

  • The preferred set is hidden; however, people's selection behavior is driven by the preferred set.

  • Thus, a hypothesis that predicts the preferred set helps to predict the observed set.

    • e.g., not recommending the not-preferred part of the observed set.

  • What is the hypothesis used to predict the preferred set here? Mutual information.

(Figure: the preferred set and the observed set overlap; not recommending the not-preferred but selected area can improve the prediction accuracy.)





Collaborative Filtering

  • Collaborative filtering

    • is a method to predict a user's selections based on similar users' preferences.

    • Vector similarity causes a problem in providing an interpretation of the resulting preference value.

  • How can collaborative filtering be combined with content-based preference?

Chi-square Distribution for User Similarity

  • In order to keep the ability to provide a clear interpretation of the preference value, the similarity should be given as a probability.

  • Here, one idea is to use chi-square distribution to represent similarity between users.


Chi-Square Distribution for User Similarity

  • The chi-square distribution gives the probability that two given distributions are statistically equal.

(Figure: the preference distributions of users u and v are compared.)
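A sketch of a chi-square-style comparison of two users' feature preferences. The exact test statistic is not shown on the slide, so the squared-difference form below is a hypothetical stand-in; it does, however, reproduce the caveat noted on the “Trials” slide that only the difference of the input values matters:

```python
def preference_distance(u, v):
    """Squared difference per feature over two users' preference maps.
    Smaller means more similar. Hypothetical formulation: note that
    (15, 5) and (-5, 5) score the same, since only differences matter."""
    features = set(u) | set(v)
    return sum((u.get(w, 0.0) - v.get(w, 0.0)) ** 2 for w in features)

print(preference_distance({"art": 15.0}, {"art": 5.0}))  # 100.0
print(preference_distance({"art": -5.0}, {"art": 5.0}))  # 100.0
```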


Two Ways to Get User Similarity

  • Item-based similarity (over Pref(x))

    • Disadvantage: preferences must be available for the same items.

    • Advantage: the computational cost is low.

  • Feature-based similarity (over Pref(w))

    • Advantage: preferences for the same items are not required.

    • Disadvantage: the computational cost is high.

  • For document preference, Pref(w) is preferable.


Trials with the Chi-square Distribution

  • Problems

    • The chi-square distribution produces near-zero probabilities for very large data; the feature set size here is larger than 100,000.

    • Chi-square only considers the difference of the input values.

      e.g., 15 vs. 5 and −5 vs. 5 have the same chi-square value.

  • Trial solutions

    • The features with low preference strength (|Pref(w)| < 5) were removed; the feature set size was then reduced to the order of 1,000.

    • The chi-square input values were discretized into [−5, 0, 5].



Future Work

  • Trying statistical comparison methods for user similarity.

  • Estimating the improvement from integrating the collaborative and content-based approaches.


Summary and Conclusions

  • Concept of preference

    • Preference is different from probability.

    • Positive and negative preference.

    • A clear interpretation can be read from the resulting preference value.

  • Prediction

    • A preferred set is different from an observed set.

    • Mutual information is the hypothesis used to predict the preferred set.

    • This is why prediction based on preference can be better than prediction without it.

  • Trials to extend the model by integrating collaborative preference.



LG Electronics


  • Sung-Young Jung, Jeong-Hee Hong, and Taek-Soo Kim, “A Statistical Model for User Preference,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, June 2005, pp. 834–843.


The End of Presentation




Additional Slides


Feature-Combining Weight

  • As the feature-combining weight for the k-th joint feature approaches 0:

    • the k-th joint feature is not used, because there are reliable superset features.

  • As the weight grows large:

    • the k-th joint feature is used, because there are no reliable superset features.

  • After it is combined with the feature preference:

    • the graph is bell-shaped;

    • only joint features of a proper size are used.

Figure: A typical graph of the feature-combining weight function


Accuracy Measure: Precision

  • The measure of recommendation accuracy.

  • Defined as the number of correct answers divided by the number of recommended candidates.
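The definition above as a short sketch (names hypothetical):

```python
def precision(recommended, watched):
    """Number of correct answers (recommended items the user actually
    watched) divided by the number of recommended candidates."""
    if not recommended:
        return 0.0
    correct = sum(1 for item in recommended if item in watched)
    return correct / len(recommended)

print(precision(["news", "drama", "sports", "movie"], {"news", "movie"}))  # 0.5
```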


Process Diagram of Preference Systems

(Diagram: a user's behavior over items x (documents, products, TV programs) is logged in a user behavior DB as the history V; the preference model builds a user profile G and a feature preference DB from it, and outputs the predicted preference value Pref(x) for an item x.)


Intuitive Interpretation from the Preference Value (2)

  • The preference value is interpreted as “the user prefers a given item with a degree of additional bits of information compared to the neutral situation”.

    • (a) dissimilar, −2: dislikes it with a degree of 2 bits of information compared to the neutral case.

    • (b) neutral, 0: the neutral case.

    • (c) similar, +2: likes it with a degree of 2 bits of information compared to the neutral case.

  • Bit : Information measure from the convention in information theory


Approximation to the Double Feature Model

  • Long joint features suffer from the data-sparseness problem.

  • The double feature, a joint feature of size two, can be a good choice for many practical problems.

  • The double feature model (equations 18 and 19) is easily derived by restricting the feature size to 2.


Goals and Model Requirements

  • Target goals

    1. To automatically extract each user's preference from natural user behavior such as navigation and purchasing.

    2. To predict the user's preference for unknown items using the extracted preference information.

    3. To build a formal concept of user preference that is understandable by intuition and gives an easy interpretation of a preference value.

    4. To reflect peculiar characteristics of preference: positive and negative preference.

  • Model requirements

    1. The ability to deal with rare events, to circumvent the data-sparseness problem.

    2. The least computational overhead: fast response and update times.


Scalability Issues

  • The model requires only probability terms that can be calculated directly by frequency counting from the training data.

  • Time complexity

    • Proportional to the number of input feature variables.

    • For the single feature model (K = 1): O(M(x)), where M(x) is the number of single features.

    • For the general setting (K > 1): O(C(M(x), K)), where C(n, k) is the combination function; a restriction is required, such as selecting informative joint variables as in association-rule mining (the Apriori algorithm).

    • If only consecutive features are allowed as joint features: O(M(x)).


Experimental Scenario

  • 1-day recommendation test

    • Variable candidates scenario, 10 candidates scenario, and title match scenario: the system recommends programs for one test day.

  • Title match scenario

    • Introduced because many channels broadcast the same program at the same time.

    • A program with the same title is regarded as correct regardless of the broadcasting time and channel.

    • A re-broadcast program, when clearly described as such in the title, is regarded as a different program.

  • 1-day recommendation & 7-day extension test

    • A recommended program is regarded as correct if it is watched within the following 7 days.

    • Introduced because a user usually does not watch every episode of a preferred program series in the TV environment.

(Figure: training data, a 1-day recommendation, and a 7-day test window.)


Example: TV Program Recommendation


Bits of Information

  • Bit: the information measure conventional in information theory.

    • Although information is sometimes measured in characters (as when describing the length of an email message) or in digits (as in the length of a phone number), the convention in information theory is to measure information in bits.

  • A “bit” (a contraction of binary digit) is either a zero or a one. Because there are 8 possible configurations of three bits (000, 001, 010, 011, 100, 101, 110, and 111), we can use three bits to encode any integer from 1 to 8.


Conclusions

  • All of the target goals were achieved:

    1. To automatically extract the preference of each user.

    2. To predict the user's preference for unknown items.

    3. To build a formal concept of user preference that is understandable by intuition (via mutual information) and gives an interpretation for a preference value (via bits of information).

    4. To reflect peculiar characteristics of preference (via positive and negative preference).

    5. To deal with rare events (via the random occurrence probability).

    6. To keep the computational overhead low (O(M(x))).

  • The proposed models have the best accuracy among the well-known association-rule-mining models on the recommendation problem.

    • The proposed models are expected to apply successfully to the association-mining area.


Slide77 l.jpg
QnA Institute of Technology.

  • How your model is different from collaborative filtering?

    • Collaborative filtering is one of the well-known recommendation methods. They use social information similar to given user to predict his or her preference for each item. But in general, they do not have a formalized concept of preference. They usually directly use vector similarity. So, they have difficulty in saying how a user prefer a given item from the vector similarity value. We proposed the formalized concept of preference using mutual information, consequently, our model can say the intensity of preference by bits of information.

      2. How is your model different from content-based filtering?

    • Content-based filtering is similar to our model in that both use content information for preference learning. The main difference is whether a formal concept and definition of preference are adopted. The overall framework becomes quite different once the concept of preference is introduced. Content-based filtering also takes a vector-similarity-based approach, whereas our model uses mutual information.

      3. How is your model different from the document-similarity model in the Information Retrieval area?

    • The content-based filtering model is inherited from the document-similarity model. As I answered before, the main difference is that we created a formalized model of preference.


Slide78 l.jpg

4. Association-rule mining and preference prediction are rather different problems. Why did you compare your model with association-rule mining?

  • When an association-rule-mining model is applied to the recommendation problem, its framework becomes similar to ours: by finding items associated with a user's history, an association model can produce recommendations. The recommendation problem is also well suited to evaluating accuracy objectively. We compared our model with association models because their overall framework is closer to ours than that of approaches such as collaborative filtering and content-based filtering. As I mentioned in the conclusions, we expect our mutual-information-based measure to work well as an association measure.

    5. Would you explain the accuracy measure you adopted in detail? What is the 7-day extension scenario?

  • We adopted precision as the accuracy measure. Precision is defined as the fraction of correct items among the recommended candidates. In the 7-day scenario, the system recommends programs for one test day; if the user watches a recommended program within the following 7 days, it is scored as correct.
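As a concrete illustration of the measure (my own sketch; the function name and sample lists are hypothetical), precision over a set of recommended candidates can be computed as:

```python
def precision(recommended, watched):
    """Fraction of recommended programs the user actually watched
    (within the scenario's matching window)."""
    if not recommended:
        return 0.0
    watched_set = set(watched)
    hits = sum(1 for program in recommended if program in watched_set)
    return hits / len(recommended)

# 3 of the 5 recommendations were watched within the window.
print(precision(["a", "b", "c", "d", "e"], ["b", "c", "e", "x"]))  # 0.6
```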

    6. Can you explain the main reason your model has better accuracy than the previous ones?

  • There are several reasons. First, the random occurrence probability removed noise in the training data caused by lack of information. Second, we used multi-sized joint variables, which the random occurrence probability made possible. Third, the formalized definition of preference enabled more sophisticated models, such as the positive- and negative-preference models. All three factors combined well, so the accuracy is better than that of the other models.


Slide79 l.jpg

7. I want to know the overall speed of learning and predicting user preference. How long do learning and prediction take?

  • As you can see, our model requires computation that is linear in the number of input features. By adopting the double-feature approximation, we kept the number of features linear. There are a learning process and a prediction process. In the learning process, only a count value for each feature is needed; updating on one day of TV program data, about 2,000 program entries, took about 20 seconds.
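To illustrate why learning is cheap (my own sketch; the feature extraction and the actual model details are simplified away), the learning step amounts to counting feature occurrences in a single linear pass:

```python
from collections import Counter

def update_counts(counts: Counter, program_keywords):
    """Learning step: increment a count for each observed feature.
    One pass over the day's watched programs, so the cost is linear
    in the number of features, matching the O(M(x)) claim."""
    for keywords in program_keywords:
        counts.update(keywords)
    return counts

counts = Counter()
update_counts(counts, [{"news", "politics"}, {"news", "weather"}])
print(counts["news"])  # 2
```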

    8. Why did you apply your model to TV program recommendation?

  • This research was performed as part of a DTV personalization system; our model is used in its recommendation engine. I think there is no critical difference from other general recommendation areas, and I hope our model will be applied in many applications, such as document recommendation and product recommendation.

    9. It seems that genre information is more important for TV program recommendation. Is using words from the text appropriate for it?

  • That is a good question. In the TV environment, genre information is very important. We developed a general TV-program recommendation system using all available information: genre, time, channel, keywords, and so on. Our model is the part of the system that uses keyword information. After several experiments, we found no big difference when the other information was combined with our model. There is another aspect: even in the DTV environment, the content metadata will not be filled in completely, because doing so requires additional human effort, and genre information is often missing. The only field we can expect never to be missing is the title, which can supply keywords. With these considerations, we concluded that keyword information is very important in the TV environment.


Slide80 l.jpg

10. There are many TV programs in the same series, and the content of each instance differs from day to day, as with news. How did you score accuracy in this case?

  • Recommendation was always performed for one day. The system gives a set of program candidates with exact broadcast times. We designed four scenarios. Two of them score a recommended candidate as correct only when the user selected it at exactly the recommended broadcast time. The other two adopt title matching, because the same program may be broadcast on different channels, such as local TV stations, and many cable channels broadcast the same program several times a day; since the content is identical, title matching is appropriate there. A re-broadcast program, which is clearly marked in the title, is regarded as a different one. The experimental results showed that the accuracy of our model improved across all four scenarios.

