
Modeling Preference:Integrating Content-based and Collaborative Preference on Documents

Sung-Young Jung

Intelligent Systems Program

chopin@cs.pitt.edu

http://www.isp.pitt.edu/~chopin/

Presentation at AI forum

April 7, 2006


Contents

  • Introduction

  • Concept of Preference

  • A Statistical Approach

  • Coping with Data Sparseness

  • Combining Joint Features

  • Data and Experimental Environments

  • Integrating Content-based and Collaborative Preference

    • User similarity


Introduction of Preference Modeling


Generalities of Preference Modeling

  • General

    • There are many users with different behaviors.

    • A user's behavior depends on the user's preferences.

    • The goal is to predict future behavior based on the user's preferences.

  • Applications

    • Recommendation systems, Personalization systems, Prediction systems.

(Item examples: document, product, TV program.)


An Item and Features

  • We want to estimate the preference of an item, Pref(x), given its feature set.

Item x and its features w, by item type:

  • Document: words in title & content; category

  • Product: name; color; words in description; price

  • TV program: words in title; words in synopsis; genre; channel; time


Problem Description

  • We want to estimate

    • Preference of an item: Pref(x)

    • Preference of a feature: Pref(wi)

  • Data are collected from natural human behaviors

    • Navigation, browsing, purchasing, etc.

    • Data can contain noise

  • Only positive examples are given

    • Negative examples are hidden

    • However, we want to find items with negative preference.

    • Common machine learning algorithms do not work well.

(Figure: the set of all items, an item x (documents, products, TV programs), and the selected item set V.)


Previous Research on Preference

  • Recommendation Algorithms

    • Collaborative filtering, social filtering.

      • recommendation based on information of similar users.

    • Content-based filtering.

      • Similar idea to document similarity.

      • Using content information in an item.

  • Information Retrieval.

    • Document similarity.

      • Vector similarity.

  • Association-rule mining.

    • Mining associated item set

  • In these previous studies, models were built without a formalized concept of “preference”.

  • Utility Theory

    • Preference is given by a human designer, not estimated from data.


Problems of Vector Similarity

  • Vector similarity is generally used in the collaborative filtering and information retrieval areas

    • cosine, inner product, Pearson correlation

  • The resulting preference value does not provide a good interpretation of how much the user likes or dislikes an item

    • A preference of 0.5 → how much does the user like it?

  • This inability to interpret the preference value causes serious problems.

Example: Pref(w1) = 0.4 vs. Pref(w2) = 0.5, and Pref(w1,w3) = 0.6 vs. Pref(w2,w5) = 0.2. Which one is better?


Problems of Probability Representing Preference

  • Bayesian Network is often used for preference modeling.

  • Probability is not preference.

    • Probability represents how often a user selects a given item.

    • Preference is one of the factors affecting selection probability.


Problems of Probability: Examples

  • Someone living in the US selected Hollywood movies often.

    • It does not mean that he has a high preference for Hollywood movies.

    • It will be disappointing if you recommend a movie to him only because it is a Hollywood movie.

  • Someone living in the US selected a Korean movie with low frequency.

    • He may have a high preference for Korean movies even though the probability of choosing them is low.



The Relation of Probability and Preference

  • (a) When the selection probability is low even though the availability is high: negative preference.

    When the selection probability is high even though the availability is low: positive preference.

  • (b) Preference is positive if and only if the selection probability is higher than the availability.

  • Because of this property, mutual information has been proposed as a preference measure.


An Issue of Negative Preference

  • How can negative preference be estimated only from positive examples?

  • The previous approaches do not have a good solution for this problem.

  • Disliked items can be selected by a user

    • Data can contain noise.

  • Estimating negative preference provides an advantage in recommendation.

(Figure: the set of all items, the items the user dislikes, and the selected item set V; disliked items can be included in the user's selection.)


An Issue of Negative Preference - Feature

  • Features with negative preference are often included in selected items.

(Figure: disliked items, and hence disliked features, can sometimes be included in the user's selected item set V.)

Example document: “All the drawings will be unframed, and will sell for relatively low prices – a couple of bucks is standard. So for less than $20, a fan of art could virtually wallpaper a room with locally drawn – and locally modeled – art.” “Lewis attempts some quick math in his head, then says, ‘It’s been an awful long time.’”

  • A user likes “art”, but

    dislikes “math”.

  • However, disliked words can be included in a document that a user selected.





The Basic Idea – Neutral Preference

  • X(w): the set of items with the feature w

  • Availability: P(X(w)) = 4/10 = 0.4

  • Selection probability: P(X(w)|V) = 2/5 = 0.4

  • Equal → neutral preference

(Figure: 4 of the 10 items in the whole set contain w; 2 of the 5 selected items in V contain w.)

The Basic Idea – Positive Preference

  • X(w): the set of items with the feature w

  • Availability: P(X(w)) = 4/10 = 0.4

  • Selection probability: P(X(w)|V) = 3/6 = 0.5

  • Selection probability > availability → positive preference

(Figure: 4 of the 10 items contain w; 3 of the 6 selected items contain w.)

The Basic Idea – Negative Preference

  • X(w): the set of items with the feature w

  • Availability: P(X(w)) = 4/10 = 0.4

  • Selection probability: P(X(w)|V) = 1/4 = 0.25

  • Selection probability < availability → negative preference

(Figure: 4 of the 10 items contain w; 1 of the 4 selected items contains w.)
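The three cases above can be reproduced with a short sketch. The pointwise form log2(P(X(w)|V) / P(X(w))) is an assumption here, chosen to be consistent with the mutual-information measure introduced on the following slides:

```python
import math

def feature_preference(n_w_in_all, n_all, n_w_in_selected, n_selected):
    """Compare selection probability P(X(w)|V) against availability P(X(w)).

    Returns > 0 for positive preference, 0 for neutral, < 0 for negative.
    """
    availability = n_w_in_all / n_all
    selection_probability = n_w_in_selected / n_selected
    return math.log2(selection_probability / availability)

print(feature_preference(4, 10, 2, 5))  # neutral:  0.0
print(feature_preference(4, 10, 3, 6))  # positive: log2(0.5/0.4) > 0
print(feature_preference(4, 10, 1, 4))  # negative: log2(0.25/0.4) < 0
```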


Preference of a Single Feature

  • Mutual information has been proposed as the preference of a single feature.

    • X(w) is the set of all items that contain the feature w.

    • V is the set of items the user selected.

    • I(X(w);V) is the mutual information.

    • It satisfies the property of preference relating selection probability and availability.

    • It gives an intuitive interpretation of the preference value.

    Pref(w) = I(X(w);V) = log2( P(X(w)|V) / P(X(w)) )   (3)

    where the numerator is the selection probability and the denominator is the availability.


Mutual Information of a Single Feature

(Figure: Venn diagram of the whole item set U, the selected set V, and the feature set X(w).)

  • The mutual information of a feature represents the similarity between the set of items with the feature, X(w), and the set of selected items, V.

    • w : a feature

    • X(w) : the set of items that contain feature w

    • V : the set of items the user selected as “like”

    • U : the whole item set


Intuitive Interpretation from the Preference Value (1)

  • The preference value represents how a given set and the user's preferred set are correlated:

    • (a) dissimilar: negative (−), negatively correlated, dislike

    • (b) neutral: zero (0), independent, indifferent

    • (c) similar: positive (+), positively correlated, like


Intuitive Interpretation from the Preference Value (3)

  • You can get a clear meaning from the resulting preference value.


Item Preference: Feature-based

  • The preference of a given item x is defined as the normalized sum of the preferences of all its features:

    Pref(x) = (1 / M(x)) Σ_i Pref(w_i)   (2)

    • under the assumption that all the features are independent.

    • M(x) is the normalization term, defined as the number of features appearing in the item x.

  • The preference of a given item x can thus be interpreted as the average preference value of its features.
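A minimal sketch of the item-level average in equation (2); the dictionary-based interface is an assumption for illustration:

```python
def item_preference(item_features, feature_pref):
    """Pref(x): average of the feature preferences Pref(w) over the
    M(x) features appearing in item x (features assumed independent).
    Features without an estimate are treated as neutral (0.0)."""
    if not item_features:
        return 0.0
    total = sum(feature_pref.get(w, 0.0) for w in item_features)
    return total / len(item_features)

print(item_preference(["gulf", "war", "news"],
                      {"gulf": -0.2, "war": -0.2, "news": 0.7}))  # ~0.1
```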





The Problem of Mutual Information

  • Mutual information (MI) is sensitive to low-frequency events, and hence to noise.

  • Most features get high MI values only because of their low frequency.

(Figure: MI value per feature (word); many features spike because of low frequency.)


A Feature Preference Model

  • The true mutual information is defined as the observed mutual information minus the part of the information caused by random effects:

    I_true(X(w);V) = I_obs(X(w);V) − P_rand(w) · I_obs(X(w);V) = (1 − P_rand(w)) · I_obs(X(w);V)   (13)

  • This is used to cope with the data-sparseness problem.

  • P_rand(w), the random occurrence probability, is introduced to remove the information caused by random effects.

Random Occurrence Probability by the Random Frequency Distribution

  • The random occurrence probability represents

    • how many events with a given frequency can occur in a random experiment.

    • the ratio of the information that is provided by randomness.

  • The probability of a given frequency, P(freq(w)), is adopted as the random occurrence probability (equations 8 and 9)

    • to represent how many events with the same frequency occur in a random experiment.

      • freq(w) is the frequency of the event w.

      • N(W′) is the number of events in the random experiment.


Pareto Distribution for the Random Occurrence Distribution

  • Fortunately, we do not have to run random experiments to get the random occurrence probability.

  • The Pareto distribution is

    • often used for modeling incomes in economics.

    • a kind of power-law distribution.

  • The Pareto distribution represents

    • how many events with a given frequency can occur in a random experiment (equation 12).

    • for example, low-frequency events have a high Pareto probability, since there are many low-frequency events in random experiments.
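The idea behind the random occurrence probability can also be sketched empirically, without the fitted Pareto form; estimating P(freq(w)) from frequency-of-frequencies counts is an assumption used here purely for illustration:

```python
from collections import Counter

def random_occurrence_probability(freq):
    """Estimate P(freq(w)): the fraction of events that share the
    frequency of event w. Low-frequency events are numerous, so they
    receive a high random-occurrence probability."""
    count_of_counts = Counter(freq.values())
    n_events = len(freq)
    return {w: count_of_counts[f] / n_events for w, f in freq.items()}

p = random_occurrence_probability({"a": 1, "b": 1, "c": 1, "d": 9})
print(p["a"])  # 0.75 (three of the four events occur once)
print(p["d"])  # 0.25
```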


Positive Preference Model for a Feature

  • The random occurrence probability in the positive preference model is determined by the size of the overlap between the feature set X(w) and the user behavior history V (equation 15).

    • The overlap of the set X(w2) is very small, so it can easily arise by random chance.

Figure 2. Examples of positive preference.

The size of a universal set U is far larger than this picture shows.


A Distribution Graph – Direct Mutual Information

  • Mutual information (MI) is sensitive to low-frequency events.

  • Most features have high MI values because of noise.

(Figure: preference value per feature (word) under direct MI.)


A Distribution Graph – Applying the Random Occurrence Probability

  • After applying the random occurrence probability, the preference intensity is lowered for most features.

  • The graph has an ‘S’ shape.

  • Too many features have negative preferences, which is not correct.

(Figure: S-shaped preference distribution per feature (word); too many features fall below zero.)


Disparity of Positive and Negative Preferences

  • Items containing feature w1 are selected more frequently:

    • a positively preferred feature.

  • Items containing feature w2 are selected rarely or never by the user.

    • It cannot be concluded that it is negatively preferred:

      • the user may be indifferent to the item,

      • the user may not know the item for lack of exposure,

      • the user may never have encountered the item.

Figure 1. Positive and Negative Preference Examples


Negative Preference Model for a Feature

The assumption of negative preference:

  • Only when a user did not select a feature even though it occurred frequently enough can the feature have a strongly negative preference.

  • The random occurrence probability for negative preference is therefore defined by the unconditional probability of the given feature (equation 16).

Figure 3. Examples of negative preference.




A Distribution Graph – The Negative Preference Model

  • Most of the features with negative preferences are brought back to neutral.

  • Some features with negative preference are strengthened when their normal occurrence counts are large enough.

(Figure: preference per feature (word); only a small number of features keep strongly negative preferences.)


The Overall Feature Preference Model

  • The preference model for a feature (equation 17) combines

    • the positive/negative preference models,

    • the random occurrence probability,

    • and mutual information.





Joint Feature

  • A k-th-order joint feature is a composite of k single features.

  • A joint feature can provide more accurate information if the information source is reliable:

    Pref(“Gulf”) < 0 and Pref(“war”) < 0,

    but

    Pref(“Gulf war”) > 0.

  • Joint features suffer from severe data-sparseness problems:

    • most count values of joint features (k ≥ 3) are 0 or 1.
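Enumerating candidate joint features is straightforward; representing a joint feature as a sorted tuple of single features is an assumption made here for illustration:

```python
from itertools import combinations

def joint_features(features, k):
    """All k-th-order joint features: unordered composites of k distinct
    single features. Counts for k >= 3 are usually 0 or 1, which is the
    data-sparseness problem described above."""
    return list(combinations(sorted(set(features)), k))

print(joint_features(["gulf", "war", "news"], 2))
# [('gulf', 'news'), ('gulf', 'war'), ('news', 'war')]
```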


Combining Information of Joint Features

  • Policy: if the joint preference information is reliable, it is used; otherwise, the single-feature preferences are used instead.

  • The feature-combining weight for the k-th joint feature controls which one is used:

    Pref = (1 − weight) · (joint preference) + weight · (sum of single preferences)   (5)


Combining Joint Features Using the Random Occurrence Probability

  • The feature-combining weight represents the randomness of all the superset features.

    • The feature-combining weight function can be defined by recursion (equation 6).

    • It can be obtained as the product of all the random occurrence probabilities (equation 7).

      • The random occurrence probability represents the ratio of the information provided by randomness.


Applying Examples of the Joint Feature Model

  • Example 1:

    Pref(“Gulf”) = −0.2 (negative),

    Pref(“war”) = −0.2 (negative),

    Pref(“Gulf” and “war”) = 0.3 (positive), P_rand(“Gulf” and “war”) = 0.8.

    Then the final joint preference is

    Pref(“Gulf war”) = (1 − P_rand(“Gulf” and “war”)) · Pref(“Gulf” and “war”) + P_rand(“Gulf” and “war”) · (Pref(“Gulf”) + Pref(“war”))

    = (1 − 0.8) · 0.3 + 0.8 · (−0.2 − 0.2)

    = −0.26 (negative).

    If the joint preference information is not reliable (high P_rand), the single preferences are used instead.

  • Example 2:

    Same as Example 1, but P_rand(“Gulf” and “war”) = 0.1.

    Pref(“Gulf war”) = (1 − 0.1) · 0.3 + 0.1 · (−0.2 − 0.2)

    = 0.23 (positive).

    If the joint preference information is reliable (low P_rand), the joint preference is used.
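The two examples can be checked with a short sketch of the combination rule (function name hypothetical):

```python
def combine_joint(pref_joint, p_rand, single_prefs):
    """Blend the joint preference with the single-feature fallback:
    a high random-occurrence probability p_rand means the joint count
    is unreliable, so the sum of single preferences dominates."""
    return (1 - p_rand) * pref_joint + p_rand * sum(single_prefs)

print(combine_joint(0.3, 0.8, [-0.2, -0.2]))  # ~ -0.26 (Example 1)
print(combine_joint(0.3, 0.1, [-0.2, -0.2]))  # ~  0.23 (Example 2)
```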





The Gathering Method of TV Audience Measurement

  • A set-top box is installed on the TV in each sampled home (homes are sampled according to a balanced distribution).

  • The selected-channel information is automatically transferred to the measurement company.
AC Nielsen Inc.


Experimental Environment

(Figure: three months of data; all but the last week is training data, the last week is test data; program data supply the input features.)

  • The TV program recommendation task was performed.

    • Nouns in title and synopsis were used as input features.

    • Recommend preferred TV programs.

  • Data from AC Nielsen Korea*

    • Three months of data from June 15th, 2001.

    • Program data

      • broadcasting time, channel, etc.

      • 189,653 programs on 74 channels.

    • User behavior history data

      • 200 users

      • user id, channel-changing time, and the corresponding channel id

  • TV program data

    • Title and synopsis were collected from TV program websites by a web robot.

      • Nouns were extracted by regular expressions.

  • Training and Test

    • Training data : all but the last one week

    • Test data : the last week only.

AC Nielsen Korea* : the authorized audience measurement company in Korea

(Item: each TV program. Features: words, time, channel, genre, etc.)


Experimental Results

  • The random occurrence probability works well (accuracy 0.598 → 0.709).

  • Combining joint features works well (0.692 → 0.773).

Table 3. Accuracy results for each preference model.


Comparisons with Other Models

  • There are comparable well-known models in personalization, information retrieval, and association-rule mining areas.

  • We implemented these models to compare with the proposed model.


Comparison with Association Measures

  • The proposed models have higher accuracy than the other models.

  • Reasons?

    • Only positive examples are given.

    • The models are robust to data sparseness.

    • Prediction is based on estimated preference.

(Scenario: 7-day extension, title match, 10 candidates.)


Why Can Prediction Based on Preference Be Better?

  • The preferred set and the observed set are different.

    • The observed set can contain dis-preferred items.

  • The preferred set is hidden; however, people's selection behavior is driven by the preferred set.

  • Thus, a hypothesis that predicts the preferred set helps to predict the observed set.

    • e.g., not recommending the not-preferred part of the observed set.

  • What is the hypothesis used to predict the preferred set here? Mutual information.

(Figure: the preferred set and the observed set overlap; not recommending the not-preferred but selected area can improve the prediction accuracy.)





Collaborative Filtering

  • Collaborative filtering

    • is a method to predict a user's selections based on similar users' preferences.

    • Vector similarity causes a problem in providing an interpretation of the resulting preference value.

  • How can collaborative filtering be combined with content-based preference?

Chi-square Distribution for User Similarity

  • In order to keep the ability to provide a clear interpretation of the preference value, the similarity should be given as a probability.

  • Here, one idea is to use chi-square distribution to represent similarity between users.


Chi-Square Distribution for User Similarity

  • The chi-square distribution gives the probability that two given distributions are statistically equal.

(Figure: the preference distributions of users u and v are compared.)
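A sketch of a chi-square-style comparison of two users' feature preferences. The exact test statistic is not shown on the slide, so the squared-difference form below is a hypothetical stand-in; it does, however, reproduce the caveat noted on the “Trials” slide that only the difference of the input values matters:

```python
def preference_distance(u, v):
    """Squared difference per feature over two users' preference maps.
    Smaller means more similar. Hypothetical formulation: note that
    (15, 5) and (-5, 5) score the same, since only differences matter."""
    features = set(u) | set(v)
    return sum((u.get(w, 0.0) - v.get(w, 0.0)) ** 2 for w in features)

print(preference_distance({"art": 15.0}, {"art": 5.0}))  # 100.0
print(preference_distance({"art": -5.0}, {"art": 5.0}))  # 100.0
```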


Two Ways to Get User Similarity

  • Item-based similarity (over Pref(x))

    • Disadvantage: preferences must be available for the same items.

    • Advantage: the computational cost is low.

  • Feature-based similarity (over Pref(w))

    • Advantage: preferences for the same items are not required.

    • Disadvantage: the computational cost is high.

  • For document preference, Pref(w) is preferable.


Trials with the Chi-square Distribution

  • Problems

    • The chi-square distribution produces near-zero probabilities for very large data; the feature set size here is larger than 100,000.

    • Chi-square only considers the difference of the input values.

      e.g., 15 vs. 5 and −5 vs. 5 have the same chi-square value.

  • Trial solutions

    • The features with low preference strength (|Pref(w)| < 5) were removed; the feature set size was then reduced to the order of 1,000.

    • The chi-square input values were discretized into [−5, 0, 5].



Future Work

  • Trying statistical comparison methods for user similarity.

  • Estimating the improvement from integrating the collaborative and content-based approaches.


Summary and Conclusions

  • Concept of preference

    • Preference is different from probability.

    • Positive and negative preference.

    • A clear interpretation can be read from the resulting preference value.

  • Prediction

    • A preferred set is different from an observed set.

    • Mutual information is the hypothesis used to predict the preferred set.

    • This is why prediction based on preference can be better than prediction without it.

  • Trials to extend the model by integrating collaborative preference.



LG Electronics


  • Sung-Young Jung, Jeong-Hee Hong, and Taek-Soo Kim, “A Statistical Model for User Preference,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, June 2005, pp. 834–843.


The End of Presentation




Additional Slides


Feature-Combining Weight

  • As the feature-combining weight for the k-th joint feature approaches 0:

    • the k-th joint feature is not used, because there are reliable superset features.

  • As the weight grows large:

    • the k-th joint feature is used, because there are no reliable superset features.

  • After it is combined with the feature preference:

    • the graph is bell-shaped;

    • only joint features of a proper size are used.

Figure: A typical graph of the feature-combining weight function


Accuracy Measure: Precision

  • The measure of recommendation accuracy.

  • Defined as the number of correct answers divided by the number of recommended candidates.
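The definition above as a short sketch (names hypothetical):

```python
def precision(recommended, watched):
    """Number of correct answers (recommended items the user actually
    watched) divided by the number of recommended candidates."""
    if not recommended:
        return 0.0
    correct = sum(1 for item in recommended if item in watched)
    return correct / len(recommended)

print(precision(["news", "drama", "sports", "movie"], {"news", "movie"}))  # 0.5
```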


Process Diagram of Preference Systems

(Diagram: a user's behavior over items x (documents, products, TV programs) is logged in a user behavior DB as the history V; the preference model builds a user profile G and a feature preference DB from it, and outputs the predicted preference value Pref(x) for an item x.)


Intuitive Interpretation from the Preference Value (2)

  • The preference value is interpreted as “the user prefers a given item with a degree of additional bits of information compared to the neutral situation”.

    • (a) dissimilar, −2: dislikes it with a degree of 2 bits of information compared to the neutral case.

    • (b) neutral, 0: the neutral case.

    • (c) similar, +2: likes it with a degree of 2 bits of information compared to the neutral case.

  • Bit : Information measure from the convention in information theory


Approximation to the Double Feature Model

  • Long joint features suffer from the data-sparseness problem.

  • The double feature, a joint feature of size two, can be a good choice for many practical problems.

  • The double feature model (equations 18 and 19) is easily derived by restricting the feature size to 2.


Goals and Model Requirements

  • Target goals

    1. To automatically extract each user's preference from natural user behavior such as navigation and purchasing.

    2. To predict the user's preference for unknown items using the extracted preference information.

    3. To build a formal concept of user preference that is understandable by intuition and gives an easy interpretation of a preference value.

    4. To reflect peculiar characteristics of preference: positive and negative preference.

  • Model requirements

    1. The ability to deal with rare events, to circumvent the data-sparseness problem.

    2. The least computational overhead: fast response and update times.


Scalability Issues

  • The model requires only probability terms that can be calculated directly by frequency counting from the training data.

  • Time complexity

    • Proportional to the number of input feature variables.

    • For the single feature model (K = 1): O(M(x)), where M(x) is the number of single features.

    • For the general setting (K > 1): O(C(M(x), K)), where C(n, k) is the combination function; a restriction is required, such as selecting informative joint variables as in association-rule mining (the Apriori algorithm).

    • If only consecutive features are allowed as joint features: O(M(x)).


Experimental Scenario

  • 1-day recommendation test

    • Variable candidates scenario, 10 candidates scenario, and title match scenario: the system recommends programs for one test day.

  • Title match scenario

    • Introduced because many channels broadcast the same program at the same time.

    • A program with the same title is regarded as correct regardless of the broadcasting time and channel.

    • A re-broadcast program, when clearly described as such in the title, is regarded as a different program.

  • 1-day recommendation & 7-day extension test

    • A recommended program is regarded as correct if it is watched within the following 7 days.

    • Introduced because a user usually does not watch every episode of a preferred program series in the TV environment.

(Figure: training data, a 1-day recommendation, and a 7-day test window.)


Example: TV Program Recommendation


Bits of Information

  • Bit: the information measure conventional in information theory.

    • Although information is sometimes measured in characters (as when describing the length of an email message) or in digits (as in the length of a phone number), the convention in information theory is to measure information in bits.

  • A “bit” (a contraction of binary digit) is either a zero or a one. Because there are 8 possible configurations of three bits (000, 001, 010, 011, 100, 101, 110, and 111), we can use three bits to encode any integer from 1 to 8.


Conclusions

  • All of the target goals were achieved:

    1. To automatically extract the preference of each user.

    2. To predict the user's preference for unknown items.

    3. To build a formal concept of user preference that is understandable by intuition (via mutual information) and gives an interpretation for a preference value (via bits of information).

    4. To reflect peculiar characteristics of preference (via positive and negative preference).

    5. To deal with rare events (via the random occurrence probability).

    6. To keep the computational overhead low (O(M(x))).

  • The proposed models have the best accuracy among the well-known association-rule-mining models on the recommendation problem.

    • The proposed models are expected to apply successfully to the association-mining area.


Slide77 l.jpg
QnA Institute of Technology.

  • How your model is different from collaborative filtering?

    • Collaborative filtering is one of the well-known recommendation methods. They use social information similar to given user to predict his or her preference for each item. But in general, they do not have a formalized concept of preference. They usually directly use vector similarity. So, they have difficulty in saying how a user prefer a given item from the vector similarity value. We proposed the formalized concept of preference using mutual information, consequently, our model can say the intensity of preference by bits of information.

      2. How is your model different from content-based filtering?

    • Content-based filtering is similar to our model in that both use content information for preference learning. The main difference is whether a formal concept and definition of preference are adopted. The overall framework becomes quite different once the concept of preference is introduced. Content-based filtering also takes a vector-similarity-based approach, whereas our model uses mutual information.

      3. How is your model different from the document-similarity model in the Information Retrieval area?

    • The content-based filtering model is inherited from the document-similarity model. As I answered before, the main difference is that we created a formalized model of preference.


Slide78 l.jpg

4. Association-rule mining and preference prediction are rather different problems. Why did you compare your model with association-rule mining?

  • When an association-rule-mining model is applied to the recommendation problem, its framework becomes similar to ours: by finding items associated with a user's history, an association model can produce recommendations. The recommendation problem is also well suited to evaluating accuracy objectively. We compared our model with association models because their overall framework is closer to ours than that of approaches such as collaborative filtering and content-based filtering. As I mentioned in the conclusions, we expect our mutual-information-based measure to work well as an association measure.

    5. Would you explain the accuracy measure you adopted in detail? What is the 7-day extension scenario?

  • We adopted precision as the accuracy measure. Precision is defined as the fraction of correct items among the recommended candidates. In the 7-day scenario, the system recommends programs for one test day; if the user watches a recommended program within the following 7 days, it is scored as correct.
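As a concrete illustration of the measure (my own sketch; the function name and sample lists are hypothetical), precision over a set of recommended candidates can be computed as:

```python
def precision(recommended, watched):
    """Fraction of recommended programs the user actually watched
    (within the scenario's matching window)."""
    if not recommended:
        return 0.0
    watched_set = set(watched)
    hits = sum(1 for program in recommended if program in watched_set)
    return hits / len(recommended)

# 3 of the 5 recommendations were watched within the window.
print(precision(["a", "b", "c", "d", "e"], ["b", "c", "e", "x"]))  # 0.6
```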

    6. Can you explain the main reason your model has better accuracy than the previous ones?

  • There are several reasons. First, the random occurrence probability removed noise in the training data caused by lack of information. Second, we used multi-sized joint variables, which the random occurrence probability made possible. Third, the formalized definition of preference enabled more sophisticated models, such as the positive- and negative-preference models. All three factors combined well, so the accuracy is better than that of the other models.


Slide79 l.jpg

7. I want to know the overall speed of learning and predicting user preference. How long do learning and prediction take?

  • As you can see, our model requires computation that is linear in the number of input features. By adopting the double-feature approximation, we kept the number of features linear. There are a learning process and a prediction process. In the learning process, only a count value for each feature is needed; updating on one day of TV program data, about 2,000 program entries, took about 20 seconds.
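To illustrate why learning is cheap (my own sketch; the feature extraction and the actual model details are simplified away), the learning step amounts to counting feature occurrences in a single linear pass:

```python
from collections import Counter

def update_counts(counts: Counter, program_keywords):
    """Learning step: increment a count for each observed feature.
    One pass over the day's watched programs, so the cost is linear
    in the number of features, matching the O(M(x)) claim."""
    for keywords in program_keywords:
        counts.update(keywords)
    return counts

counts = Counter()
update_counts(counts, [{"news", "politics"}, {"news", "weather"}])
print(counts["news"])  # 2
```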

    8. Why did you apply your model to TV program recommendation?

  • This research was performed as part of a DTV personalization system; our model is used in its recommendation engine. I think there is no critical difference from other general recommendation areas, and I hope our model will be applied in many applications, such as document recommendation and product recommendation.

    9. It seems that genre information is more important for TV program recommendation. Is using words from the text appropriate for it?

  • That is a good question. In the TV environment, genre information is very important. We developed a general TV-program recommendation system using all available information: genre, time, channel, keywords, and so on. Our model is the part of the system that uses keyword information. After several experiments, we found no big difference when the other information was combined with our model. There is another aspect: even in the DTV environment, the content metadata will not be filled in completely, because doing so requires additional human effort, and genre information is often missing. The only field we can expect never to be missing is the title, which can supply keywords. With these considerations, we concluded that keyword information is very important in the TV environment.


Slide80 l.jpg

10. There are many TV programs in the same series, and the content of each instance differs from day to day, as with news. How did you score accuracy in this case?

  • Recommendation was always performed for one day. The system gives a set of program candidates with exact broadcast times. We designed four scenarios. Two of them score a recommended candidate as correct only when the user selected it at exactly the recommended broadcast time. The other two adopt title matching, because the same program may be broadcast on different channels, such as local TV stations, and many cable channels broadcast the same program several times a day; since the content is identical, title matching is appropriate there. A re-broadcast program, which is clearly marked in the title, is regarded as a different one. The experimental results showed that the accuracy of our model improved across all four scenarios.

