Web Personalization: Techniques, Challenges, and Recommendations

Recommender Systems: Session E
Robin Burke, DePaul University, Chicago, IL

Roadmap
• Session A: Basic Techniques I
  • Introduction
  • Knowledge Sources
  • Recommendation Types
  • Collaborative Recommendation
• Session B: Basic Techniques II
  • Content-based Recommendation
  • Knowledge-based Recommendation
• Session C: Domains and Implementation I
  • Recommendation domains
  • Example Implementation
  • Lab I
• Session D: Evaluation I
  • Evaluation
• Session E: Applications
  • User Interaction
  • Web Personalization
• Session F: Implementation II
  • Lab II
• Session G: Hybrid Recommendation
• Session H: Robustness
• Session I: Advanced Topics
  • Dynamics
  • Beyond accuracy

Web Personalization
• A subtype of recommendation: personalizing the browsing experience of a user by dynamically tailoring the look, feel, and content of a Web site to the user’s needs and interests.
• Items: web pages
• Data: instead of ratings, log data from web sites

Raw data

Challenges and Pitfalls
• Technical Challenges
  • data collection and data preprocessing
  • defining actionable knowledge
  • choosing personalization algorithms
• Implementation/Deployment Challenges
  • what to personalize
  • when to personalize
  • degree of personalization or customization
  • how to target information without being intrusive

Usage-Based Profiles
• Characteristics
  • implicit ratings: preferences inferred from actions
  • passive stance: the system takes no action to gather more rating information
  • anonymity: many users will be anonymous, so we cannot rely on long-term identity
  • data characteristics: usually more voluminous than explicit rating data, and noisier

Heuristic Preference Indicators
• How to know if user "likes" something?
  • basic heuristic: length of time spent (the longer, the more liked)
• What if the user takes a phone call?
  • threshold
• What if the page is at the end of the session?
  • either ignore (possibly lose crucial information)
  • or consider positive (significant noise in failed interactions)
• What if the page is really short?
  • normalize on page length
• What if the back button doesn't reload page?
  • infer session activity
• What if the "user" is actually a web crawler?
  • recognize pattern of behavior
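The time-based heuristics above can be sketched as a small scoring function. This is only an illustration: the function name, the 600-second cap, and the per-1000-characters normalization are assumptions chosen for the example, not values given in the slides. The last page of a session is simply ignored here, which is one of the two options listed above.

```python
def implicit_weight(seconds, page_chars, is_last_in_session,
                    max_seconds=600.0, chars_per_unit=1000.0):
    """Turn dwell time on one pageview into a rough implicit rating.

    All thresholds are hypothetical defaults, not values from the slides.
    """
    if is_last_in_session:
        # Dwell time on the final page is unknown; this sketch ignores it.
        return None
    # Threshold: cap very long stays (e.g. the user took a phone call).
    capped = min(seconds, max_seconds)
    # Normalize on page length so short pages are not penalized.
    length_units = max(page_chars / chars_per_unit, 1.0)
    return capped / length_units
```

For instance, `implicit_weight(3600, 1000, False)` is capped to the same weight as a 600-second visit, and a 60-second visit to a 2000-character page scores half as much as the same visit to a 1000-character page.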

Problems
• Cold start
  • cannot count on long-term profiling
  • must be able to make predictions with short profiles
• Covert model
  • user has no direct input
  • can't express interests directly even if they are willing to
• Sparsity
  • much worse than for explicit profiles
  • many more users
  • many more items (pages)
  • profiles much shorter
• "Hidden web"
  • many pages are produced by database queries
  • page differences not reflected in the log
  • can't recommend indistinguishable items

Advantages
• No explicit user ratings or interaction with users
• Helps preserve user privacy by making effective use of anonymous data
• Large user base makes CF effective, if we can implement it scalably
• Content-based / knowledge-based approaches hard to use on web data

Web Personalization Process
• Generate aggregate user models
  • not single users but clusters of similar users (guess why?)
  • off-line process
  • Steps
    • Clustering user transactions
    • Clustering items / pageviews
    • Association rule mining
    • Sequential pattern discovery
• Provide recommendations on-line
  • match a user’s active session against the aggregate models to provide dynamic content

Off-line Process
[Diagram: off-line pipeline. Labels recovered from the figure, grouped by phase.]
• Data Preparation Phase
  • Inputs: Web & Application Server Logs, Site Content & Structure, Domain Knowledge
  • Data Preprocessing: Data Cleaning, Pageview Identification, Sessionization, Data Integration, Data Transformation
  • Output: User Transaction Database
• Pattern Discovery Phase
  • Usage Mining: Transaction Clustering, Pageview Clustering, Correlation Analysis, Association Rule Mining, Sequential Pattern Mining
  • Patterns
  • Pattern Analysis: Pattern Filtering, Aggregation, Characterization
  • Output: Aggregate Usage Profiles

On-line Process
[Diagram: on-line recommendation loop. Labels recovered from the figure: Client Browser, Web Server, Active Session, Recommendation Engine, Aggregate Usage Profiles <user,item1,item2,…>, Stored User Profile, Domain Knowledge, Integrated User Profile, Recommendations.]

Representation
• Pageview/objects
• Session/user data
• Raw weights are usually based on time spent on a page, but in practice they need to be normalized and transformed.
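One way to picture this representation is a session-by-pageview matrix built from per-page dwell times. The sketch below is a minimal illustration: the log transform and the per-session max normalization are assumptions picked for the example (the slides only say that raw weights need normalizing and transforming), and `build_matrix` is a hypothetical helper name.

```python
import math

def build_matrix(sessions):
    """Build a session x pageview weight matrix from dwell times in seconds.

    `sessions` is a list of dicts mapping page -> seconds spent.
    Transform (log1p) and normalization (divide by session max) are
    illustrative choices, not prescribed by the slides.
    """
    pages = sorted({page for session in sessions for page in session})
    rows = []
    for session in sessions:
        # Dampen very long dwell times.
        logs = {page: math.log1p(t) for page, t in session.items()}
        top = max(logs.values(), default=0.0)
        # Scale each session so its most-viewed page has weight 1.0.
        rows.append([logs.get(page, 0.0) / top if top > 0 else 0.0
                     for page in pages])
    return pages, rows

pages, rows = build_matrix([{"A.html": 30, "B.html": 30}, {"C.html": 10}])
```

Each row is then a pageview vector for one session, with zeros for pages the session never visited.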

Clustering
• k-means clustering algorithm
  • specify a number of clusters
  • system finds an arrangement of items that minimizes the mean distance among the items within each cluster
• for each cluster
  • think of each item as a vector
  • generate the centroid

Clustering
• Transaction clusters as Aggregate Profiles
  • Each transaction is viewed as a pageview vector
  • Each cluster contains a set of transaction vectors with a centroid
  • Each centroid acts as an aggregate profile, with each component representing the weight for pageview pi in the profile
  • Compute similarity between a current user’s profile (or the active user session) and the cluster centroids
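A minimal k-means sketch shows how centroids of transaction clusters become aggregate profiles. It is a plain-Python illustration under stated assumptions: the initial centroids are just the first k vectors (a naive choice made to keep the example deterministic), distance is squared Euclidean, and the toy transactions are invented for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Cluster pageview vectors; each centroid is an aggregate profile."""
    centroids = [list(v) for v in vectors[:k]]  # naive init (assumption)
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each transaction to its nearest centroid.
        for i, v in enumerate(vectors):
            assignment[i] = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
        # Update step: centroid = mean of the member vectors.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Binary transaction vectors over pages [A, B, C, D] (toy data).
transactions = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
centroids, assignment = kmeans(transactions, k=2)
```

With binary vectors, each centroid component is the fraction of transactions in the cluster that contain that page, which is how profile weights such as 0.75 or 0.25 arise.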

Recommendation Algorithm
• Keep track of users’ navigational history through the site
  • a fixed-size sliding window over the active session to capture the current user’s “short-term” history depth
• Match current user’s activity against the discovered profiles
  • profiles either can be based on aggregate usage profiles, or are obtained directly from association rules or sequential patterns
• Dynamically generated recommendations are added to the returned page
  • each pageview can be assigned a recommendation score based on
    • matching score to user profiles (e.g., aggregate usage profiles)
    • “information value” of the pageview based on domain knowledge (e.g., link distance of the candidate recommendation to the active session)
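The fixed-size sliding window can be sketched with a bounded queue. The class name and the default window size are assumptions for illustration; the slides do not prescribe a particular size.

```python
from collections import deque

class SessionWindow:
    """Keeps only the most recent pageviews of the active session."""

    def __init__(self, size=3):
        # deque with maxlen silently drops the oldest entry when full.
        self.pages = deque(maxlen=size)

    def visit(self, page):
        """Record a pageview and return the current window, oldest first."""
        self.pages.append(page)
        return list(self.pages)

window = SessionWindow(size=2)
window.visit("A.html")
window.visit("B.html")
current = window.visit("C.html")
```

Only the pages in `current` would then be matched against the discovered profiles.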

Matching Sessions
• Matching score computed using cosine similarity
• User’s active session (pageviews in the current window) is compared to each aggregate profile (both are viewed as pageview vectors)
• Weight of items in the profile vector is the significance weight of the item for that profile
• Weight of items in the session vector can be all 1’s, or based on some method for determining their significance in the current session
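Written out, the cosine matching score takes the standard form below. The symbols are notation assumed here for illustration: s_k is the weight of pageview k in the active session window, and w_k^c is the weight of pageview k in aggregate profile c.

```latex
\mathrm{match}(s, c) \;=\;
\frac{\sum_{k} w_k^{c}\, s_k}
     {\sqrt{\sum_{k} s_k^{2}}\;\sqrt{\sum_{k} \left(w_k^{c}\right)^{2}}}
```

When all session weights are 1, the numerator is just the sum of the profile weights of the pages in the window.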

Recommendations
• from each matching profile, recommend the items not already in the user session window and not directly linked from the pages in the current session window
• the recommendation score for an item is based on the profile matching score (similarity to session window) and the weight of the item in that profile
• can include a novelty weight: items farther away from the user’s current location score higher

Example Sample cluster centroid from dept. Web site (cluster size =330)

Using Clusters for Personalization
Given an active session A → B, the best matching profile is Profile 1. This may result in a recommendation for page F.html, since it appears with high weight in that profile.

Original Session/user data

Result of Clustering

PROFILE 0 (Cluster Size = 3)
--------------------------------------
1.00 C.html
1.00 D.html

PROFILE 1 (Cluster Size = 4)
--------------------------------------
1.00 B.html
1.00 F.html
0.75 A.html
0.25 C.html

PROFILE 2 (Cluster Size = 3)
--------------------------------------
1.00 A.html
1.00 D.html
1.00 E.html
0.33 C.html
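The example can be checked with a short sketch that matches the session against the three profiles listed above. The profile weights are copied from the slide; everything else is an assumption made for illustration: session weights are all 1, the recommendation score is the simple product of matching score and item weight, and the filter for pages directly linked from the session is omitted because the site's link structure is not part of this transcript.

```python
import math

PROFILES = {
    "Profile 0": {"C.html": 1.00, "D.html": 1.00},
    "Profile 1": {"B.html": 1.00, "F.html": 1.00, "A.html": 0.75, "C.html": 0.25},
    "Profile 2": {"A.html": 1.00, "D.html": 1.00, "E.html": 1.00, "C.html": 0.33},
}

def cosine(a, b):
    """Cosine similarity between two sparse page -> weight vectors."""
    dot = sum(w * b.get(page, 0.0) for page, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_profile(session_pages, profiles):
    """Name of the profile most similar to the session window."""
    session = {page: 1.0 for page in session_pages}
    return max(profiles, key=lambda name: cosine(session, profiles[name]))

def recommend(session_pages, profiles):
    """Rank pages outside the window by match * weight (assumed combination)."""
    session = {page: 1.0 for page in session_pages}
    scores = {}
    for profile in profiles.values():
        match = cosine(session, profile)
        for page, weight in profile.items():
            if page in session:
                continue  # never recommend pages already in the window
            scores[page] = max(scores.get(page, 0.0), match * weight)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

session = ["A.html", "B.html"]
top_profile = best_profile(session, PROFILES)
ranking = recommend(session, PROFILES)
```

Under these assumptions Profile 1 is the closest match and F.html ranks first, in agreement with the slide.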

Association Rules
• An alternative to clusters is to build association rules
• Association rule
  • a tuple <i1, i2, ..., ik> whose items all appear together with some frequency
• Can be used for prediction
  • if a user has seen <i1, i2, ..., ik-1>, then predict ik
  • if a user has seen <i2, i3, ..., ik>, then predict i1

Learning Association Rules
• Multiple passes through a database of transactions
  • 1st time: collect all items that occur with a certain frequency
  • 2nd time: collect all pairs (both items must be in the set above)
  • 3rd time: collect all triples
  • continue until there are none left
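The multi-pass procedure above is the level-wise (Apriori-style) search for frequent itemsets. The following is a compact sketch under assumptions: support is an absolute count, the toy transactions are invented, and no attempt is made at the efficiency tricks a production miner would use.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: singles, then pairs, then triples, and so on."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: combine frequent sets of the previous size.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune step: every smaller subset must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(sub) in frequent
                          for sub in combinations(c, size - 1))
                   and support(c) >= min_support}
    return frequent

toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
found = frequent_itemsets(toy, min_support=3)
```

In this toy data every single page and every pair is frequent, but the triple {A, B, C} appears only twice and is dropped, so the search stops after the third pass.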

Recommending with Association Rules
• Keep track of users’ navigational history through the site
• Match current user’s activity against the association rules
  • find the longest rule with at least one non-matching entry
  • sort by confidence
  • predict the entry or entries not in the user's profile
• Dynamically generated recommendations are added to the returned page

Sequential methods
• Sequential patterns as profiles
  • similar to association rules, but the ordering of accessed items is taken into account
  • use Markov models: systems with discrete states and probabilistic transitions
  • commonly used for pre-fetching pages in web servers
• Characteristics
  • high accuracy, but usually low coverage
  • few users get recommendations
  • sometimes this is OK
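A first-order Markov model is the simplest instance of this idea: states are pages and transition probabilities are estimated from consecutive pageviews. The sketch below is illustrative only; the function names, the toy sessions, and the 0.5 probability cutoff are assumptions, not part of the slides.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Estimate P(next page | current page) from ordered sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for current, c in counts.items()}

def predict_next(model, page, min_prob=0.5):
    """Most likely next page, or None when no transition is confident enough."""
    options = model.get(page, {})
    if not options:
        return None
    nxt, prob = max(options.items(), key=lambda kv: kv[1])
    return nxt if prob >= min_prob else None

toy_sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["E", "A", "C"]]
model = train_markov(toy_sessions)
```

Here `predict_next(model, "A")` yields a confident prediction, while a page with no observed outgoing transitions yields nothing, which is the low-coverage behavior noted above.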

Example: Frequent Itemsets Sample Transactions Frequent itemsets (using min. support frequency = 4)

Example: Sequential Patterns
• Sample Transactions
• CSP, contiguous sequential patterns (min. support frequency = 4)
• SP, sequential patterns (min. support frequency = 4)

Example: An Itemset Graph
Frequent Itemset Graph for the Example
Given an active session window <B,E>, the algorithm finds items A and C with recommendation scores of 1 and 4/5 (corresponding to the confidences of the rules {B,E} => {A} and {B,E} => {C}).
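The scores quoted above are rule confidences: the support of the window extended by the candidate item, divided by the support of the window itself. The sketch below illustrates that calculation. The support counts are hypothetical, chosen only so that they reproduce the quoted scores, since the original transaction table is not reproduced in this transcript. The sketch also looks only at itemsets one item larger than the window and does not back off to a shorter window when nothing matches.

```python
from fractions import Fraction

# Hypothetical support counts (assumed for illustration).
SUPPORT = {
    frozenset("BE"): 5,
    frozenset("ABE"): 5,
    frozenset("BCE"): 4,
}

def recommend(window, support):
    """Score each candidate item by the confidence of window => item."""
    window = frozenset(window)
    base = support.get(window)
    if not base:
        return {}  # the window itself is not a frequent itemset
    scores = {}
    for itemset, count in support.items():
        # Children of the window in the itemset graph add exactly one item.
        if len(itemset) == len(window) + 1 and window < itemset:
            (item,) = itemset - window
            scores[item] = Fraction(count, base)
    return scores

scores = recommend("BE", SUPPORT)
```

With these assumed counts, item A gets confidence 5/5 and item C gets 4/5.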

Example: Frequent Sequence Trie
Frequent Sequence Trie for the Example
Given an active session window <A,B>, the algorithm finds item E with a recommendation score of 1 (corresponding to the confidence of the rule {A,B} => {E}).

Impact of Window Size
• Increasing window sizes (using a larger portion of the user’s history) generally leads to improvement in precision
• This example is based on the association rule approach

Associations vs. Sequences
• Comparison of recommendations based on association rules, sequential patterns (SP), contiguous sequential patterns (CSP), and standard k-nearest neighbor
• Support threshold for Association, SP, CSP = 0.04

Problems with Web Usage Mining
• New item problem
  • Patterns will not capture new items recently added
  • Bad for dynamic Web sites
• Poor machine interpretability
  • Hard to generalize and reason about patterns
  • No domain knowledge used to enhance results
  • E.g., knowing a user is interested in a program, we could recommend the prerequisites, core or popular courses in this program to the user
• Poor insight into the patterns themselves
  • The nature of the relationships among items or users in a pattern is not directly available
