Learn about web personalization, its benefits, challenges, and current approaches like rule-based, collaborative, and content-based filtering. Explore examples like FireFly Network and Letizia for personalized recommendations. Discover the role of learning interface agents in improving user interactions.
1. Web Personalization and Recommender Systems
2. 2 What is Web Personalization Web Personalization: “personalizing the browsing experience of a user by dynamically tailoring the look, feel, and content of a Web site to the user’s needs and interests.”
Related Phrases
mass customization, one-to-one marketing, site customization, target marketing
Why Personalize?
broaden and deepen customer relationships
provide continuous relationship marketing to build customer loyalty
help automate the process of proactively marketing products to customers
lights-out marketing
cross-sell/up-sell products
provide the ability to measure customer behavior and track how well customers are responding to marketing efforts
3. 3 Personalization v. Customization It’s a question of who controls the user’s browsing experience
Customization
user controls and customizes the site or the product based on his/her preferences
usually manual, but sometimes semi-automatic based on a given user profile
Personalization
done automatically based on the user’s actions, the user’s profile, and (possibly) the profiles of others with “similar” profiles
7. 7 Challenges and Pitfalls Technical Challenges
data collection and data preprocessing
discovering actionable knowledge from the data
which personalization algorithms
Implementation/Deployment Challenges
what to personalize
when to personalize
degree of personalization or customization
how to target information without being intrusive
8. 8 Web Personalization The Problem
serve dynamic content to users based on their profiles or preferences
Current Approaches
Rule-based Filtering: store a profile for users based on explicit registration information; use prespecified rules to generate recommendations
Collaborative Filtering: uses explicit ratings from users to find other users with similar profiles, then recommends what those users rated highly
e.g., GroupLens, Firefly, PHOAKS
Content-Based Filtering: learn/store personal profiles locally or on server-side; recommendations are based on content similarity
e.g., WebWatcher, Letizia, Syskill & Webert
Limitations of Current Technologies
content-based recommendations may be too narrow
user input is subjective and prone to bias
profiles may be static and can become outdated quickly
problems with scalability and accuracy
9. 9 Examples:
FireFly Network (Shardanand & Maes 95)
Net Perceptions
Users rate musical artists from like to dislike
1 = detest; 7 = can’t live without; 4 = ambivalent
There is a normal distribution around 4
However, what matters are the extremes
Nearest Neighbors Strategy: find similar users and predict a rating as the (weighted) average of their ratings
Pearson r algorithm: weight by degree of correlation between user U and user J
1 means very similar, 0 means no correlation, -1 means dissimilar
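The nearest-neighbors strategy above can be sketched as follows. This is a minimal illustration using the textbook Pearson correlation and a mean-centered weighted average; it is not necessarily the exact variant Firefly used, and the function names and sample ratings are made up for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson r over the items both users rated; 0.0 when it is undefined."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(target, others, item):
    """Target's mean rating plus the correlation-weighted average of the
    neighbors' deviations from their own mean rating for `item`."""
    target_mean = sum(target.values()) / len(target)
    num = den = 0.0
    for other in others:
        if item not in other:
            continue
        w = pearson(target, other)
        other_mean = sum(other.values()) / len(other)
        num += w * (other[item] - other_mean)
        den += abs(w)
    return target_mean + num / den if den else target_mean

# Hypothetical ratings on the 1..7 scale described above.
user_u = {"a": 7, "b": 1, "c": 4}
user_j = {"a": 6, "b": 2, "c": 4, "d": 7}   # tastes like U
user_k = {"a": 1, "b": 7, "c": 4, "d": 1}   # tastes opposite to U
prediction = predict(user_u, [user_j, user_k], "d")
```

A negatively correlated user still contributes: a low rating from user K pushes the prediction for user U upward.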
11. 11 Learning Interface Agents Add agents to the user interface and delegate tasks to them
Use machine learning to improve performance
learn user behavior, preferences
Useful when:
1) past behavior is a useful predictor of the future behavior
2) wide variety of behaviors amongst users
Examples:
mail clerk: sort incoming messages in right mailboxes
calendar manager: automatically schedule meeting times?
Personal news agents
portfolio manager agents
Advantages:
less work for user and application writer
adaptive behavior
user and agent build trust relationship gradually
12. 12 Letizia: Autonomous Interface Agent (Lieberman 96) Recommends web pages during browsing based on user profile
Learns user profile using simple heuristics
Passive observation, recommend on request
Provides relative ordering of link interestingness
Assumes recommendations “near” current page are more valuable than others
13. 13 Consequences of passiveness Weak heuristics
example: click through multiple uninteresting pages en route to interestingness
example: user browses to uninteresting page, then goes for a coffee
example: hierarchies tend to get more hits near root
Cold start
No ability to fine tune profile or express interest without visiting “appropriate” pages
Some possible alternative/extensions to internally maintained profiles:
expose to the user (e.g. fine tune profile)?
expose to other users/agents (e.g. collaborative filtering)?
expose to web server (e.g. cnn.com custom news)?
14. 14 WebWatcher Dayne Freitag, Thorsten Joachims, Tom Mitchell (CMU)
A "tour guide" agent for the WWW
user tells agent what kind of information he/she is seeking (e.g., set of keywords)
WebWatcher then accompanies user while browsing the web
highlights hyperlinks that it believes will be of interest
its strategy for giving advice is learned from feedback in earlier tours
15. 15 Syskill & Webert (Pazzani et al 96) User defines topic page for each topic
User rates pages (cold or hot)
Syskill & Webert creates profile with Bayesian classifier
accurate
incremental
probabilities can be used for ranking of documents
operates on same data structure as picking informative features
Only top k (=100) “informative” words are used as features
presence or absence of words provides information on classification of pages
word occurs in a higher percentage of hot pages than cold pages
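A minimal sketch of this kind of profile, assuming information gain as the measure of an "informative" word and Laplace smoothing in the Bayesian classifier. Both choices, along with the function names and toy pages, are assumptions for illustration rather than details taken from the slides.

```python
from math import exp, log, log2

def entropy(p):
    """Binary entropy of a hot/cold split with P(hot) = p."""
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def information_gain(word, pages):
    """pages is a list of (set_of_words, is_hot) pairs."""
    n = len(pages)
    present = [hot for words, hot in pages if word in words]
    absent = [hot for words, hot in pages if word not in words]
    gain = entropy(sum(hot for _, hot in pages) / n)
    for part in (present, absent):
        if part:
            gain -= len(part) / n * entropy(sum(part) / len(part))
    return gain

def top_k_words(pages, k):
    """Keep the k words whose presence/absence best separates hot from cold."""
    vocab = sorted(set().union(*(words for words, _ in pages)))
    return sorted(vocab, key=lambda w: information_gain(w, pages), reverse=True)[:k]

def train(pages, features):
    """Naive Bayes over boolean word features, with Laplace smoothing."""
    model = {}
    for label in (True, False):
        docs = [words for words, hot in pages if hot == label]
        prior = (len(docs) + 1) / (len(pages) + 2)
        probs = {w: (sum(w in d for d in docs) + 1) / (len(docs) + 2) for w in features}
        model[label] = (prior, probs)
    return model

def prob_hot(model, words):
    """Posterior probability that a page is hot; usable for ranking pages."""
    scores = {}
    for label, (prior, probs) in model.items():
        score = log(prior)
        for w, p in probs.items():
            score += log(p if w in words else 1.0 - p)
        scores[label] = score
    top = max(scores.values())          # subtract the max to avoid underflow
    hot, cold = exp(scores[True] - top), exp(scores[False] - top)
    return hot / (hot + cold)

# Hypothetical rated pages: (words on the page, rated hot?)
rated = [({"guitar", "jazz"}, True), ({"jazz", "tour"}, True),
         ({"tax", "forms"}, False), ({"tax", "tour"}, False)]
features = top_k_words(rated, k=2)
model = train(rated, features)
p = prob_hot(model, {"jazz", "blues"})
```

Because the classifier returns a probability rather than a hard label, unseen pages can be sorted by `prob_hot` to produce a ranking.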
16. 16 Syskill & Webert Rating Pages
17. 17 Syskill & Webert Rating Pages
18. 18 Usage-Based Web Personalization Basic Idea
find aggregate user profiles by automatically discovering user access patterns through Web usage mining (offline process)
match a user’s active session against the discovered profiles to provide dynamic content (online process)
Advantages / Goals
profiles are based on objective information (how users actually traverse the site)
no explicit user ratings or interaction with users (to enter a profile, etc.)
can preserve user privacy (mining from anonymous data)
usage data captures relationships missed by content-based approaches
Applications
provide a customized navigational experience for users based on their interests
targeted electronic advertising / personalized e-coupons / customer support
19. 19 Clustering and User Profiles Collaborative Filtering and Clustering
CF techniques attempt to match a set of user ratings against previous user ratings and find “nearest neighbors”
Clustering can be used to pre-calculate typical user profiles
Transaction clustering:
Pageviews used as features: dimensionality problems arise for large sites
Each cluster contains many transactions; problem is how to “derive” useful aggregate profiles from large transaction clusters
Pageview Clustering
Find overlapping clusters of pageviews directly - clusters serve as aggregate profiles
Can capture overlapping interests of different types of users (even those with potentially dissimilar transactions)
Traditional clustering techniques fail due to very high dimensionality
Related work: “Adaptive Web Sites” by Perkowitz and Etzioni
20. Automatic Web Personalization: Offline Process
21. Automatic Web Personalization: Online Process
22. 22 Real-Time Recommendation Engine Keep track of users’ navigational history through the site
a fixed-size sliding window over the active session to capture the current user’s “short-term” history depth
Match current user’s activity against the discovered profiles
profiles can either be based on aggregate usage profiles or be obtained directly from association rules or sequential patterns
Dynamically generated recommendations are added to the returned page
each pageview can be assigned a recommendation score based on
matching score to user profiles (e.g., aggregate usage profiles)
“information value” of the pageview based on domain knowledge (e.g., link distance of the candidate recommendation to the active session)
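The fixed-size sliding window over the active session can be sketched with a bounded deque. The class name, the window size, and the choice to move a revisited page to the most recent position are assumptions made for this example.

```python
from collections import deque

class SessionWindow:
    """Keeps only the last `size` distinct pageviews of the active session."""

    def __init__(self, size):
        self.pages = deque(maxlen=size)

    def visit(self, url):
        # Assumption: a revisited page moves to the most recent position
        # instead of occupying two slots in the window.
        if url in self.pages:
            self.pages.remove(url)
        self.pages.append(url)   # the oldest entry drops out when the window is full

    def current(self):
        return list(self.pages)

window = SessionWindow(size=3)
for url in ["/a", "/b", "/c", "/d"]:
    window.visit(url)
after_four = window.current()
window.visit("/b")
after_revisit = window.current()
```

The window contents are what gets matched against the discovered profiles, so older pageviews stop influencing recommendations once they fall out of the window.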
23. 23 Recommendations Based on Association Rules
24. 24 Discovering Aggregate Usage Profiles Characteristics of Aggregate Profiles
the goal is to effectively capture common usage patterns from potentially anonymous click-stream data
profiles are represented as weighted collections of pageviews
weights represent the significance of pageviews within each profile
profiles are overlapping in order to capture common interests among different groups/types of users (e.g., customer segments)
multiple profiles may contribute to the recommendation set for a given user
Example Profiles from the ACR (Assoc. for Consumer Research) Site:
25. 25 Methodologies for the Discovery of Aggregate Profiles Discovery of Profiles Based on Transaction Clusters
cluster user transactions - features are significant pageviews identified in the preprocessing stage
derive usage profiles (set of pageview-weight pairs) based on characteristics of each transaction cluster
Cluster Pageviews
directly compute overlapping clusters of pageviews based on co-occurrence patterns across transactions
features are user transactions, so dimensionality poses a problem for traditional clustering algorithms
we use Association-Rule Hypergraph Partitioning with an overlap factor
26. 26 Profile Aggregation Based on Clustering Transactions (PACT) Input
set of relevant pageviews in preprocessed log
set of user transactions
each transaction is a pageview vector
Transaction Clusters
each cluster contains a set of transaction vectors
for each cluster compute centroid as cluster representative
Aggregate Usage Profiles
a set of pageview-weight pairs: for transaction cluster C, select each pageview p_i whose mean weight across the transactions in C (its value in the cluster centroid) is greater than a pre-specified threshold
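A minimal sketch of deriving one aggregate profile from one transaction cluster, assuming each transaction is a list of weights aligned with the pageview list. The function name, the sample cluster, and the threshold value are illustrative only.

```python
def pact_profile(cluster, pageviews, threshold):
    """Compute the cluster centroid and keep the pageviews whose mean
    weight exceeds `threshold`, as pageview-weight pairs."""
    n = len(cluster)
    centroid = [sum(t[i] for t in cluster) / n for i in range(len(pageviews))]
    return {page: w for page, w in zip(pageviews, centroid) if w > threshold}

# Hypothetical cluster of four binary transaction vectors over three pageviews.
pageviews = ["A", "B", "C"]
cluster = [[1, 0, 1],
           [1, 1, 0],
           [1, 0, 0],
           [1, 1, 0]]
profile = pact_profile(cluster, pageviews, threshold=0.4)
```

With binary transaction vectors, a pageview's weight in the profile is simply the fraction of transactions in the cluster that contain it.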
27. 27 Recommendations Based on Aggregate Profiles Matching score computed using cosine similarity
User’s active session (pageviews in the current window) is compared to each aggregate profile (both are viewed as pageview vectors)
Weight of items in the profile vector is the significance weight of the item for that profile
Weight of items in the session vector can be all 1’s, or based on some method for determining their significance in the current session
Generating recommendations based on matching profiles
from each matching profile recommend the items not already in the user session window, and not directly linked from the pages in the current session window
the recommendation score for an item is based on a combination of profile matching score (similarity to session window) and the weight of the item in that profile
additionally, we can weight items farther away from the current location of user higher (i.e., consider them better recommendations)
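The matching and scoring steps can be sketched as below. The slides only say the recommendation score combines the matching score with the item weight; taking the square root of their product is an assumption made here, as are the function names and sample data. The link-based filter and the distance weighting are left out for brevity.

```python
from math import sqrt

def cosine(session, profile):
    """Cosine similarity between two sparse pageview-weight vectors."""
    dot = sum(w * profile.get(page, 0.0) for page, w in session.items())
    norm = (sqrt(sum(w * w for w in session.values()))
            * sqrt(sum(w * w for w in profile.values())))
    return dot / norm if norm else 0.0

def recommend(session, profiles, min_match=0.0):
    """Score pages from matching profiles that are not already in the session."""
    scores = {}
    for profile in profiles:
        match = cosine(session, profile)
        if match <= min_match:
            continue
        for page, weight in profile.items():
            if page in session:
                continue
            # Assumed combination: geometric mean of match score and item weight.
            score = sqrt(match * weight)
            scores[page] = max(scores.get(page, 0.0), score)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical session window (all weights 1) and two aggregate profiles.
session = {"A": 1.0, "B": 1.0}
profiles = [{"A": 1.0, "B": 0.5, "C": 0.8},
            {"D": 1.0, "E": 0.9}]
recs = recommend(session, profiles)
```

The second profile shares no pageviews with the session, so its matching score is zero and it contributes nothing to the recommendation set.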
28. 28 PACT - An Example
29. 29 Recommendations Based on PACT
30. 30 Integrating Content and Usage For Personalization
31. 31 Integration of Content Profiles Content Profile Representation
content profiles are also represented as overlapping collections of pageview-weight pairs
cluster features over the n-dimensional space of pageviews
for each feature cluster derive a content profile by collecting pageviews in which these features appear as significant
Integration with Recommendation Engine
Usage and content profiles have similar representation, so they can be used by the recommendation engine in the same way
Item weights within profiles must be normalized, so that content and usage profiles can be compared on the same scale
One approach: match active user session with all profiles (both content and usage); then use the maximal recommendation score for candidate recommendations
Another approach: use content profiles for generating recommendations only if no matching usage profile (with sufficient confidence) is found
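Both integration approaches can be sketched on top of per-page recommendation scores that the engine has already computed for usage and content profiles. Normalizing by the largest weight in a profile and using a score cutoff as the "sufficient confidence" test are assumptions for this example, as are all names and values.

```python
def normalize(profile):
    """Rescale item weights so the largest weight in the profile is 1.0."""
    top = max(profile.values())
    return {page: w / top for page, w in profile.items()}

def combine_max(usage_scores, content_scores):
    """First approach: keep the maximal score for each candidate page."""
    pages = set(usage_scores) | set(content_scores)
    return {p: max(usage_scores.get(p, 0.0), content_scores.get(p, 0.0)) for p in pages}

def combine_fallback(usage_scores, content_scores, min_confidence):
    """Second approach: use content-based scores only when no usage-based
    recommendation reaches the (assumed) confidence cutoff."""
    confident = {p: s for p, s in usage_scores.items() if s >= min_confidence}
    return confident if confident else dict(content_scores)

# Hypothetical recommendation scores from the two kinds of profiles.
usage = {"C": 0.6, "D": 0.2}
content = {"D": 0.5, "E": 0.4}
merged = combine_max(usage, content)
fallback = combine_fallback({"C": 0.1}, content, min_confidence=0.3)
```

The fallback variant keeps usage data in charge whenever it has something confident to say, which matters for new or rarely visited pages where usage profiles are sparse.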
32. 32 How Content Profiles Are Generated
33. 33 How Content Profiles Were Generated
34. 34 How Content Profiles Were Generated
35. 35 Comparison of Recommendations (Example Based on ACR Site)
36. 36 Comparison of Recommendations (Example Based on ACR Site)
37. 37 Prediction Accuracy - Precision (Example Based on ACR Site) 18342 transactions, 62 pageview URLs (after filtering)
Data set divided into training and evaluation sets
Portion of each transaction in evaluation set used to generate a recommendation set (based on a given recommendation threshold)
Precision = percentage of recommendations actually visited in the transaction
38. 38 Prediction Accuracy - Coverage (Example Based on ACR Site) Coverage = percentage of visited pageviews recommended by the personalization engine
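The two measures can be computed per evaluation transaction as in this sketch. Here `visited` is assumed to be the part of the transaction that was held back, not the part used to generate the recommendations; the names and sample pages are illustrative.

```python
def precision_and_coverage(recommended, visited):
    """Precision: share of recommendations that were actually visited.
    Coverage: share of visited pageviews that were recommended."""
    recommended, visited = set(recommended), set(visited)
    hits = recommended & visited
    precision = len(hits) / len(recommended) if recommended else 0.0
    coverage = len(hits) / len(visited) if visited else 0.0
    return precision, coverage

# Hypothetical recommendation set and held-back pageviews of one transaction.
precision, coverage = precision_and_coverage(
    recommended=["A", "B", "C", "D"],
    visited=["B", "D", "E"],
)
```

Raising the recommendation threshold typically shrinks the recommendation set, which tends to trade coverage for precision.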
39. 39 Example - ACR Demo Site (http://aztec.cs.depaul.edu/scripts/acr2)
40. 40 Automatic Web Personalization Example - ACR Demo Site
41. 41 Automatic Web Personalization Example - ACR Demo Site
42. 42 Automatic Web Personalization Example - ACR Demo Site