Discovery of Aggregate Usage Profiles for Web Personalization
WebKDD 2000
Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire
Data Abstractions
• Drafts from W3C Web Characterization Activity (WCA)

TERM: DEFINITION
• user: A single individual that is accessing files from one or more Web servers through a browser.
• pageview: Every file that contributes to the display on a user's browser at one time. It is usually associated with a single user action.
• clickstream: A sequential series of pageview requests.
• user session: The clickstream of pageviews for a single user across the entire Web.
• server session: The set of pageviews in a user session for a particular Web site.
• episode: Any semantically meaningful subset of a user or server session.
Example
[Figure: diagram with nodes labeled A through N]
USER1: A B F O G A D
USER2: A B C J
USER3: L R O P Q R S T
Usage Mining
• After preprocessing, we will have
  • A set of n pageview records, P = { p1, p2, … , pn }
  • A set of m user transactions, T = { t1, t2, … , tm }
  • Each transaction can be viewed as an n-dimensional vector t = <w(p1,t), w(p2,t), … , w(pn,t)>
• Goal of Usage Mining
  • Aggregate usage profiles representing groups of different user behaviors.
  • Each item in a usage profile is a URL representing a relevant pageview object, and can have an associated weight representing its significance within the profile.
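As a rough illustration of the vector view of transactions, the sketch below turns the three example sessions into n-dimensional vectors. Binary weights (w(p,t) = 1 if the pageview occurs in the transaction, 0 otherwise) are an assumption made here; the slides do not say how w(p,t) is computed, and the helper name `to_vector` is hypothetical.

```python
# Sessions copied from the "Example" slide.
sessions = {
    "USER1": ["A", "B", "F", "O", "G", "A", "D"],
    "USER2": ["A", "B", "C", "J"],
    "USER3": ["L", "R", "O", "P", "Q", "R", "S", "T"],
}

# P: every distinct pageview seen in any transaction, in a fixed order.
pageviews = sorted({p for s in sessions.values() for p in s})


def to_vector(session, pageviews):
    """Return <w(p1,t), ..., w(pn,t)> using binary weights (an assumption)."""
    seen = set(session)
    return [1 if p in seen else 0 for p in pageviews]


# T: one vector per user transaction.
vectors = {user: to_vector(s, pageviews) for user, s in sessions.items()}

for user, vec in vectors.items():
    print(user, vec)
```

Other weighting schemes, for example time spent on a page, would only change the body of `to_vector`.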
Transaction Clustering
• Use the k-means algorithm to partition this pageview space into different clusters.
• PACT (Profile Aggregations on Clustering Transactions): given a transaction cluster c, construct a usage profile pr_c.

$$pr_c = \{ \langle p, weight(p, pr_c) \rangle \mid p \in P,\ weight(p, pr_c) \geq \mu \}$$

$$weight(p, pr_c) = \frac{1}{|c|} \sum_{t \in c} w(p, t)$$

where μ is a minimum weight threshold.
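The profile weight is the mean of a pageview's weights over the transactions in a cluster. A minimal sketch of that aggregation follows, assuming the cluster is already given as a non-empty list of transaction vectors (the k-means step is not shown). The function name `pact_profile`, the threshold parameter `mu`, and its value 0.5 are illustrative assumptions.

```python
def pact_profile(cluster, pageviews, mu=0.5):
    """Build a usage profile {pageview: weight} from one transaction cluster.

    weight(p, pr_c) is the mean of w(p, t) over the transactions t in the
    cluster; pageviews whose mean weight falls below `mu` are dropped.
    The cluster is assumed to be non-empty.
    """
    size = len(cluster)
    profile = {}
    for i, p in enumerate(pageviews):
        weight = sum(t[i] for t in cluster) / size
        if weight >= mu:
            profile[p] = weight
    return profile


# Hypothetical cluster of four binary transaction vectors over pageviews A-D.
pageviews = ["A", "B", "C", "D"]
cluster = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]

profile = pact_profile(cluster, pageviews, mu=0.5)
print(profile)  # {'A': 1.0, 'B': 0.75}
```

Pageviews C and D each appear in only one of the four transactions (mean weight 0.25), so they fall below the cutoff and are left out of the profile.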
Pageview Clustering (1/2)
• Use the Apriori algorithm to find frequent itemsets.
• Use Association Rule Hypergraph Partitioning (ARHP) to find aggregate profiles.
• Hypergraph H = (V, E)
  • V: pageview set
  • E: weighted frequent itemsets
[Figure: hypergraph over pageviews A through R; hyperedges labeled with average confidence (0.6, 0.4, 0.7, 0.6)]
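The frequent itemsets that become hyperedges can be found with a level-wise search in the style of Apriori. The sketch below is a small, unoptimized version for illustration only: the function name `frequent_itemsets`, the sample transactions, and the support threshold are assumptions, and the hyperedge weighting and the partitioning step are not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search. Returns {frozenset: support}."""
    n = len(transactions)
    tsets = [set(t) for t in transactions]
    items = sorted({i for t in tsets for i in t})
    current = [frozenset([i]) for i in items]  # candidate 1-itemsets
    result = {}
    k = 1
    while current:
        # Count support for each candidate and keep the frequent ones.
        level = {}
        for cand in current:
            support = sum(1 for t in tsets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return result


# Hypothetical transactions over three pageviews.
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C"],
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With these transactions every single pageview and every pair is frequent, while {A, B, C} appears in only two of five transactions and is rejected.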
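Pageview Clustering (2/2)

$$Fitness(C) = \frac{\sum_{e \subseteq C} Weight(e)}{\sum_{|e \cap C| > 0} Weight(e)}$$

$$Connectivity(v) = \frac{|\{ e \mid e \subseteq C,\ v \in e \}|}{|\{ e \mid e \subseteq C \}|}$$

[Figure: hypergraph over pageviews A through R with hyperedge weights 0.6, 0.4, 0.7, 0.6]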
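To make the two measures concrete, here is a small sketch that evaluates them on a toy hypergraph. It assumes fitness is the weight of hyperedges lying entirely inside cluster C divided by the weight of all hyperedges that touch C, and that connectivity of a vertex v is the fraction of hyperedges inside C that contain v. The hyperedges and their weights below are made up for illustration and are not read off the figure.

```python
def fitness(cluster, hyperedges):
    """Weight of edges fully inside the cluster over weight of edges touching it."""
    cluster = set(cluster)
    inside = sum(w for e, w in hyperedges if set(e) <= cluster)
    touching = sum(w for e, w in hyperedges if set(e) & cluster)
    return inside / touching if touching else 0.0


def connectivity(v, cluster, hyperedges):
    """Fraction of the edges fully inside the cluster that contain vertex v."""
    cluster = set(cluster)
    inside = [set(e) for e, _ in hyperedges if set(e) <= cluster]
    if not inside:
        return 0.0
    return sum(1 for e in inside if v in e) / len(inside)


# Hypothetical weighted hyperedges: (pageviews in the frequent itemset, weight).
hyperedges = [
    (("A", "B"), 0.6),
    (("A", "B", "C"), 0.4),
    (("C", "D"), 0.7),
    (("B", "E"), 0.6),
]
cluster = {"A", "B", "C"}

print(fitness(cluster, hyperedges))
print(connectivity("A", cluster, hyperedges))
print(connectivity("C", cluster, hyperedges))
```

Here two of the four hyperedges lie inside the cluster, A belongs to both of them, and C belongs to only one, so C is the more weakly connected member.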
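Recommendation
• Given a usage profile C, we can represent C as a vector C = { w_1^C, w_2^C, … , w_n^C }, where

$$w_i^C = \begin{cases} weight(p_i, C), & \text{if } p_i \in C \\ 0, & \text{otherwise} \end{cases}$$

• Given the current active session S, S = <s1, s2, … , sn>

$$match(S, C) = \frac{\sum_k w_k^C \, s_k}{\sqrt{\sum_k (s_k)^2 \times \sum_k (w_k^C)^2}}$$

$$Rec(S, p) = weight(p, C) \cdot match(S, C)$$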
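The matching and scoring steps can be sketched as follows, assuming match(S,C) is the cosine similarity between the session vector and the profile vector, and taking the recommendation score as the product of profile weight and match score as written above. Skipping pageviews the session has already visited is an extra assumption, and the profile and session values are hypothetical.

```python
import math


def match(session, profile, pageviews):
    """Cosine similarity between session vector S and profile vector C."""
    w = [profile.get(p, 0.0) for p in pageviews]  # w_i^C, 0 if p_i not in C
    dot = sum(wk * sk for wk, sk in zip(w, session))
    norm = math.sqrt(sum(sk * sk for sk in session) * sum(wk * wk for wk in w))
    return dot / norm if norm else 0.0


def recommend(session, profile, pageviews):
    """Score each profile pageview as weight(p, C) * match(S, C).

    Pageviews already present in the active session are skipped
    (an assumption, not stated in the slides).
    """
    m = match(session, profile, pageviews)
    visited = {p for p, s in zip(pageviews, session) if s}
    return {p: weight * m for p, weight in profile.items() if p not in visited}


# Hypothetical profile and active session over pageviews A-D.
pageviews = ["A", "B", "C", "D"]
profile = {"A": 1.0, "B": 0.5, "C": 0.5}
session = [1, 0, 0, 0]  # the user has visited A only

score = match(session, profile, pageviews)
recs = recommend(session, profile, pageviews)
print(score)
print(recs)
```

With several usage profiles, the same scoring would be run per profile and the best score per pageview kept.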