
Clustering Applications in Web Mining and Web Personalization

Clustering Applications in Web Mining and Web Personalization. Bamshad Mobasher DePaul University. Clustering Application: Web Usage Mining. Discovering Aggregate Usage Profiles


Presentation Transcript


  1. Clustering Applications in Web Mining and Web Personalization Bamshad Mobasher DePaul University

  2. Clustering Application: Web Usage Mining • Discovering Aggregate Usage Profiles • Goal: to effectively capture “user segments” based on their common usage patterns from potentially anonymous click-stream data • Method: Cluster user transactions to obtain user segments automatically, then represent each cluster by its centroid • Aggregate profiles are obtained from each centroid after sorting by weight and filtering out low-weight items in each centroid • Note that profiles are represented as weighted collections of items (pages, products, etc.) • weights represent the significance of the item within each cluster • profiles are overlapping, so they capture common interests among different groups/types of users (e.g., customer segments)

  3. Profile Aggregation Based on Clustering Transactions (PACT) • Discovery of Profiles Based on Transaction Clusters • cluster user transactions - features are significant items present in the transaction • derive usage profiles (set of item-weight pairs) based on characteristics of each transaction cluster as captured in the cluster centroid • Deriving Usage Profiles from Transaction Clusters • each cluster contains a set of user transactions (vectors) • for each cluster compute centroid as cluster representative • the profile is a set of item-weight pairs: for transaction cluster C, select each pageview p_i whose weight in the cluster centroid is greater than a pre-specified threshold
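The derivation step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes binary transactions (a page is either present or absent), assumes the cluster labels have already been produced by some clustering algorithm, and uses hypothetical transactions and a hypothetical function name `derive_profiles`.

```python
from collections import defaultdict


def derive_profiles(transactions, labels, threshold=0.5):
    """Return {cluster_label: [(item, weight), ...]}, sorted by descending weight.

    Assumes binary transactions, so the centroid weight of an item is the
    fraction of the cluster's transactions that contain it.
    """
    clusters = defaultdict(list)
    for items, label in zip(transactions, labels):
        clusters[label].append(set(items))

    profiles = {}
    for label, members in clusters.items():
        counts = defaultdict(int)
        for items in members:
            for item in items:
                counts[item] += 1
        centroid = {item: n / len(members) for item, n in counts.items()}
        # Keep only items whose centroid weight is greater than the threshold.
        kept = [(item, w) for item, w in centroid.items() if w > threshold]
        profiles[label] = sorted(kept, key=lambda p: (-p[1], p[0]))
    return profiles


# Hypothetical transactions and cluster assignments (not from the slides).
transactions = [
    ["A.html", "B.html", "F.html"],
    ["A.html", "B.html", "F.html"],
    ["A.html", "B.html", "C.html", "F.html"],
    ["B.html", "F.html"],
    ["C.html", "D.html"],
    ["C.html", "D.html"],
]
labels = [1, 1, 1, 1, 0, 0]

profiles = derive_profiles(transactions, labels, threshold=0.5)
for label in sorted(profiles):
    print(label, profiles[label])
```

In the hypothetical cluster 1, C.html appears in only one of four transactions (weight 0.25), so it falls below the threshold and is dropped from the profile.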

  4. PACT - An Example
     Original Session/user data
     Given an active session A → B, the best matching profile is Profile 1. This may result in a recommendation for page F.html, since it appears with high weight in that profile.
     Result of Clustering

     PROFILE 0 (Cluster Size = 3)
     --------------------------------------
     1.00 C.html
     1.00 D.html

     PROFILE 1 (Cluster Size = 4)
     --------------------------------------
     1.00 B.html
     1.00 F.html
     0.75 A.html
     0.25 C.html

     PROFILE 2 (Cluster Size = 3)
     --------------------------------------
     1.00 A.html
     1.00 D.html
     1.00 E.html
     0.33 C.html
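The matching step in this example can be sketched as follows. The slide does not say which similarity measure is used to find the best matching profile; cosine similarity between the binary session vector and each profile vector is an assumption here, and the function names `cosine` and `recommend` are hypothetical. The profile weights are copied from the clustering result on this slide.

```python
import math

# Profile weights as listed in the clustering result.
PROFILES = {
    "Profile 0": {"C.html": 1.00, "D.html": 1.00},
    "Profile 1": {"B.html": 1.00, "F.html": 1.00, "A.html": 0.75, "C.html": 0.25},
    "Profile 2": {"A.html": 1.00, "D.html": 1.00, "E.html": 1.00, "C.html": 0.33},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(session_pages, profiles, top_n=1):
    """Pick the most similar profile, then suggest its heaviest unseen pages."""
    session = {page: 1.0 for page in session_pages}
    best = max(profiles, key=lambda name: cosine(session, profiles[name]))
    candidates = [(p, w) for p, w in profiles[best].items() if p not in session]
    candidates.sort(key=lambda pw: (-pw[1], pw[0]))
    return best, [p for p, _ in candidates[:top_n]]


best, recs = recommend(["A.html", "B.html"], PROFILES)
print(best, recs)
```

Under this assumed measure, the session {A, B} has no overlap with Profile 0, overlaps only on A.html with Profile 2, and overlaps on both pages with Profile 1, which agrees with the slide's choice of Profile 1 and its recommendation of F.html.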

  5. Web Usage Mining: clustering example • Transaction Clusters: • Clustering similar user transactions and using centroid of each cluster as an aggregate usage profile (representative for a user segment) Sample cluster centroid from dept. Web site (cluster size = 330)

  6. Clustering Application: Discovery of Content Profiles • Content Profiles • Goal: automatically group together documents which partially deal with similar concepts • Method: • identify concepts by clustering features (keywords) based on their common occurrences among documents (can also be done using association discovery or correlation analysis) • cluster centroids represent docs in which features in the cluster appear frequently • Content profiles are derived from centroids after filtering out low-weight docs in each centroid • Note that each content profile is represented as a collection of item-weight pairs (similar to usage profiles) • however, the weight of an item in a profile represents the degree to which features in the corresponding cluster appear in that item.

  7. Content Profiles – An Example

     PROFILE 0 (Cluster Size = 3)
     --------------------------------------
     1.00 C.html (web, data, mining)
     1.00 D.html (web, data, mining)
     0.67 B.html (data, mining)

     PROFILE 1 (Cluster Size = 4)
     --------------------------------------
     1.00 B.html (business, intelligence, marketing, ecommerce)
     1.00 F.html (business, intelligence, marketing, ecommerce)
     0.75 A.html (business, intelligence, marketing)
     0.50 C.html (marketing, ecommerce)
     0.50 E.html (intelligence, marketing)

     PROFILE 2 (Cluster Size = 3)
     --------------------------------------
     1.00 A.html (search, information, retrieval)
     1.00 E.html (search, information, retrieval)
     0.67 C.html (information, retrieval)
     0.67 D.html (information, retrieval)

     Filtering threshold = 0.5
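The weights in this example are consistent with a simple reading: with binary feature occurrences, a document's weight in a profile is the fraction of the cluster's features that appear in that document. The sketch below reproduces the listing under that reading. The per-document feature sets are reconstructed from the listing and are an assumption (a document may contain other features that were filtered out), the feature clusters are taken as given rather than computed, and the function name `content_profile` is hypothetical. Because documents with weight 0.50 are listed, the filter is treated as "at least the threshold".

```python
# Feature sets per document, reconstructed from the profile listing (assumed).
DOC_FEATURES = {
    "A.html": {"business", "intelligence", "marketing",
               "search", "information", "retrieval"},
    "B.html": {"data", "mining",
               "business", "intelligence", "marketing", "ecommerce"},
    "C.html": {"web", "data", "mining", "marketing", "ecommerce",
               "information", "retrieval"},
    "D.html": {"web", "data", "mining", "information", "retrieval"},
    "E.html": {"intelligence", "marketing", "search", "information", "retrieval"},
    "F.html": {"business", "intelligence", "marketing", "ecommerce"},
}

# Feature clusters taken as given; the clustering step itself is not shown.
FEATURE_CLUSTERS = [
    {"web", "data", "mining"},
    {"business", "intelligence", "marketing", "ecommerce"},
    {"search", "information", "retrieval"},
]


def content_profile(cluster, doc_features, threshold=0.5):
    """Weight each document by the share of the cluster's features it contains."""
    profile = []
    for doc, features in doc_features.items():
        weight = len(cluster & features) / len(cluster)
        if weight >= threshold:
            profile.append((doc, round(weight, 2)))
    return sorted(profile, key=lambda p: (-p[1], p[0]))


content_profiles = [content_profile(c, DOC_FEATURES) for c in FEATURE_CLUSTERS]
for i, profile in enumerate(content_profiles):
    print(f"PROFILE {i}", profile)
```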

  8. User Segments Based on Content • Essentially combines usage and content profiling techniques discussed earlier • Basic Idea: • for each user/session, extract important features of the selected documents/items • based on the global dictionary create a user-feature matrix • each row is a feature vector representing significant terms associated with documents/items selected by the user in a given session • weight can be determined as before (e.g., using tf.idf measure) • next, cluster users/sessions using features as dimensions • Profile generation: • from the user clusters we can now generate overlapping collections of features based on cluster centroids • the weights associated with features in each profile represent the significance of that feature for the corresponding group of users.

  9. User transaction matrix UT Feature-Document Matrix FP

  10. Content Enhanced Transactions User-Feature Matrix UF Note that: UF = UT × FP^T (where FP^T is the transpose of FP) Example: users 4 and 6 are more interested in concepts related to Web information retrieval, while user 3 is more interested in data mining.
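The product UF = UT × FP^T can be illustrated with a small worked example. The matrices below are hypothetical (the slide's actual matrices are not preserved in the transcript): rows of UT are users over pages, rows of FP are features over pages, so each UF entry sums a feature's weight over the pages the user visited. The helper name `times_transpose` is also hypothetical.

```python
def times_transpose(ut, fp):
    """Compute UT x FP^T for matrices stored as lists of rows.

    ut: users x pages, fp: features x pages, result: users x features.
    """
    return [
        [sum(u * f for u, f in zip(user_row, feature_row)) for feature_row in fp]
        for user_row in ut
    ]


# Hypothetical data. Page order: A.html, B.html, C.html.
UT = [
    [1, 0, 1],  # user 0 visited A.html and C.html
    [0, 1, 1],  # user 1 visited B.html and C.html
]
# Feature order: "retrieval", "mining".
FP = [
    [1, 0, 0],  # "retrieval" occurs in A.html
    [0, 1, 1],  # "mining" occurs in B.html and C.html
]

UF = times_transpose(UT, FP)
print(UF)  # [[1, 1], [0, 2]]
```

In this toy case user 1 ends up with weight 2 on "mining" and 0 on "retrieval", which is the kind of contrast the slide's example draws between users interested in information retrieval and those interested in data mining.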

  11. Clustering and Collaborative Filtering :: Example - clustering based on ratings Consider the following book ratings data (Scale: 1-5)

  12. Clustering and Collaborative Filtering :: Example - clustering based on ratings • Cluster centroids after k-means clustering with k=4 • In this case, each centroid represents the average rating (in that cluster of users) for each item • The first column shows the centroid of the whole dataset, i.e., the overall item average ratings across all users

  13. Clustering and Collaborative Filtering :: Example - clustering based on ratings This approach provides a model-based (and more scalable) version of user-based collaborative filtering, compared to k-nearest-neighbor. NU1 has highest similarity to cluster 3 centroid. The whole cluster could be used as the “neighborhood” for NU1.
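A minimal sketch of this cluster-based approach: compare the new user to each centroid on the items the user has rated, then read predictions for unrated items off the closest centroid. The centroids, item names, and ratings below are hypothetical (the slide's ratings table is not preserved), cosine similarity over co-rated items is an assumed measure, and the function names are illustrative only.

```python
import math

# Hypothetical centroids: average rating per item within each user cluster.
CENTROIDS = {
    "cluster 1": {"book_a": 4.5, "book_b": 1.5, "book_c": 4.0},
    "cluster 2": {"book_a": 1.5, "book_b": 4.5, "book_c": 2.0},
}


def similarity(ratings, centroid):
    """Cosine similarity restricted to the items the user has rated."""
    common = [item for item in ratings if item in centroid]
    dot = sum(ratings[i] * centroid[i] for i in common)
    norm_r = math.sqrt(sum(ratings[i] ** 2 for i in common))
    norm_c = math.sqrt(sum(centroid[i] ** 2 for i in common))
    return dot / (norm_r * norm_c) if norm_r and norm_c else 0.0


def predict(ratings, centroids):
    """Use the closest centroid's averages as predictions for unrated items."""
    best = max(centroids, key=lambda name: similarity(ratings, centroids[name]))
    predictions = {
        item: avg for item, avg in centroids[best].items() if item not in ratings
    }
    return best, predictions


# Hypothetical new user who has rated two of the three books.
new_user = {"book_a": 5, "book_b": 1}
best_cluster, predictions = predict(new_user, CENTROIDS)
print(best_cluster, predictions)
```

The scalability gain comes from comparing the new user against k centroids rather than against every other user, as a k-nearest-neighbor search would.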

  14. Clustering and Collaborative Filtering :: clustering based on ratings: movielens

  15. Clustering on the Social Web :: tag clustering example

  16. Hierarchical Clustering :: example – clustered search results Can drill down within clusters to view sub-topics or to view the relevant subset of results

  17. Clustering Applications in Web Mining and Web Personalization Bamshad Mobasher DePaul University
