
Personalization Services in CADAL

Zhang Yin, Zhuang Yuting, Wu Jiangqin. College of Computer Science, Zhejiang University. November 19, 2006.


Presentation Transcript


  1. Personalization Services in CADAL. Zhang Yin, Zhuang Yuting, Wu Jiangqin. College of Computer Science, Zhejiang University. November 19, 2006

  2. Outline • Introduction • The Architecture of Personalization Services • Personalized Search • Recommendation based on the Information Filtering techniques • Future plan

  3. Outline • Introduction • The Architecture of Personalization Services • Personalized Search • Recommendation based on the Information Filtering techniques • Future plan

  4. Background • The collection holds 1,023,425 digital books that conform to the OEB standard. • Finding useful information and knowledge in a CADAL collection of this size is time-consuming. • Personalization services help users quickly locate the items that interest them in the CADAL collection.

  5. Outline • Introduction • The Architecture of Personalization Services • Personalized Search • Recommendation based on the Information Filtering techniques • Future plan

  6. [Architecture diagram] Components shown: Users; Personal Portal; Personal Agent Services; Personalized Search Services; Query Service; Modification Service; Link Generation Services; Recommendation Services; User Metadata; Repository Services; Repositories A, B and C, each with its own Metadata.

  7. Outline • Introduction • The Architecture of Personalization Services • Personalized Search • Recommendation based on the Information Filtering techniques • Future plan

  8. Query Expansion • Many users submit only one or two keywords as a query. • Search results can be improved by expanding the query with additional keywords. • Query expansion relies on natural language processing (NLP) techniques and relevance feedback methods.

  9. Keyword Expansion – The Trigger Pairs Model • If a word S is significantly correlated with another word T, then (S, T) is considered a trigger pair, with S the trigger and T the triggered word. • When S appears in a document, we expect T to appear after it with some confidence.
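One way to score candidate trigger pairs is sketched below. The slides do not say how the correlation is estimated, so this sketch assumes pointwise mutual information over document-level counts, and it counts a pair (S, T) only when T follows S within a small window. The function name `trigger_pair_scores` and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def trigger_pair_scores(documents, window=5):
    """Score ordered pairs (s, t) by pointwise mutual information.

    documents: list of token lists.
    Assumption (not from the slides): a pair counts once per document
    when t occurs within `window` tokens after s.
    """
    n_docs = len(documents)
    word_df = Counter()  # documents containing each word
    pair_df = Counter()  # documents in which t follows s within the window
    for tokens in documents:
        word_df.update(set(tokens))
        seen = set()
        for i, s in enumerate(tokens):
            for t in tokens[i + 1 : i + 1 + window]:
                if s != t:
                    seen.add((s, t))
        pair_df.update(seen)
    # PMI = log( P(s, t) / (P(s) * P(t)) ), with probabilities as document fractions.
    return {
        (s, t): math.log(joint * n_docs / (word_df[s] * word_df[t]))
        for (s, t), joint in pair_df.items()
    }
```

Pairs with a high score are candidates for (trigger, triggered word); a threshold on the score would decide which pairs count as "significantly correlated".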

  10. Trigger pairs selection algorithm(1) • We define that the keywords are $k_1, k_2, \dots, k_m$, and the expected number of refinement words is $N$. Initialize $n = m$; $S$ is the empty set. 1. $S_i$ is the trigger set to $k_i$, for $i = 1, \dots, m$. The words in each $S_i$ are sorted in decreasing order of the mutual information.

  11. Trigger pairs selection algorithm(2) 2. $S = S \cup (S_{p_1} \cap S_{p_2} \cap \dots \cap S_{p_n})$, where $(S_{p_1}, \dots, S_{p_n})$ is one of the combinations of $n$ sets out of $m$. The words in $S$ are sorted in decreasing order of mutual information. 3. If $|S| \ge N$, let the top $N$ words in $S$ be the refinement words and stop. 4. Otherwise, let $n = n - 1$ and continue with step 2.
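The selection steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the loop starts by intersecting the trigger sets of all keywords and relaxes to smaller combinations until enough words are found, and it ranks a word by the sum of its mutual-information scores over the combined keywords. The names `select_refinement_words` and `trigger_sets` are hypothetical.

```python
from itertools import combinations


def select_refinement_words(trigger_sets, n_refinements):
    """Pick refinement words shared by as many keyword trigger sets as possible.

    trigger_sets: {keyword: {candidate_word: mutual_information}}
    Assumption: a word's rank score is its summed mutual information
    over the keywords in the combination that produced it.
    """
    keywords = list(trigger_sets)
    selected = {}  # word -> best score seen so far
    for n in range(len(keywords), 0, -1):
        for combo in combinations(keywords, n):
            common = set.intersection(*(set(trigger_sets[k]) for k in combo))
            for word in common:
                if word in trigger_sets:
                    continue  # do not suggest a word that is already a keyword
                score = sum(trigger_sets[k][word] for k in combo)
                selected[word] = max(score, selected.get(word, float("-inf")))
        if len(selected) >= n_refinements:
            break  # enough words found; no need to relax further
    ranked = sorted(selected, key=lambda w: (-selected[w], w))
    return ranked[:n_refinements]
```

For a two-keyword query, words triggered by both keywords are preferred; words triggered by only one keyword are used only when the shared ones are too few.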

  12. Outline • Introduction • The Architecture of Personalization Services • Personalized Search • Recommendation based on the Information Filtering techniques • Future plan

  13. Implemented Information filtering techniques • A Content-based filtering method • A Collaborative filtering method

  14. LR_Rocchio algorithm • The user profile is represented as a vector of indicative words extracted from the contents of all digitized books. • The LR_Rocchio algorithm sets a Bayesian prior on the logistic regression model parameter using the user profile calculated by the Rocchio algorithm.

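  15. Incremental Rocchio algorithm • A widely used user profile updating algorithm is the incremental Rocchio algorithm, which can be generalized as: $$Q' = \alpha Q + \beta \frac{1}{|R|} \sum_{d \in R} d - \gamma \frac{1}{|NR|} \sum_{d \in NR} d$$ • where $Q$ is the initial profile vector, $Q'$ is the new profile vector, $R$ is the set of relevant documents, $NR$ is the set of irrelevant documents, and $\alpha$, $\beta$, $\gamma$ are weighting coefficients.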
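A minimal sketch of a Rocchio-style profile update, assuming the common form in which the new profile is a weighted sum of the old profile and the centroid of the relevant documents, minus a weighted centroid of the irrelevant ones. Vectors are sparse dictionaries from term to weight. The function name and the default coefficients are illustrative and do not come from the slides.

```python
def rocchio_update(profile, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return the updated profile vector as a {term: weight} dict.

    profile: {term: weight}, the current user profile.
    relevant / irrelevant: lists of {term: weight} document vectors.
    alpha, beta, gamma: illustrative coefficients (assumed values).
    """
    terms = set(profile)
    for doc in relevant + irrelevant:
        terms.update(doc)

    def centroid(docs, term):
        # Average weight of `term` over the documents; 0 for an empty set.
        if not docs:
            return 0.0
        return sum(d.get(term, 0.0) for d in docs) / len(docs)

    return {
        term: alpha * profile.get(term, 0.0)
        + beta * centroid(relevant, term)
        - gamma * centroid(irrelevant, term)
        for term in sorted(terms)
    }
```

Terms that occur mostly in irrelevant documents end up with negative weights, which pushes similar books down in later filtering.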

  16. Logistic regression • Logistic regression is a widely used statistical algorithm that provides an estimate of the posterior probability of an unobserved variable $y$ given an observed variable $x$: $$P(y = 1 \mid x, w) = \frac{1}{1 + \exp(-w^{T} x)}$$ • $w$ is the $k$-dimensional logistic regression model parameter learned from the training data.
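For illustration, the posterior of a logistic regression model can be computed as below, assuming the standard sigmoid form applied to the inner product of the parameter vector and the feature vector. The function name `posterior` is hypothetical.

```python
import math


def posterior(weights, features):
    """P(y = 1 | x, w) under a standard logistic regression model."""
    z = sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

In a filtering setting, `features` would be the term vector of a book and the returned value the estimated probability that the user finds it relevant.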

  17. LR prior(1) • Bayesian learning algorithms often begin with a prior belief about the distribution of the logistic regression model parameter. • Gaussian distribution • A classifier learned with a non-informative prior usually overfits the training data.
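The effect of a Gaussian prior can be sketched as a maximum a posteriori (MAP) fit: the prior adds a penalty that pulls the parameter toward the prior mean. The sketch below assumes an isotropic Gaussian prior and plain gradient ascent; `fit_map`, its parameters, and the step size are hypothetical choices, and the prior mean could for instance be a Rocchio profile.

```python
import math


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_map(examples, prior_mean, prior_variance=1.0, learning_rate=0.1, epochs=200):
    """MAP estimate of logistic regression weights under a Gaussian prior.

    examples: list of (features, label) with label in {0, 1}.
    prior_mean: list of floats, the mean of the assumed isotropic Gaussian prior.
    Note: learning_rate / prior_variance should stay well below 2,
    otherwise this simple gradient ascent can diverge.
    """
    w = list(prior_mean)
    for _ in range(epochs):
        # Gradient of the log prior: -(w - m) / variance.
        grad = [-(wj - mj) / prior_variance for wj, mj in zip(w, prior_mean)]
        # Gradient of the log likelihood: sum of (y - p) * x.
        for x, y in examples:
            err = y - _sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj + learning_rate * gj for wj, gj in zip(w, grad)]
    return w
```

A small `prior_variance` keeps the weights close to the prior mean even when the training data pulls elsewhere, which is the regularizing effect that counters overfitting.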

  18. LR prior(2) • A prior that encodes Rocchio’s suggestion about decision boundary can be learned via constrained maximum likelihood estimation: Under the constraint:

  19. Approaches to collaborative filtering • Memory-based: Pearson correlation coefficients • Model-based: clustering, aspect model • Hybrid

  20. A hybrid approach using cluster-based smoothing • Create the user clusters $C$ using the k-means method. • Given the active user $u_a$ and the items he has rated, an item $t$, and an integer $K$, the number of nearest neighbors: choose candidate users into a set $S$ from the clusters that are most similar to user $u_a$. • Calculate the similarity to $u_a$ for each user $u$ in $S$, where the rating of user $u$ is a combination of the original rating $R_u(t)$ and the smoothed, cluster-based rating $\hat{R}_u(t)$. • Select the top-$K$ most similar users as neighbors. • Predict the rating of item $t$ for $u_a$ from the behavior of the $K$ nearest neighbors.

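  21. Symbol definition • $T = \{t_1, t_2, \dots, t_m\}$ is the set of items. • $U = \{u_1, u_2, \dots, u_n\}$ is the set of users. • Each triple $(u, t, r)$ indicates that item $t$ is rated $r$ by user $u$. • $R_u(t)$ denotes the rating of item $t$ by user $u$. • $\bar{R}_u$ denotes the average rating of user $u$. • The clustering results of the users are represented as $\{C_u^1, C_u^2, \dots, C_u^k\}$. • $u_a$ is the active user, for whom the recommender service makes predictions.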

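  22. Similarity measure function • The Pearson correlation coefficient is taken as the similarity measure function. • The similarity between user $u_a$ and user $u$ is defined as: $$\mathrm{sim}(u_a, u) = \frac{\sum_{t \in T(u_a) \cap T(u)} \left( R_{u_a}(t) - \bar{R}_{u_a} \right) \left( R_u(t) - \bar{R}_u \right)}{\sqrt{\sum_{t \in T(u_a) \cap T(u)} \left( R_{u_a}(t) - \bar{R}_{u_a} \right)^2} \, \sqrt{\sum_{t \in T(u_a) \cap T(u)} \left( R_u(t) - \bar{R}_u \right)^2}}$$ where $T(u)$ is the set of items rated by user $u$, $R_u(t)$ is the rating of item $t$ by user $u$, and $\bar{R}_u$ is the average rating of user $u$.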
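A short sketch of the Pearson similarity between two users, assuming the usual collaborative-filtering form: deviations are taken from each user's own mean rating and the sums run over the items both users have rated. The function name `pearson_similarity` is hypothetical, and each user is assumed to have at least one rating.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation between two users over their co-rated items.

    ratings_a, ratings_b: non-empty {item: rating} dicts.
    Returns 0.0 when there is no overlap or no variation to compare.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[t] - mean_a) * (ratings_b[t] - mean_b) for t in common)
    den_a = math.sqrt(sum((ratings_a[t] - mean_a) ** 2 for t in common))
    den_b = math.sqrt(sum((ratings_b[t] - mean_b) ** 2 for t in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)
```

Two users who rank the same books in the same order score close to 1, and users with opposite tastes score close to -1.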

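  23. Reducing Data Sparsity • In the early stage of system operation, the collected rating data is sparse. To fill the missing values in the data set, the clusters are used explicitly to smooth the sparse data: $$R_u(t) = \begin{cases} R_u(t) & \text{if user } u \text{ has rated item } t \\ \hat{R}_u(t) & \text{otherwise} \end{cases} \qquad \hat{R}_u(t) = \bar{R}_u + \Delta R_{C_u}(t), \qquad \Delta R_{C_u}(t) = \frac{1}{|C_u(t)|} \sum_{u' \in C_u(t)} \left( R_{u'}(t) - \bar{R}_{u'} \right)$$ • where $C_u$ is the cluster of user $u$, $\bar{R}_u$ is the average rating of user $u$, $C_u(t)$ is the set of users in cluster $C_u$ who have rated item $t$, and $|C_u(t)|$ is the number of those users.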
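The smoothing step can be sketched as follows, assuming a missing rating is filled with the user's own mean plus the average deviation from the mean among the cluster members who did rate the item. The function name `smooth_ratings` and the data layout are hypothetical; every user is assumed to have rated at least one item.

```python
def smooth_ratings(ratings, clusters):
    """Fill missing ratings using cluster-level deviations.

    ratings: {user: {item: rating}}, each user with at least one rating.
    clusters: list of lists of users (for example, from k-means).
    Returns {user: {item: rating}} covering every item rated in the user's cluster.
    """
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    smoothed = {}
    for members in clusters:
        items = set().union(*(ratings[u] for u in members))
        # Average deviation from the rater's own mean, per item, within the cluster.
        deviation = {}
        for t in items:
            raters = [u for u in members if t in ratings[u]]
            deviation[t] = sum(ratings[u][t] - means[u] for u in raters) / len(raters)
        for u in members:
            smoothed[u] = {
                t: ratings[u].get(t, means[u] + deviation[t]) for t in items
            }
    return smoothed
```

Original ratings are kept unchanged; only the gaps are filled, so the output can be passed to a similarity function together with a record of which values were smoothed.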

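  24. Increasing System Scalability • The user clusters are used in neighbor selection to increase system scalability. • The centroid of a cluster $C$ is represented as the average rating over the cluster. The similarity between the cluster $C$ and the active user $u_a$ is defined as: $$\mathrm{sim}(u_a, C) = \frac{\sum_{t \in T(u_a) \cap T(C)} \Delta R_C(t) \left( R_{u_a}(t) - \bar{R}_{u_a} \right)}{\sqrt{\sum_{t \in T(u_a) \cap T(C)} \left( \Delta R_C(t) \right)^2} \, \sqrt{\sum_{t \in T(u_a) \cap T(C)} \left( R_{u_a}(t) - \bar{R}_{u_a} \right)^2}}$$ where $T(C)$ is the set of items rated by at least one user in $C$ and $\Delta R_C(t)$ is the average deviation from the mean rating among the users in $C$ who have rated item $t$. • After this similarity is calculated, the users in the most similar clusters are taken as candidates, whose similarity with the active user is then recalculated on the smoothed data.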

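  25. Weighting • Different weights are placed on the original data and the smoothed data when calculating the similarity between the cluster users and the active user: $$w_{ut} = \begin{cases} 1 - \lambda & \text{if user } u \text{ has rated item } t \\ \lambda & \text{otherwise} \end{cases}$$ • where $\lambda$ is the tuning parameter between the original rating and the group rating; its value varies from 0 to 1.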

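  26. Reformed similarity measure function • The system selects the top $K$ most similar users based on the following similarity function: $$\mathrm{sim}(u_a, u) = \frac{\sum_{t \in T(u_a)} w_{ut} \left( R_u(t) - \bar{R}_u \right) \left( R_{u_a}(t) - \bar{R}_{u_a} \right)}{\sqrt{\sum_{t \in T(u_a)} w_{ut}^2 \left( R_u(t) - \bar{R}_u \right)^2} \, \sqrt{\sum_{t \in T(u_a)} \left( R_{u_a}(t) - \bar{R}_{u_a} \right)^2}}$$ where $T(u_a)$ is the set of items rated by the active user $u_a$, $R_u(t)$ is the original or smoothed rating of item $t$ by user $u$, and $w_{ut}$ is the weight placed on that rating.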
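A sketch of a weighted similarity of this kind, assuming a Pearson-style correlation in which each of the candidate's ratings is weighted by 1 - lambda if it is an original rating and by lambda if it was filled in by smoothing. The function name, the argument layout, and the default `lam` are hypothetical; both users are assumed to have at least one original rating.

```python
import math


def weighted_similarity(active, cand_original, cand_smoothed, lam=0.5):
    """Similarity between the active user and a candidate neighbor.

    active: {item: rating}, original ratings of the active user.
    cand_original: {item: rating}, ratings the candidate actually gave.
    cand_smoothed: {item: rating}, cluster-smoothed values for unrated items.
    lam: weight on smoothed ratings, in [0, 1] (illustrative default).
    """
    mean_a = sum(active.values()) / len(active)
    mean_c = sum(cand_original.values()) / len(cand_original)
    num = den_c = den_a = 0.0
    for t, r_a in active.items():
        if t in cand_original:
            w, r_c = 1.0 - lam, cand_original[t]
        elif t in cand_smoothed:
            w, r_c = lam, cand_smoothed[t]
        else:
            continue  # the candidate has no value at all for this item
        dev_a = r_a - mean_a
        dev_c = r_c - mean_c
        num += w * dev_c * dev_a
        den_c += (w * dev_c) ** 2
        den_a += dev_a ** 2
    if den_a == 0 or den_c == 0:
        return 0.0
    return num / math.sqrt(den_c * den_a)
```

With `lam=0` the smoothed values contribute nothing to the numerator, so only the candidate's original ratings drive the similarity; with `lam=0.5` original and smoothed ratings are treated alike.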

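  27. Prediction for the active user • After the neighbor selection, a weighted aggregate of the deviations from the neighbors' means is used to generate the prediction for the active user $u_a$: $$R_{u_a}(t) = \bar{R}_{u_a} + \frac{\sum_{i=1}^{K} w_{u_i t} \, \mathrm{sim}(u_a, u_i) \left( R_{u_i}(t) - \bar{R}_{u_i} \right)}{\sum_{i=1}^{K} w_{u_i t} \, \mathrm{sim}(u_a, u_i)}$$ where $u_1, \dots, u_K$ are the selected neighbors, $\bar{R}_u$ is the average rating of user $u$, and $w_{u_i t}$ is the weight placed on the rating of item $t$ by neighbor $u_i$.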
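The prediction step can be sketched as below, assuming each neighbor contributes its deviation from its own mean rating, weighted by its similarity to the active user and by a per-rating weight (original versus smoothed). The function name and the tuple layout are hypothetical.

```python
def predict_rating(active_mean, neighbors):
    """Predict the active user's rating for one item.

    active_mean: the active user's average rating.
    neighbors: list of (similarity, weight, rating, neighbor_mean) tuples,
               one per selected neighbor, for the item being predicted.
    Falls back to the active user's mean when the weights sum to zero.
    """
    num = sum(s * w * (r - m) for s, w, r, m in neighbors)
    den = sum(s * w for s, w, _, _ in neighbors)
    if den == 0:
        return active_mean
    return active_mean + num / den
```

For example, `predict_rating(3.0, [(1.0, 1.0, 5.0, 4.0), (0.5, 1.0, 2.0, 3.0)])` shifts the active user's mean of 3.0 upward, because the more similar neighbor rated the item above its own average.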

  28. Modify the user's information; Set the rule; The complete list of the user's collections. My bookshelf: the books the user has collected. Collected books can be found on the home page shown after the user logs in (shown in the slide's screenshot).

  29. Outline • Introduction • The Architecture of Personalization Services • Personalized Search • Recommendation based on the Information Filtering techniques • Future plan

  30. Future Plan • Extend the architecture of the personalization services to incorporate semantic web techniques. • Invest more effort in web usage mining techniques to discover user patterns from web data.

  31. Thanks! Email: wujq@cs.zju.edu.cn
