Clustering Personalized Web Search Results

Presentation Transcript

  1. Clustering Personalized Web Search Results Xuehua Shen and Hong Cheng

  2. Introduction • Search engine’s objectives • Rank most relevant search results at top • Effectiveness • PageRank / HITS • Group and present different categories of search results • Global view • Clustering

  3. Clustering Personalized Search Results • Study the clustering problem in the UCAIR framework • Personalized search ranks or reranks the search results based on user implicit feedback • Bring interesting problems • Efficient and effective clustering/presentation • Dynamically update the clustering results based on personalization

  4. Goal • Effective • Cluster user search results into meaningful groups • Present in a clear format • Provide users with main themes of search results • Efficient • Implement efficient clustering algorithms • Dynamic • Dynamically maintain the clustering results based on personalized ranking and reranking

  5. Progress • Implemented two clustering algorithms • K-Medoids • Hierarchical clustering • Presentation • Replace Google ads with clustering results • Present ranked results together with clustering results • Two presentation strategies • Most centrally located document in each cluster • Most frequent terms in each cluster

  6. Partial Results • K-Medoids • Select the most centrally located documents as cluster centers • Present the centroid documents as each cluster’s representative • Efficiency not so good • Other processing time: 490 + 100 + 1562 = 2152 ms • Cluster search results time: 2844 ms
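
The K-Medoids step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes documents are sparse term-weight dictionaries compared by cosine distance, seeds the medoids with the first k results, and uses the simpler alternating (assign, then re-pick the most central member) variant rather than a full swap-based search. The names `cosine_distance`, `assign`, and `k_medoids` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # assumed representation: term -> weight


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity of two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each document with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(docs: List[Vector], k: int, max_iter: int = 20) -> Tuple[List[int], List[int]]:
    """Alternating K-Medoids; returns (medoid document indices, cluster labels)."""
    n = len(docs)
    dist = [[cosine_distance(a, b) for b in docs] for a in docs]
    medoids = list(range(k))  # assumption: seed with the first k results
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            # Most centrally located member: smallest total distance to the rest.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)
```

Each returned medoid is an actual search result, so it can be shown directly as the cluster's representative document.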

  7. Partial Results (II) • Hierarchical clustering • Merge similar documents in a pair-wise manner • Use weighted average term vectors to represent cluster center • Present centroid term vectors as virtual documents (output Top-K terms) • Efficiency better than K-Medoids • Other processing time: 200 + 110 + 831 = 1141 ms • Cluster search results time: 661 ms
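
A rough sketch of the hierarchical approach described on this slide: repeatedly merge the two most similar clusters, represent each merged cluster by the size-weighted average of its term vectors, and print the Top-K terms of that centroid as a "virtual document". The similarity measure (cosine) and the names `agglomerate`, `merge_centroids`, and `top_terms` are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # assumed representation: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def merge_centroids(c1: Vector, n1: int, c2: Vector, n2: int) -> Vector:
    """Weighted average of two centroids, weighted by cluster size."""
    total = n1 + n2
    return {
        t: (n1 * c1.get(t, 0.0) + n2 * c2.get(t, 0.0)) / total
        for t in set(c1) | set(c2)
    }


def agglomerate(docs: List[Vector], k: int) -> List[Tuple[Vector, List[int]]]:
    """Merge the most similar pair until k clusters remain (k >= 1).

    Returns a list of (centroid, member document indices).
    """
    clusters = [(dict(d), [i]) for i, d in enumerate(docs)]
    while len(clusters) > k:
        best_pair, best_sim = (0, 1), -1.0
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(clusters[a][0], clusters[b][0])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        a, b = best_pair
        (ca, ma), (cb, mb) = clusters[a], clusters[b]
        merged = (merge_centroids(ca, len(ma), cb, len(mb)), ma + mb)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters


def top_terms(centroid: Vector, top_k: int) -> List[str]:
    """Top-K terms of a centroid by weight (ties broken alphabetically)."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

The pair search scans all cluster pairs on every merge, which matches the per-iteration quadratic cost the slides attribute to this method.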

  8. Efficiency Analysis • K-Medoids • O(k(n-k)^2) for each iteration, where n is the number of documents and k is the number of clusters • Need multiple iterations for convergence • Hierarchical clustering • O(n^2) for each iteration • Need n-k iterations
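
Multiplying each per-iteration cost on this slide by its iteration count gives the overall bounds below. The symbol t for the number of K-Medoids iterations until convergence is introduced here for illustration; the slides do not name or bound it.

```latex
\begin{align*}
T_{\text{K-Medoids}}    &= t \cdot O\!\left(k\,(n-k)^2\right) = O\!\left(t\,k\,(n-k)^2\right) \\
T_{\text{hierarchical}} &= (n-k) \cdot O\!\left(n^2\right)    = O\!\left(n^2\,(n-k)\right)
\end{align*}
```

When k is much smaller than n, the hierarchical bound is roughly cubic in n. These are asymptotic bounds only; the measured timings on slides 6 and 7 also depend on the actual n, k, and t.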

  9. Lessons Learned • Clustering takes longer time as more search results accumulate (when we click “Next”) • Top-K frequent terms in each cluster sometimes do not make sense • Combine additional information besides term frequency • Re-cluster each time when reranking search results • Incremental update of clustering results is desired!
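
One possible way to "combine additional information besides term frequency" is to weight each term's in-cluster frequency by its inverse document frequency over the whole result set, so that terms occurring in nearly every result (typically the query words) stop dominating the labels. The slides do not say which extra signal the authors had in mind; TF-IDF and the function name `label_cluster` are assumptions for this sketch.

```python
import math
from collections import Counter
from typing import List


def label_cluster(docs: List[List[str]], members: List[int], top_k: int = 3) -> List[str]:
    """Rank a cluster's terms by (frequency in cluster) * log(N / document frequency).

    docs    -- all search results, each a list of tokens
    members -- indices of the documents that belong to the cluster
    """
    n = len(docs)
    # Document frequency over the whole result set.
    df = Counter(t for d in docs for t in set(d))
    # Raw term frequency inside the cluster.
    tf = Counter(t for i in members for t in docs[i])
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

A term that appears in every result gets a score of zero, so it drops to the bottom of the ranking even if it is the most frequent term in the cluster. Very rare terms can be over-promoted by this weighting, so it is a starting point rather than a fix.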

  10. Remaining • Implementation • KMeans • MMR • Frequent word sets • Effective presentation study • Based on user feedback • Literature survey • Dynamic maintenance of clustering based on search result ranking and reranking • Drill down in a particular cluster • Update overall clustering organization
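
The dynamic maintenance item could, for example, avoid full re-clustering by folding each newly arrived result into the nearest existing cluster and updating that cluster's weighted-average centroid in place. The sketch below is one such scheme under stated assumptions: cosine similarity, a hypothetical `min_sim` threshold below which a new cluster is opened, and the illustrative names `Cluster` and `add_document`. It is not a method described in the slides.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

Vector = Dict[str, float]  # assumed representation: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


@dataclass
class Cluster:
    centroid: Vector
    size: int


def add_document(clusters: List[Cluster], doc: Vector, min_sim: float = 0.2) -> int:
    """Fold one new document into the clustering and return its cluster index.

    The document joins the most similar cluster if the similarity reaches
    min_sim (a hypothetical threshold); otherwise it starts a new cluster.
    """
    best, best_sim = -1, min_sim
    for i, c in enumerate(clusters):
        sim = cosine(c.centroid, doc)
        if sim >= best_sim:
            best, best_sim = i, sim
    if best == -1:
        clusters.append(Cluster(dict(doc), 1))
        return len(clusters) - 1
    c = clusters[best]
    total = c.size + 1
    # Running weighted average: the old centroid counts c.size times, the new doc once.
    c.centroid = {
        t: (c.size * c.centroid.get(t, 0.0) + doc.get(t, 0.0)) / total
        for t in set(c.centroid) | set(doc)
    }
    c.size = total
    return best
```

Each update costs one similarity computation per existing cluster, so the work for a new page of results grows with the number of clusters rather than with all accumulated results.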

  11. Feedback • Which way to present clustering results is more meaningful? • Based on central documents • Based on term vectors • More options? • Any other clustering algorithms to achieve effectiveness and efficiency? • Any other presentation strategy besides “rank list + cluster center”?