
Clustering Web Queries

Clustering Web Queries. John S. Whissell, Charles L.A. Clarke, Azin Ashkan. CIKM ’09. Speaker: Hsin-Lan, Wang. Date: 2010/08/31. Outline: Introduction; Experimental Setup; Similarity to Manual Labelings; Classification Quality Metric; Split Discoveries





Presentation Transcript


  1. Clustering Web Queries John S. Whissell, Charles L.A. Clarke, Azin Ashkan CIKM’09 Speaker: Hsin-Lan, Wang Date: 2010/08/31

  2. Outline • Introduction • Experimental Setup • Similarity to Manual Labelings • Classification Quality Metric • Split Discoveries • Clickthrough Analysis Based on Detected Query Categories • General Web Query Clustering • Concluding Discussion

  3. Introduction • Clustering methods suffer from notable problems, including how to evaluate their results. Evaluation usually relies on: • ground truth labelings • objective functions • Goal: evaluate the quality of clustering results in a way that • does not require comparison to ground truth • does not use a specific clustering algorithm’s objective function

  4. Introduction • Clustering Web Queries: • navigational/informational queries • commercial/non-commercial queries

  5. Experimental setup • Data Set • Weighting Methods • Clustering Algorithms

  6. Data Set • Microsoft adCenter • Includes a record of queries entered, ads displayed and ads clicked. • Personally identifying information was removed. • Commercially-oriented: 1700 queries were selected for which the ad click frequency of the query was above 10.

  7. Data Set • For each query, two types of features available: • search engine result page (SERP) • query-specific features

  8. Weighting Methods

  9. Clustering Algorithms • K-means clustering using Lloyd’s method (kmeans) • Normalized-Cut Spectral clustering (spect) • UPGMA clustering (upgma) • Single Link clustering (slink) • Complete Link clustering (clink) • Document clustering algorithms from Zhao and Karypis: e1, i1, i2, g1, g1p, and h1 objective functions
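
As an illustration of the first algorithm in the list, the following is a minimal sketch of k-means using Lloyd's method: it alternates an assignment step and a centroid update step until the labels stop changing. It is not the implementation used in the paper. The function names, the squared Euclidean distance, and the choice of the first k points as initial centroids are assumptions made here to keep the example small and deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's method: alternate assignment and update until labels are stable."""
    # Assumption: the first k points are used as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy 2-D points (made up for illustration): two well-separated groups.
toy_points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

In practice the initial centroids are usually chosen at random or by a seeding heuristic, and the result can depend on that choice.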

  10. Similarity to Manual Labelings

  11. Similarity to Manual Labelings

  12. Similarity to Manual Labelings

  13. Classification Quality Metric • Train a classifier to recognize clusters in a clustering. • Classification accuracy (accc): estimated using cross-fold validation
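
The idea on this slide can be sketched as follows: treat the cluster assignments as class labels, train a classifier on part of the data, and measure how often it recovers the cluster of held-out items. The slides mention a linear SVM; to stay dependency-free, this sketch swaps in a simple nearest-centroid classifier, which is an assumption of the example and not the classifier from the paper. The function names and the fold assignment by index are also hypothetical.

```python
def fit_centroids(X, y):
    """Nearest-centroid 'training': compute the mean vector of each label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])),
    )


def classification_accuracy(X, cluster_labels, folds=5):
    """Cross-validated accuracy of a classifier trained on cluster labels."""
    n = len(X)
    correct = 0
    for f in range(folds):
        # Assumption: item i belongs to fold i % folds.
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        model = fit_centroids(
            [X[i] for i in train_idx], [cluster_labels[i] for i in train_idx]
        )
        correct += sum(predict(model, X[i]) == cluster_labels[i] for i in test_idx)
    return correct / n


# Toy data (made up): two well-separated groups, alternating by index.
X = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0),
     (11, 10), (1, 1), (11, 11), (0, 2), (10, 12)]
good_clustering = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # matches the geometry
poor_clustering = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]  # mixes the two groups

good_score = classification_accuracy(X, good_clustering)
poor_score = classification_accuracy(X, poor_clustering)
```

A clustering whose groups a classifier can learn and predict on unseen items scores high. A clustering that cuts across the structure of the data scores lower, and no ground truth labeling is needed to compute either score.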

  14. Classification Quality Metric • Illustrate a correlation between Na using a linear SVM and internal similarity.

  15. Classification Quality Metric

  16. Split Discoveries

  17. Split Discoveries

  18. Clickthrough Analysis Based on Detected Query Categories • Clustering+SVM • Clickthrough rate: percentage of queries in that set that had an ad click
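
The clickthrough rate defined on this slide is a simple ratio, sketched below. The record layout, a list of (query, ad_clicked) pairs, and the sample queries are made up for illustration and do not come from the adCenter data.

```python
def clickthrough_rate(records):
    """Percentage of queries in the set that had at least one ad click.

    records: list of (query_text, ad_clicked) pairs, where ad_clicked is a bool.
    """
    if not records:
        return 0.0  # assumption: an empty set is reported as 0%
    clicked = sum(1 for _, ad_clicked in records if ad_clicked)
    return 100.0 * clicked / len(records)


# Hypothetical set of queries assigned to one detected category.
sample_category = [
    ("cheap flights", True),
    ("history of aviation", False),
    ("airline logo", False),
    ("airport codes", False),
]
sample_rate = clickthrough_rate(sample_category)
```

Computing this rate per detected query category lets the categories be compared by how often their queries lead to an ad click.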

  19. General Web Query Clustering

  20. Concluding Discussion • Cluster objects using multiple representations and algorithms. • Classification accuracy is used to measure the quality of a clustering. • Future work: extend metric to select the number of clusters
