
A Statistical Comparison of Tag and Query Logs

A Statistical Comparison of Tag and Query Logs. Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie. SIGIR 2009. June 4, 2010, Hyunwoo Kim. Contents: Introduction; Building a Dataset; Are the Distributions Similar?; Investigating Website Content; Conclusion.


Presentation Transcript


  1. A Statistical Comparison of Tag and Query Logs Mark J. Carman, Robert Gwadera, Fabio Crestani, and Mark Baillie SIGIR 2009 June 4, 2010 Hyunwoo Kim

  2. Contents Introduction Building a Dataset Are the Distributions Similar? Investigating Website Content Conclusion

  3. Introduction tags

  4. Introduction • Questions 1. Are queries and tags similar across URLs? 2. Can tag data be used to approximate user queries to a search engine? 3. Can query logs be used to suggest new tags for a particular webpage? 4. For what types of websites is the correlation between the term distributions for queries and tags the highest? 5. Which of the distributions, tags or queries, is most closely related to the content of the clicked websites?

  5. Building a Dataset • AOL query log • Sizable • Recent (2006) • English queries • Available to academic researchers • 657,426 users • A period of 3 months from March to May, 2006 • Delicious tags • Collaborative tagging system • Final dataset: 4145 complete URLs • Google query, stemming, pruning
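
As a rough illustration of how per-URL term distributions might be built from such logs, the sketch below turns a list of query strings or tags into relative term frequencies. The function name `term_distribution`, the `min_count` pruning threshold, and the sample strings are all hypothetical; stemming is omitted to keep the example dependency-free, and the slides do not specify the exact preprocessing used.

```python
from collections import Counter


def term_distribution(texts, min_count=1):
    """Map a list of strings (queries or tags for one URL) to a
    relative term-frequency distribution {term: probability}."""
    counts = Counter()
    for text in texts:
        # Lowercase and split on whitespace; a real pipeline would also stem.
        counts.update(text.lower().split())
    # Pruning step (assumed form): drop terms seen fewer than min_count times.
    kept = {term: c for term, c in counts.items() if c >= min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {term: c / total for term, c in kept.items()}


# Made-up sample data for a single URL, for illustration only.
queries = ["new york times", "ny times", "times crossword"]
tags = ["news", "newspaper", "news", "times"]

q_dist = term_distribution(queries)
t_dist = term_distribution(tags)
```

Each resulting dictionary sums to 1, so it can be passed directly to a divergence measure.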

  6. Are the Distributions Similar? http://www.nytimes.com tags or

  7. Are the Distributions Similar? Kullback-Leibler divergence
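
The Kullback-Leibler divergence between two term distributions can be sketched as follows. This is a minimal version under stated assumptions: distributions are plain dictionaries, a small additive constant `eps` (a hypothetical choice, not taken from the slides) keeps the denominator non-zero for terms missing from one side, and logarithms are base 2.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in bits over the union vocabulary of p and q.

    eps is an assumed additive smoothing constant so that q(t) > 0
    for every term t; both distributions are renormalised afterwards.
    """
    vocab = set(p) | set(q)
    ps = {t: p.get(t, 0.0) + eps for t in vocab}
    qs = {t: q.get(t, 0.0) + eps for t in vocab}
    zp = sum(ps.values())
    zq = sum(qs.values())
    return sum(
        (ps[t] / zp) * math.log2((ps[t] / zp) / (qs[t] / zq))
        for t in vocab
    )


p = {"news": 0.5, "times": 0.5}
q = {"news": 0.9, "times": 0.1}
forward = kl_divergence(p, q)
backward = kl_divergence(q, p)
```

Here `forward` and `backward` differ, which shows that the measure is asymmetric.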

  8. Are the Distributions Similar? Vq: query logs Vr: tags • Jensen-Shannon divergence • Symmetric measure • Overlap coefficient
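
Both measures on this slide can be sketched in a few lines. The Jensen-Shannon divergence below averages the KL divergence of each distribution from their midpoint, which makes it symmetric and bounded by 1 when using base-2 logarithms. The overlap coefficient uses the common definition |Vq ∩ Vr| / min(|Vq|, |Vr|); that normalisation is an assumption here, since the slide does not give the formula, and the function names are illustrative.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two term distributions
    given as {term: probability} dictionaries."""
    vocab = set(p) | set(q)
    # Midpoint distribution m = (p + q) / 2.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl_to_midpoint(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl_to_midpoint(p) + 0.5 * kl_to_midpoint(q)


def overlap_coefficient(vq, vr):
    """Vocabulary overlap |Vq & Vr| / min(|Vq|, |Vr|) (assumed definition)."""
    vq, vr = set(vq), set(vr)
    if not vq or not vr:
        return 0.0
    return len(vq & vr) / min(len(vq), len(vr))


p = {"news": 0.5, "times": 0.5}
q = {"news": 0.9, "times": 0.1}
jsd = js_divergence(p, q)
overlap = overlap_coefficient({"a", "b", "c"}, {"b", "c", "d", "e"})
```

Unlike the KL divergence, swapping the two arguments of `js_divergence` leaves the result unchanged.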

  9. Are the Distributions Similar?

  10. Are the Distributions Similar? Open directory project

  11. Are the Distributions Similar?

  12. Are the Distributions Similar?

  13. Are the Distributions Similar?

  14. Are the Distributions Similar?

  15. Are the Distributions Similar?

  16. Are the Distributions Similar?

  17. Investigating Website Content

  18. Investigating Website Content

  19. Conclusion • Similarity between query terms and tags • Vocabularies contain a large amount of overlap • Term frequency distributions are correlated • Similarity is not dependent on the topic area • Queries are more similar to content than tags are • Queries and tags are more similar to one another than to content • Future work • Models for automatically removing noise from the tag and query logs • Techniques for predicting useful tags from query distributions • Techniques for the effective use of tag data to improve different forms of Web search

  20. Thank you
