Tagging with Queries: How and Why?

Ioannis Antonellis, Hector Garcia-Molina, Jawed Karim

Presentation Transcript

Content on the Web

Back Link Text

Search queries

Page Text

Forward Link Text

CNN Obama Critics news

Stanford Infolab


How?

  • Basic observation: the HTTP referrer field (the Referer header) contains the search query

    1) Extract queries from web access log

Web Access Log

a997c1950718d75c03f22ca8715e50b3 [28/Feb/2007:23:45:47 -0800] /group/svsa/cgi-bin/www/officers.php http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=HPIB,HPIB:2006-47,HPIB:en&q=sexy+random+facts

a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="

413fa663474b2288c1661882e7e62aea [28/Feb/2007:23:46:02 -0800] /group/pandegroup/folding/results.html "http://www.google.com/search?sourceid=navclient-menuext&ie=UTF-8&q=RESULTS"

3d2edd4dfa7778da92875ee67a319433 [28/Feb/2007:23:46:03 -0800] /group/vpge/sgsi/entrepreneurship/ "http://www.google.com/search?hl=en&q=summer+institute+of+entrepreneurship"

ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / "http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&rlz=1T4SUNA_ko___KR209&q=stanford"

1c9893680
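
The extraction step can be sketched in Python as follows. This is a minimal illustration, not the authors' tool: it assumes the four-field layout visible in the sample above (client hash, bracketed timestamp, requested path, referrer) and that the search engine passes the query in a `q` parameter, as Google does in these entries. The names `LOG_LINE` and `extract_query_tag` are hypothetical.

```python
import re
from urllib.parse import urlparse, parse_qs

# Layout assumed from the sample above:
# <client hash> [<timestamp>] <requested path> <referrer, optionally quoted>
LOG_LINE = re.compile(r'^(\S+) \[([^\]]+)\] (\S+) "?([^"\s]+)"?$')


def extract_query_tag(line, query_param="q"):
    """Return (requested_path, search_query), or None if no query is found."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # truncated or differently formatted entry
    _client, _timestamp, path, referrer = match.groups()
    # parse_qs decodes "+" and percent-escapes in the query string.
    values = parse_qs(urlparse(referrer).query).get(query_param)
    if not values:
        return None  # referrer carries no search query
    return path, values[0]


sample = ('a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] '
          '/group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="')
print(extract_query_tag(sample))  # ('/group/King/', 'Martin Luther King')
```

Each returned pair attaches one query tag to the requested path. Other search engines may use a different parameter name, which is why `query_param` is left configurable.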

    2) Embed JavaScript code in web pages to capture search queries

Embeddable code

  • Convince server administrator/page owner

Query tags

Information value of Query Tags

WebBase

  • Datasets:

  • Stanford Query Logs: 360,000 URLs, 900,000 query tags

  • [email protected]: 3,000 URLs, 5,500 tags

Experiments - Summary

  • URLs coverage

  • Query vs Delicious Tags

  • Query/Delicious Tags vs Pagetext

URLs coverage

  • Query logs provide tags for ~110 times more URLs than Delicious

  • 13% of Delicious URLs (380 URLs) are tagged only by Delicious
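
A coverage comparison of this kind can be sketched as below, assuming each source is available as a mapping from URL to its set of tags. The function name and the toy data are hypothetical and are not the datasets from the talk.

```python
def url_coverage(query_tags, delicious_tags):
    """Compare which URLs each source tags; both arguments map URL -> set of tags."""
    query_urls = set(query_tags)
    delicious_urls = set(delicious_tags)
    if not delicious_urls:
        raise ValueError("delicious_tags must contain at least one URL")
    only_delicious = delicious_urls - query_urls
    return {
        # How many times more URLs the query logs cover.
        "url_ratio": len(query_urls) / len(delicious_urls),
        # Share of Delicious URLs that have no query tags at all.
        "only_delicious_fraction": len(only_delicious) / len(delicious_urls),
    }


# Toy data for illustration only.
toy_query = {"/a": {"x"}, "/b": {"y"}, "/c": {"z"}, "/d": {"w"}}
toy_delicious = {"/a": {"p"}, "/e": {"q"}}
stats = url_coverage(toy_query, toy_delicious)
```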

Query Tags

  • Query logs provide 42 query tags per URL on average

Delicious Tags

  • Delicious provides 3 tags per URL on average

Tags for common URLs

  • Query logs provide 250 query tags per URL on average for common URLs

  • Delicious provides 5 tags per URL on average for common URLs
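
The per-URL averages, overall and restricted to common URLs, could be computed along these lines. This is a sketch under the assumption that "common URLs" means URLs tagged by both sources. The helper name and the toy data are hypothetical.

```python
def mean_tags_per_url(tags_by_url, urls=None):
    """Average number of tags per URL, optionally restricted to the given URLs."""
    selected = list(tags_by_url) if urls is None else list(urls)
    if not selected:
        return 0.0
    return sum(len(tags_by_url[u]) for u in selected) / len(selected)


# Toy data for illustration only.
toy_query = {"/a": {"x", "y", "z"}, "/b": {"x"}}
toy_delicious = {"/a": {"p"}, "/c": {"q", "r"}}

# Assumption: common URLs are those present in both sources.
common = set(toy_query) & set(toy_delicious)
query_all = mean_tags_per_url(toy_query)
query_common = mean_tags_per_url(toy_query, common)
delicious_common = mean_tags_per_url(toy_delicious, common)
```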

Query Tags vs Page Text

  • For every URL, 1 out of 3 query tags are not present in the pagetext
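
One way to measure this is sketched below. The slides do not say how "present in the page text" was decided, so the matching rule here is an assumption: a tag counts as present when every one of its words occurs somewhere in the page text, ignoring case. The function names are hypothetical, and the same check applies to tags from any source.

```python
import re


def words(text):
    """Lower-cased alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fraction_missing(tags, page_text):
    """Fraction of tags with at least one word absent from the page text."""
    tags = list(tags)
    if not tags:
        return 0.0
    page_words = words(page_text)
    # Assumed rule: a tag is present only if all of its words appear on the page.
    missing = [tag for tag in tags if not words(tag) <= page_words]
    return len(missing) / len(tags)


# Toy data for illustration only.
toy_page = "Stanford University Summer Institute"
toy_tags = ["stanford", "summer institute", "entrepreneurship"]
share = fraction_missing(toy_tags, toy_page)
```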

Delicious Tags vs Page Text

  • For every URL, 1 out of 2 Delicious tags is not present in the page text

Tags for common URLs

  • For common URLs, 1 out of 2 query/Delicious tags is not present in the page text

Conclusions

Query tags:

  • can be extracted in a distributed fashion

  • are a promising new source of information

  • can provide many new tags for a large fraction of the Web
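
To illustrate what distributed extraction might involve, the sketch below merges tag counts reported independently by several collectors, for example separate log shards or embedded scripts. The report format (URL mapped to a counter of queries) and the function name are assumptions made for this example, not a description of the demo system.

```python
from collections import Counter, defaultdict


def merge_tag_reports(reports):
    """Merge several {url: Counter(query -> hits)} reports into one mapping."""
    merged = defaultdict(Counter)
    for report in reports:
        for url, counts in report.items():
            merged[url].update(counts)  # adds hit counts per query
    return dict(merged)


# Toy reports for illustration only.
report_a = {"/group/King/": Counter({"martin luther king": 2})}
report_b = {
    "/group/King/": Counter({"martin luther king": 1, "mlk": 1}),
    "/": Counter({"stanford": 1}),
}
merged = merge_tag_reports([report_a, report_b])
```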


Thank You!

(DEMO)

http://tags.stanford.edu
