Tagging with Queries: How and Why?

Ioannis Antonellis

antonell@cs.stanford.edu

Hector Garcia-Molina

hector@cs.stanford.edu

Jawed Karim

jawed@cs.stanford.edu


Content on the Web

Back Link Text

Search queries

Page Text

Forward Link Text

Cnn Obama Critics news

Stanford Infolab


How?

  • Basic observation: the HTTP referrer field contains the search query


How?

  • Basic observation: the HTTP referrer field contains the search query

    1) Extract queries from web access log


Web Access Log

a997c1950718d75c03f22ca8715e50b3 [28/Feb/2007:23:45:47 -0800] /group/svsa/cgi-bin/www/officers.php http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=HPIB,HPIB:2006-47,HPIB:en&q=sexy+random+facts

a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="

413fa663474b2288c1661882e7e62aea [28/Feb/2007:23:46:02 -0800] /group/pandegroup/folding/results.html "http://www.google.com/search?sourceid=navclient-menuext&ie=UTF-8&q=RESULTS"

3d2edd4dfa7778da92875ee67a319433 [28/Feb/2007:23:46:03 -0800] /group/vpge/sgsi/entrepreneurship/ "http://www.google.com/search?hl=en&q=summer+institute+of+entrepreneurship"

ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / "http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&rlz=1T4SUNA_ko___KR209&q=stanford"

1c9893680
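
Step 1 could be sketched roughly as follows. This is a minimal illustration, not the authors' extraction code. It assumes the line layout seen in the sample above (visitor hash, bracketed timestamp, requested path, referrer, with the referrer optionally quoted) and that the search engine passes the query in a `q` parameter. The names `LOG_LINE`, `query_tag`, and `collect_tags` are hypothetical, and treating the whole query string as one tag is also an assumption.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumed layout, inferred from the sample lines only:
# <visitor hash> [<timestamp>] <requested path> <referrer, optionally quoted>
LOG_LINE = re.compile(r'^(\S+) \[([^\]]+)\] (\S+) "?([^"\s]+)"?$')


def query_tag(line, param="q"):
    """Return (path, query) if the referrer carries a search query, else None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # truncated or differently formatted line
    path, referrer = match.group(3), match.group(4)
    # parse_qs decodes '+' and percent-escapes in the query string.
    values = parse_qs(urlsplit(referrer).query).get(param)
    if not values:
        return None
    return path, values[0]


def collect_tags(lines):
    """Group extracted queries by requested path (one set of tags per path)."""
    tags = defaultdict(set)
    for line in lines:
        hit = query_tag(line)
        if hit is not None:
            path, query = hit
            tags[path].add(query)
    return dict(tags)


sample = [
    'a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ '
    '"http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="',
    'ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / '
    '"http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8'
    '&rlz=1T4SUNA_ko___KR209&q=stanford"',
    '1c9893680',  # truncated line, skipped
]
print(collect_tags(sample))
```

Lines that do not match the assumed layout, or whose referrer has no `q` parameter, are skipped rather than guessed at.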


How?

  • Basic observation: the HTTP referrer field contains the search query

    1) Extract queries from web access log

    2) Embed JavaScript code in web pages to capture search queries


Embeddable code


How?

  • Basic observation: the HTTP referrer field contains the search query

    1) Extract queries from web access log

    2) Embed JavaScript code in web pages to capture search queries

  • Convince server administrator/page owner


Query tags


Information value of Query Tags

WebBase

  • Datasets:

    • Stanford Query Logs: 360,000 URLs, 900,000 query tags

    • Delicious@Stanford: 3,000 URLs, 5,500 tags


Experiments - Summary

  • URLs coverage

  • Query vs Delicious Tags

  • Query/Delicious Tags vs Pagetext


URLs coverage

  • Query logs provide tags for ~110 times more URLs than delicious

  • 13% of delicious URLs (380 URLs) only tagged by delicious
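
A coverage comparison of this kind comes down to set operations over the two URL collections. The sketch below shows one way it might be computed. The function name `coverage` and the toy URLs are made up for illustration, and it is not claimed to be the procedure behind the figures above.

```python
def coverage(query_urls, delicious_urls):
    """Compare URL coverage of two tag sources (hypothetical helper)."""
    query_urls, delicious_urls = set(query_urls), set(delicious_urls)
    if not delicious_urls:
        raise ValueError("delicious_urls must not be empty")
    # URLs that have Delicious tags but no query tags at all.
    only_delicious = delicious_urls - query_urls
    return {
        "url_ratio": len(query_urls) / len(delicious_urls),
        "only_delicious": len(only_delicious),
        "only_delicious_share": len(only_delicious) / len(delicious_urls),
    }


# Toy data, not taken from the Stanford datasets.
stats = coverage({"/a", "/b", "/c", "/d"}, {"/c", "/e"})
print(stats)
```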


Query Tags

  • Query logs provide 42 query tags per URL on average


Delicious Tags

  • Delicious provides 3 tags per URL on average


Tags for common URLs

  • Query logs provide 250 query tags per URL on average for common URLs

  • Delicious provides 5 tags per URL on average for common URLs


Query Tags vs Page Text

  • For every URL, 1 out of 3 query tags is not present in the page text
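
The slides do not say how "present in the page text" was decided. As one possible reading, the sketch below counts a tag as present when every word of the tag occurs somewhere in the page text, ignoring case and punctuation. The helpers `tokens` and `missing_fraction` and the example page text are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def missing_fraction(tags, page_text):
    """Fraction of tags with at least one word absent from the page text.

    Assumption: a tag is "present" only if all of its words occur in the page.
    """
    if not tags:
        return 0.0
    page_words = tokens(page_text)
    missing = [tag for tag in tags if not tokens(tag) <= page_words]
    return len(missing) / len(tags)


# Made-up page text and tags, for illustration only.
page = "The Martin Luther King, Jr. Research and Education Institute"
tags = ["martin luther king", "mlk papers", "king institute"]
fraction = missing_fraction(tags, page)
print(fraction)
```

A stricter or looser matching rule (exact phrase match, stemming) would change the reported fraction, so the definition should be stated alongside any such number.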


Delicious Tags vs Page Text

  • For every URL, 1 out of 2 Delicious tags is not present in the page text


Tags for common URLs

  • For common URLs, 1 out of 2 query/Delicious tags is not present in the page text


Conclusions

Query tags:

  • Can be extracted in a distributed fashion

  • Are a promising new source of information

  • Can provide many new tags for a large fraction of the Web


Thank You!

(DEMO)

http://tags.stanford.edu
