Tagging with Queries: How and Why?

Ioannis Antonellis

antonell@cs.stanford.edu

Hector Garcia-Molina

hector@cs.stanford.edu

Jawed Karim

jawed@cs.stanford.edu


Content on the Web

Back Link Text

Search queries

Page Text

Forward Link Text

CNN Obama Critics news

Stanford Infolab


How?

  • Basic observation: the HTTP Referer header contains the search query

    1) Extract queries from the web access log

Web Access Log

a997c1950718d75c03f22ca8715e50b3 [28/Feb/2007:23:45:47 -0800] /group/svsa/cgi-bin/www/officers.php http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=HPIB,HPIB:2006-47,HPIB:en&q=sexy+random+facts

a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="

413fa663474b2288c1661882e7e62aea [28/Feb/2007:23:46:02 -0800] /group/pandegroup/folding/results.html "http://www.google.com/search?sourceid=navclient-menuext&ie=UTF-8&q=RESULTS"

3d2edd4dfa7778da92875ee67a319433 [28/Feb/2007:23:46:03 -0800] /group/vpge/sgsi/entrepreneurship/ "http://www.google.com/search?hl=en&q=summer+institute+of+entrepreneurship"

ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / "http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&rlz=1T4SUNA_ko___KR209&q=stanford"

1c9893680
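
A minimal sketch of step 1, assuming each log line has the layout seen in the sample entries above (hashed client ID, bracketed timestamp, requested path, optionally quoted referrer). The function names `extract_query_tag` and `collect_query_tags` are hypothetical, as is the choice to read only the `q` parameter and to treat the whole lowercased query as one tag; the slides do not specify either.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumed line layout, inferred from the sample entries:
#   <hashed client id> [<timestamp>] <requested path> <referrer, optionally quoted>
LOG_LINE = re.compile(r'^(\S+) \[([^\]]+)\] (\S+) "?([^"\s]+)"?\s*$')


def extract_query_tag(line, param="q"):
    """Return (path, query) if the referrer carries a search query, else None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # truncated or malformed line
    _client, _timestamp, path, referrer = match.groups()
    # parse_qs decodes '+' and percent-escapes in the query string.
    values = parse_qs(urlsplit(referrer).query).get(param)
    if not values:
        return None  # referrer is not a search result page we recognize
    return path, values[0]


def collect_query_tags(lines):
    """Group queries by requested path; each whole query counts as one tag (assumption)."""
    tags = defaultdict(Counter)
    for line in lines:
        found = extract_query_tag(line)
        if found is not None:
            path, query = found
            tags[path][query.lower()] += 1
    return tags


sample = [
    'a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ '
    '"http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="',
    'ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / '
    '"http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&q=stanford"',
]
for path, counts in collect_query_tags(sample).items():
    print(path, dict(counts))
```

Other search engines may pass the query under a different parameter name, so `param` would need adjusting per referrer host in any real deployment.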

    2) Embed JavaScript code in web pages to capture search queries


Embeddable code

  • Either approach requires convincing the server administrator or page owner


Query tags

Information value of Query Tags

WebBase

  • Datasets:

    • Stanford Query Logs: 360,000 URLs, 900,000 query tags

    • Delicious@Stanford: 3,000 URLs, 5,500 tags

Experiments - Summary

  • URLs coverage

  • Query vs Delicious Tags

  • Query/Delicious Tags vs Page Text

URLs coverage

  • Query logs provide tags for ~110 times more URLs than delicious

  • 13% of delicious URLs (380 URLs) only tagged by delicious
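
The two coverage figures above can be read as simple set operations over the URLs each source has tagged. The sketch below assumes that reading; `url_coverage` is a hypothetical helper, and the toy URLs are illustrative only, not data from the study.

```python
def url_coverage(query_urls, delicious_urls):
    """Compare which URLs are tagged by query logs and which by Delicious."""
    query_urls = set(query_urls)
    delicious_urls = set(delicious_urls)
    if not delicious_urls:
        raise ValueError("delicious_urls must not be empty")
    # URLs that have Delicious tags but never appear with a query tag.
    only_delicious = delicious_urls - query_urls
    return {
        # How many times more URLs the query logs cover.
        "ratio": len(query_urls) / len(delicious_urls),
        "only_delicious": len(only_delicious),
        "only_delicious_share": len(only_delicious) / len(delicious_urls),
    }


# Toy example (made-up paths): four URLs with query tags, two with Delicious tags.
stats = url_coverage(
    ["/a", "/b", "/c", "/d"],
    ["/d", "/e"],
)
print(stats)
```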

Query Tags

  • Query logs provide 42 query tags per URL on average

Delicious Tags

  • Delicious provides 3 tags per URL on average

Tags for common URLs

  • Query logs provide 250 query tags per URL on average for common URLs

  • Delicious provides 5 tags per URL on average for common URLs

Query Tags vs Page Text

  • For every URL, 1 out of 3 query tags is not present in the page text
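
One way such a figure could be computed is sketched below. The slides do not define what "present in the page text" means, so this assumes a tag is present when every word of the tag occurs somewhere in the page text after lowercasing; `tokenize` and `missing_tag_fraction` are hypothetical names, and the sample tags and text are made up.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def missing_tag_fraction(tags, page_text):
    """Fraction of tags with at least one word that does not occur in the page text."""
    tags = list(tags)
    if not tags:
        return 0.0
    page_terms = tokenize(page_text)
    # A tag counts as missing unless all of its words appear on the page (assumption).
    missing = [tag for tag in tags if not tokenize(tag) <= page_terms]
    return len(missing) / len(tags)


fraction = missing_tag_fraction(
    ["stanford", "folding results", "protein"],
    "Folding@home results from Stanford",
)
print(round(fraction, 2))
```

The same function applies unchanged to Delicious tags, which is how the two sources could be compared on equal terms.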

Delicious Tags vs Page Text

  • For every URL, 1 out of 2 Delicious tags is not present in the page text

Tags for common URLs

  • For common URLs, 1 out of 2 query/Delicious tags is not present in the page text

Conclusions

Query tags:

  • can be extracted in a distributed fashion

  • are a promising new source of information

  • can provide a substantial number of new tags for a large fraction of the Web

Thank You!

(DEMO)

http://tags.stanford.edu
