Combining Link and Content Information in Web Search

Fabiana F. Prabhakar Megan Smith

Motivation
  • Web search results can be improved considerably by taking the documents' link structure into account.
  • Goal: create an algorithm that ranks documents on their links and content combined and still performs well at query time.
  • HITS: it is not feasible to compute hubs and authorities at query time;
  • Topic drift: neither HITS nor PageRank takes the topic into consideration when ranking pages.
PageRank
  • Models a web surfer who jumps from page to page, choosing with uniform probability which link to follow at each step;
  • With a small probability, the surfer instead jumps to a random page. This also happens whenever a page with no links is reached;
  • Represent the web as a graph: each page is a node and each outlink is an edge in the graph.
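
The random-surfer description above can be sketched as a short power iteration. This is a minimal illustration, not the presenters' code: the function name `pagerank`, the toy graph, and the value `beta = 0.85` (probability of following a link) are assumptions made for the example.

```python
def pagerank(outlinks, beta=0.85, iterations=50):
    """Power iteration for the random-surfer model.

    outlinks: dict mapping each page to the list of pages it links to.
    beta: probability of following a link (assumed value, not from the slides).
    """
    pages = sorted(outlinks)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Probability mass on pages with no outlinks is spread uniformly,
        # matching "jump to a random page when a page with no links is reached".
        dangling = sum(rank[p] for p in pages if not outlinks[p])
        new_rank = {p: (1.0 - beta) / n + beta * dangling / n for p in pages}
        for p in pages:
            targets = outlinks[p]
            for t in targets:
                # Each outlink is followed with uniform probability.
                new_rank[t] += beta * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page graph; "d" has no outlinks.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
ranks = pagerank(graph)
```

The scores stay a probability distribution at every step, because the random-jump share and the link-following share always add up to 1.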
Directed Surfer Model
  • The surfer hops probabilistically from page to page, guided by the content of the pages and the query terms the surfer is looking for.
  • A PageRank score is calculated for each document-term pair in the collection (this calculation is done offline, not at query time).
QD-PageRank_q(j)
  • For a single term q, the resulting probability distribution over pages is:

$$\text{QD-PageRank}_q(j) = P_q(j) = (1 - \beta)\, P'_q(j) + \beta \sum_{i \in B_j} P_q(i)\, P_q(i \to j)$$

  • $\beta$ is the probability that the surfer follows a link, and $B_j$ is the set of pages that link to j.
  • $P_q(i \to j)$ is the probability that the surfer moves from page i to page j for the query q.
  • $P'_q(j)$ specifies where the surfer jumps when not following links (jumping outside the topic).
Some definitions
  • W = set of words in the collection;
  • S = number of unique document-term pairs;
  • N = total number of documents.
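
As a small illustration of these three quantities, the sketch below computes them for a made-up three-document collection. The documents and the whitespace tokenization are assumptions for the example only.

```python
# Hypothetical toy collection: document id -> text.
docs = {
    "d1": "web search ranking",
    "d2": "web graph links",
    "d3": "search engine search",
}

# Unique (document, term) pairs: a word repeated within one document counts once.
pairs = {(doc_id, term) for doc_id, text in docs.items() for term in text.split()}

W = {term for _, term in pairs}  # set of words in the collection
S = len(pairs)                   # number of unique document-term pairs
N = len(docs)                    # total number of documents

print(len(W), S, N)  # prints "6 8 3"
```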
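R_q(j): Relevance of page j to query q
  • $P'_q(j) = \dfrac{R_q(j)}{\sum_{k \in W} R_q(k)}$, where the sum normalizes over all candidate pages k.
  • $P_q(i \to j) = \dfrac{R_q(j)}{\sum_{k \in F_i} R_q(k)}$, where $F_i$ is the set of pages that page i links to.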
  • When choosing among outlinks, the directed surfer tends to follow those which lead to pages with relevant content.
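
The relevance-weighted jump and transition probabilities above can be plugged into a power iteration for a single query term. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `qd_pagerank`, the toy graph, the relevance values, `beta = 0.85`, and the handling of pages whose outlinks all have zero relevance (the surfer falls back to the jump distribution) are choices made for this example.

```python
def qd_pagerank(outlinks, relevance, beta=0.85, iterations=50):
    """Directed-surfer scores P_q(j) for one query term q.

    outlinks: dict page -> list of pages it links to (F_i).
    relevance: dict page -> R_q(page); non-negative, at least one positive.
    """
    pages = sorted(outlinks)
    total = sum(relevance[p] for p in pages)
    jump = {p: relevance[p] / total for p in pages}  # P'_q(j)
    score = dict(jump)
    for _ in range(iterations):
        new_score = {p: (1.0 - beta) * jump[p] for p in pages}
        for i in pages:
            weight = sum(relevance[k] for k in outlinks[i])
            if weight == 0:
                # Assumption: with no relevant outlink, jump according to P'_q.
                for j in pages:
                    new_score[j] += beta * score[i] * jump[j]
            else:
                for j in outlinks[i]:
                    # P_q(i -> j) = R_q(j) / sum of R_q(k) over k in F_i
                    new_score[j] += beta * score[i] * relevance[j] / weight
        score = new_score
    return score


# Hypothetical graph and relevance scores for one term.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
relevance = {"a": 1.0, "b": 0.0, "c": 2.0, "d": 0.0}
scores = qd_pagerank(graph, relevance)
```

In this toy run, pages "b" and "d" have zero relevance and therefore receive no score, which shows how the directed surfer stays on pages that match the term.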
Multiple-term query (during retrieval)

Q = {q1, q2, …, qn}

LOOP {
    // select a term that was not selected before
    SELECT q from Q according to P(q);
    Use QD-PageRank_q(j) to calculate QD-PageRank_Q(j)*;
}

* $\text{QD-PageRank}_Q(j) = P_Q(j) = \sum_{q \in Q} P(q)\, P_q(j)$
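
The weighted sum above is cheap enough to run at query time once the per-term scores exist. The sketch below is an illustration only: `combine_scores`, the sample scores, and the uniform choice of P(q) are assumptions, not values from the slides.

```python
def combine_scores(per_term_scores, term_probs):
    """P_Q(j) = sum over q in Q of P(q) * P_q(j).

    per_term_scores: dict term -> dict page -> precomputed P_q(page).
    term_probs: dict term -> P(q); assumed here to sum to 1 over the query.
    """
    combined = {}
    for term, p_term in term_probs.items():
        for page, score in per_term_scores[term].items():
            combined[page] = combined.get(page, 0.0) + p_term * score
    return combined


# Hypothetical precomputed per-term scores for a two-term query.
per_term = {
    "web": {"a": 0.5, "b": 0.5},
    "search": {"b": 0.25, "c": 0.75},
}
result = combine_scores(per_term, {"web": 0.5, "search": 0.5})
# result == {"a": 0.25, "b": 0.375, "c": 0.375}
```

Pages missing from a term's score table simply contribute nothing for that term, so only documents containing at least one query term appear in the result.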

Scalability
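  • QD-PageRank_q(j) is calculated over just the documents that contain q. The storage requirement is proportional to S, which is at most N × |W| and in practice far smaller, since each document contains only a fraction of the words.
  • QD-PageRank_Q(j) is calculated at query time from the stored per-term values.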
Time Requirements
  • The time to compute QD-PageRank_q(j) for all q in W is O(S). Experiments have shown that the computation converges in fewer iterations for these smaller sub-graphs, reducing the computational requirements.
  • For most words, the sub-graph will fit in memory, reducing disk I/O during computation.
Results
  • Three volunteers were asked to provide a single-word query and two double-word queries.
  • For each query, the top 10 results from standard PageRank and QD-PageRank were randomly mixed and given to four volunteers, who were asked to rate each result.
  • None of them knew how the results were obtained.
Paper

Combining Link and Content Information in Web Search

http://www.cs.washington.edu/pedrod/papers/webdyn.pdf

Richardson and Domingos, 2004

(Original conference version: The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank, 2002 - http://citeseer.ist.psu.edu/460350.html)