
Web Information retrieval (Web IR)

Web Information retrieval (Web IR). Handout #12: Combinational Ranking. Ali Mohammad Zareh Bidoki, ECE Department, Yazd University, alizareh@yaduni.ac.ir. Ranking Algorithm Problems: rich-get-richer (connectivity based), low precision (at most 0.30).


Presentation Transcript


  1. Web Information retrieval (Web IR) • Handout #12: Combinational Ranking • Ali Mohammad Zareh Bidoki, ECE Department, Yazd University • alizareh@yaduni.ac.ir

  2. Ranking Algorithm Problems • Rich-get-richer (connectivity based) • Low precision (at most 0.30) • Each ranking algorithm performs well only in some situations

  3. Combinational Ranking • Content + connectivity + ??? • How can we combine these features? • R = f(query, content, connectivity)
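
As a minimal sketch of what R = f(query, content, connectivity) could look like, the simplest choice of f is a weighted sum of a content score and a connectivity score for a fixed query. The function names, the min-max normalisation and the weight 0.7 are assumptions made for illustration; the handout does not prescribe them.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1]; a constant dict maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def combine(content, connectivity, weight=0.7):
    """Linear f: weight * content + (1 - weight) * connectivity, per page."""
    c, l = min_max(content), min_max(connectivity)
    return {p: weight * c[p] + (1 - weight) * l[p] for p in content}


# Hypothetical scores for three pages and one fixed query.
content = {"a": 2.0, "b": 8.0, "c": 5.0}       # e.g. query-content similarity
connectivity = {"a": 0.9, "b": 0.1, "c": 0.5}  # e.g. a link-based score
combined = combine(content, connectivity)
ranking = sorted(combined, key=combined.get, reverse=True)
```

Normalising first keeps one feature from dominating only because of its scale; choosing the weight by hand is exactly the step that the learning-based methods later in the handout try to automate.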

  4. Relevance Propagation Model (by Shakery) • A hyper score (h) is computed for each document. • WI and WO are weighting functions for in-link and out-link pages, respectively. • S(p) is the similarity between query q and page p (self relevance).
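
The formula itself did not survive in this transcript. One iterative form that is consistent with the symbols defined on the slide is given below; the mixing coefficients \alpha, \beta, \gamma and the iteration index k are assumptions, so this should be read as a plausible shape rather than the exact equation from the slide.

```latex
h^{(k+1)}(p) \;=\; \alpha\, S(p)
  \;+\; \beta \sum_{p_i \to p} h^{(k)}(p_i)\, W_I(p_i, p)
  \;+\; \gamma \sum_{p \to p_j} h^{(k)}(p_j)\, W_O(p, p_j)
```

Here the first sum runs over pages p_i that link to p and the second over pages p_j that p links to.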

  5. Three Iterative Models • Weighted In-Link • Weighted Out-Link • Uniform Out-Link

  6. Weighted In-Link • This model of user behavior is quite similar to the random surfer model, except that it is query-dependent. The probability that the random surfer visits a page is that page's hyper-relevance score.
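
A rough sketch of the weighted in-link idea follows. It assumes the update h(p) = alpha * S(p) + (1 - alpha) * sum of h(q) * W_I(q, p) over pages q linking to p, and it assumes that W_I(q, p) is proportional to the self relevance of the target page; both the update and the weighting are illustrative choices, not taken from the slide.

```python
def weighted_in_link(S, out_links, alpha=0.5, iterations=50):
    """Iterate h(p) = alpha*S(p) + (1-alpha) * sum_{q->p} h(q) * W_I(q, p).

    Assumption: W_I(q, p) = S(p) / (sum of S over q's out-links), so a surfer
    leaving q prefers targets that look relevant to the query.
    """
    h = dict(S)  # start from the self-relevance scores
    for _ in range(iterations):
        new_h = {p: alpha * S[p] for p in S}
        for q, targets in out_links.items():
            total = sum(S[t] for t in targets)
            if total == 0:
                continue  # no relevant target: nothing is propagated
            for t in targets:
                new_h[t] += (1 - alpha) * h[q] * S[t] / total
        h = new_h
    return h


S = {"a": 0.2, "b": 0.6, "c": 0.2}            # hypothetical query similarities
out_links = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
h = weighted_in_link(S, out_links)
```

Because S sums to 1 and every page in this toy graph has a relevant out-link, the scores keep summing to 1, which matches the reading of h as a visit probability.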

  7. Weighted Out-Link • In this model, we assume that a user who is given a page reads its content with probability alpha and follows its outgoing links with probability (1-alpha). The pages linked from a page do not all have the same impact on its weight.

  8. Uniform Out-Link • In this special case, they assume that at each page the user reads the content of the page and, with probability (1-alpha), reads all the pages that are linked from it.
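
The out-link models can be sketched in the same way. The version below treats all out-linked pages uniformly and averages their scores so that values stay bounded; the averaging and the default alpha are assumptions, and the original model may normalise differently.

```python
def uniform_out_link(S, out_links, alpha=0.5, iterations=50):
    """Iterate h(p) = alpha*S(p) + (1-alpha) * mean of h over pages p links to.

    Assumption: the out-linked pages are averaged so that scores stay bounded;
    the original model may use a different normalisation.
    """
    h = dict(S)
    for _ in range(iterations):
        new_h = {}
        for p in S:
            targets = out_links.get(p, [])
            neighbours = sum(h[t] for t in targets) / len(targets) if targets else 0.0
            new_h[p] = alpha * S[p] + (1 - alpha) * neighbours
        h = new_h
    return h


S = {"hub": 0.1, "x": 0.9, "y": 0.7}           # hypothetical query similarities
out_links = {"hub": ["x", "y"]}                # "hub" links to two relevant pages
h = uniform_out_link(S, out_links)
```

In this toy example the page "hub" has little relevant content of its own but gains score from the relevant pages it points to. A weighted out-link variant would replace the plain mean with per-link weights.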

  9. Algorithm Implementation • The algorithm is run on a working set • Working set construction: • They first find the top 100,000 pages with the highest content similarity to the query • From these 100,000 pages, a small number (about 200) of the most similar pages are selected as the core set • They then expand the core set to the working set by adding those of the 100,000 pages that point to, or are pointed to by, pages in the core set
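
The working set construction above can be sketched as follows. The function name, the argument names and the toy data are hypothetical; only the three steps (top pages by similarity, core set, expansion along links within the top pages) come from the slide.

```python
def build_working_set(similarity, links, pool_size=100_000, core_size=200):
    """Return (core, working_set) following the three steps above.

    similarity: dict page -> content similarity to the query
    links: iterable of (source, target) hyperlink pairs
    """
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    pool = set(ranked[:pool_size])   # top pages by content similarity
    core = set(ranked[:core_size])   # the most similar pages form the core set
    working = set(core)
    for src, dst in links:
        if src in pool and dst in core:
            working.add(src)         # pool page that points to a core page
        if dst in pool and src in core:
            working.add(dst)         # pool page pointed to by a core page
    return core, working


# Toy sizes stand in for 100,000 and 200.
similarity = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4, "e": 0.1}
links = [("c", "a"), ("b", "d"), ("e", "a"), ("d", "c")]
core, working = build_working_set(similarity, links, pool_size=4, core_size=2)
```

Page "e" links to a core page but is left out because it is not among the top pages by similarity, which is what keeps the working set small.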

  10. Algorithm Properties • It is: • Online?? • Recursive • Query dependent • On TREC data, Weighted In-Link is shown to outperform the others

  11. Frequency Propagation (by Song) • Instead of propagating scores, the frequencies of query terms are propagated • It can be used online • It is based on the site structure

  12. Propagation Formula • ft(p) is the frequency of term t in page p • f’t(p) is the frequency of term t in page p after propagation
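
The propagation formula itself is missing from this transcript. Purely as an illustration of the idea, the sketch below assumes that the pages of a site form a tree and that a parent page adds a fraction alpha of the average term frequency of its children to its own frequency. This specific form, the parameter alpha and the names used are assumptions, not the formula from the slide.

```python
def propagate_frequency(freq, children, alpha=0.5):
    """Return f'_t(p) = f_t(p) + alpha * mean(f_t(c) for c in children of p).

    freq: dict page -> frequency of one query term t in that page
    children: dict page -> list of child pages in the site tree
    """
    propagated = {}
    for page, f in freq.items():
        kids = children.get(page, [])
        boost = sum(freq[c] for c in kids) / len(kids) if kids else 0.0
        propagated[page] = f + alpha * boost
    return propagated


freq = {"/": 1, "/docs": 4, "/blog": 2}        # hypothetical counts of term t
children = {"/": ["/docs", "/blog"]}           # site structure: "/" is the parent
new_freq = propagate_frequency(freq, children)
```

Since only term counts are adjusted, the propagated frequencies can be fed into an ordinary content-based ranking function afterwards.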

  13. Overall Framework for propagation • SS is the best • ST & HT-WI are similar

  14. Combinational Ranking Algorithms Based on learning (Learning to Rank)

  15. Combination Framework • Training set: q1: {(x11,4), (x12,3), …, (x1m,0)}; q2: {(x21,3), (x22,2), …, (x2m,1)}; …; qn: {(xn1,4), (xn2,3), …, (xnm,2)} • Labels are relevance judgments or click orders • The learning system produces a ranking model g(x,w) • Test set: (x1,?), (x2,?), … • The ranking system outputs (x1, g(x1,w)), (x2, g(x2,w)), (x3, g(x3,w)), …
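
To make the framework concrete, here is a small sketch of the train-then-rank loop. It assumes a linear ranking model g(x, w) = w · x fitted by gradient descent on squared error, with two made-up features per document; the model form, the learning rate and the data are all illustrative assumptions.

```python
def g(x, w):
    """Ranking model: a linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(training_set, n_features, lr=0.05, epochs=500):
    """Fit w by gradient descent on squared error between g(x, w) and the label."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for docs in training_set.values():
            for x, label in docs:
                err = g(x, w) - label
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def rank(test_docs, w):
    """Order unlabeled documents by descending model score."""
    return sorted(test_docs, key=lambda x: g(x, w), reverse=True)


# Hypothetical features: (content similarity, link score); labels 0..4.
training_set = {
    "q1": [((1.0, 0.8), 4), ((0.6, 0.4), 2), ((0.1, 0.2), 0)],
    "q2": [((0.9, 0.9), 4), ((0.5, 0.1), 1), ((0.2, 0.6), 1)],
}
w = train(training_set, n_features=2)
ranked = rank([(0.2, 0.1), (0.9, 0.7), (0.5, 0.5)], w)
```

The learned weights w play the role that a hand-chosen combination of content and connectivity would otherwise play.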

  16. Three learning categories • Pointwise • Pairwise • Listwise
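
The three categories differ mainly in what the loss function looks at: single documents, document pairs, or the whole result list. The sketch below shows one common loss of each kind (squared error, a logistic pair loss, and a softmax cross entropy over the list); these particular losses are chosen for illustration and are not named in the handout.

```python
import math


def pointwise_loss(scores, labels):
    """Squared error between each score and its own label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels))


def pairwise_loss(scores, labels):
    """Logistic loss over every pair where document i should outrank j."""
    loss = 0.0
    for si, yi in zip(scores, labels):
        for sj, yj in zip(scores, labels):
            if yi > yj:
                loss += math.log(1 + math.exp(-(si - sj)))
    return loss


def listwise_loss(scores, labels):
    """Cross entropy between softmax(labels) and softmax(scores) over the list."""
    def softmax(values):
        m = max(values)
        exps = [math.exp(v - m) for v in values]
        total = sum(exps)
        return [e / total for e in exps]
    target, predicted = softmax(labels), softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


labels = [4, 2, 0]
good = [3.0, 1.5, 0.2]   # scores in the right order
bad = [0.2, 1.5, 3.0]    # scores in the reversed order
```

All three losses are lower for `good` than for `bad`, but only the pairwise and listwise losses depend on the relative order of the scores rather than on their absolute values.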
