
Topic-Sensitive PageRank

Topic-Sensitive PageRank. Taher H. Haveliwala, 2002. Abstract. Target: improving the ranking of search-query results. Before: using the link structure of the Web to capture the relative importance of Web pages, independent of any particular search query.


Presentation Transcript


  1. Topic-Sensitive PageRank Taher H. Haveliwala 2002

  2. Abstract • Target: improving the ranking of search-query results • Before: using the link structure of the Web to capture the relative importance of Web pages, independent of any particular search query • Now: a set of PageRank vectors, biased using a set of topics, to capture more accurately the notion of importance with respect to a particular topic

  3. Abstract: contribution • More accurate rankings than generic PageRank • Computes topic-sensitive PageRank scores for pages satisfying the query, using the topic of the query keywords • For searches done in context, computes the topic-sensitive PageRank scores using the topic of the context in which the query appeared

  4. 1. Introduction • HITS [14] • A link analysis algorithm • Hubs • Authorities • Extended with content analysis [4] • Used for automatically compiling resource lists for general topics [8]

  5. 1. Introduction - PageRank algorithm [7,16] • A rank vector gives a priori importance estimates for pages on the Web • Computed once • Computed offline • Independent of the search query (a drawback) • Importance scores are used in conjunction with query-specific IR scores to rank the query results

  6. 1. Introduction - Advantages of PageRank • The query-time cost of incorporating the precomputed PageRank importance score for a page is low • PageRank is generated using the entire Web graph rather than a small subset

  7. 1. Introduction - Method in this paper • Allows the query to influence the link-based score (as in HITS) • Requires minimal query-time processing (as in PageRank) • Uses a set of PageRank vectors, each biased with a different topic

  8. 1. Introduction - Making PageRank topic-sensitive • Avoids the problem of heavily linked pages getting highly ranked for queries for which they have no particular authority • Hilltop [5] is designed to improve results for popular queries • It generates a query-specific authority score by detecting and indexing pages that appear to be good experts for certain keywords • Queries for which experts were not found are not handled by the Hilltop algorithm

  9. 1. Introduction - Making PageRank topic-sensitive • [17] proposes using a set of Web pages, selected by terms, to influence the computation • An approach for enhancing search rankings by generating a PageRank vector for each possible query term was recently proposed in [18], with favorable results • That approach requires considerable processing time and storage • It is not easily extended to make use of user and query context

  10. 1. Introduction - Two query scenarios • Scenario 1: assume a user with a specific information need issues a query • Determine the topics most closely associated with the query, and use the appropriate topic-sensitive PageRank vectors for ranking the documents satisfying the query

  11. 1. Introduction - Two query scenarios • Scenario 2: the user is viewing a document (for instance, browsing the Web or reading email), and selects a term from the document for which he would like more information

  12. Summary of approach • Generate 16 topic-sensitive PageRank vectors, each using URLs from a top-level category of the Open Directory Project (ODP) • At query time, calculate the similarity of the query to each topic • Take the linear combination of the topic-sensitive vectors, weighted using the similarities of the query to the topics • Because the link-based computations are performed offline, query-time costs are low
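The query-time step on slide 12 can be sketched as a weighted sum of precomputed vectors. This is a minimal illustration, not the paper's implementation: the function name `combine_topic_ranks`, the page names, the scores, and the similarity weights are all made up for the example, and normalizing the weights to sum to 1 is an assumption. How the similarities themselves are computed is not covered here.

```python
def combine_topic_ranks(topic_ranks, topic_weights):
    """Linear combination of topic-sensitive rank vectors.

    topic_ranks:   {topic: {page: score}}, assumed to be precomputed offline.
    topic_weights: {topic: similarity of the query to that topic}.
    """
    # Assumption: normalize the similarities so the weights sum to 1.
    total = sum(topic_weights.values())
    combined = {}
    for topic, weight in topic_weights.items():
        for page, score in topic_ranks[topic].items():
            combined[page] = combined.get(page, 0.0) + (weight / total) * score
    return combined


# Hypothetical data: two topics and three pages with invented scores.
topic_ranks = {
    "Sports": {"espn.example": 0.6, "recipes.example": 0.1, "news.example": 0.3},
    "Home": {"espn.example": 0.1, "recipes.example": 0.7, "news.example": 0.2},
}
topic_weights = {"Sports": 0.8, "Home": 0.2}  # invented query-topic similarities

scores = combine_topic_ranks(topic_ranks, topic_weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Only the small dictionary lookup and weighted sum happen at query time; the per-topic vectors are fixed inputs, which is why the slide describes the query-time cost as low.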

  13. 2. Review of PageRank • Page u links to page v • Example • Yahoo! is an important page (many pages point to it) • Pages pointed to from Yahoo! are probably important too • Nu: outdegree of page u • Rank(p): importance of page p • Link (u, v) confers Rank(u)/Nu units of rank to v

  14. N is the number of pages; assign all pages the initial value 1/N • Bv represents the set of pages pointing to v • The final vector contains the PageRank vector over the Web • It is computed only once after each crawl of the Web
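The iteration reviewed on slides 13 and 14 can be sketched in a few lines: every page starts at 1/N, and in each round page v collects Rank(u)/Nu from every page u in Bv. This is a simplified sketch under stated assumptions: the function name `pagerank`, the three-page toy graph, and the fixed iteration count are illustrative, every page is assumed to have at least one outlink, and no damping or topic bias is applied.

```python
def pagerank(links, iterations=50):
    """Plain PageRank iteration on a small graph.

    links maps each page u to the list of pages it links to.
    Assumption: every page has at least one outlink, and every
    link target is itself a key of `links`.
    """
    n = len(links)
    rank = {page: 1.0 / n for page in links}  # initial value 1/N
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in links}
        for u, outlinks in links.items():
            share = rank[u] / len(outlinks)  # Rank(u)/Nu conferred per link
            for v in outlinks:
                new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical three-page Web: a -> b, a -> c, b -> c, c -> a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
print({page: round(score, 3) for page, score in ranks.items()})
```

On this toy graph the scores settle near 0.4 for a and c and 0.2 for b: page c is pointed to by both other pages, and it passes all of its rank on to a.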
