## Page Rank Modifications & Alternatives


Brett Harper

**Overview**

• Computing Customized Page Ranks
• Adaptive Ranking of Web Pages
• Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms
• An Approach to Confidence Based Page Ranking for User-Oriented Web Search
• Web Page Ranking using Link Attributes

**Computing Customized Page Ranks**

• A page's rank usually depends on how closely the document relates to the query and on the quality of the document.
• PageRank introduces document authority.
• Similar to the citation problem.
• Most proposed web ranking algorithms are based on connectivity rather than content.
• For customized ranks, the concept of page importance depends on the situation.

**Computing Customized Page Ranks**

• Current solutions build different ranks for topics, users, or queries.
• Automatic building of the ranking function from a set of user examples.

**Computing Customized Page Ranks**

• Brin & Page's PageRank
• Generalized PageRank, where x is a vector containing the ranks, W is an n × n matrix, and e is an n-vector.
• Parametric PageRank, where the a's sum to 1.

**Computing Customized Page Ranks**

• User requirements are represented as an optimization problem where the variables are the user requirements and the total number of constraints.
• The issue of how to obtain constraints is not discussed.
• A cost function (quadratic or linear) allows the ranks to be changed in accordance with the requirements.
• Methods for infeasible requirements:
  • Penalty function
  • Number of satisfied constraints, in addition to the cost function

**Computing Customized Page Ranks**

• WT10G data set
• Constraints defined
• Adaptive rank computed
• Compared to PageRank on the entire WT10G data set

**Adaptive Ranking of Web Pages**

• Alter PageRank by modifying the PageRank equation.
• Can be done from the perspective of the user or of web site administrators.
• Modify rank by changing (1-d) in the original PageRank.
• Dynamic Control
• Static Control

**Adaptive Ranking of Web Pages**

• Rules
• B is an r × n matrix, b is a rule vector of size r
• Inputs and outputs should be positive
• The cost function allows the rank of certain pages to be modified while keeping the current rank of other pages.

**Adaptive Ranking of Web Pages**

• Initial solution was to structure the problem as a quadratic programming problem.
• Second solution uses clusters to reduce the number of dimensions.
  • Pages are clustered based on score
• Vector E contains k parameters.
• Vector A is the sum of the columns in $(I - dW)^{-1}$ that correspond to a certain class.

**Adaptive Ranking of Web Pages**

• Vector E contains k parameters.
• Vector A is the sum of the columns in M that correspond to a certain class.
• H is defined as BA
• is the quadratic term
• is the linear term

**Adaptive Ranking of Web Pages**

• Contradicting constraints
  • Relax constraints to arrive at a sub-optimal solution
  • Add s to the cost function (used to balance the importance of the constraints against the original cost function)

**Adaptive Ranking of Web Pages**

• Use a clustering algorithm to split web pages into clusters.
• Compute $A_i$ for each cluster.
• If there is a feasible solution, use the first formula to find the optimal parameters $e_1, \dots, e_k$.
• If no feasible solution exists, use the version for relaxed constraints to find sub-optimal parameters $e_1, \dots, e_k$.
• Compute the rank as $x = \sum_{i=1}^{k} e_i A_i$.

**Adaptive Ranking of Web Pages**

• Used the WT10G data set for experiments
• First experiment: swap the importance of two pages located some distance Δ apart.
• Effectively modifies the PageRank
• Constraints on highly ranked pages disturb the rest of the pages more significantly.
• These disruptions appear in blocks due to clustering.
• When swapping two pages, the effect is greater on lower ranked than on higher ranked pages.
• Quality of results is influenced by the number of clusters.

**Adaptive Ranking of Web Pages**

• Second experiment: change the number of clusters
• Gradually increase the number of clusters used from 5 to 100.
• Cost function stops improving at ~60 clusters.
• Clustering can reduce the complexity level of the problem.
• The number of clusters is quite small compared to the size of the collection.

**Adaptive Ranking of Web Pages**

• Clustering techniques
  • Cluster by score
  • Cluster by rank (variable-sized cluster dimensions)
  • Cluster by rank with fixed-size cluster dimensions

**Adaptive Ranking of Web Pages**

• PageRanks can be modified, but constraints on some pages cause the ranks of all pages to be affected.
• The effect of these constraints depends on how highly ranked the constrained page is.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Damping functions reduce page importance propagation on long paths.
• Focus on linear, exponential, and hyperbolic decay.
• Exponential corresponds to the original PageRank.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• For functional rankings, a link matrix is used.
  • Normalization
  • Dangling nodes
• If P is the resulting matrix after normalization, the rank is defined as

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• An equivalent approach takes into account the branching contribution.
• Rank of a node is the weighted sum of incoming paths, with weights that decay exponentially with path length.
• PageRank is a functional ranking where the damping function is $(1-\alpha)\alpha^t$.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Linear Damping

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Hyperbolic Damping

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Empirical Damping
• Pages that are linked are similar, but the topic changes as the distance increases.
• Use the decrease in text similarity as an approximation to an empirical damping function.
• .uk domain, 18 million pages, 200 pages chosen at random, similarity measured using TF.IDF without stemming or stop-word removal
• Results show that this is better approximated by linear damping with L=8 or 9 than by exponential damping.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Approximating Hyperbolic with Exponential Damping
• Find the α that minimizes the difference of weights for different values of β and the maximum path length l.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Approximating Exponential with Linear Damping
• Find the L that minimizes the difference of weights for different values of α and the maximum path length l.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Parameters for the damping function
• Characteristic path length (average distance between two nodes) grows sub-logarithmically with the size of the graph.
• For a smaller graph, the damping function should decay faster.
• The sums of the weights up to the average path lengths of graphs L1 and L2 have to be similar for both rankings to behave in a similar way.

**Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms**

• Experimental comparison of precision (PageRank vs. LinearRank)
• Used the WebTREC Gov2 collection (25 million documents, .gov domain, 2004)
• Chose 50 queries at random to run.
• PageRank took 39 iterations to run. LinearRank was run for 5, 10, and 20 iterations.
• After the first 5 results, LinearRank had precision similar to PageRank.
• Useful when rankings can't be computed in advance.

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• Confidence is the probability of accessing a page for a specific query given past behavior.
• Use this probability to enhance page rankings of the most relevant pages.
• Should also take link structure into account.
• Merge pages with similar categories, since users lose interest after the first few results.

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• Extract important features and categories from web pages.
• Prune pages from the graph that are not relevant.
• Calculate confidence for all features and categories of each page.
• Use citations (link structure) and the confidence measure to recursively compute the page rank.

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• Extract important features and categories from web pages.
• Search the full text and extended anchor text for the most relevant features/categories.
• in the set of features where N(P,i) is the total number of times page P is accessed for query i and O(i) is the total number of queries made for i.
• Pages with high E(P,a) will likely be accessed for the topic a.

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• Prune pages from the graph that are not relevant.
• Pages without similar features/categories can be connected.
• These pages are used for extracting features/categories, but are pruned if the confidence does not meet a certain threshold.
• Citations of pruned pages are also removed.

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• Calculate confidence for all features and categories of each page.
• in the customized graph.
• Calculating C(a,P) for the entire history is not realistic, so only take recent history into account.

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• Use citations (link structure) and the confidence measure to recursively compute the page rank.
• PR(P,a) = (1-d) + d[PR(T1,a)/O(T1) + ... + PR(Tn,a)/O(Tn)], where Ti is a citing page and O(Ti) is the number of outgoing links.
• RPR(P,a) = PR(P,a) * C(a,P)
• New pages cited by many relevant high-ranked pages. Can be suppressed by including a time period.
• Substitute damping factor d with (1-C(a,P))

**An Approach to Confidence Based Page Ranking for User Oriented Web Search**

• The data set was constructed from a list of 7 queries, from which the top 30 results were obtained from Google.
• A graph of these nodes was then created, and further expanded to a depth of 2. This new graph contained 500-800 nodes.
• Higher ranked pages are not always accessed a higher number of times.
• Pages can be accessed for multiple queries.
• Pages with higher confidence tend to be ranked higher.

**Web Page Ranking using Link Attributes**

• Tries to improve on current ranking techniques by assigning different weights to links (WLRank), based on:
  • Relative position in the page
  • Tag where the link is contained
  • Length of anchor text

**Web Page Ranking using Link Attributes**

• L(j,i) is 1 if a link exists or 0 otherwise, and c is a constant that gives a base weight to every link.
• T(j,i) depends on the tag.
• AL(j,i) is the length of the anchor text divided by the average anchor text length d.
• RP(j,i) is the relative position weighted by the constant b.
• If W(j,i) = L(j,i) then it is equal to PageRank.

**Web Page Ranking using Link Attributes**

• Tested against 460k pages in the .CL domain.
• Several users provided relevance judgements on the first 10 results of several queries.
• Used c=1, b=1, and d=100.
• Only used weights for <b> and <h1> tags.
• Compared precision based on a perfect ranking for the first 10 answers.
• Improvement of 13% on average.

**Conclusions**

• PageRank can be modified to fit user requirements and specific categories.
• Different functions can be used to decay PageRank influence on path lengths.
• Can improve PageRank through clustering.

**References**

• Tsoi, A. C., Hagenbuchner, M., and Scarselli, F. 2006. Computing customized page ranks. ACM Trans. Internet Technol. 6, 4 (Nov. 2006), 381-414.
• Tsoi, A. C., Morini, G., Scarselli, F., Hagenbuchner, M., and Maggini, M. 2003. Adaptive ranking of web pages. In Proceedings of the 12th International Conference on World Wide Web (Budapest, Hungary, May 20-24, 2003). WWW '03. ACM, New York, NY, 356-365.
• Baeza-Yates, R., Boldi, P., and Castillo, C. 2006. Generalizing PageRank: damping functions for link-based ranking algorithms. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, Washington, USA, August 06-11, 2006). SIGIR '06. ACM, New York, NY, 308-315.
• Mukhopadhyay, D., Giri, D., and Singh, S. R. 2003. An approach to confidence based page ranking for user oriented Web search. SIGMOD Rec. 32, 2 (Jun. 2003), 28-33.
• Baeza-Yates, R. and Davis, E. 2004. Web page ranking using link attributes. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters (New York, NY, USA, May 19-21, 2004). WWW Alt. '04. ACM, New York, NY, 328-329.
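
**Worked sketch: functional ranking with a damping function**

The damping-function slides describe the rank of a node as a weighted sum over incoming paths, with PageRank as the special case where the weight for path length t is (1-α)α^t. The following is a minimal Python sketch of that idea, not the authors' implementation. All function names are hypothetical, dangling nodes are assumed to link uniformly to every page, and the linear damping shown, 2(L-t)/(L(L+1)) for t < L, is an assumed normalized form, since the slide's own formula did not survive extraction.

```python
# Hypothetical helper names throughout; none of them come from the slides.

def transition_matrix(links, n):
    """Row-normalize an adjacency list into a transition matrix P.

    Assumption: dangling nodes (no out-links) link uniformly to every node.
    """
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        outs = links.get(i, [])
        if outs:
            for j in outs:
                P[i][j] += 1.0 / len(outs)
        else:
            P[i] = [1.0 / n] * n
    return P


def exponential_damping(alpha):
    """damping(t) = (1 - alpha) * alpha^t, the PageRank case from the slides."""
    return lambda t: (1.0 - alpha) * alpha ** t


def linear_damping(L):
    """Assumed form: weights fall linearly to zero at path length L and sum to 1."""
    return lambda t: 2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0


def functional_rank(P, damping, max_len):
    """Sum damping(t) * (1/n) * 1 * P^t over path lengths t = 0..max_len."""
    n = len(P)
    walk = [1.0 / n] * n  # uniform start vector, i.e. the t = 0 term
    rank = [0.0] * n
    for t in range(max_len + 1):
        weight = damping(t)
        for j in range(n):
            rank[j] += weight * walk[j]
        # Advance one step along the links: walk <- walk * P
        walk = [sum(walk[i] * P[i][j] for i in range(n)) for j in range(n)]
    return rank


# Toy graph (made up for illustration): 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
P = transition_matrix(links, 4)

pagerank_like = functional_rank(P, exponential_damping(0.85), max_len=200)
linear_rank = functional_rank(P, linear_damping(8), max_len=8)

print("exponential:", [round(r, 4) for r in pagerank_like])
print("linear     :", [round(r, 4) for r in linear_rank])
```

With linear damping every weight beyond path length L is zero, so the sum stops after L steps, whereas the exponential weights only shrink gradually. That is one way to read the slides' observation that LinearRank was run for far fewer iterations than PageRank.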