
Learning to Rank: A Machine Learning Approach to Static Ranking


Presentation Transcript


  1. Learning to Rank: A Machine Learning Approach to Static Ranking 049011 - Algorithms for Large Data Sets Student Symposium Presented by Li-Tal Mashiach

  2. References • Learning to Rank Using Gradient Descent, ICML 2005, Burges et al. • Beyond PageRank: Machine Learning for Static Ranking, WWW 2006, Richardson et al. ©Li-Tal Mashiach, Technion, 2006

  3. Today’s topics • Motivation & Introduction • RankNet • fRank • Discussion • Future Work suggestion • Predict Popularity Rank (PP-Rank) ©Li-Tal Mashiach, Technion, 2006

  4. Motivation • The Web is growing exponentially in size • The number of incorrect, spamming, and malicious sites is also growing • Having a good static ranking is crucially important • Recent work has shown that PageRank may not perform any better than other simple measures on certain tasks ©Li-Tal Mashiach, Technion, 2006

  5. Motivation – Cont. • A combination of many features is more accurate than any single feature • PageRank uses only link-structure features • It is harder for malicious users to manipulate the ranking when a machine learning approach is used ©Li-Tal Mashiach, Technion, 2006

  6. Introduction • Neural networks • Training • Cost function • Gradient Descent ©Li-Tal Mashiach, Technion, 2006

  7. Neural Networks Like the brain, a neural network is a massively parallel collection of small, simple processing units, where the interconnections form a large part of the network's intelligence. ©Li-Tal Mashiach, Technion, 2006

  8. Training a neural network The task is similar to teaching a student • First, show him some examples • Then, ask him to solve some problems • Finally, correct him, and start the whole process again Hopefully, he’ll get it right after a couple of rounds ©Li-Tal Mashiach, Technion, 2006

  9. Training a neural network – cont. • Cost function – error function to minimize • Sum squared error • Cross entropy • Gradient descent • Take the derivative of the cost function with respect to the network parameters • Change those parameters in a gradient-related direction ©Li-Tal Mashiach, Technion, 2006
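
As a concrete illustration of the two gradient-descent bullets above, here is a minimal sketch (not from the slides) of gradient descent on a sum-squared-error cost for a tiny linear model; the toy data, learning rate, and step count are all assumptions.

```python
# Minimal sketch: gradient descent on a sum-squared-error cost
# for a tiny linear model y ≈ X·w (toy data, assumed hyperparameters).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # toy feature matrix
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)                          # model parameters to learn
lr = 0.05                                # learning rate (assumed)
for step in range(500):
    err = X @ w - y
    cost = 0.5 * np.sum(err ** 2)        # sum-squared-error cost function
    grad = X.T @ err                     # derivative of the cost w.r.t. w
    w -= lr * grad / len(X)              # move parameters against the gradient
    if step % 100 == 0:
        print(f"step {step:3d}  cost {cost:.2f}")
print("learned w:", w)                   # approaches true_w
```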

  10. Static ranking as a classification problem • xi represents a set of features of a Web page i • yi is a rank • The classification problem: learn the function that maps each page’s features to its rank • But all we really care about is the order of the pages ©Li-Tal Mashiach, Technion, 2006
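
A tiny illustration of the last bullet: two scoring functions that assign different values to pages but induce the same ordering are equally good for ranking. The page names and scores below are made up.

```python
# Two made-up scoring functions: different values, identical ordering.
scores_a = {"p1": 0.9, "p2": 0.5, "p3": 0.1}
scores_b = {"p1": 42.0, "p2": 7.0, "p3": 3.0}

order_a = sorted(scores_a, key=scores_a.get, reverse=True)
order_b = sorted(scores_b, key=scores_b.get, reverse=True)
assert order_a == order_b == ["p1", "p2", "p3"]   # only the order matters
```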

  11. RankNet • Optimize the order of objects, rather than the values assigned to them • RankNet is given • A collection of pairs of items Z = {<xi, xj>} • Target probabilities that Web page i is to be ranked higher than j • RankNet learns the order of the items • Using a probabilistic cost function (cross entropy) for training ©Li-Tal Mashiach, Technion, 2006
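
Below is a simplified sketch of RankNet's pairwise cross-entropy cost in the spirit of Burges et al. (2005): the modeled probability that page i ranks above page j is a sigmoid of the score difference. A linear scorer stands in for the neural network, and the toy pairs, learning rate, and helper name are illustrative assumptions rather than the paper's implementation.

```python
# Simplified RankNet-style pairwise cost: a linear scorer stands in
# for the neural network; pairs and hyperparameters are made up.
import numpy as np

def pair_cost_and_grad(w, xi, xj, target_p):
    """Cross-entropy cost for one pair, with target P(i ranked above j)."""
    o_diff = w @ (xi - xj)                    # score difference o_i - o_j
    p_ij = 1.0 / (1.0 + np.exp(-o_diff))      # modeled P(i ranked above j)
    p_ij = np.clip(p_ij, 1e-12, 1 - 1e-12)    # guard the logs
    cost = -target_p * np.log(p_ij) - (1.0 - target_p) * np.log(1.0 - p_ij)
    grad = (p_ij - target_p) * (xi - xj)      # d(cost)/d(w)
    return cost, grad

rng = np.random.default_rng(1)
w = np.zeros(4)
# toy training pairs: (features of page i, features of page j, target probability)
pairs = [(rng.normal(size=4), rng.normal(size=4), 1.0) for _ in range(50)]

for epoch in range(100):
    for xi, xj, tp in pairs:
        _, grad = pair_cost_and_grad(w, xi, xj, tp)
        w -= 0.1 * grad                       # gradient descent step
```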

  12. fRank • Uses RankNet to learn the static ranking function • Training is based on human judgments • For each query, a rating is assigned manually to a number of results • The rating measures how relevant the result is for the query ©Li-Tal Mashiach, Technion, 2006

  13. fRank – Cont. • Uses a set of features from each page: • PageRank • Popularity – number of visits • Anchor text and inlinks – total amount of text in links, number of unique words, etc. • Page – number of words, frequency of the most common term, etc. • Domain – various averages across all pages in the domain – PageRank, number of outlinks, etc. ©Li-Tal Mashiach, Technion, 2006
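
As a hedged sketch, the feature groups above could be flattened into a per-page vector roughly as follows; the Page dataclass and field names are hypothetical, and the paper's actual feature set is larger and more detailed.

```python
# Hypothetical per-page feature vector for an fRank-style ranker;
# field names are illustrative, not the paper's exact features.
from dataclasses import dataclass

@dataclass
class Page:
    pagerank: float
    visits: int                    # popularity: number of visits
    anchor_text_len: int           # total amount of text in links to the page
    anchor_unique_words: int
    num_words: int                 # page-level: number of words on the page
    top_term_freq: float           # frequency of the most common term
    domain_avg_pagerank: float     # domain-level averages
    domain_avg_outlinks: float

def feature_vector(p: Page) -> list[float]:
    """Flatten the feature groups into the input vector for the ranker."""
    return [p.pagerank, float(p.visits), float(p.anchor_text_len),
            float(p.anchor_unique_words), float(p.num_words),
            p.top_term_freq, p.domain_avg_pagerank, p.domain_avg_outlinks]
```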

  14. fRank Results • fRank performs significantly better than PageRank • The Page and Popularity feature sets were the most significant contributors • As more popularity data is collected, fRank's performance continues to improve ©Li-Tal Mashiach, Technion, 2006

  15. Discussion • The training for static ranking cannot depend on queries • Using human judgments for static ranking (?) • PageRank advantages • protection from spam • fRank is not useful for directing the crawl ©Li-Tal Mashiach, Technion, 2006

  16. Future work – PP-Rank • Training the machine to predict the popularity of a Web page • Using popularity data for training • Number of visits • How long users stay on the page • Whether they left by clicking back • … • The data should be normalized to each user's browsing pattern ©Li-Tal Mashiach, Technion, 2006
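
A hedged sketch of the per-user normalization suggested here: each user's dwell time is scaled by that user's own average before being aggregated per page, with an assumed penalty for back-clicks. The event layout, names, and weights below are all assumptions, not part of the slides.

```python
# Illustrative per-user normalization of popularity signals;
# data layout, names, and the back-click penalty are assumptions.
from collections import defaultdict

# (user_id, page_url, dwell_seconds, left_via_back_button)
events = [
    ("u1", "a.html", 120, False),
    ("u1", "b.html", 10, True),
    ("u2", "a.html", 30, False),
    ("u2", "b.html", 25, False),
]

# Per-user average dwell time, used as that user's baseline.
total = defaultdict(float)
count = defaultdict(int)
for user, _, dwell, _ in events:
    total[user] += dwell
    count[user] += 1
user_avg = {u: total[u] / count[u] for u in total}

# Per-page score: dwell relative to the user's own pattern,
# minus an assumed penalty when the user left by clicking back.
page_score = defaultdict(float)
for user, page, dwell, back in events:
    page_score[page] += dwell / user_avg[user] - (0.5 if back else 0.0)

print(dict(page_score))
```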

  17. PP-Rank - Advantages • Can predict the popularity of pages that were just created (no page links to them yet) • Can serve as a measure for directing the crawler • The rank reflects not what webmasters find interesting (PageRank), but what users find interesting ©Li-Tal Mashiach, Technion, 2006

  18. Summary • Ranking is key to a search engine's quality • Learning-based approaches to static ranking are a promising new field • RankNet • fRank • PP-Rank ©Li-Tal Mashiach, Technion, 2006

  19. ANY QUESTIONS? ©Li-Tal Mashiach, Technion, 2006
