Presentation Transcript


  1. LEHIGH UNIVERSITY Models of Trust for the Web (MTW) WWW2006 Workshop

  2. Introduction: Web Search • Google, Yahoo!, MSN Search, Ask, A9, Exalead, Gigablast + metasearch + many more! • Web search – the access to the Web for hundreds of millions of people • Hundreds of millions of queries per day • Queries + people = TRAFFIC • A HUGE incentive for web site owners to rank highly in search engine results • Communicate some message (advertising, political statement) • Install viruses, adware, etc.

  3. Introduction: Web Spam • a.k.a. search engine spam, spamdexing • Any technique to manipulate search engine results • Target page gets an undeservedly higher ranking • Many methods • Link farms, keyword stuffing, cloaking, link bombs, and more • The target of much of our work!

  4. Propagating Trust and Distrust to Demote Web Spam Baoning Wu, Vinay Goel, and Brian D. Davison Computer Science & Engineering Lehigh University Bethlehem, PA USA

  5. Outline • Background and motivation • Proposed methods • Experimental results

  6. Background: PageRank • (Page and Brin, 1998) • Uses number and status of “parents” to determine status of child • r(i+1) = (1-α) * T * r(i) + α * s • r: PageRank score vector (with N nodes) • T: transition matrix (NxN) • (1-α): decay factor; α: jump probability • s: uniform distribution (each entry 1/N) • The PageRank scores give a ranking of nodes by importance
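The PageRank update on this slide can be sketched as a short power iteration. This is only an illustration: the function name `pagerank`, the toy graph, and the iteration count are assumptions, and the sketch assumes every node has at least one outgoing link (dangling nodes would simply lose score here).

```python
def pagerank(out_links, alpha=0.15, iterations=50):
    """Iterate r(i+1) = (1 - alpha) * T * r(i) + alpha * s with uniform s.

    out_links maps each node to the list of nodes it links to.
    Every node must appear as a key (hypothetical input format).
    """
    nodes = list(out_links)
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Jump term: alpha * s, where every entry of s is 1/N.
        nxt = {v: alpha / n for v in nodes}
        # Link term: each parent splits (1 - alpha) * r equally among children.
        for parent, children in out_links.items():
            if not children:
                continue
            share = (1 - alpha) * r[parent] / len(children)
            for child in children:
                nxt[child] += share
        r = nxt
    return r


# Toy graph (hypothetical): a -> b, a -> c, b -> c, c -> a
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, node c has two parents and node b has one, so c ends up with the higher score.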

  7. Background: TrustRank • (Gyongyi and Garcia-Molina, VLDB 2004) • Uses number and trust of “parents” to determine trust status of child • t(i+1) = (1-α) * T * t(i) + α * s • t: TrustRank score vector (with N nodes) • T: transition matrix (NxN) • (1-α): decay factor • s: seed set trust score distribution • Vector of size N, but only seed nodes are non-zero • Demotes web spam by propagating trust from a known good seed set.
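The only change from PageRank in the formula above is the vector s, which is non-zero only on seed nodes. A minimal sketch follows, assuming trust is spread uniformly over the seeds and that scores are not renormalized. The function name `trustrank` and the toy graph are hypothetical.

```python
def trustrank(out_links, good_seeds, alpha=0.15, iterations=50):
    """Iterate t(i+1) = (1 - alpha) * T * t(i) + alpha * s.

    s is uniform over good_seeds and zero elsewhere (an assumption;
    the slide only says that non-seed entries are zero).
    """
    nodes = list(out_links)
    s = {v: (1.0 / len(good_seeds) if v in good_seeds else 0.0) for v in nodes}
    t = dict(s)
    for _ in range(iterations):
        nxt = {v: alpha * s[v] for v in nodes}
        for parent, children in out_links.items():
            if not children:
                continue
            # Equal splitting: the parent's trust is divided among its children.
            share = (1 - alpha) * t[parent] / len(children)
            for child in children:
                nxt[child] += share
        t = nxt
    return t


# Toy graph (hypothetical): a trusted seed "good" reaches a and b,
# while two spam pages only link to each other.
graph = {"good": ["a"], "a": ["b"], "b": [], "spam1": ["spam2"], "spam2": ["spam1"]}
trust = trustrank(graph, good_seeds={"good"})
```

Pages that cannot be reached from the seed set, such as the two spam pages here, keep a trust score of zero.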

  8. Specific Motivation • In TrustRank • Parent divides its trust among its children. • This may not be optimal – real-world trust relationships are independent of the number of trusted entities. • Distrust can also be propagated. (Diagram: nodes A and B joined by a hyperlink, with arrows labeled Trust Propagation and Distrust Propagation.)

  9. Key steps in propagation • Decay of trust (d) • Trust is not perfectly transitive. • Splitting of trust • For each parent, how to divide its score among its children. • Accumulation of trust • For each child, how to accumulate the overall score given the portions from all of its parents.

  10. Outline • Background and motivation • Proposed methods • Experimental results

  11. Choices for Trust Splitting • Given a node i with trust score TR(i) and O(i) outgoing links: • Equal splitting • Gives d*TR(i)/O(i) to each child (used by TrustRank) • Constant splitting • Gives d*TR(i) to each child • Logarithmic splitting • Gives d*TR(i)/log(1+O(i)) to each child
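The three splitting choices on this slide can be written as one small function. This is a sketch: the name `split_share`, the default d = 0.85, and the use of the natural logarithm are assumptions, since the slide does not give a value for d or a base for the log.

```python
import math


def split_share(trust, out_degree, d=0.85, method="equal"):
    """Return the trust that a parent passes to EACH of its children.

    trust is TR(i), out_degree is O(i), and d is the decay factor.
    """
    if out_degree <= 0:
        return 0.0  # no children, nothing to pass on
    if method == "equal":
        return d * trust / out_degree
    if method == "constant":
        return d * trust
    if method == "log":
        # Natural log is an assumption; the slide does not state the base.
        return d * trust / math.log(1 + out_degree)
    raise ValueError(f"unknown splitting method: {method}")


# A parent with trust 1.0 and 4 outgoing links, decay d = 0.5:
for name in ("equal", "constant", "log"):
    print(name, split_share(1.0, 4, d=0.5, method=name))
```

With equal splitting the share shrinks in proportion to the number of children, with constant splitting it does not shrink at all, and logarithmic splitting sits between the two for nodes with several outgoing links.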

  12. Choices for Trust Accumulation • Simple summation • Sum the trust values from each parent • Maximum share • Use the maximum of the trust values sent by the parents • Maximum parent • Sum the trust values but never exceed the trust score of the most-trusted parent
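The accumulation choices above can be sketched the same way. The function name `accumulate` and its argument layout are hypothetical: `shares` holds the trust values a child receives, one per parent, and `parent_scores` holds those parents' own trust scores.

```python
def accumulate(shares, parent_scores, method="sum"):
    """Combine the trust portions a child receives from its parents."""
    if not shares:
        return 0.0
    if method == "sum":
        # Simple summation of all received portions.
        return sum(shares)
    if method == "max_share":
        # Keep only the largest single portion.
        return max(shares)
    if method == "max_parent":
        # Sum the portions, capped at the most-trusted parent's score.
        return min(sum(shares), max(parent_scores))
    raise ValueError(f"unknown accumulation method: {method}")


# A child with three parents (values are made up for illustration):
shares = [0.2, 0.3, 0.4]
parents = [0.5, 0.6, 0.7]
for name in ("sum", "max_share", "max_parent"):
    print(name, accumulate(shares, parents, method=name))
```

In this example the plain sum would exceed every parent's own score, which is the case the maximum-parent cap is meant to prevent.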

  13. Propagating Distrust • Distrust can be propagated from a seed set of bad nodes. • Similar to trust propagation, but in reverse – follow incoming links, not outgoing links • Same key choices for decay, splitting and accumulation
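To illustrate propagation along incoming links, the sketch below pushes distrust from bad seeds back to the pages that link to them. It uses one particular combination of the choices above (constant splitting with maximum-share accumulation), picked only for brevity. The function name, the default d = 0.85, and the step limit are assumptions.

```python
def propagate_distrust(out_links, bad_seeds, d=0.85, steps=10):
    """Spread distrust from bad seeds to the nodes that link to them."""
    # Build the reverse view of the graph: node -> list of its parents.
    nodes = set(out_links)
    in_links = {}
    for parent, children in out_links.items():
        for child in children:
            nodes.add(child)
            in_links.setdefault(child, []).append(parent)

    distrust = {v: (1.0 if v in bad_seeds else 0.0) for v in nodes}
    for _ in range(steps):
        updated = dict(distrust)
        for node, parents in in_links.items():
            for parent in parents:
                # Constant splitting: each parent receives d * distrust(node).
                # Maximum share: a parent keeps the largest value it is sent.
                updated[parent] = max(updated[parent], d * distrust[node])
        if updated == distrust:
            break  # nothing changed, so propagation has converged
        distrust = updated
    return distrust


# Toy graph (hypothetical): x -> y -> spam -> z
graph = {"x": ["y"], "y": ["spam"], "spam": ["z"], "z": []}
scores = propagate_distrust(graph, bad_seeds={"spam"})
```

Here y and x pick up distrust because they lead to the spam page, while z, which is only linked from it, stays at zero.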

  14. Combining Trust and Distrust • For each node i with trust score TR(i) and distrust score DIS_TR(i), the combined score Total(i) can be Total(i) = η * TR(i) - β * DIS_TR(i), where 0 ≤ η ≤ 1 and 0 ≤ β ≤ 1
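The combination formula is a weighted difference per node. A minimal sketch, with a hypothetical function name and made-up scores, treating a missing score as zero (an assumption):

```python
def combine(trust, distrust, eta=0.5, beta=0.5):
    """Total(i) = eta * TR(i) - beta * DIS_TR(i) for every node i."""
    if not (0 <= eta <= 1 and 0 <= beta <= 1):
        raise ValueError("eta and beta must both lie in [0, 1]")
    nodes = set(trust) | set(distrust)
    return {
        v: eta * trust.get(v, 0.0) - beta * distrust.get(v, 0.0)
        for v in nodes
    }


# Made-up scores: "a" is trusted only; "b" is weakly trusted and distrusted.
total = combine({"a": 0.8, "b": 0.2}, {"b": 0.6}, eta=0.5, beta=0.5)
```

A node whose weighted distrust exceeds its weighted trust ends up with a negative total, which pushes it toward the bottom of the ranking.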

  15. Outline • Background and motivation • Proposed methods • Experimental results

  16. Data set • 20M pages from the Swiss search engine [search.ch] in 2004 • 350K sites with “.ch” domain • We used only this site graph • Seed sets • 3,589 sites labeled as using web spam of various kinds (labels provided) • 20,005 sites with pages in dir.search.ch topics, used as the trusted set

  17. Experimental Design • Explore various combinations of trust and distrust propagation • Evaluation • TrustRank's performance is measured as the number of spam sites found among the highest-ranked ~1% of sites. • We use the same metric in this work.
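The evaluation metric can be sketched as a count over the top of the ranking. The function name `spam_in_top`, the rounding of the cutoff, and the synthetic scores are assumptions made for illustration. They are not the workshop's data.

```python
def spam_in_top(scores, spam_sites, fraction=0.01):
    """Count known spam sites among the highest-ranked fraction of sites."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Keep at least one site so the cutoff is never empty.
    k = max(1, round(len(ranked) * fraction))
    return sum(1 for site in ranked[:k] if site in spam_sites)


# Synthetic example: 200 sites, where site0 has the highest score.
scores = {f"site{i}": 200 - i for i in range(200)}
count = spam_in_top(scores, spam_sites={"site1", "site150"})
```

A lower count means fewer known spam sites remain near the top of the ranking, so under this metric lower is better.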

  18. Baseline result

  19. Simple TrustRank Improvement: Increase jump probability (α) • Default α = 0.15 (chart axis: α)

  20. Other trust propagation methods

  21. Results of propagating distrust • Combined equally with TrustRank, 200 seeds

  22. Combining trust and distrust • Using best scoring trust and distrust formulations, β = 1 - η (Chart annotations: “>2200”; axis endpoints “Distrust Only” and “Trust Only”.)

  23. Coverage of trust propagation • Percentage of sites affected by approach. TrustRank reached 76.05%.

  24. Conclusions • Propagating trust based on outdegree does not appear to be optimal. • Alternative splitting and accumulation methods can help to demote top ranked spam sites. • Propagating distrust can also help to demote top ranked spam sites. • Additional tests needed! • E.g., to examine impact on retrieval

  25. Thank You! Questions? Contact Info: Dr. Brian D. Davison davison(at)cse.lehigh.edu WUME Laboratory Computer Science and Engineering Lehigh University Bethlehem, PA 18015 USA The WUME Lab http://wume.cse.lehigh.edu/