
Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings

Baoning Wu and Brian D. Davison, Lehigh University. Symposium on Applied Computing 2006.


Presentation Transcript


  1. Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings Baoning Wu and Brian D. Davison Lehigh University Symposium on Applied Computing 2006

  2. Motivation • Link-based ranking algorithms are important to current popular search engines (e.g., HITS for Teoma). • Link farms degrade the performance of link-based ranking algorithms.

  3. HITS algorithm • Each page has two measures: the authority score a shows how good the page is for a query, and the hub score h shows how likely the page is to point to good authority pages. • E is the adjacency matrix: $a = E^{T} h$, $h = E a$
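The two update rules on this slide can be sketched as a short power iteration. This is a minimal illustration, not the authors' implementation: the edge-list representation, the function name `hits`, the fixed iteration count, the L2 normalization, and the toy graph are all assumptions made for the example.

```python
import math

def hits(links, n, iterations=50):
    """Run HITS on pages 0..n-1; links is a list of (source, target) pairs."""
    a = [1.0] * n  # authority scores
    h = [1.0] * n  # hub scores
    for _ in range(iterations):
        # a = E^T h: a page's authority is the sum of hub scores linking to it.
        new_a = [0.0] * n
        for src, dst in links:
            new_a[dst] += h[src]
        # h = E a: a page's hub score is the sum of authorities it links to.
        new_h = [0.0] * n
        for src, dst in links:
            new_h[src] += new_a[dst]
        a_norm = math.sqrt(sum(x * x for x in new_a))
        h_norm = math.sqrt(sum(x * x for x in new_h))
        if a_norm == 0 or h_norm == 0:
            break  # graph has no links; keep the previous scores
        # Normalize so the scores do not grow without bound (assumed L2 norm).
        a = [x / a_norm for x in new_a]
        h = [x / h_norm for x in new_h]
    return a, h

# Hypothetical toy graph: pages 0 and 1 link to page 2; page 2 links to page 3.
toy_links = [(0, 2), (1, 2), (2, 3)]
authority, hub = hits(toy_links, n=4)
```

In this toy graph page 2 ends up with the highest authority score, and pages 0 and 1 share the highest hub score, which is the mutual reinforcement that link farms exploit.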

  4. Example: for query “weather” • http://www.tripadvisor.com/ • http://www.virtualtourist.com/ • http://www.abed.com/memoryfoam.html • http://www.abed.com/furniture.html • http://www.rental-car.us/ • http://www.accommodation-specials.com/ • http://www.lasikeyesurgery.com/ • http://www.lasikeyesurgery.com/lasik-surgery.asp • http://mortgage-rate-refinancing.com/ • http://mortgage-rate-refinancing.com/mortgage-calculator.html

  5. Factors that degrade HITS • Mutually reinforcing relationships • Duplicate pages • Link farms

  6. Complete hyperlink • Definition: the link together with its anchor text, treated as a unit. • Duplication of a complete link is a much stronger sign of copying behavior on the Web than duplication of a link target alone.
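As an illustration of the definition, a complete link can be represented as a (target URL, anchor text) pair. The sketch below uses Python's standard `html.parser`; the class and function names, the example pages, and the choice to lowercase the anchor text and collapse whitespace are assumptions for the example, not details taken from the slides.

```python
from html.parser import HTMLParser

class CompleteLinkParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumed normalization: collapse whitespace and lowercase.
            anchor = " ".join("".join(self._text).split()).lower()
            self.links.append((self._href, anchor))
            self._href = None

def complete_links(html):
    """Return the set of complete links (URL plus anchor text) in a page."""
    parser = CompleteLinkParser()
    parser.feed(html)
    parser.close()
    return set(parser.links)

# Hypothetical pages: both link to the same target, but only one
# link also shares the anchor text.
page_a = '<a href="http://example.com/">Cheap Cars</a>'
page_b = ('<a href="http://example.com/">cheap  cars</a> '
          '<a href="http://example.com/">Home</a>')
shared = complete_links(page_a) & complete_links(page_b)
```

Here `page_b` contains two complete links to the same target, and only the one whose anchor text matches counts as shared with `page_a`.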

  7. Document - Complete link Matrix

  8. Bipartite Graph • The vertices form two disjoint sets X and Y; each edge starts at an element of X and ends at an element of Y.

  9. Link farms • Link farms are usually densely connected via multiple overlapping small bipartite cores. • Task: detect densely connected bipartite components in the “document - complete link” matrix
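One way to approach this task is iterative pruning of the document-to-complete-link relation. The sketch below is a generic heuristic, not the algorithm from the presentation, whose details do not appear in this transcript. The meaning given to the thresholds (a complete link must occur in at least `k` documents, and a document must hold at least `l` such links) is an assumption, as are the function name and the example data.

```python
def dense_bipartite_pruning(doc_links, k=2, l=2):
    """Prune a {document: set of complete links} mapping until stable.

    Assumed thresholds: a complete link survives only if at least k
    surviving documents contain it; a document survives only if it
    keeps at least l surviving complete links.
    """
    docs = {d: set(links) for d, links in doc_links.items()}
    changed = True
    while changed:
        changed = False
        # Count how many surviving documents contain each complete link.
        counts = {}
        for links in docs.values():
            for link in links:
                counts[link] = counts.get(link, 0) + 1
        for d in list(docs):
            kept = {link for link in docs[d] if counts[link] >= k}
            if len(kept) < l:
                del docs[d]          # too few shared links: drop the document
                changed = True
            elif kept != docs[d]:
                docs[d] = kept       # drop links that are not shared enough
                changed = True
    return docs

# Hypothetical complete links, written as (URL, anchor text) pairs.
L1 = ("http://a.example/", "cheap cars")
L2 = ("http://b.example/", "hotel deals")
L3 = ("http://c.example/", "news")
L4 = ("http://d.example/", "weather")
L5 = ("http://e.example/", "maps")

doc_links = {
    "p1": {L1, L2, L3},
    "p2": {L1, L2},
    "p3": {L1, L4},
    "p4": {L5},
}
suspects = dense_bipartite_pruning(doc_links, k=2, l=2)
```

With these inputs only `p1` and `p2` survive, because they are the only documents that share two complete links with each other.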

  10. Algorithm for finding bipartite components

  11. Result: k=2 and l=2

  12. Adjustment: document-document matrix

  13. Final matrix

  14. Weighted adjacency matrix

  15. Experiment: HITS result of “rental car” • http://www.discountcars.net/ • http://www.motel-discounts.com/ • http://www.stlouishoteldeals.com/ • http://www.richmondhoteldeals.com/ • http://www.jacksonvillehoteldeals.com/ • http://www.jacksonhoteldeals.com/ • http://www.keywesthoteldeals.com/ • http://www.austinhoteldeals.com/ • http://www.gatlinburghoteldeals.com/ • http://www.ashevillehoteldeals.com/

  16. Experiment: B&H HITS result of “rental car” • http://www.rentadeal.com/ • http://www.allaboutstlouis.com/ • http://www.allaboutboston.com/ • https://travel2.securesites.com/about_travelguides/addlisting.html • http://www.allaboutsanfranciscoca.com/ • http://www.allaboutwashingtondc.com/ • http://www.allaboutalbuquerque.com/ • http://www.allabout-losangeles.com/ • http://www.allabout-denver.com/ • http://www.allabout-chicago.com/

  17. Experiment: CL-HITS result of “rental car” • http://www.hertz.com/ • http://www.avis.com/ • http://www.nationalcar.com/ • http://www.thrifty.com/ • http://www.dollar.com/ • http://www.alamo.com/ • http://www.budget.com/ • http://www.enterprise.com/ • http://www.budgetrentacar.com/ • http://www.europcar.com/

  18. Experiment: B&H HITS result of “translation online” • http://www.no-gambling.com/ • http://www.teleorg.org/ • http://ong.altervista.org/ • http://bx.b0x.com/ • http://video-poker.batcave.net/ • http://www.websamba.com/marketing-campaigns • http://online-casino.o-f.com/ • http://caribbean-poker.webxis.com/ • http://roulette.zomi.net/ • http://teleservices.netfirms.com/

  19. Experiment: CL-HITS result of “translation online” • http://www.freetranslation.com/ • http://www.systransoft.com/ • http://babelfish.altavista.com/ • http://www.yourdictionary.com/ • http://dictionaries.travlang.com/ • http://www.google.com/ • http://www.foreignword.com/ • http://www.babylon.com/ • http://www.worldlingo.com/products_services/worldlingo_translator.html • http://www.allwords.com/

  20. Duplicate example: B&H HITS result of “maps” • http://www.maps.com/ • http://www.mapsworldwide.com/ • http://www.cartographic.com/ • http://www.amaps.com/ • http://www.cdmaps.com/ • http://www.ewpnet.com/maps.htm • http://mapsguidesandmore.com/ • http://www.njdiningguide.com/maps.html • http://www.stanfords.co.uk/ • http://www.delorme.com/

  21. Duplicate example: CL-HITS result of “maps” • http://www.maps.com/ • http://maps.yahoo.com/ • http://www.delorme.com/ • http://tiger.census.gov/ • http://www.davidrumsey.com/ • http://memory.loc.gov/ammem/gmdhtml/gmdhome.html • http://www.esri.com/ • http://www.maptech.com/ • http://www.streetmap.co.uk/ • http://www.libs.uga.edu/darchive/hargrett/maps/maps.html

  22. User evaluation

  23. Discussion • Using the link alone (without its anchor text), the precision at 10 is 66.4%, much lower than when using the “complete link”. • Random anchor texts.
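For reference, precision at 10 is the fraction of the top 10 results judged relevant. The sketch below shows the arithmetic only; the function name and the judgments are hypothetical and are not data from the user evaluation.

```python
def precision_at_k(relevance, k=10):
    """Fraction of the top-k results judged relevant.

    relevance is a ranked list of booleans, best-ranked result first.
    """
    top = relevance[:k]
    return sum(top) / k

# Hypothetical judgments for one query's top 10 results.
judgments = [True, True, False, True, True, True, False, True, False, True]
p_at_10 = precision_at_k(judgments)  # 7 relevant out of 10
```

A figure reported over a set of queries would then be the mean of this value across the queries.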

  24. Questions? • baw4@cse.lehigh.edu • davison@cse.lehigh.edu
