Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings Baoning Wu and Brian D. Davison Lehigh University Symposium on Applied Computing 2006
Motivation • Link-based ranking algorithms are important to current popular search engines (e.g., HITS for Teoma). • Link farms degrade the performance of link-based ranking algorithms.
HITS algorithm • Each page has two scores: the authority score a, which measures how good the page is for the query, and the hub score h, which measures how likely the page is to point to good authority pages. E is the adjacency matrix of the link graph ($E_{ij} = 1$ if page i links to page j). • $a = E^{T} h$, $h = E a$
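The two update rules above are usually solved by power iteration: alternate $a = E^{T} h$ and $h = E a$, normalizing after each step until the scores stop changing. The following is a minimal sketch assuming a dense NumPy adjacency matrix, L2 normalization, and a fixed iteration cap; the function name `hits` and its parameters are illustrative, not taken from the slides or from Teoma.

```python
import numpy as np


def hits(E: np.ndarray, iters: int = 50, tol: float = 1e-10):
    """Power-iteration sketch of HITS (assumed setup, not the authors' code).

    E[i, j] = 1 if page i links to page j.
    Returns (authority, hub) vectors, each scaled to unit L2 norm.
    """
    n = E.shape[0]
    a = np.ones(n)
    h = np.ones(n)
    for _ in range(iters):
        a_new = E.T @ h  # a = E^T h: pages pointed to by good hubs score high
        h_new = E @ a_new  # h = E a: pages pointing to good authorities score high
        # Normalize so the scores do not grow without bound; guard against all-zero vectors.
        a_new /= np.linalg.norm(a_new) or 1.0
        h_new /= np.linalg.norm(h_new) or 1.0
        converged = np.allclose(a, a_new, atol=tol) and np.allclose(h, h_new, atol=tol)
        a, h = a_new, h_new
        if converged:
            break
    return a, h


if __name__ == "__main__":
    # Toy graph: pages 0 and 1 both link to page 2.
    E = np.array([[0, 0, 1], [0, 0, 1], [0, 0, 0]])
    authority, hub = hits(E)
    print(authority, hub)
```

On this toy graph page 2 receives all of the authority score and pages 0 and 1 share the hub score equally. The same mutual reinforcement is what a link farm exploits: a dense block of pages pointing at each other inflates both scores at once.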
Example: for query “weather” • http://www.tripadvisor.com/ • http://www.virtualtourist.com/ • http://www.abed.com/memoryfoam.html • http://www.abed.com/furniture.html • http://www.rental-car.us/ • http://www.accommodation-specials.com/ • http://www.lasikeyesurgery.com/ • http://www.lasikeyesurgery.com/lasik-surgery.asp • http://mortgage-rate-refinancing.com/ • http://mortgage-rate-refinancing.com/mortgage-calculator.html
Factors that degrade HITS • Mutually reinforcing relationships • Duplicate pages • Link farms
Complete hyperlink • Definition: a link together with its anchor text, treated as a single unit. • Duplication of a complete link is a much stronger sign of copying behavior on the Web than duplication of a link target alone.
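To make the definition concrete, the sketch below extracts complete links, i.e. (href, anchor text) pairs, from HTML using Python's standard `html.parser`. Two pages that point to the same target with different anchor texts share a link target but not a complete link. The whitespace and case normalization of anchor text is an assumption for illustration; the slides do not say how anchors were normalized, and the names `CompleteLinkParser` and `complete_links` are hypothetical.

```python
from html.parser import HTMLParser


class CompleteLinkParser(HTMLParser):
    """Collects complete links, i.e. (href, anchor text) pairs, from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, normalized anchor text)
        self._href = None  # href of the <a> element currently open, if any
        self._text = []  # text fragments seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumed normalization: collapse whitespace and lowercase the anchor text.
            anchor = " ".join("".join(self._text).split()).lower()
            self.links.append((self._href, anchor))
            self._href = None


def complete_links(html: str) -> set:
    """Return the set of complete links found in an HTML string."""
    parser = CompleteLinkParser()
    parser.feed(html)
    parser.close()
    return set(parser.links)


if __name__ == "__main__":
    page_a = '<a href="http://x.example/">Cheap cars</a>'
    page_b = '<a href="http://x.example/">Rental home page</a>'
    print(complete_links(page_a) & complete_links(page_b))  # same target, no shared complete link
```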
Bipartite Graph • A graph whose vertices form two disjoint sets X and Y, where every edge starts from an element of X and ends at an element of Y.
Link farms • Link farms are usually densely connected via multiple overlapping small bipartite cores. • Task: to detect densely connected bipartite components from “document - complete link” matrix
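One simple way to pull densely connected bipartite components out of a "document - complete link" matrix is iterative degree pruning, a technique borrowed here from bipartite-core enumeration in Web mining: repeatedly drop documents containing fewer than `min_links` surviving complete links, and complete links shared by fewer than `min_docs` surviving documents. This is a swapped-in illustration, not necessarily the detection procedure used by the authors; the function name and both thresholds are hypothetical.

```python
from collections import defaultdict


def dense_bipartite_core(doc_links: dict, min_links: int = 3, min_docs: int = 3):
    """Iteratively prune a document x complete-link bipartite graph (illustrative sketch).

    doc_links maps a document id to the set of complete links it contains.
    On return, every kept document has >= min_links kept complete links and
    every kept complete link appears in >= min_docs kept documents.
    Returns (kept_documents, kept_complete_links).
    """
    docs = {d: set(links) for d, links in doc_links.items()}  # copy; input is not mutated
    changed = True
    while changed:
        changed = False
        # For each complete link, which surviving documents contain it?
        link_docs = defaultdict(set)
        for d, links in docs.items():
            for link in links:
                link_docs[link].add(d)
        weak_links = {link for link, ds in link_docs.items() if len(ds) < min_docs}
        for d in list(docs):
            before = len(docs[d])
            docs[d] -= weak_links
            if len(docs[d]) < min_links:
                del docs[d]  # document no longer belongs to a dense component
                changed = True
            elif len(docs[d]) != before:
                changed = True
    return set(docs), set().union(*docs.values())


if __name__ == "__main__":
    farm = {("http://a.example/", "cheap hotels"),
            ("http://b.example/", "hotel deals"),
            ("http://c.example/", "car rental")}
    pages = {"f1": farm, "f2": farm, "f3": farm,
             "other": {("http://a.example/", "cheap hotels"), ("http://www.hertz.com/", "hertz")}}
    print(dense_bipartite_core(pages))
```

In the toy input, three pages repeat the same three complete links and survive pruning, while the page that copies only one of them drops out. The surviving complete links are candidates for removal or down-weighting before HITS is run; how CL-HITS treats them exactly is not stated on this slide.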
Experiment: HITS result of “rental car” • http://www.discountcars.net/ • http://www.motel-discounts.com/ • http://www.stlouishoteldeals.com/ • http://www.richmondhoteldeals.com/ • http://www.jacksonvillehoteldeals.com/ • http://www.jacksonhoteldeals.com/ • http://www.keywesthoteldeals.com/ • http://www.austinhoteldeals.com/ • http://www.gatlinburghoteldeals.com/ • http://www.ashevillehoteldeals.com/
Experiment: B&H HITS result of “rental car” • http://www.rentadeal.com/ • http://www.allaboutstlouis.com/ • http://www.allaboutboston.com/ • https://travel2.securesites.com/about_travelguides/addlisting.html • http://www.allaboutsanfranciscoca.com/ • http://www.allaboutwashingtondc.com/ • http://www.allaboutalbuquerque.com/ • http://www.allabout-losangeles.com/ • http://www.allabout-denver.com/ • http://www.allabout-chicago.com/
Experiment: CL-HITS result of “rental car” • http://www.hertz.com/ • http://www.avis.com/ • http://www.nationalcar.com/ • http://www.thrifty.com/ • http://www.dollar.com/ • http://www.alamo.com/ • http://www.budget.com/ • http://www.enterprise.com/ • http://www.budgetrentacar.com/ • http://www.europcar.com/
Experiment: B&H HITS result of “translation online” • http://www.no-gambling.com/ • http://www.teleorg.org/ • http://ong.altervista.org/ • http://bx.b0x.com/ • http://video-poker.batcave.net/ • http://www.websamba.com/marketing-campaigns • http://online-casino.o-f.com/ • http://caribbean-poker.webxis.com/ • http://roulette.zomi.net/ • http://teleservices.netfirms.com/
Experiment: CL-HITS result of “translation online” • http://www.freetranslation.com/ • http://www.systransoft.com/ • http://babelfish.altavista.com/ • http://www.yourdictionary.com/ • http://dictionaries.travlang.com/ • http://www.google.com/ • http://www.foreignword.com/ • http://www.babylon.com/ • http://www.worldlingo.com/products_services/worldlingo_translator.html • http://www.allwords.com/
Duplicate example: B&H HITS result of “maps” • http://www.maps.com/ • http://www.mapsworldwide.com/ • http://www.cartographic.com/ • http://www.amaps.com/ • http://www.cdmaps.com/ • http://www.ewpnet.com/maps.htm • http://mapsguidesandmore.com/ • http://www.njdiningguide.com/maps.html • http://www.stanfords.co.uk/ • http://www.delorme.com/
Duplicate example: CL-HITS result of “maps” • http://www.maps.com/ • http://maps.yahoo.com/ • http://www.delorme.com/ • http://tiger.census.gov/ • http://www.davidrumsey.com/ • http://memory.loc.gov/ammem/gmdhtml/gmdhome.html • http://www.esri.com/ • http://www.maptech.com/ • http://www.streetmap.co.uk/ • http://www.libs.uga.edu/darchive/hargrett/maps/maps.html
Discussion • Using the link target alone (without its anchor text), precision at 10 is 66.4%, much lower than when using the complete link. • Random anchor texts.
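Precision at 10 is the fraction of the top 10 results judged relevant, typically averaged over the query set; the slide does not state how the 66.4% figure was aggregated. A minimal sketch of the metric, with hypothetical function names and made-up relevance judgments used only to exercise the code:

```python
from statistics import mean


def precision_at_k(ranked: list, relevant: set, k: int = 10) -> float:
    """Fraction of the top-k ranked results that appear in the relevant set."""
    return sum(1 for r in ranked[:k] if r in relevant) / k


def mean_precision_at_k(runs: list, k: int = 10) -> float:
    """Average precision@k over (ranked_list, relevant_set) pairs, one per query."""
    return mean(precision_at_k(ranked, relevant, k) for ranked, relevant in runs)


if __name__ == "__main__":
    ranked = [f"http://site{i}.example/" for i in range(10)]
    judged_relevant = set(ranked[:7])  # hypothetical judgments: 7 of the top 10 relevant
    print(precision_at_k(ranked, judged_relevant))
```

Dividing by k rather than by the number of returned results follows the common convention, so a query with fewer than 10 results is penalized for the missing slots.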
Questions? • baw4@cse.lehigh.edu • davison@cse.lehigh.edu