Measures and Visualization of Search Engine Closeness Study

Measuring Closeness of Search Engines - Identification of Outliers - Visualization of Closeness. Wang Hua (王化), fourth-year student, Department of Information Science

Motivation • Too many search engines • More than 20 major general-purpose engines • More specific-purpose engines • Simple aggregation of rankings is popular. • We address the need to quantify and visualize the closeness between search engines.

Too Many Search Engines with Different Policies • Major search engines • Yahoo, Altavista, Google, Lycos, etc. • Distinct ranking policies • Directory type • Robot type • PageRank type using hyperlinks

Outline of Methods • Ranking • List distance measure • Distance between search engines

Ranking • Partial List • Cases for WWW web sites • Top 100 list

List of results from search engines

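Footrule Distance among Ranking Lists • s, t: ranking lists; s(i) and t(i) are the positions of item i in each list • F(s, t) = Σ_i |s(i) - t(i)| • Example: s = [a,b,c,d,e], t = [a,d,e,c,b]: 0+3+1+2+2 = 8 (terms for a, b, c, d, e in that order)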
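The footrule sum can be checked with a few lines of Python. This is a minimal sketch, assuming both lists are full rankings of the same items with no duplicates; the function name `footrule_distance` is chosen here for illustration and does not come from the slides.

```python
def footrule_distance(s, t):
    """Sum over all items of |position in s - position in t|."""
    # Assumption: s and t rank exactly the same items, each item once.
    if len(s) != len(set(s)) or len(t) != len(s) or set(s) != set(t):
        raise ValueError("s and t must be duplicate-free rankings of the same items")
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


# The two lists from the slide.
example = footrule_distance(list("abcde"), list("adecb"))
```

With the slide's lists, `example` evaluates to 8. Positions are counted from 0 here, which does not change the differences.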

Kendall-tau Distance • Definition [Dwork, WWW10, 2001] • Counts the number of pairwise disagreements between two lists: K(s, t) = | { (i, j) : i < j, s(i) < s(j) but t(i) > t(j) } | • Example: s = [a,b,c,d], t = [a,d,c,b] • 6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d) give 0+0+0+1+1+1 = 3
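The pair counting can be written directly from the definition. The sketch below is a straightforward quadratic-time version, assuming both lists are full rankings of the same items; `kendall_tau_distance` is an illustrative name, not one taken from the slides.

```python
from itertools import combinations


def kendall_tau_distance(s, t):
    """Number of item pairs that the two rankings order in opposite ways."""
    pos_s = {item: rank for rank, item in enumerate(s)}
    pos_t = {item: rank for rank, item in enumerate(t)}
    # Assumption: both lists rank exactly the same items, each item once.
    if len(pos_s) != len(s) or len(pos_t) != len(t) or pos_s.keys() != pos_t.keys():
        raise ValueError("s and t must be duplicate-free rankings of the same items")
    # A pair disagrees when the two rank differences have opposite signs.
    return sum(
        1
        for x, y in combinations(s, 2)
        if (pos_s[x] - pos_s[y]) * (pos_t[x] - pos_t[y]) < 0
    )


# The two lists from the slide.
example = kendall_tau_distance(list("abcd"), list("adcb"))
```

For the slide's lists, the disagreeing pairs are (b,c), (b,d) and (c,d), so `example` is 3.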

Characteristics of the Distance • Kendall-tau can be computed in O(n log n) time • Satisfies the triangle inequality, as a norm-based distance does
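One common way to reach the O(n log n) bound is to rewrite one list as positions in the other and count inversions with merge sort. The slides do not say which algorithm was used, so the following is only a sketch of that standard technique; the function names are illustrative, and both lists are assumed to rank the same items.

```python
def _sort_and_count(seq):
    """Return (sorted copy of seq, number of inversions in seq)."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = _sort_and_count(seq[:mid])
    right, inv_right = _sort_and_count(seq[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] jumps ahead of every element still waiting in left.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_fast(s, t):
    """Kendall-tau distance as the inversion count of t-positions read in s order."""
    pos_t = {item: rank for rank, item in enumerate(t)}
    ranks = [pos_t[item] for item in s]  # raises KeyError if an item is missing from t
    return _sort_and_count(ranks)[1]
```

For s = [a,b,c,d] and t = [a,d,c,b], the rewritten sequence is [0, 3, 2, 1], which contains 3 inversions.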

Distance Matrix • Keyword = “university”
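A distance matrix of this kind can be built by applying a list distance to every pair of engines. The sketch below uses the footrule distance and made-up rankings; the engine names `engine_A`, `engine_B`, `engine_C` and the page labels are hypothetical placeholders, not the data behind the slide.

```python
def footrule(s, t):
    """Footrule distance between two full rankings of the same items."""
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


def distance_matrix(rankings, dist=footrule):
    """Return (engine names, square matrix of pairwise list distances)."""
    names = sorted(rankings)
    matrix = [[dist(rankings[a], rankings[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical result lists for one keyword (illustration only).
toy_rankings = {
    "engine_A": ["p1", "p2", "p3", "p4"],
    "engine_B": ["p1", "p2", "p4", "p3"],
    "engine_C": ["p4", "p3", "p2", "p1"],
}
names, matrix = distance_matrix(toy_rankings)
for name, row in zip(names, matrix):
    print(name, row)
```

In this toy input, `engine_A` and `engine_B` differ only in their last two results, so they end up much closer to each other than either is to `engine_C`.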

Visualization • Kernighan-Lin Algorithm • Kamada Spring Model • Comparison of the 2 methods

Kernighan-Lin Method • Brief explanation

Kernighan-Lin by Color Coding • Keyword1=“Totti” Keyword2=“Nakata”

Kernighan-Lin by Color Coding • Keyword1=“Gucci” Keyword2=“Hermes”

Kamada Spring Model • Brief explanation
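As a rough illustration of the spring idea: each pair of nodes is joined by a spring whose natural length is the desired distance, and the layout minimizes the total spring energy. Kamada and Kawai's original method moves one node at a time with Newton-Raphson steps; the sketch below swaps in plain gradient descent for brevity, so it is not the implementation used in this study. The function name, step size, and iteration count are assumptions, and target distances should be positive and scaled to roughly 1 for this step size to behave.

```python
import math


def spring_layout(dist, iterations=500, step=0.05):
    """Place n nodes in 2D so pairwise distances approximate dist[i][j]."""
    n = len(dist)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i in range(n)]
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9   # avoid division by zero
                k = 1.0 / dist[i][j] ** 2        # spring strength, as in Kamada-Kawai
                coeff = k * (d - dist[i][j]) / d
                grads[i][0] += coeff * dx
                grads[i][1] += coeff * dy
        # Move every node a small step against its energy gradient.
        for i in range(n):
            pos[i][0] -= step * grads[i][0]
            pos[i][1] -= step * grads[i][1]
    return pos


# Three nodes that should all sit at distance 1 from each other.
layout = spring_layout([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
```

For the three-node input, the nodes settle into an equilateral triangle with unit side length. In the search engine setting, `dist` would be a normalized matrix of list distances between engines.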

An example

Kamada Spring Model • Keyword1=“Totti” Keyword2=“Nakata”

Comparison of the 2 methods

Results • Distances between search engines vary from pair to pair. • Different fields show different characteristics. • Some search engines, such as Sprinks, are far from the others. • Excite and AOL are close to each other in most cases.

Conclusion • Addressed the need to quantify and visualize the closeness between search engines. • Provides users with a GUI to see the closeness of search engines. • Helps users select suitable search engines. • Helps users see the characteristics of each search engine in various fields.

Future Work • Use more search engines • Use both general-purpose and special-purpose search engines • Use hyperlinks to find the resemblance • Apply this idea to other fields
