Wang Hua (王化), fourth-year student, Department of Information Science

Measuring Closeness of Search Engines - Identification of Outliers - Visualization of Closeness. Motivation: there are too many search engines, with more than 20 major general-purpose engines plus additional special-purpose ones, and simple aggregation of rankings is popular.

Presentation Transcript


  1. Measuring Closeness of Search Engines - Identification of Outliers - Visualization of Closeness. Wang Hua (王化), fourth-year student, Department of Information Science

  2. Motivation • Too many search engines • More than 20 major general-purpose engines • More specific-purpose engines • Simple aggregation of rankings is popular. • We address the need to quantify and visualize the closeness between search engines.

  3. Too Many Search Engines with Different Policies • Major search engines • Yahoo, AltaVista, Google, Lycos, etc. • Distinct ranking policies • Directory type • Robot type • PageRank type, using hyperlinks

  4. Outline of Methods • Ranking • List distance measure • Distance between search engines

  5. Ranking • Partial List • Cases for WWW web sites • Top 100 list

  6. List of results from search engines

  7. Footrule Distance among Ranking Lists • s, t: ranking lists; s(i) is the position of item i in s • Σ_i |s(i) - t(i)| • Example: [a,b,c,d,e] vs. [a,d,e,c,b]: 0+3+1+2+2 = 8 (terms for a, b, c, d, e in that order)
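
The footrule sum on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming both lists rank exactly the same items; the function name `footrule_distance` is ours, not from the talk.

```python
def footrule_distance(s, t):
    """Sum over items of |position in s - position in t|.

    Assumes s and t are rankings of the same set of items.
    """
    if len(s) != len(t) or set(s) != set(t):
        raise ValueError("both lists must rank the same items")
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(abs(rank - pos_t[item]) for rank, item in enumerate(s))


# The slide's example: a stays put, b moves 3 places, c 1, d 2, e 2.
print(footrule_distance(list("abcde"), list("adecb")))  # 8
```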

  8. Kendall-tau Distance • Definition [Dwork, WWW10, 2001] • Counts the number of pairwise disagreements between two lists: | { (i, j) : i < j, s(i) < s(j) but t(i) > t(j) } | • Example: [a,b,c,d] vs. [a,d,c,b], 6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d) give 0+0+0+1+1+1 = 3
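
The pairwise-disagreement count can be written directly from the definition. The sketch below is a plain quadratic-time version for illustration, again assuming both lists rank the same items; `kendall_tau_distance` is a name chosen here.

```python
from itertools import combinations


def kendall_tau_distance(s, t):
    """Number of item pairs that the two rankings order differently."""
    pos_s = {item: rank for rank, item in enumerate(s)}
    pos_t = {item: rank for rank, item in enumerate(t)}
    if pos_s.keys() != pos_t.keys():
        raise ValueError("both lists must rank the same items")
    # A pair disagrees when the rank differences have opposite signs.
    return sum(
        1
        for x, y in combinations(s, 2)
        if (pos_s[x] - pos_s[y]) * (pos_t[x] - pos_t[y]) < 0
    )


# The slide's example: only (b,c), (b,d) and (c,d) are ordered differently.
print(kendall_tau_distance(list("abcd"), list("adcb")))  # 3
```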

  9. Characteristics of the Distances • Kendall-tau can be computed in O(n log n) time • It satisfies the triangle inequality, so it behaves as a proper, norm-like distance
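
One common way to reach the O(n log n) bound is to count inversions with a merge sort. The slides do not say which algorithm was used, so the following is only a sketch of that standard approach, with `kendall_tau_fast` as an illustrative name.

```python
def kendall_tau_fast(s, t):
    """Kendall-tau distance via merge-sort inversion counting.

    Assumes s and t rank the same items without ties.
    """
    pos_t = {item: rank for rank, item in enumerate(t)}
    # Ranks in t, read in the order given by s; each inversion in this
    # sequence is one pairwise disagreement.
    seq = [pos_t[item] for item in s]

    def sort_count(a):
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, inv_left = sort_count(a[:mid])
        right, inv_right = sort_count(a[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] jumps ahead of every remaining left element.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(seq)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```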

  10. Matrix of Distance • Keyword = “university”
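
A distance matrix like the one on this slide could be assembled as sketched below. The slides do not state how URLs that appear in only one engine's top list are handled, so this sketch assumes a common convention: a missing item is treated as if ranked just past the end of the list. The engine names and URLs are placeholders, not data from the talk.

```python
from itertools import combinations


def topk_footrule(s, t):
    """Footrule over the union of two top-k lists.

    Assumption: an item missing from a list gets position len(list).
    """
    pos_s = {item: rank for rank, item in enumerate(s)}
    pos_t = {item: rank for rank, item in enumerate(t)}
    return sum(
        abs(pos_s.get(x, len(s)) - pos_t.get(x, len(t)))
        for x in pos_s.keys() | pos_t.keys()
    )


def distance_matrix(results):
    """Symmetric matrix of pairwise distances between engines."""
    names = sorted(results)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = topk_footrule(results[a], results[b])
    return matrix


# Hypothetical top-4 results for one keyword.
results = {
    "EngineA": ["u1", "u2", "u3", "u4"],
    "EngineB": ["u1", "u3", "u2", "u5"],
    "EngineC": ["u6", "u5", "u4", "u1"],
}
matrix = distance_matrix(results)
for name, row in matrix.items():
    print(name, row)
```

In this toy input EngineA and EngineB share most results in similar order, so their distance is much smaller than either one's distance to EngineC.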

  11. Visualization • Kernighan-Lin Algorithm • Kamada Spring Model • Comparison of the 2 methods

  12. Kernighan-Lin Method • Brief explanation
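
The slide leaves the explanation to the talk. As a rough illustration only, one pass of the classic Kernighan-Lin bisection heuristic might look like the sketch below. It assumes the engines are nodes of a graph whose edge weights express similarity (how the talk derived weights from distances is not stated), and all names and weights here are hypothetical.

```python
def cut_weight(w, side_a, side_b):
    """Total weight of edges crossing the partition."""
    return sum(w[u][v] for u in side_a for v in side_b)


def kernighan_lin_pass(w, side_a, side_b):
    """One Kernighan-Lin pass over a symmetric weight table w."""
    a, b = list(side_a), list(side_b)
    # d[v] = (weight to the other side) - (weight to v's own side).
    d = {}
    for own, other in ((a, b), (b, a)):
        for v in own:
            external = sum(w[v][u] for u in other)
            internal = sum(w[v][u] for u in own if u != v)
            d[v] = external - internal
    free_a, free_b = list(a), list(b)
    swaps, gains = [], []
    while free_a and free_b:
        # Pick the unlocked pair whose swap reduces the cut the most.
        x, y = max(
            ((x, y) for x in free_a for y in free_b),
            key=lambda p: d[p[0]] + d[p[1]] - 2 * w[p[0]][p[1]],
        )
        gains.append(d[x] + d[y] - 2 * w[x][y])
        swaps.append((x, y))
        free_a.remove(x)
        free_b.remove(y)
        for u in free_a:
            d[u] += 2 * w[u][x] - 2 * w[u][y]
        for v in free_b:
            d[v] += 2 * w[v][y] - 2 * w[v][x]
    # Keep only the prefix of swaps with the largest total gain.
    best_k, best_gain, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_k, best_gain = k, running
    for x, y in swaps[:best_k]:
        a[a.index(x)] = y
        b[b.index(y)] = x
    return a, b


# Hypothetical similarities: p~q and r~s are strongly similar.
nodes = ["p", "q", "r", "s"]
w = {u: {v: (0 if u == v else 1) for v in nodes} for u in nodes}
w["p"]["q"] = w["q"]["p"] = 10
w["r"]["s"] = w["s"]["r"] = 10

side_a, side_b = kernighan_lin_pass(w, ["p", "r"], ["q", "s"])
print(sorted(side_a), sorted(side_b), cut_weight(w, side_a, side_b))
```

The full heuristic repeats such passes until a pass yields no positive gain; each resulting group can then be given its own color.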

  13. Kernighan-Lin by Color Coding • Keyword1=“Totti” Keyword2=“Nakata”

  14. Kernighan-Lin by Color Coding • Keyword1=“Gucci” Keyword2=“Hermes”

  15. Kamada Spring Model • Brief explanation
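
The slide gives no details, so the following is only a sketch of the general idea behind a Kamada-Kawai style spring layout: place the engines in the plane so that drawn distances approximate the measured ones, by minimizing a spring energy with stiffness 1/d². Plain gradient descent is used here for brevity instead of the per-node Newton-Raphson updates of the original method, and the distances and starting coordinates are made up.

```python
import math


def spring_energy(pos, dist):
    """Sum over pairs of 0.5 * (drawn - target)^2 / target^2."""
    names = list(pos)
    energy = 0.0
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            drawn = math.dist(pos[u], pos[v])
            energy += 0.5 * (drawn - dist[u][v]) ** 2 / dist[u][v] ** 2
    return energy


def spring_layout(dist, start, steps=500, rate=0.1):
    """Move all nodes down the energy gradient for a fixed number of steps."""
    pos = dict(start)
    for _ in range(steps):
        new_pos = {}
        for u in pos:
            gx = gy = 0.0
            for v in pos:
                if v == u:
                    continue
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                drawn = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                coeff = (drawn - dist[u][v]) / (dist[u][v] ** 2 * drawn)
                gx += coeff * dx
                gy += coeff * dy
            new_pos[u] = (pos[u][0] - rate * gx, pos[u][1] - rate * gy)
        pos = new_pos
    return pos


# Hypothetical target distances: four engines at the corners of a unit square.
r2 = math.sqrt(2)
dist = {
    "A": {"B": 1, "C": r2, "D": 1},
    "B": {"A": 1, "C": 1, "D": r2},
    "C": {"A": r2, "B": 1, "D": 1},
    "D": {"A": 1, "B": r2, "C": 1},
}
start = {"A": (0.0, 0.0), "B": (2.0, 0.5), "C": (1.5, 2.5), "D": (0.3, 1.7)}
final = spring_layout(dist, start)
print(spring_energy(start, dist), spring_energy(final, dist))
```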

  16. An example

  17. Kamada Spring Model • Keyword1=“Totti” Keyword2=“Nakata”

  18. Comparison of the 2 methods

  19. Results • Distances between search engines vary from pair to pair. • Different fields show different characteristics. • Some search engines, such as Sprinks, are far away from the others. • Excite and AOL are near each other in most cases.
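
The slides do not say how "far away from the others" was judged. One simple possibility, shown here purely as an assumption, is to rank engines by their mean distance to all other engines and inspect the largest values. The matrix below is invented for illustration and is not data from the talk.

```python
def mean_distance_to_others(matrix):
    """Average distance from each engine to every other engine."""
    return {
        a: sum(d for b, d in row.items() if b != a) / (len(row) - 1)
        for a, row in matrix.items()
    }


# Hypothetical distance matrix for three engines.
matrix = {
    "EngineA": {"EngineA": 0, "EngineB": 4, "EngineC": 16},
    "EngineB": {"EngineA": 4, "EngineB": 0, "EngineC": 16},
    "EngineC": {"EngineA": 16, "EngineB": 16, "EngineC": 0},
}
scores = mean_distance_to_others(matrix)
farthest = max(scores, key=scores.get)
print(farthest, scores[farthest])  # EngineC 16.0
```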

  20. Conclusion • Addresses the need to quantify and visualize the closeness between search engines. • Provides users with a GUI to see the closeness of search engines. • Helps users select the proper search engines. • Helps users see the features of each search engine in various fields.

  21. Future Work • Use more search engines • Use both general-purpose and special-purpose search engines • Use hyperlinks to find the resemblance • Apply this idea to other fields
