
Web Search Engines

by Greg R. Notess, notess@imt.net, imt.net/~notess/search

Presentation Transcript


  1. Web Search Engines by Greg R. Notess notess@imt.net imt.net/~notess/search

  2. Overview: • Comparing the database content • Change • Comparative Size • Overlap • Looking towards future developments • Portal or Destination • Output sorting

  3. Results are limited by • Database content • The Web sites included • The depth to which they are indexed

  4. If it’s not in the database, the best search engine will not be able to find the Web page

  5. So what’re they like? • Very large databases • Most index all words on page • None index words in images • Let’s see how the databases compare to the real Web

  6. Change over time?

  7. Overall Size Change: Is the Web in general • Growing? • Shrinking? • Remaining the same?

  8. Excite 6 Searches 10/96-8/98

  9. What about the rest? • Who’s the biggest? • How to measure? • Actual search results • Verified hits

  10. And over time? • 8/98 -- AltaVista, Northern Light, HotBot • 5/98 -- AltaVista, HotBot, Northern Light • 2/98 -- HotBot, AltaVista, Northern Light • 10/97 -- AltaVista, HotBot, Northern Light • 9/97 -- Northern Light, Excite, HotBot • 6/97 -- HotBot, AltaVista, Infoseek • 10/96 -- HotBot, Excite, AltaVista

  11. Back to change in size • Let’s look at six search engines • Over the course of two years

  12. But at least • They have a high degree of duplication between them • Right?

  13. Try 4 small searches • Using five search engines • How many pages are found by all five or at least by four of them?

  14. ZERO

  15. Overlap
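
The overlap test in slides 13 to 15 can be sketched in code. This is a minimal illustration, not the method used in the talk: the engine names, URLs, and the helper names `overlap_counts` and `found_by_at_least` are all made up for the example, and it assumes each engine's results are already available as a plain list of URLs.

```python
from collections import Counter


def overlap_counts(results_by_engine):
    """Map each URL to the number of engines that returned it."""
    counts = Counter()
    for urls in results_by_engine.values():
        # set() so a URL listed twice by one engine is counted once
        counts.update(set(urls))
    return counts


def found_by_at_least(results_by_engine, k):
    """Return the URLs returned by at least k of the engines, sorted."""
    counts = overlap_counts(results_by_engine)
    return sorted(url for url, n in counts.items() if n >= k)


# Hypothetical result lists for one search on five engines.
sample = {
    "engine_a": ["http://example.com/1", "http://example.com/2"],
    "engine_b": ["http://example.com/2", "http://example.com/3"],
    "engine_c": ["http://example.com/2", "http://example.com/4"],
    "engine_d": ["http://example.com/5"],
    "engine_e": ["http://example.com/2", "http://example.com/1"],
}

if __name__ == "__main__":
    print("all five:", found_by_at_least(sample, 5))
    print("four or more:", found_by_at_least(sample, 4))
```

Exact string comparison is the simplest possible match rule; a real comparison would probably also need to normalize URLs (trailing slashes, host case) before counting.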

  16. And they exclude most: • Content of Adobe PDF and formatted files • The content in most sites requiring a login • CGI output: data requested by a form • Other dynamically produced data • Pages protected by a robots.txt file • Intranets, pages not linked from anywhere else • Commercial resources with domain limitations • Non-Web resources
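
As one illustration of the robots.txt item in the list above, the sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler decides to skip a page. The robots.txt text, the crawler name `ExampleBot`, the URLs, and the helper `allowed` are assumptions made up for the example; nothing here describes how any particular search engine's crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that closes one directory to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/report.html"):
        print(url, allowed(ROBOTS_TXT, "ExampleBot", url))
```

A page that the check rejects is never fetched, so it never reaches the database, which is why such pages cannot be found by searching.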

  17. Scope Summary: • Inconsistent growth • Not full coverage • Surprisingly low duplication

  18. Positive Side? • Essential for searching the Net • Can be used effectively: phrase search, use more than one, smart searching

  19. Incredibly popular • Even when they fail • But then, since when is finding information always easy?

  20. Overview: • Comparing the database content • Change • Comparative Size • Overlap • Looking towards future developments • Portal or Destination • Output sorting

  21. What is a search engine? • Portal? • Gateway? • Destination?

  22. Search Engine • the software that searches a database

  23. Development • Database of Web pages • adds Supplementary Database: phone numbers, reference, businesses, news • then adds Subject directory • then Services: email, ISP, shopping, travel agent • now Communities

  24. Portal to Destination? • Driving force: advertising revenue • Keep users longer for more revenue • Conflicts with portal and gateway principle

  25. Future possibilities? • Smaller databases • Less pointing to external pages • Paid advertising or sponsorship for visibility • Rise of search only sites?

  26. Output Development • Initially, “Relevance” ranking • Crude • Not site or URL based • Some site sorting from Excite • No date sorting

  27. Site Sorting • Infoseek, then Lycos, now HotBot • Group together by site • More relevant than prior algorithms • Northern Light includes it in Custom Folders
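
The "group together by site" idea can be sketched as follows. This is an assumed, simplified version for illustration only: the function name `group_by_site` and the sample URLs are invented, and it treats the host part of the URL as the "site", which is not necessarily how Infoseek, Lycos, or HotBot define it.

```python
from urllib.parse import urlsplit


def group_by_site(urls):
    """Group a ranked list of result URLs by host, keeping rank order.

    Sites appear in the order of their best-ranked hit, and the hits
    within each site keep their original relative order.
    """
    groups = {}
    for url in urls:
        host = urlsplit(url).netloc.lower()  # host names are case-insensitive
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical relevance-ranked hits.
hits = [
    "http://a.example/x",
    "http://b.example/y",
    "http://A.example/z",
]

if __name__ == "__main__":
    for site, pages in group_by_site(hits).items():
        print(site, pages)
```

Because Python dictionaries preserve insertion order, the grouped output still reflects the original ranking, only with pages from the same site collected under one entry.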

  28. Other Output • RealNames on AltaVista • Direct Hit on HotBot • Subject Directory Categories • News • Books, CDs, etc. “about search term”

  29. Search Engine Showdown • imt.net/~notess/search • Search engine features • See also www.searchenginewatch.com • See also Rich Wiggins, Coming up next . . .

  30. Web Search Engines by Greg R. Notess notess@imt.net imt.net/~notess/search
