Search Engine Roundup!!! Michael Hunter, Reference Librarian, Hobart and William Smith Colleges. For Rochester Regional Library Council Member Libraries’ Staff. Sponsored by the Rochester Regional Library Council.

Presentation Transcript


  1. Search Engine Roundup!!! • Michael Hunter, Reference Librarian, Hobart and William Smith Colleges • For Rochester Regional Library Council Member Libraries’ Staff • Sponsored by the Rochester Regional Library Council • Supported by Library Services and Technology Act (LSTA) and/or Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library • 2003

  2. For Today . . . • Search Engine Affiliations: What am I actually searching, anyway? • The Major Services: an Overview • A Look in Detail: AlltheWeb, Teoma, WiseNut and Gigablast • Hands-on Session • Search Tips and Techniques • A Few Good Metas: Vivisimo and Ixquick • New (and newly-redesigned) Services

  3. The Internet Search Industry: A Volatile World • Information as commodity • Overt actions: Mergers, Acquisitions • Covert actions: Database sharing (total, partial, or paid listings only) • NOTE: Data accurate as of Oct. 6, 2003

  4. The Shrinking Search Industry: Editorial control of search is shared among a few • Yahoo owns AlltheWeb, AltaVista, Inktomi, Overture (paid listings) • Google • MSN • AskJeeves owns Teoma • LookSmart owns WiseNut • Gigablast • NOTE: Ownership is different from database affiliation

  5. Search Engine Database “Affiliates” or “What am I searching, anyway?” • Who crawls the Web? • Google • AlltheWeb • Teoma • Inktomi • AltaVista • WiseNut • Gigablast

  6. Google: Database Affiliates

  7. AlltheWeb: Database Affiliates

  8. Teoma: Database Affiliates

  9. Inktomi: Database Affiliates

  10. No Affiliates (for now!) • AltaVista • WiseNut • Gigablast

  11. Subject Directories: Database Affiliations

  12. Open Directory (www.dmoz.org): Database Affiliates

  13. LookSmart: Database Affiliates

  14. Paid Listings Suppliers: “Sponsored Links” Often First in Results

  15. Overture (NOTE: Purchased AlltheWeb & AltaVista in Spring of 2003; Yahoo purchased Overture in Sept. of 2003)

  16. Google

  17. Looking Over the Major Players • Database Size • Database Freshness • Popularity

  18. Database Freshness (http://www.searchengineshowdown.com/stats/freshness.shtml) • Based on a series of 6 current topic searches • Pages that are updated daily AND report that date on the page • Queries submitted May 17, 2003

  19. Database Freshness (http://www.searchengineshowdown.com/stats/freshness.shtml)

  20. Database Freshness (http://www.searchengineshowdown.com/stats/freshness.shtml) • Most have some results indexed in the last few days • The bulk of most of the databases is about 1 month old • Some pages may not have been re-indexed for much longer

  21. Searches per day: self-reported data, as of 2/28/03 (http://searchenginewatch.com/reports/article.php/2156461)

  22. Four Internet Search Engines: What’s Under the Hood? AlltheWeb, Teoma, WiseNut, Gigablast

  23. AlltheWeb • Developed by FAST of Norway • Launched May, 1999 • Now owned by Overture • One of the best!

  24. AlltheWeb: Databases • Indexed Web pages including PDF, Flash, and other file types • News (from 3,000+ international news sources) • Images • Videos • MP3 files • FTP files • Ads from Overture listed as "Sponsored Results"

  25. AlltheWeb: Search Features • Boolean capabilities in Basic Search: + (plus) for AND, - (minus) for NOT, ( ) for OR, e.g. (jazz swing blues) = jazz or swing or blues • Boolean capabilities in Advanced Search: via search boxes and drop-down menus • Use of rank boosts importance of records containing those term(s)
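
The basic-search operators on this slide can be illustrated with a short Python sketch. The helper name `alltheweb_basic_query` and its parameters are hypothetical, and the sketch assumes NOT is written as a leading minus sign; it only assembles a query string and does not contact any search service.

```python
def alltheweb_basic_query(required=(), excluded=(), any_of=()):
    """Assemble a basic-search string (hypothetical helper, not an AlltheWeb API).

    +term    -> term must appear (AND)
    -term    -> term must not appear (NOT, assumed to be a minus sign)
    (a b c)  -> any of the terms may appear (OR)
    """
    parts = [f"+{term}" for term in required]
    parts += [f"-{term}" for term in excluded]
    if any_of:
        parts.append("(" + " ".join(any_of) + ")")
    return " ".join(parts)


# The slide's own OR example, plus one required and one excluded term.
query = alltheweb_basic_query(
    required=["music"],
    excluded=["rock"],
    any_of=["jazz", "swing", "blues"],
)
print(query)
```

Keeping the three operator groups as separate arguments makes it harder to mix up AND, NOT and OR terms when composing a longer search.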

  26. AlltheWeb: Search Features • Results clustered by topic (“Folders”) • Both HTML and Multimedia given, when available • NOTE: Located at the BOTTOM of each results screen

  27. Search for “synchrotron radiation” 10/5/03

  28. AlltheWeb: Field Searching (Command Line and Drop Down Options) • In the text • In the URL • In the link to URL (retrieves pages that link TO the specified URL) • In the Title • In the host name (anywhere)

  29. AlltheWeb Advanced Search: Additional Filters and Limits • 49 Languages (select up to 8 per search using the Customize Option) • IP Address and/or range • Domain (TLD, country or region or entire website) • Date • Document size (UNIQUE!!!) • File formats (9) • Embedded Content (Media Type) • Offensive Content

  30. Domain: TLD, country, region and website

  31. Date • Date Range from Jan. 1, 1980 - present (based on last update, where available) • last month • last 3 months • last 6 months • last 9 months • last year

  32. Document Size (!!) • Limit by bytes, kilobytes or megabytes

  33. File Formats

  34. Additional File Formats: Undocumented in HELP, but they work as of 10/5/03 • filetype:rtf • filetype:powerpoint • filetype:excel • filetype:postscript • filetype:wordperfect • filetype:staroffice (Sun’s Office Suite, running on Linux)

  35. Embedded Content • Images : All image types (the <img> tag) • Audio : Audio files (midi, wav, au etc.) • Video : Video files (Quicktime, AVI, etc.) • RealVideo & RealAudio : Streaming RealVideo and RealAudio • Macromedia Flash : Macromedia Flash animations • Java applets : Java applets (the <applet> tag) • JavaScript : JavaScript and ECMAScript • VBScript : Microsoft VBScript

  36. Website Evaluation Feature: Type a URL in the Basic Search Box
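
As a rough illustration, the Python sketch below generates one query per file format listed on this slide, which is a convenient way to re-check each command by hand. The function name `filetype_queries` is hypothetical, and placing the `filetype:` command after the search terms, separated by a space, is an assumption rather than documented AlltheWeb behaviour.

```python
# File-format values exactly as listed on the slide.
UNDOCUMENTED_FILETYPES = [
    "rtf",
    "powerpoint",
    "excel",
    "postscript",
    "wordperfect",
    "staroffice",
]


def filetype_queries(terms):
    """Return one query string per file format (hypothetical helper).

    Assumes the filetype: command follows the search terms.
    """
    return [f"{terms} filetype:{fmt}" for fmt in UNDOCUMENTED_FILETYPES]


for q in filetype_queries("annual report"):
    print(q)
```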

  37. Teoma • Launched in 2001 • Bought by AskJeeves in 2002 • Database • Indexed Web pages (no Images or other Media) • Paid listings from Google • Results displayed in 3 groupings: Results, Refine and Resources • Fourth in database size, after Google, ATW and Inktomi

  38. Teoma: Advanced Search Features • Boolean available in Basic and Advanced Search modes • Field searches: full text, title or URL • Limit by language (8 European) • Most limits also operative as commands: site: inurl: intitle: lang: • Certain limits cannot be combined; see Advanced Search HELP

  39. Domain: TLD, country, region and website

  40. Date last modified (Daterange search also available)

  41. Results Features: 3 Results Groupings • Results: Ranked database results, with “Related Pages” • Refine: Clustering of your results and other related sites based on term relationships and web community linkages derived from your original results • Resources: “Link Collections from experts and enthusiasts” (Subject metasites)
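
The command forms named on this slide (site:, inurl:, intitle:, lang:) can be combined with search terms roughly as in the Python sketch below. The helper `teoma_query` is hypothetical, the example values are placeholders, and the value format each command accepts is an assumption. The sketch does not check which limits Teoma refuses to combine; Advanced Search HELP remains the authority for that.

```python
# Command names taken from the slide; anything else is rejected.
TEOMA_COMMANDS = ("site", "inurl", "intitle", "lang")


def teoma_query(terms, **limits):
    """Append command-style limits to the search terms (hypothetical helper)."""
    unknown = set(limits) - set(TEOMA_COMMANDS)
    if unknown:
        raise ValueError(f"unsupported command(s): {sorted(unknown)}")
    parts = [terms]
    # Emit commands in a fixed order so the output is predictable.
    parts += [f"{name}:{limits[name]}" for name in TEOMA_COMMANDS if name in limits]
    return " ".join(parts)


print(teoma_query("library instruction", site="example.org", intitle="tutorial"))
```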

  42. Teoma’s Ranking • Includes a site’s relationship to other sites with similar content • How many links (incoming and outgoing) exist between this site and others on the same subject? • To what degree are those other sites inter-linked to the larger web “community” of high quality, similar-subject sites? (Requires some human examination)
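
The first question on this slide, how many links connect a site to others on the same subject, can be made concrete with a toy Python sketch. This is not Teoma's algorithm, which the slide does not specify; the function `community_link_counts`, the site names and the link data are all invented for illustration.

```python
def community_link_counts(links, topic_sites):
    """Count incoming plus outgoing links between sites on the same subject.

    links       -- iterable of (source, target) pairs
    topic_sites -- set of sites judged to cover the same subject
    Links to or from sites outside the topic set are ignored.
    """
    counts = {site: 0 for site in topic_sites}
    for source, target in links:
        if source in topic_sites and target in topic_sites and source != target:
            counts[source] += 1  # outgoing link within the community
            counts[target] += 1  # incoming link within the community
    return counts


# Invented example: "x" and "y" are off-topic sites.
links = [("a", "b"), ("b", "a"), ("c", "a"), ("x", "a"), ("a", "y")]
topic = {"a", "b", "c"}
print(community_link_counts(links, topic))
```

In this toy data site "a" is the best connected within its subject community, while its links to and from the off-topic sites do not count.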

  43. Teoma • Plus: • Identifies metasites (“Resources”) • Offers linkage-based web communities (“Refine”) • Minus • Smaller database • No free URL submission • No cached copies • No subject directory

  44. WiseNut • Launched July 2001 • Purchased by LookSmart in 2002 • Single crawler-created database, refreshed often • Claims database of 1.5 billion • Sample search “pope canterbury” (10/4/03): Google: 83,200; WiseNut: 31,451 • One partner site, Korea WiseNut

  45. WiseNut Search Features • Full Boolean in Basic and “WiseSearch” • Results clustered by content (“WiseGuides”) • “Search This” allows inclusion of WiseGuide folder titles in a search • Limit by language (25) • Adult content filtering (“WiseWatch”) • “Sneak-a-Peek” opens a result in a new window

  46. Gigablast • Launched April, 2002 • Smaller database than others: over 200 million on 10/4/03 • Sample search “pope canterbury”: Google: 83,200; Gigablast: 24,919 • Created and maintained by Matt Wells (alone) • Only search engine “continuously updated with index refreshed in real time” (Site submissions are immediately searchable) • Ranking depends less on linkage than Google’s ranking, to avoid penalizing newer pages. • No advertising (to date)

  47. Gigablast Search Features • Basic search: Full Boolean • Advanced Search: Full Boolean and 2 (!) phrase boxes • Limit by site • Limit by domain (URL) • Links to a page available

  48. Gigablast Search Features • Field searches include title, IP address and non-HTML filetypes: PDF, Word, Excel, PPT, PostScript, ASCII Text • Results from one site clustered • Cached version available • Results include date indexed and last modified (!!) • Linking to Gigablast improves ranking there
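
To put the sample-search figures in proportion, the Python sketch below expresses the WiseNut and Gigablast counts for “pope canterbury” as a share of Google's count. The counts come from the slides; the helper name `share_of_google` is hypothetical. The counts are for a single query, so they are only a rough indication and not a measure of overall database size.

```python
GOOGLE_COUNT = 83_200  # Google's figure from the slides

# Figures reported on the WiseNut and Gigablast slides.
OTHER_COUNTS = {"WiseNut": 31_451, "Gigablast": 24_919}


def share_of_google(count, google_count=GOOGLE_COUNT):
    """Return count as a percentage of Google's count, rounded to one decimal."""
    return round(100 * count / google_count, 1)


for engine, count in OTHER_COUNTS.items():
    print(f"{engine}: {share_of_google(count)}% of Google's count")
```

On this one query WiseNut returns a little over a third of Google's count and Gigablast about three tenths.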