Search Engine Roundup!!!

Michael Hunter

Reference Librarian

Hobart and William Smith Colleges


For Rochester Regional Library Council

Member Libraries’ Staff

Sponsored by the

Rochester Regional Library Council

Supported by Library Services and Technology Act (LSTA) and/or Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the

New York State Library


For Today . . .
  • Search Engine Affiliations: What am I actually searching, anyway?
  • The Major Services: an Overview
  • A Look in Detail: AlltheWeb, Teoma, WiseNut and Gigablast
  • Hands-on Session
  • Search Tips and Techniques
  • A Few Good Metas: Vivisimo and Ixquick
  • New (and newly-redesigned) Services
The Internet Search Industry: A Volatile World
  • Information as commodity
  • Overt actions: Mergers, Acquisitions
  • Covert actions: Database sharing
    • Total
    • Partial
    • Paid Listings only
  • NOTE: Data accurate as of Oct. 6, 2003
The Shrinking Search Industry: Editorial control of search is shared among few
  • Yahoo owns
    • AlltheWeb, Altavista, Inktomi, Overture (paid listings)
  • Google
  • MSN
  • AskJeeves owns Teoma
  • LookSmart owns Wisenut
  • Gigablast
  • NOTE: Ownership is different from database affiliation
Search Engine Database “Affiliates,” or “What am I searching, anyway?”
  • Who crawls the Web?
    • Google
    • Alltheweb
    • Teoma
    • Inktomi
    • AltaVista
    • Wisenut
    • Gigablast
No Affiliates (for now!)
  • Altavista
  • Wisenut
  • Gigablast

Overture (NOTE: Overture purchased AlltheWeb and AltaVista in spring 2003; Yahoo purchased Overture in Sept. 2003)

Looking Over the Major Players
  • Database Size
  • Database Freshness
  • Popularity
Database Freshness (http://www.searchengineshowdown.com/stats/freshness.shtml)
  • Based on a series of 6 current topic searches
  • Pages that are updated daily
  • AND report that date on the page
  • Queries submitted May 17, 2003
Database Freshness
  • Most have some results indexed in the last few days
  • The bulk of most of the databases is about 1 month old
  • Some pages may not have been re-indexed for much longer
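The freshness measure above reduces to date arithmetic: for each daily-updated page, subtract the date shown on the engine's indexed copy from the date the query was run. A minimal Python sketch follows; `freshness_summary` is a hypothetical helper, and the sample dates are illustrative only, not Search Engine Showdown data.

```python
from datetime import date
from statistics import median


def freshness_summary(query_date, indexed_dates):
    """Age in days of each indexed copy: query date minus the date shown on the indexed page."""
    ages = sorted((query_date - d).days for d in indexed_dates)
    return {"newest": ages[0], "oldest": ages[-1], "median": median(ages)}


# Illustrative dates only; the queries in the study were submitted May 17, 2003
print(freshness_summary(date(2003, 5, 17), [date(2003, 5, 15), date(2003, 4, 17), date(2003, 3, 1)]))
# {'newest': 2, 'oldest': 77, 'median': 30}
```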
Searches per Day (self-reported data, as of 2/28/03)
Four Internet Search Engines: What’s Under the Hood?
AlltheWeb, Teoma, WiseNut, Gigablast
AlltheWeb
  • Developed by FAST of Norway
  • Launched May, 1999
  • Now owned by Overture
  • One of the best!
AlltheWeb: Databases
  • Indexed Web pages, including PDF, Flash, and other file types
  • News (from 3,000+ international news sources)
  • Images
  • Videos
  • MP3 files
  • FTP files
  • Ads from Overture listed as "Sponsored Results"
AlltheWeb: Search Features
  • Boolean capabilities in Basic Search
    • + (plus) for AND
    • - (minus) for NOT
    • ( ) for OR, e.g. (jazz swing blues) = jazz OR swing OR blues
  • Boolean capabilities in Advanced Search
    • Via search boxes and drop-down menus
    • Using “rank” boosts the importance of records containing those term(s)
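The Basic Search operators above can be illustrated with a short Python sketch. It assumes only the three operators listed on the slide (+ for AND, - for NOT, parentheses for OR); the function names `build_query` and `matches` are hypothetical, and AlltheWeb's own query parser may have handled edge cases differently.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(required=(), excluded=(), any_of=()):
    """Compose a Basic Search string: +term (AND), -term (NOT), (a b c) (OR)."""
    parts = [f"+{quote(t)}" for t in required]
    parts += [f"-{quote(t)}" for t in excluded]
    if any_of:
        parts.append("(" + " ".join(quote(t) for t in any_of) + ")")
    return " ".join(parts)


def matches(words, required=(), excluded=(), any_of=()):
    """Apply the same Boolean logic to a page's words (single-word terms only)."""
    words = {w.lower() for w in words}
    return (
        all(t.lower() in words for t in required)
        and not any(t.lower() in words for t in excluded)
        and (not any_of or any(t.lower() in words for t in any_of))
    )


print(build_query(required=["music"], excluded=["rock"], any_of=["jazz", "swing", "blues"]))
# +music -rock (jazz swing blues)
print(matches("history of swing music".split(), ["music"], ["rock"], ["jazz", "swing", "blues"]))
# True
```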
AlltheWeb: Search Features
  • Results clustered by topic (“Folders”)
  • Both HTML and Multimedia given, when available
  • NOTE: Located at the BOTTOM of each results screen
AlltheWeb: Field Searching (Command Line and Drop-Down Options)
  • In the text
  • In the URL
  • In the link to URL
    • Retrieves pages that link TO the specified URL
  • In the Title
  • In the host name (anywhere)
AlltheWeb Advanced Search: Additional Filters and Limits
  • 49 Languages (select up to 8 per search using the Customize Option)
  • IP Address and/or range
  • Domain (TLD, country or region or entire website)
  • Date
  • Document size (UNIQUE!!!)
  • File formats (9)
  • Embedded Content (Media Type)
  • Offensive Content
  • Date range from Jan. 1, 1980 to present (based on last update, where available), or a preset period:
    • last month
    • last 3 months
    • last 6 months
    • last 9 months
    • last year
Document Size (!!)
  • Limit by bytes, kilobytes or megabytes
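Limiting by document size amounts to converting each bound to bytes and comparing. In this Python sketch the unit names and the binary multiples (1 kilobyte = 1,024 bytes) are assumptions, since the slide does not say which convention AlltheWeb used; `to_bytes` and `within_size` are hypothetical helpers.

```python
UNIT_BYTES = {"bytes": 1, "kilobytes": 1024, "megabytes": 1024 ** 2}  # binary multiples assumed


def to_bytes(amount, unit):
    """Convert an (amount, unit) size limit to bytes."""
    return int(amount * UNIT_BYTES[unit])


def within_size(doc_bytes, at_least=None, at_most=None):
    """True if a document's size falls inside the optional (amount, unit) bounds."""
    if at_least is not None and doc_bytes < to_bytes(*at_least):
        return False
    if at_most is not None and doc_bytes > to_bytes(*at_most):
        return False
    return True


print(within_size(150_000, at_least=(100, "kilobytes"), at_most=(1, "megabytes")))  # True
```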
Additional File Formats (undocumented in HELP, but they work as of 10/5/03)
  • filetype:rtf
  • filetype:powerpoint
  • filetype:excel
  • filetype:postscript
  • filetype:wordperfect
  • filetype:staroffice (Sun’s office suite, which also runs on Linux)
Embedded Content
  • Images : All image types (the <img> tag)
  • Audio : Audio files (midi, wav, au etc.)
  • Video : Video files (Quicktime, AVI, etc.)
  • RealVideo & RealAudio : Streaming RealVideo and RealAudio
  • Macromedia Flash : Macromedia Flash animations
  • Java applets : Java applets (the <applet> tag)
  • JavaScript : JavaScript and ECMAScript
  • VBScript : Microsoft VBScript
Teoma
  • Launched in 2001
  • Bought by AskJeeves in 2002
  • Database
    • Indexed Web pages (no Images or other Media)
    • Paid listings from Google
    • Results displayed in 3 groupings: Results, Refine and Resources
    • Fourth in database size, after Google, AlltheWeb and Inktomi
Teoma: Advanced Search Features
  • Boolean available in Basic and Advanced Search modes
  • Field searches: full text, title or URL
  • Limit by language (8 European languages)
  • Most limits also operative as commands: site: inurl: intitle: lang:
  • Certain limits cannot be combined; see Advanced Search HELP
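To show how the command forms relate to the Advanced Search limits, here is a hedged Python sketch that splits a typed query into ordinary terms and the four commands named on the slide. The parser `split_commands` is hypothetical, value formats such as lang:en are assumed, and the sketch does not reproduce Teoma's rules about which limits cannot be combined.

```python
COMMANDS = ("site", "inurl", "intitle", "lang")  # the four command forms listed on the slide


def split_commands(query):
    """Separate command:value tokens from plain search terms (whitespace-delimited)."""
    limits, terms = {}, []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name.lower() in COMMANDS and value:
            limits[name.lower()] = value  # a later repeat of a command overrides an earlier one
        else:
            terms.append(token)  # ordinary term, or a command with no value
    return limits, terms


print(split_commands("canterbury intitle:pope site:ac.uk lang:en"))
# ({'intitle': 'pope', 'site': 'ac.uk', 'lang': 'en'}, ['canterbury'])
```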

Results Features: 3 Results Groupings
  • Results
    • Ranked database results, with “Related Pages”
  • Refine
    • Clustering of your results and other related sites based on term relationships and web community linkages derived from your original results
  • Resources
    • “Link Collections from experts and enthusiasts” (subject metasites)
Teoma’s Ranking
  • Includes a site’s relationship to other sites with similar content
  • How many links (incoming and outgoing) exist between this site and others on the same subject?
  • To what degree are those other sites inter-linked to the larger web “community” of high quality, similar-subject sites? (Requires some human examination)
  • Plus:
    • Identifies metasites (“Resources”)
    • Offers linkage-based web communities (“Refine”)
  • Minus:
    • Smaller database
    • No free URL submission
    • No cached copies
    • No subject directory
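Teoma's actual ranking formula is not given on the slide; the Python sketch below illustrates only the first question listed, counting links (incoming and outgoing) between a site and other sites on the same subject. The link list, the subject grouping, and the function names are hypothetical, and the sketch ignores the second, community-level question, which the slide notes required some human examination.

```python
from collections import Counter


def community_link_counts(links, same_subject):
    """For each same-subject site, count links to or from other same-subject sites.

    links: iterable of (source_site, target_site) pairs
    same_subject: set of sites judged to share the subject
    """
    counts = Counter({site: 0 for site in same_subject})
    for src, dst in set(links):  # ignore duplicate links
        if src != dst and src in same_subject and dst in same_subject:
            counts[src] += 1  # outgoing link within the subject community
            counts[dst] += 1  # incoming link within the subject community
    return counts


def rank_by_community(links, same_subject):
    """Order sites by in-community link count, highest first (ties broken by name)."""
    counts = community_link_counts(links, same_subject)
    return sorted(counts, key=lambda site: (-counts[site], site))


LINKS = [("a", "b"), ("c", "a"), ("a", "x"), ("b", "a"), ("d", "c")]  # "x" is off-subject
print(rank_by_community(LINKS, {"a", "b", "c", "d"}))  # ['a', 'b', 'c', 'd']
```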
WiseNut
  • Launched July 2001
  • Purchased by LookSmart in 2002
  • Single crawler-created database, refreshed often
  • Claims a database of 1.5 billion pages
    • Test search for pope canterbury (10/4/03): Google 83,200 hits; WiseNut 31,451 hits
  • One partner site, Korea WiseNut
WiseNut Search Features
  • Full Boolean in Basic and “WiseSearch”
  • Results clustered by content into “WiseGuides”
  • “Search This” allows inclusion of WiseGuide folder titles in a search
  • Limit by language (25)
  • Adult content filtering “WiseWatch”
  • “Sneak-a-Peek” opens a result in a new window
Gigablast
  • Launched April, 2002
  • Smaller database than others
    • Over 200 million pages as of 10/4/03
    • Test search for pope canterbury: Google 83,200 hits; Gigablast 24,919 hits
  • Created and maintained by Matt Wells (alone)
  • Only search engine “continuously updated with index refreshed in real time” (Site submissions are immediately searchable)
  • Ranking depends less on linkage than Google’s ranking, to avoid penalizing newer pages.
  • No advertising (to date)
Gigablast Search Features
  • Basic Search: full Boolean
  • Advanced Search: full Boolean and 2 (!) phrase boxes
  • Limit by site
  • Limit by domain (URL)
  • Search for pages linking to a given page
Gigablast Search Features
  • Field searches include title, IP address and non-html filetypes:
    • PDF, Word, Excel, PPT, PostScript, Ascii Text
  • Results from one site clustered
  • Cached version available
  • Results include date indexed and last-modified date (!!)
  • Linking to Gigablast improves ranking there
Metas and retrieval
  • Metas search quickly but not deeply
  • Metas purchase search time or a quantity of searches from their sources (typically the top 10-50 hits from each)
  • Metas are subject to time-out limits from their sources
  • Each source is usually NOT searched for each query
Metas and retrieval
  • “Dumbing Down the Query”
    • Advanced features are often unavailable; when they are offered, only those shared among all sources can be used
    • Default setting for time-out is the shortest; set to maximum for more comprehensive searches (when available)
  • For most metas, advertising is the only source of revenue; software sales are rare
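The time-out behaviour described above can be simulated in Python with asyncio: every source is queried at once, and any source that has not answered within the time-out contributes nothing. The source names, delays, and hit lists are hypothetical stand-ins for real engines; the sketch shows only that a longer time-out lets slower sources' results into the combined set.

```python
import asyncio


async def fake_source(name, delay, hits):
    """Hypothetical stand-in for a search engine that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return name, hits


async def metasearch(sources, timeout):
    """Query all sources concurrently; keep only answers that arrive within `timeout` seconds."""
    async def guarded(name, delay, hits):
        try:
            return await asyncio.wait_for(fake_source(name, delay, hits), timeout)
        except asyncio.TimeoutError:
            return name, None  # this source timed out, so its hits are lost

    answers = await asyncio.gather(*(guarded(*s) for s in sources))
    return {name: hits for name, hits in answers if hits is not None}


SOURCES = [("fast-engine", 0.01, ["a", "b"]), ("slow-engine", 0.3, ["c"])]
print(asyncio.run(metasearch(SOURCES, timeout=0.1)))  # {'fast-engine': ['a', 'b']}
print(asyncio.run(metasearch(SOURCES, timeout=1.0)))  # {'fast-engine': ['a', 'b'], 'slow-engine': ['c']}
```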
Metas and retrieval
  • What is their place in my search strategy?
    • Metas best used for simple searches, with little (or no) syntactic complexity
    • Use them to find the top few sites on a topic
    • For a quick overview of a topic’s coverage on the Web in general
    • Use them “as a last resort” for highly focused topics that elude your usual search tools
    • As a possible indication of coverage of a topic among several engines (NOTE: problematic)
Searching the metas
  • Results depend on
    • Choice of sources
    • Query processing speed OF THE SOURCE
    • Length of time spent at each source
A search comparison . . .
  • Searched heterotropia (abnormal binocular vision) on 4/21/03
  • Results retrieved at the shortest and longest time-out settings:
    • Vivisimo: 77 (shortest), 126 (longest)
    • Ixquick: 37 (“from at least 450 results”)
    • Profusion: 30 (shortest), 39 (longest)
    • Metacrawler: 42 (shortest), 61 (longest)
    • Webcrawler: 31 (shortest), 80 (longest)
    • Dogpile: 29 (no time-out option)
    • Excite: 41 (shortest), 31 (longest)

Stability of Results: searched “kids of survival” (modern art group) as a phrase at 3-minute intervals, with time-outs at the default setting, on 4/21/03

Vivisimo
  • Sources: MSN, Netscape, Lycos, LookSmart, Gigablast, BBC, Librarian’s Index to the Internet plus 11 specialized news sources and 7 specialized business, medical and governmental sources
  • Offers full Boolean and phrase search (if supported by the source)
  • Offers the following customizations:
    • Selection of sources searched
    • Total number of results retrieved
    • Length of search (“time-out period”)
  • Results combined
  • Source for each result given
  • Ranking data from that source given
  • Duplicates noted, but not repeated
  • Other features:
    • Results are clustered by keyword prevalence or website of origin
    • Offers a preview of each result in a separate window
    • Offers vertical searches: Top News, Business News, Tech News, Sports News
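The result-merging steps listed above (results combined, source and source ranking shown, duplicates noted but not repeated) can be sketched in Python. The duplicate test (host without "www.", path without trailing slash) and the ordering rule (results found by more sources first, then best rank) are assumptions for illustration, not the documented method of any meta engine; the engine names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Crude duplicate key: lower-case host without 'www.', path without trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(ranked_by_source):
    """Combine ranked URL lists, recording each source and its rank; duplicates merge into one entry."""
    merged = {}
    for source, urls in ranked_by_source.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(normalize(url), {"url": url, "ranks": {}})
            entry["ranks"].setdefault(source, rank)  # keep a source's best (first) rank
    # Assumed ordering: found by more sources first, then best rank from any source
    return sorted(merged.values(), key=lambda e: (-len(e["ranks"]), min(e["ranks"].values()), e["url"]))


results = merge_results({
    "EngineA": ["http://x.org/", "http://y.org/page"],
    "EngineB": ["http://www.y.org/page/", "http://z.org"],
})
for r in results:
    print(r["url"], r["ranks"])
# http://y.org/page {'EngineA': 2, 'EngineB': 1}
# http://x.org/ {'EngineA': 1}
# http://z.org {'EngineB': 2}
```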
Clustering results (“folders”)
  • Automated “subject analysis”
  • Facilitates navigation and query refinement
  • Can be hierarchical (folders within folders)
  • One document may appear in several folders
  • Northern Light first public search engine to make use of folders
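A toy version of folder clustering can be written in a few lines of Python. This is flat keyword clustering, not the (unpublished here) methods of Northern Light or Vivisimo: folder labels are the words shared by the most results, excluding query terms and a tiny illustrative stopword list, and a result goes into every folder whose label it contains, so one document can appear in several folders. `make_folders` and its parameters are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on"}  # tiny illustrative list


def make_folders(snippets, query_terms, n_folders=3, min_docs=2):
    """Label folders with frequent non-query words; file each result under every matching label."""
    excluded = STOPWORDS | {t.lower() for t in query_terms}
    words = [set(re.findall(r"[a-z]+", s.lower())) - excluded for s in snippets]
    doc_freq = Counter(w for doc in words for w in doc)  # number of results containing each word
    candidates = [w for w, n in doc_freq.items() if n >= min_docs]
    labels = sorted(candidates, key=lambda w: (-doc_freq[w], w))[:n_folders]
    return {label: [i for i, doc in enumerate(words) if label in doc] for label in labels}


snippets = ["Jazz history in New Orleans", "Swing jazz and dance", "Blues history", "Dance halls and swing bands"]
print(make_folders(snippets, ["jazz"]))
# {'dance': [1, 3], 'history': [0, 2], 'swing': [1, 3]}
```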
Ixquick
  • Sources: Altavista, Netscape, Gigablast, Adobe PDF, Avaya PDF, AskJeeves, Teoma, Go, Open Directory, Overture, Kanoodle, LookSmart, WiseNut, FindWhat, Yahoo, MSN
  • Offers full Boolean and phrase search (if supported by the source)
  • Offers the following customizations:
    • Selection of sources searched
    • Length of search (“time-out period”)
  • Results combined
  • Source for each result given
  • Ranking data from that source given
  • Duplicates noted, but not repeated
  • Other features:
    • Offers 7 field searches (when supported by sources)
    • Clusters hits from same site
    • Highlights search terms in each hit
    • Offers “Related Searches”
    • Offers vertical searches: MP3, News, Pictures
A GOOD meta will . . .
  • Re-format queries to be compatible with search syntax of each source
  • Enable searchers to use advanced features (when the sources support them)
  • Indicate overlapping results without repeating them
  • Perform additional processing of results, e.g., ranking for appropriateness, categorization, etc.
  • Use only sources with unique databases
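The first item above, re-formatting one query for each source's syntax, can be sketched in Python. The three syntax styles are hypothetical, not the documented syntax of any named engine, and the excluded term is illustrative; the "plain" case shows how a source without operator support forces the query to be dumbed down.

```python
def quote(term):
    """Wrap multi-word terms in double quotes (phrase search)."""
    return f'"{term}"' if " " in term else term


def translate(required, excluded, style):
    """Rewrite one query for a source with the given (hypothetical) syntax style."""
    if style == "plus_minus":  # e.g. +term -term
        return " ".join([f"+{quote(t)}" for t in required] + [f"-{quote(t)}" for t in excluded])
    if style == "boolean":  # e.g. term AND NOT term
        return " AND ".join([quote(t) for t in required] + [f"NOT {quote(t)}" for t in excluded])
    if style == "plain":  # no operators or phrases: exclusions and quotes are dropped
        return " ".join(required)
    raise ValueError(f"unknown syntax style: {style}")


for style in ("plus_minus", "boolean", "plain"):
    print(style, "->", translate(["kids of survival"], ["clothing"], style))
# plus_minus -> +"kids of survival" -clothing
# boolean -> "kids of survival" AND NOT clothing
# plain -> kids of survival
```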
New (and Newly-Redesigned) Search Services
  • Hotbot
    • Search Hotbot (Inktomi) OR Google OR Lycos OR AskJeeves (NOT a true metaengine)
  • Altavista
    • Still one of the larger search engines; offers daterange search, large image and multimedia collections, and “related pages” feature
  • Fazzle (Metaengine)
  • Turbo10 (Deep Web)
  • (Images)

Thank you and best of luck in exploring web search “beyond Google”

Michael Hunter

Reference Librarian

Warren Hunting Smith Library

Hobart and William Smith Colleges

Geneva, NY 14456

(315) 781-3552