Search Engine Roundup!!!

Michael Hunter

Reference Librarian

Hobart and William Smith Colleges

For

Rochester Regional Library Council

Member Libraries’ Staff

Sponsored by the

Rochester Regional Library Council

Supported by Library Services and Technology Act (LSTA) and/or Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the

New York State Library

2003


For Today . . .

  • Search Engine Affiliations

    What am I actually searching, anyway?

  • The Major Services: an Overview

  • A Look in Detail: AlltheWeb, Teoma, WiseNut and Gigablast

  • Hands-on Session

  • Search Tips and Techniques

  • A Few Good Metas: Vivisimo and Ixquick

  • New (and newly-redesigned) Services


The Internet Search Industry: A Volatile World

  • Information as commodity

  • Overt actions: Mergers, Acquisitions

  • Covert actions: Database sharing

    • Total

    • Partial

    • Paid Listings only

  • NOTE: Data accurate as of Oct. 6, 2003


The Shrinking Search Industry: Editorial control of search is shared among a few

  • Yahoo owns

    • AlltheWeb, Altavista, Inktomi, Overture (paid listings)

  • Google

  • MSN

  • AskJeeves owns Teoma

  • LookSmart owns Wisenut

  • Gigablast

  • NOTE: Ownership is different from database affiliation


Search Engine Database “Affiliates,” or “What am I searching, anyway?”

  • Who crawls the Web?

    • Google

    • Alltheweb

    • Teoma

    • Inktomi

    • AltaVista

    • Wisenut

    • Gigablast


Google: Database Affiliates


AlltheWeb: Database Affiliates


Teoma: Database Affiliates


Inktomi: Database Affiliates


No Affiliates (for now!)

  • Altavista

  • Wisenut

  • Gigablast


Subject Directories: Database Affiliations


Open Directory (www.dmoz.org): Database Affiliates


LookSmart: Database Affiliates


Paid Listings Suppliers: “Sponsored Links” Often First in Results


Overture (NOTE: Overture purchased AlltheWeb and Altavista in spring 2003; Yahoo purchased Overture in Sept. 2003)



Looking Over the Major Players

  • Database Size

  • Database Freshness

  • Popularity


Database Freshness: http://www.searchengineshowdown.com/stats/freshness.shtml

  • Based on a series of 6 current topic searches

  • Pages that are updated daily

  • AND report that date on the page

  • Queries submitted May 17, 2003




  • Most have some results indexed in the last few days

  • The bulk of most databases is about 1 month old

  • Some pages may not have been re-indexed for much longer


Searches per Day (self-reported data, as of 2/28/03): http://searchenginewatch.com/reports/article.php/2156461


Four Internet Search Engines: What’s Under the Hood? AlltheWeb, Teoma, WiseNut, Gigablast


AlltheWeb

  • Developed by FAST of Norway

  • Launched May 1999

  • Now owned by Overture

  • One of the best!


AlltheWeb: Databases

  • Indexed Web pages, including PDF, Flash, and other file types

  • News (from 3,000+ international news sources)

  • Images

  • Videos

  • MP3 files

  • FTP files

  • Ads from Overture listed as "Sponsored Results"


AlltheWeb: Search Features

  • Boolean capabilities in Basic Search

    + (plus) for AND

    - (minus) for NOT

    ( ) for OR

      e.g. (jazz swing blues) = jazz OR swing OR blues

  • Boolean capabilities in Advanced Search

    • Via search boxes and drop-down menus

    • Using the rank option boosts the importance of records containing those term(s)
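To make the Basic Search operators concrete, here is a minimal Python sketch of how a query in that syntax could be matched against a page's text. It is an illustration under stated assumptions, not AlltheWeb's implementation: the names `parse_query` and `matches` are hypothetical, and unmarked terms are treated as required (implicit AND), which is an assumption.

```python
import re

def parse_query(query):
    """Split a query into required, excluded and OR-group terms.

    Syntax per the slide: +term = AND, -term = NOT, (a b c) = a OR b OR c.
    Assumption: unmarked terms are also required (implicit AND).
    """
    # Pull out each parenthesized OR group first.
    any_of = [set(group.lower().split()) for group in re.findall(r"\(([^)]*)\)", query)]
    rest = re.sub(r"\([^)]*\)", " ", query)
    required, excluded = set(), set()
    for token in rest.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        elif token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token not in ("+", "-"):
            required.add(token)  # assumed implicit AND
    return required, excluded, any_of

def matches(query, text):
    """True if the text satisfies every AND, NOT and OR condition."""
    required, excluded, any_of = parse_query(query)
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return (
        required <= words
        and not (excluded & words)
        and all(group & words for group in any_of)
    )

print(matches("+jazz (swing blues) -country", "A history of jazz and swing"))  # True
print(matches("+jazz (swing blues) -country", "Jazz, swing and country"))      # False
```

The second page fails only because it contains the excluded term "country", even though it satisfies the AND and OR conditions.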


AlltheWeb: Search Features

  • Results clustered by topic (“Folders”)

  • Both HTML and Multimedia given, when available

  • NOTE: Located at the BOTTOM of each results screen



AlltheWeb: Field Searching, Command Line and Drop-Down Options

  • In the text

  • In the URL

  • In the link to URL

    • Retrieves pages that link TO the specified URL

  • In the Title

  • In the host name (anywhere)


AlltheWeb Advanced Search: Additional Filters and Limits

  • 49 Languages (select up to 8 per search using the Customize Option)

  • IP Address and/or range

  • Domain (TLD, country or region or entire website)

  • Date

  • Document size (UNIQUE!!!)

  • File formats (9)

  • Embedded Content (Media Type)

  • Offensive Content


Domain: TLD, country, region and website


Date

  • Date Range from Jan. 1, 1980 - present (based on last update, where available)

  • last month

  • last 3 months

  • last 6 months

  • last 9 months

  • last year


Document Size (!!)

  • Limit by bytes, kilobytes or megabytes



Additional File Formats: Undocumented in HELP, but they work as of 10/5/03

  • filetype:rtf

  • filetype:powerpoint

  • filetype:excel

  • filetype:postscript

  • filetype:wordperfect

  • filetype:staroffice

    (Sun’s Office Suite, running on Linux)


Embedded Content

  • Images: All image types (the <img> tag)

  • Audio: Audio files (midi, wav, au, etc.)

  • Video: Video files (QuickTime, AVI, etc.)

  • RealVideo & RealAudio: Streaming RealVideo and RealAudio

  • Macromedia Flash: Macromedia Flash animations

  • Java applets: Java applets (the <applet> tag)

  • JavaScript: JavaScript and ECMAScript

  • VBScript: Microsoft VBScript


Website Evaluation Feature: Type a URL in the Basic Search Box


Teoma

  • Launched in 2001

  • Bought by AskJeeves in 2002

  • Database

    • Indexed Web pages (no Images or other Media)

    • Paid listings from Google

    • Results displayed in 3 groupings: Results, Refine and Resources

    • Fourth in database size, after Google, AlltheWeb (ATW) and Inktomi


Teoma: Advanced Search Features

  • Boolean available in Basic and Advanced Search modes

  • Field searches: full text, title or URL

  • Limit by language (8 European languages)

  • Most limits also operative as commands

    site: inurl: intitle: lang:

    Certain limits cannot be combined; see Advanced Search HELP


Domain: TLD, country, region and website


Date Last Modified (Daterange search also available)


Results Features: 3 Results Groupings

  • Results

    • Ranked database results, with “Related Pages”

  • Refine

    • Clustering of your results and other related sites based on term relationships and web community linkages derived from your original results

  • Resources

    • “Link Collections from experts and enthusiasts”

      (Subject metasites)


Teoma’s Ranking

  • Includes a site’s relationship to other sites with similar content

  • How many links (incoming and outgoing) exist between this site and others on the same subject?

  • To what degree are those other sites inter-linked to the larger web “community” of high quality, similar-subject sites? (Requires some human examination)
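Teoma's exact formula is not given in these slides, so the following is a stand-in rather than Teoma's method: a minimal Python sketch of Kleinberg's HITS (hubs and authorities) algorithm, a well-known link analysis technique that, like the description above, weighs both incoming and outgoing links within a subject-specific set of pages. The page names and links are made-up example data.

```python
import math

def hits(links, iterations=50):
    """Kleinberg's HITS on a small, subject-specific link graph.

    links: list of (source_page, target_page) pairs.
    Returns (hub, authority) score dicts, each scaled to unit length.
    """
    pages = {page for edge in links for page in edge}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / norm for p, v in scores.items()}

    for _ in range(iterations):
        # A good authority is linked to by good hubs (incoming links).
        auth = {p: 0.0 for p in pages}
        for src, dst in links:
            auth[dst] += hub[src]
        auth = normalize(auth)
        # A good hub links out to good authorities (outgoing links).
        hub = {p: 0.0 for p in pages}
        for src, dst in links:
            hub[src] += auth[dst]
        hub = normalize(hub)
    return hub, auth

# Hypothetical pages on one subject: three link to "c", two to "d".
links = [("a", "c"), ("b", "c"), ("e", "c"), ("a", "d"), ("b", "d")]
hub, auth = hits(links)
print(max(auth, key=auth.get))  # c
```

Pages "a" and "b" end up as stronger hubs than "e" because they point to both well-linked pages, which mirrors the slide's idea of judging a site by its place in a community of similar-subject sites.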


Teoma

  • Plus:

    • Identifies metasites (“Resources”)

    • Offers linkage-based web communities (“Refine”)

  • Minus:

    • Smaller database

    • No free URL submission

    • No cached copies

    • No subject directory


WiseNut

  • Launched July 2001

  • Purchased by LookSmart in 2002

  • Single crawler-created database, refreshed often

  • Claims a database of 1.5 billion pages

    • Test search pope canterbury, 10/4/03:

    • Google: 83,200 results; WiseNut: 31,451 results

  • One partner site, Korea WiseNut


WiseNut Search Features

  • Full Boolean in Basic and “WiseSearch”

  • Results clustered by content “WiseGuides”

  • “Search This” allows inclusion of WiseGuide folder titles in a search

  • Limit by language (25)

  • Adult content filtering “WiseWatch”

  • “Sneak-a-Peek” opens a result in a new window


Gigablast

  • Launched April 2002

  • Smaller database than others

    • Over 200 million pages on 10/4/03

    • Test search pope canterbury: Google 83,200 results; Gigablast 24,919 results

  • Created and maintained by Matt Wells (alone)

  • Only search engine “continuously updated with index refreshed in real time” (Site submissions are immediately searchable)

  • Ranking depends less on linkage than Google’s ranking, to avoid penalizing newer pages.

  • No advertising (to date)


Gigablast Search Features

  • Basic Search: Full Boolean

  • Advanced Search: Full Boolean and 2 (!) phrase boxes

  • Limit by site

  • Limit by domain (URL)

  • Search for links to a page available


Gigablast Search Features

  • Field searches include title, IP address and non-html filetypes:

    • PDF, Word, Excel, PPT, PostScript, ASCII Text

  • Results from one site clustered

  • Cached version available

  • Results include date indexed and last modified (!!)

  • Linking to Gigablast improves ranking there


Metaengines: Vivisimo and Ixquick, Two of the Best Available!


Metas and retrieval

  • Metas search quickly but not deeply

  • Metas purchase search time or a quantity of searches from their sources (typically the top 10-50 hits from each)

  • Metas are subject to time-out limits from their sources

  • Usually, NOT every source is searched for every query
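The time-out behavior can be illustrated with a minimal Python sketch: every source is queried in parallel, only the top N hits are requested from each, and any source that has not answered when the time-out expires is simply left out of the results. The `fake_source` helper and the source names are hypothetical stand-ins for real engines, and the timings are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def fake_source(name, delay):
    """Hypothetical stand-in for a search engine that takes `delay` seconds."""
    def search(query, top_n):
        time.sleep(delay)
        return [f"{name} hit {i} for {query}" for i in range(1, top_n + 1)]
    return search

def metasearch(query, sources, timeout=1.0, top_n=10):
    """Query all sources at once; keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(search, query, top_n): name
               for name, search in sources.items()}
    done, not_done = wait(futures, timeout=timeout)
    # Do not block on slow sources; cancel anything not yet started (Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    timed_out = sorted(futures[f] for f in not_done)
    return results, timed_out

sources = {"fast": fake_source("fast", 0.01), "slow": fake_source("slow", 0.5)}
results, timed_out = metasearch("heterotropia", sources, timeout=0.2, top_n=3)
print(sorted(results), timed_out)  # ['fast'] ['slow']
```

Raising `timeout` trades speed for completeness, which is the same trade-off as choosing the longest time-out setting on a metaengine.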


Metas and retrieval

  • “Dumbing Down the Query”

    • Advanced features are often unavailable; when they are offered, only those shared among the sources can be used

    • Default setting for time-out is the shortest; set to maximum for more comprehensive searches (when available)

  • For most metas, advertising is the only source of revenue; software sales are rare


Metas and retrieval

  • What is their place in my search strategy?

    • Metas are best used for simple searches, with little (or no) syntactic complexity

    • Use them to find the top few sites on a topic

    • For a quick overview of a topic’s coverage on the Web in general

    • Use them “as a last resort” for highly focused topics that elude your usual search tools

    • As a possible indication of coverage of a topic among several engines (NOTE: problematic)


Searching the metas

  • Results depend on

    • Choice of sources

    • Query processing speed OF THE SOURCE

    • Length of time spent at each source


A search comparison . . .

  • Searched heterotropia (abnormal binocular vision) on 4/21/03; number of results at the shortest and longest time-out settings:

  • Vivisimo: 77 (shortest), 126 (longest)

  • Ixquick: 37 “from at least 450 results”

  • Profusion: 30 (shortest), 39 (longest)

  • Metacrawler: 42 (shortest), 61 (longest)

  • Webcrawler: 31 (shortest), 80 (longest)

  • Dogpile: 29 (no time-out option)

  • Excite: 41 (shortest), 31 (longest)


Stability of Results: Searched “kids of survival” (modern art group) as a phrase at 3-minute intervals (time-outs at default setting), 4/21/03


Vivisimo

  • http://vivisimo.com

  • Sources: MSN, Netscape, Lycos, LookSmart, Gigablast, BBC, Librarians’ Index to the Internet, plus 11 specialized news sources and 7 specialized business, medical and governmental sources

  • Offers full Boolean and phrase search (if supported by the source)


Vivisimo

  • Offers the following customizations:

    • Selection of sources searched

    • Total number of results retrieved

    • Length of search (“time-out period”)

  • Results combined

  • Source for each result given

  • Ranking data from that source given

  • Duplicates noted, but not repeated
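The “results combined, source given, duplicates noted but not repeated” behavior amounts to merging several ranked lists. A minimal Python sketch follows; the ordering rule (results found by more sources first, then by best rank) and the crude URL normalization are assumptions for illustration, not Vivisimo's or Ixquick's documented method.

```python
def merge_results(ranked_lists):
    """Merge ranked URL lists from several sources without repeating duplicates.

    ranked_lists: {source_name: [url, ...]} in each source's own rank order.
    Each merged entry records every (source, rank) pair that found the URL.
    """
    merged = {}
    for source, urls in ranked_lists.items():
        for rank, url in enumerate(urls, start=1):
            key = url.lower().rstrip("/")  # crude duplicate detection (assumption)
            entry = merged.setdefault(key, {"url": url, "found_by": []})
            entry["found_by"].append((source, rank))
    # Assumed ordering: found by more sources first, then best (lowest) rank.
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["found_by"]), min(rank for _, rank in e["found_by"])),
    )

lists = {
    "SourceA": ["http://x.org/", "http://y.org"],
    "SourceB": ["http://y.org", "http://z.org"],
}
for entry in merge_results(lists):
    print(entry["url"], entry["found_by"])
# http://y.org [('SourceA', 2), ('SourceB', 1)]
# http://x.org/ [('SourceA', 1)]
# http://z.org [('SourceB', 2)]
```

Keeping the per-source rank alongside each merged result is what lets a searcher see both where a hit came from and how highly that source rated it.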


Vivisimo

  • Other features:

  • Results are clustered by keyword prevalence or website of origin

  • Offers a preview of each result in a separate window

  • Offers vertical searches: Top News, Business News, Tech News, Sports News


Clustering results (“folders”)

  • Automated “subject analysis”

  • Facilitates navigation and query refinement

  • Can be hierarchical (folders within folders)

  • One document may appear in several folders

  • Northern Light was the first public search engine to use folders
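Real clustering engines such as Vivisimo and Northern Light used their own, more sophisticated methods. As a rough illustration only, here is a minimal Python sketch of keyword-based “folders”: each folder is labeled by a term shared by several results, and a result goes into every folder whose label it contains, so one document can appear in several folders. The function name, stopword list and thresholds are arbitrary assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "for", "to", "on"}

def keyword_folders(docs, max_folders=5, min_docs=2):
    """Group results into overlapping keyword folders (simplified sketch).

    docs: {doc_id: title or snippet text}.
    Returns {folder_label: [doc_id, ...]}.
    """
    terms = {
        doc_id: set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        for doc_id, text in docs.items()
    }
    # Count how many results contain each term.
    doc_freq = Counter(t for ts in terms.values() for t in ts)
    # Skip terms found in every result (usually the query itself) and rare terms.
    labels = [t for t, n in doc_freq.most_common() if min_docs <= n < len(docs)]
    return {
        label: sorted(d for d, ts in terms.items() if label in ts)
        for label in labels[:max_folders]
    }

results = {
    1: "Jaguar car dealer",
    2: "Jaguar car review",
    3: "Jaguar cat habitat",
    4: "Jaguar cat diet",
    5: "Jaguar cat and car poster",
}
for label, ids in sorted(keyword_folders(results).items()):
    print(label, ids)
# car [1, 2, 5]
# cat [3, 4, 5]
```

Result 5 lands in both the car and cat folders, which is the overlap the slide describes.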


Ixquick

  • http://ixquick.com

  • Sources: Altavista, Netscape, Gigablast, Adobe PDF, Avaya PDF, AskJeeves, Teoma, Go, Open Directory, Overture, Kanoodle, LookSmart, WiseNut, FindWhat, Yahoo, MSN

  • Offers full Boolean and phrase search (if supported by the source)

  • Offers the following customizations:

    • Selection of sources searched

    • Length of search (“time-out period”)


Ixquick

  • Results combined

  • Source for each result given

  • Ranking data from that source given

  • Duplicates noted, but not repeated


Ixquick

  • Other features:

  • Offers 7 field searches (when supported by sources)

  • Clusters hits from same site

  • Highlights search terms in each hit

  • Offers “Related Searches”

  • Offers vertical searches: MP3, News, Pictures


A GOOD meta will . . .

  • Re-format queries to be compatible with search syntax of each source

  • Enable searchers to use advanced features (when the sources support them)

  • Indicate overlapping results without repeating them

  • Perform additional processing of results, e.g., ranking for appropriateness, categorization, etc.

  • Use only sources with unique databases
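Re-formatting one query for several sources can be sketched as translating a single parsed query into each source's syntax. The two target syntaxes below (a “+/-” style and an “AND/OR/NOT” keyword style) and the function names are hypothetical examples, not the documented syntax of any particular engine.

```python
def to_plus_minus(required, excluded, any_of):
    """Render the query as +term / -term / (a b) for a '+/-' style source."""
    parts = [f"+{t}" for t in required] + [f"-{t}" for t in excluded]
    parts += ["(" + " ".join(group) + ")" for group in any_of]
    return " ".join(parts)

def to_keyword_boolean(required, excluded, any_of):
    """Render the same query with AND / OR / NOT keywords."""
    clauses = list(required)
    clauses += ["(" + " OR ".join(group) + ")" for group in any_of]
    clauses += [f"NOT {t}" for t in excluded]
    return " AND ".join(clauses)

# One parsed query: required terms, excluded terms, OR groups.
query = (["jazz"], ["country"], [["swing", "blues"]])
print(to_plus_minus(*query))       # +jazz -country (swing blues)
print(to_keyword_boolean(*query))  # jazz AND (swing OR blues) AND NOT country
```

A source that lacks NOT or OR support would need the query pared down further, which is the “dumbing down” that costs a metasearch its precision.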


New (and Newly Redesigned) Search Services

  • Hotbot

    • Search Hotbot (Inktomi) OR Google OR Lycos OR AskJeeves (NOT a true metaengine)

  • Altavista

    • Still one of the larger search engines; offers daterange search, large image and multimedia collections, and “related pages” feature

  • Fazzle (Metaengine)

  • Turbo10 (Deep Web)

  • Picsearch.com (Images)


Thank you and best of luck in exploring web search “beyond Google”

Michael Hunter

Reference Librarian

Warren Hunting Smith Library

Hobart and William Smith Colleges

Geneva, NY 14456

(315) 781-3552

