Invisible Web revisited


The Invisible Web: finding things that are hard to find

Tefko Saracevic, PhD

Rutgers University

http://www.scils.rutgers.edu/~tefko

(also contains a list of sites relevant to the topic and this presentation)

© Tefko Saracevic, Rutgers University


What is “Invisible Web?”

  • Materials that general search engines cannot or WILL not include in their collection of web pages (indexes)

  • Materials you cannot find through general search engines

  • Contains a vast amount of information

    • much of it authoritative, high quality

    • much of it specialized

Why do search engines miss material?

  • Size: Web is huge, cannot cover all

  • Economics: associated costs are high

    • also pay per crawl & rank

  • Technical: still limited capabilities

  • Spam: eliminating the bad also loses the good

  • Restrictions: some sites do not let crawlers in

  • Deep structure: some sites complex
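
One common way a site "does not let crawlers in" is a robots.txt file. The sketch below is only an illustration: the robots.txt content, the crawler name ExampleBot, and the example.org URLs are all hypothetical and do not come from the slides. It uses Python's standard urllib.robotparser to show how a well-behaved crawler would decide which pages to skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (an assumption, not from the slides).
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/
Disallow: /search
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name used only for this sketch.
allowed_home = parser.can_fetch("ExampleBot", "http://example.org/index.html")
allowed_catalog = parser.can_fetch("ExampleBot", "http://example.org/catalog/item7")

print("home page crawlable:", allowed_home)
print("catalog page crawlable:", allowed_catalog)
```

Pages under the disallowed paths stay out of a compliant engine's index, which is one reason database-backed catalogs end up in the invisible web.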

Web size - who knows?

  • Web Characterization Project - OCLC

    • provides statistics about the web

    • 1998: 2.8 million; 2002: 9.04 million web sites (counted by IP address)

      • In 2002: 35% public, 29% private, 36% provisional sites

    • Public sites (2002):

      • 55% US, 7% German, 6% Japanese, 3% each French, Spanish, 2% each Italian, Dutch, Chinese, 1% each Korean, Russian, Polish, Portuguese

    • Adult sites (2002): 3.3%

    • IP address volatility - all sites (disappearance pattern):

      • of the sites found in 2002, 13% were also present in 1998 and 51% in 2001
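
To make the OCLC percentages concrete, the short calculation below converts the 2002 shares into approximate site counts and computes the growth from 1998 to 2002. It uses only the figures quoted on this slide; the variable names are illustrative, and the counts are rough because the published percentages are rounded.

```python
# Figures quoted on the slide (OCLC Web Characterization Project).
SITES_1998 = 2.8e6    # web sites in 1998
SITES_2002 = 9.04e6   # web sites in 2002
SHARES_2002 = {"public": 0.35, "private": 0.29, "provisional": 0.36}

# How many times larger the 2002 count is than the 1998 count.
growth_factor = SITES_2002 / SITES_1998

# Approximate number of sites in each category in 2002.
counts_2002 = {
    kind: round(SITES_2002 * share) for kind, share in SHARES_2002.items()
}

print(f"growth 1998 to 2002: about {growth_factor:.1f}x")
for kind, count in counts_2002.items():
    print(f"{kind}: about {count / 1e6:.2f} million sites")
```

Under these figures only about 3.2 million of the 9.04 million sites were public, so the part of the web open to general crawlers was roughly a third of the total.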

How do search engines work?

  • Crawlers, spiders: go out to find

    • new & changed sites; periodic, not for each query

  • Databases, caches:

    • gather content; could be submitted, bought

  • Indexing: creating appropriate entries

    • various, mostly proprietary algorithms

  • Retrieval engine: searching on basis of query

  • Interface: gathers query, displays results

    • could be ordered by pay
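
The components above can be sketched in a few lines. This is a toy illustration, not how any real engine works: the "crawled" pages are a hard-coded dictionary with made-up example.org URLs, indexing is a plain inverted index from words to pages, and retrieval returns pages containing every query word. Real engines use proprietary and far more elaborate algorithms, as the slide notes.

```python
from collections import defaultdict

# Stand-in for what a crawler gathered (hypothetical URLs and text).
PAGES = {
    "http://example.org/a": "invisible web resources for librarians",
    "http://example.org/b": "search engines index web pages",
    "http://example.org/c": "librarians evaluate search engines",
}


def build_index(pages):
    """Indexing: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Retrieval: return pages containing all query words, sorted by URL."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "search engines"))
print(search(index, "invisible engines"))
```

Note that the index is built once, ahead of time, and queries only consult the index. A page the crawler never gathered cannot be retrieved, whatever the query.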

Search engines differ

  • Substantial differences among search engines on each aspect

  • Information about search engines:

    • Search Engine Watch

      • ratings, news, statistics, charts

    • Search Engine Showdown

      • run by a librarian, news links, ratings

    • Extreme Searcher

      • update of a popular book

Search engine coverage

  • No engine covers more than 16% of WWW

  • Hard to discern & compare coverage

  • Many national search engines - own coverage

  • Many topical search engines – own coverage

  • Many comprehensive sources independent of search engines

Specialized sources

  • Meta search engines

  • Specialized engines & catalogs

  • Domain (subject) engines & catalogs

  • Reference sources

  • Libraries as web sources

  • Virtual libraries

  • Subject databases

  • Societies, organizations

Meta search engines

  • Search engines that cover search engines:

    • CNET Search.com

      • meta engine of meta engines

    • Dogpile – results from a number of search engines

    • Surfwax – gives statistics and text sources

    • Search Engines Worldwide

      • 174 countries, over 1300 engines

    • Search Engine Guide – categorized by topic
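
The core idea of a meta search engine, sending one query to several engines and combining the answers, can be sketched as follows. This is an assumption-laden illustration: the engine names and result lists are invented, and the reciprocal-rank scoring is one simple merging rule chosen for the example, not the method used by Dogpile or any other service named above.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Merge ranked result lists from several engines into one ranking.

    Each URL earns 1/rank points from every engine that returned it, so
    pages ranked highly by several engines rise to the top. Duplicates
    across engines collapse into a single entry.
    """
    scores = defaultdict(float)
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results for one query from three made-up engines.
RESULTS = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "w.example"],
    "engine_c": ["y.example", "x.example"],
}

merged = merge_results(RESULTS)
print(merged)
```

A meta engine can only surface what its underlying engines already index, so it widens coverage across engines but does not by itself reach the invisible web.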

meta engines … (cont.)

  • Vivisimo

    • clusters results; innovative

  • Complete Planet

    • over 100,000 databases & search engines

  • Invisible Web

    • resources and individual questions

  • Webbrain

    • results in tree structure – fun to use

Domain engines & catalogs

  • Cover general & specific areas

  • Open Directory Project – large edited catalog of the web – global, run by volunteers

  • Nat. Acad. of Sciences of Belarus – interesting WWW sites about science

  • BUBL LINK – selected Internet resources covering all academic subject areas – UK

  • Profusion – search in categories

domain engines …

  • Exist in many domains & subjects – rich!

  • Psychcrawler – American Psychological Association

    • web index for psychology

  • Entrez PubMed – Nat Library of Medicine

  • CiteSeer - NEC Research Center

    • scientific literature, citations index - free

  • Think Quest – an international organization

    • education resources, programs

domain engines …

  • KIRKE - Katalog der Internetressourcen für die Klassische Philologie aus Erlangen

    • a variety of resources

  • Perseus Digital Library – Tufts University

    • covers antiquity to renaissance

  • Sch of Slavonic & East European Studies, University College London

    • includes country resources, e.g. Croatia

  • U Mich Document Center

    • official documents from all over the world

Reference services

  • Reference services - several models

    • Q&A, directories, email answers etc.

    • Martindale’s Reference Desk

      • comprehensive, amazing; also a health desk

    • Ask Jeeves!

      • most popular, commercial

    • Ask ERIC

      • education questions – email answers

    • Information Please

      • almanac type questions

reference …

  • Digital reference - new service area for libraries

  • QuestionPoint – Library of Congress & OCLC

    • project for a global reference network

  • Virtual Reference Desk – L of Congress

    • compilation of web reference sites

  • LiveRef - maintained at Iowa State U

    • a registry of real time digital reference services

Libraries as web sources

  • Academic libraries providing open collections & services; models vary

    • Rutgers libraries - big long term effort

    • University of California, Berkeley

      • a most elaborate effort together with Sun Microsystems

    • Bibliothèque Nationale de France

      • includes virtual exhibitions, among others

Virtual libraries on the Web

  • Libraries emerging only on the Web

    • Virtual Library –

      • Switzerland, US, UK & other countries – ‘oldest virtual library on the Web’

    • Internet Public Library – Michigan

      • also a long term effort

    • Librarians' Index to the Internet

      • very popular and comprehensive

virtual libraries …

  • Academic Info Digital Library

    • many links to digital collections & resources in various subjects

  • Gabriel

    • Gateway to European National Libraries

  • Museum of online museums

    • a delight

Subject databases

  • Many subject specific sites

    • rich & often unique coverage & services

    • different approaches & requirements

  • Examples in health related domains:

    • WebMDHealth – news, medical information

    • Rxlist - The Internet Drug Index

    • Mayo Clinic HealthOasis – health advice

Societies, organizations

  • Great many rich sources for searching

    • differences in requirements, depth, richness

  • Examples from a variety of organizations:

    • Assoc. for Computing Machinery

      • Digital Library; subscription or registration

    • US State Department

      • about the U.S. & other countries

    • Genealogy – Church of Jesus Christ of Latter-day Saints

      • most comprehensive historical list of records

Language barriers on the Web

  • English still the major language

    • but declining, now slightly over 50%

  • Multilingual retrieval search engines

    • Euroseek

      • searches in a number of languages

    • All the Web

      • results in 45 languages

Language barriers: translations

  • A number of translation sites

    • machine aided – i.e. plug in terms, phrases, or sentences in one language & review the result in the other; effectiveness is an open question

    • Free Translations

      • from/to English & 8 other languages

    • Babel Fish

      • from/to English & 9 other languages; also translates web pages by URL

    • Travlang

      • great for travelers, but annoying commercials

Web news; keeping up

  • What is going on on the Web? Some major sources of news and evaluations:

  • Free Pint – newsletter, articles, links

  • Internet Resources Newsletter – UK based

  • ResearchBuzz – daily updates; many aspects

  • About.com Web Search – tools, Web Search Forum

  • Resource Shelf – newsletter with archive

keeping up …

  • Book

    Chris Sherman & Gary Price (2001). The Invisible Web: Uncovering information sources search engines can’t see. Information Today.

  • Site: Invisible Web

    • provides up to date information

Evaluations, ratings

  • Many sources evaluate web sites:

  • The Scout Report –

    • librarians’ BIBLE! Annotations. Comprehensive.

  • Medical Library Assoc. – ten most useful sites

  • MLA user guide for health information – recommendations

  • Web 100 – commercial, user ratings, news

  • Evaluating web pages – UC Berkeley

    • tutorial and guide

Archiving the web

  • Internet Archive – a large undertaking

    • includes web archive & lots more publicly available & free

    • 10 billion web pages archived from 1996 to a few months ago

    • Wayback Machine – search to look at old versions of web pages

  • But there is more, e.g.:

    • Million Book Project

    • International Children’s Digital Library

Needed for Web searching

  • Knowledge & competencies on

    • variety of web sources & their organization

    • search engines

    • web search strategies

    • search dynamics, feedback

  • Keeping up & up & up

    • constant updates, changes, innovations

    • many domain/subject specific

Needed for Web searching by professionals

  • Knowledge of SOURCES in area of interest

    • search engines not enough

    • not too helpful in finding these other sources; structure hard to discern

  • Evaluation of sources

    • a key professional skill!

      • standard criteria & Web criteria:

        authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability

    Needed competencies …

    • Knowledge of users & use

    • Knowledge of searching

    • Use of technology

    • Adaptability, flexibility

    • Integration with other resources

    • Teaching others

    • Constant learning & update

      • keeping up, keeping up, keeping up


    But now really: How to do it?

    information

    WWW


    Web is still a mystery!


    hvala

    thank you

    ďakujem vám

    danke

    merci

    grazie

    gracias

    P.S. a few weird sites…

    • SelectSmart.com

      • all kinds of quizzes for you

    • James Dean official web site

    • Deaducated

      • Dead Librarians’ Society

    • Livejournal

      • blogs & authoring tools

    Sources

    • About.com Web Search http://websearch.about.com

    • Academic Info Digital Library http://www.academicinfo.net/digital.html

    • All the Web http://www.alltheweb.com/

    • Ask ERIC http://www.askeric.org/Qa/

    • Ask Jeeves! http://www.ask.com/

    • Assoc. for Computing Machinery http://www.acm.org/

    • Babelfish http://babelfish.altavista.com/tr

    • Bibliothèque Nationale de France http://www.bnf.fr/

    • BUBL LINK http://bubl.ac.uk/link/

    • CNET Search.com http://www.search.com/

    • CiteSeer http://citeseer.nj.nec.com/

    • CompletePlanet http://completeplanet.com

    • Deaducated http://www.geocities.com/deadlibrarians/

    • Dogpile http://www.dogpile.com/

    • Entrez PubMed http://www.ncbi.nlm.nih.gov/PubMed/

    • Extreme Searcher http://www.extremesearcher.com/

    • Free Pint http://www.freepint.com/

    sources …

    • Free Translations http://www.freetranslations.com

    • Gabriel http://www.kb.nl/gabriel/

    • Genealogy http://www.familysearch.org/

    • Information Please http://www.infoplease.com/

    • International Children’s Digital Library http://www.icdlbooks.org/

    • Internet Archive http://www.archive.org/

    • Internet Public Library, Michigan http://www.ipl.org/

    • Internet Resources Newsletter. http://www.hw.ac.uk/libwww/irn/

    • Invisible Web http://invisibleweb.com

    • James Dean http://www.jamesdean.com/

    • KIRKE http://www.phil.uni-erlangen.de/~p2latein/ressourc/ressourc.html

    • Librarians Index to the Internet http://lii.org/

    • Live Journal http://www.livejournal.com/

    • LiveRef http://www.public.iastate.edu/~CYBERSTACKS/LiveRef.htm

    • Martindale’s Reference Desk http://www-sci.lib.uci.edu/~martindale/Ref.html

    • Mayo Clinic http://www.mayohealth.org/

    sources …

    • Medical Library Assoc. ten top sites http://www.mlanet.org/resources/medspeak/topten.html

    • Medical Library Assoc. user guide for health inf. http://www.mlanet.org/resources/userguide.html

    • Medscape http://www.medscape.com/

    • Million Book Project http://www.archive.org/texts/millionbooks.php

    • Museum of online museums. http://www.coudal.com/moom.php

    • Nat Acad Sciences, Belarus http://www.ac.by/science/index.html

    • OCLC Web Characterization Project http://wcp.oclc.org/

    • Open Directory Project http://dmoz.org

    • Perseus Digital Library http://www.perseus.tufts.edu/

    • Profusion http://www.profusion.com/

    • Psychcrawler http://www.psychcrawler.com/

    • QuestionPoint http://www.questionpoint.org/

    • ResearchBuzz. http://www.researchbuzz.com/index.shtml

    • Resource Shelf http://resourceshelf.blogspot.com/

    • Rutgers Libraries http://www.libraries.rutgers.edu/

    • RxList http://www.rxlist.com/

    sources …

    • Sch of Slavonic & East European Studies http://www.ssees.ac.uk/dirctory.htm

    • Search Engine Guide http://www.searchengineguide.com/

    • Search Engine Showdown http://searchengineshowdown.com/

    • Search Engine Watch http://searchenginewatch.com/

    • Search Engines Worldwide http://www.twics.com/~takakuwa/search/search.html

    • SelectSmart.com http://www.selectsmart.com/home.html

    • Surfwax http://www.surfwax.com/

    • The Scout Report. http://scout.cs.wisc.edu/

    • Think Quest http://www.thinkquest.org/

    • Travlang http://www.travlang.com

    • U California Berkeley http://sunsite.berkeley.edu/

    • U Mich Documents Center http://www.lib.umich.edu/govdocs/

    • US State Department http://www.state.gov/

    • Virtual Library http://vlib.org

    • Virtual Reference Desk http://www.loc.gov/rr/askalib/virtualref.html

    • Vivisimo http://vivisimo.com

    • Web 100 http://www.web100.com

    • Webbrain http://www.webbrain.com/html/default_win.html

    • WebMD http://my.webmd.com/webmd_today/home/default
