
The Invisible Web - finding things that are hard to find -

Tefko Saracevic, PhD

Rutgers University

http://www.scils.rutgers.edu/~tefko

(also contains a list of sites relevant to the topic and this presentation)

© Tefko Saracevic, Rutgers University

What is “Invisible Web?”
  • Materials that general search engines cannot or WILL not include in their collection of web pages (indexes)
  • Materials you cannot find through general search engines
  • Contains a vast amount of information
    • much of it authoritative, of high quality
    • much of it specialized

Why do search engines miss?
  • Size: Web is huge, cannot cover all
  • Economics: associated costs are high
    • also pay per crawl & rank
  • Technical: still limited capabilities
  • Spam: eliminating the bad also loses some of the good
  • Restrictions: some sites do not let crawlers in
  • Deep structure: some sites are complex
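
The "Restrictions" point can be illustrated with a robots.txt file, the usual way a site tells crawlers which paths to stay out of. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the paths and the crawler name `ExampleCrawler` are all made up for illustration and do not come from the presentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def allowed_paths(robots_txt, paths, agent="ExampleCrawler"):
    """Return a dict mapping each path to True if the agent may fetch it."""
    parser = RobotFileParser()
    # parse() accepts the file content as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


RESULT = allowed_paths(
    ROBOTS_TXT,
    ["/index.html", "/private/report.pdf", "/search/results"],
)

for path, ok in RESULT.items():
    print(path, "crawlable" if ok else "blocked")
```

Pages under the disallowed paths stay out of a well-behaved engine's index, which is one way content ends up in the invisible web.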

Web size - who knows?
  • Web Characterization Project - OCLC
    • provides statistics about the web
    • 1998: 2.8 million, 2002: 9.04 million web sites (IP addresses)
      • In 2002: 35% public, 29% private, 36% provisional sites
    • Public sites (2002):
      • 55% US, 7% German, 6% Japanese, 3% each French, Spanish, 2% each Italian, Dutch, Chinese, 1% each Korean, Russian, Polish, Portuguese
    • Adult sites (2002): 3.3%
    • IP address volatility - all sites (disappearance pattern):
      • 13% of sites in 2002 were also in 1998; 51% in 2001
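
To put the percentages above into absolute numbers, the arithmetic can be sketched as follows. The figures are the ones quoted on the slide; the variable names are only illustrative, and the results are rounded estimates rather than OCLC's own published counts.

```python
# Figures quoted on the slide (millions of web sites, counted by IP address).
TOTAL_SITES_1998_MILLIONS = 2.8
TOTAL_SITES_2002_MILLIONS = 9.04

# Shares of the 2002 total by site type, as quoted on the slide.
SHARES_2002 = {"public": 0.35, "private": 0.29, "provisional": 0.36}

# How many times larger the 2002 count is than the 1998 count.
GROWTH_FACTOR = TOTAL_SITES_2002_MILLIONS / TOTAL_SITES_1998_MILLIONS

# Approximate number of sites of each type in 2002, in millions.
SITES_BY_TYPE = {
    kind: round(TOTAL_SITES_2002_MILLIONS * share, 2)
    for kind, share in SHARES_2002.items()
}

print(f"growth 1998-2002: x{GROWTH_FACTOR:.2f}")
for kind, millions in SITES_BY_TYPE.items():
    print(f"{kind}: about {millions} million sites")
```

On these figures the count of sites roughly tripled in four years, and only about 3.2 million of the 2002 sites were public.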

How do search engines work?
  • Crawlers, spiders: go out to find
    • new & changed sites; periodic, not for each query
  • Databases, caches:
    • gather content; could be submitted, bought
  • Indexing: creating appropriate entries
    • various, mostly proprietary algorithms
  • Retrieval engine: searching on the basis of a query
  • Interface: gathers query, displays results
    • results could be ordered by payment
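
The indexing and retrieval steps above can be sketched with a toy inverted index. This is only an illustration of the general idea: real engines use proprietary and far more elaborate algorithms, and the page URLs and texts below are invented stand-ins for crawled content.

```python
from collections import defaultdict

# Hypothetical mini-collection standing in for pages gathered by a crawler.
PAGES = {
    "a.example/home": "invisible web resources for librarians",
    "b.example/search": "search engines index the visible web",
    "c.example/db": "subject databases hold invisible resources",
}


def build_index(pages):
    """Indexing: map every term to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Retrieval: return the pages that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "invisible web"))
print(search(INDEX, "invisible resources"))
```

A page that was never crawled never enters `INDEX`, so no query can retrieve it, however relevant it is.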

Search engines differ
  • Substantial differences among search engines on each aspect
  • Information about search engines:
    • Search Engine Watch
      • ratings, news, statistics, charts
    • Search Engine Showdown
      • run by a librarian, news links, ratings
    • Extreme Searcher
      • update of a popular book

Search engine coverage
  • No engine covers more than 16% of WWW
  • Hard to discern & compare coverage
  • Many national search engines - own coverage
  • Many topical search engines – own coverage
  • Many comprehensive sources independent of search engines

Specialized sources
  • Meta search engines
  • Specialized engines & catalogs
  • Domain (subject) engines & catalogs
  • Reference sources
  • Libraries as web sources
  • Virtual libraries
  • Subject databases
  • Societies, organizations

Meta search engines
  • Search engines that cover search engines:
    • CNET Search.com
      • meta engine of meta engines
    • Dogpile - results from a number of search engines
    • Surfwax - gives statistics and text sources
    • Search Engines Worldwide
      • 174 countries, over 1300 engines
    • Search Engine Guide – categorized by topic
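
How a meta search engine might combine ranked lists from several engines can be sketched with reciprocal rank fusion, a common merging technique. This is an assumption chosen for illustration: the presentation does not say how Dogpile or any other meta engine merges results, and the engine names and URLs below are invented.

```python
from collections import defaultdict

# Hypothetical ranked result lists from three imaginary engines.
ENGINE_RESULTS = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "w.example"],
    "engine_c": ["y.example", "x.example"],
}


def merge_results(results_by_engine, k=60):
    """Merge ranked lists with reciprocal rank fusion.

    Each URL scores 1 / (k + rank) per engine that returned it; the scores
    are summed, so URLs ranked highly by several engines come out on top.
    """
    scores = defaultdict(float)
    for ranked in results_by_engine.values():
        for rank, url in enumerate(ranked, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))


MERGED = merge_results(ENGINE_RESULTS)
print(MERGED)
```

Duplicates collapse into one entry, and a page found by only one engine still appears, which is how a meta engine widens coverage.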

meta engines … (cont.)
  • Vivisimo
    • clusters results; innovative
  • Complete Planet
    • over 100,000 databases & search engines
  • Invisible Web
    • resources and individual questions
  • Webbrain
    • results in tree structure – fun to use

Domain engines & catalogs
  • Cover general & specific areas
  • Open Directory Project – large edited catalog of the web – global, run by volunteers
  • Nat. Acad. of Sciences of Belarus – interesting WWW sites about science
  • BUBL LINK – selected Internet resources covering all academic subject areas – UK
  • Profusion – search in categories

domain engines …
  • Exist in many domains & subjects – rich!
  • Psychcrawler – American Psychological Association
    • web index for psychology
  • Entrez PubMed – Nat Library of Medicine
  • CiteSeer - NEC Research Center
    • scientific literature, citations index - free
  • Think Quest – an international organization
    • education resources, programs

domain engines …
  • KIRKE - Katalog der Internetressourcen für die Klassische Philologie aus Erlangen
    • a variety of resources
  • Perseus Digital Library – Tufts University
    • covers antiquity to renaissance
  • School of Slavonic & East European Studies, University College London
    • includes country resources, e.g. Croatia
  • U Mich Document Center
    • official documents from all over the world

Reference services
  • Reference services - several models
    • Q&A, directories, email answers etc.
    • Martindale’s Reference Desk
      • comprehensive, amazing; also a health desk
    • Ask Jeeves!
      • most popular, commercial
    • Ask ERIC
      • education questions - email answers
    • Information Please
      • almanac type questions

reference …
  • Digital reference - new service area for libraries
  • QuestionPoint – Library of Congress & OCLC
    • project for a global reference network
  • Virtual Reference Desk – Library of Congress
    • compilation of web reference sites
  • LiveRef - maintained at Iowa State U
    • a registry of real time digital reference services

Libraries as web sources
  • Academic libraries providing open collections & services; models vary
    • Rutgers libraries - big long term effort
    • University of California, Berkeley
      • a most elaborate effort, together with Sun Microsystems
    • Bibliothèque Nationale de France
      • includes virtual exhibitions, among others

Virtual libraries on the Web
  • Libraries emerging only on the Web
    • Virtual Library
      • Switzerland, US, UK & other countries – ‘oldest virtual library on the Web’
    • Internet Public Library – Michigan
      • also a long term effort
    • Librarians Index to the Internet
      • very popular and comprehensive

virtual libraries …
  • Academic Info Digital Library
    • many links to digital collections & resources in various subjects
  • Gabriel
    • Gateway to European National Libraries
  • Museum of online museums
    • a delight

Subject databases
  • Many subject specific sites
    • rich & often unique coverage & services
    • different approaches & requirements
  • Examples in health related domains:
    • WebMD Health – news, medical information
    • Rxlist - The Internet Drug Index
    • Mayo Clinic HealthOasis – health advice

Societies, organizations
  • Great many rich sources for searching
    • differences in requirements, depth, richness

Examples from a variety of organizations:

    • Assoc. for Computing Machinery
      • Digital Library; subscription or registration
    • US State Department
      • about the U.S. & other countries
    • Genealogy – Church of Jesus Christ of Latter-day Saints
      • most comprehensive historical list of records

Language barriers on the Web
  • English still the major language
    • but declining, now slightly over 50%
  • Multilingual retrieval search engines
    • Euroseek
      • searches in a number of languages
    • All the Web
      • results in 45 languages

Language barriers: translations
  • A number of translation sites
    • machine aided – i.e. plug in terms, phrases, sentences in one language & review in the other, but effectiveness???
    • Free Translations
      • from & to English, & 8 other languages
    • Babel Fish
      • from & to English and 9 languages, translates URLs
    • Travlang
      • great for travelers, but annoying commercials

Web news; keeping up
  • What is going on on the Web? Some major sources of news and evaluations:
  • Free Pint – newsletter, articles, links
  • Internet Resources Newsletter – UK based
  • ResearchBuzz – daily updates; many aspects
  • About.com Web Search – tools, Web Search Forum
  • Resource Shelf – newsletter with archive

keeping up …
  • Book
    • Chris Sherman & Gary Price (2001). The Invisible Web: Uncovering Information Sources Search Engines Can’t See. Information Today.

  • Site: Invisible Web
    • provides up to date information

Evaluations, ratings
  • Many sources evaluate web sites:
  • The Scout Report
    • librarians’ BIBLE! Annotations. Comprehensive.
  • Medical Library Assoc. – ten most useful sites
  • MLA user guide for health information, recommendations
  • Web 100 – commercial, user ratings, news
  • Evaluating web pages – UC Berkeley
    • tutorial and guide

Archiving the web
  • Internet Archive – a large undertaking
    • includes web archive & lots more; publicly available & free
    • 10 billion web pages archived from 1996 to a few months ago
    • Wayback Machine – search to look at old versions of web pages
  • But there is more, e.g.:
    • Million Book Project
    • International Children’s Digital Library

Needed for Web searching
  • Knowledge & competencies on
    • variety of web sources & their organization
    • search engines
    • web search strategies
    • search dynamics, feedback
  • Keeping up & up & up
    • constant updates, changes, innovations
    • many domain/subject specific

Needed for Web searching by professionals
  • Knowledge of SOURCES in area of interest
    • search engines not enough
    • not too helpful in finding these other sources; structure hard to discern
  • Evaluation of sources
    • a key professional skill!
    • standard criteria & Web criteria: authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability

Needed competencies …
  • Knowledge of users & use
  • Knowledge of searching
  • Use of technology
  • Adaptability, flexibility
  • Integration with other resources
  • Teaching others
  • Constant learning & update
    • keeping up, keeping up, keeping up

But now really: How to do it?

[Diagram labels: information, WWW]

Web is still a mystery!

hvala

thank you

ďakujem vám

danke

merci

grazie

gracias

P.S. a few weird sites…
  • SelectSmart.com
    • all kinds of quizzes for you
  • James Dean official web site
  • Deaducated
    • Dead Librarians’ Society
  • Livejournal
    • blogs & authoring tools

Sources
  • About.com Web Search http://websearch.about.com
  • Academic Info Digital Library http://www.academicinfo.net/digital.html
  • All the Web http://www.alltheweb.com/
  • Ask Eric http://www.askeric.org/Qa/
  • Ask Jeeves! http://www.ask.com/
  • Assoc. for Computing Machinery http://www.acm.org/
  • Babelfish http://babelfish.altavista.com/tr
  • Bibliothèque Nationale de France http://www.bnf.fr/
  • BUBL LINK http://bubl.ac.uk/link/
  • CNET Search.com http://www.search.com/
  • CiteSeer http://citeseer.nj.nec.com/
  • CompletePlanet http://completeplanet.com
  • Deaducated http://www.geocities.com/deadlibrarians/
  • Dogpile http://www.dogpile.com/
  • Entrez PubMed http://www.ncbi.nlm.nih.gov/PubMed/
  • Extreme Searcher http://www.extremesearcher.com/
  • Free Pint http://www.freepint.com/

sources …
  • Free Translations http://www.freetranslations.com
  • Gabriel http://www.kb.nl/gabriel/
  • Genealogy http://www.familysearch.org/
  • Information Please http://www.infoplease.com/
  • International Children’s Digital Library http://www.icdlbooks.org/
  • Internet Archive http://www.archive.org/
  • Internet Public Library, Michigan http://www.ipl.org/
  • Internet Resources Newsletter http://www.hw.ac.uk/libwww/irn/
  • Invisible Web http://invisibleweb.com
  • James Dean http://www.jamesdean.com/
  • KIRKE http://www.phil.uni-erlangen.de/~p2latein/ressourc/ressourc.html
  • Librarians Index to the Internet http://lii.org/
  • Live Journal http://www.livejournal.com/
  • LiveRef http://www.public.iastate.edu/~CYBERSTACKS/LiveRef.htm
  • Martindale’s Reference Desk http://www-sci.lib.uci.edu/~martindale/Ref.html
  • Mayo Clinic http://www.mayohealth.org/

sources …
  • Medical Library Assoc. ten top sites http://www.mlanet.org/resources/medspeak/topten.html
  • Medical Library Assoc. user guide for health inf. http://www.mlanet.org/resources/userguide.html
  • Medscape http://www.medscape.com/
  • Million Book Project http://www.archive.org/texts/millionbooks.php
  • Museum of online museums http://www.coudal.com/moom.php
  • Nat Acad Sciences, Belarus http://www.ac.by/science/index.html
  • OCLC Web Characterization Project http://wcp.oclc.org/
  • Open Directory Project http://dmoz.org
  • Perseus Digital Library http://www.perseus.tufts.edu/
  • Profusion http://www.profusion.com/
  • Psychcrawler http://www.psychcrawler.com/
  • QuestionPoint http://www.questionpoint.org/
  • ResearchBuzz http://www.researchbuzz.com/index.shtml
  • Resource Shelf http://resourceshelf.blogspot.com/
  • Rutgers Libraries http://www.libraries.rutgers.edu/
  • RxList http://www.rxlist.com/

sources …
  • Sch of Slavonic & East European Studies http://www.ssees.ac.uk/dirctory.htm
  • Search Engine Guide http://www.searchengineguide.com/
  • Search Engine Showdown http://searchengineshowdown.com/
  • Search Engine Watch http://searchenginewatch.com/
  • Search Engines Worldwide http://www.twics.com/~takakuwa/search/search.html
  • SelectSmart.com http://www.selectsmart.com/home.html
  • Surfwax http://www.surfwax.com/
  • The Scout Report http://scout.cs.wisc.edu/
  • Think Quest http://www.thinkquest.org/
  • Travlang http://www.travlang.com
  • U California Berkeley http://sunsite.berkeley.edu/
  • U Mich Documents Center http://www.lib.umich.edu/govdocs/
  • US State Department http://www.state.gov/
  • Virtual Library http://vlib.org
  • Virtual Reference Desk http://www.loc.gov/rr/askalib/virtualref.html
  • Vivisimo http://vivisimo.com
  • Web 100 http://www.web100.com
  • Webbrain http://www.webbrain.com/html/default_win.html
  • WebMD http://my.webmd.com/webmd_today/home/default
