
On the Road to Scientific Information Portals:
Cooperative Digital Libraries
Remarks, Visions, Proposals

Martin Grötschel

IuK 2001, Universität Trier

Contents

Introduction

  • All Information is Part of the Web

Can we make this true?

  • The Visible Web and the Deep Web
  • There could be an Interconnected Network of Science
  • Integrating All Types of Resources
  • We should Organize the Cyber Space
  • To the Benefit of our Society
Personal Motivation
  • I have broad interests.
  • I (have to) search a lot.
  • I do find things I look for.
  • However, this process costs too much time and money.
  • The „scientific information system“ could be much better.

  • It seems that some scientists have to get involved.
  • The situation is similar with respect to communication.
Acting Forces
  • Science drives Technology
  • Technology drives Change
  • Change induces Pressure

Some Consequences:

  • Higher Speed and Efficiency
  • Lower Costs
  • Universal Connectivity
  • More and Global Competition

What does this imply for Science?

The World of Information
  • Tons of Printed Material

Zillions

  • of Scientific Web Sites
  • of E-Journals, E-Prints
  • of Databases and CD-ROMs
  • of Multimedia Documents
  • of E-Mail
  • of Digital Photos and Videos
  • etc.
The Players
  • The Author
  • The Publisher
  • The Librarian
  • The Software Developer
  • The Service Provider
  • The Scientific Information Center
  • The Scientific Society
  • etc.

The User

Some Unsolved Issues
  • Accessibility
  • Searchability
  • Stability
  • Compatibility
  • Pricing
  • Heterogeneity
  • Diversity and Complexity of Structures
  • Quality
  • Authenticity
  • etc.
Solution
  • Scientists have to get involved
  • Solution must be user driven
  • Cooperation of players
  • Consensus about structures

Some Suggestions in this Talk

Contents
  • All Information is Part of the Web

Can we make this true?

Current Mathematical Resources
  • Papers and Preprints
  • Journals and Books
  • Reviews and Abstracts
  • Software and Data Collections
  • Projects and Persons
  • Voice, Images, and Video Information
  • Links, Mail, and Virtual Libraries
Math Papers and Preprints
  • Preprints of the Math-Net
  • MPRESS (including arXiv math, ...)
  • EULER
  • Digital Library @ ACM
Math Journals and Books
  • SUB Göttingen („Sondersammelgebiet“, special subject collection)
  • TIB Hannover (Technical Information Library)
  • ELib @ Uni Osnabrück
  • EMIS
  • Springer LINK
  • DOCUMENTA MATHEMATICA
  • Lehmanns.de
Math Reviews and Abstracts
  • MATH @ Zentralblatt
  • MathSci @ AMS
  • MATHDI @ FIZ-Karlsruhe
  • Jahrbuch der Mathematik
Math Software and Data Collections
  • Netlib @ ANL
  • eLib @ ZIB
  • MuPAD @ Uni Paderborn
  • Algebraic Groups
  • Cinderella
  • OpenMath
Projects and Persons
  • Web Sites of Math Research Institutes
  • Web Sites of Math Departments
  • BerNAM
  • Directory of Mathematicians @ ACM
  • Combined Membership List of AMS, SIAM, MAA
  • PERSONA MATHEMATICA @ math-net.de
  • SIGMA @ math-net.de
Voice, Images, and Video
  • Computer Museum
  • MSRI Video Server
  • Electronic Geometric Models

Application Servers and Software

  • MATHEMATICA
  • Cinderella
  • Inverse Calculator
Links, Mail, and Virtual Libraries
  • mathematik.de
  • Math-Net.de
  • Mathematical Archives
  • Opt-Net @ ZIB
  • MathML
There are zillions of Math Resources in the Net.

The Situation is Similar in all other Sciences

  • How do you know that all this material exists and where it is?
  • Old Approach:
    • Link Lists = WWW Virtual Libraries
  • But much more has come up in recent years!
Is Everything in the Web?
  • Printed Books
  • Printed Journals
  • CD-ROMs
  • Some Databases
  • Historic Archives
  • Catalog Cards
  • ...

are not electronically available

Contents
  • All Information is Part of the Web

Can we make this true?

  • The Visible Web and the Deep Web
The Invisible / Deep Web

A fundamental Problem with Search Engines:

A Vast Amount of Information is Invisible

  • Surface Web / Web Robots Start at some „Hubs“
    • Interlinked Web Pages
  • Deep Web
    • Isolated Web Sites
    • There are huge Isolated Islands in the Web
    • Information within Databases, behind CGI Interfaces
    • Information without Links (e.g. within OPACs of Libraries)
    • Protected Material, Excluded Explicitly
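
The difference between the two can be made concrete with a toy link graph. The following is a minimal sketch, not a description of how any real search engine works: the page names and links are hypothetical, and `crawl` is an illustrative helper. It shows that a robot starting at hubs only ever reaches interlinked pages, while isolated sites and records behind query forms stay invisible.

```python
from collections import deque

def crawl(links, hubs):
    """Breadth-first traversal: return every page reachable from the hubs via links."""
    seen = set(hubs)
    queue = deque(hubs)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical toy web: keys are pages, values are their outgoing links.
links = {
    "hub": ["dept-page", "preprint-list"],
    "dept-page": ["preprint-list"],
    "preprint-list": [],
    "island-site": ["island-subpage"],  # never linked from the surface
    "island-subpage": [],
    "opac-record": [],                  # only reachable through a query form
}

visible = crawl(links, ["hub"])   # what a link-following robot can collect
deep = set(links) - visible       # what stays hidden from it
```

In this toy example, `deep` contains the island site, its subpage, and the OPAC record, none of which a link-following robot can discover.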
A Web Search Engine Collecting Visible Information

From „The Deep Web: Surfacing Hidden Value“; BrightPlanet.com, Jan. 2000

A Direct Meta Search Engine Fishing for Invisible Information

From „The Deep Web: Surfacing Hidden Value“; BrightPlanet.com, Jan. 2000

Characteristics of the Deep Web (in Comparison to the Visible Web)
  • Public information on the Deep Web is currently 400 to 500 times larger than the commonly defined World Wide Web
  • The Deep Web holds 7,500 terabytes of information (550 billion individual documents), compared to 19 terabytes (1 billion documents) on the visible Web

From „The Deep Web: Surfacing Hidden Value“; BrightPlanet.com, Jan. 2000
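
As a quick sanity check on these figures, the ratios can be computed directly. The sketch below uses only the numbers quoted on the slide; the variable names are illustrative.

```python
# Figures as quoted on the slide (attributed to BrightPlanet).
deep_tb, visible_tb = 7_500, 19
deep_docs, visible_docs = 550_000_000_000, 1_000_000_000

size_ratio = deep_tb / visible_tb       # ratio by data volume
doc_ratio = deep_docs / visible_docs    # ratio by document count

print(f"by volume: {size_ratio:.0f}x, by documents: {doc_ratio:.0f}x")
# prints "by volume: 395x, by documents: 550x"
```

By volume the quoted figures give a factor of roughly 395, close to the lower end of the "400 to 500 times" claim. By document count they give 550.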

Characteristics of the Deep Web (in Comparison to the Visible Web)
  • More than 100,000 Deep Web sites currently exist
    • 60 of the largest Deep Web Sites collectively contain about 750 terabytes of Information (... narrower, with deeper content)
    • More than half of the Deep Web content resides in topic specific databases (BrightPlanet concentrates on about 20,000 sites)
  • A full 95% of the Deep Web is publicly accessible information – not subject to fees or subscriptions
  • The Deep Web is the fastest growing category of new information on the Internet. But the Deep Web is widely unknown.

From „The Deep Web: Surfacing Hidden Value“; BrightPlanet.com, Jan. 2000

Making the Deep Web Visible

Technology:

  • Meta Search Engines
  • Bibliographic Meta Search Engines
  • Virtual Catalogs and Link Lists

Organisational Issues:

  • Building Networks of Digital Libraries
  • Forming Library and other Cooperatives
  • Working on Standards and Formats (Common, Open, Metadata,...)
Categories of Information Systems
  • Web Sites – Collection, Query Interface
  • Publications – E-Journals, Preprints, ...
  • Regional/Nat. Collections – Harvesting Systems
  • Topical Databases – Subject Specific Aggregation
  • OPACs – Library Holdings
  • Journal Archives – Archive of Publishers
  • Software/Data Collection – Commercial / Public Archive
  • Compute Servers – Math. Calculations /Demos
  • Mailing Lists/Archive – Topical Communication Forum
  • Topical Portals – Wide Spectrum Information System
Problems: Wide Variety of Servers

Problems with Search Engines (Web Robots)

  • Impose High Load on Servers and Networks
  • Perverted use of Metadata
  • Robots can't see behind CGI-Interfaces
  • Access Rights, Range of Licenses

Problems with Cascading Search Engines

  • Diversity of data formats (MAB, MARC Formats, DC, ...)
  • Multitude of protocols (Z39.50, HTTP, proprietary)
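
One common way to cope with the diversity of formats is a crosswalk that maps each source format onto a small shared element set such as Dublin Core. The sketch below is illustrative only: the tag tables are heavily simplified assumptions (real MAB2 and MARC records carry subfields and indicators that are ignored here), the sample records are invented, and `to_dublin_core` is a hypothetical helper.

```python
# Simplified, illustrative tag tables (assumptions, not complete mappings).
CROSSWALKS = {
    "MARC": {"245": "title", "100": "creator"},
    "MAB2": {"331": "title", "100": "creator"},
}

def to_dublin_core(record, source_format):
    """Map a flat {tag: value} record to Dublin Core-style element names.

    Tags without an entry in the crosswalk are dropped.
    """
    mapping = CROSSWALKS[source_format]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}

# Invented sample records describing the same work in two formats.
marc_record = {"245": "Example Title", "100": "Example, Author", "999": "local data"}
mab_record = {"331": "Example Title", "100": "Example, Author"}

dc_from_marc = to_dublin_core(marc_record, "MARC")
dc_from_mab = to_dublin_core(mab_record, "MAB2")
```

Once both records are expressed in the same element set they can be compared, merged, or indexed together, which is the precondition for cascading searches across heterogeneous servers.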

Specialized Repositories and Archives

  • Scientific Journals provided by Commercial Publishers
  • Document Delivery Systems and Specialized Historic Archives
  • Maps, Music, Photos, Videos, Multimedia
Contents
  • All Information is Part of the Web

Can we make this true?

  • The Visible Web and the Deep Web
  • There could be an Interconnected Network of Science
Virtual/Digital Library
  • Virtual: Search index, Links, Metadata, OPAC catalog entries
  • Digital: Structured digital contents, Full texts, Databases
Towards a Scientific Portal to Interconnect the Digital World

[Diagram labels: Virtual Library; Information Portal; Cooperative Digital Library; Virtual Digital Scientific Library]

The Scientific Portal (Information Portal for the Sciences) is an Entry Point to all Types of Information Products from the Sciences.

Behind the Scientific Portal is a Structured Network to be coordinated and organized by the Sciences in a cooperative way.

A Task for the IuK Initiative?

An Example in the Making

Virtuelle Fachbibliothek Technik (Virtual Subject Library for Engineering) of TIB Hannover

Example: The DOE Information Bridge
  • Started in 1997 with 60,000 searchable full text reports online @ DOE Office of Scientific and Technical Information (OSTI)
  • Direct Search based on the Distributed Explorer developed by a small Internet Company: Innovative Web Application Ltd. (IWA)
  • A public version in partnership with the Government Printing Office (GPO) of the USA
  • Many other Federal Deep Web collections added to the DOE Virtual Library
    • PubScience
    • PubMed
    • NTIS Electronic Catalog (450,000 Titles)
    • NASA Technical Report Server
  • Energy Portal Search
  • Digitization efforts for Gray Literature (@ OSTI)
The GrayLit Information Network

Graphic from W.L. Warnick et al., „Searching The Deep Web“; D-Lib Magazine, Vol. 7, No. 1, January 2001; www.dlib.org

Federal R&D Architecture

Graphic from W.L. Warnick et al., „Searching The Deep Web“; D-Lib Magazine, Vol. 7, No. 1, January 2001; www.dlib.org

An Observation

The Voluntary Work contributed so far was and will stay important.

There will, however, be no satisfactory solution without substantial investment of personnel and money.

We need to become more professional, e.g., Google versus Math-Net.

Contents
  • All Information is Part of the Web

Can we make this true?

  • The Visible Web and the Deep Web
  • There could be an Interconnected Network of Science
  • Integrating All Types of Resources
Distributed Meta Search Engines Exist

What they do:

  • Query Search Engines, OPACs, Databases
  • Perform Distributed Searches in Parallel
  • Cascade Search to reach Large/Vast Amounts of Targets

  • Deliver Links, Metadata, and/or Full Texts
  • Handle a Diversity of Data Structures
  • Use a Multitude of Internet/Web Protocols
  • Structure Heterogeneous/Large Result Sets

They Rely on a Series of Small Configuration Files
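
The core loop of such an engine can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of any system named in this talk: `TARGETS` stands in for the small configuration files, the records are invented, and `search_target` replaces what would in practice be a remote query over HTTP, Z39.50, or another protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical target configuration with in-memory sample data.
TARGETS = {
    "opac": [{"title": "Geometric Algorithms", "id": "a1"}],
    "preprints": [
        {"title": "Geometric Algorithms", "id": "p7"},
        {"title": "Cutting Planes", "id": "p9"},
    ],
}

def search_target(name, query):
    """Stand-in for a remote query: case-insensitive substring match on titles."""
    hits = [r for r in TARGETS[name] if query.lower() in r["title"].lower()]
    return [{"source": name, **r} for r in hits]

def meta_search(query):
    """Query all configured targets in parallel and group the hits by title."""
    grouped = {}
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the order of the configured targets.
        for hits in pool.map(lambda name: search_target(name, query), TARGETS):
            for hit in hits:
                grouped.setdefault(hit["title"], []).append(hit["source"])
    return grouped

results = meta_search("geometric")
```

Here `results` maps each matching title to the list of targets that returned it, a very simple form of structuring a heterogeneous result set. Adding a new target only requires a new configuration entry.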

Combination of Search Engines
  • Math-Net: Harvest+DC
  • KOBV Search Engine
    • Shared Index
    • Distributed Search
  • EULER and Dublin Core
  • DigiBib NRW

As studied by J. Lügger in „Über Suchmaschinen, Verbünde und die Integration von Informationsangeboten“; ABI-Technik, June, 2000
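
The shared-index approach harvests metadata ahead of time into one index that is then queried locally, so no remote server is contacted at search time. The following is a minimal sketch with invented sample data. `build_shared_index` and `lookup` are hypothetical helpers, not part of Harvest, KOBV, or DigiBib.

```python
from collections import defaultdict

def build_shared_index(collections):
    """Harvest records from all collections into one inverted index."""
    index = defaultdict(set)
    for source, records in collections.items():
        for record_id, text in records.items():
            for word in text.lower().split():
                index[word].add((source, record_id))
    return index

def lookup(index, query):
    """Return the records that contain every query word (AND semantics)."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical harvested metadata from two sites.
collections = {
    "site-a": {"1": "integer programming survey"},
    "site-b": {"7": "integer lattices", "8": "programming languages"},
}

index = build_shared_index(collections)
hits = lookup(index, "integer programming")
```

In this example `hits` contains only record 1 of site-a. The trade-off is that a shared index answers quickly but is only as current as the last harvest, whereas a distributed search always sees live data at the cost of querying every target.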

A Potential Math Information Portal

Vertical Integration of Information Resources

[Diagram labels: DigiBib with Math-Net; Browser; properties: Open, Distributed, Efficient, Scalable, Stable]

  • Math-Net @ ZIB and @ Uni Köln
    • Sigma
    • NetLib Software
    • Persona Mathematica
  • EMS @ Zentralblatt für Mathematik
    • MATH, MATHDI
    • Jahrbuch für Mathematik
  • Universität Osnabrück
    • ELib
    • MPRESS
  • Special Interest Groups of DMV
    • OPT-NET, IM-Net, IuK, ...
  • Publishers and Software Houses
    • E-Journals, Software

[Diagram labels: HTTP; DS; WWW; HTTP; Z39.50 with MAB2; USMARC]

  • SUB Göttingen
    • OPAC SSG Mathematik
  • TIB Hannover
    • TIB CAT
  • CWI Amsterdam
    • OPAC Mathematics
  • Mathematische Fachbereiche & Institute
    • Specialized OPACs
  • Library Cooperatives
    • BVB, GBV, HBZ, KOBV, ...
  • Die Deutsche Bibliothek
    • Authority Data
  • Publishers and Math Societies
    • Math Journals and Documents

[Diagram labels: Z39.50 with UNIMARC; HTTP; SI; DS; Z39.50; Z39.50]

DigiBib with KOBV

DigiBib with WebPack

Contents
  • All Information is Part of the Web

Can we make this true?

  • The Visible Web and the Deep Web
  • There could be an Interconnected Network of Science
  • Integrating All Types of Resources
  • We should Organize the Cyber Space: Scientists should Organize the Scientific Cyberspace Cooperatively (Summary and Proposals)
Organizing the Cyberspace: Suggestions
  • Partners for the information portal?
  • Who should form the information portals?
  • Organizational framework?

Cooperative Digital Libraries

Main Issues: Sustainability and Finance

Partners of the Information Portal
  • Scientific Libraries, Scientific Archives
  • Scientific Departments, Research Institutes
  • Database / Content Providers
  • Document Delivery Services
  • Digitization Centers
  • Scientific Societies
  • Publishers
  • Software Houses
  • Data (Collecting) Centers
Suggestions for an Information Portal
  • Open Digital Archives of Specialized Collections
  • Scientific Suppliers Obtain Free Access
  • High Quality Information and Services
  • Robust/Commercial Software/Database
  • Distributed/Heterogeneous Architecture
  • Some Centralization is Necessary Too
  • Emphasis on Reliable/Long Term Availability
  • Activities in Long Term Archival
  • Supported by a Specialized Information Center/Library
  • Cooperation with Scientific Societies

Not-for-Profit and For-Profit do not exclude each other.

Suggestions for an Organizational Framework
  • University Level (local): Cooperation
    • University Library
    • University Computing Center
    • University Media Center
  • Scientific Level (topical/national): Editorial
    • Specialized Library / Information Center
    • Consulted by a Scientific Society
    • Topical Competence Center
  • National Level: Consultation
    • National Competence Center for New Technologies
    • Research and Development for Production
    • Standardization / Coordination Activities

A Topical Competence Center may be hosted @ Research Institute.

Key Problems
  • No progress without substantial investment
  • Long term sustainability
  • No progress without further research and development
  • Institutionalization (The IuK-Initiative can literally initiate, but can't run the show)

But the show must go on!

Contents
  • All Information is Part of the Web

Can we make this true?

  • The Visible Web and the Deep Web
  • There could be an Interconnected Network of Science
  • Integrating All Types of Resources
  • We should Organize the Cyber Space
  • To the Benefit of our Society
Who Will Benefit
  • Student: Access to Vast Amount of Materials
  • Employee: Further Training, Lifelong Learning
  • Teacher: Reuse of High Quality Materials
  • Author: Publishing Cheap, Fast, and Widely
  • Publisher: Open Sources Generate New Chances
  • Business: More Profit from Applying Science
  • Citizen: Contacting Research More Directly
  • Science: Communicating with the Public
  • Society: Free Flow of Information