
Martin Grötschel

On the Road to Scientific Information Portals: Cooperative Digital Libraries Remarks, Visions, Proposals. Martin Grötschel. IuK 2001, Universität Trier. Contents. Introduction All Information is Part of the Web Can we make this true? The Visible Web and the Deep Web


Presentation Transcript


  1. On the Road to Scientific Information Portals: Cooperative Digital Libraries. Remarks, Visions, Proposals. Martin Grötschel, IuK 2001, Universität Trier

  2. Contents Introduction • All Information is Part of the Web Can we make this true? • The Visible Web and the Deep Web • There could be an Interconnected Network of Science • Integrating All Types of Resources • We should Organize the Cyber Space • To the Benefit of our Society

  3. Contents Introduction • All Information is Part of the Web Can we make this true? • The Visible Web and the Deep Web • There could be an Interconnected Network of Science • Integrating All Types of Resources • We should Organize the Cyber Space • To the Benefit of our Society

  4. Personal Motivation • I have broad interests. • I (have to) search a lot. • I do find things I look for. • However, this process costs too much time and money. • The „scientific information system“ could be much better. • It seems that some scientists have to get involved. • The situation is similar with respect to communication.

  5. Acting Forces • Science drives Technology • Technology drives Change • Change induces Pressure Some Consequences: • Higher Speed and Efficiency • Lower Costs • Universal Connectivity • More and Global Competition What does this imply for Science?

  6. The World of Information • Tons of Printed Material • Zillions of Scientific Web Sites • of E-Journals, E-Prints • of Databases and CD-ROMs • of Multimedia Documents • of E-Mail • of Digital Photos and Videos • etc.

  7. The Players • The Author • The Publisher • The Librarian • The Software Developer • The Service Provider • The Scientific Information Center • The Scientific Society • etc. • The User

  8. Some Unsolved Issues • Accessibility • Searchability • Stability • Compatibility • Pricing • Heterogeneity • Diversity and Complexity of Structures • Quality • Authenticity • etc.

  9. Solution • Scientists have to get involved • Solution must be user driven • Cooperation of players • Consensus about structures Some Suggestions in this Talk

  10. Contents • All Information is Part of the Web Can we make this true?

  11. Current Mathematical Resources • Papers and Preprints • Journals and Books • Reviews and Abstracts • Software and Data Collections • Projects and Persons • Voice, Images, and Video Information • Links, Mail, and Virtual Libraries

  12. Math Papers and Preprints • Preprints of the Math-Net • MPRESS (including ArXiv math,...) • EULER • Digital Library @ ACM

  13. Math Journals and Books • SUB Göttingen („Sondersammelgebiet“) • TIB Hannover (Tech Information Library) • ELib @ Uni Osnabrück • EMIS • Springer LINK • DOCUMENTA MATHEMATICA • Lehmanns.de

  14. Math Reviews and Abstracts • MATH @ Zentralblatt • MathSci @ AMS • MATHDI @ FIZ-Karlsruhe • Jahrbuch der Mathematik

  15. Math Software and Data Collections • Netlib @ ANL • eLib @ ZIB • MuPad @ Uni Paderborn • Algebraic Groups • Cinderella • OpenMath

  16. Projects and Persons • Web Sites of Math Research Institutes • Web Sites of Math Departments • BerNAM • Directory of Mathematicians @ ACM • Comb. Membership List AMS, SIAM, MAA • PERSONA MATHEMATICA @ math-net.de • SIGMA @ math-net.de

  17. Voice, Images, and Video • Computer Museum • MSRI Video Server • Electronic Geometric Models Application Servers and Software • MATHEMATICA • Cinderella • Inverse Calculator

  18. Links, Mail, and Virtual Libraries • mathematik.de • Math-Net.de • Mathematical Archives • Opt-Net @ ZIB • MathML

  19. There are zillions of Math Resources in the Net.

  20. The Situation is Similar in all other Sciences • How do you know that all this material exists and where it is? • Old Approach: Link Lists = WWW Virtual Libraries • But much more has come up in recent years!

  21. Is Everything in the Web? • Printed Books • Printed Journals • CD-ROMs • Some Data Bases • Historic Archives • Catalog Cards • ... are not electronically available

  22. Is Everything from the Web in the Web?

  23. Contents • All Information is Part of the Web Can we make this true? • The Visible Web and the Deep Web

  24. The Invisible / Deep Web A fundamental Problem with Search Engines: A Vast Amount of Information is Invisible • Surface Web / Web Robots Start at some „Hubs“ • Interlinked Web Pages • Deep Web • Isolated Web Sites • There are huge Isolated Islands in the Web • Information within Databases, behind CGI Interfaces • Information without Links (e.g. within OPACs of Libraries) • Protected Material, Excluded Explicitly
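The distinction on slide 24 can be illustrated with a minimal sketch, assuming a toy link graph: a robot that starts at a hub and only follows links never reaches pages that nothing links to. The page names and the `LINKS` table below are hypothetical and only serve the illustration.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {
    "hub": ["dept-a", "dept-b"],
    "dept-a": ["preprint-1"],
    "dept-b": ["dept-a"],
    "preprint-1": [],
    "opac-record": [],               # catalog entry that no page links to
    "island-home": ["island-page"],  # isolated site, linked only internally
    "island-page": ["island-home"],
}


def crawl(seeds, links):
    """Return every page reachable from the seeds by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


visible = crawl(["hub"], LINKS)   # what a link-following robot can collect
invisible = set(LINKS) - visible  # what stays hidden from it
print(sorted(visible))
print(sorted(invisible))
```

In this toy graph the catalog record and the isolated island stay invisible, however long the robot runs, because reachability depends on links and not on whether the content is public.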

  25. A Web Search Engine Collecting Visible Information From „The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan-2000“

  26. A Direct Meta Search Engine Fishing for Invisible Information From „The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan. 2000“

  27. Characteristics of the Deep Web (in Comparison to the Visible Web) • Public information on the Deep Web is currently 400 to 500 times larger than the commonly defined World Wide Web • 7,500 terabytes of information (550 billion individual documents), compared to 19 terabytes (1 billion documents) From: The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan 2000
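The size comparison on slide 27 can be checked with a short calculation. This is only a sketch that uses the figures quoted on the slide; the variable names are made up for the illustration.

```python
# Figures as quoted on slide 27 (BrightPlanet, Jan 2000).
deep_terabytes = 7_500
surface_terabytes = 19
deep_documents = 550_000_000_000
surface_documents = 1_000_000_000

size_ratio = deep_terabytes / surface_terabytes      # ratio by data volume
document_ratio = deep_documents / surface_documents  # ratio by document count

print(f"by volume:    {size_ratio:.0f} times larger")
print(f"by documents: {document_ratio:.0f} times larger")
```

By volume the ratio comes out just under 400, and by document count it is 550, which is roughly the range the slide quotes.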

  28. Characteristics of the Deep Web (in Comparison to the Visible Web) • More than 100,000 Deep Web sites currently exist • 60 of the largest Deep Web sites collectively contain about 750 terabytes of information (... narrower, with deeper content) • More than half of the Deep Web content resides in topic-specific databases (BrightPlanet concentrates on about 20,000 sites) • A full 95% of the Deep Web is publicly accessible information, not subject to fees or subscriptions • The Deep Web is the fastest-growing category of new information on the Internet. But the Deep Web is widely unknown. From: The Deep Web: Surfacing Hidden Value; BrightPlanet.com, Jan 2000

  29. Making the Deep Web Visible Technology: • Meta Search Engines • Bibliographic Meta Search Engines • Virtual Catalogs and Link Lists Organisational Issues: • Building Networks of Digital Libraries • Forming Library and other Cooperatives • Working on Standards and Formats (Common, Open, Metadata,...)

  30. Categories of Information Systems • Web Sites – Collection, Query Interface • Publications – E-Journals, Preprints, ... • Regional/Nat. Collections – Harvesting Systems • Topical Databases – Subject Specific Aggregation • OPACs – Library Holdings • Journal Archives – Archive of Publishers • Software/Data Collection – Commercial / Public Archive • Compute Servers – Math. Calculations / Demos • Mailing Lists/Archive – Topical Communication Forum • Topical Portals – Wide Spectrum Information System

  31. Problems: Wide Variety of Servers Problems with Search Engines (Web Robots): • Impose High Load on Servers and Networks • Misuse of Metadata • Robots can't see behind CGI Interfaces • Access Rights, Range of Licenses Problems with Cascading Search Engines: • Diversity of data formats (MAB, MARC Formats, DC, ...) • Multitude of protocols (Z39.50, HTTP, proprietary) Specialized Repositories and Archives: • Scientific Journals provided by Commercial Publishers • Document Delivery Systems and Specialized Historic Archives • Maps, Music, Photos, Videos, Multimedia
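The "diversity of data formats" problem on slide 31 can be made concrete with a small sketch: a cascading search engine has to map records from each source into one common shape before results can be merged. The nested-dict representation, the sample records, and the function names below are assumptions for illustration only. The sketch relies on just two facts about the formats: MARC 21 carries the title in field 245 subfield a and the main personal author in field 100 subfield a, while Dublin Core uses the elements `title` and `creator`.

```python
def from_marc(record):
    """Map a simplified MARC-like record (tag -> subfields) to the common shape."""
    return {
        "title": record["245"]["a"],    # 245 $a: title
        "creator": record["100"]["a"],  # 100 $a: main entry, personal name
        "source_format": "MARC",
    }


def from_dc(record):
    """Map a simplified Dublin Core-like record to the common shape."""
    return {
        "title": record["title"],
        "creator": record["creator"],
        "source_format": "DC",
    }


# Hypothetical sample records, one per format.
marc_record = {"100": {"a": "Doe, Jane"}, "245": {"a": "Sample Report on Polytopes"}}
dc_record = {"title": "Sample Preprint on Graphs", "creator": "Roe, Richard"}

records = [from_marc(marc_record), from_dc(dc_record)]
for entry in records:
    print(entry["source_format"], "|", entry["creator"], "|", entry["title"])
```

Real records carry many more fields, repeated subfields, and differing character encodings, so an actual converter is considerably more involved than this mapping of two fields.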

  32. Contents • All Information is Part of the Web Can we make this true? • The Visible Web and the Deep Web • There could be an Interconnected Network of Science

  33. Virtual/Digital Library • Virtual: Search index, Links, Metadata, OPAC catalog entries • Digital: Structured digital contents, Full texts, Data bases

  34. Towards a Scientific Portal to Interconnect the Digital World [Diagram: Virtual Library and Digital Library combine into the Information Portal: Cooperative Virtual Digital Scientific Library] The Scientific Portal (Information Portal for the Sciences) is an Entry Point to all Types of Information Products from the Sciences. Behind the Scientific Portal is a Structured Network to be coordinated and organized by the Sciences in a cooperative way. A Task for the IuK Initiative?

  35. Lots of Examples already exist

  36. An Example in the Making Virtuelle Fachbibliothek Technik der TIB Hannover

  37. Example: The DOE Information Bridge • Started in 1997 with 60,000 searchable full text reports online @ DOE Office of Scientific and Technical Information (OSTI) • Direct Search based on the Distributed Explorer developed by a small Internet Company: Innovative Web Application Ltd. (IWA) • A public version in partnership with the Government Printing Office (GPO) of the USA • Many other Federal Deep Web collections added to the DOE Virtual Library • PubScience • PubMed • NTIS Electronic Catalog (450,000 Titles) • NASA Technical Report Server • Energy Portal Search • Digitization efforts for Gray Literature (@ OSTI)

  38. OSTI Virtual Library

  39. PubScience

  40. The GrayLit Information Network Graphic from „Searching The Deep Web; W.L. Warnick et al.“ D-Lib Magazine, Vol. 7, No. 1, January 2001; www.dlib.org

  41. Preprint Network

  42. DOE OSTI

  43. Energy Portal Search

  44. PubMed

  45. NASA Image Exchange

  46. Federal R & D Architecture Graphic from „Searching The Deep Web; W.L. Warnick et al.“ D-Lib Magazine, Vol. 7, No. 1, January 2001; www.dlib.org

  47. An Observation The Voluntary Work contributed so far was and will stay important. There will, however, be no satisfactory solution without substantial investment in personnel and funding. We need to become more professional (compare, e.g., Google versus Math-Net).

  48. Contents • All Information is Part of the Web Can we make this true? • The Visible Web and the Deep Web • There could be an Interconnected Network of Science • Integrating All Types of Resources

  49. Distributed Meta Search Engines Exist What they do: • Query Search Engines, OPACs, Databases • Perform Distributed Searches in Parallel • Cascade Search to reach Large/Vast Amounts of Targets • Deliver Links, Metadata, and/or Full Texts • Handle a Diversity of Data Structures • Use a Multitude of Internet/Web Protocols • Structure Heterogeneous/Large Result Sets They Rely on a Series of Small Configuration Files
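The behaviour listed on slide 49 can be sketched in a few lines, assuming that each target is wrapped in a small search function and that a dictionary stands in for the "small configuration files". The target names, the in-memory document lists, and `meta_search` are hypothetical; a real engine would talk to OPACs and databases over protocols such as Z39.50 or HTTP.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote collections.
OPAC_TITLES = ["Graph Theory", "Linear Programming"]
PREPRINT_TITLES = ["Cutting Planes for Graph Partitioning", "Facets of Polytopes"]


def search_opac(query):
    """Pretend OPAC lookup: returns metadata records only."""
    return [{"title": t} for t in OPAC_TITLES if query.lower() in t.lower()]


def search_preprints(query):
    """Pretend preprint server lookup: returns metadata plus a full-text flag."""
    return [{"title": t, "full_text": True}
            for t in PREPRINT_TITLES if query.lower() in t.lower()]


# Stand-in for the configuration files: one entry per search target.
TARGETS = {"opac": search_opac, "preprints": search_preprints}


def meta_search(query, targets):
    """Query all targets in parallel and label each hit with its source."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        hits = []
        for name, future in futures.items():  # keep the configured order
            for hit in future.result():
                hits.append({"target": name, **hit})
    return hits


hits = meta_search("graph", TARGETS)
for hit in hits:
    print(hit["target"], "|", hit["title"])
```

Adding a further target means adding one entry to `TARGETS`, which mirrors the point that such engines rely on configuration rather than on code changes per source. Cascading would correspond to a target whose search function is itself a meta search.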

  50. Combination of Search Engines • Math-Net: Harvest+DC • KOBV Search Engine • Shared Index • Distributed Search • Shared Index • EULER and Dublin Core • DigiBib NRW As studied by J. Lügger in „Über Suchmaschinen, Verbünde und die Integration von Informationsangeboten“; ABI-Technik, June, 2000
