
Federating Repositories of Scientific Literature




  1. Federating Repositories of Scientific Literature
     www.canis.uiuc.edu
     The Interspace Prototype (1997-2000)
     Digital Libraries Initiative (1994-1998)
     Worm Community System (1990-1993)
     Telesophy System (1984-1989)

  2. Federating Repositories of Scientific Literature
     The University of Illinois Digital Libraries Initiative (DLI)
     Project Status & Retrospective
     Bruce R. Schatz, dli@uiuc.edu, http://dli.grainger.uiuc.edu
     AAAS-98, Digital Libraries Session, Philadelphia, February 1998

  3. Evolution of Information Retrieval across the Net
     [Figure: timeline from 1960 to 2010 showing Text Search (Syntax), Document Search (Structure), and Concept Search (Semantics), with Grand Visions]
     from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net,” cover article in Science, vol. 275, Jan 17, 1997, special issue on Bioinformatics

  4. Illinois DLI Status
     • Production Testbed based in a Real Library
         • Document Search based on Structure
         • SGML Publisher Stream deployed at U of Illinois
     • Technology Research for Scalable Federation
         • Concept Search based on Semantics
         • Statistical Indexes across subjects and media

  5. Production Testbed Status
     • Based in major Engineering Library
     • Production Stream: articles in testbed before they reach the shelves
     • Full-text SGML: Federated Structure Search
         • 5 publishers, 55 journals, 40,000 articles
     • Web version campus rollout October 1997
         • integrated within library information services

  6. Production Testbed Evaluation
     • 700 users, steadily increasing to a maximum of 1500
     • used in intro Computer Science classes
     • developers and evaluators work closely
     • needs assessment and usability studies
     • careful multi-modal usage evaluation
     • session observations and transaction logs

  7. Primary Partners
     • Journal/magazine publishers:
         • American Institute of Physics (AIP)
         • American Physical Society (APS)
         • American Astronomical Society (AAS)
         • American Society of Civil Engineers (ASCE)
         • American Society of Mechanical Engineers (ASME)
         • American Society of Agricultural Engineers (ASAE)
         • American Institute of Aeronautics & Astronautics (AIAA)
         • Institute of Electrical and Electronics Engineers (IEEE)
         • Institution of Electrical Engineers (IEE)
         • IEEE Computer Society (IEEE-CS)
     • Testbed: SoftQuad, OpenText
     • Infrastructure: Hewlett-Packard, Microsoft

  8. DeLIver Search Interface

  9. DeLIver Search Results

  10. (Full Text Retrieval)

  11. Result of “Figure Caption Search”

  12. Dynamic Linking in Bibliography

  13. Testbed Difficulties
      • Original plan was to modify Mosaic for search
          • Web became commercial; we lost control of developers
      • Plan to use standard BRS as full-text backend
          • needed to use SGML-specific OpenText search engine
      • Good-quality SGML simply not available
          • we had to train every publisher; nothing was ready
      • SGML interactive display not journal quality
          • physics requires equations, which are hard to display well
      • Custom software hard to deploy widely
          • Web widespread but too low-end for professional search

  14. Testbed Successes
      • Willing to build custom encoding procedures
          • so we succeeded with SGML where Elsevier and OCLC failed
      • Canonical encoding for structure tags
          • so we can federate across publishers and journals
      • Willing to build custom software for search
          • so we can offer multiple views, not a single stream like the Web
      • Production repositories for real publishers
          • became R&D arm of major scientific publishers
      • Changing the nature of libraries with research
          • research prototype becomes standard service
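The "canonical encoding for structure tags" idea can be illustrated with a small Python sketch: each publisher's own structure tags are mapped onto one shared tag set, so a single structure query (for example, a figure-caption search) runs across every publisher. The tag names, article ids, and the `CANONICAL_TAGS` table are hypothetical; the slides do not give the DLI's actual SGML tag sets, and the real testbed ran its searches through OpenText rather than code like this.

```python
# Hypothetical per-publisher tag tables: publisher tag -> canonical tag.
# The actual DLI tag sets are not given in the slides.
CANONICAL_TAGS = {
    "aip": {"atl": "title", "abs": "abstract", "figcap": "figure-caption"},
    "ieee": {"arttitle": "title", "abstract": "abstract", "caption": "figure-caption"},
}


def canonicalize(publisher, elements):
    """Map (publisher_tag, text) pairs onto canonical tags; unmapped tags are dropped."""
    mapping = CANONICAL_TAGS[publisher]
    return [(mapping[tag], text) for tag, text in elements if tag in mapping]


def structure_search(collection, field, word):
    """Return ids of articles whose canonical `field` contains `word` as a whole word."""
    word = word.lower()
    hits = []
    for article_id, publisher, elements in collection:
        for tag, text in canonicalize(publisher, elements):
            if tag == field and word in text.lower().split():
                hits.append(article_id)
                break  # one match per article is enough
    return hits


# Toy collection: (article id, publisher, [(publisher tag, text), ...]).
collection = [
    ("aip-1", "aip", [("atl", "Plasma waves"), ("figcap", "Dispersion of plasma waves")]),
    ("ieee-7", "ieee", [("arttitle", "Antenna arrays"), ("caption", "Array gain versus frequency")]),
    ("ieee-9", "ieee", [("arttitle", "Plasma etching"), ("abstract", "We study plasma etching")]),
]

print(structure_search(collection, "figure-caption", "plasma"))  # ['aip-1']
print(structure_search(collection, "title", "plasma"))           # ['aip-1', 'ieee-9']
```

Because every publisher's tags collapse onto one vocabulary, the same query runs unchanged over AIP and IEEE articles, and bringing in a new publisher means adding one table entry rather than new search code.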

  15. Technology Transfer
      • Illinois DLI considered R&D arm of publishers
          • broad spectrum of major publishers in scientific literature
          • successful annual partners' workshop plus high-level visits
      • Technology transferred to publisher partners
          • contract with AIP to clone testbed software & processing
          • arrangements with ASCE for a second cloning
      • Testbed continuance by University Library
          • industrial partners program between Library & Publishers
          • company formed to provide software and service

  16. Technology Research
      • Scalable Semantics becoming feasible
          • statistical clustering proves useful interactively
          • concept spaces and category maps
      • Semantic indexes for large collections
          • 400K Inspec (1995)
          • 4M Compendex (1996)
      • Simulation of Community Repositories
          • 1000 collections across all of engineering
          • testbed for vocabulary switching (federation)

  17. Vocabulary Switching
      • Grand Challenge of Digital Libraries
          • semantic interoperability across subject domains
          • vocabulary switching to suggest across domains
      • Generating 1000 community repositories
          • 600 categories across engineering (38 top-level)
          • 150 categories across EE, CS, physics
          • 3M raw abstracts, about 10M in community spaces
          • large-scale supercomputer simulation
          • 7 days of dedicated computation (10 days overall)
          • have space navigation; need space intersection
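One way to read "vocabulary switching" is as a walk from a term in one community's concept space to related terms in another community's space, using terms the two spaces share as bridges. The Python sketch below assumes that reading. The scoring rule (product of weights along the path, summed over bridges), the `switch_vocabulary` function, and the toy EE and physics spaces are all hypothetical and are not the Illinois algorithm, which the slides do not describe.

```python
from collections import defaultdict

# Hypothetical concept spaces: term -> {associated term: weight}.
EE_SPACE = {
    "signal processing": {"filter design": 0.75, "fourier transform": 0.5},
    "fourier transform": {"spectral analysis": 0.75, "signal processing": 0.5},
}
PHYSICS_SPACE = {
    "fourier transform": {"diffraction": 0.75, "wave optics": 0.5},
    "spectral analysis": {"spectroscopy": 1.0},
}


def switch_vocabulary(term, source, target, top_k=3):
    """Suggest target-domain terms for a term from the source domain.

    The query term and its source-space neighbors are candidate bridges; any
    bridge that is also a term in the target space contributes its target
    neighbors, scored by (source weight to bridge) * (bridge weight to suggestion).
    """
    bridges = {**source.get(term, {}), term: 1.0}
    scores = defaultdict(float)
    for bridge, w_source in bridges.items():
        for suggestion, w_target in target.get(bridge, {}).items():
            if suggestion != term:
                scores[suggestion] += w_source * w_target
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


print(switch_vocabulary("fourier transform", EE_SPACE, PHYSICS_SPACE))
# [('diffraction', 0.75), ('spectroscopy', 0.75), ('wave optics', 0.5)]
```

Here "spectroscopy" is reached only through the shared term "spectral analysis", which is the kind of cross-domain suggestion the slide asks for. A term with no shared bridges, such as one that exists only in the EE space, simply yields no suggestions.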

  18. Multimedia Federation
      • Semantic Indexing within Media
          • Text, Image, Number
      • Semantic Interoperability across Media
          • Spatial Data (GIS) dataset intersection
      • Multi-site DLI Collaboration
          • U Illinois: systems and supercomputers
          • U Arizona: algorithms and experiments
          • UC Santa Barbara: collections and metadata

  19. Semantic Analysis of Multimedia
      • Collections of Objects containing Units
          • Text: community repository (topic proximity); document abstracts containing noun phrases
          • Image: aerial photograph (spatial proximity); feature regions containing texture tiles
      • Units are media-dependent (statistical parsers)
          • Text: phrase segmentation (nouns on word parts of speech)
          • Image: texture segmentation (orientation on pixel densities)
      • Indexes are media-independent (statistical clusters)
          • Concept: co-occurrence similarity of units within objects
          • Category: self-organizing maps of objects within collections
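The "Concept" index (co-occurrence similarity of units within objects) can be sketched for text, where units are noun phrases and objects are abstracts. This Python sketch assumes phrase extraction has already happened and uses a plain conditional co-occurrence ratio, chosen for brevity. The slides do not give the weighting the Illinois system actually used, and the example abstracts are made up.

```python
from collections import Counter
from itertools import permutations


def concept_space(documents):
    """Build an asymmetric co-occurrence concept space.

    documents: one set of noun phrases per abstract (phrase extraction is
    assumed to have happened already). weight(a -> b) is the fraction of
    documents containing `a` that also contain `b`.
    """
    doc_freq = Counter()
    co_occur = Counter()
    for phrases in documents:
        doc_freq.update(phrases)
        co_occur.update(permutations(phrases, 2))  # every ordered pair of distinct phrases
    space = {}
    for (a, b), count in co_occur.items():
        space.setdefault(a, {})[b] = count / doc_freq[a]
    return space


def related(space, phrase, k=3):
    """Top-k associated phrases, strongest first (ties broken alphabetically)."""
    return sorted(space.get(phrase, {}).items(), key=lambda kv: (-kv[1], kv[0]))[:k]


abstracts = [
    {"neural network", "backpropagation"},
    {"neural network", "backpropagation", "gradient descent"},
    {"neural network", "image segmentation"},
    {"gradient descent", "convex optimization"},
]
space = concept_space(abstracts)
print(related(space, "backpropagation"))
# [('neural network', 1.0), ('gradient descent', 0.5)]
```

The weights are deliberately asymmetric: every abstract that mentions "backpropagation" also mentions "neural network" (weight 1.0), but only two of the three "neural network" abstracts mention "backpropagation" (weight 2/3). Asymmetry of this kind lets a specific term point strongly to a broader one without the reverse holding.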

  20. Media Interoperability Experiment
      • Feature regions containing texture tiles in aerial photos
          • 1M regions in 5K photos around southern California (GIS)
      • Text concept space and category map in geoscience
          • 10M phrases in 500K abstracts from GeoRef and Petroleum Abstracts
      • Image concept space and category map in aerial photos
          • tile similarity space and visual thesaurus maps (10M tiles)
      • Numeric satellite sensor data
          • 1M NASA AVHRR temperature records, 2M GNIS feature names
      • Spatial gazetteer as bridge image<=>text<=>number
          • images are labeled by GNIS gazetteer (feature names for text search)
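The gazetteer bridge can be sketched in Python: a gazetteer maps feature names to coordinates, so photos can be labeled with the names that fall inside their footprints, and a text query on a feature name can retrieve both photos and nearby sensor records. Every name, footprint, and reading below is made up; the `Photo` and `SensorRecord` types and the square search window are hypothetical stand-ins, since the slides do not show the GNIS, aerial-photo, or AVHRR schemas.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    photo_id: str
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        """True if the point lies inside this photo's bounding box (degrees)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class SensorRecord:
    lat: float
    lon: float
    temperature: float  # units as supplied by the (hypothetical) sensor product


def label_photos(photos, gazetteer):
    """Label each photo with the gazetteer feature names that fall inside it."""
    return {
        p.photo_id: sorted(name for name, (lat, lon) in gazetteer.items() if p.contains(lat, lon))
        for p in photos
    }


def search_by_feature(name, photos, records, gazetteer, window_deg=0.1):
    """Text query on a feature name -> photos covering it and records in a square window."""
    lat, lon = gazetteer[name]
    photo_hits = [p.photo_id for p in photos if p.contains(lat, lon)]
    nearby = [r for r in records
              if abs(r.lat - lat) <= window_deg and abs(r.lon - lon) <= window_deg]
    return photo_hits, nearby


# Made-up feature names, footprints, and readings for illustration only.
gazetteer = {"Example Peak": (34.20, -118.10), "Example Creek": (34.05, -118.30)}
photos = [Photo("photo-1", 34.15, -118.15, 34.25, -118.05),
          Photo("photo-2", 34.00, -118.35, 34.10, -118.25)]
records = [SensorRecord(34.21, -118.12, 290.5), SensorRecord(34.06, -118.31, 295.0)]

print(label_photos(photos, gazetteer))
# {'photo-1': ['Example Peak'], 'photo-2': ['Example Creek']}
print(search_by_feature("Example Peak", photos, records, gazetteer))
# (['photo-1'], [SensorRecord(lat=34.21, lon=-118.12, temperature=290.5)])
```

The gazetteer is the only component that knows about all three media: images and numbers are never compared directly, only through the shared feature name and its coordinates.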

  21. Federated Search
      • Multiple Indexes in Distributed Repositories
          • text search: SGML for full-text articles (Testbed); bibliographic abstracts for full coverage (INSPEC)
          • term suggestion: thesaurus for taxonomy (INSPEC); concept spaces for term coverage (SGML)
      • Multiple View User Interface Client
          • uniform displays for multiple indexes
          • drag-and-drop between display views to mix and match
          • uniform search across multiple repositories
      • Multiple Protocol Stateful Gateway
          • single query stream, analogous to the single user interface
          • will handle distributed repositories for federation, e.g. AAS
          • OpenText (socket), term-suggest (SQL), Ovid/DRA (Z39.50)
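The "single query stream" gateway can be sketched in Python: one query fans out to several repository backends, and the results are merged into one list while the gateway keeps per-session state. The `Backend` interface, `InMemoryBackend`, and the DOI-style shared document ids are hypothetical. Real adapters for OpenText sockets, SQL, or Z39.50 are not shown, and the slides do not state the merge rule, so first-backend-wins de-duplication is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str   # assumed to be shared across repositories (e.g. a DOI)
    title: str
    source: str


class Backend(ABC):
    """One repository behind the gateway; a real adapter would speak its protocol."""

    name: str

    @abstractmethod
    def search(self, query: str) -> list[Hit]:
        ...


class InMemoryBackend(Backend):
    """Stand-in for a socket, SQL, or Z39.50 adapter (hypothetical)."""

    def __init__(self, name: str, records: dict[str, str]):
        self.name = name
        self.records = records  # doc_id -> title

    def search(self, query: str) -> list[Hit]:
        q = query.lower()
        return [Hit(d, t, self.name) for d, t in self.records.items() if q in t.lower()]


class Gateway:
    """Takes a single query stream, fans it out, and merges the results."""

    def __init__(self, backends: list[Backend]):
        self.backends = backends
        self.history: list[str] = []  # per-session state: queries issued so far

    def search(self, query: str) -> list[Hit]:
        self.history.append(query)
        merged, seen = [], set()
        for backend in self.backends:  # earlier backends take priority on duplicates
            for hit in backend.search(query):
                if hit.doc_id not in seen:
                    seen.add(hit.doc_id)
                    merged.append(hit)
        return merged


fulltext = InMemoryBackend("sgml-fulltext", {"doi:1": "Superconducting thin films",
                                             "doi:2": "Thin film deposition"})
abstracts = InMemoryBackend("inspec-abstracts", {"doi:2": "Thin film deposition",
                                                 "doi:3": "Optical properties of thin films"})
gateway = Gateway([fulltext, abstracts])
for hit in gateway.search("thin film"):
    print(hit.doc_id, hit.source)
# doi:1 sgml-fulltext
# doi:2 sgml-fulltext
# doi:3 inspec-abstracts
```

Listing the full-text backend first means an article available in full text is shown from there, while the abstracts backend fills in coverage for articles the testbed does not hold, mirroring the "full-text articles" plus "abstracts for full coverage" split on the slide.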

  22. IODyne Engineering Search Example

  23. Building a New Community: Starting the Field of Digital Libraries
      • IEEE Computer DLI special issue, May 1996
      • Computer DLI retrospective planned for 1999
      • Allerton workshops on DL Sociology
      • edited book planned on DL Evaluation
      • DLI National Coordination effort
      • Illinois DLI retrospective conference (Mar 98)

  24. The 21st Century: Analysis
      • Beyond Search to Analysis
      • Cross-Correlating Information from many sources across the Net
      • The Net solves problems
      • Every community has its own special library
      • Every community and every person does indexing!!
      • The Internet evolves into the Interspace
