
Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search

Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search. William H. Mischo w-mischo@uiuc.edu Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign 2002 International Conference on Digital Archive Technologies (ICDAT2002)



Presentation Transcript


  1. Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search William H. Mischo w-mischo@uiuc.edu Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign 2002 International Conference on Digital Archive Technologies (ICDAT2002) December 19, 2002

  2. Outline • Digital Libraries and the Distributed Information Environment. • Document Representation and Full-Text • Digital Library Tools • Illinois Projects. • XML Technologies. • Metadata Technologies. • DOIs, Linking, Local Resolver • Portals, Simultaneous Search, Linking • Grainger Search Aid • Issues & Trends.

  3. The Digital Library • ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time. • Tendency to apply term to collections and resources. • Digital Collections vs. Digital Library. • Emphasis on the integration of collections and services (e.g. NSDL grant). • Application of standards and protocols is important.

  4. Scholarly Communication Overview • E-Resources are Web-based and publisher-centric. • Growth of Heterogeneous Distributed Repositories. • Value-added services and ‘branding’ of journals. • Prestige of Journals and Publishers • Reciprocal linking relationships between publishers. • Cooperation on linking standards (DOI, CrossRef). • Alternative publishing models - Academia, Preprint Servers, disintermediation.

  5. Distributed Information Environment • We live in a world of multiple, heterogeneous information repositories, resources, portals, and IR systems. • OPACs – local, regional, national shared bibliographic databases. • Local and remote A & I Services. • Discrete publisher and vendor repositories (full-text). • Web search engines, vertical portals, custom portals (NSDL, ARL Portal). • Local metadata, digital objects, GIS, finding aids. • Preprint servers and institutional repositories (D-Space). • Instructional (course) management systems (WebCT, Blackboard). • Harvestable (OAI) sites and services.

  6. Distributed Repository - Issues • Integration of discrete, heterogeneous information resources. • Role of federated and broadcast searching of distributed resources. • Integration of collections with reference, instructional, and navigation services - TOC, remote reference assistance. • Integration of Library, institutional, vendor, publisher, and government portals and information services. • Linking technologies. • Metadata harvesting, archiving.

  7. Distributed Environment Action Plan • Pressing need for document representation, retrieval, transmission, and linking middleware tools and standards. • Metadata standards, DOIs, OpenURL. • Factor: changing landscape of Scholarly Communication and disintermediation of publishers and libraries. • Federated search and simultaneous search with reference linking as mechanism to integrate DL landscape.

  8. Web Client Portal Functions: --Authorization --Linking mechanisms between resources and among resources. --Simultaneous search. --Navigation Linking: --Between full-text using DOI, CrossRef, Appropriate Copy. --Between A&I and full-text. --Between OPAC and full-text. [Diagram labels: Portal Presentation Level; Local Link Server, Local Value-Added; CrossRef Metadata; DOI Server; A&I Services (Local and Remote); Full-Text Resources; Local Databases and OAI Resources via DBMS; Web Resources & Knowledge Environments; OPAC; E-Resource Registry; Aggregator (Ebsco, OCLC); Publisher Portal (Elsevier)]

  9. Document Representation • Continuum of Web-Enabled technologies -- all presently being utilized. • Evolving technologies and standards. • Role and history of markup. • XML: its role and importance. • The Smart Document.

  10. Digital Library Tools • We have at our disposal the tools to create integrated digital libraries from the distributed digital resources environment in which we operate: • Standard retrieval environment (Web) and interface/client (Web Browser); • Standard transport mechanisms to connect heterogeneous content (HTTP, OAI, SOAP); • Standard metalanguages and tools for describing and transforming content and metadata (XML, DTDs & Schemas, XSLT, DC/DCQ, RDF, METS); • Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50, Object Oriented Databases); • Standard linking tools and infrastructure (DOI, OpenURL, CrossRef). • Candidate set of ‘best practices’ for IR.

  11. Work by Illinois DLI Group • We are attempting to address many of these issues within the Digital Library Initiatives group. • Headquartered at Grainger Engineering Library Information Center at UIUC. • Grant Work: • Digital Library Initiative I (NSF, others), 1994-1998. • Corporation for National Research Initiatives (CNRI) D-Lib Test Suite, 1998-2001. • Collaborating Partners Program, 1998--. • Andrew W. Mellon Foundation OAI Harvesting grant, 2001-2002. • NSF NSDL (National Science, Engineering, Technology, and Mathematics Digital Library) Program, 2002-2004. • Institute of Museum and Library Services (IMLS) Registry and Integration grant, 2002-2005.

  12. Illinois Testbed Project • Funded under DLI-I by NSF, DARPA, and NASA, 1994--1998. Awards made to 6 universities. • Large-scale Testbed, Distributed Repository models, evaluation, Web software. • Funded under CNRI D-Lib Test Suite Program, 1998--2001. • Collaborating Partners Program: AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier. • All XML Journal -- AIP, APS, ACM.

  13. Illinois Full-Text Testbed • American Institute of Physics--APL, JAP, RSI • 19,000+ articles, 1995--. • American Physical Society--PRL • 15,000+ articles, 1995--, weekly updates. • ASCE Journals (25 titles) • 11,000+ articles, 1995--. • IEE Proceedings and Electronics Letters • 9,500+ articles, 1993--. • IEEE Computer Society. • ASM International (formerly American Society for Metals) Handbook. • ACM (Association for Computing Machinery) Transactions. • Elsevier Science.

  14. Accomplishments • Process & retrieve from multiple publishers & heterogeneous DTDs. • SGML to XML Conversion. • Development of a metadata specification that uses RDF, Dublin Core (DCQ and XML), XML Schemas, and a local Namespace. • Cross-repository searching (Testbed & D-Lib Test Suite) of Full-Text and Metadata. • XSLT and CSS for transformation & rendering, including Mathematics.

  15. Accomplishments (2) • Introduction of numerous technologies now deployed within publisher repositories: • Forward and Backward links in bibliographies -- within Testbed/Repository, from/to A & I Services. • Use of XSLT for transforming XML to HTML. • Rich extended abstracts. • Conversion of ISO 12083 math markup to MathML. CSS/DHTML mathematics rendering. Use of plug-ins. • Enhanced Web retrieval mechanisms: Author Word Wheels, Co-Occurrence Matrices. • Local Link Server for DOIs, Context-Sensitive linking.

  16. XML (eXtensible Markup Language) • Like SGML, a Data Description Metalanguage. • XML is a subset/version of SGML. • Document representation and interchange Standard. • Allows fine-granularity markup of content and structure. Authors can create their own elements (extensible). • Tags define the structure of the document, not the presentation format. • Validated vs. “well-formed” - separation of authoring process from representation & presentation. • A document is either validated against a DTD/Schema or merely well-formed. • Integrated with relational DBs.
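
The distinction between well-formed and validated XML can be illustrated with a minimal sketch. It uses Python's standard-library parser, which checks only well-formedness and does not validate against a DTD or Schema; the helper name `is_well_formed` and the sample strings are assumptions made for illustration, not part of the presentation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This checks syntax only (matching tags, a single root element).
    It does NOT validate the document against a DTD or XML Schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents: the second has a missing </title> tag.
good = "<article><title>Quantum-well lasers</title></article>"
bad = "<article><title>Quantum-well lasers</article>"

if __name__ == "__main__":
    print(is_well_formed(good))
    print(is_well_formed(bad))
```

Validation against a DTD or Schema would need a validating parser, which is outside what this sketch shows.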

  17. XML Features • The milestones in document description and transmission: ASCII, TCP/IP, HTTP and HTML, XML. Web Programmability. • A DTD is not required with XML; it is needed if the document declares its own entities. • Use of Document Object Model (DOM). • Technology approach from Web developer’s standpoint: XML data, CSS presentation layer, XSLT to transform the structure (‘view’) of the data/document.

  18. XML in Information Technologies • Used in Open Archives Initiative (OAI), NSDL. • Compatible with MS SQL Server, Tamino (Software AG), Oracle, DLXS/XPAT (University of Michigan/OpenText), others. • Integral to Web Services (WSDL) and SOAP – Google Web Service. • Used in Library of Congress MODS and METS metadata technologies. • Baked into XyVision and publishing packages.

  19. XML, XSLT, and CSS • Use XML full-text articles as ordered hierarchy of content objects. • Generate item-level metadata in XML, using RDF and Dublin Core syntax and semantics. • XSLT and CSS used to present metadata and articles in either XML or HTML format depending on Browser. • Mathematics rendering using MathML tools (conversion from ISO 12083 to MathML). • Real-time transformation between XML and HTML using XSLT.

  20. Schemas vs. DTDs • Both are systems for representing a data model that defines the data’s elements and attributes, and the relationships among elements. • Schemas address the limitations of DTDs and the increasingly data-oriented role of XML. • W3C XML Schema Working Group produced two documents: Structures and Datatypes.

  21. Schema Justification • Description of a document type’s structure should be in an XML document instead of written in a special syntax (DTD). • Schemas are in XML: easier to edit and process using standard XML DOM manipulation tools. • DTD notation doesn’t give schema designers the power to impose strong data typing -- for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices.

  22. Metadata and Linking Standards • Digital Object Identifier (DOI) and Persistent Object Identifiers. • OpenURL and Value-Added Service Components (SFX). • Open Archives Initiative (OAI), Dublin Core and Qualifiers, RDF. • Local Resolver Servers.

  23. Open Archives Initiative (OAI) • Released version 1.0 of metadata harvesting protocols. Frozen through second quarter 2001. • Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories. • Roots in e-print archives. • Lightweight, low-barrier. Easy to implement Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.
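
Because OAI harvesting runs over plain HTTP, a harvest request is just a URL with a few query arguments. The sketch below builds one. The `verb` and `metadataPrefix` argument names and the `ListRecords` and `oai_dc` values follow the OAI protocol as commonly documented, and should be checked against the protocol version in use. The base URL and the function name `build_oai_request` are hypothetical.

```python
from urllib.parse import urlencode


def build_oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI harvesting request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint; a real data provider publishes its own.
EXAMPLE_BASE_URL = "http://repository.example.org/oai"

if __name__ == "__main__":
    # Ask the repository for all records in unqualified Dublin Core.
    print(build_oai_request(EXAMPLE_BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

The response to such a request is an XML document, which is why the data provider's main job is extracting its metadata and serializing it.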

  24. Ongoing Investigations • Relationship between interoperability models for search and discovery: federated searching (OAI harvested) and broadcast, simultaneous searching of distributed repositories. Not mutually exclusive. • OAI Provider and Harvesting software. Encoded Archival Description (EAD). OAI Engineering/CS/Physics site. • Role of HTTP harvesting, Spider technology. • Reference Linking integration built on OpenURL and DOI. • Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.

  25. Portals and Gateways • Role is to bring together and integrate disparate e-resources. • Provide a systematic ‘view’ of the information landscape, particularly full-text. • Two primary foci: robust search/navigation and the ability to link everywhere from anywhere in the environment of OPACs, A & I Services, full-text. • Central to this implementation is federated and simultaneous search and reference linking technologies.

  26. Digital Object Identifier (DOI) • DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier. • ‘The ISBN for the 21st Century’ -- Norman Paskin. • DOI system has two main parts (the identifier and a directory system) and a third logical component, a database. • Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.

  27. DOI Construction • First real open standard for content identification. • DOI is a number that identifies a digital object: • 10.1063/S000369519903216 • 10 Registration Agency Prefix • 1063 Publisher Prefix • S000369519903216 Suffix (Publisher-assigned ID) • Suffix can be SICI or PII. • The DOI, together with the URL pointing to the digital object, is registered with the International DOI Foundation, e.g.: • 10.1063/333 | http://www.pubsite.org/apr99/artl1.pdf
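
The structure described on this slide can be sketched as a small parser. The field names mirror the slide's labels; the type `DoiParts` and the function `split_doi` are hypothetical names introduced for illustration. The split is made at the first slash so that a suffix containing further slashes stays intact.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    directory: str   # "10" in the slide's example
    publisher: str   # "1063" in the slide's example
    suffix: str      # publisher-assigned ID


def split_doi(doi: str) -> DoiParts:
    """Split a DOI string into the three parts named on the slide."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError(f"DOI has no suffix: {doi!r}")
    directory, dot, publisher = prefix.partition(".")
    if not dot or not publisher:
        raise ValueError(f"DOI prefix is not of the form NN.NNNN: {doi!r}")
    return DoiParts(directory, publisher, suffix)


if __name__ == "__main__":
    print(split_doi("10.1063/S000369519903216"))
```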

  28. Using a DOI • DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives). • Retrieval of the object is a two-step process: the link is sent to a central directory where the current Web address is stored, and the location is sent back to the browser with a special message to redirect to that address, e.g.: • dx.doi.org/10.1063/333 redirects to www.pubsite.org/apr99/artl1.pdf
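
The two-step retrieval can be simulated without any network access. In this sketch a dictionary stands in for the central directory, holding only the slide's example entry. Modelling the 'special message' as an HTTP 302 response with a `Location` header is an assumption, as are the names `DIRECTORY`, `resolve`, and `follow`.

```python
from typing import Dict, Optional, Tuple

# Toy stand-in for the central directory; the single entry is the slide's example.
DIRECTORY: Dict[str, str] = {
    "10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf",
}


def resolve(doi: str, directory: Dict[str, str] = DIRECTORY) -> Tuple[int, Dict[str, str]]:
    """Step 1: look the DOI up and answer with a redirect (assumed to be a 302)."""
    url = directory.get(doi)
    if url is None:
        return 404, {}
    return 302, {"Location": url}


def follow(doi: str) -> Optional[str]:
    """Step 2: the client reads the redirect and goes to the current address."""
    status, headers = resolve(doi)
    return headers["Location"] if status == 302 else None


if __name__ == "__main__":
    print(follow("10.1063/333"))
```

Because only the directory entry changes when a publisher moves a file, links that cite the DOI keep working.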

  29. Reference Linking • CrossRef Publisher system: major Sci-Tech professional societies and commercial publishers. • System design calls for one URL for each DOI; however, the underlying technology can handle multiple URLs. • Issue: Directing users to a locally held or licensed version of the Digital Object (locally loaded or from an Aggregator). This is the Appropriate Copy problem.
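
One way to picture the Appropriate Copy problem is a local resolver that prefers a locally held copy and otherwise falls back to the global DOI proxy. This is a sketch under stated assumptions, not a description of the Illinois link server. Keying holdings by DOI prefix, the local URL layout, and the names `appropriate_copy` and `EXAMPLE_HOLDINGS` are all hypothetical.

```python
from typing import Dict

DOI_PROXY = "http://dx.doi.org/"

# Hypothetical table: DOI prefix -> base URL of a locally loaded or licensed copy.
EXAMPLE_HOLDINGS: Dict[str, str] = {
    "10.1063": "http://local.example.edu/fulltext/",
}


def appropriate_copy(doi: str, local_holdings: Dict[str, str]) -> str:
    """Return a local link when the publisher's content is held locally,
    otherwise the link through the global DOI proxy."""
    publisher_prefix = doi.split("/", 1)[0]
    local_base = local_holdings.get(publisher_prefix)
    if local_base is not None:
        return local_base + doi
    return DOI_PROXY + doi


if __name__ == "__main__":
    print(appropriate_copy("10.1063/1234", EXAMPLE_HOLDINGS))
    print(appropriate_copy("10.1016/abcd", EXAMPLE_HOLDINGS))
```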

  30. DOI Proxy OpenURL [Diagram labels: Client (Web Browser); AIP; Handle Server; dx.doi.org/10.1063/1234; IEE; Nosfx=y; Cookie on client; Aware; Elsevier; Local AIP, IEE; OpenURL; Local Value Added; Illinois Local Link Server; DOI; CrossRef Metadata Database; Metadata; UIUC Metadata Registry]

  31. Simultaneous Search Implementations • DialIndex from Dialog. • Ex Libris MetaLib service. • Endeavor EnCompass. • Innovative Interfaces MetaFind. • Ovid Multiple Search and reference De-Duping. • ISI Web of Knowledge. • Gale Corporation InfoTrac Total Access. • WebFeat. • California Digital Library SearchLight system. • Los Alamos FlashPoint system. • Fretwell-Downing partnering with ARL Portal and Monash University.

  32. Grainger Search Aid • Assist users in the selection of appropriate databases. • Normalize user search arguments and display search results from candidate databases. • Cross-database asynchronous concurrent searching. • Article-level and e-journal Web site access to publisher full-text repositories. • Utilize OpenURL, CrossRef metadata database, and DOI for reference linking at the article level. • Proxying of vendor systems and capability of ‘taking over’ the search in vendor native mode.

  33. Grainger Search Aid

  34. Reference Assistant Project • Utilize Search Aid simultaneous search and link capabilities. • Opportunity to explore interface and navigation issues. • Mimics the behavior of reference librarian. • Allows the application of ‘best match’ and ‘quorum searching’ algorithms.

  35. Reference Assistant Top Menu

  36. Simultaneous Search Implementations • Shared Blackboard approach employing Independent Searchbots dedicated to searching information resources and passing results to Web clients. • Event-Driven, Asynchronous HTTP Queries from within a Single Script returning results to Web browser.

  37. Event-Driven, Asynchronous Queries • Single, event-driven web server process, asynchronously querying multiple resources. • Uses WinHTTP from ASP and VBScript. • Simpler, but not as flexible. Search algorithms and processing are coded in scripts. • This is the approach we currently use for our service. • Implementation of multi-step login and session variable passthru is being investigated.
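
The single-process, asynchronous pattern can be sketched as follows. The service described on the slide used WinHTTP from ASP and VBScript; this sketch swaps in Python's `asyncio` purely to show the idea, and replaces real HTTP queries with timed delays. The source names, delays, and function names are hypothetical.

```python
import asyncio
from typing import Dict, List


async def search_one(source: str, query: str, delay: float) -> Dict[str, object]:
    """Pretend to query one resource; the sleep stands in for an HTTP round trip."""
    await asyncio.sleep(delay)
    return {"source": source, "query": query, "hits": [f"{source} result for {query!r}"]}


async def search_all(query: str, sources: Dict[str, float]) -> List[Dict[str, object]]:
    """Start every query at once and collect results as each one finishes."""
    tasks = [asyncio.create_task(search_one(name, query, delay))
             for name, delay in sources.items()]
    results = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)
    return results


def run_search(query: str, sources: Dict[str, float]) -> List[Dict[str, object]]:
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(search_all(query, sources))


if __name__ == "__main__":
    # Hypothetical sources with simulated response times in seconds.
    for result in run_search("quantum well", {"SourceA": 0.05, "SourceB": 0.01}):
        print(result["source"], result["hits"])
```

Total wait time is governed by the slowest resource instead of the sum of all of them, which is the main appeal of querying concurrently from one process.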

  38. OpenURL-Based Services • Standard for expressing and transmitting metadata. • Promise of standardized, normalized search results. • Provides value-added links to the Ovid search results. • Using CrossRef metadata database to look up DOIs.

  39. CiteParse.dll • An ActiveX DLL which can parse various Ovid citations and turn them into OpenURLs: • Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. Temperature analysis … quantum-well lasers. [Article] IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun. • http://…/resolver.asp?genre=article&aulast=Tansu&auinit1=N&atitle=Temperature+analysis+…+quantum-well+lasers&title=IEEE+Journal+of+Quantum+Electronics&volume=38&issue=6&spage=640&epage=651&pages=640-651&date=2002-06
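
The kind of parsing CiteParse.dll performs can be approximated with a regular expression. This is an illustrative sketch, not the DLL's actual logic. The pattern is tuned to the single citation shown on the slide and would need extending for other Ovid formats. The resolver base URL is a labelled placeholder, since the slide elides the real host. The query keys are the ones that appear in the slide's OpenURL.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlencode

MONTHS = {name: f"{number:02d}" for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed layout: "Surname INITIALS." authors, article title, "[Article]",
# journal title, then "volume(issue):spage-epage, year Mon."
CITATION_RE = re.compile(
    r"^(?P<aulast>[^\s.]+) (?P<auinit>[A-Z]+)\. "   # first author
    r"(?:[^\s.]+ [A-Z]+\. )*"                        # remaining authors
    r"(?P<atitle>.+?)\. \[Article\] "
    r"(?P<title>.+?)\. "
    r"(?P<volume>\d+)\((?P<issue>\d+)\):(?P<spage>\d+)-(?P<epage>\d+), "
    r"(?P<year>\d{4}) (?P<month>[A-Z][a-z]{2})\.$"
)

# The slide's example citation, with its elided title kept as-is.
EXAMPLE_CITATION = (
    "Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. "
    "Temperature analysis … quantum-well lasers. [Article] "
    "IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun."
)

# Placeholder only: the slide does not give the real resolver host.
HYPOTHETICAL_RESOLVER = "http://resolver.example.edu/resolver.asp"


def parse_citation(citation: str) -> Optional[Dict[str, str]]:
    """Return OpenURL-style fields, or None if the citation does not match."""
    match = CITATION_RE.match(citation.strip())
    if match is None:
        return None
    g = match.groupdict()
    month = MONTHS.get(g["month"])
    return {
        "genre": "article",
        "aulast": g["aulast"],
        "auinit1": g["auinit"][0],
        "atitle": g["atitle"],
        "title": g["title"],
        "volume": g["volume"],
        "issue": g["issue"],
        "spage": g["spage"],
        "epage": g["epage"],
        "pages": f"{g['spage']}-{g['epage']}",
        "date": f"{g['year']}-{month}" if month else g["year"],
    }


def to_openurl(base_url: str, fields: Dict[str, str]) -> str:
    """Serialize the parsed fields as a query string on the resolver URL."""
    return f"{base_url}?{urlencode(fields)}"


if __name__ == "__main__":
    parsed = parse_citation(EXAMPLE_CITATION)
    if parsed is not None:
        print(to_openurl(HYPOTHETICAL_RESOLVER, parsed))
```

A pattern this strict fails cleanly on unfamiliar formats by returning `None`, which lets a caller fall back to showing the citation without a link.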

  40. Conclusions • User reactions very positive. • The one-stop-shopping approach has been successful. • Users consider ability to link to full-text from citations in A & I Services and from references on publisher portals very helpful. • Technically, best approach appears to be a hybrid of asynchronous client interface with Web Services querying databases. Moves database middleware to Web Services and eliminates extensive custom script code for search and database query.

  41. Publishing Trends • Publishers will continue to add value to online journal articles. • Digital version will become version of record. • Virtual journals (both publisher-based and cross-publisher) will become common. • Next-generation knowledge environments will evolve. Multimedia, data exposed, live equations with in-place calculations.

  42. Publishing Trends (Continued) • Personalized services will be available -- agent technology, alerting services. • Different economic and subscription models will be introduced. • Deconstruction of Journal (Bob Kelly, APS); article at a time publishing. • Journal branding or perhaps publisher branding. • Academia issues: publishing, tenure.

  43. Continuing Issues • Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services. • Disintermediation may affect both Libraries and Publishers. • Information as Function not Place. • Provide a ‘Digital Library’ out of digital collections. • Role of XML technology. • Service mechanisms: processing & archiving, search and discovery, presentation, linking.
