

Digital Library Technologies at the Grainger Library

William H. Mischo, Timothy W. Cole, Tom Habing

[email protected]

Grainger Engineering Library Information Center

University of Illinois at Urbana-Champaign

National Digital Archives Project Office of Taiwan

March 25, 2002


Outline

  • IR Tools and Full-Text

  • Distributed Information Environment.

  • Illinois Projects.

  • XML Technologies.

  • Metadata Technologies.

  • DOIs, Linking, Local Resolver

  • OAI

  • Portals, Simultaneous Search, Linking

  • Issues & Trends.


Overview

  • We now have the tools to pursue the grand challenges of information retrieval:

    • Standard retrieval environment (Web) and interface/client (Web Browser).

    • Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50).

    • Standard language for describing and transforming content and metadata (XML, XSLT, DC, DCQ, RDF, Schemas).

    • Standard transport mechanisms to connect heterogeneous content (HTTP, SOAP, OAI).

  • Candidate set of ‘best practices’ for IR.


The Digital Library

  • ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time.

  • Tendency to apply term to collections and resources.

  • Digital Collections vs. Digital Library.

  • Emphasis on the integration of collections and services (NSDL).

  • Application of standards and protocols is important.


Full-Text Technologies

  • Continuum of Web-Enabled technologies -- all presently being utilized.

  • Evolving technologies and standards.

  • Role and history of markup.

  • XML: its role and importance.

  • The Smart Document.


Scholarly Communication Overview

  • E-Resources are Web-based and publisher-centric.

  • Growth of Heterogeneous Distributed Repositories.

  • Value-added services and ‘branding’ of journals.

  • Prestige of Journals and Publishers

  • Reciprocal linking relationships between publishers.

  • Cooperation on linking standards (DOI, CrossRef).

  • Alternative publishing models - Academia, Preprint Servers, disintermediation.


Distributed Information Model

  • Diverse information environment in which we operate.

  • Multiple elements, relationships and nodes.

  • Need for gateway, interface, and navigation tools.

  • Need for document representation, transmission, linking, and retrieval middleware tools and standards.

  • Role of A & I Services.


Distributed Repository Issues

  • Integration of discrete publisher repositories, locally loaded full-text, local and remote A & I services, OPAC, Web resources, and local data.

  • Issues for user access:

    • need to identify appropriate publisher repository, but presently interfaces are different and full-text and controlled vocabulary searching often not offered.

    • A & Is: not full-text but offer controlled vocabulary, no links to full-text repositories.


Distributed Repository - Needs

  • Integration of discrete publisher repositories, locally loaded full-text, local and remote A & I services, OPAC, Web resources, and local data.

  • Support simultaneous searching of A & I Services, Distributed Repositories, OPACs, Web search engines, local files. Integrate TOC, full-text.

  • Remote Reference 24 X 7.

  • Metadata harvesting, archiving.

  • Local Resolver services for locally loaded or Aggregator Resources.
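
The simultaneous-search requirement above can be sketched in a few lines of Python. This is only an illustration: the three sources are in-memory stubs with invented titles, standing in for an A & I service, an OPAC, and local files, and the function names are hypothetical rather than part of any Grainger software.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub sources with made-up titles; a real gateway would query remote services.
SOURCES = {
    "A&I service": ["XML in digital libraries", "Metadata harvesting"],
    "OPAC": ["XML handbook"],
    "Local files": ["Grainger search aid notes"],
}


def search_source(name, term):
    """Search one source and return its name with the matching titles."""
    hits = [title for title in SOURCES[name] if term.lower() in title.lower()]
    return name, hits


def simultaneous_search(term):
    """Send the same query to every source in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        results = pool.map(lambda name: search_source(name, term), SOURCES)
        return dict(results)
```

Returning hits grouped by source, rather than as one merged list, matches the stated goal of helping users choose among databases.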


Illinois Testbed Project

  • Funded under DLI-I by NSF, DARPA, and NASA, 1994--1998. Awards made to 6 universities.

  • Large-scale Testbed, Distributed Repository models, evaluation, Web software.

  • Funded under CNRI D-Lib Test Suite Program, 1998--2001.

  • Collaborating Partners Program. AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier.

  • All XML Journal -- AIP, APS, ACM.


Illinois Testbed

  • American Institute of Physics--APL, JAP, RSI

    • 18,000+ articles, 1995--.

  • American Physical Society--PRL

    • 14,000+ articles, 1995--, weekly updates.

  • ASCE Journals (25 titles)

    • 10,000+ articles, 1995--.

  • IEE Proceedings and Electronics Letters

    • 8,500+ articles, 1993--.

  • IEEE Computer Society.

  • ASM International (formerly the American Society for Metals) Handbook.

  • ACM (Association for Computing Machinery) Transactions.

  • Elsevier Science.


Project Issues

  • Evolution of the Document.

  • Distributed information environment.

  • Use of Metalanguages & Transformations (SGML, XML).

  • Searching over full-text of journals vs. document surrogates in A & I format.

  • Rendering and styling (SGML, XML, MathML).

  • Dynamic metadata for normalization, linking.

  • Breadth and depth of collections.

  • User needs.


Accomplishments

  • Process & retrieve from multiple publishers & heterogeneous DTDs.

  • Metadata specification that uses RDF, Dublin Core (DCQ, DC Agents) Schemas, IDLI Namespace.

  • Cross-repository searching (Testbed & D-LIB Test Suite). Full-Text and Metadata.

  • SGML to XML Conversion.

  • XSLT, CSS, for transformation & rendering, including Mathematics.


Accomplishments (2)

  • Linking: Forward/Backward within Testbed, from/to A & I Services.

  • Conversion of ISO 12083 math markup to MathML.

  • Enhanced Web retrieval mechanisms: Author Word Wheels, Co-Occurrence Matrices.

  • Detailed user transaction logs, gathered at the search argument level, with identification of characteristics of each user search session.

  • Local Link Server for DOIs, Context-Sensitive linking.


Accomplishments (3)

  • CSS/DHTML Math rendering techniques, TechExplorer integration. Two international math conferences.

  • Simultaneous search within DeLiver of Testbed repositories, A & Is, NCSTRL.

  • Local Link Server and Appropriate Copy Issues.

  • Simultaneous search of A & Is, OPAC, Google, Local resources with integrated reference linking using OpenURL and DOIs from A & Is.

  • Open Archives Initiative (OAI).


Ongoing Investigations (1)

  • Support simultaneous searching of A & I Services, Distributed Repositories, enhanced navigation, expanded gateway functions.

  • Interoperability models, e.g., Metadata harvesting vs. Federated (Broadcast).

  • OAI Provider and Harvesting software. OAI EAD and Cultural Heritage collection and retrieval system.

  • HTTP harvesting, Spider technology (gathering).


Ongoing Investigations (2)

  • Archiving.

  • Local Link Server with context-sensitive resources.

  • Reference Linking integration built on OpenURL and DOI.

  • NSDL presence.

  • Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.


XML (eXtensible Markup Language)

  • Like SGML, a Data Description Language (Metalanguage).

  • Subset/version of SGML.

  • Allows fine-granularity markup of content and structure. Author can create their own elements (extensible).

  • Tags define the structure of document not presentation format.

  • Valid vs. “well-formed”: separation of authoring process from representation & presentation.

  • A document is either validated against a DTD/Schema or merely well-formed.

  • Compatible with relational DBs.


XML and Publishers

  • Seybold Seminars Publishing 2000, Boston, February 2000.

  • Tim Gill of Quark, “…the use of XML could lead to a drop in the cost of Web publishing by 30% to 50% and a significant reduction in the time it takes to produce sites.”

  • Gill: “I don’t believe that there is any innovation in print that is going to save us even 10% in costs.”

  • Issues and Challenges remain.

  • Publishers are looking at the all-XML journal.


XML Features

  • The milestones in document description and transmission: ASCII, TCP/IP, HTTP and HTML, XML. Web Programmability.

  • DTD not required with XML, but needed if the document uses entities it declares itself (internal entities).

  • Use of Document Object Model (DOM).

  • Technology approach from Web developer’s standpoint: XML data, CSS presentation layer, XSLT to transform the structure (‘view’) of the data/document.


Role of XML

  • “If you ask 20 people in the industry, ‘What is XML?’ you’ll get 20 different answers.” -- Dale Fuller, CEO, Inprise Corporation.

  • Vendor-Neutral, platform-independent structured information standard.

  • Document representation and interchange Standard.

  • Applications can externalize their data/metadata as XML.

  • Issues with full-text representation: PDF, XML/HTML. Value in indexing, retrieval.


XML Parser APIs: Tree-Based and Event-Based

  • DOM (Document Object Model).

    • DOM Level 1 and Level 2 W3C recommendation. Widely implemented, Tree-Based. Hierarchy of nodes. Loads entire document into memory. Level 2 adds namespace support, traversal, stylesheets, events, triggers. Level 3 working draft. DOM HTML candidate. Parsers allow developers to iterate through documents, change document content.

  • SAX (Simple API for XML).

    • Open-source, XML-DEV, not W3C. Event-based, fires events as it reads document, need not load entire document into memory. Good for single-pass processing. Xerces, XML4C, Sun Project X (Crimson).
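
To make the tree-based versus event-based distinction concrete, here is a minimal sketch using the DOM and SAX implementations in the Python standard library (not the Xerces or MSXML parsers named above). The sample document and its element names are illustrative only.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document; element names are illustrative only.
SAMPLE = "<article><au>Tom</au><au>Tim</au><au>Bob</au></article>"


def authors_with_dom(xml_text):
    """Tree-based: load the whole document into memory, then walk the nodes."""
    document = xml.dom.minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("au")]


class AuthorHandler(xml.sax.ContentHandler):
    """Event-based: collect author names as the parser fires events."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._in_author = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "au":
            self._in_author = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_author:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "au":
            self.authors.append("".join(self._buffer))
            self._in_author = False


def authors_with_sax(xml_text):
    """Single pass over the input; no document tree is ever built."""
    handler = AuthorHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.authors
```

Both functions return the same list of names; the difference is that the SAX version keeps only the current author's text in memory, which is why it suits single-pass processing of large documents.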


XML Linking

  • XML Base http://www.w3.org/TR/xmlbase

    • Permits use of relative URI path prefixes. Can then shorten references.

  • XLink http://www.w3.org/TR/xlink/

    • Method for specifying navigational links. Allows enforcement of specific path order through links. xlink:type="simple" corresponds to HTML <a> or <img> tags.

  • XInclude http://www.w3.org/TR/xinclude

    • Copies entire XML documents or selected portions into current document. Candidate recommendation. Uses XPath and XPointer to specify document elements to include.

  • XPointer http://www.w3.org/TR/xptr

    • Uses XPath to identify portion of a document. Permits string searches and range specifiers.


XML Schema and Structure

  • DTD

    • Original schema representation, defines structural rules for a class of XML documents.

  • XML Schema http://www.w3.org/XML/Schema

    • Also sets out standardized structure for class of XML documents. Is coded in XML, can be parsed and edited with standard software. Two separate parts: structures and datatypes.

  • Namespaces http://www.w3.org/TR/REC-xml-names/

    • Allows developers to qualify element and attribute names with unique URIs, avoids recognition errors.
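
A short sketch of how namespace qualification avoids name collisions, using Python's standard `xml.etree.ElementTree`. The two namespace URIs and the `bib`/`job` prefixes are hypothetical; they exist only to keep two different `title` elements apart.

```python
import xml.etree.ElementTree as ET

# Both vocabularies define an element called "title"; the namespace URIs
# below are hypothetical and serve only to distinguish the two.
DOC = """
<record xmlns:bib="http://example.org/ns/bibliographic"
        xmlns:job="http://example.org/ns/personnel">
  <bib:title>Digital Library Technologies</bib:title>
  <job:title>Librarian</job:title>
</record>
"""

# Prefix-to-URI map used when searching; prefixes here need not match the document's.
NS = {
    "bib": "http://example.org/ns/bibliographic",
    "job": "http://example.org/ns/personnel",
}


def titles(xml_text):
    """Return the bibliographic title and the job title as a pair."""
    root = ET.fromstring(xml_text.strip())
    return root.find("bib:title", NS).text, root.find("job:title", NS).text
```

Searching for an unqualified `title` finds nothing in this document, because each element's full name includes its namespace URI.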


XML Implementations

  • XHTML, SVG (Scalable Vector Graphics), XForms (similar to HTML forms).

  • MathML http://www.w3.org/Math/

    • Markup language for describing mathematics, both presentation and content.

  • RDF http://www.w3.org/RDF/

    • Resource Description Framework. Defines structure for encoding object metadata. Facilitates metadata interchange & harvesting. RDF Schemas.

  • Others: DocBook, XML ISO12083, Open eBook, WAP/WML.


Searching and Transformation

  • XPath http://www.w3.org/TR/xpath

    • Defines pattern-matching syntax used by XSLT and XPointer. Method for selecting data in a document. MSXML 3.0 supports XPath. Supersedes XPatterns. Example: /descendant-or-self::node()/child::name

  • XSL

    • Includes both transformation (XSLT) and formatting objects (FO). FO will replace CSS for document formatting.

  • XSLT http://www.w3.org/TR/xslt

    • Mechanism for encoding style rules, ensures consistent rendering of XML documents of the same type.

  • XML Query http://www.w3.org/XML/Query

    • Response to limitations of XPath. Would bring database-style queries to XML documents.
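
As a rough illustration of XPath-style selection, the sketch below uses Python's `xml.etree.ElementTree`, which implements only a limited subset of XPath (abbreviated paths and simple predicates, not the full axis syntax). The sample issue and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are illustrative.
ISSUE = """<issue>
  <article id="a1"><title>Math rendering</title><au>Habing</au></article>
  <article id="a2"><title>Metadata harvesting</title><au>Cole</au></article>
  <front><title>Table of Contents</title></front>
</issue>"""

root = ET.fromstring(ISSUE)

# ".//title" is the abbreviated descendant search: every <title> at any depth.
all_titles = [t.text for t in root.findall(".//title")]

# An attribute predicate narrows the match to one article.
a2_title = root.find("./article[@id='a2']/title").text

# A child-text predicate: articles whose <au> child has the text "Habing".
habing_ids = [a.get("id") for a in root.findall("./article[au='Habing']")]
```

The descendant search here plays the same role as the abbreviated `//` form in full XPath, restricted to what ElementTree supports.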


Remote Object Access

  • SOAP (Simple Object Access Protocol)

    • Microsoft, IBM, Sun. Allows applications to invoke objects or functions residing on remote servers. Creates request block in XML.

  • XML-RPC http://www.xmlrpc.com/

    • Remote procedure calling using HTTP as the transport and XML as the encoding. Open, but not standard protocol; widely adopted.

  • Web Services.
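
The "XML as the encoding" part of XML-RPC can be seen without any network traffic. This sketch uses Python's standard `xmlrpc.client` module to build and then decode a request body; the method name `library.search` and its arguments are hypothetical, and no server is contacted.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method as an XML-RPC request body.
request_xml = xmlrpc.client.dumps(
    ("digital libraries", 10), methodname="library.search"
)

# loads() reverses the encoding, which is what a receiving server would do
# before dispatching the call to the named method.
params, method = xmlrpc.client.loads(request_xml)
```

In a real exchange the request body would be sent in an HTTP POST, which `xmlrpc.client.ServerProxy` handles.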


Remote Object Access

  • Web Services:

    • Based on XML, SOAP, UDDI (Universal Description, Discovery, and Integration), and WSDL (Web Services Description Language). Applications are assembled on the fly in XML, exposed to the world, and accessed via the Web from different devices.

    • Supported by Microsoft .NET, IBM WebSphere, Sun ONE.


XML, XSLT, and CSS

  • Use XML full-text articles as ordered hierarchy of content objects.

  • Generate item-level metadata in XML, using RDF and Dublin Core syntax and semantics.

  • XSLT and CSS used to present metadata and articles in either XML or HTML format depending on Browser.

  • Mathematics rendering using MathML tools (conversion from ISO 12083 to MathML).

  • Real-time transformation between XML and HTML using XSLT (scalability issues).


XSLT Where Should It Happen

  • Client-side

    • IE5+ only

      • Not Netscape 6 or Mozilla (yet)

      • IE5 not yet fully compliant w/ XSLT and XPath standard

    • Can reduce the load on your servers

    • But performance on low-end clients can be BAD

  • Server-side

    • Performance could be a problem on busy servers, serving large, complex documents

    • More control & flexibility over the conversion (metamerge)

  • Offline Preconversion

    • Best performance

    • Not best for dynamic documents (metamerge)


Converting XML to HTML (XSLT)

  • Simple one-to-one conversions: <sect> becomes <span class="sect">

    • span.sect {display:block;margin-left:2em}

  • Attribute-based conversions: <emph type="1"> becomes <span class="emph_1">

    • span.emph_1 {font-style:italic}

  • Generated text, such as punctuation: <ag><au>Tom</au><au>Tim</au><au>Bob</au></ag> becomes Tom, Tim, Bob.

  • Rearranged children: <au><sn>Habing</sn><fn>Tom</fn></au> becomes Tom Habing
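
The four conversion rules above can be mimicked in a few lines of Python. This is a stand-in for illustration, not the XSLT stylesheets the project used: it walks the tree with `xml.etree.ElementTree` and applies the same mappings, and the function name `convert` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def convert(node):
    """Return an HTML string for one XML element, applying the four rules."""
    if node.tag == "ag":
        # Generated text: join authors with commas and end with a period.
        return ", ".join(convert(child) for child in node) + "."
    if node.tag == "au":
        sn, fn = node.find("sn"), node.find("fn")
        if sn is not None and fn is not None:
            # Rearranged children: surname-first markup becomes "First Last".
            return escape(f"{fn.text} {sn.text}")
        return escape(node.text or "")
    if node.tag == "emph":
        # Attribute-based: the type attribute becomes part of the class name.
        css_class = f"emph_{node.get('type')}"
    else:
        # Simple one-to-one: the element name becomes the class name.
        css_class = node.tag
    # Mixed content: the element's own text, then each child and its trailing text.
    inner = escape(node.text or "")
    for child in node:
        inner += convert(child) + escape(child.tail or "")
    return f'<span class="{css_class}">{inner}</span>'
```

In XSLT each branch would be a separate template rule matched by element name; the recursion over children corresponds to `xsl:apply-templates`.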


Converting XML to HTML (cont.)

  • Some elements are converted into HTML elements other than <span> or <div>

    • Figures are converted to <img src="…"> tags.

    • Internal links with ID and IDREF attributes are usually converted into HTML anchor tags.

    • Table elements are converted into corresponding HTML <table>, <tr>, or <td> tags.

  • ‘Real’ DTDs require some fairly complex processing.

    • So far XSLT seems to be able to handle nearly every case we have come across

    • However, some cases have required JScript extensions to XSLT


Schemas vs. DTDs

  • Both are systems of representing a data model that defines the data’s elements and attributes, and the relationship among elements.

  • Schema addresses limitations of DTDs and the increasingly data-oriented role of XML.

  • Initial Arbortext, DataChannel, Inso, Microsoft, and Univ of Edinburgh proposal: XML-Data.

  • W3C XML Schema Working Group: two documents: XML structures and datatypes.


Schema Justification

  • Description of document type’s structure should be in an XML document instead of written in special syntax (DTD).

  • Schemas are in XML: easier to edit and process using standard XML DOM manipulation tools.

  • DTD notation doesn’t allow schema designers the power to impose strong data typing -- for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices.


Metadata and Linking Standards

  • Digital Object Identifier (DOI) and Persistent Object Identifiers.

  • OpenURL and Value-Added Service Components (SFX).

  • Open Archives Initiative (OAI), Dublin Core and Qualifiers.

  • Local Resolver Servers.


Metadata in DLI

  • To normalize & augment presentation.

  • To normalize searching (e.g. Names).

  • To store dynamic links.

  • Types of links:

    • Articles referenced by item (Backward).

    • Articles that reference the item (Forward).

    • A & I Records for references and items.

    • Other relationships (TOC, Other items by Author, Collaborative Data).

    • Known item and presumptive linking.


DLI Metadata Schema

  • Maintained as XML files using RDF and Qualified Dublin Core syntax and semantics.

  • Example:

    <dcq:issued>  <!-- subproperty/refinement of DC Date -->
      <dcq:W3CDTF>  <!-- DC Date encoding -->
        <rdf:value>1999-09</rdf:value>
      </dcq:W3CDTF>
    </dcq:issued>

  • Application of XML DOM for processing at DC or idli level.
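
A sketch of reading the date from a record shaped like the example above, using Python's `xml.etree.ElementTree`. Two things are assumptions: the slides do not give the namespace URI bound to the `dcq` prefix, so a placeholder is used, and the enclosing `rdf:Description` element is added only so the fragment parses as a complete document.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    # Placeholder URI: the slides do not state the namespace bound to dcq.
    "dcq": "http://example.org/dcq#",
}

# The rdf:Description wrapper is an assumption made to give the fragment a root.
RECORD = f"""<rdf:Description xmlns:rdf="{NS['rdf']}" xmlns:dcq="{NS['dcq']}">
  <dcq:issued>
    <dcq:W3CDTF>
      <rdf:value>1999-09</rdf:value>
    </dcq:W3CDTF>
  </dcq:issued>
</rdf:Description>"""


def issued_date(xml_text):
    """Return the W3CDTF-encoded issue date, or None if the record has none."""
    root = ET.fromstring(xml_text)
    value = root.find("dcq:issued/dcq:W3CDTF/rdf:value", NS)
    return value.text if value is not None else None
```

Because the encoding scheme is a named element around the value, a consumer can tell how to interpret the date string before parsing it.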


New DLI Metadata Schema

<dc:creator>
  <rdf:Seq>
    <rdf:li>
      <dca:Person rdf:ID="AUTHOR-1">
        <dca:agentname>
          <dca:FNF>
            <rdf:value>L'Ecuyer, Pierre</rdf:value>
          </dca:FNF>
        </dca:agentname>
        <dca:agentaffiliation>Université de Montréal Département...</dca:agentaffiliation>
        <dca:agentidentifier rdf:resource="mailto:[email protected]" />
      </dca:Person>
    </rdf:li>
    …..
  </rdf:Seq>
</dc:creator>


Digital Object Identifier (DOI)

  • DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier.

  • ‘The ISBN for the 21st Century’ -- Norman Paskin.

  • DOI system has two main parts (the identifier and a directory system) and a third logical component, a database.

  • Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.


DOI Construction

  • First real open standard for content identification.

  • DOI is a number that identifies a digital object:

    • 10.1063/S000369519903216

      • 10 Registration Agency Prefix

      • 1063 Publisher Prefix

      • S000369519903216 Suffix (Publisher-assigned ID)

  • Suffix can be SICI or PII.

  • The DOI, together with the URL pointing to the digital object, is registered with the International DOI Foundation, e.g.:

    • 10.1063/333 | http://www.pubsite.org/apr99/artl1.pdf
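
The construction described above can be checked mechanically. The following sketch splits a DOI into the three parts named on this slide; the function name and dictionary keys are hypothetical and simply follow the slide's labels.

```python
def split_doi(doi):
    """Split a DOI into the parts named on the slide.

    The text before the first "/" is the prefix (agency and publisher parts,
    separated by "."); everything after it is the publisher-assigned suffix.
    """
    prefix, _, suffix = doi.partition("/")
    agency, _, publisher = prefix.partition(".")
    if not (agency and publisher and suffix):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return {
        "agency_prefix": agency,
        "publisher_prefix": publisher,
        "suffix": suffix,
    }
```

Splitting on the first slash only matters because a publisher-assigned suffix, such as a SICI, may itself contain slashes.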


Using a DOI

  • DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives).

  • Retrieval of object is a two-step process: the link is sent to a central directory where the current Web address is stored, then the location is sent back to the browser with a special message to redirect to that address, e.g.:

    • dx.doi.org/10.1063/333 redirects to www.pubsite.org/apr99/artl1.pdf
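
The two-step retrieval can be modelled with an in-memory directory. This is a toy stand-in for the Handle System, not its actual protocol: the `resolve` function is hypothetical, and the use of HTTP status 302 for the redirect message is an assumption.

```python
from urllib.parse import urlparse

# Central directory: DOI -> current Web address (the pair used in the slides).
DIRECTORY = {"10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf"}


def resolve(link):
    """Step 1: look the DOI up in the directory. Step 2: answer with a redirect.

    Returns (status, location); 302 is assumed as the redirect status.
    """
    doi = urlparse(link).path.lstrip("/")
    location = DIRECTORY.get(doi)
    if location is None:
        return 404, None
    return 302, location
```

If the publisher moves the file, only the directory entry changes; links that cite the DOI keep working, which is the point of a persistent identifier.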


Reference Linking

  • Alternatives to DOI:

    • PubMed/PubRef (National Library of Medicine)

    • PubSCIENCE (DOE/OSTI)

    • Proprietary Link Managers (AIP, APS)

  • CrossRef Project: major Sci-Tech professional societies and commercial publishers.

  • System design calls for one URL for each DOI; underlying technology can handle multiple URLs however.


Local Resolver

  • Issue: Directing users to locally held or licensed version of Digital Object (locally loaded or from Aggregator).

  • Harvard problem, Appropriate Copy problem.

  • Additional desire to direct users to local value-added services: local print holdings, interlibrary borrowing, other articles in A & I Services.


Local Resolver

  • Local Resolver Servers

    • OpenURL Protocol, CookiePusher vs. IP Addresses.

  • Demonstration Project at Illinois, OhioLink (Ex Libris SFX), Los Alamos.

    • Localizing Name Resolution for AIP, ASCE, Elsevier, other publishers.

    • Use of CrossRef Metadata Database for identifying Publisher from DOI and linking to Local Copy, A & I Services, Library Assistance.
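
A sketch of how a link aimed at a local resolver might be assembled. The resolver address is hypothetical, and the parameter names (`id`, `genre`, `issn`, `volume`, `spage`) follow the early OpenURL key/value syntax as best understood here; treat them as assumptions rather than a statement of what the Illinois Link Server accepted.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical address of an institution's local link server.
RESOLVER = "http://resolver.example.edu/links"


def build_openurl(doi, **citation):
    """Build a resolver link that carries the DOI plus optional citation data."""
    query = {"id": f"doi:{doi}", **citation}
    return f"{RESOLVER}?{urlencode(query)}"


def read_openurl(url):
    """What the resolver sees: the query string decoded back into fields."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
```

Because the link points at the local server rather than at the publisher, the library decides where the user lands: a locally loaded copy, an aggregator, or a value-added service.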


[Diagram: Illinois Local Link Server. Labels shown: Client (Web Browser); Cookie on client; DOI Proxy; Handle Server; dx.doi.org/10.1063/1234; Nosfx=y; OpenURL; OpenURL Aware; AIP; IEE; Elsevier; Local AIP, IEE; Local Value Added; DOI; CrossRef Metadata Database; Metadata; UIUC Metadata Registry.]


Grainger Search Aid

  • Development of Portal and Gateway sites featuring:

    • robust search/navigation;

    • ability to link everywhere from anywhere.

  • Simultaneous search of heterogeneous resources to assist in database selection.

  • Article level and e-journal Web site access to full-text repositories.

  • Utilize OpenURL and DOI.


Open Archives Initiative (OAI)

  • Released version 1.0 of metadata harvesting protocols. Frozen through second quarter 2001.

  • Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories.

  • Roots in e-print archives.

  • Lightweight, low-barrier. Easy to implement Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.


OAI Continued

  • Requires repositories to support the Dublin Core elements.

  • Allows communities to expose metadata in other formats as long as records are structured as XML data with corresponding XML schema.

  • Registration mechanism provides publicly accessible list of OAI conformants.

  • Alpha testing phase completed.
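
Since the protocol runs over plain HTTP, a harvesting request is just a URL with a verb and its arguments. The sketch below builds such URLs; the repository address is hypothetical, and `oai_dc` is used as the metadata prefix for the required Dublin Core format.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a data provider's OAI interface.
BASE_URL = "http://repository.example.org/oai"


def oai_request(verb, **arguments):
    """Build a harvesting request URL from a protocol verb and its arguments."""
    return f"{BASE_URL}?{urlencode({'verb': verb, **arguments})}"


# Ask the repository for all of its records in Dublin Core.
list_records_url = oai_request("ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch this URL and parse the XML response; the data provider's real work is mapping its own metadata into the records it returns.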


Publishing Trends

  • Publishers will continue to add value to online journal articles.

  • Digital version will become version of record.

  • Virtual journals (both publisher-based and cross-publisher) will become common.

  • Next-generation knowledge environments will evolve. Multimedia, data exposed, live equations with in-place calculations.


Publishing Trends (Continued)

  • Personalized services will be available -- agent technology, alerting services.

  • Different economic and subscription models will be introduced.

  • Deconstruction of Journal (Bob Kelly, APS); article at a time publishing.

  • Journal branding or perhaps publisher branding.

  • Academia issues: publishing, tenure.


Closing Issues

  • Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services.

  • Disintermediation may affect both Libraries and Publishers.

  • Information as Function not Place.

  • Provide a ‘Digital Library’ out of digital collections.

  • Role of XML technology.

  • Service mechanisms: processing & archiving, search and discovery, presentation, linking.

