Part 4

Part 4. Next Generation Digital Libraries: Supporting Interoperability, Semantics, and Quality. OAI, ODL, DL-in-a-box. Open Archives Initiative: since 1999, www.openarchives.org. Open Digital Libraries: since 2001, from www.dlib.vt.edu, with Hussein Suleman (now U. Cape Town). DL-in-a-box


Presentation Transcript


  1. Part 4 Next Generation Digital Libraries: Supporting Interoperability, Semantics, and Quality

  2. OAI, ODL, DL-in-a-box • Open Archives Initiative • since 1999, www.openarchives.org • Open Digital Libraries • since 2001, from www.dlib.vt.edu • with Hussein Suleman (now U. Cape Town) • DL-in-a-box • NSDL support since 2001 • Aimed to help new collections / services projects • http://dlbox.nudl.org

  3. Open Archives Initiative (OAI) • Advocacy for interoperability • Standard for transferring metadata among digital libraries • Protocol for Metadata Harvesting (PMH) • Simplicity • Generality • Extensibility • Support for PMH => Open Archive (OA)
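
As a rough illustration of the harvesting idea on this slide, the following Python sketch builds an OAI-PMH `ListRecords` request and pulls identifiers and Dublin Core titles out of a response. The base URL `http://example.org/oai`, the sample record, and the helper names are assumptions made for illustration; only the `verb` and `metadataPrefix` parameters, the `oai_dc` prefix, and the XML namespaces come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (helper name is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written sample response for illustration; not taken from a real archive.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2002-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = rec.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    return records


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL over HTTP and follow resumption tokens; both are left out here to keep the sketch short.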

  4. OAI = Technical Umbrella for Practical Interoperability… Metadata Harvesting Reference Libraries Museums Publishers E-Print Archives …that can be exploited by different communities

  5. OAI – Repository Perspective • Required: Protocol, Set Structure, URI Scheme • Required: DC • [Figure: repository holding MDO and DO items]

  6. OAI – Black Box Perspective • [Figure: open archives OA 1 through OA 7]

  7. Tiered Model of Interoperability • Mediator services • Metadata harvesting • Document models

  8. The World According to OAI • Service Providers: Discovery, Current Awareness, Preservation • Metadata harvesting • Data Providers

  9. [Figure: users and digital objects (Image, Video, Program, Document), with "?" between them]

  10. [Figure: digital library as a monolithic and/or custom-built web-based application over digital objects (Image, Video, Program, Document)]

  11. [Figure: componentized digital library, with components marked "?" over digital objects (Image, Video, Program, Document)]

  12. [Figure: open digital library, with OA components linked by PMH and XPMH over digital objects (Image, Video, Program, Document)]

  13. Open Digital Library Protocol: Extended OAI-PMH (Protocol for Metadata Harvesting)

  14. Open Digital Library Component: Extended Open Archive

  15. Open Digital Library Deployments • NDLTD (www.ndltd.org) • Computer Science Teaching Center (www.cstc.org) • Computing and Information Technology Interactive Digital Educational Library (www.citidel.org) • Open Archives Distributed (NSF, DFG) – enhancements to PhysNet • OCKHAM • Open to others through DL-in-a-box

  16. Open Digital Library • Network of Extended Open Archives where each node acts as either a provider of data, services or both. • Component = Node • Protocol = Arc
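
The "Component = Node, Protocol = Arc" view can be sketched as a small labeled graph. This is a minimal Python illustration, not the ODL implementation: the class names, the component names, and the protocol labels on the arcs are all assumptions chosen to echo the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node: an extended open archive offering data, services, or both."""
    name: str
    provides_data: bool = False
    provides_services: bool = False


@dataclass
class OpenDigitalLibrary:
    """A network of components (nodes) joined by protocol-labeled arcs."""
    components: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)  # (source, target, protocol)

    def add_component(self, component):
        self.components[component.name] = component

    def connect(self, source, target, protocol):
        # Both endpoints must already be registered as nodes.
        for name in (source, target):
            if name not in self.components:
                raise KeyError(f"unknown component: {name}")
        self.arcs.append((source, target, protocol))

    def upstream(self, name):
        """Names of components that feed directly into the given component."""
        return [s for s, t, _ in self.arcs if t == name]


# Illustrative wiring only: archive -> union -> search.
odl = OpenDigitalLibrary()
odl.add_component(Component("etd-archive", provides_data=True))
odl.add_component(Component("union", provides_data=True, provides_services=True))
odl.add_component(Component("search", provides_services=True))
odl.connect("etd-archive", "union", "PMH")
odl.connect("union", "search", "extended PMH")

if __name__ == "__main__":
    print(odl.upstream("search"))
```

Keeping the protocol on the arc rather than on the node reflects the idea that one component may talk to different neighbors through different protocols.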

  17. Open Digital Library Components • Running now • XML-File (data provider from file system) • Search: simple or in-memory (Essex) or generalized • Union, browse, recent, filter • E-journal/review, Submit, Edit, Annotation • Recommender, Rating; Mirroring (see JCDL’02) • Working with NCSA: from DB, unstructured text • Others in process • Classification/categorization • Registry (and other connections with web services)

  18. Example Open Digital Library • ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org) • [Figure: ETD collections ETD-1 to ETD-4; components Union, Filter, Browse, Search, and Recent linked by PMH, ODLUnion, ODLBrowse, ODLSearch, and ODLRecent; user interface for students and researchers]

  19. Open Digital Library: Extended • Service provider roles: What’s New; Metadata Search; Metadata Browse; Recommend & Rate; Annotation Search • Components: IRDB-1 (Search Engine); IRDB-2 (Search Engine); DBBrowse (Browse Engine); Recommend; What’s New Engine; Rate Engine; Annotation Engine; DBUnion (Archive Merger Component); Filter; Submit Archive • Harvest from data providers: XMLFile Coll. & Data Provider 1, 2, 3; OAI-PMH Data Provider; OAIB (NCSA: from RDBMS)

  20. CITIDEL Technology Features • Component architecture (Open Digital Library) • Re-use and compose re-deployable digital library components. • Built using open standards and technologies • OAI: used to collect DL resources and support DL interoperability • XSL and XML: interface rendering with multi-lingual, community-based translation of screens and content (Spanish, …) • Perl: component integration • ESSEX: search engine functionality • Very fast, utilizing in-memory processing • Includes snapshots for persistence • Multi-scheming • Integrates multiple classifications / views through maps, closure

  21. Multi-dimensional Categorization

  22. OCKHAM Initiative, Contact Info • Supported by DL Federation, Mellon, NSF, … • P2P University Network involving: • Emory, Notre Dame, U. Arizona, Virginia Tech, … • PI: Martin Halbert Phone 404-727-2204 Email: mhalber@emory.edu • OCKHAM URL: http://ockham.library.emory.edu

  23. The Problem • Digital library development is complex and expensive. • Various DL development communities (in the USA at least) are not working together well. • Results exhibit much incompatibility, little common practice, slow progress, and no leverage on investment. • If this continues, we are just going to languish and fester.

  24. Lightweight Protocols • “Lightweight” (relatively small and simple) protocols seem to have clear advantages over “full” protocols that attempt to be comprehensive. • The success of protocols considered lightweight is illuminating. • Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

  25. Reference Models • Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration • Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement • Explored in CS6604 class project with 2 focus groups: librarians, education experts

  26. Current Focus: Peer-to-Peer (P2P) Lightweight (Protocol) Reference Models • Builds on successful example of the OAI PMH, clearly understood minimalist concept of metadata distribution, implemented in simple protocols (e.g., ODL) • Leads to developing simple reference models of specific subsystems, with associated simple protocols and standards • Testing in NSDL, connecting university libraries to support teaching & learning

  27. OCKHAM Proposed Services • Alerting • Browsing • Cataloging • Conversion • OAI – Z39.50 • Pathfinding • Registry – prototype in CS6604 now • (plus others such as from adapted ODL)

  28. DL Student Research: Gonçalves • 5S as a basis for developing digital libraries • Theory • Syntax, Semantics; Definitions, Relationships • Specification of requirements • Generation of systems • Quality

  29. Motivation for 5S • DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc. • DL construction: difficult, ad-hoc, lacking support for tailoring/customization • Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. • Lack of specific DL models, formalisms, languages

  30. 5S Layers • Societies • Scenarios • Spaces • Structures • Streams

  31. 5S Model: Examples, Objectives

  32. Intra-Model Relationships: Streams • Participant concepts: {text, image, video, audio} • Relations: • contains: (video, image), (video, audio) • Streams define the basic content types over which digital objects are built; the latter are the ultimate carriers of the information in the DL. • However, some complex types of streams (e.g., video) may themselves be associated with simpler types of streams (e.g., images, audio). • This relation indicates that a video contains an image as one of its frames, or a specific audio recording.

  33. DL Services/Activities Taxonomy (Gonçalves) • Infrastructure Services • Repository-Building • Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting • Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format) • Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language) • Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing

  34. Services, Definitions, Parameters • In the table each service is characterized by • parameters (input, output) • of the initial and final events • of the scenarios that compose those services and • respective pre- and post-conditions which are represented in terms of rules on DL relations. • All other previous definitions and keys apply here. • That set is complemented with the following definitions:

  35. Services Related Definitions • A query $q$ is the representation of a user interest or information need. • Hyptxt is a hypertext, wherein an anchor is a node. • A log_entry is a descriptive metadata specification about an event of a scenario. • Let $\{do_i\} = \{do_1, do_2, \ldots, do_n\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \to 2^{Ct}$ is a function that maps a digital object to a set of categories. • A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
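
The classifier and cluster definitions above can be made concrete with a toy example. In this Python sketch a digital object is stood in for by a plain text string and categories are assigned by keyword overlap; both choices, along with the function names and the keyword table, are assumptions for illustration and not part of the 5S definitions. The point is only the shape of the mapping: each object goes to a set of category labels, i.e. an element of the power set of Ct, and any grouping of objects yields subsets that qualify as clusters.

```python
from typing import Dict, FrozenSet, Iterable, Set


def classify(digital_object: str, keywords: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Map one digital object to a (possibly empty) set of category labels."""
    words = set(digital_object.lower().split())
    return frozenset(cat for cat, kws in keywords.items() if words & kws)


def clusters_by_category(
    objects: Iterable[str], keywords: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Group objects by assigned category; each group is a subset of the objects."""
    clusters: Dict[str, Set[str]] = {}
    for obj in objects:
        for cat in classify(obj, keywords):
            clusters.setdefault(cat, set()).add(obj)
    return clusters


# Hypothetical category labels and trigger words.
KEYWORDS = {
    "IR": {"search", "retrieval"},
    "DB": {"database", "sql"},
}

if __name__ == "__main__":
    docs = ["Search over a database", "SQL tuning notes", "Poetry anthology"]
    for doc in docs:
        print(doc, "->", sorted(classify(doc, KEYWORDS)))
    print(clusters_by_category(docs, KEYWORDS))
```

Returning a `frozenset` keeps the result hashable and makes it explicit that an object may belong to several categories or to none.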

  36. DL Services I/O Behavior • The prior figure shows: • Instantiations of the “Services Definition” model • Inputs and outputs of examples of infrastructure and information satisfaction DL services • Key: • $C_{DL}$ = Collection • $I_{C_{DL}}$ = index for collection $C_{DL}$ • $\{do_i\}$ = set of digital objects • Soc = Society

  37. Defining Quality in Digital Libraries • DL Concept: Dimensions of Quality • Digital object: Accessibility, Pertinence (*), Preservability (*), Relevance, Similarity, Significance, Timeliness (*) • Metadata specification: Accuracy, Completeness, Conformance • Collection: Completeness, Impact Factor • Catalog: Completeness, Consistency • Repository: Completeness, Consistency • Structures for Navigation: Navigability (*) • Services: Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability

  38. Completeness of Metadata (1) • Degree of completeness of a metadata specification $ms_x$ • $\mathrm{Completeness}(ms_x) = 1 - \dfrac{\text{number of missing attributes in } ms_x}{\text{total number of attributes in the schema to which } ms_x \text{ conforms}}$ • According to 5S definition of conformance
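
The completeness measure is simple enough to compute directly. The Python sketch below assumes a metadata specification is a dictionary from attribute name to value and that an absent or empty value counts as missing; the four-attribute schema is a hypothetical stand-in, not a real standard.

```python
def completeness(metadata, schema_attributes):
    """Completeness = 1 - (missing attributes / total attributes in the schema).

    Assumption: an attribute is "missing" when it is absent from the record
    or present with an empty value.
    """
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    missing = [attr for attr in schema_attributes if not metadata.get(attr)]
    return 1 - len(missing) / len(schema_attributes)


# Hypothetical four-attribute schema and a record that lacks "subject".
SCHEMA = ["title", "creator", "date", "subject"]
RECORD = {"title": "A Sample Thesis", "creator": "A. Student", "date": "2002"}

if __name__ == "__main__":
    print(completeness(RECORD, SCHEMA))
```

With one of four attributes missing, the measure evaluates to 1 - 1/4 = 0.75.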
