Interoperability in Digital Libraries: Open Archives Initiative and the NSDL

CS 502 – 20020326. Carl Lagoze – Cornell University. Acknowledgements: Bill Arms, Herbert Van de Sompel.


Presentation Transcript


  1. Interoperability in Digital Libraries: Open Archives Initiative and the NSDL • CS 502 – 20020326 • Carl Lagoze – Cornell University • Acknowledgements: Bill Arms, Herbert Van de Sompel • 20020307

  2. Beyond the walls • "The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library’s Web site, this approach would make available the ever-increasing body of research materials distributed across the Internet. The Library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation." (LC21: Digital Strategy for the Library of Congress, page 5)

  3. A portal should mean more than access… • Traditional portal (e.g., Yahoo!) • linkage with limited responsibility • Hybrid Portal • Asserting some semblance of curatorial role over linked resources • Providing a rich fabric of services across those resources

  4. Interoperability standards enable service creation… • Search and discovery • Z39.50 • Metadata vocabularies and syntax • MARC • Dublin Core • XML/RDF • Object models • METS • FEDORA

  5. Interoperability Trade-offs (chart; axes: Cost, Functionality; plotted labels: Z39.50, SGML, Metadata Harvesting, Dublin Core, HTTP, Google)

  6. Yes, it's about resource discovery over distributed collections • Metadata: Author, Title, Abstract, Identifier

  7. Facilitating/Monitoring Longevity of Distributed Content • Preservation Service

  8. Personalization of Content • View A: • View slides • View video • View synchronized presentation using applet • View B: • Get transcript of audio • Search for keyword • Get slides translated to French (diagram labels: Portal A, Portal B, Tool Repository, Digital Object, structural metadata, PowerPoint presentation, SMIL synchronization metadata, RealAudio video)

  9. Cross-Repository Reference Linking (diagram labels: citation metadata, repeated for five sources; Linkage Service)

  10. Origins of the OAI • Increasing interest in alternative scholarly publishing solutions – e.g., LANL arXiv • Increasing impact through federation • UPS Mtg., Santa Fe, October 1999 • Representatives of various ePrint, library, and publishing communities • Goal: definition of an interoperability framework among ePrint providers • Result: Santa Fe Convention, interoperability through metadata harvesting

  11. “Open” Archives • Political Agenda? • Author self-archiving of E-Prints • “Mission” to reformulate scholarly publishing framework • Technical? • Infrastructure to facilitate interoperability across multiple domains

  12. Technical Umbrella for Practical Interoperability… that can be exploited by different communities • Metadata Harvesting • Reference Libraries • Museums • Publishers • E-Print Archives

  13. OAI Technical Infrastructure: key technical features • Deploy-now technology – 80/20 rule • Two-party model – providers (data providers) and consumers (service providers) • Simple HTTP encoding • XML schema for some degree of protocol conformance • Extensibility • Multiple item-level metadata • Collection-level metadata

  14. The World According to OAI (diagram labels: Service Providers, with Discovery, Current Awareness, Preservation; Metadata harvesting; Data Providers)

  15. Content and Metadata (diagram labels: item (metadata), repository, resource, record)

  16. 20020307

  17. OAI-PMH History • Version 1.0 – January 21, 2001 • Version 1.1 – July 2, 2001 • W3C XML schema changes • Version 2.0a – March 1, 2002 • Production release – June 3, 2002 • No major functionality changes • Numerous functional tweaks • Harvesting granularity, flow control, error handling

  18. Key Features of the OAI Metadata Harvesting Protocol • Definitions & concepts: repository, record, identifier, datestamp, set • Protocol features: HTTP encoding, metadata prefix & schema, flow control • Protocol requests: supporting requests, harvesting requests

  19. Repository (diagram labels: harvester, OAI protocol, repository, items, support data, harvesting data)

  20. Record (callouts: protocol support, format-specific metadata, community-specific record data)
     <record>
       <header>
         <identifier>oai:eg:001</identifier>
         <datestamp>1999-01-01</datestamp>
       </header>
       <metadata>
         <dc xmlns="http://purl.org/dc">
           <title>My Example</title>
         </dc>
       </metadata>
       <about>
         <ea xmlns="http://www.arXiv.org/ea">
           <usage>No restrictions</usage>
         </ea>
       </about>
     </record>
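As a rough illustration of the three parts of a record, the sketch below parses the slide's example with Python's standard `xml.etree.ElementTree`. The function name `split_record` is hypothetical, and the sketch assumes the `<record>` element carries no namespace of its own, as on the slide; inside a real protocol response it would normally sit in the protocol's namespace.

```python
import xml.etree.ElementTree as ET

# The example record from the slide, with plain quotes.
RECORD_XML = """
<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>
"""

# Namespace URIs exactly as they appear in the slide's example.
DC = "{http://purl.org/dc}"
EA = "{http://www.arXiv.org/ea}"


def split_record(xml_text):
    """Return the header, metadata, and about values of one record (hypothetical helper)."""
    record = ET.fromstring(xml_text.strip())
    header = record.find("header")
    return {
        # header: what the protocol itself needs for harvesting
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        # metadata: format-specific payload (here, a Dublin Core title)
        "title": record.findtext(f"metadata/{DC}dc/{DC}title"),
        # about: community-specific data about the record
        "usage": record.findtext(f"about/{EA}ea/{EA}usage"),
    }


if __name__ == "__main__":
    print(split_record(RECORD_XML))
```

The header fields are the only ones a harvester can rely on across repositories; the contents of `metadata` and `about` depend on the format and community in use.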

  21. Identifiers • Locally unique key for extracting a record from a repository • oai-identifier = oai:archive-identifier:record-identifier • oai: registered URI scheme • Archive identifier: registered within OAI • Record identifier: unique ID within archive (syntax is archive-specific) • Example: oai:ncstrl:ncstrl.cornellcs/TR94-1418
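The three-part identifier syntax can be sketched as a small parser. This is only an illustration under the syntax stated on the slide; the names `OaiIdentifier` and `parse_oai_identifier` are hypothetical. Splitting at most twice leaves the archive-specific record identifier untouched, whatever characters it contains.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str   # always "oai"
    archive: str  # archive identifier, registered within OAI
    record: str   # unique ID within the archive; syntax is archive-specific


def parse_oai_identifier(value):
    """Split 'oai:archive-identifier:record-identifier' into its parts."""
    # Split on the first two colons only: the record part may itself
    # contain colons, slashes, or dots.
    parts = value.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {value!r}")
    return OaiIdentifier(*parts)


if __name__ == "__main__":
    print(parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418"))
```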

  22. Selective harvesting: datestamps • Harvest within date range (diagram labels: repository, records)
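Date-range selection can be pictured as a filter over record datestamps. The sketch below shows the repository side of that idea; the function names are hypothetical, day-level datestamps are assumed, and the bounds are treated as inclusive, which is an assumption of this sketch rather than something the slide states.

```python
from datetime import date


def within_range(datestamp, from_=None, until=None):
    """True if an ISO day datestamp lies in [from_, until]; a missing bound is open."""
    day = date.fromisoformat(datestamp)
    if from_ is not None and day < date.fromisoformat(from_):
        return False
    if until is not None and day > date.fromisoformat(until):
        return False
    return True


def select_by_date(records, from_=None, until=None):
    """Keep the (identifier, datestamp) pairs whose datestamp falls in the range."""
    return [r for r in records if within_range(r[1], from_, until)]


if __name__ == "__main__":
    # Made-up identifiers and dates, for illustration only.
    records = [
        ("oai:eg:001", "1999-01-01"),
        ("oai:eg:002", "2001-07-02"),
        ("oai:eg:003", "2002-03-01"),
    ]
    print(select_by_date(records, from_="2001-01-01", until="2001-12-31"))
```

A harvester that remembers the date of its last visit can pass it as the lower bound and fetch only records added or changed since then.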

  23. Selective harvesting: sets • Harvest within set (diagram labels: repository, records, sets S1 and S2)

  24. set specifics • repositories define hierarchical organization • each item in a repository may be organized in one set, several sets, or no sets at all • meaning of sets or of set hierarchy is not defined in protocol • individual communities may formulate common set configurations
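Set membership with a hierarchy can be sketched as prefix matching on set specifications. This sketch assumes colon-separated set specifications (for example `physics:hep`) and assumes that harvesting a parent set also returns items in its descendant sets; both are assumptions of the example, and the function name and set names are hypothetical.

```python
def in_set(item_set_specs, requested):
    """True if an item belongs to the requested set or to one of its descendants."""
    return any(
        spec == requested or spec.startswith(requested + ":")
        for spec in item_set_specs
    )


if __name__ == "__main__":
    # An item may be in one set, several sets, or no sets at all.
    items = {
        "oai:eg:001": ["physics:hep"],
        "oai:eg:002": ["physics", "math"],
        "oai:eg:003": [],
    }
    print([ident for ident, specs in items.items() if in_set(specs, "physics")])
```

The protocol attaches no meaning to the names themselves, so a harvester can match on them but cannot infer what `physics` covers without a community agreement.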

  25. HTTP encoding: requests • BASE-URL: an.oa.org/OAI-script • Keyword arguments: verb=ListIdentifiers&set=S1 • GET: http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1 • POST:
     POST http://an.oa.org/OAI-script HTTP/1.0
     Content-Length: 27
     Content-Type: application/x-www-form-urlencoded

     verb=ListIdentifiers&set=S1
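Both encodings carry the same keyword arguments, which the following sketch builds with the standard library's `urllib.parse.urlencode`. The helper names `build_get_url` and `build_post_body` are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_get_url(base_url, verb, **arguments):
    """Encode a protocol request as a GET URL: BASE-URL?verb=...&key=value."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def build_post_body(verb, **arguments):
    """Encode the same arguments as an application/x-www-form-urlencoded body."""
    return urlencode({"verb": verb, **arguments}).encode("ascii")


if __name__ == "__main__":
    base = "http://an.oa.org/OAI-script"
    print(build_get_url(base, "ListIdentifiers", set="S1"))
    body = build_post_body("ListIdentifiers", set="S1")
    # The Content-Length header of a POST is the byte length of this body.
    print(len(body), body)
```

`urlencode` also percent-escapes reserved characters, so an identifier such as `oai:arXiv:0001` is sent as `oai%3AarXiv%3A0001`.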

  26. HTTP encoding: responses (callouts: xml namespaces, response header, response data)
     <?xml version="1.0" encoding="UTF-8" ?>
     <GetRecord xmlns="http://oai.namespace.uri"
         xmlns:xsi="http://w3.namespace.uri"
         xsi:schemaLocation="http://oai.namespace.uri http://oai.schemaURL">
       <responseDate>2000-19-01T19:30:30-04:00</responseDate>
       <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
       <record>record contents</record>
       additional records
     </GetRecord>

  27. metadata prefix and schema • support for harvesting multiple metadata formats • metadata schema: each format must have a validating XML schema at a publicly accessible URL (communities may define shared formats and schemas) • metadata prefix: each repository maps a prefix to each schema it supports; the prefix is used in protocol requests • support for unqualified Dublin Core is mandatory • DC OAI record syntax that builds on base DCMI schema • reserved prefix: oai_dc

  28. Flow control (diagram labels: harvester, protocol request, repository)

  29. flow control specifics • applies to all protocol requests that return lists: ListRecords, ListIdentifiers, ListSets • resumptionToken is opaque • semantics of partitioning of responses within resumption requests is undefined
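From the harvester's side, flow control amounts to a loop that keeps re-issuing the request with whatever token the repository returned last. The sketch below assumes a hypothetical `fetch_page` callable that returns a list of records plus the next resumptionToken (or `None` when the list is complete); the fake repository stands in for a real HTTP exchange, and its choice of an offset as the token is purely an internal detail.

```python
def harvest(fetch_page, verb, **arguments):
    """Yield every record of a list request, following resumption tokens."""
    records, token = fetch_page(verb=verb, **arguments)
    yield from records
    while token:
        # The token is opaque: pass it back unchanged and never parse it.
        records, token = fetch_page(verb=verb, resumptionToken=token)
        yield from records


def make_fake_repository(all_records, page_size):
    """Hypothetical stand-in for a repository that returns fixed-size pages."""
    def fetch_page(verb, resumptionToken=None, **arguments):
        # This repository happens to encode an offset in its token;
        # a harvester must not rely on that.
        start = int(resumptionToken) if resumptionToken else 0
        end = start + page_size
        next_token = str(end) if end < len(all_records) else None
        return all_records[start:end], next_token
    return fetch_page


if __name__ == "__main__":
    repo = make_fake_repository([f"oai:eg:{n:03d}" for n in range(1, 6)], page_size=2)
    print(list(harvest(repo, "ListIdentifiers", set="S1")))
```

Because the partitioning is undefined, the harvester makes no assumption about page size or ordering; it simply stops when no token comes back.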

  30. Extensibility Feature Summary • Multiple metadata formats • Collection level metadata • Identify “about” container • Record data • Terms and conditions • Provenance • Set structure • Pre-configured “queries”

  31. OAI Protocol (harvester: service provider; repository: data provider) • Supporting protocol requests: • Identify • ListMetadataFormats • ListSets • Harvesting protocol requests: • ListRecords • ListIdentifiers • GetRecord

  32. Supporting Protocol Requests: Identify • Repository name • Base-URL • Admin e-mail • OAI protocol version • Description container

  33. Supporting Protocol Requests: ListMetadataFormats • REPEAT • Format prefix • Format XML schema • /REPEAT

  34. Supporting Protocol Requests: ListSets • REPEAT • Set specification • Set name • /REPEAT

  35. Harvesting Protocol Requests: ListRecords • Arguments: from=a, until=b, set=klm, metadataPrefix=oai_dc • REPEAT • Identifier • Datestamp • Metadata • About container • /REPEAT

  36. Harvesting Protocol Requests: ListIdentifiers • Arguments: from=a, until=b, set=klm • REPEAT • Identifier • Datestamp • /REPEAT

  37. Harvesting Protocol Requests: GetRecord • Arguments: identifier=oai:mlib:123a, metadataPrefix=oai_dc • Identifier • Datestamp • Metadata • About

  38. 20020307

  39. 20020307

  40. Measures of Success • >100 implementers of the protocol • 64 registered • Basis for much research and implementation • JCDL 2002 • A subject category for paper submission! • Numerous papers building on OAI • Research Projects and Funding

  41. Externally funded initiatives • European Community • Open Archives Forum • Cyclades Project • Andrew W. Mellon Foundation • Funding for 7 service providers • Digital Library Federation • Gateways for access to members' digital collections • National Science Foundation • National Science Foundation Core Infrastructure

  42. DP9 Architecture • Giving search engines access to the “deep web”

  43. 20020307

  44. 20020307

  45. NSDL (National Digital Library for Science, Mathematics, and Engineering) • Large-scale digital library technology • 1,000,000 users • 10,000,000 items • 100,000 collections • Diverse participants • Libraries • Academic/research institutions • Individuals

  46. NSDL References • http://comm.nsdlib.org/ • Zia, L., Growing a National Learning Environments and Resources Network for Science, Mathematics, Engineering, and Technology Education, D-Lib, March 2001 • Arms, W. et al., A Spectrum of Interoperability: The Site for Science Prototype for the NSDL, D-Lib, January 2002 • Lagoze, C. et al., Core Services in the Architecture of the NSDL, JCDL 2002, July 2002

  47. The Challenge • Provide coherent services for users across diverse collections, while retaining the individuality and richness of the collections.

  48. The strategy • A Spectrum of Interoperability • Open framework for collections & services • Embrace collections with rich metadata & support for standards; accommodate collections with limited metadata & limited support for interoperability • Technical basis • Follow library tradition of metadata sharing • Use automated methods to generate, normalize, & translate metadata • Distribute metadata to service providers

  49. The Metadata Repository • The metadata repository is a resource for service providers. It holds information about every collection and item known to the NSDL. (diagram labels: Users, Services, Metadata repository, Collections)

  50. Metadata repository (diagram labels: MR Ingest and Exposure, MR Front Porch, OAI-PMH, normalization, generation, cross-walking, gathering, direct entry)
