The Basics of OAI : An Introduction to the Protocol for Metadata Harvesting

Timothy W. Cole and Sarah Shreeves, University of Illinois at Urbana-Champaign; Martin Halbert, Emory University. Pre-Conference Workshop, Web-Wise 2004: Sharing Digital Resources, Chicago, IL, March 3, 2004.

Presentation Transcript


  1. The Basics of OAI : An Introduction to the Protocol for Metadata Harvesting Timothy W. Cole and Sarah Shreeves University of Illinois at Urbana-Champaign Martin Halbert Emory University Pre-Conference Workshop Web-Wise 2004: Sharing Digital Resources Chicago, IL - March 3, 2004

  2. Outline • Introductions and Why We’re Here • The Open Archives Initiative Protocol for Metadata Harvesting • OAI-PMH Implementation Guidelines • Metadata Authoring for OAI & Interoperability – Experiences from OAI Service Providers

  3. Introductions • Presenters: • Tim Cole (t-cole3@uiuc.edu) • Sarah Shreeves (sshreeve@uiuc.edu) http://imlsdcc.grainger.uiuc.edu/ • Martin Halbert (mhalber@emory.edu) http://www.metascholar.org/

  4. Digital Collections vs. Digital Libraries • Building Good Digital Collections • The IMLS / NISO Framework (http://www.imls.gov/pubs/forumframework.htm) Focuses on Process of Creating Digital Content • Implicit Assumption: Digital Collections are the Raw Materials on which Digital Library Services are Built • Priority on Reusability, Persistence, Sustainability, Interoperability, Verification, and Documentation • OAI-PMH Enables Value-Added Digital Library Services which use Harvested Metadata

  5. IMLS DCC Project Foundation • Implements Recommendations of the IMLS Digital Library Forum & Framework of Guidance for Building Good Digital Collections • Recommended Creation of IMLS NLG Collection Registry • Recommended Encouraging IMLS Projects to Author Metadata for Interoperability and Implement OAI-PMH • Increase access to and visibility of IMLS-funded digital collections • Build infrastructure for a digital library out of many digital collections

  6. IMLS Digital Collections and Content • Build a registry of all National Leadership Grant collections with digital content. • Assist and guide NLG projects in making item-level metadata sharable via the OAI Protocol for Metadata Harvesting. • Build a repository and search and discovery tools for integrated access to the content of NLG collections. • Research best practices for sharing metadata about diverse digital content and for supporting the interests of diverse user communities.

  7. Motivation to Consider OAI-PMH • Access to / Sharing of Your Content • Visibility for Your Content • Opportunity to Participate in IMLS DCC Project • Opportunity to Gain Experience / Prepare for Future Projects

  8. Who uses OAI? • Approximately 400 data providers • Basic building block of the National Science Digital Library (NSDL) • Incorporated into D-Space and Eprints.org • Part of ContentDM, Michigan’s DLXS, and other products • International use: Open Archives Forum in Europe; OAI will be part of federation activities in the UK and EU

  9. The Open Archives Initiative Protocol for Metadata Harvesting (www.openarchives.org)

  10. OAI-PMH is a tool • The protocol refers to the set of rules that defines the communication between systems (like FTP and HTTP) • All about moving metadata (not data) around • Assumes widely distributed content, but centralized indexing & services • Build once, use for many applications – a building block for digital library services • The purpose of OAI is to foster interoperability

  11. OAI is not… • Metadata • A search tool • A database

  12. Brief History of OAI • Originated in the e-print archive community • Creation of interoperability tools for use between archives of e-prints • Santa Fe Meetings - 1999 and 2000 • Paul Ginsparg, Rick Luce, & Herbert Van de Sompel initiators • OAI-PMH version history: • First Alpha Release, Sept. 2000 • 1.0 (Beta) Release January 2001 • 1.1 (Beta 2) Release July 2001 • 2.0 (Production) Release June 2002

  13. Some Basic OAI-PMH Concepts • “Federated search” rather than “Broadcast search” • Data providers – support OAI-PMH as a means to expose metadata • Service providers – harvest metadata from data providers via OAI-PMH • OAI-PMH is based upon HTTP and XML • OAI-PMH requires use of simple Dublin Core, BUT supports and encourages use of other metadata schemas

  14. Federated vs. Distributed • Distributed/Broadcast searching: search and discovery run over remote services and data • Federated/Harvesting: data/metadata is transferred from the remote source to the destination where the services are located (e.g. union catalogs) • Competing – but not incompatible – approaches to interoperability

  15. As Compared to Z39.50

  16. Why Use OAI? • Content is widely distributed, in different kinds of non-Z39.50 enabled locations • Metadata provider is more lightweight than Z39.50 and scales well • Service provider wishes to augment search services, or metadata normalization is needed • Data Providers can use both Z39.50 & OAI

  17. How OAI Works • 6 distinct ‘verbs’ or requests • OAI requests are sent via HTTP • Responses are sent in valid XML • [Diagram: the Service Provider (harvester) sends an HTTP request (OAI verb) to the Data Provider (repository, database), which returns an HTTP response (valid XML)]

  18. How OAI Works: OAI “VERBS” • Identify • ListMetadataFormats • ListSets • ListIdentifiers • ListRecords • GetRecord
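
Because every OAI-PMH request is an HTTP request carrying a `verb` argument, building one amounts to assembling a query string. The sketch below is a minimal illustration in Python, not part of the workshop materials; the helper name `build_request_url` is an assumption made for this example, and the base URL is the one used in the sample requests in this deck.

```python
from urllib.parse import urlencode

# The six request types ("verbs") listed on the slide above.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **params):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # Start with the verb, then add only the arguments that were supplied.
    query = {"verb": verb}
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


# Base URL taken from the sample requests in this workshop.
BASE_URL = "http://aerialphotos.grainger.uiuc.edu/oai.asp"
print(build_request_url(BASE_URL, "Identify"))
print(build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Rejecting unknown verbs on the client side is a design choice for the sketch; a repository would also answer such a request with an error response.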

  19. Identify
      • Purpose: Return general information about the archive and its policies (e.g., datestamp granularity)
      • Parameters: None
      • Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify
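
As a rough illustration of what a harvester does with an Identify response, the sketch below reads a few policy fields with Python's standard XML parser. The sample response is hypothetical and abbreviated (the repository name, URL, and values are invented for this example); the element names and namespace follow OAI-PMH 2.0. `parse_identify` is a name chosen here, not an existing API.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, abbreviated Identify response; all values are invented.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-03-03T12:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return selected repository policies from an Identify response."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an Identify response")
    fields = ("repositoryName", "baseURL", "protocolVersion",
              "earliestDatestamp", "deletedRecord", "granularity")
    # findtext returns None for any field the repository left out.
    return {name: identify.findtext(f"oai:{name}", namespaces=OAI_NS)
            for name in fields}


info = parse_identify(SAMPLE_IDENTIFY)
print(info["repositoryName"], info["granularity"])
```

A harvester would typically consult `granularity` before sending `from` and `until` arguments, so that it does not ask for finer datestamps than the repository supports.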

  20. ListSets
      • Purpose: Provide a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)
      • Parameters: None
      • Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets

  21. ListMetadataFormats
      • Purpose: List metadata formats supported by the archive as well as their schema locations and namespaces
      • Parameters (O = optional):
          • identifier – for a specific record (O)
      • Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats

  22. ListIdentifiers
      • Purpose: List headers for all items corresponding to the specified parameters
      • Parameters (R = required, O = optional, X = exclusive):
          • from – start date (O)
          • until – end date (O)
          • set – set to harvest from (O)
          • metadataPrefix – metadata format to list identifiers for (R)
          • resumptionToken – flow control mechanism (X)
      • Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc

  23. GetRecord
      • Purpose: Returns the metadata for a single item in the form of an OAI record
      • Parameters (R = required):
          • identifier – unique id for item (R)
          • metadataPrefix – metadata format for the record (R)
      • Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc

  24. ListRecords
      • Purpose: Retrieves metadata records for multiple items
      • Parameters (R = required, O = optional, X = exclusive):
          • from – start date (O)
          • until – end date (O)
          • set – set to harvest from (O)
          • resumptionToken – flow control mechanism (X)
          • metadataPrefix – metadata format (R)
      • Sample URL: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc
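
The (R), (O), and (X) markers on the ListIdentifiers and ListRecords slides translate into a simple rule: `metadataPrefix` is required on the first request, and a `resumptionToken`, when present, must be the only argument besides the verb. The following Python sketch shows one way a harvester might enforce that before sending a request; the function and parameter names are assumptions for illustration only.

```python
def list_request_args(metadata_prefix=None, from_=None, until=None,
                      set_spec=None, resumption_token=None):
    """Return validated query arguments for ListIdentifiers or ListRecords."""
    selective = {
        "metadataPrefix": metadata_prefix,
        "from": from_,
        "until": until,
        "set": set_spec,
    }
    supplied = {k: v for k, v in selective.items() if v is not None}

    if resumption_token is not None:
        # (X) exclusive: no other argument may accompany a resumptionToken.
        if supplied:
            raise ValueError(
                "resumptionToken cannot be combined with: " + ", ".join(supplied)
            )
        return {"resumptionToken": resumption_token}

    # (R) required whenever no resumptionToken is being used.
    if "metadataPrefix" not in supplied:
        raise ValueError("metadataPrefix is required")
    return supplied


print(list_request_args(metadata_prefix="oai_dc", from_="2004-01-01"))
print(list_request_args(resumption_token="abc123"))
```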

  25. Unique Identifiers • Each OAI item must have a unique identifier • Identifiers must follow rules for valid URIs • Example: oai:<archiveId>:<recordId>, e.g. oai:etd.vt.edu:etd-1234567890 • Each identifier must resolve to a single item and always to the same item • Can’t reuse OAI item identifiers
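
The `oai:<archiveId>:<recordId>` pattern shown above can be taken apart with a single split. This is a minimal Python sketch that assumes the repository follows that pattern; the protocol itself only requires a valid URI, so a harvester should not assume every identifier looks like this. `split_oai_identifier` is a name invented for the example.

```python
def split_oai_identifier(identifier):
    """Split 'oai:<archiveId>:<recordId>' into (archive_id, record_id)."""
    # Split on the first two colons only, since the record part may
    # itself contain colons or slashes.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not in oai:<archiveId>:<recordId> form: {identifier!r}")
    return parts[1], parts[2]


print(split_oai_identifier("oai:etd.vt.edu:etd-1234567890"))
print(split_oai_identifier("oai:arXiv:cs/0112017"))
```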

  26. Datestamps • Needed for every OAI record to support incremental harvesting • Must be updated whenever a record is added, modified, or deleted, so that changes are correctly propagated to harvesters • Different from dates within the metadata – OAI datestamp is used only for harvesting • Can be either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (must be in UTC, i.e. GMT)
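
The two datestamp forms above can be checked and parsed with the standard library. The sketch below is one possible approach in Python, not a reference implementation; `parse_datestamp` is a hypothetical helper, and it deliberately rejects anything that is not exactly one of the two forms.

```python
import re
from datetime import datetime, timezone

# (granularity name, strict pattern, strptime format) for the two forms.
_FORMS = (
    ("YYYY-MM-DD",
     re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
     "%Y-%m-%d"),
    ("YYYY-MM-DDThh:mm:ssZ",
     re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"),
     "%Y-%m-%dT%H:%M:%SZ"),
)


def parse_datestamp(value):
    """Return (UTC datetime, granularity name) for an OAI datestamp."""
    for granularity, pattern, fmt in _FORMS:
        if not pattern.fullmatch(value):
            continue
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            # Right shape but impossible date or time, e.g. February 30.
            break
        # Datestamps are always expressed in UTC.
        return parsed.replace(tzinfo=timezone.utc), granularity
    raise ValueError(f"not a valid OAI-PMH datestamp: {value!r}")


print(parse_datestamp("2002-02-28"))
print(parse_datestamp("2002-02-28T09:45:18Z"))
```

Values such as `20011107162451`, which appear inside the metadata of one sample record later in this deck, are metadata dates rather than OAI datestamps and would be rejected here.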

  27. OAI Items vs. OAI Records • An OAI ITEM is the complete set of metadata you possess describing an object in your repository • Items exist only in the OAI Metadata Provider’s database • An OAI RECORD is an OAI Item disseminated in a particular metadata format – e.g., DC or MARC • Records are what get harvested by OAI Service Providers • OAI IDENTIFIERS are Item-Level • OAI DATESTAMPS are Record-Level

  28. An OAI Record
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2002-02-28</datestamp>
        <setSpec>cs</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns…>
          <dc:title>Using Structural Metadata…</dc:title>
          …
        </oai_dc:dc>
      </metadata>
      <about>
        <provenance xmlns…>
          …
        </provenance>
      </about>
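
To show how a service provider might read a record like the one on this slide, here is a minimal Python sketch using the standard library parser. The slide elides the namespace declarations and most of the metadata, so the sample below is an assumption-laden reconstruction: it adds the enclosing `<record>` element and the standard OAI-PMH 2.0, oai_dc, and Dublin Core namespaces, keeps the truncated title exactly as shown, and omits the `<about>` section. `parse_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Reconstruction of the slide's record; namespaces are filled in, and the
# title stays truncated because the slide does not give it in full.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:arXiv:cs/0112017</identifier>
    <datestamp>2002-02-28</datestamp>
    <setSpec>cs</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Using Structural Metadata…</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Return the header fields and Dublin Core titles of one OAI record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    if header is None:
        raise ValueError("record has no header")
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # A record may belong to several sets, so collect all setSpec values.
        "sets": [e.text for e in header.findall("oai:setSpec", NS)],
        "titles": [e.text for e in
                   record.findall("oai:metadata/oai_dc:dc/dc:title", NS)],
    }


print(parse_record(SAMPLE_RECORD))
```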

  29. Other Pieces of OAI • Flow Control • Sets • Multiple metadata schemas
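
Flow control is the piece that most affects harvester code: a repository may return a long list in chunks, each ending in a `resumptionToken` that the harvester sends back to get the next chunk. The Python sketch below illustrates that loop under stated assumptions. The `fetch` callable is supplied by the caller, and `fake_fetch` with its `example.org` identifiers is invented so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every item identifier, following resumptionTokens.

    `fetch` takes a dict of query arguments and returns the response
    body as XML text.
    """
    args = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        args = {"verb": "ListIdentifiers",
                "resumptionToken": token.text.strip()}


def _page(identifiers, token):
    """Build one invented ListIdentifiers response chunk."""
    headers = "".join(
        f"<header><identifier>{i}</identifier>"
        "<datestamp>2004-03-03</datestamp></header>"
        for i in identifiers
    )
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
            f"<ListIdentifiers>{headers}"
            f"<resumptionToken>{token}</resumptionToken>"
            "</ListIdentifiers></OAI-PMH>")


def fake_fetch(args):
    """Stand-in for an HTTP call: serves two chunks of invented data."""
    if args.get("resumptionToken") == "page2":
        return _page(["oai:example.org:3"], "")
    return _page(["oai:example.org:1", "oai:example.org:2"], "page2")


print(list(harvest_identifiers(fake_fetch)))
```

In a real harvester, `fetch` would issue the HTTP request against the repository's base URL; keeping it as a parameter lets the loop be exercised offline.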

  30. Break – 15 minutes

  31. Implementing OAI-PMH • Technical Approaches • Resources for OAI Metadata Providers • OAI Implementation Guidelines

  32. Option 1 – Database Based System • Good option for collections: • Actively adding metadata to their collection • With a large collection of metadata (over 5000 records) • Requirements: • Metadata • Database application (e.g. MySQL, Oracle, MS Access, MS SQL) • Web server with CGI capability (e.g. Apache/Tomcat, MS IIS) • Validating, transforming XML parser (e.g. Xerces, Sun’s JavaXMLPack, MSXML)

  33. Option 2 – File Based System • Good option for collections: • Actively adding metadata to their collection • With a large collection of metadata (over 5000 records) • Requirements: • Metadata in XML or available for IMLS DCC to put into XML • Web server with CGI capability (e.g. Apache/Tomcat, MS IIS) • Validating, transforming XML parser (e.g. Xerces, Sun’s JavaXMLPack, MSXML)

  34. Option 3 – Static Repository • Good option for collections: • No longer adding metadata to their collection • With small collections (fewer than 5000 records) • Requirements: • Metadata in XML. (IMLS DCC will help with conversions.) • Available space on a web server for posting static XML files

  35. Open Source OAI Tools • Open Archives Initiative Tools: http://www.openarchives.org/tools/tools.html • University of Illinois OAI Tools: http://uilib-oai.sourceforge.net/ • OAI tools on Sourceforge: http://www.sourceforge.net and search for OAI in the Software/Groups category

  36. Commercial and open source turnkey solutions • ContentDM: http://contentdm.com/ • Univ. of Michigan DLXS XPat: http://www.dlxs.org/ • D-Space: http://www.dspace.org/ • Endeavor Encompass (forthcoming): http://encompass.endinfosys.com/

  37. Resources for data providers • OAI for beginners tutorial: http://www.oaforum.org/tutorial/ • Repository Explorer: http://purl.org/net/oai_explorer • XML Schema Validator: http://www.w3.org/2001/03/webdata/xsv • XML Tools at W3C: http://www.w3.org/XML/#software

  38. Registering Your OAI Provider • Register with the Official OAI Registry: http://www.openarchives.org/data/registerasprovider.html • The UIUC Experimental OAI Registry: http://gita.grainger.uiuc.edu/registry/ • Test Before You Register: • Repository Explorer @ Virginia Tech • Email us (sshreeve@uiuc.edu) for a Test Harvest

  39. OAI Implementation Guidelines (http://www.openarchives.org/OAI/2.0/guidelines.htm) • Includes: • Guidelines for Repository Implementers • Guidelines for Harvester Implementers • Guidelines for Aggregators, Caches and Proxies • Specification for an OAI Static Repository… • Community-Specific Guidelines (OLAC, EPrints)

  40. Metadata Authoring for OAI • Lessons Learned from Metascholar projects at Emory • Lessons learned from UIUC’s initial OAI harvesting project

  41. UIUC – Lessons Learned: Metadata aggregation challenges • Heterogeneous resources from multiple communities • Element usage practices • Granularity of description • Diverse vocabularies

  42. UIUC – Lessons Learned: Challenge: Heterogeneity of content & providers • Metadata describing digital and analog items – including images, texts, web pages, physical objects, finding aids, etc. • Knowledge structures – ontologies different • Perspectives on use and presentation of digital resources different

  43. UIUC – Lessons Learned: Challenge: Variations in use of Dublin Core

  44. UIUC – Lessons Learned: Excerpt of Metadata Record Describing "Cotton coverlet with embroidered butterfly design"
      Description: Digital image of a single-sized cotton coverlet for a bed with embroidered butterfly design. Handmade by Anna F. Ginsberg Hayutin.
      Source: Materials: cotton and embroidery floss. Dimensions: 71 in. x 86 in. Markings: top right hand corner has 1 1/2 in. x 1/2 in. label cut outs at upper left and right hand side for head board; fabric is woven in a variation of a rib weave; color each of yellow and gray; hand-embroidered cotton butterflies and flowers from two shades of each color of embroidery floss - blue, pink, green and purple and single top 20 in. bordered with blue and black cotton embroidery thread; stitches used for embroidery: running stitch, chain stitch, French knot and back stitches; selvage edges left unfinished; lower edges turned under and finished with large gray running stitches made with embroidery floss.
      Format: Epson Expression 836 XL Scanner with Adobe Photoshop version 5.5; 300 dpi; 21-53K bytes. Available via the World Wide Web.
      Coverage: (empty)
      Date: Created: 2001-09-19 09:45:18; Updated: 20011107162451; Created: 2001-04-05; Created: 1912-1920?
      Type: Image

  45. UIUC – Lessons Learned: Excerpt of Metadata Record Describing “American Woven Coverlet”
      Description: Materials: Textile--Multi, Pigment--Dye; Manufacturing Process: Weaving--Hand, Spinning, Dyeing, Hand-loomed blue wool and white linen coverlet, worked in overshot weave in plain geometric variant of a checkerboard pattern. Coverlet is constructed from finely spun, indigo-dyed wool and undyed linen, woven with considerable skill. Although the pattern is simpler, the overall craftsmanship is higher than 1934.01.0094A. - D. Schrishuhn, 11/19/99 This coverlet is an example of early "overshot" weaving construction, probably dating to the 1820's and is not attributable to any particular weaver. -- Georgette Meredith, 10/9/1973
      Source: (empty)
      Format: 228 x 169 x 1.2 cm (1,629 g)
      Coverage: Euro-American; America, North; United States; Indiana? Illinois?
      Date: Early 19th c. CE
      Type: cultural; physical object; original

  46. UIUC – Lessons Learned: Challenge: Range of vocabularies in use • Controlled Vocabularies in use for IMLS NLG projects (results from survey of 65 NLG projects with digital content)

  47. Meeting the challenge – Data Providers • Data providers can: • Create metadata for interoperability • Reusable metadata - Think beyond your local users and environment • Use well structured and defined schemas • Use and identify controlled vocabularies • Use Sets

  48. Meeting the challenge – Service Providers • Service providers can: • Analyze metadata and cluster and normalize some aspects • Build indexes based on type of resource (image, text, physical object) rather than collection • Build custom interfaces and selective views for target audiences / domains
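
As one small illustration of the kind of normalization a service provider might attempt, the Python sketch below maps free-text Type values, such as those in the two coverlet records earlier in this deck, onto terms from the DCMI Type Vocabulary so that records can be indexed by type of resource. This is an assumption about how such a step could look, not a description of the UIUC or Emory implementations; `normalize_types` is a hypothetical helper, and values with no matching term are simply dropped.

```python
# Terms of the DCMI Type Vocabulary.
DCMI_TYPES = (
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
)

# Lookup keyed by the lowercase term, e.g. "physicalobject".
_CANONICAL = {term.lower(): term for term in DCMI_TYPES}


def normalize_types(raw_value):
    """Return the DCMI Type terms found in a semicolon-separated value."""
    found = []
    for part in raw_value.split(";"):
        # Remove all whitespace and case so "physical object" matches
        # the vocabulary term "PhysicalObject".
        key = "".join(part.split()).lower()
        term = _CANONICAL.get(key)
        if term is not None and term not in found:
            found.append(term)
    return found


print(normalize_types("Image"))
print(normalize_types("cultural; physical object; original"))
```

An exact-match lookup like this only handles spelling and spacing differences; values such as "cultural" or "original" would need human review or a richer mapping.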

  49. Recap • OAI is a tool to facilitate interoperability • OAI is easy - metadata is hard • Better metadata = better interoperability

  50. Contact Information • Tim Cole, PI, IMLS Digital Collections and Content, University of Illinois Library at Urbana-Champaign, Email: t-cole3@uiuc.edu • Sarah Shreeves, Project Coordinator, IMLS Digital Collections and Content, University of Illinois Library at Urbana-Champaign, Email: sshreeve@uiuc.edu • Martin Halbert, Director for Library Systems, Emory University, Email: mhalber@emory.edu
