100 likes | 218 Views
This workshop presentation by Michael L. Nelson and Terry L. Harrison explores the implementation of OAI-PMH for resource exchange and metadata harvesting. Highlighting various approaches, it discusses the potential of using OAI-PMH beyond its conventional scope, including methods for resource extraction, such as Base64 encoding and the creation of separate metadata formats. The presentation also delves into the implications of metadata changes, the challenges of scalability, and the importance of machine-readable instructions for complex objects, ultimately emphasizing the need for innovative strategies in digital resource preservation.
E N D
Using OAI-PMH for Resource Exchange OAI Metadata Harvesting Workshop, JCDL 03 Michael L. Nelson, Terry L. Harrison Old Dominion University Norfolk VA {mln,tharriso}@cs.odu.edu
OAI-PRH? • using OAI-PMH for resource extraction / exchange • yes, OAI-PMH is for metadata not resources, but its going to happen anyway… • mirroring • preservation (archive “zipping”) • convergence with OAIS • assumptions • a digital resource • rsync et al. neither appropriate nor possible • defer metadata vs. data discussion
Possible Approaches • Exploit knowledge outside the scope of the OAI-PMH to extract the resource • Base64 encode the resource and transmit via OAI-PMH as a separate “metadata” prefix? • Separate metadata prefix with instructions on how to extract / scrape the resource • Separate metadata format with XML encoded metadata, along with XSLT to decode it
Out of Band Knowledge take url in dc:identifier parse report number append “reportnumber.pdf” to url direct pdf
Out of Band Knowledge • pros: tailored, no “accidental” harvesting • cons: not scalable wrt # of repositories & harvesters, false negatives assumption: change in metadata means a change in data -- not always true!
Base64 Encoding • define separate metadata formats • base64:application/pdf • base64:application/powerpoint • pros: describable with OAI-PMH semantics, accomplished with standard OAI-PMH tools • cons: heavyweight (could use compression), suitable for simple objects only, accidental harvesting would produce high loads for repositories and harvesters
Metadata as Instructions cf. http://genomebiology.com/2003/4/6/R40
Metadata as Instructions • the resource described in <dc:identifier> could be a complex object • may not be appropriate to: • “tar” the object into a single file • expose all constituent objects through OAI-PMH • define a metadata prefix that provides machine readable instructions on how extract the complex object • METS?
XSLT • if the resource is already XML encoded, include an XSLT to transform into the desired format • use separate metadata formats or even sets for the harvester to express their transformation preferences? • pros: elegant, limited work for repository • cons: assumes client-side transformation capability, applicable only for XML-encodable resources