
Open Archives Initiative – Protocol for Metadata Harvesting

Open Archives Initiative – Protocol for Metadata Harvesting. Iztok Kavkler, University of Ljubljana. Some slides by Stefaan Ternier, KUL; Bram Vandenputte, KUL; Joris Klerkx, KUL. What is OAI? A harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html


Presentation Transcript


  1. Open Archives Initiative – Protocol for Metadata Harvesting
     Iztok Kavkler, University of Ljubljana
     Some slides by Stefaan Ternier, KUL; Bram Vandenputte, KUL; Joris Klerkx, KUL

  2. What is OAI?
     • Harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html
     • Six service verbs
       • Identify
       • ListMetadataFormats
       • GetRecord
       • ListRecords
       • ListIdentifiers
       • ListSets
     • Allows multiple metadata formats
       • DC (Dublin Core) format is mandatory

  3. How OAI works
     • OAI “verbs”: Identify, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords, ListSets
     • The harvester (service provider) sends an HTTP request carrying an OAI verb to the repository (metadata provider)
     • The repository answers with an HTTP response containing valid XML

  4. Try it
     • Install Apache Tomcat or any other Java servlet container
     • Download the WAR file from http://fire.eun.org/Iztok/OAILREApp.war
     • Deploy the WAR
     • Open the demo HTML page at http://localhost:8080/OAILREApp/
     • Or type a service verb directly, e.g. http://localhost:8080/OAILREApp/oaiHandler?verb=Identify
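Typing a verb into the browser amounts to building a URL of the form base?verb=...&argument=value. A minimal Java sketch of that step follows, assuming the demo base URL from the slide. The class name OaiRequestUrl and its build method are hypothetical helpers, not part of OAICat; metadataPrefix and set are standard OAI-PMH request arguments.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of OAICat): builds OAI-PMH request URLs.
class OaiRequestUrl {

    // Appends the verb and the remaining arguments as a URL-encoded query string.
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder sb = new StringBuilder(baseUrl).append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            sb.append('&').append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] argv) {
        // Base URL of the demo web app from the slide.
        String base = "http://localhost:8080/OAILREApp/oaiHandler";
        System.out.println(build(base, "Identify", new LinkedHashMap<>()));

        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", "oai_dc");
        args.put("set", "exercises");
        System.out.println(build(base, "ListRecords", args));
    }
}
```

A LinkedHashMap keeps the arguments in insertion order, which makes the generated URLs predictable when comparing them by eye.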

  5. The raw XML
     • By default, the resulting XML has a stylesheet attached for pretty rendering
     • To remove the stylesheet, comment out the line
         OAIHandler.styleSheet=testoai/oaicat.xsl
       in the file oaicat.properties (in the WAR file or the web-app dir)

  6. OAI XML example
       <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
         <responseDate>2007-06-11T06:48:58Z</responseDate>
         <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
         <ListRecords>
           <record>
             <header>
               <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
               <datestamp>2007-06-09T22:38:28Z</datestamp>
               <setSpec>exercises</setSpec>
             </header>
             <metadata>
               <lom xmlns=...> ... </lom>
             </metadata>
           </record>
           ....
           <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
         </ListRecords>
       </OAI-PMH>
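On the harvester side, a response like this one is typically read with a namespace-aware XML parser. The sketch below uses only the JDK's DOM API to pull out the record identifiers and the resumption token. The class OaiResponseReader is a hypothetical name, and SAMPLE is a shortened copy of the slide's response with the metadata payload left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical harvester-side helper: reads identifiers and the resumption token.
class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Shortened copy of the slide's response; the <metadata> payload is omitted.
    static final String SAMPLE =
          "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
        + "<responseDate>2007-06-11T06:48:58Z</responseDate>"
        + "<request metadataPrefix=\"oai_lom\" verb=\"ListRecords\">"
        + "http://localhost:8080/OAILREApp/oaiHandler</request>"
        + "<ListRecords><record><header>"
        + "<identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>"
        + "<datestamp>2007-06-09T22:38:28Z</datestamp>"
        + "<setSpec>exercises</setSpec>"
        + "</header></record>"
        + "<resumptionToken expirationDate=\"2007-06-11T07:48:58Z\""
        + " completeListSize=\"42\" cursor=\"10\">1181544538265</resumptionToken>"
        + "</ListRecords></OAI-PMH>";

    // Parses the response text; parser errors are rethrown unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // needed for getElementsByTagNameNS
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable OAI response", e);
        }
    }

    // Collects every <identifier> element in the OAI namespace.
    static List<String> identifiers(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    // Returns the token, or null when the element is absent or empty (list complete).
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) return null;
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(identifiers(doc));
        System.out.println(resumptionToken(doc));
    }
}
```

Matching on the OAI namespace keeps identifier elements that belong to the embedded metadata format (LOM has its own) out of the result.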

  7. OAICat - a Java implementation
     • OAICat home at http://www.oclc.org/research/software/oai/cat.htm
     • Takes care of
       • web service details
       • OAI XML specification
     • The implementer has to provide three classes
       • RepositoryOAICatalog
       • RepositoryRecordFactory
       • Repository2oai_dc (lom, ...) - usually more than one

  8. A sample implementation (source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)
     • Create a new web module
     • Add the servlet oaiHandler to web.xml
         <servlet>
           <servlet-name>LreOAIHandler</servlet-name>
           <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
           <load-on-startup>5</load-on-startup>
         </servlet>
         <servlet-mapping>
           <servlet-name>LreOAIHandler</servlet-name>
           <url-pattern>/oaiHandler</url-pattern>
         </servlet-mapping>

  9. (cont)
     • Define the properties file location
         <context-param>
           <param-name>properties</param-name>
           <param-value>oaicat.properties</param-value>
         </context-param>
     • Welcome file for testing
         <welcome-file-list>
           <welcome-file>testoai/index.html</welcome-file>
         </welcome-file-list>

  10. Sample record
      • A record with the basic fields id, url, title, descr and date
      • SampleOAICatalog contains an array with 3 sample records

  11. SampleOAICatalog.listIdentifiers
      • Parameters
        • from – date to harvest from (String in ISO 8601 format)
          • date or datetime, depending on the repository's granularity
        • to – date to harvest to (sent as the until argument in an OAI-PMH request)
        • set – a set name; list only records from this set (if null, list all records)
          • set names classify objects in natural groups
          • every record may belong to multiple sets (or none)
        • metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
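The selection these parameters describe can be sketched as a plain filter over an in-memory list. This is only an illustration: Rec is a hypothetical stand-in for the slides' SampleRecord, select is not an OAICat method, and all datestamps and bounds are assumed to be UTC strings of the same granularity so that string comparison orders them correctly. Both bounds are treated as inclusive, as in OAI-PMH.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of selective harvesting over an in-memory list (hypothetical classes).
class SelectiveHarvest {

    // Hypothetical stand-in for the slides' SampleRecord.
    static final class Rec {
        final String id;
        final String datestamp;   // ISO 8601, UTC, e.g. 2007-06-09T22:38:28Z
        final List<String> sets;  // a record may belong to several sets or none

        Rec(String id, String datestamp, List<String> sets) {
            this.id = id;
            this.datestamp = datestamp;
            this.sets = sets;
        }
    }

    // Keeps records with from <= datestamp <= until that belong to the given set.
    // A null argument means "no restriction" for that criterion.
    static List<Rec> select(List<Rec> all, String from, String until, String set) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : all) {
            // Fixed-width UTC timestamps sort chronologically as plain strings.
            if (from != null && r.datestamp.compareTo(from) < 0) continue;
            if (until != null && r.datestamp.compareTo(until) > 0) continue;
            if (set != null && !r.sets.contains(set)) continue;
            out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> all = Arrays.asList(
            new Rec("id001", "2007-06-01T10:00:00Z", Arrays.asList("exercises")),
            new Rec("id002", "2007-06-09T22:38:28Z", Arrays.asList("exercises", "tests")),
            new Rec("id003", "2007-06-10T08:00:00Z", new ArrayList<String>()));
        for (Rec r : select(all, "2007-06-05T00:00:00Z", null, "exercises")) {
            System.out.println(r.id);
        }
    }
}
```

A real catalog would also check metadataPrefix against the crosswalks available for each record and, in practice, push the date and set filters down into the database query.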

  12. SampleOAICatalog.listIdentifiers
      • Must return a map with two entries
        • headers – a String iterator of OAI headers
        • identifiers – a String iterator of OAI identifiers
      • Both are created by the call (rec is a SampleRecord)
          String[] header = getRecordFactory().createHeader(rec);
          headers.add(header[0]);
          identifiers.add(header[1]);
      • Create the result
          Map<String, Object> listIdMap = new HashMap<String, Object>();
          listIdMap.put("headers", headers.iterator());
          listIdMap.put("identifiers", identifiers.iterator());
          return listIdMap;

  13. getRecordFactory().createHeader(rec)
      • Creates the header by calling the methods in SampleRecordFactory
        • String getOAIIdentifier(Object rec)
          • returns the full OAI identifier, e.g. “oai:oay.rep.com:id001”
        • String getDatestamp(Object rec)
          • returns the date in ISO 8601 format
        • Iterator<String> getSetSpecs(Object rec)
            ArrayList<String> list = new ArrayList<String>();
            list.add(...);
            return list.iterator();
        • Iterator<String> getAbouts(Object rec)
        • String fromOAIIdentifier(String id)
          • helper method – converts the OAI identifier to a local id
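The two identifier methods are inverses of each other: one prepends the repository prefix, the other strips it. A minimal sketch follows, assuming the prefix taken from the slide's example identifier. The class OaiIdentifiers and its method names are hypothetical and only mirror what getOAIIdentifier and fromOAIIdentifier are described as doing.

```java
// Hypothetical helper mirroring getOAIIdentifier / fromOAIIdentifier.
class OaiIdentifiers {

    // Assumed repository prefix, taken from the slide's example identifier.
    static final String PREFIX = "oai:oay.rep.com:";

    // Local id -> full OAI identifier.
    static String toOai(String localId) {
        return PREFIX + localId;
    }

    // Full OAI identifier -> local id; rejects identifiers of other repositories.
    static String fromOai(String oaiId) {
        if (oaiId == null || !oaiId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(PREFIX.length());
    }

    public static void main(String[] args) {
        String full = toOai("id001");
        System.out.println(full);
        System.out.println(fromOai(full));
    }
}
```

Rejecting foreign prefixes early gives the catalog a natural place to signal that an identifier does not exist instead of looking up a meaningless local id.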

  14. SampleOAICatalog.listSets
      • takes no parameters; returns the list of all sets in this repository
      • each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set

  15. SampleOAICatalog.getSchemaLocations
      • like GetRecord, but returns the Vector of all metadata schema locations the record supports
      • to obtain them, just call getRecordFactory().getSchemaLocations(rec);

  16. SampleOAICatalog.getRecord
      • String getRecord(String id, String metadataPrefix)
      • finds the record and converts it to an XML string (the <record> element)
      • id is in the global format; to get the local value, call getRecordFactory().fromOAIIdentifier(id)
      • throw IdDoesNotExistException if the record is not found
      • to generate the XML, call constructRecord(rec, metadataPrefix)

  17. SampleOAICatalog.listRecords
      • just like ListIdentifiers, but generates a list of XML <record> elements
      • returns a map with one entry
          Map<String, Object> listRecMap = new HashMap<String, Object>();
          listRecMap.put("records", records.iterator());
          return listRecMap;

  18. Crosswalks
      • Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc
      • Only two methods per implementation
        • boolean isAvailableFor(Object rec)
        • String createMetadata(Object rec)
            SampleRecord record = (SampleRecord) rec;
            return LOMFormat.writeStringWithSchema(record.toLOM());
      • throw CannotDisseminateFormatException if the metadata is not available in this format
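For the mandatory Dublin Core format, the body of createMetadata boils down to writing an oai_dc:dc element. The sketch below builds it by hand from the sample record's fields. SampleToDcSketch is a hypothetical class that does not extend OAICat's crosswalk base class, and the mapping of url to dc:identifier and descr to dc:description is an assumption. The namespace URIs and the schema location are the standard ones for oai_dc.

```java
// Hypothetical stand-alone sketch of what an oai_dc crosswalk has to emit.
class SampleToDcSketch {

    // Escapes the characters that are not allowed in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static String element(String name, String value) {
        return "<" + name + ">" + escape(value) + "</" + name + ">";
    }

    // Builds the oai_dc metadata for one record (field mapping is an assumption).
    static String createMetadata(String url, String title, String descr, String date) {
        return "<oai_dc:dc"
            + " xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
            + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
            + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
            + " xsi:schemaLocation=\"http://www.openarchives.org/OAI/2.0/oai_dc/"
            + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd\">"
            + element("dc:title", title)
            + element("dc:description", descr)
            + element("dc:identifier", url)
            + element("dc:date", date)
            + "</oai_dc:dc>";
    }

    public static void main(String[] args) {
        System.out.println(createMetadata(
            "http://example.org/exercises/1", "Fractions & decimals",
            "Practice sheet", "2007-06-09"));
    }
}
```

String concatenation keeps the sketch dependency-free; a library such as LOM-j, mentioned in the slides, or an XML writer is the safer choice once fields can contain arbitrary text.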

  19. SampleRecord.toLOM
      • uses the LOM-j lib to quickly hack together LOM: http://sourceforge.net/projects/lom-j/
      • automatic serialization/deserialization of LOM and DC XML formats
      • Example
          lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
          lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
          lom.newTechnical().newLocation(-1).setString(url);
          lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
          lom.newGeneral().newTitle().newString(0).setString(title);

  20. Resumption
      • A repository usually has a fixed limit on the number of records it returns in one call
        • if more are available, it returns a resumption token that lets the harvester request the next batch
      • Implemented by the functions listIdentifiers(String resumptionToken) and listRecords(String resumptionToken)
        • see XYZOAICatalog for details
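The resumption mechanism can be sketched as paging over a list, with the token carrying the position of the next batch. This is an illustration under stated assumptions, not how XYZOAICatalog works: ResumptionSketch and Page are hypothetical, the page size of 10 is arbitrary, and the token is simply the next offset, whereas real tokens are opaque to the harvester (the one in the XML example looks like a timestamp-based key).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: offset-based resumption tokens over an in-memory list.
class ResumptionSketch {
    static final int PAGE_SIZE = 10; // arbitrary per-call limit for the sketch

    // One batch of results plus the token for the next batch (null when done).
    static final class Page {
        final List<String> items;
        final String resumptionToken;

        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    // First call of a list request: start at the beginning.
    static Page first(List<String> all) {
        return page(all, 0);
    }

    // Follow-up call: the token says where to continue.
    static Page resume(List<String> all, String token) {
        int offset = Integer.parseInt(token); // NumberFormatException on garbage
        if (offset < 0 || offset >= all.size()) {
            throw new IllegalArgumentException("bad resumption token: " + token);
        }
        return page(all, offset);
    }

    private static Page page(List<String> all, int offset) {
        int end = Math.min(offset + PAGE_SIZE, all.size());
        String next = end < all.size() ? String.valueOf(end) : null;
        return new Page(new ArrayList<>(all.subList(offset, end)), next);
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 42; i++) ids.add("id" + i);
        // Harvester loop: keep asking until no token comes back.
        Page p = first(ids);
        int batches = 1;
        while (p.resumptionToken != null) {
            p = resume(ids, p.resumptionToken);
            batches++;
        }
        System.out.println(batches + " batches");
    }
}
```

An offset token is only stable while the underlying list does not change between calls; repositories that change often tend to snapshot the result set or key the token to a stored query instead.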

  21. References
      • http://www.openarchives.org/OAI/openarchivesprotocol.html
      • http://www.fmf.uni-lj.si/~kavkler/
      • http://www.oclc.org/research/software/oai/cat.htm
      • http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
      • http://sourceforge.net/projects/lom-j/
      • SIO/Trubar OAI URL: http://sio.edus.si/LreTomcat/
