
OAI Implementation Notes for LTRS, NACA and Open Video

Michael L. Nelson, NASA Langley Research Center & University of North Carolina. mln@ils.unc.edu, http://www.ils.unc.edu/~mln/. OAI Open Meeting, Washington DC, January 23, 2001.



  1. OAI Implementation Notes for LTRS, NACA and Open Video
     Michael L. Nelson
     NASA Langley Research Center & University of North Carolina
     mln@ils.unc.edu
     http://www.ils.unc.edu/~mln/
     OAI Open Meeting, Washington DC, January 23, 2001

  2. Collections Represented
     • NASA
       • LTRS (Langley Technical Report Server)
         • ~2300 reports, begun in 1992
         • http://techreports.larc.nasa.gov/ltrs/
         • OAI: http://techreports.larc.nasa.gov/ltrs/oai/
       • NACA (National Advisory Committee for Aeronautics)
         • NACA was the predecessor organization to NASA, operating from 1917-1958
         • ~6300 reports, begun in 1996
         • http://naca.larc.nasa.gov/
         • OAI: http://naca.larc.nasa.gov/oai/

  3. Collections Represented
     • University of North Carolina
       • The Open Video Project
         • ~200 public domain video segments, project begun in 1998
         • http://www.open-video.org/
         • OAI: http://buckets.dsi.internet2.edu/openvideo/oai/
         • Open Video contents and OAI services still strictly experimental

  4. NASA: Why is OAI Important?
     • NASA builds DLs out of necessity, but ultimately NASA is a publisher
     • Interested in maximum exposure of and accessibility to its “unrestricted, unlimited” contents
     • In the NASA DLs, we left our “dark matter” partially exposed
       • individual reports were spidered by robots anyway…
     • OAI provides a more formal interface & protocol for exposing contents

  5. UNC: Why is OAI Important?
     • goal is to grow Open Video into a TREC-like corpus for video segments to share with the research community
     • a standard collection of short (10 seconds – 1 hour) video segments on which to perform video content-based retrieval
     • variability in video types: color/b&w, sound/no sound, high/low motion, etc.
     • currently in MPEG-1
       • other formats in the future

  6. OAI Implementation
     • Protocol only specifies CGI stub
       • many implementations possible
     • I used a “bucket” for each: LTRS, NACA & Open Video
       • buckets are aggregative, computational entities normally used for data storage
       • generally, 1 bucket per “report”
       • buckets = metadata + data + methods

  7. OAI Bucket Structure
     [Diagram: a bucket containing index.cgi, the default bucket packages, and the oai payload]
     • default bucket packages:
       • _method.pkg: source files for methods
       • _http.pkg: http dependency files
       • _log.pkg: logs
       • _tc.pkg: terms and conditions
       • _md.pkg: metadata
       • _state.pkg: bucket state
     • oai: bucket payload is DL specific
       • oai.pl element is a support library that defines access for the specific DL
     • in addition to the ~30 bucket methods, each OAI verb is implemented as a separate method
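
The "one method per OAI verb" idea from slide 7 can be sketched as a small dispatch table. This is an illustration in Python, not the bucket code itself (which the slides do not show): the handler names, their placeholder return values, and the choice of status 400 for an unknown verb are all assumptions.

```python
from urllib.parse import parse_qs

# Hypothetical handlers: one function per OAI verb. The real bucket
# methods are not shown in the slides, so these return placeholders.
def identify(args):
    return "<Identify/>"

def list_identifiers(args):
    return "<ListIdentifiers/>"

# Dispatch table mapping the "verb" argument to its handler.
VERBS = {
    "Identify": identify,
    "ListIdentifiers": list_identifiers,
}

def dispatch(query_string):
    """Parse a CGI query string and call the handler for its verb.

    Returns an (http_status, body) pair. Status 400 for a missing or
    unknown verb is an assumption of this sketch.
    """
    args = {key: values[0] for key, values in parse_qs(query_string).items()}
    verb = args.pop("verb", None)
    handler = VERBS.get(verb)
    if handler is None:
        return 400, "missing or unknown verb"
    return 200, handler(args)
```

Adding a verb then means adding one function and one table entry, which mirrors how a new method would be added to a bucket.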

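  8. NACA OAI Implementation
     [Diagram: normal WWW use and OAI requests both reach the NACA file system; OAI requests go through the OAI Server]
     • OAI responses built from examining structure of NACA filesystem
     • NACA file system: year directories 1917, 1918, … 1958, each holding report directories (e.g. naca-tn-1)
       • diagram labels at the report level: refer metadata, thumbnail GIFs, full size GIFs, index.cgi
     • LTRS, NACA, Open Video have different file structures, metadata formats, etc.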
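
Building a response "from examining structure of NACA filesystem" could look roughly like the following Python sketch. It assumes a layout of `<root>/<year>/<report>/`, as the diagram suggests; the function name and the `oai:naca:` identifier prefix are hypothetical and not taken from the slides.

```python
import os

def list_report_identifiers(root, prefix="oai:naca:"):
    """Return one identifier per report directory under root/<year>/<report>.

    Both the directory layout and the identifier prefix are assumptions
    made for this sketch.
    """
    identifiers = []
    for year in sorted(os.listdir(root)):
        year_dir = os.path.join(root, year)
        # Only descend into directories named like a year, e.g. "1917".
        if not (year.isdigit() and os.path.isdir(year_dir)):
            continue
        for report in sorted(os.listdir(year_dir)):
            if os.path.isdir(os.path.join(year_dir, report)):
                identifiers.append(prefix + report)
    return identifiers
```

Because the identifiers are derived from the directory tree on each request, no separate index has to be kept in sync with the collection, at the cost of a filesystem walk per request.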

  9. Implementation
     • Did not implement sets
       • possible set candidates:
         • NACA: years, report type
         • LTRS: NASA STI subject classification
     • Only supporting Dublin Core
       • DC not sufficient for targeted applications
     • Did not implement resumptionToken

  10. Load Balancing
      • Interactive users on main DL machine should not be impacted by metadata harvesting
        • don’t take deliveries through the front door
      [Diagram: the harvester sends http://blah/oai/?verb=ListIdentifiers to the OAI Server at naca.larc.nasa.gov/oai/. If load > 0.05, that server redirects the request with HTTP Status Code 302 to the OAI Server at buckets.dsi.internet2.edu/naca/oai/, which returns the response: <?xml version="1.0" encoding="UTF-8"?> … <ListIdentifiers> … </ListIdentifiers>]
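
The redirect rule in the load-balancing diagram can be sketched as below. The 0.05 threshold and the mirror host come from the slide; the use of the 1-minute load average, the `http://` scheme on the mirror URL, and the function name are assumptions. `os.getloadavg()` is only available on Unix-like systems.

```python
import os

LOAD_THRESHOLD = 0.05  # threshold shown on the slide
# Mirror host from the slide; the http:// scheme is assumed.
MIRROR = "http://buckets.dsi.internet2.edu/naca/oai/"

def route(query_string, load=None):
    """Decide whether to serve a harvest request locally or redirect it.

    Returns (status, headers). If no load value is passed in, the
    1-minute load average is used; which load metric the original
    server consulted is not stated in the slides.
    """
    if load is None:
        load = os.getloadavg()[0]
    if load > LOAD_THRESHOLD:
        # 302 sends the harvester to the mirror with the same arguments.
        return 302, {"Location": MIRROR + "?" + query_string}
    return 200, {}
```

For example, `route("verb=ListIdentifiers", load=0.5)` yields a 302 whose `Location` header points at the mirror with the same query string, so a harvester that follows redirects needs no special configuration.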

  11. Metadata Quality
      • XML is very brittle: 1 bad character in the metadata and an entire ListIdentifiers message can be damaged
        • yes, my DLs should be more diligent about scrubbing their metadata, but…
        • author-contributed metadata particularly a problem (e.g. control characters from copy-n-paste)
      • one advantage of resumptionToken is that it compartmentalizes bad data
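
One way to do the scrubbing mentioned on slide 11 is to drop characters that XML 1.0 does not allow and then escape markup characters before the text goes into a response. This is a minimal sketch, not the approach used in LTRS or NACA; the function name is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Matches anything outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def scrub(text):
    """Remove characters illegal in XML 1.0, then escape &, < and >."""
    return escape(_ILLEGAL_XML_CHARS.sub("", text))
```

A stray vertical tab pasted into a title, for instance, is removed, while ordinary tabs and newlines are kept: `scrub("a\x0bb & c")` returns `"ab &amp; c"`.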

  12. OAI Impact
      • Can use OAI to build our own generalized services
        • updates, alerts
      • Finally have a clean method to export metadata, both to:
        • the general community for unrestricted data
        • closed communities with restricted data
          • Los Alamos, Air Force Research Laboratory, NASA
