
OAIster: A “No Dead Ends” Digital Object Service

Kat Hagedorn, OAIster Librarian, University of Michigan Libraries. October 3, 2003. Background: a one-year Mellon grant project to test the feasibility of making OAI-enabled metadata for digital objects accessible to the public.



Presentation Transcript


  1. OAIster: A “No Dead Ends” Digital Object Service Kat Hagedorn OAIster Librarian University of Michigan Libraries October 3, 2003

  2. background • One-year Mellon grant project to test the feasibility of making OAI-enabled metadata for digital objects accessible to the public • Digital Library Production Service at University of Michigan Libraries began work in December 2001 • Publicized as OAIster in February 2002 • Launched in June 2002

  3. highlights • Any audience • Any subject matter • Any format • Freely accessible • No dead ends • One-stop shopping …retrieving the “hidden web”

  4. the protocol • OAI = Open Archives Initiative • OAI-PMH = Open Archives Initiative Protocol for Metadata Harvesting • Designed to make it easy to exchange metadata among interested parties • Consists of 6 HTTP requests to identify repositories / metadata and perform “harvesting”
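
As a rough illustration of the six requests mentioned on this slide, the sketch below builds one URL per OAI-PMH verb. The base URL `http://example.org/oai` and the record identifier are placeholders (assumptions), not real repositories.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; substitute a real OAI-PMH base URL.
BASE_URL = "http://example.org/oai"

# The six OAI-PMH verbs, each with an example set of arguments.
REQUESTS = {
    "Identify": {},
    "ListMetadataFormats": {},
    "ListSets": {},
    "ListIdentifiers": {"metadataPrefix": "oai_dc"},
    "ListRecords": {"metadataPrefix": "oai_dc"},
    "GetRecord": {"identifier": "oai:example.org:1", "metadataPrefix": "oai_dc"},
}

def build_url(verb, base_url=BASE_URL):
    """Return the GET URL for one OAI-PMH request."""
    params = {"verb": verb, **REQUESTS[verb]}
    return base_url + "?" + urlencode(params)

if __name__ == "__main__":
    for verb in REQUESTS:
        print(build_url(verb))
```

The first three verbs describe a repository and what it offers; the last three retrieve identifiers or records, which is the "harvesting" part.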

  5. tool we borrowed • University of Illinois Urbana-Champaign open-source OAI protocol harvester • Java edition for our Unix environment • Worked collaboratively to iron out kinks • resumptionToken / retryAfter • inexplicable kill • bogus records in MySQL table
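
The "resumptionToken / retryAfter" item points at two flow-control mechanisms a harvester has to handle: long lists arrive in chunks linked by a `resumptionToken`, and a busy repository may answer HTTP 503 with a `Retry-After` header. The sketch below is not the UIUC harvester's code. It is a minimal Python loop that assumes a caller-supplied, hypothetical `fetch(params)` function returning a `(status, headers, body)` tuple, and that `Retry-After` is given in seconds.

```python
import time
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(fetch, metadata_prefix="oai_dc", sleep=time.sleep):
    """Yield <record> elements from every chunk of a ListRecords response.

    `fetch` is a hypothetical callable: fetch(params) -> (status, headers, body).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        status, headers, body = fetch(params)
        if status == 503:
            # Repository asked us to wait; assumes Retry-After is in seconds.
            sleep(int(headers.get("Retry-After", "60")))
            continue
        root = ET.fromstring(body)
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last chunk
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.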

  6. development environment • Digital Library Extension Service (DLXS) • Develop open-source middleware and license XPAT search engine for building and mounting digital libraries • Middleware consists of document classes, i.e., Text, Image, Bib, FindAid • Originally designed to make SGML encoded texts available online

  7. tool we developed • Runs in DLXS environment using BibClass • Current BibClass web templates modified • Additional Java-based transformation tool to: • Concatenate DC metadata records • Filter out no-digital-object records • Count records • Convert from UTF-8 to ISO-8859-1 • XSLT used to transform DC records into BibClass records
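
The transformation steps listed on this slide can be sketched roughly as follows. This is not the DLXS tool, which the slide describes as Java-based and XSLT-driven. It is a Python illustration in which records are plain dictionaries, and a record counts as having a digital object only if it carries an `identifier` that looks like a URL. That heuristic and the field names are assumptions.

```python
def has_digital_object(record):
    # Assumed heuristic: at least one identifier that looks like a URL.
    return any(i.lower().startswith(("http://", "https://"))
               for i in record.get("identifier", []))

def to_latin1(text):
    # Characters outside ISO-8859-1 become numeric character references.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

def transform(batches):
    """Concatenate batches, drop records without objects, count, re-encode titles."""
    records = [r for batch in batches for r in batch]        # concatenate
    kept = [r for r in records if has_digital_object(r)]     # filter
    counts = {"harvested": len(records), "kept": len(kept)}  # count
    titles = [to_latin1(r.get("title", "")) for r in kept]   # convert
    return titles, counts
```

Using `xmlcharrefreplace` is one way to avoid losing characters that ISO-8859-1 cannot represent; whether the original tool did the same is not stated on the slide.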

  8. system design • [Diagram components: XSL stylesheets (per source type); UIUC harvester; XSLT transformation tool; OAI-enabled DC records; record storage; non-OAI-enabled DC records; search interface (XPAT); BibClass indexes]

  9. result • One place to look for digital objects • Big • 1,484,767 metadata records • 195 institutions (as of August 03) • Popular • Averages 3300 search sessions / month • Picked up in March 03: average 3700 now • 43,894 searches total (through July 03)

  10. www.oaister.org: search

  11. www.oaister.org: limiters

  12. www.oaister.org: sort

  13. www.oaister.org: results

  14. www.oaister.org: repositories

  15. repositories: e.g., • Online Archive of California: manuscripts, photographs, and works of art held in institutions across California • arXiv Eprint Archive: math and physics pre- and post-prints • Sammelpunkt, Elektronisch Archivierte Theorie: archive of philosophical publications • British Women Romantic Poets Project: collection of poems written by British women between 1789 and 1832

  16. repositories: stats • As of July 03, out of 191 repositories… • U.S. and foreign • U.S.: 49% (94) • Foreign: 51% (97) • By subject • Humanities: 26% (50) • Science: 30% (58) • Mixed: 43% (83) • E-prints and pre-prints • Using eprints.org software: 41% (78) • Not using eprints.org software: 58% (110)

  17. major issues encountered • Metadata variation • Records not leading to digital objects • Access restrictions on digital objects described in records • Duplicate records for a single digital object

  18. issue: metadata variation • With more records, users need more restrictions • Consistent metadata needed to facilitate these restrictions • One option: normalization of data

  19. issue: metadata variation • Type: the obvious quick win • 240 metadata values mapped to four generic values (text, image, audio, video) • e.g.: • audio, sound = audio • motion, animation, newsreels, etc. = video • watercolour, watercolor, slides, etc. = image • article, articles, booklet, diss, story, etc. = text
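
The mapping described on this slide amounts to a lookup table. The sketch below covers only the example values shown on the slide, not the full set of 240, and the choice to return `None` for unmapped values is an assumption.

```python
# Example values from the slide, mapped to the four generic types.
TYPE_MAP = {
    "audio": "audio", "sound": "audio",
    "motion": "video", "animation": "video", "newsreels": "video",
    "watercolour": "image", "watercolor": "image", "slides": "image",
    "article": "text", "articles": "text", "booklet": "text",
    "diss": "text", "story": "text",
}

def normalize_type(value):
    """Return text, image, audio or video, or None if the value is unmapped."""
    return TYPE_MAP.get(value.strip().lower())
```

For example, `normalize_type(" Watercolour ")` yields `"image"`, while an unlisted value such as `"map"` yields `None` and would need a new table entry.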

  20. issue: metadata variation • Date: where to begin? • Most records with at least one date • Some records include up to seven dates • No consistent style of date • Subject: out of context, what meaning? • Many records with at least one subject element • But over 100 records with more than 50 subjects • And one record with 1000!

  21. issue: metadata variation • Sample date values <date>2-12-01</date> <date>2002-01-01</date> <date>0000-00-00</date> <date>1822</date> <date>between 1827 and 1833</date> <date>18--?</date> <date>November 13, 1947</date> <date>SEP 1958</date> <date>235 bce</date> <date>Summer, 1948</date>
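
To give a sense of how hard these values are to normalize, the sketch below tries only to pull plausible four-digit years out of a date string. It is an illustration, not something OAIster is described as doing, and the accepted range (1000 to 2099) is an arbitrary assumption.

```python
import re

# A four-digit year between 1000 and 2099, not embedded in a longer number.
YEAR = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")

def extract_years(value):
    """Return every plausible year found in a free-text date value."""
    return [int(y) for y in YEAR.findall(value)]

if __name__ == "__main__":
    samples = ["2-12-01", "2002-01-01", "0000-00-00", "1822",
               "between 1827 and 1833", "18--?", "November 13, 1947",
               "SEP 1958", "235 bce", "Summer, 1948"]
    for s in samples:
        print(repr(s), extract_years(s))
```

Even this narrow rule leaves four of the ten sample values with no year at all ("2-12-01", "0000-00-00", "18--?", "235 bce"), and one with two.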

  22. issue: metadata variation • Sample subject values <subject>30,51,52</subject> <subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject> <subject>Slavery--United States--Controversial literature</subject> <subject>view of interior with John Henry sculpture</subject> <subject>Particles (Nuclear physics) -- Research.</subject>

  23. issue: no digital objects • Some records contain links to further description of digital object • But not the digital object itself • Culling difficult • One option: add explanatory text to site

  24. issue: access restrictions • No records where metadata itself is restricted in use (as far as we know!) • Definitely some records where objects are restricted to licensed users • One option: add explanatory text to site

  25. issue: access restrictions • DC Rights element: often not enough info about viewing restrictions • Currently no protocol method for indicating restricted digital objects (i.e., “yes/no” toggle element) • Need to assess whether users feel informed or frustrated when encountering restricted objects

  26. issue: duplicate records • Two records harvested, different identifiers, same object described and pointed to • Acquired in two ways: • Harvesting of original repository and aggregator • Receiving “static” DC records provided by content creator and harvesting aggregator

  27. issue: duplicate records • Aggregators can contain records not currently available through OAI channels • Aggregators do not always contain all the records of a particular original repository • So, need to harvest both aggregator and original repositories

  28. issue: duplicate records • Harvest records from aggregator • Also receive from original content creator, but as snapshot • e.g., MEO and cogprints • Snapshot before aggregator • Creator unsure all records would be aggregated

  29. issue: duplicate records • Were duplicates to be identified, how to deal with the issue? • Suppress? • Group? • Flag? • So far, not addressed in OAIster
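
Since the slides define a duplicate as two records with different identifiers that point to the same object, one conceivable way to identify candidates is to group records by a normalized form of the object URL. The sketch below is a hypothetical approach, not something implemented in OAIster. The `(oai_id, url)` input shape and the normalization rules (ignore scheme, host case and trailing slash) are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def normalize_url(url):
    # Ignore scheme, host case and a trailing slash when comparing URLs.
    parts = urlsplit(url.strip())
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)

def group_duplicates(records):
    """Given (oai_id, url) pairs, return lists of ids sharing one object URL."""
    groups = defaultdict(list)
    for oai_id, url in records:
        groups[normalize_url(url)].append(oai_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```

The resulting groups could then feed any of the three options on the slide: suppress all but one record, display them grouped, or flag them.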

  30. assessment • Large survey (over 400 respondents) • 2 rounds of face-to-face and remote user testing • Conducted before design and after phase one rollout

  31. assessment: survey • Online journals and reference materials wanted over other digital objects • Difficult to search for information; every service different; where to start • 5% of respondents indicated they were generally successful in finding resources online

  32. assessment: user testing • No short and long record formats: one size fits all • Want clearly defined and labeled AND/OR searching options • Results clear and easy to understand • Want to sort by title, date, institution, resource format…you name it! • Use OAIster for academic, trustworthy, authentic materials

  33. service providers: comparison • Focus on high usability • Focus on all content available • Some service providers have increased functionality (e.g., de-duplication, integration of thesauri) • [Chart: usability (low to high) plotted against content (some to all); providers shown: UIUC, Emory, etc.; OAIster; ad hoc; DP-9]

  34. future of OAIster • Make it faster • Advanced searching • Grouping to aid browsing • Saving/emailing/downloading records • Further normalization of data • Handling duplicate records • Collaboration with other services: search, instructional…

  35. current state of protocol • Popular • As Peter Suber says: • “…no other single idea or technology in the [open-source] movement has enjoyed this density of endorsement and adoption in a six month period.” • Data providers over one year: • June 02: 56 repositories / 274,062 records • June 03: 187 repositories / 1,246,953 records • Over three-fold increase for repositories • Over four-fold increase for records
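
The two growth claims on this slide can be checked directly from the June 02 and June 03 figures. The helper name below is illustrative only.

```python
def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before

repositories = growth_factor(56, 187)          # June 02 -> June 03
records = growth_factor(274_062, 1_246_953)    # June 02 -> June 03

if __name__ == "__main__":
    print(f"repositories: x{repositories:.2f}")
    print(f"records:      x{records:.2f}")
```

The factors come out at roughly 3.3 for repositories and 4.5 for records, consistent with "over three-fold" and "over four-fold".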

  36. future of protocol • Branching out • HTTP vs. SOAP • DC required vs. highly recommended • Use of OAI in closed environments • Static repository protocol • Need for add-on applications • OAI evangelism

  37. what can you do? • OAI-enable your data • DLXS customer: easiest • Make sure data is UTF-8 / Unicode compliant • Provide as much metadata as you can • Use standard element tags • Develop “sets” for service providers • Let us know you’re ready to be harvested • Keep us informed about changes to the harvesting URL, new data and deleted data, change in contact info
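
For the "UTF-8 / Unicode compliant" item, a data provider could run a simple check over exported record files before exposing them for harvesting. The function below is a minimal sketch of such a check and is not a tool supplied by OAIster or DLXS.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Text saved as ISO-8859-1 that contains accented characters will typically fail this check, which is a common way non-compliant records slip into a repository.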

  38. contact info • Kat Hagedorn • University of Michigan Libraries, Digital Library Production Service • khage@umich.edu • http://www.oaister.org/
