A Digital Library Repository Utilizing the Open Archives Initiative

Presentation Transcript


  1. A Digital Library Repository Utilizing the Open Archives Initiative: developed to meet the needs of UTK Library Special Collections

  2. The Problem: Tremendous quantities of valuable information exist in museums, libraries, and research centers, but are not available in a standardized format through centralized search engines. How can we make the connection? Examples: musical scores and sound tracks, historical documents, theses and dissertations, scientific records, mathematical findings, photos and videos.

  3. The Open Archives Solution:
     • Translation of records into a common format and language: XML and Unqualified Dublin Core
     • Storage of these translations
     • Response to a standardized set of queries
     • Gathering of document descriptions from repositories into large databases, using OAI harvesters
     • Search engines set up to offer up the information in these databases
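The harvesting step can be sketched as a loop that repeats the ListRecords request until the repository stops returning a resumption token. This is a minimal Python sketch, not any particular harvester: it assumes OAI-PMH 2.0 responses in the namespace http://www.openarchives.org/OAI/2.0/, and the `harvest` function and its injected `fetch` callable are hypothetical names, chosen so the loop can run without a network connection.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

# Assumption: OAI-PMH 2.0 response namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url: str, fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> from a ListRecords conversation.

    `fetch` (hypothetical) takes a full request URL and returns the
    response body as text, so the loop can be tested offline.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Against a live repository, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; a production harvester would also need to handle error responses and retries.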

  4. Required for Translation:
     • Understanding of XML and XML schemas
     • Determining the correct mapping of information to Unqualified Dublin Core elements, in order to translate legacy files into a metadata format supported by the Open Archives Initiative
     • Scripts to reduce the labor of translation
     Record types: musical scores and sound tracks, theses and dissertations, historical documents, scientific records, mathematical findings, photos and videos

  5. A Common Language: Dublin Core. The 15 elements of Unqualified Dublin Core:
     • Content: Title, Description, Coverage, Relation, Source, Subject, Type
     • Intellectual Property: Contributor, Creator, Publisher, Rights
     • Instantiation: Date, Format, Identifier, Language
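A translation script can check its output against these names before writing a record. The Python sketch below encodes the slide's grouping, using the lowercase element names that appear in oai_dc XML; `unknown_elements` is a hypothetical helper, not part of any Dublin Core library.

```python
# The 15 Unqualified Dublin Core elements, grouped as on the slide.
DC_ELEMENTS = {
    "Content": ["title", "description", "coverage", "relation",
                "source", "subject", "type"],
    "Intellectual Property": ["contributor", "creator", "publisher", "rights"],
    "Instantiation": ["date", "format", "identifier", "language"],
}
ALLOWED = {name for names in DC_ELEMENTS.values() for name in names}

def unknown_elements(fields):
    """Return, sorted, the field names that are not Unqualified DC elements.

    `fields` is a list of (element_name, value) pairs produced by a translator.
    """
    return sorted({name for name, _ in fields} - ALLOWED)
```

For example, `unknown_elements([("title", "x"), ("author", "y")])` flags `"author"`, which would need to be mapped to `creator` or `contributor` first.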

  6. A Common Framework: XML schemas. The XML schema constrains each element of the document, providing rules and a framework for parsing:
       <schema …>
         <complexType name="dublincoreType">
           <choice minOccurs="0" maxOccurs="unbounded">
             <element name="subject" minOccurs="0" maxOccurs="unbounded" type="string"/>
           </choice>
         </complexType>
       </schema>

  7. A Common Format: XML. From a TEI Lite SGML file segment:
       <PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
         <ITEM>Letters</ITEM>
         <ITEM>Cherokee Indians—Claims against</ITEM>
         <ITEM>Tennessee</ITEM>
       </LIST></KEYWORDS></TEXTCLASS></PROFILEDESC></TEIHEADER>
     To an Unqualified Dublin Core XML file segment:
       <subject>Letters</subject>
       <subject>Cherokee Indians Claims against</subject>
       <subject>Tennessee</subject>
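The keyword mapping above can be sketched in Python with the standard library's ElementTree. This assumes the TEI Lite SGML has first been normalized to well-formed XML (SGML allows omitted end tags, which an XML parser rejects) and that tag names stay uppercase as on the slide; `keywords_to_subjects` is a hypothetical name. Unlike the slide's output, it keeps each heading's text as written, including subdivision punctuation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def keywords_to_subjects(tei_fragment: str) -> list[str]:
    """Map each <ITEM> under an LCSH <KEYWORDS> list to a DC <subject>."""
    root = ET.fromstring(tei_fragment)
    subjects = []
    for keywords in root.iter("KEYWORDS"):
        if keywords.get("SCHEME") != "LCSH":
            continue  # only Library of Congress subject headings are mapped
        for item in keywords.iter("ITEM"):
            # Collapse line breaks and runs of spaces left over from the source.
            text = " ".join("".join(item.itertext()).split())
            # Re-escape &, <, > so the output is valid XML again.
            subjects.append(f"<subject>{escape(text)}</subject>")
    return subjects
```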

  8. Selected Portions of a TEI Lite SGML Record:
       <TEIHEADER>
       <FILEDESC>
       <TITLESTMT>
       <TITLE>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town / William Holland Thomas: a machine-readable transcription of an image</TITLE>
       …
       <AUTHOR>Thomas, William Holland</AUTHOR>
       …
       <PUBLISHER>The University of Tennessee Libraries</PUBLISHER>
       <IDNO>wt025</IDNO>
       …
       <AVAILABILITY><P>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</P></AVAILABILITY></PUBLICATIONSTMT>
       <SOURCEDESC><BIBL>
       …
       <DATE VALUE="1839-07-08">July 8, 1839</DATE>
       …
       <NOTE TYPE="summary">This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</NOTE>
       …
       <PROFILEDESC><TEXTCLASS><KEYWORDS SCHEME="LCSH"><LIST>
         <ITEM>Cherokee Indians</ITEM>
         <ITEM>Government relations</ITEM>
       </LIST></KEYWORDS></TEXTCLASS></PROFILEDESC>
       …
       <TEXT><BODY><DIV1 TYPE="letter">…
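The other header fields on this slide follow the same pattern: each TEI element is located by path and copied into the Dublin Core element it maps to. This Python sketch encodes one plausible crosswalk read off slides 8 and 9; the `CROSSWALK` table and `tei_header_to_dc` function are hypothetical, and the paths assume the header has been normalized to well-formed XML with uppercase tags. It does not reproduce the slide's "Document ID:" prefix, the URL identifier, or the contributor values, which come from outside the header.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: ElementTree path in the TEI header -> DC element.
# Paths follow the record on the slide; other TEI Lite files may need more.
CROSSWALK = [
    (".//TITLESTMT/TITLE", "title"),
    (".//TITLESTMT/AUTHOR", "creator"),
    (".//PUBLICATIONSTMT/PUBLISHER", "publisher"),
    (".//PUBLICATIONSTMT/IDNO", "identifier"),
    (".//SOURCEDESC//DATE", "date"),
    (".//SOURCEDESC//NOTE[@TYPE='summary']", "description"),
    (".//AVAILABILITY/P", "rights"),
]

def tei_header_to_dc(tei_header: str) -> list[tuple[str, str]]:
    """Return (dc_element, value) pairs in crosswalk order."""
    root = ET.fromstring(tei_header)
    pairs = []
    for path, dc_name in CROSSWALK:
        for el in root.findall(path):
            # Join nested text and normalize whitespace.
            text = " ".join("".join(el.itertext()).split())
            if text:
                pairs.append((dc_name, text))
    return pairs
```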

  9. … Translated to XML Unqualified Dublin Core:
       <title>[Letter] July 8, 1839, Washington City DC, [to] HP King, Qualla Town</title>
       <contributor>The University of Tennessee Libraries, Knoxville</contributor>
       <contributor>Southeastern Native American Documents Collection (GALILEO (Georgia statewide project)) GAGAL</contributor>
       <creator>Thomas, William Holland</creator>
       <publisher>The University of Tennessee Libraries</publisher>
       <date>July 8, 1839</date>
       <description>This document is a letter dated July 8, 1839 to H.P. King from William Holland Thomas with instructions for running the Indian Store.</description>
       <identifier>Document ID: wt025</identifier>
       <identifier>http://www.helios.dii.utk.edu/oai/sgm/00178.html</identifier>
       <subject>Cherokee Indians</subject>
       <subject>Government relations</subject>
       <rights>This work is the property of the Special Collections Library, University of Tennessee, Knoxville, TN. It may be used freely by individuals for research, teaching, and personal use as long as this statement of availability is included in the text.</rights>
       <type>letter</type>
       <type>computer file</type>
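To be served through the protocol, elements like these are wrapped in an `oai_dc:dc` container, with each element in the Dublin Core namespace. A minimal Python sketch, assuming the OAI-PMH 2.0 oai_dc namespace (http://www.openarchives.org/OAI/2.0/oai_dc/) and the DC 1.1 element namespace (http://purl.org/dc/elements/1.1/); `to_oai_dc` is a hypothetical helper, and the `xsi:schemaLocation` attribute a full record usually carries is omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
# Serialize with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def to_oai_dc(pairs):
    """Wrap (element, value) pairs in an <oai_dc:dc> container, in input order.

    Repeated elements (several <dc:subject>, say) are allowed, as the
    schema's unbounded choice permits.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value  # text is escaped
    return ET.tostring(root, encoding="unicode")
```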

  10. Translation Tools: crosswalks available:
     • MARC to DC: http://www.loc.gov/marc/dccross.html (shown in action at http://alcme.oclc.org/marc2dc/index.html)
     • Others:
       http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html
       http://www.lub.lu.se/tk/metadata/MDin9612.html
       http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
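A crosswalk is, at its core, a lookup table from source fields to Dublin Core elements. The Python sketch below shows the idea with a small, illustrative subset of MARC fields; it is not the Library of Congress crosswalk, which covers many more fields and subfields and should be checked before relying on any mapping here. Input is assumed to be already parsed into (tag, subfield, value) triples, and `marc_to_dc` is a hypothetical name.

```python
# Illustrative subset only: (MARC tag, subfield code) -> DC element.
# Personal names and many other fields are deliberately left out.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
    ("856", "u"): "identifier",
}

def marc_to_dc(fields):
    """Map (tag, subfield, value) triples to (dc_element, value) pairs."""
    out = []
    for tag, code, value in fields:
        dc_name = MARC_TO_DC.get((tag, code))
        if dc_name is None:
            continue  # unmapped in this subset
        # Drop trailing ISBD punctuation such as " /" or " :".
        out.append((dc_name, value.rstrip(" /:;,")))
    return out
```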

  12. Storage of OAI Records:
     • MySQL: small, fast, and free: http://www.mysql.com
     • Use scripts to load the database and retrieve information.
     • Either store entire records, already marked up in Unqualified Dublin Core, for quick response; or store fields untagged, with multiple values for a field separated by tags, and re-tag them on request, for flexibility. The untagged structure allows a record to be entered once and retrieved in various formats on request.
     • For local search engines, also store hard-coded XML files in a directory.
     Retrieving a date range from one set's table (Perl DBI):
       # $dbh is an open DBI handle. $set must be checked against the repository's
       # set codes, because a table name cannot be passed as a placeholder.
       my $sth = $dbh->prepare(
           "select listit from $set where date <= ? and date >= ? order by id");
       # Bind the dates rather than interpolating them into the SQL string.
       $sth->execute($until, $from);
     Creating a set's table (MySQL):
       create table gsm (
         id     char(10) not null,
         primary key (id),
         date   char(10),
         path   char(80),
         listit text
       );
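The storage and date-range retrieval can be tried out without a MySQL server. This Python sketch swaps in the standard library's sqlite3 module as a stand-in for MySQL, reusing the table layout above; the helper names are hypothetical. Dates are stored as YYYY-MM-DD strings, so ordinary string comparison gives date order, and the set code is checked against the repository's set list because a table name cannot be passed as a query parameter.

```python
import sqlite3

# Set codes served by the repository; only these may become table names.
KNOWN_SETS = {"har", "che", "civ", "etd", "emn", "ead",
              "gsm", "ldr", "rth", "tdh", "vid"}

def _table(set_code):
    # Table names cannot be bound as "?" parameters, so whitelist them instead.
    if set_code not in KNOWN_SETS:
        raise ValueError(f"unknown set: {set_code!r}")
    return set_code

def create_set_table(conn, set_code):
    conn.execute(f"""create table if not exists {_table(set_code)} (
                         id     char(10) not null primary key,
                         date   char(10),
                         path   char(80),
                         listit text)""")

def list_records(conn, set_code, date_from, date_until):
    """Return stored, pre-tagged records for one set, in id order."""
    sql = (f"select listit from {_table(set_code)} "
           "where date >= ? and date <= ? order by id")
    return [row[0] for row in conn.execute(sql, (date_from, date_until))]
```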

  14. Response: offer up document descriptions via a standardized set of queries and responses, the Open Archives Initiative Protocol:
     1) 6 verbs and their required and/or optional arguments
     2) Unique identifiers, optional sets, and metadata prefixes
     3) Flow control and resumption tokens
     4) Error codes

  15. Verbs and Arguments: The Open Archives Protocol
     • Identify
     • ListSets: optional: resumption token
     • ListMetadataFormats: optional: identifier
     • ListIdentifiers: required: metadata prefix (oai_dc); optional: from, until, set, resumption token
     • ListRecords: required: metadata prefix (oai_dc); optional: from, until, set, resumption token
     • GetRecord: required: identifier and metadata prefix
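A repository has to reject requests that do not fit these rules. The Python sketch below checks a parsed request against the verb table on this slide and returns an OAI-PMH 2.0 error code when it fails; it assumes each argument has already been reduced to a single value, and it follows the 2.0 rule that a resumptionToken must be the only argument besides the verb. `check_request` is a hypothetical name.

```python
# verb -> (required arguments, optional arguments), resumptionToken aside.
VERBS = {
    "Identify": (set(), set()),
    "ListSets": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}
# Verbs whose lists may be split across responses.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}

def check_request(params):
    """Return None for a valid request, else "badVerb" or "badArgument"."""
    verb = params.get("verb")
    if verb not in VERBS:
        return "badVerb"
    args = set(params) - {"verb"}
    if "resumptionToken" in args:
        # Exclusive argument: nothing else may accompany it.
        ok = verb in RESUMABLE and args == {"resumptionToken"}
        return None if ok else "badArgument"
    required, optional = VERBS[verb]
    if not required <= args or not args <= required | optional:
        return "badArgument"
    return None
```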

  16. Identifiers, Sets, and Metadata Prefixes
     Current set (input as "set")   Collection                            Sample identifier
     har                            Bessie Harvey Collection              oai:tkn:har/har0001
     che                            Cherokee                              oai:tkn:che/che0003
     civ                            Civil War Collection                  oai:tkn:civ/civ0001
     etd                            Electronic Theses and Dissertations   oai:tkn:etd/etd0002
     emn                            Emancipator                           oai:tkn:emn/emn0001
     ead                            Encoded Archival Description          oai:tkn:ead/ead0003
     gsm                            Great Smoky Mountains                 oai:tkn:gsm/gsm0045
     ldr                            Library Development Review            oai:tkn:ldr/ldr0002
     rth                            Roth Photography Collection           oai:tkn:rth/rth0034
     tdh                            Tennessee Documentary History         oai:tkn:tdh/tdh0005
     vid                            Videos                                oai:tkn:vid/vid0001
     Supported metadata prefix: oai_dc
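Before answering GetRecord, a repository can check that an identifier names a known set and has the expected shape. The pattern in this Python sketch is inferred from the eleven sample identifiers above (oai:tkn:, the set code, a slash, then the set code and four digits); it is an assumption, not a documented scheme, and `parse_identifier` is a hypothetical name. A `None` result would correspond to the idDoesNotExist error.

```python
import re

SETS = {"har", "che", "civ", "etd", "emn", "ead",
        "gsm", "ldr", "rth", "tdh", "vid"}
# Inferred pattern: oai:tkn:<set>/<set><4 digits>; \1 repeats the set code.
ID_PATTERN = re.compile(r"oai:tkn:([a-z]{3})/(\1\d{4})")

def parse_identifier(identifier):
    """Return (set_code, local_id), or None if the identifier is not recognized."""
    m = ID_PATTERN.fullmatch(identifier)
    if m is None or m.group(1) not in SETS:
        return None
    return m.group(1), m.group(2)
```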

  17. Flow Control and Resumption Tokens (for ListIdentifiers, ListSets, and ListRecords)
     Example: <resumptionToken>LRrtdc20f19990202u20020101</resumptionToken>
     • LR or LI: ListRecords or ListIdentifiers
     • rt: number or letter combination: which set is next
     • dc: metadata format
     • 20: which record number to start with this time
     • f19990202: from date 1999-02-02
     • u20020101: until date 2002-01-01
     The token encodes the database query to run when the harvester sends it back.
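The token layout above can be parsed and rebuilt with a regular expression. This Python sketch assumes field widths read off the single example (a two-character set code, a two-letter metadata format, a decimal record number, then eight-digit from and until dates); the real encoding may differ, and the class and function names are hypothetical. A token that does not match would be answered with badResumptionToken.

```python
import re
from dataclasses import dataclass

VERB_CODES = {"LR": "ListRecords", "LI": "ListIdentifiers"}
# Assumed widths, taken from the one example token on the slide.
TOKEN = re.compile(r"(LR|LI)([A-Za-z0-9]{2})([a-z]{2})(\d+)f(\d{8})u(\d{8})")

@dataclass
class Resumption:
    verb: str        # "ListRecords" or "ListIdentifiers"
    set_code: str    # which set comes next
    prefix: str      # metadata format code, e.g. "dc"
    start: int       # record number to start with
    date_from: str   # YYYY-MM-DD
    date_until: str  # YYYY-MM-DD

def _iso(d):
    return f"{d[:4]}-{d[4:6]}-{d[6:]}"  # "19990202" -> "1999-02-02"

def parse_token(token):
    m = TOKEN.fullmatch(token)
    if m is None:
        return None  # the server would answer badResumptionToken
    verb, set_code, prefix, start, f, u = m.groups()
    return Resumption(VERB_CODES[verb], set_code, prefix,
                      int(start), _iso(f), _iso(u))

def make_token(r):
    code = {v: k for k, v in VERB_CODES.items()}[r.verb]
    return (f"{code}{r.set_code}{r.prefix}{r.start}"
            f"f{r.date_from.replace('-', '')}u{r.date_until.replace('-', '')}")
```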

  18. Error Codes (version 2.0): badResumptionToken, badVerb, badArgument, idDoesNotExist, cannotDisseminateFormat, noMetadataFormats, noRecordsMatch, noSetHierarchy
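When a request fails, the repository answers with an ordinary OAI-PMH response carrying an `<error>` element. A minimal Python sketch, assuming OAI-PMH 2.0 and its namespace http://www.openarchives.org/OAI/2.0/; `error_response` is a hypothetical helper that accepts only the eight codes listed above.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
ET.register_namespace("", OAI_NS)  # serialize OAI-PMH elements without a prefix

ERROR_CODES = {"badResumptionToken", "badVerb", "badArgument",
               "idDoesNotExist", "cannotDisseminateFormat",
               "noMetadataFormats", "noRecordsMatch", "noSetHierarchy"}

def error_response(code, message, response_date, base_url):
    """Build a minimal OAI-PMH 2.0 error response document as a string."""
    if code not in ERROR_CODES:
        raise ValueError(f"not an OAI-PMH 2.0 error code: {code!r}")
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    ET.SubElement(root, f"{{{OAI_NS}}}responseDate").text = response_date
    # Only the base URL is echoed here; the protocol expects the request
    # arguments as attributes, except after badVerb or badArgument.
    ET.SubElement(root, f"{{{OAI_NS}}}request").text = base_url
    ET.SubElement(root, f"{{{OAI_NS}}}error", code=code).text = message
    return ET.tostring(root, encoding="unicode")
```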

  19. OAI 1.1 Test Interface and Local Search Engine: http://oai.sunsite.utk.edu/1.1.html
     • Search by word or phrase, across all fields or any one field, and by set
     • Sort by date or set
     • Return lists of identifiers or short file descriptions, each linking to the full file in HTML and XML and to the online document
     Collections covered: musical scores and sound tracks, historical documents, theses and dissertations, videos and photos, scientific records, mathematical findings
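The local search engine's options can be sketched as a filter over stored records. In this Python sketch, "all or any field" is interpreted as searching every field or only a chosen subset, which is an assumption; the `Record` class and `search` function are hypothetical, and matching is a simple case-insensitive substring test rather than word-level indexing.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    identifier: str
    set_code: str
    date: str  # YYYY-MM-DD, so string order is date order
    elements: dict = field(default_factory=dict)  # DC element -> list of values

def search(records, phrase, field_names=None, set_code=None, sort_by="date"):
    """Return identifiers of records whose chosen fields contain `phrase`."""
    needle = phrase.lower()
    hits = []
    for rec in records:
        if set_code and rec.set_code != set_code:
            continue  # restrict to one set when asked
        names = field_names or rec.elements.keys()  # default: all fields
        values = (v for n in names for v in rec.elements.get(n, []))
        if any(needle in v.lower() for v in values):
            hits.append(rec)
    if sort_by == "date":
        key = lambda r: r.date
    else:  # sort by set, then date within a set
        key = lambda r: (r.set_code, r.date)
    return [r.identifier for r in sorted(hits, key=key)]
```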

  21. More Information:
     • www.openarchives.org
     • Crosswalks:
       http://www.sinica.edu.tw/~metadata/tool/mapping-foreign.html
       http://www.lub.lu.se/tk/metadata/MDin9612.html
       http://www.getty.edu/research/institute/standards/intrometadata/3_crosswalks/index.html
     • Pre-developed repositories, harvesters, search engines, and more: http://www.openarchives.org/tools/tools.html
     • Current service providers, who can offer searches of your records by harvesting your repository's responses: http://www.openarchives.org/service/listproviders.html
