
Metadata interoperability for everyone – XML tools for catalogers

Terry Reese

Digital Production Unit Head

Oregon State University

Finding our way
  • Metadata Interoperability
    • Crosswalk systems
    • Common problems
  • Metadata tools
    • Scripting Solutions
    • MarcEdit
  • MarcEdit and MODS
    • Metadata transformations
    • MODS editing
    • Automatic MODS harvesting
  • Conclusion
Why metadata interoperability?
  • Today, we have literally hundreds of different metadata schemas. In the library, we have a wide variety as well.
    • MARC (and all its flavors)
    • FGDC
    • Dublin Core
    • EAD
    • METS
    • MODS
    • ONIX
    • OAI
    • TEI
    • FRBR
    • GILS
    • etc.
If you describe it…
  • Metadata schemas are created by communities to meet the special descriptive needs of those communities.
  • One danger is that competing standards within a community produce multiple incompatible schemas, or incompatible variations of a single schema.
If you describe it…

<controlaccess>

<subject source="lcsh" encodinganalog="650">College students--Iowa--Mount Vernon.</subject>

<subject encodinganalog="650" source="lcsh">Student activities--Iowa--Mount Vernon.</subject>

</controlaccess>

<controlaccess>

<subject source="lcsh">

<controlaccess encodinganalog="650a">College students</controlaccess>

<controlaccess encodinganalog="650z">Iowa</controlaccess>

<controlaccess encodinganalog="650z">Mount Vernon.</controlaccess>

</subject>

</controlaccess>
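
The two encodings above carry the same subject heading, but a consumer has to handle each shape separately. A minimal Python sketch of that, using only the standard library; the function name `subject_headings` and the trimmed sample strings are illustrative assumptions, not part of any EAD toolkit:

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two encodings shown on the slide.
FLAT = """<controlaccess>
<subject source="lcsh" encodinganalog="650">College students--Iowa--Mount Vernon.</subject>
</controlaccess>"""

NESTED = """<controlaccess>
<subject source="lcsh">
<controlaccess encodinganalog="650a">College students</controlaccess>
<controlaccess encodinganalog="650z">Iowa</controlaccess>
<controlaccess encodinganalog="650z">Mount Vernon.</controlaccess>
</subject>
</controlaccess>"""


def subject_headings(xml_text):
    """Return every <subject> as one 'a--b--c' heading string."""
    root = ET.fromstring(xml_text)
    headings = []
    for subject in root.iter("subject"):
        # Nested style: each subdivision is a child element.
        parts = [child.text.strip() for child in subject if child.text]
        if parts:
            headings.append("--".join(parts))
        else:
            # Flat style: the heading is already joined in the text node.
            headings.append((subject.text or "").strip())
    return headings


print(subject_headings(FLAT))
print(subject_headings(NESTED))
```

Both calls yield the same heading, but only because the code knows about both local conventions in advance.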

If you describe it…

Some specialized examples:

  • MARC (MAchine-Readable Cataloging)
    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/marc.txt
  • EAD (Encoded Archival Description)
    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/ead.xml
    • (MARC representation: http://oasis.orst.edu/record=b2324248)
  • Dublin Core
    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/dc.xml
  • FGDC
    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/fgdc.xml
If you describe it…

Why would communities develop shared metadata schemas?

  • Shared schemas provide a structured method for sharing data within a community.
    • Example: MARC…its development paved the way for the current cooperative cataloging model and tools like:
      • OCLC
      • RLIN
      • Z39.50
  • But shared best practices?
Why use crosswalks?

Crosswalks:

  • Are developed by examining the similarities and differences between schemas.
  • Are one of the primary mechanisms that allow different systems to interoperate with each other.
  • Break down data transfer barriers, allowing different systems to share data.
Why use crosswalks?
  • To combine metadata catalogs

e.g. Union catalogs

  • To provide cross searchability between unlike datasets

e.g. Federated search tools

  • To perform data/metadata maintenance

e.g. Updating metadata formats – moving away from obsolete standards.

  • To repurpose data from one schema to another.
Why use crosswalks?
  • Cost
    • Metadata creation costs can be prohibitive
      • Indiana University reported in 2003 that metadata creation accounted for one third of its total digitization costs (Digitization Costs & Funding, 2003). This covered only initial metadata creation and did not include estimates for ongoing metadata maintenance.
      • However, this isn’t just a digitization issue; it’s also an issue for traditional catalog workflows (books, serials, etc.):
        • Rough OSU cost estimates (including OCLC charges):
          • Books (copy cataloging): $3 /book
          • Books (original): $27 /book
          • Thesis (subject/classification): $20 /thesis
Crosswalking challenges
  • Schema granularity
    • One-to-many and many-to-one matches
    • Crosswalking between schemas with different granularity levels
      • Trying to map anything from unqualified Dublin Core.
    • Handling object relationships or hierarchies.
      • EAD => MARC
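
The granularity problem can be sketched in a few lines of Python. The mapping table below is an illustrative assumption rather than an official crosswalk: several MARC tags collapse into one unqualified Dublin Core element, so the reverse direction has no single right answer.

```python
# Illustrative, partial MARC-tag -> unqualified Dublin Core mapping
# (an assumption for this sketch, not an official crosswalk).
MARC_TO_DC = {
    "245": "title",
    "246": "title",    # varying form of title
    "650": "subject",  # topical subject
    "651": "subject",  # geographic subject
}


def to_dublin_core(fields):
    """Many-to-one: collapse (tag, value) pairs into DC elements."""
    dc = {}
    for tag, value in fields:
        element = MARC_TO_DC.get(tag)
        if element:
            dc.setdefault(element, []).append(value)
    return dc


def candidate_tags(element):
    """One-to-many: every MARC tag a DC element could map back to."""
    return sorted(tag for tag, el in MARC_TO_DC.items() if el == element)


dc = to_dublin_core([("650", "College students"), ("651", "Iowa")])
print(dc)                         # both values end up under "subject"
print(candidate_tags("subject"))  # going back, 650 or 651?
```

Once the topical and geographic headings share one `subject` element, nothing in the Dublin Core record says which tag each value came from.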
Crosswalking challenges

  • Dealing with spare parts
    • Since data crosswalking is rarely a one-to-one mapping, the process nearly always results in unmappable data.
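
One way to keep the spare parts visible is to have the crosswalk return them instead of silently dropping them. A minimal sketch; the function name `crosswalk`, the sample record, and the EAD-to-Dublin-Core mapping are all hypothetical:

```python
def crosswalk(record, mapping):
    """Split a record into mapped output and leftover 'spare parts'."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value  # no equivalent in the target schema
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# Hypothetical EAD-style source fields and a partial mapping to Dublin Core.
EAD_TO_DC = {"unittitle": "title", "unitdate": "date"}
record = {
    "unittitle": "Student activities collection",
    "unitdate": "1920-1950",
    "physloc": "Box 12, Shelf 3",
}

mapped, spare = crosswalk(record, EAD_TO_DC)
print(mapped)
print(spare)  # what the cataloger still has to decide about
```

Returning the leftovers lets a workflow log them, park them in a local note field, or flag the record for review.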
Common Crosswalking System Designs
  • Type-broker model (Ockerbloom)
    • Facilitates crosswalking – allows users to query known systems
    • Provides analysis and facilitates unknown crosswalking systems:
      • Determines crosswalk path
      • Negotiates system nodes
      • Negotiates without requiring a control data layer, but allows clients to specify a control data layer that must be used in the conversion process.
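
"Determines crosswalk path" can be pictured as a search over a graph of known crosswalks. The sketch below uses a plain breadth-first search as a stand-in; it is not Ockerbloom's actual algorithm, and the `CROSSWALKS` table is a made-up example.

```python
from collections import deque

# Hypothetical registry: each format lists the formats it can be
# converted to directly.
CROSSWALKS = {
    "MODS": ["Dublin Core"],
    "Dublin Core": ["MARCXML"],
    "MARCXML": ["MARC"],
    "EAD": ["MARCXML"],
}


def find_path(source, target, crosswalks=CROSSWALKS):
    """Breadth-first search for the shortest chain of crosswalks."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in crosswalks.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of known crosswalks reaches the target


print(find_path("MODS", "MARC"))
print(find_path("MARC", "EAD"))
```

With this registry, MODS reaches MARC only through two intermediate formats, and every extra hop is another chance to lose data.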
Common Crosswalking System Designs
  • Dumb-down crosswalking model
    • Converting data to its lowest common denominator.
      • Example: OAI’s initial use of Dublin Core as a transfer format.
Metadata Tools
  • Perl-based:
    • MARC::Record, MARC::Charset, MARC::XML
      • http://marcpm.sourceforge.net/
  • Non-Perl based:
    • MarcEdit – includes XML API and crosswalks for a number of common metadata schemas.
      • http://oregonstate.edu/~reeset/marcedit/html/
    • LC’s MARC tools: http://www.loc.gov/marc/marctools.html
MarcEdit
  • MarcEdit 5.0
    • System Requirements:
      • Using .NET Framework
        • Windows 98, ME, NT, 2000, XP, 2003
        • .NET 1.1 Framework
        • MDAC 2.7 runtimes
      • Using MONO Framework (hopefully available after August 2005)
        • Windows 2000+, Linux and Mac OS X
        • MONO system requirements
MarcEdit: crosswalking design
  • Utilizes a modified version of Ockerbloom’s type-broker system.
  • Unlike Ockerbloom’s system, which brokers transformations between known schemas, MarcEdit uses MARCXML as a control schema to facilitate translation.
MarcEdit: crosswalking design
  • Ockerbloom model: the broker system continues performing translations until the desired format is reached. Example: MODS => Dublin Core => MARCXML => MARC
Broker System model

[Diagram labels: crosswalks, Type broker]

MarcEdit: crosswalking design
  • MarcEdit model:
    • So long as a schema has been mapped to MARCXML, any combination of metadata formats can be used. This means that no more than two transformations will ever take place. Example: MODS => MARCXML => EAD
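
The hub design can be sketched as a planner that never returns more than two steps. This is an illustration of the idea only, not MarcEdit's code; the format sets and the name `plan` are assumptions.

```python
HUB = "MARCXML"

# Hypothetical registries: formats with a crosswalk into the hub,
# and formats with a crosswalk out of it.
TO_HUB = {"MODS", "EAD", "Dublin Core", "FGDC"}
FROM_HUB = {"MODS", "EAD", "Dublin Core", "HTML"}


def plan(source, target):
    """Return the (from, to) steps needed; never more than two."""
    if source == target:
        return []
    steps = []
    if source != HUB:
        if source not in TO_HUB:
            raise ValueError(f"no crosswalk from {source} to {HUB}")
        steps.append((source, HUB))
    if target != HUB:
        if target not in FROM_HUB:
            raise ValueError(f"no crosswalk from {HUB} to {target}")
        steps.append((HUB, target))
    return steps


print(plan("MODS", "EAD"))
print(plan("FGDC", "MARCXML"))
```

Adding a new schema means writing its crosswalks to and from the hub once; it can then reach every other registered format without any pairwise work.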
MarcEdit: crosswalking design
  • MarcEdit Crosswalk model
    • Pro
      • Crosswalks need not be directly related to each other
      • Requires the crosswalk author to have specific knowledge of only one schema
    • Con
      • Each known schema must be mapped to MARCXML.
MarcEdit: Crosswalks for everyone
  • Example Crosswalks:
    • MODS => MARC
    • MODS => FGDC
    • MODS => Dublin Core
    • EAD => MODS
    • EAD => HTML
MarcEdit: Crosswalks for everyone
  • What’s MarcEdit doing?
    • Facilitates the crosswalk by:
      • Performing character translations (MARC-8/UTF-8)
      • Facilitating interaction between binary and XML formats.
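
To make "binary and XML formats" concrete, here is a rough Python sketch that packs data fields into the ISO 2709 layout used by binary MARC (24-byte leader, 12-byte directory entries, delimiter bytes), reads them back, and writes a MARCXML-style record. It assumes UTF-8 data and data fields only; it does no MARC-8 conversion, and it is not how MarcEdit itself is implemented. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field, record, subfield delimiters


def build_record(fields):
    """Pack (tag, indicators, [(code, value), ...]) into ISO 2709 bytes."""
    directory, data = b"", b""
    for tag, ind, subfields in fields:
        body = ind.encode("ascii") + b"".join(
            SF + code.encode("ascii") + value.encode("utf-8")
            for code, value in subfields
        ) + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1   # leader + directory + terminator
    total = base + len(data) + 1     # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Read the leader and data fields back out of the binary record."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])        # base address of data
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        ind, *subs = body.split(SF)
        fields.append((tag, ind.decode("ascii"),
                       [(s[:1].decode("ascii"), s[1:].decode("utf-8")) for s in subs]))
    return leader, fields


def to_marcxml(leader, fields):
    """Serialize parsed fields as a MARCXML-style <record>."""
    record = ET.Element("record", xmlns="http://www.loc.gov/MARC21/slim")
    ET.SubElement(record, "leader").text = leader
    for tag, ind, subs in fields:
        df = ET.SubElement(record, "datafield", tag=tag, ind1=ind[0], ind2=ind[1])
        for code, value in subs:
            ET.SubElement(df, "subfield", code=code).text = value
    return ET.tostring(record, encoding="unicode")


sample = [("245", "10", [("a", "Métadonnées :"), ("b", "XML tools")])]
raw = build_record(sample)
print(to_marcxml(*parse_record(raw)))
```

Lengths and offsets in the directory count bytes, not characters, which is one reason character-set conversion and format conversion have to be handled together.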
MarcEdit: Simplify Editing MODS records
  • New to MarcEdit 5.0 is the ability to edit MODS records in the MarcEditor as if they were a regular MARC file.
    • Allows catalogers unfamiliar with MODS to work with MODS data in a familiar form.
    • Will automatically translate new fields into MODS equivalents.
    • Will only translate field data that has a MODS equivalent.
MarcEdit: Simplify Editing MODS records
  • How it works:
    • MODS file is translated to MARCXML
    • MARCXML is translated to MarcEdit Mnemonic format.
    • Internally, the MarcEditor tracks format and changes.
    • On save, the mnemonic file is translated back into MODS, with edited and added fields translated to their appropriate MODS mappings.
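
The second step, MARCXML to a line-per-field mnemonic form, can be approximated in Python. The output layout here (`=TAG`, two spaces, indicators with `\` for a blank, `$`-prefixed subfields) is an approximation of the mnemonic format, and the MODS steps on either side are left out. The name `to_mnemonic` and the sample record are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Metadata interoperability for everyone :</subfield>
    <subfield code="b">XML tools for catalogers</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">College students</subfield>
    <subfield code="z">Iowa</subfield>
  </datafield>
</record>"""


def to_mnemonic(marcxml):
    """Render each MARCXML datafield as one mnemonic-style text line."""
    root = ET.fromstring(marcxml)
    lines = []
    for df in root.findall("marc:datafield", NS):
        # Blank indicators are shown as a backslash so they stay visible.
        ind = (df.get("ind1", " ") + df.get("ind2", " ")).replace(" ", "\\")
        subs = "".join(
            f"${sf.get('code')}{sf.text or ''}"
            for sf in df.findall("marc:subfield", NS)
        )
        lines.append(f"={df.get('tag')}  {ind}{subs}")
    return lines


for line in to_mnemonic(MARCXML):
    print(line)
```

A cataloger edits lines like these as ordinary MARC; the save step then has to run the same translation in reverse and on into MODS.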
MarcEdit: Making OAI Simple
  • New to MarcEdit 5.0 is a Metadata Harvester.
    • From within the MarcEditor, users can harvest DC, oai_marc or MODS records directly into MARC.
    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/harvest.wmv
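
For readers who have not seen what a harvester does under the hood, here is a small Python sketch of an OAI-PMH `ListRecords` exchange. It builds the request URL and parses a canned response instead of contacting a server; the base URL, the sample XML, and the function names are made up for illustration, and the step that maps the harvested records into MARC is not shown.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces the prefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Canned response standing in for a real repository (hypothetical data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Student activities</dc:title>
          <dc:subject>College students</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return [(identifier, [titles])] plus the next resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        titles = [t.text for t in rec.iter(DC + "title")]
        records.append((ident, titles))
    token = root.findtext(f"{OAI}ListRecords/{OAI}resumptionToken")
    return records, token or None


print(list_records_url("http://example.org/oai"))
records, token = parse_list_records(SAMPLE)
print(records, token)
```

A real harvest loops: request, parse, and request again with the returned token until the repository stops sending one.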
Bibliography
  • Ockerbloom, John. Mediating among diverse data formats. School of Computer Science, Carnegie Mellon University. CMU-CS-98-102. January 1998. http://tom.library.upenn.edu/pubs/thesis/
  • Digitization Costs & Funding. Digital Library Workshop. Oct. 2003. http://www.dlib.indiana.edu/workshops/alioct03/costs.ppt