
Metadata interoperability for everyone – XML tools for catalogers

Terry Reese

Digital Production Unit Head

Oregon State University


Finding our way

  • Metadata Interoperability

    • Crosswalk systems

    • Common problems

  • Metadata tools

    • Scripting Solutions

    • MarcEdit

  • MarcEdit and MODS

    • Metadata transformations

    • MODS editing

    • Automatic MODS harvesting

  • Conclusion



Why metadata interoperability?

  • Today, we have literally hundreds of different metadata schemas. In the library, we have a wide variety as well.

    • MARC (and all its flavors)

    • FGDC

    • Dublin Core

    • EAD

    • METS

    • MODS

    • Onyx

    • OAI

    • TEI

    • FRBR

    • GILS

    • etc.


If you describe it…

  • Metadata schemas are created by communities to meet the special descriptive needs of those communities.

  • One danger is that competing standards within a group produce multiple incompatible schemas, or that variations of a particular schema emerge within a single community.


If you describe it…

<controlaccess>

<subject source="lcsh" encodinganalog="650">College students--Iowa--Mount Vernon.</subject>

<subject encodinganalog="650" source="lcsh">Student activities--Iowa--Mount Vernon.</subject>

</controlaccess>

<controlaccess>

<subject source="lcsh">

<controlaccess encodinganalog="650a">College students</controlaccess>

<controlaccess encodinganalog="650z">Iowa</controlaccess>

<controlaccess encodinganalog="650z">Mount Vernon.</controlaccess>

</subject>

</controlaccess>
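The two encodings above carry the same subject heading in different shapes, which is exactly what makes variation within a schema costly for anyone writing a crosswalk. As a rough sketch (the function name `subject_parts` is hypothetical, and the XML is trimmed from the slide with straight quotes), a reader has to normalize both styles before they can be compared:

```python
import xml.etree.ElementTree as ET

# Style 1: the whole heading in one <subject>, subdivisions joined by "--".
FLAT = """<controlaccess>
  <subject source="lcsh" encodinganalog="650">College students--Iowa--Mount Vernon.</subject>
</controlaccess>"""

# Style 2: one child element per subdivision.
NESTED = """<controlaccess>
  <subject source="lcsh">
    <controlaccess encodinganalog="650a">College students</controlaccess>
    <controlaccess encodinganalog="650z">Iowa</controlaccess>
    <controlaccess encodinganalog="650z">Mount Vernon.</controlaccess>
  </subject>
</controlaccess>"""


def subject_parts(xml_text):
    """Return each subject heading as a list of subdivisions, whichever style is used."""
    root = ET.fromstring(xml_text)
    headings = []
    for subject in root.iter("subject"):
        children = list(subject)
        if children:
            # Nested style: each child element holds one subdivision.
            parts = [(child.text or "").strip() for child in children]
        else:
            # Flat style: split the text on the "--" separator.
            parts = [part.strip() for part in (subject.text or "").split("--")]
        headings.append(parts)
    return headings


if __name__ == "__main__":
    print(subject_parts(FLAT))
    print(subject_parts(NESTED))
```

Both calls produce the same list of subdivisions, but only because the code knows about both local conventions in advance.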


If you describe it…

Some specialized examples:

  • MARC (MAchine-Readable Cataloging)

    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/marc.txt

  • EAD (Encoded Archival Description)

    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/ead.xml

    • (MARC representation: http://oasis.orst.edu/record=b2324248)

  • Dublin Core

    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/dc.xml

  • FGDC

    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/fgdc.xml


If you describe it…

Why would communities develop shared metadata schemas?

  • Shared schemas provide a structured method for sharing data within a community.

    • Example: MARC…its development paved the way for the current cooperative cataloging model and tools like:

      • OCLC

      • RLIN

      • Z39.50

  • But shared best practices?


Why use crosswalks?

Crosswalks:

  • Are developed by examining the similarities and differences between schemas.

  • Are one of the primary mechanisms that allow different systems to interoperate with each other.

  • Break down data transfer barriers, allowing different systems to share data.


Why use crosswalks?

  • To combine metadata catalogs

    e.g. Union catalogs

  • To provide cross searchability between unlike datasets

    e.g. Federated search tools

  • To perform data/metadata maintenance

    e.g. Updating metadata formats – moving away from obsolete standards.

  • To repurpose metadata from one schema to another.


Why use crosswalks?

  • Cost

    • Metadata creation costs can be prohibitive

      • Indiana University reported in 2003 that about one third of its total digitization cost was attributable to metadata creation (Digitization Costs & Funding, 2003). This covered only the initial metadata creation and did not include estimates for ongoing metadata maintenance.

      • However, this isn’t just a digitization issue. It’s also an issue for traditional catalog workflows (books, serials, etc.):

        • Rough OSU cost estimates (including OCLC charges):

          • Books (copy cataloging): $3 /book

          • Books (original): $27 /book

          • Thesis (subject/classification): $20 /thesis


Crosswalking challenges

  • Schema granularity

    • One-to-many and many-to-one matches

    • Crosswalking from schemas with different granularity levels

      • Trying to map anything from unqualified Dublin Core.

    • Handling object relationships or hierarchies.

      • EAD => MARC


Crosswalking challenges

  • Dealing with spare parts

    • Since data crosswalking is rarely a one-to-one mapping, the process nearly always results in unmappable data.
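The granularity and spare-parts problems can be made concrete with a small sketch. The mapping table below is illustrative only (a simplified MARC to unqualified Dublin Core table, not an official crosswalk), and the function names are hypothetical. It shows a many-to-one mapping that cannot be reversed without guessing, and it collects fields that have no target at all:

```python
# Illustrative, simplified mapping: several MARC tags collapse into one
# Dublin Core element (many-to-one).
MARC_TO_DC = {
    "100": "creator",
    "700": "creator",
    "245": "title",
    "650": "subject",
}


def crosswalk_with_leftovers(fields, mapping=MARC_TO_DC):
    """Map (tag, value) pairs; return (mapped record, unmappable 'spare parts')."""
    mapped, leftovers = {}, []
    for tag, value in fields:
        target = mapping.get(tag)
        if target is None:
            leftovers.append((tag, value))  # no equivalent in the target schema
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, leftovers


def reverse_candidates(element, mapping=MARC_TO_DC):
    """List every source tag an element could have come from."""
    return sorted(tag for tag, target in mapping.items() if target == element)


if __name__ == "__main__":
    sample = [
        ("245", "Sample title"),          # hypothetical record data
        ("100", "Sample, Author"),
        ("700", "Sample, Contributor"),
        ("583", "Local processing note"),
    ]
    print(crosswalk_with_leftovers(sample))
    print(reverse_candidates("creator"))
```

Going back from `creator` yields more than one candidate tag, which is the difficulty with mapping anything from unqualified Dublin Core, and the 583 field ends up in the leftovers list because the target schema has nowhere to put it.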


Common Crosswalking System Designs

  • Type-broker model (Ockerbloom)

    • Facilitates crosswalking – allows users to query known systems

    • Analyzes and facilitates crosswalking between systems that have no known direct crosswalk:

      • Determines crosswalk path

      • Negotiates system nodes

      • Negotiates without requiring a control data layer, but allows clients to specify a control data layer that must be used in the conversion process.
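One way to picture "determines crosswalk path" is as a search over a graph whose edges are the known crosswalks. The sketch below is an assumption about how such a broker could work, not Ockerbloom's actual algorithm; the `CROSSWALKS` table and `find_path` are hypothetical names, and the edges are made up for illustration:

```python
from collections import deque

# Hypothetical registry of known one-directional crosswalks.
CROSSWALKS = {
    "MODS": ["Dublin Core"],
    "Dublin Core": ["MARCXML"],
    "EAD": ["MARCXML"],
    "MARCXML": ["MARC"],
}


def find_path(source, target, crosswalks=CROSSWALKS):
    """Breadth-first search for the shortest chain of crosswalks, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_format in crosswalks.get(path[-1], []):
            if next_format not in seen:
                seen.add(next_format)
                queue.append(path + [next_format])
    return None  # no chain of known crosswalks reaches the target


if __name__ == "__main__":
    print(find_path("MODS", "MARC"))
```

With this registry the broker chains several translations to get from MODS to MARC, and every extra hop is another chance to lose data.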


Common Crosswalking System Designs

  • Dumb-down crosswalking model

    • Converting data to its lowest common denominator.

      • Example: OAI’s initial use of Dublin Core as a transfer format.
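A minimal sketch of dumbing down, assuming the source is qualified Dublin Core and the target is the simple element set. The `REFINES` table lists a few DCMI refinements and the base elements they refine; the function name `dumb_down` and the sample record are hypothetical:

```python
# A few qualified Dublin Core terms and the simple elements they refine.
REFINES = {
    "alternative": "title",
    "abstract": "description",
    "tableOfContents": "description",
    "created": "date",
    "issued": "date",
    "spatial": "coverage",
    "temporal": "coverage",
    "isPartOf": "relation",
}


def dumb_down(record):
    """Collapse each refined term into its base element; the distinction is lost."""
    simple = {}
    for term, values in record.items():
        base = REFINES.get(term, term)  # unrefined terms pass through unchanged
        simple.setdefault(base, []).extend(values)
    return simple


if __name__ == "__main__":
    qualified = {
        "title": ["Sample title"],           # hypothetical record data
        "alternative": ["Sample alternate title"],
        "created": ["2005"],
        "issued": ["2006"],
    }
    print(dumb_down(qualified))
```

After the conversion there is no way to tell which date was the creation date and which was the issue date, which is the price of the lowest common denominator.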


Metadata Tools

  • Perl-based:

    • MARC::Record, MARC::Charset, MARC::XML

      • http://marcpm.sourceforge.net/

  • Non-Perl-based:

    • MarcEdit – includes XML API and crosswalks for a number of common metadata schemas.

      • http://oregonstate.edu/~reeset/marcedit/html/

    • LC’s MARC tools: http://www.loc.gov/marc/marctools.html


MarcEdit

  • MarcEdit 5.0

    • System Requirements:

      • Using .NET Framework

        • Windows 98, ME, NT, 2000, XP, 2003

        • .NET 1.1 Framework

        • MDAC 2.7 runtimes

      • Using MONO Framework (hopefully available after August 2005)

        • Windows 2000+, Linux and Mac OS X

        • MONO system requirements


MarcEdit: crosswalking design

  • Utilizes a modified version of Ockerbloom’s type-broker system.

  • Unlike Ockerbloom’s system, which brokers transformations between known schemas, MarcEdit utilizes MARCXML as a control schema to facilitate translation.


MarcEdit: crosswalking design

  • Ockerbloom model: the broker system would continue performing translations until the desired format was reached. Example: MODS, Dublin Core, MARCXML, MARC


Broker System model

[Diagram; labels: crosswalks, type broker]


MarcEdit: crosswalking design

  • MarcEdit model:

    • So long as a schema has been mapped to MARCXML, any metadata combination can be utilized. This means that no more than two transformations will ever take place. Example: MODS => MARCXML => EAD
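The hub arrangement can be sketched as two lookup tables, one into MARCXML and one out of it. Everything here is a stand-in: the converter functions only relabel a record's format, and the names `TO_HUB`, `FROM_HUB` and `translate` are hypothetical rather than MarcEdit's API. The point is only that any translation takes at most two steps:

```python
HUB = "MARCXML"


def _relabel(fmt):
    """Stand-in for a real crosswalk: returns a copy of the record marked as fmt."""
    def convert(record):
        return {**record, "format": fmt}
    return convert


# One crosswalk into the hub and one out of it per schema.
TO_HUB = {name: _relabel(HUB) for name in ("MODS", "EAD", "Dublin Core")}
FROM_HUB = {name: _relabel(name) for name in ("MODS", "EAD", "Dublin Core")}


def translate(record, target):
    """Translate via the hub; returns the new record and the formats visited."""
    source = record["format"]
    steps = [source]
    if source == target:
        return record, steps
    if source != HUB:
        record = TO_HUB[source](record)
        steps.append(HUB)
    if target != HUB:
        record = FROM_HUB[target](record)
        steps.append(target)
    return record, steps


if __name__ == "__main__":
    mods_record = {"format": "MODS", "title": "Sample title"}  # hypothetical data
    print(translate(mods_record, "EAD"))
```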


MarcEdit: crosswalking design

  • MarcEdit Crosswalk model

    • Pro

      • Crosswalks need not be directly related to each other

      • Requires the crosswalk author to have specific knowledge of only one schema

    • Con

      • Each known crosswalk must be mapped to MARCXML.
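The trade-off can be put in numbers. This is a back-of-the-envelope sketch under two assumptions that are not from the slides: every crosswalk is one-directional, and MARCXML counts as one of the n schemas. The function names are hypothetical:

```python
def direct_crosswalks(n):
    """Crosswalks needed if every schema maps directly to every other one."""
    return n * (n - 1)


def hub_crosswalks(n):
    """Crosswalks needed if every non-hub schema maps to and from the hub only."""
    return 2 * (n - 1)


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, direct_crosswalks(n), hub_crosswalks(n))
```

The hub count grows linearly while the direct count grows quadratically, so the requirement to map everything to MARCXML gets cheaper in relative terms as more schemas are added.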




MarcEdit: Crosswalks for everyone

  • Example Crosswalks:

    • MODS => MARC

    • MODS => FGDC

    • MODS => Dublin Core

    • EAD => MODS

    • EAD => HTML


MarcEdit: Crosswalks for everyone

  • What’s MarcEdit doing?

    • Facilitates the crosswalk by:

      • Performing character translations (MARC-8 to UTF-8)

      • Handling interaction between binary and XML formats.
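To show what moving between the binary and XML forms involves, here is a minimal sketch of reading an ISO 2709 (binary MARC) record and writing it as MARCXML. It is not MarcEdit's code. It assumes the record is already UTF-8, handles data fields only (no 00X control fields), and skips MARC-8 conversion entirely, since the Python standard library has no MARC-8 codec. The helper `build_record` exists only to make a test record; all function names and the sample data are hypothetical:

```python
import xml.etree.ElementTree as ET

FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field terminator, record terminator, subfield delimiter
MARCXML_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARCXML_NS)


def build_record(fields):
    """Build a minimal binary record from (tag, indicators, [(code, value), ...])."""
    directory, body = b"", b""
    for tag, ind, subfields in fields:
        data = ind.encode("ascii")
        data += b"".join(SF + code.encode("ascii") + value.encode("utf-8")
                         for code, value in subfields)
        data += FT
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1          # leader + directory + its terminator
    total = base + len(body) + 1            # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FT + body + RT


def parse_record(raw):
    """Read the leader and directory, then slice each data field out of the body."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])               # base address of data
    directory = raw[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop field terminator
        ind, *subs = data.split(SF)
        fields.append((tag, ind.decode("ascii"),
                       [(s[:1].decode("ascii"), s[1:].decode("utf-8")) for s in subs]))
    return leader, fields


def to_marcxml(leader, fields):
    """Serialize parsed data fields as a MARCXML <record>."""
    record = ET.Element(f"{{{MARCXML_NS}}}record")
    ET.SubElement(record, f"{{{MARCXML_NS}}}leader").text = leader
    for tag, ind, subs in fields:
        datafield = ET.SubElement(record, f"{{{MARCXML_NS}}}datafield",
                                  tag=tag, ind1=ind[0], ind2=ind[1])
        for code, value in subs:
            ET.SubElement(datafield, f"{{{MARCXML_NS}}}subfield", code=code).text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    raw = build_record([
        ("245", "00", [("a", "Sample title")]),
        ("650", " 0", [("a", "College students"), ("z", "Iowa")]),
    ])
    print(to_marcxml(*parse_record(raw)))
```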


MarcEdit: Simplify Editing MODS records

  • New to MarcEdit 5.0 is the ability to edit MODS records in the MarcEditor as if they were regular MARC records.

    • Allows catalogers unfamiliar with MODS to work with MODS data in a familiar form.

    • Will automatically translate new fields into MODS equivalents.

    • Will only translate field data that has a MODS equivalent.


MarcEdit: Simplify Editing MODS records

  • How it works:

    • MODS file is translated to MARCXML

    • MARCXML is translated to MarcEdit Mnemonic format.

    • Internally, the MarcEditor tracks format and changes.

    • On save, the mnemonic file is translated back into MODS, with edited and added fields translated to their appropriate MODS mappings.
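The round trip can be sketched for a single element. This is an illustration, not MarcEdit's implementation: it handles only the MODS title (mapped to 245 $a), skips the intermediate MARCXML step, and assumes the mnemonic line layout `=TAG`, two spaces, two indicators (a backslash standing for a blank), then `$`-prefixed subfields. Literal `$` characters in values are not handled. All function names and sample data are hypothetical:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"m": MODS_NS}


def mods_title_to_field(mods_xml):
    """Pull the MODS title out as a MARC-style field tuple (245 $a)."""
    root = ET.fromstring(mods_xml)
    title = root.find("m:titleInfo/m:title", NS)
    return ("245", "0", "0", [("a", title.text)])


def to_mnemonic(tag, ind1, ind2, subfields):
    """Render one field as a mnemonic line; blank indicators become backslashes."""
    indicators = (ind1 + ind2).replace(" ", "\\")
    return f"={tag}  {indicators}" + "".join(f"${code}{value}" for code, value in subfields)


def from_mnemonic(line):
    """Parse a mnemonic line back into (tag, ind1, ind2, subfields)."""
    tag, rest = line[1:4], line[6:]
    ind1, ind2 = (ch.replace("\\", " ") for ch in rest[:2])
    subfields = [(chunk[0], chunk[1:]) for chunk in rest[2:].split("$")[1:]]
    return tag, ind1, ind2, subfields


def field_to_mods_title(tag, ind1, ind2, subfields):
    """Write a 245 field back out as a MODS titleInfo/title element."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = dict(subfields)["a"]
    return ET.tostring(mods, encoding="unicode")


if __name__ == "__main__":
    original = (f'<mods xmlns="{MODS_NS}"><titleInfo>'
                "<title>Old title</title></titleInfo></mods>")  # hypothetical record
    line = to_mnemonic(*mods_title_to_field(original))
    edited = line.replace("Old", "New")      # the cataloger's edit
    print(edited)
    print(field_to_mods_title(*from_mnemonic(edited)))
```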


MarcEdit: Making OAI Simple

  • New to MarcEdit 5.0 is a Metadata Harvester.

    • From within the MarcEditor, users can harvest DC, oai_marc or MODS records directly into MARC.

    • http://oregonstate.edu/~reeset/presentations/ala/summer2005/harvest.wmv
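A rough sketch of what a harvester does, assuming an OAI-PMH repository that serves `oai_dc`. This is not MarcEdit's harvester. The base URL is a placeholder, the embedded response is a hand-made sample, and the Dublin Core to MARC table is illustrative only. The code builds the request URL and parses a response without touching the network:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative mapping from simple Dublin Core elements to MARC tags.
DC_TO_MARC = [("title", "245"), ("creator", "720"), ("subject", "653")]


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def dc_records_to_marc_fields(response_xml):
    """Turn each harvested oai_dc record into a list of (MARC tag, value) pairs."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        fields = []
        for element, tag in DC_TO_MARC:
            for node in record.iterfind(f".//dc:{element}", NS):
                fields.append((tag, node.text))
        records.append(fields)
    return records


# Hand-made sample response; identifiers and values are hypothetical.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Sample, Author</dc:creator>
          <dc:subject>College students</dc:subject>
          <dc:subject>Student activities</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))  # placeholder endpoint
    print(dc_records_to_marc_fields(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` values to page through large result sets; that is left out here to keep the sketch short.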


Bibliography

  • Ockerbloom, John. Mediating among diverse data formats. School of Computer Science, Carnegie Mellon University. CMU-CS-98-102. January 1998. http://tom.library.upenn.edu/pubs/thesis/

  • Digitization Costs & Funding. Digital Library Workshop. Oct. 2003. http://www.dlib.indiana.edu/workshops/alioct03/costs.ppt