Metadata Encoding & Crosswalks

  1. Metadata Encoding & Crosswalks Spring 2006, 20 February Bharat Mehra IS 520 (Organization and Representation of Information) School of Information Sciences University of Tennessee

  2. Metadata: Vocabulary (Recap) • Metadata schemes (framework): definitions • What information is needed to describe something • What sets of metadata elements/fields are to be designed for a specific purpose, such as describing a particular type of information resource • Fielded record of summary information about a “document”

  3. Metadata: Vocabulary (Recap) • Semantics of a scheme: the definition or meaning of the elements themselves • Content: the values given to metadata elements • Syntax rules: how the elements and their content should be encoded • Metadata encoding: e.g., from a cataloging record to a MARC record • Metadata crosswalks: mappings that allow content described in one framework to be encoded (formatted) in other schemes and formats, e.g., for web pages

  4. Cataloging Procedures for One Bibliographic Record • Descriptive cataloging • bibliographic description and determination of access points (AACR2) • main and added entries • Subject access • subject cataloging: selection of subject access points (LCSH/SLSH/MeSH) • classification: assignment of class numbers and book numbers (LCC/DDC/NLMC) • Authority control (authority records, file, system) • MARC tagging

  5. Metadata: Vocabulary (Recap) • Mark-up language: metadata encoded and actually embedded within a document using a specific syntactic (and semantic) scheme • Metadata schemes: Dublin Core, MODS, VRA • Encoding formats: HTML, XML, RTF • Metadata encoding • a syntactic scheme for writing down some metadata content • a specification of the kinds of information that should be presented to describe an information object • Bibliographic control: card catalogs, MARC • (Metadata manifestations)

  6. Metadata: Vocabulary (Recap) • Encoding in definable syntax • SGML (Standard Generalized Mark-up Language): a superset (metalanguage) that allows for the richest mark-up of a document • HTML (Hypertext Mark-up Language): a web page description language; a W3C standard readable on any system with a web browser • XML (Extensible Mark-up Language): a simplified subset of SGML that, unlike HTML, allows locally defined tag sets and the easy exchange of structured information

  7. Observable IR Trends • from post-publication metadata to embedded metadata (synchronous) • from metadata by indexer/cataloger to metadata by author • from controlled vocabulary to natural language (keywords) • from centralized classic IR systems to distributed resources

  8. Metadata by Authors • born with the document • as a part of the encoded electronic documents (not separated) • provide description and access points • IR databases can import the metadata automatically

  9. the way we work with and think about metadata is changing

  10. why? because today metadata exists in

  11. distributed

  12. heterogeneous

  13. networked

  14. electronic

  15. information environments

  16. distributed heterogeneous networked information environments

  17. disaggregation of information is an outcome that allows data manipulation at a deeper level

  18. Bibliographic Processes • Identify basic cataloging data in terms of Chan’s framework • Apply metadata schemes (Dublin Core, MODS, VRA) to card catalog records (or MARC records) • Encode the above in different encoding formats (HTML, XML, RTF) • Card catalogs → MARC

  19. What is encoding? • A standard used to mark up data so that the data can be understood by human beings or computers • A set of tags precisely defines the elements: • semantics • syntax • Tags can be numeric (e.g., MARC) or language-based (e.g., HTML, XML) • Application software can process the structured data for a specified use
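Because the tags carry the element names, application software can pull structured data back out of an encoded page. A minimal sketch using Python's standard `html.parser`, assuming Dublin Core is embedded as `<META NAME="DC.Element" CONTENT="value">` tags like those on slide 21; the names `DCMetaParser` and `extract_dc` are illustrative, not from any metadata library.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core <meta> tags into a dict of lists (DC elements are repeatable)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <META NAME=...> matches here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.upper().startswith("DC."):
            self.fields.setdefault(name, []).append(attr.get("content") or "")


def extract_dc(html_text):
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


page = """<head>
<META NAME="DC.Title" CONTENT="IS 520 Syllabus Page">
<META NAME="DC.Creator" CONTENT="Bharat Mehra">
<META NAME="DC.Subject" CONTENT="Information Organization and Representation">
<META NAME="DC.Subject" CONTENT="IS 520">
</head>"""

fields = extract_dc(page)
print(fields)
```

Keeping each element as a list preserves the "all fields are repeatable" property of Dublin Core.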

  20. The Dublin Core Elements • Content: Title, Subject, Description, Source, Coverage, Relation • Intellectual property: Creator, Contributor, Publisher, Rights • Instantiation: Date, Type*, Format*, Identifier, Language

  21. Sample Metadata Encoding: Dublin Core in HTML
    <META NAME="DC.Title" Content="IS 520 Syllabus Page">
    <META NAME="DC.Creator" Content="Bharat Mehra">
    <META NAME="DC.Creator.Address" Content="bmehra@utk.edu">
    <META NAME="DC.Subject" Content="Information Organization and Representation">
  Sample Metadata Encoding: Dublin Core in XML
    <?xml version="1.0"?>
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>IS 520 Syllabus Page</dc:title>
      <dc:creator>Bharat Mehra</dc:creator>
      <dc:subject>Information Organization and Representation</dc:subject>
      <dc:subject>IS 520</dc:subject>
      <dc:date>2005-09-13</dc:date>
      <dc:format>text/html</dc:format>
      <dc:language>en</dc:language>
    </metadata>
  Comparison: same content but different encoding
    HTML: <META NAME="DC.Creator" Content="Bharat Mehra">
    XML: <dc:creator>Bharat Mehra</dc:creator>
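The "same content, different encoding" point can be made concrete by generating both encodings from one set of element/value pairs. A sketch with Python's standard library; the root element name `metadata` is a hypothetical choice, while `http://purl.org/dc/elements/1.1/` is the DCMI namespace for the 15 elements.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

# One set of element/value pairs (the content); repeated keys are allowed in DC.
record = [
    ("title", "IS 520 Syllabus Page"),
    ("creator", "Bharat Mehra"),
    ("subject", "Information Organization and Representation"),
    ("subject", "IS 520"),
    ("date", "2005-09-13"),
    ("format", "text/html"),
    ("language", "en"),
]


def to_html_meta(pairs):
    """HTML encoding: one <meta> tag per value, element name in the name attribute."""
    return "\n".join(
        f'<meta name="DC.{el.capitalize()}" content="{escape(val)}">' for el, val in pairs
    )


def to_xml(pairs):
    """XML encoding: one dc:-prefixed element per value under a single root."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for el, val in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{el}").text = val
    return ET.tostring(root, encoding="unicode")


html_out = to_html_meta(record)
xml_out = to_xml(record)
print(html_out)
print(xml_out)
```

Only the serialization functions differ; the record itself is shared, which is exactly what the comparison on the slide illustrates.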

  22. Dublin Core (DC): Features • All fields are repeatable • All fields are optional • All fields are extensible by adding qualifiers • Qualifiers serve two functions: • 1. Refinement (narrows the meaning of an element; more specific) • 2. Scheme (identifies the standard, such as a controlled vocabulary or identifier system, that the value follows)

  23. Qualifier by Refinement • <META NAME="DC.TITLE" CONTENT="Making sense of college grades"> • <META NAME="DC.TITLE.SUBTITLE" CONTENT="Why the grading system does not work and what can be done about it">

  24. Qualifier by Scheme • <META NAME="DC.IDENTIFIER" SCHEME="ISBN" CONTENT="0875896871"> • <META NAME="DC.SUBJECT" SCHEME="LCSH" CONTENT="Grading and marking (Students)"> • <META NAME="DC.SUBJECT" SCHEME="LCC" CONTENT="LB2368"> • [alternatively LCCS] • International Standard Book Number: a unique ten-digit number (13 digits since 2007) assigned to each edition of a published book • Library of Congress Classification System: a system for classifying and arranging books in libraries, adopted by most of the nation's research and academic libraries

  25. Qualifiers by both Refinement & Scheme • <META NAME="DC.CREATOR.PERSONAL" SCHEME="LCNAF" CONTENT="Eison, James A., 1950-"> • [alternatively LCNH] • <META NAME="DC.RELATION.HasFormat" SCHEME="PURL" CONTENT="http://purl.access.gpo.gov/GPO/LPS2878"> • LCNAF (Library of Congress Name Authority File): a comprehensive controlled vocabulary (an established list of preferred terms, often with cross-references), primarily of names and jurisdictions, used by thousands of institutions to describe and index persons or bodies who are the subject of, or are responsible for the intellectual content of, library and archival material • PURL: Persistent Uniform Resource Locator
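Refinement and scheme qualifiers are simply two optional extensions of the same tag. A minimal sketch, assuming the uppercase `DC.ELEMENT.REFINEMENT` naming and the `SCHEME` attribute used on slides 23 to 25; `dc_meta` is a hypothetical helper, not part of any Dublin Core tool.

```python
from html import escape


def dc_meta(element, content, refinement=None, scheme=None):
    """Render one (optionally qualified) Dublin Core <META> tag."""
    name = f"DC.{element.upper()}"
    if refinement:
        # Refinement narrows the element's meaning, e.g. CREATOR -> CREATOR.PERSONAL.
        name += f".{refinement}"
    # Scheme names the standard the value follows, e.g. LCNAF, LCSH, ISBN.
    scheme_attr = f' SCHEME="{escape(scheme)}"' if scheme else ""
    return f'<META NAME="{name}"{scheme_attr} CONTENT="{escape(content)}">'


print(dc_meta("title", "Making sense of college grades"))
print(dc_meta("creator", "Eison, James A., 1950-", refinement="PERSONAL", scheme="LCNAF"))
print(dc_meta("relation", "http://purl.access.gpo.gov/GPO/LPS2878",
              refinement="HasFormat", scheme="PURL"))
```

Because both qualifiers default to `None`, an unqualified element and a doubly qualified one come from the same call.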

  26. DC.RELATION • IsVersionOf -- revision or different expression • HasVersion -- revision or different expression • HasFormat -- different manifestation • IsFormatOf -- different manifestation • IsReplacedBy -- e.g., a journal title is changed to [new] • Replaces -- e.g., a new journal title replaces [old] • IsPartOf -- a work in a series, a serial, or a larger work • HasPart -- works/parts in a series/serial/work • IsReferencedBy -- cited by another work(s) • References -- works cited in the work
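The relation refinements come in inverse pairs, so a statement recorded on one resource implies a matching statement on the other. A sketch of that symmetry; the resource labels in the usage line are hypothetical.

```python
# Each DC.RELATION refinement listed above, paired with its inverse.
INVERSE = {
    "IsVersionOf": "HasVersion",
    "IsFormatOf": "HasFormat",
    "IsReplacedBy": "Replaces",
    "IsPartOf": "HasPart",
    "IsReferencedBy": "References",
}
# Make the table symmetric so either member of a pair can be looked up.
INVERSE.update({v: k for k, v in list(INVERSE.items())})


def inverse_statement(subject, relation, obj):
    """Turn 'subject relation obj' into the equivalent statement seen from obj."""
    return (obj, INVERSE[relation], subject)


print(inverse_statement("Microfiche edition", "IsFormatOf",
                        "Statistical abstract of the United States"))
```

Recording only one side and deriving the other keeps two catalog records from drifting out of sync.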

  27. DC.SUBJECT • Keywords approach • author-assigned uncontrolled terms • Controlled vocabulary approach • terms from LCSH, MeSH, ERIC, etc. • Classification approach • classes from DDC, LCC, UDC, etc. • ERIC: Education Resources Information Center, a federally funded information clearinghouse that publishes an index, with abstracts, of journal articles and unpublished research reports in education and related fields

  28. Bibliographic Processes • Card catalog → MARC 21 (ISO 2709) • MAchine-Readable Cataloging -- numeric-based tags • bibliographic format • description of item; main entry/added entries; subject headings; classification or call number • authority format (name and subject headings) • holdings format • community information format • classification data format

  29. Some Rules • 0XX (control information, numbers, codes) • 1XX fields (main entry) • 2XX (titles, edition, imprint: in general, the title, statement of responsibility, edition, and publication information) • 3XX (physical description, etc.) • 4XX fields (series statements) • 5XX (notes) • 6XX fields (subject added entries) • 7XX fields (added entries other than subject or series) • 8XX fields (series added entries: other authoritative forms) • 9XX fields (reserved for local use: used by vendors, systems, or individuals to exchange additional data) • X00 personal names • X10 corporate names • X11 meeting names • X30 uniform titles • X40 bibliographic titles • X50 topical terms • X51 geographic names • For example, 610: a subject heading that is a corporate name
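The tag patterns above are regular enough to decode mechanically. A minimal sketch, assuming three-digit tags and applying the X00/X10/... name-type pattern only to the 1XX, 4XX, 6XX, 7XX, and 8XX heading fields; `describe_tag` is a hypothetical helper, not part of any MARC library.

```python
# First digit -> field group, following the rules listed above.
FIELD_GROUPS = {
    "0": "control information, numbers, codes",
    "1": "main entry",
    "2": "titles, edition, imprint",
    "3": "physical description",
    "4": "series statement",
    "5": "notes",
    "6": "subject added entry",
    "7": "added entry other than subject or series",
    "8": "series added entry",
    "9": "local use",
}
# Last two digits -> type of heading (meaningful for heading fields only).
HEADING_TYPES = {
    "00": "personal name",
    "10": "corporate name",
    "11": "meeting name",
    "30": "uniform title",
    "40": "bibliographic title",
    "50": "topical term",
    "51": "geographic name",
}


def describe_tag(tag):
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a MARC tag: {tag!r}")
    group = FIELD_GROUPS[tag[0]]
    kind = HEADING_TYPES.get(tag[1:]) if tag[0] in "14678" else None
    return f"{group}: {kind}" if kind else group


for t in ("100", "245", "610", "650", "700", "900"):
    print(t, "->", describe_tag(t))
```

The two lookups mirror the two halves of the rule: the hundreds digit says what role the field plays, the last two digits say what kind of heading it holds.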

  30. Some More Rules • Title and statement of responsibility • 245 $a Title proper $h [general material designation] : $b other title information / $c statement of responsibility ; subsequent statement of responsibility • Edition • 250 $a Edition statement / $b statement of responsibility relating to the edition • Publication, distribution, etc. • 260 $a Place of publication : $b Name of publisher, $c Date of publication • Series • 440 - Series title, $x ISSN ; $v volume number • 490 0 Series title [not traced: no added entry is made] • 490 1 Series title as found in the item [the 490 itself is not used for the added entry] • 830 - Authorized form of series title as added entry

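  31. Raccoons and ripe corn • 245 10 $a: 245 is the tag; the first indicator (1) means a separate title added entry should be made • 520 ##: # marks a blank (undefined) indicator position • “The” Emperor’s New Clothes: non-filing characters; the second indicator of 245 gives how many leading characters (4 for “The ”) to skip in filing • Delimiter ($): separator of subfields • 1XX: the 100s tags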
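The anatomy above (tag, indicators, delimited subfields) can be read by a small parser. A sketch assuming the space-separated layout used on these slides, with `#` or `_` written for a blank indicator and `$` as the delimiter; `parse_field` and `filing_title` are hypothetical helpers, and a real MARC reader would work from the ISO 2709 record structure instead. The second title is an illustrative record only.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    tag: str
    ind1: str
    ind2: str
    subfields: list = field(default_factory=list)  # (code, value) pairs, in order


def parse_field(line, delimiter="$"):
    """Parse a slide-style field such as '245 10 $a Title / $c Author.'."""
    tag, indicators, rest = line.split(" ", 2)
    # '#' and '_' are the slides' ways of writing a blank indicator.
    ind1, ind2 = (" " if c in "#_" else c for c in indicators)
    subfields = []
    for chunk in rest.split(delimiter)[1:]:
        subfields.append((chunk[0], chunk[1:].strip()))  # code letter, then value
    return DataField(tag, ind1, ind2, subfields)


def filing_title(f):
    """Skip the number of non-filing characters given by the 245 second indicator."""
    title = dict(f.subfields)["a"]
    skip = int(f.ind2) if f.ind2.isdigit() else 0
    return title[skip:]


raccoons = parse_field("245 10 $a Raccoons and ripe corn / $c Jim Arnosky.")
emperor = parse_field("245 14 $a The emperor's new clothes")  # hypothetical record
print(raccoons.subfields)
print(filing_title(emperor))  # -> emperor's new clothes
```

This simple split breaks if a value itself contains `$` (e.g., a price), which is one reason real MARC uses a dedicated delimiter character.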

  32. Main Entry Card / Shelflist Card
    LB2368 .M57 1986
    Milton, Ohmer.
    Making sense of college grades / Ohmer Milton, Howard R. Pollio, James A. Eison ; foreword by Laura Bornholdt. -- 1st ed. -- San Francisco : Jossey-Bass, 1986.
    xxii, 287 p. : ill. ; 24 cm. -- (Jossey-Bass higher education series)
    Half-title: Making sense of college grades : why the grading system does not work and what can be done about it.
    Includes bibliographical references (p. 271-280) and index.
    0-87589-687-1
    1. Grading and marking (Students). 2. College credits. I. Pollio, Howard R. II. Eison, James A., 1950- III. Title. IV. Series.

  33. MARC Bibliographic Record (coded directly from the previous card record)
    020 0875896871 [ISBN]
    050 00 LB2368 ‡b .M57 1986
    100 1_ Milton, Ohmer.
    245 10 Making sense of college grades / ‡c Ohmer Milton, Howard R. Pollio, James A. Eison ; foreword by Laura Bornholdt.
    250 1st ed.
    260 San Francisco : ‡b Jossey-Bass, ‡c 1986.
    300 xxii, 287 p. : ‡b ill. ; ‡c 24 cm.
    440 _4 The Jossey-Bass higher education series
    500 Half-title: Making sense of college grades : why the grading system does not work and what can be done about it.
    504 Includes bibliographical references (p. 271-280) and index.
    650 _0 Grading and marking (Students)
    650 _0 College credits.
    700 1_ Pollio, Howard R.
    700 1_ Eison, James A., ‡d 1950-

  34. MARC Record: added entries, plus an added title entry (246) for a title variation
    020 0875896871
    050 00 LB2368 ‡b .M57 1986
    100 1_ Milton, Ohmer.
    245 10 Making sense of college grades / ‡c Ohmer Milton, Howard R. Pollio, James A. Eison ; foreword by Laura Bornholdt.
    246 13 ‡a Why the grading system does not work and what can be done about it
    250 1st ed.
    260 San Francisco : ‡b Jossey-Bass, ‡c 1986.
    300 xxii, 287 p. : ‡b ill. ; ‡c 24 cm.
    440 _4 The Jossey-Bass higher education series
    500 Half-title: Making sense of college grades : why the grading system does not work and what can be done about it.
    504 Includes bibliographical references (p. 271-280) and index.
    650 _0 Grading and marking (Students)
    650 _0 College credits.
    700 1_ Pollio, Howard R.
    700 1_ Eison, James A., ‡d 1950-

  35. (“SIGNPOSTS”) DATA → MARC TAGS/fields
    1. Main entry, personal name with a single surname: the name: Arnosky, Jim. → 100 1# $a
    2. Title and statement of responsibility area (pick up title for a title added entry, file under "Ra..."):
       Title proper: Raccoons and ripe corn → 245 10 $a
       Statement of responsibility: Jim Arnosky. → $c
    3. Edition area: edition statement: 1st ed. → 250 ## $a
    4. Publication, distribution, etc., area:
       Place of publication: New York : → 260 ## $a
       Name of publisher: Lothrop, Lee & Shepard Books, → $b
       Date of publication: c1987. → $c

  36. (“SIGNPOSTS”) DATA → MARC TAGS
    5. Physical description area:
       Pagination: 25 p. : → 300 ## $a
       Illustrative matter: col. ill. ; → $b
       Size: 26 cm. → $c
    6. Note area:
       Summary: Hungry raccoons feast at night in a field of ripe corn. → 520 ## $a
    7. Subject added entries, from the Library of Congress subject heading list for children:
       Topical subject: Raccoons. → 650 #1 $a
    8. Local call number: 599.74 ARN → 900 ## $a
    9. Local barcode number: 8009 → 901 ## $a
    10. Local price: $15.00 → 903 ## $a

  37. Some Rules • 1XX fields (main entries) • 4XX fields (series statements) • 6XX fields (subject headings) • 7XX fields (added entries other than subject or series) • 8XX fields (series added entries) • X00 personal names • X10 corporate names • X11 meeting names • X30 uniform titles • X40 bibliographic titles • X50 topical terms • X51 geographic names • For example, 610: a subject heading that is a corporate name

  38. A MARC Authority Record
    010 __ ‡a n 97011884
    035 __ ‡a (DLC)n 97011884
    040 __ ‡a DLC ‡c DLC ‡d DLC
    100 1_ ‡a Bilal, Dania, ‡d 1956-
    400 1_ ‡a Meghabghab, Dania Bilal, ‡d 1956-
    670 __ ‡a Automating media centers and small libraries, 1997: ‡b CIP t.p. (Dania Bilal Meghabghab; Valdosta State Univ.) CIP data sheet (b. Sept. 14, 1956)
    670 __ ‡a Automating media centers and small libraries, 2002: ‡b CIP t.p. (Dania Bilal; Assistant Professor, School of Information Sciences, University of Tennessee-Knoxville)
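Authority control is what makes the 400 field useful: a variant form of a name can be redirected to the authorized 100 heading. A minimal sketch that holds this one record in a hypothetical in-memory structure, with each heading's subfields flattened into a single string; real systems query an authority file such as LCNAF.

```python
# Authorized heading (1XX) and see-from variants (4XX) from the record above.
AUTHORITY_FILE = [
    {
        "control_number": "n 97011884",                   # 010 $a
        "heading": "Bilal, Dania, 1956-",                 # 100 1_ $a $d
        "see_from": ["Meghabghab, Dania Bilal, 1956-"],   # 400 1_ $a $d
    },
]


def build_index(records):
    """Map every authorized or variant form to its authorized heading."""
    index = {}
    for rec in records:
        index[rec["heading"].casefold()] = rec["heading"]
        for variant in rec["see_from"]:
            index[variant.casefold()] = rec["heading"]
    return index


def authorized_form(name, index):
    # Fall back to the name as entered when no authority record matches.
    return index.get(name.casefold(), name)


idx = build_index(AUTHORITY_FILE)
print(authorized_form("Meghabghab, Dania Bilal, 1956-", idx))  # -> Bilal, Dania, 1956-
```

Case-folding the lookup keys is a simplification; actual heading matching also normalizes punctuation and diacritics.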

  39. MARC Record in XML
    <?xml version="1.0"?>
    <record xmlns="http://www.loc.gov/MARC21/slim">
      <!-- ... -->
      <datafield tag="100" ind1="1" ind2=" ">
        <subfield code="a">Milton, Ohmer.</subfield>
      </datafield>
      <datafield tag="245" ind1="1" ind2="0">
        <subfield code="a">Making sense of college grades /</subfield>
        <subfield code="c">Ohmer Milton, Howard R. Pollio, James A. Eison ; foreword by Laura Bornholdt.</subfield>
      </datafield>
      <!-- ... -->
      <datafield tag="700" ind1="1" ind2=" ">
        <subfield code="a">Eison, James A.</subfield>
        <subfield code="d">1950-</subfield>
      </datafield>
    </record>
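The MARCXML structure can be produced from tag/indicator/subfield data with Python's `xml.etree.ElementTree`. A sketch using the three fields shown above; the `(tag, ind1, ind2, subfields)` tuple layout is an assumption for illustration. Blank indicators are written as a single space, which is how MARCXML represents them.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# (tag, ind1, ind2, [(code, value), ...]); a blank indicator is a single space.
marc_fields = [
    ("100", "1", " ", [("a", "Milton, Ohmer.")]),
    ("245", "1", "0", [("a", "Making sense of college grades /"),
                       ("c", "Ohmer Milton, Howard R. Pollio, James A. Eison ; "
                             "foreword by Laura Bornholdt.")]),
    ("700", "1", " ", [("a", "Eison, James A."), ("d", "1950-")]),
]


def to_marcxml(fields):
    """Build a MARCXML <record> in the default MARC21/slim namespace."""
    ET.register_namespace("", MARC_NS)  # serialize as xmlns="..." without a prefix
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, ind1, ind2, subfields in fields:
        df = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                           {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


xml_text = to_marcxml(marc_fields)
print(xml_text)
```

The control fields and leader that a complete record needs are omitted here, just as they are elided on the slide.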

  40. Metadata Crosswalks • A crosswalk is a mapping of the elements, semantics, and syntax from one metadata scheme to those of another. For example: • MARC to MARCXML • MARC to/from MODS • MARC to/from DC • DC to/from MODS

  41. Dublin Core & MARC Mapping (details are still under development)
    DC element → MARC fields
    1. Title → 130; 245 $a, $b; 246; 740 (old records)
    2. Creator → 100; 110; 111; 245 $c; 700; 710
    3. Subject → 600 (personal); 610 (corporate); 650 (topic); 651 (place)
    10. Identifier → 010 (LCCN); 020 (ISBN); 022 (ISSN); 856 (URL)
    13. Relation → 4XX (IsPartOf); 740 (HasPart); 530 (HasFormat)
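At its simplest, a crosswalk is a lookup table from source fields to target elements. A sketch covering only whole-field Title, Creator, Subject, and Identifier tags from this mapping; subfield-level rules (such as 245 $c to Creator), the Relation refinements, and cleanup of ISBD punctuation are deliberately left out, and `crosswalk` is a hypothetical helper. The sample record is abridged from slide 33.

```python
# Subset of the MARC -> Dublin Core mapping (tag -> DC element).
MARC_TO_DC = {
    "130": "title", "245": "title", "246": "title",
    "100": "creator", "110": "creator", "111": "creator",
    "700": "creator", "710": "creator",
    "600": "subject", "610": "subject", "650": "subject",
    "010": "identifier", "020": "identifier", "022": "identifier", "856": "identifier",
}


def crosswalk(marc_fields):
    """marc_fields: list of (tag, text). Returns DC element -> list of values."""
    dc = {}
    for tag, text in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # unmapped field is dropped: information can be lost in a crosswalk
        dc.setdefault(element, []).append(text)
    return dc


record = [
    ("020", "0875896871"),
    ("100", "Milton, Ohmer."),
    ("245", "Making sense of college grades"),
    ("300", "xxii, 287 p. : ill. ; 24 cm."),
    ("650", "Grading and marking (Students)"),
    ("650", "College credits."),
    ("700", "Pollio, Howard R."),
    ("700", "Eison, James A., 1950-"),
]
result = crosswalk(record)
print(result)
```

The dropped 300 field shows why crosswalks between a rich scheme (MARC) and a simpler one (DC) are lossy in one direction.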

  42. Exercise 2 • Metadata Records Objectives • to interpret cataloging records (content) • to encode using a coding scheme • to understand metadata crosswalks (mapping records between metadata schemes) • Record Analysis • For each of the two records shown in this assignment, identify the basic data elements following Chan’s example on pages 15 and 16. Make sure you identify the main entry and all the added entries. (There is an error in the MARC record; correct it accordingly.) Turn in your analysis. For the tags used in MARC records, visit MARC 21 at http://www.loc.gov/marc/bibliographic/ • MARC Encoding • Encode the card catalog record into MARC format. You may use $ for ‡. Some fields (tags) use indicators. All subfield codes should be used appropriately, but $a can be omitted. For semantics and syntax, visit the site above.

  43. Exercise 2 • Metadata crosswalk • Map (translate) the cataloging records representing 2 works to Dublin Core records (ANSI/NISO Z39.85 standard) in HTML format, including only the following DC elements: • DC.title • DC.creator [refine & scheme] • DC.subject [refine & scheme] • DC.publisher • DC.contributor • DC.date • DC.identifier • DC.relation [refine] • For example: <meta name="DC.creator.personal" scheme="LCNAF" content=" "> • <meta name="DC.subject.topic" scheme="LCSH" content=" "> • <meta name="DC.relation.HasFormat" content=" "> • <meta name="DC.relation.IsPartOf" content=" ">

  44. Exercise 2 • What to turn in? • Your analysis similar to Chan's example with revision accordingly • One MARC record • Two Dublin Core records (selected elements) in HTML format

  45. Record #1: A Card Record

  46. Record #2: A MARC Record
    [FIXED FIELDS OMITTED]
    070 0 ‡a HA202.S82
    110 2 ‡a U.S. Census Bureau
    245 10 ‡a Statistical abstract of the United States / ‡c US Census Bureau.
    260 ‡a Washington, DC: ‡b G.P.O., ‡c 1879-
    300 ‡a v. : ‡b ill. ; ‡c 24cm.
    310 ‡a Annual
    362 ‡a 1st no. (1878)-
    500 ‡a “March 13, 1998.”
    530 ‡a Available on microfiche from W.S. Hein.
    530 ‡a Available on CD-ROM
    538 ‡a Mode of access: World Wide Web.
    651 0 ‡a United States ‡v Statistics ‡v Periodicals.
    856 41 ‡u http://purl.access.gpo.gov/GPO/LPS2878

  47. Impromptu Quiz • What is the difference between expressions and manifestations of the same work? • Why is metadata important for a library and/or information professional? • Identify a metadata scheme. • Why are there so many metadata schemes?

  48. Impromptu Quiz • What are the semantics and syntax of a metadata scheme? • What are metadata crosswalks? • What are the different formats for encoding metadata? • What processes are involved in cataloging procedures for creating bibliographic records?

  49. Impromptu Quiz • What new trends related to metadata are observable in contemporary IR systems? • What are the implications of the following for metadata: • distributed environments • heterogeneous work • networked systems • electronic environments • information settings • What is the importance of qualifiers (refinement, scheme, relation) in encoding?

  50. Critical Reflection 6 • You are provided an OPAC record. Create a MARC record for the same document. • You are provided a MARC record. Create an OPAC record for the same document. • You are provided a record from the card catalog, an OPAC record, and a MARC record for the same document. What differences are there in the ways that these three individually may provide access to the document? Which one do you most prefer? Which one do you least prefer? Why? • Share your thoughts about what you found most/least memorable in the above activities.