
Toward a post-MARC view of bibliographic metadata

Jean Godby, Senior Research Scientist. Triangle Research Libraries Network workshop, Chapel Hill, North Carolina, March 15, 2012.


Presentation Transcript


  1. Toward a post-MARC view of bibliographic metadata Jean Godby, Senior Research Scientist Triangle Research Libraries Network workshop -- Chapel Hill, North Carolina March 15, 2012

  2. Outline for today • How did I get to this place? • The Library of Congress Bibliographic Framework for Digital Resources • The OCLC ‘Beyond MARC’ work agenda • Four guiding assumptions • Some questions

  3. Translations in the Crosswalk service • Inputs: OCLC MARC, DC-Qualified, ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, MARC • Hub (center of the diagram): OCLC MARC • Outputs: OCLC MARC, DC-Qualified, ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, MARC

  4. Problems with mapping to and from MARC • Problem: In a MARC record, some critical information is represented redundantly. Effect on the Crosswalk: This requires one-to-many mappings, which are semantically opaque and difficult to maintain. • Problem: Some MARC fields are ambiguous. Effect on the Crosswalk: The distinctions are difficult to recover or may be lost. • Problem: Many MARC free-text fields have formatting requirements. Effect on the Crosswalk: The formatting must be added in (and taken out).

  5. And so forth… and so on • Problem: Many formatting requirements are explicitly stated only in cataloging rules, not in the data that is algorithmically processed. Effect on the Crosswalk: Knowledge of the cataloging rules must be embedded in the translation software. • Problem: Some MARC fields are coded with hidden assumptions. Effect on the Crosswalk: Knowledge of the hidden assumptions must be embedded in the translation software, which requires complex and brittle Boolean logic. • Problem: MARC has a “long tail.” Effect on the Crosswalk: It is necessary to maintain a large number of mappings that are not used.
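To make the "added in (and taken out)" point concrete, here is a minimal Python sketch of the kind of formatting logic a translation has to carry. The punctuation pattern and both function names are assumptions made for illustration; they are not the Crosswalk service's rules and cover only a small slice of cataloging punctuation.

```python
import re

# One trailing separator of the kind often found at the end of MARC subfields.
# Illustrative assumption only: real cataloging rules are far more detailed.
TRAILING_PUNCT = re.compile(r"\s*[/:;,=]\s*$")


def strip_trailing_punctuation(value: str) -> str:
    """Take the formatting out: drop one trailing separator and stray spaces."""
    return TRAILING_PUNCT.sub("", value).strip()


def add_title_punctuation(title: str, has_responsibility: bool) -> str:
    """Put the formatting back: ' /' precedes a statement of responsibility."""
    return f"{title} /" if has_responsibility else title


# Mapping out of MARC removes the separator; mapping back in restores it.
title_out = strip_trailing_punctuation("McBain's ladies /")
title_back = add_title_punctuation(title_out, has_responsibility=True)
```

The rule for when to restore the separator lives in the code, not in the data, which is the maintenance burden the slide describes.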

  6. MARC’s complexity needs to be quarantined. • Inputs: OCLC MARC, DC-Qualified, ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, MARC • Hub (center of the diagram): RDA or other structured metadata vocabulary • Outputs: OCLC MARC, DC-Qualified, ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, MARC
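The hub arrangement on this slide can be sketched in a few lines of Python. Everything here is hypothetical: the hub keys (`title`, `creator`), the tiny `dc` and `onix` mappers, and the function names are assumptions chosen for illustration, not the Crosswalk service's actual model.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, str]
Mapper = Callable[[Record], Record]

# Each format needs only one mapping into the hub vocabulary...
TO_HUB: Dict[str, Mapper] = {
    "onix": lambda r: {"title": r["TitleText"], "creator": r["PersonNameInverted"]},
    "dc": lambda r: {"title": r["title"], "creator": r["creator"]},
}

# ...and one mapping out of it.
FROM_HUB: Dict[str, Mapper] = {
    "onix": lambda h: {"TitleText": h["title"], "PersonNameInverted": h["creator"]},
    "dc": lambda h: {"title": h["title"], "creator": h["creator"]},
}


def translate(record: Record, source: str, target: str) -> Record:
    """Translate by going source -> hub -> target, never source -> target directly."""
    return FROM_HUB[target](TO_HUB[source](record))


def mappings_needed(n_formats: int) -> Tuple[int, int]:
    """Return (direct pairwise mappings, hub mappings) for n formats."""
    return n_formats * (n_formats - 1), 2 * n_formats


onix_record = {"TitleText": "McBain's Ladies", "PersonNameInverted": "Hunter, Evan"}
dc_record = translate(onix_record, "onix", "dc")
```

For the seven formats listed on the slide, `mappings_needed(7)` gives 42 directed pairwise mappings against 14 through a hub. The sketch says nothing about how hard each hub mapping is to write, which is where the choice of hub vocabulary matters.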

  7. In other words, with MARC in the center of our model… Despite the hundreds of millions of mappings that have been performed on OCLC’s bibliographic data, it is still locked up in a legacy system. The mapping problem is complex largely because of the need to support MARC. It is still too difficult to define and implement mappings. So what is the alternative?

  8. “Bibliographic framework is... an environment rather than a ‘format’” “The new bibliographic framework we are aiming for will broaden participation in the network of resources, librarians will be able to do a much better job of linking their patrons to resources of all kinds (from the library and from many other sources), and costs can be better contained.” -- Library of Congress A Bibliographic Framework for the Digital Age (October 31, 2011)

  9. OCLC’s ‘Beyond MARC’ research agenda (word cloud of themes): theme, data, semantic, identifier, RDF, object, library, RDA, legacy, instance, UML, content, transformation, service, role, entity, abstract, property, carrier, format, hadoop, manifestation, statement, web, linked, groundtruthing, schema, beyond, relationship, model, FRBR, authority, resource, MARC, description

  10. The OCLC “Beyond MARC” research agenda: who’s involved • Eric Childress, Consulting Product Manager • Jean Godby, Senior Research Scientist • Thom Hickey, Chief Scientist • Devon Smith, Consulting Software Engineer • Karen Smith-Yoshimura, Program Officer • Roy Tennant, Senior Program Officer • Diane Vizine-Goetz, Senior Research Scientist • Jeff Young, Software Architect

  11. Assumption 1 There are many moving targets

  12. The OCLC Research response: Some guiding principles • Don’t add to the complexity. • Use publicly defined standards wherever possible. • Leverage the work of others. • Focus on data preparation, cleanup, and modeling that will support a variety of formats.

  13. Data preparation: principles • Make your stuff available on the web. • Make it available as structured data… • …in a non-proprietary format. • Use URLs to identify things. • Link your data to other people’s data. (Source: W3C) • Data, not text • Identifiers, not strings • Statements, not records • Machine-readable schema • Machine-readable lists (Source: Karen Coyle)

  14. Assumption 2: Most bibliographic metadata will not be created by libraries

  15. Why ONIX is interesting
      ONIX record (closing tags abbreviated as on the slide; the record is cut off after the subject heading):
      <Product>
        <RecordReference>0892962844</>
        <ProductIdentifier>
          <ProductIDType>02</>
          <IDValue>0892962852</>
        </ProductIdentifier>
        <ProductForm>BB</>
        <Title>
          <TitleType>01</>
          <TitleText>McBain’s Ladies</>
        </Title>
        <Contributor>
          <ContributorRole>A01</>
          <PersonNameInverted>Hunter, Evan</>
        </Contributor>
        <Subject>
          <SubjectSchemeIdentifier>02</>
          <SubjectHeadingText>Policewomen--Fiction.
      MARC record:
      Leader 00000 jm a22000005 4500
      008 g eng
      020 $a0892962852
      100 $aHunter, Evan
      245 $aMcBain’s ladies
      260 $bMysterious Press$d1988
      300 $a320 p.
      650 #2$aPolicewomen -- Fiction
      Callouts on the slide mark “a record” and “a statement” and tag individual values as identifier, data, string, or text.

  16. A hypothetical bibliographic description expressed as linked data
      <Product>
        <RecordReference>http://uri/recordID/0892962844</>
        <ProductIdentifier>http://uri/identifierisbn0892962852</>
        </ProductIdentifier>
        <ProductForm>http://uri:/format/paperback</>
        <Title>http://uri/title/primaryTitle/McBain’sLadies</>
        </Title>
        <Contributor>
          <ContributorRole>A01</>
          <Person>http://uri/person/Hunter, Evan</>
        </Contributor>
        <Subject>http://uri/subject/LCSH/Policewomen--Fiction.
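As a rough illustration of "identifiers, not strings" and "statements, not records", the Python sketch below breaks a flat description into subject-predicate-object statements whose parts are all URIs. The `http://example.org` namespace, the URI patterns, and the function names are placeholders assumed for this example; they are not taken from the slides or from any published vocabulary.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]

BASE = "http://example.org"  # placeholder namespace, assumed for illustration


def to_uri(kind: str, value: str) -> str:
    """Mint an identifier for a value instead of passing the bare string around."""
    return f"{BASE}/{kind}/{quote(value)}"


def record_to_statements(record_id: str, fields: Dict[str, str]) -> List[Triple]:
    """Split one record into independent statements about the same subject."""
    subject = to_uri("record", record_id)
    return [
        (subject, to_uri("property", name), to_uri(name, value))
        for name, value in fields.items()
    ]


statements = record_to_statements(
    "0892962844",
    {"isbn": "0892962852", "creator": "Hunter, Evan"},
)
for statement in statements:
    print(statement)
```

Each statement can be published, linked, or corrected on its own, which a record-at-a-time exchange format does not allow. Minting a URI from a name string is only a stand-in here; in practice the identifier would come from an authority source.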

  17. This list is inadequate for describing the range of material types held by libraries.

  18. Some proposed “library” extensions to Schema.org.

  19. The extensions are derived from MARC data for the WorldCat search interface.

  20. The WorldCat search interface terms reduce a complex MARC concept space to a list.

  21. Assumption 3: MARC will be around for a while. Assumption 4: Mapping is still necessary.

  22. A publishing model • Inputs: OCLC MARC, DC-Qualified, ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, MARC • Center of the diagram: Standard Vocabularies, Raw Data, OCLC Abstract Model, RDA or other structured metadata vocabulary • Connector labels: model, map • Outputs: OCLC MARC, DC-Qualified, ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, MARC

  23. The concepts must be extracted. It is not enough to RDF-ify MARC. They eventually emerge.

  24. Some (perhaps uncomfortable) questions • How much work will be involved in building out the abstract model? What is the value proposition? • How can we engage communities of practice to contribute to the parts of the abstract model that describe their resources? • How will mappings be implemented in the post-MARC information landscape? • How much information in the MARC record will get lost? • What will content standards look like in post-MARC descriptions? • How many of the FRBR and RDA concepts are algorithmically recoverable from legacy data? • What happens if linked data does not live up to its promise or is not adopted quickly enough?

  25. Set-theoretic mappings can be implemented elegantly in RDF/OWL. But maps from many MARC concepts look like this.
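The slide's image is not reproduced in this transcript, so the Python sketch below is only a guess at the contrast it draws. A set-theoretic mapping is a plain lookup from one term to an equivalent one, whereas a MARC-derived concept often has to be inferred from several coded positions at once. The `hub:` terms and the material-type labels are hypothetical; the Leader positions used (06 for type of record, 07 for bibliographic level) follow MARC 21.

```python
# Set-theoretic case: one term is simply declared equivalent to another.
EQUIVALENT_TERMS = {
    "dc:title": "hub:title",
    "dc:creator": "hub:creator",
}


def map_term(term: str) -> str:
    """A mapping that is nothing more than a table lookup."""
    return EQUIVALENT_TERMS[term]


def material_type(leader: str) -> str:
    """Infer a display label from coded Leader positions (a few cases only)."""
    record_type, bib_level = leader[6], leader[7]
    if record_type == "a" and bib_level == "m":
        return "Book"
    if record_type == "a" and bib_level == "s":
        return "Serial"
    if record_type == "j":
        return "Music recording"
    if record_type == "i":
        return "Spoken-word recording"
    return "Unknown"


label = material_type("00000nam a2200000 a 4500")
```

Even this toy version needs ordered conditionals and a fallback. A fuller version would also have to consult 008 and other fields, which is where the Boolean logic becomes brittle.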

  26. References • Coyle, Karen. 2011. MARC 21 as data: a start. http://journal.code4lib.org/articles/5468 • ---. 2012. Taking library data from here to there. http://lists.w3.org/Archives/Public/public-esw-thes/2012Feb/0001.html • Godby, Carol Jean. 2010. From records to streams: merging library and publisher metadata. http://dcpapers.dublincore.org/ojs/pubs/article/view/1033 • Library of Congress. 2011. A bibliographic framework for the digital age. http://www.loc.gov/marc/transition/news/framework-103111.html • Library Linked Data Incubator Group final report. 2011. http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/ • OCLC. 2012. FAST Linked Data. http://experimental.worldcat.org/fast/ • Schema.org. 2012. http://schema.org/ • Smith-Yoshimura, Karen, et al. 2010. Implications of MARC tag usage on library metadata practices. http://www.oclc.org/research/publications/library/2010/2010-06.pdf

  27. Thank you!
