
From ONIX to MARC and Back Again: New Frontiers in Metadata Creation at OCLC


Presentation Transcript


    1. Renee Register, Global Product Manager, OCLC Cataloging & Metadata Services. From ONIX to MARC and Back Again: New Frontiers in Metadata Creation at OCLC

    2. For discussion today: The current models for library and publisher supply chain metadata creation and maintenance are not sustainable! We must move toward new paradigms for metadata creation and maintenance that include: further interoperability and shared metadata; mechanisms for allowing metadata to “grow up” over time, sharing enhancements as they are made by both library and publishing communities. LC Working Group on the Future of Bibliographic Control recommendation 1: Increase the efficiency of bibliographic production and maintenance. 1.1.1: Make more use of bibliographic data earlier in the supply chain. However, library challenges will not be solved by reliance on upstream data alone. The publisher supply chain experiences challenges in metadata creation and management, too! We need to work together to increase efficiency, consistency and accuracy in bibliographic production and maintenance for both library and publisher markets.

    3. It all starts with publisher metadata – libraries, retailers, wholesalers and consumers make decisions based on this metadata

    4. Publishers create electronic metadata for use in their own tools and systems and also share metadata with supply chain partners

    5. Metadata originates with the publisher or material provider responsible for the content. This metadata is used in various ways: print catalogs, marketing and advertising; publisher websites; publisher inventory systems; publisher business intelligence; publisher data feeds to supply chain partners, etc. Significant investment in staff and systems is required to support publisher metadata needs.

    6. The metadata ends up on publisher websites

    7. And publisher print or PDF catalogs as well as other publisher ordering/inventory systems and tools

    8. Publishers also share pre-publication electronic data with the Library of Congress for the creation of CIP records

    9. CIP Record (Publication Date April 2009)

    10. The same title on the publisher website

    11. The publisher supply chain also invests significant staff and resources in the creation, enhancement and manipulation of metadata. Aggregators pull together metadata from multiple publishers and package it in ways that can be of use to materials providers, libraries and end users: searchable websites with features that make metadata more usable; data feeds that can be ingested into supplier systems. All have staff who work on the metadata received from publishers and create metadata.

    12. The same title on the Barnes and Noble website

    13. The same title on Amazon

    14. Title data on Amazon

    15. And more Amazon metadata about this title (Note the inclusion of LCSH)

    16. Wholesalers build products and services on publisher metadata

    17. Wholesaler ordering tools pull information about content from multiple publishers into one interface

    18. Data Aggregators also collect and enhance metadata from multiple publishers

    19. The metadata can be combined with business data to assist with buying decisions

    20. Libraries use these wholesaler and aggregator tools built on publisher metadata

    21. Retailers use metadata from wholesalers and aggregators too

    22. Buying decisions (for retailers and libraries) incorporate business intelligence connected to title metadata (Source: Nielsen BookData)

    23. Libraries and retailers use metadata sliced and diced using various metadata elements (Source: Nielsen BookData)

    24. Categories are an important part of how we sort and make meaning from metadata (Source: Nielsen BookData)

    25. Publisher Supply Chain Subjects: BISAC Subject Headings

    26. Publisher Supply Chain Subjects: BISAC Subject Headings

    27. Publisher Supply Chain Subjects: BISAC Major Categories

    28. Publisher Supply Chain Subjects: BIC Standard Subject Categories (U.K.)

    29. ONIX is the international standard for the book industry

    30. And the book industry encourages best practices for metadata creation

    31. From BISG “Best Practices” document

    32. But library systems require electronic metadata in MARC format, library terminologies and library-defined input standards

    33. So, many library wholesalers maintain separate databases and cataloging staff to provide MARC to library customers

    34. The Library of Congress transforms pre-publication metadata into MARC and adds library-specific metadata to create CIP records. Library catalogers retrieve MARC records from LC and other shared resources and create MARC records – usually upon receipt of materials. Library vendors often employ catalogers (in addition to other data staff described earlier) to create MARC records and physical processing staff to perform shelf-ready services. These vendors must maintain (at least) two databases to accommodate different data formats and customer needs.

    35. Data Silos: Libraries, retailers, wholesalers and aggregators are consumers of publisher direct and publisher supply chain metadata. Parts of the publisher supply chain also use and create library metadata. But library metadata has evolved separately from publisher supply chain metadata.

    36. Putting It Back Together

    37. New Models for Creating and Sharing Metadata: Re-mix and re-use existing metadata. Increase collaboration and cooperation between library and publisher supply-chain communities. Break down barriers between metadata used for selection and acquisitions and metadata used for cataloging, discovery, business intelligence and collection management. Become more involved in upstream metadata creation processes, integrate available metadata into workflows upstream and allow the metadata to evolve over time.

    38. New Models for Creating and Sharing Metadata: Solutions must be interoperable and easily shared – inside and outside the library community. The library community must extend our expertise, as well as our cooperative and collaborative practices, to include publishers and publisher supply chain partners.

    39. “Next Generation” Cataloging and Metadata Creation Pilot: Automated capture, crosswalk and enhancement of publisher ONIX metadata. Output in MARC and ONIX to benefit both library and publishing communities. ONIX enriches MARC data and MARC enriches ONIX data. Mapping between library and publisher terminologies. OCLC pilot program with publishers, vendors and libraries.

    40. “Next Generation” Cataloging and Metadata Creation Pilot: Components of the “Next Gen” process: ONIX to MARC crosswalk; MARC to ONIX crosswalk; automated record build and add to WorldCat; enrichment software (rules and hierarchies for data mining and record enrichment using FRBR work sets); mapping between terminologies (first up – DDC/BISAC Subject Headings); output files in either MARC or ONIX.
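The ONIX to MARC crosswalk named on this slide can be pictured with a small sketch. It is not OCLC's mapping, which this transcript does not reproduce. It assumes ONIX 2.1 reference tag names, a handful of common MARC 21 fields (020, 100, 245, 260), and a made-up sample record with a placeholder ISBN. The function name `onix_to_marc_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up ONIX 2.1-style product record (placeholder values throughout).
SAMPLE_ONIX = """
<Product>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <Title>
    <TitleType>01</TitleType>
    <TitleText>An Example Title</TitleText>
  </Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonNameInverted>Author, Example</PersonNameInverted>
  </Contributor>
  <Publisher>
    <PublisherName>Example Press</PublisherName>
  </Publisher>
  <PublicationDate>200904</PublicationDate>
</Product>
"""


def onix_to_marc_fields(product_xml):
    """Return a list of (MARC tag, {subfield code: value}) pairs.

    Illustrative mapping only; a production crosswalk covers far more
    elements and applies cataloging rules this sketch ignores.
    """
    product = ET.fromstring(product_xml.strip())
    fields = []

    # ProductIDType 15 is ISBN-13 in the ONIX code lists -> MARC 020 $a.
    for ident in product.findall("ProductIdentifier"):
        if ident.findtext("ProductIDType") == "15":
            fields.append(("020", {"a": ident.findtext("IDValue")}))

    # First contributor with role A01 (author) -> MARC 100 $a.
    for contributor in product.findall("Contributor"):
        if contributor.findtext("ContributorRole") == "A01":
            name = contributor.findtext("PersonNameInverted")
            if name:
                fields.append(("100", {"a": name}))
            break

    # Title text -> MARC 245 $a.
    title = product.findtext("Title/TitleText")
    if title:
        fields.append(("245", {"a": title}))

    # Publisher name and year of publication -> MARC 260 $b and $c.
    publisher = product.findtext("Publisher/PublisherName")
    pub_date = product.findtext("PublicationDate")
    imprint = {}
    if publisher:
        imprint["b"] = publisher
    if pub_date:
        imprint["c"] = pub_date[:4]  # keep the year only
    if imprint:
        fields.append(("260", imprint))

    return fields


marc_fields = onix_to_marc_fields(SAMPLE_ONIX)
```

A MARC to ONIX crosswalk would run the same kind of table in the other direction, which is why the pilot can offer output in either format.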

    41. “Next Generation” Data Flow

    42. How are we doing?

    43. ONIX to MARC Crosswalk

    44. Example of ONIX Input

    45. OCLC’s ONIX/MARC Mapping

    46. OCLC’s ONIX/MARC Mapping

    47. Matching in WorldCat and Data Enrichment

    48. Example of Enriched ONIX

    49. Enrich Existing WorldCat Record

    50. Example of Enrichment to Existing WorldCat Record

    51. Example of Enrichment to Existing WorldCat Record (Cont.)

    52. MARC to ONIX Crosswalk

    53. Example of Enriched ONIX Output

    54. Add New Records to WorldCat and Enrich New Records

    55. We can build a basic MARC format record by mapping ONIX to MARC. But we want to automatically make that record better and more fit for use by libraries by mining existing WorldCat records. Many forthcoming or newly published titles are new iterations of existing works (new editions, paperbacks, audio books, e-books, large print, etc.). We want to make our process “think like a cataloger” for these types of titles and use the metadata in WorldCat records for earlier versions (earlier editions, hardcover editions, etc.).

    56. We will do this by mining FRBR work set records for newly added titles that link to an existing work set. The new record will be a hybrid of: publisher ONIX metadata that pertains specifically to the new work (ISBN, physical description, imprint, price, etc.) and existing MARC data that pertains to the same intellectual content (LC and Dewey Classification, LCSH, NLM, authority controlled contributor names, etc.).
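The hybrid record described on this slide can be sketched as a merge of two field sets. This is a sketch under stated assumptions, not the pilot's enrichment software. Records are modeled as plain dictionaries from MARC tag to a list of values. The split of tags into "publisher-supplied" and "work-level" groups is an assumption based on the examples the slide lists. The function name and all sample values are hypothetical placeholders.

```python
# Assumed split of MARC tags, following the examples on the slide:
# item-specific data comes from the publisher feed, while data about
# the intellectual content comes from an earlier record in the work set.
PUBLISHER_TAGS = ("020", "245", "260", "300")  # ISBN, title, imprint, physical description
WORK_TAGS = ("050", "060", "082", "100", "650", "700")  # LC, NLM, Dewey, names, subjects


def build_hybrid_record(onix_derived, work_set_record):
    """Combine a crosswalked ONIX record with an existing work-set record."""
    hybrid = {}

    # Item-specific fields always come from the publisher data.
    for tag in PUBLISHER_TAGS:
        if tag in onix_derived:
            hybrid[tag] = list(onix_derived[tag])

    # Work-level fields prefer the existing library record and fall back
    # to publisher data when the work set has nothing for that tag.
    for tag in WORK_TAGS:
        if tag in work_set_record:
            hybrid[tag] = list(work_set_record[tag])
        elif tag in onix_derived:
            hybrid[tag] = list(onix_derived[tag])

    # Return fields in tag order, as a MARC record would list them.
    return dict(sorted(hybrid.items()))


# Placeholder data: a new paperback and an earlier edition of the same work.
new_paperback = {
    "020": ["9780000000002"],
    "100": ["Author, Example"],
    "245": ["An Example Title"],
    "260": ["Example Press, 2009"],
    "300": ["320 p. ; 20 cm"],
}
earlier_edition = {
    "020": ["9780000000019"],
    "050": ["XX0000 .E93"],            # placeholder call number
    "082": ["000.0"],                  # placeholder Dewey number
    "100": ["Author, Example, 1950-"],  # authority-controlled form (made up)
    "260": ["Example Press, 2008"],
    "300": ["300 p. ; 24 cm"],
    "650": ["Example subject heading"],
}

hybrid = build_hybrid_record(new_paperback, earlier_edition)
```

In this example the hybrid keeps the new ISBN, imprint and physical description, while the classification numbers, subject heading and controlled name form are carried over from the earlier edition.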

    57. Example of “Next Gen” New Record Created from ONIX Data and Mining FRBR Work Set

    58. Map Between Terminologies: DDC/BISAC Subject Headings

    59. Example of DDC/BISAC Mapping

    60. Example of DDC/BISAC Mapping (Cont.)
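The mapping slides themselves are images and are not reproduced in this transcript, so the following is only a rough sketch of how a DDC to BISAC lookup might work. The three table entries pair broad DDC classes with BISAC "General" codes purely for illustration and are not taken from OCLC's mapping. Falling back from a specific DDC number to a shorter, broader one is an assumption of this sketch, and `suggest_bisac` is a hypothetical name.

```python
# Illustrative entries only; not the mapping used in the pilot.
DDC_TO_BISAC = {
    "004": "COM000000",    # computer science -> COMPUTERS / General
    "641.5": "CKB000000",  # cooking -> COOKING / General
    "900": "HIS000000",    # history -> HISTORY / General
}


def suggest_bisac(ddc_number):
    """Return a BISAC code for a DDC number, or None if nothing matches.

    Tries the full number first, then drops one trailing digit at a time
    so that a specific number can fall back to a broader mapped class.
    Stops at the three-digit class level.
    """
    candidate = ddc_number.strip()
    while len(candidate) >= 3:
        if candidate in DDC_TO_BISAC:
            return DDC_TO_BISAC[candidate]
        # Drop the last digit, and the decimal point if it is left dangling.
        candidate = candidate[:-1].rstrip(".")
    return None


italian_cooking = suggest_bisac("641.5945")  # falls back to the 641.5 entry
networking = suggest_bisac("004.6")          # falls back to the 004 entry
unmapped = suggest_bisac("530")              # no entry in this small table
```

A real mapping would also need to handle one-to-many relationships, since a single DDC number can correspond to several BISAC headings and the reverse.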

    61. Planned Enhancements: Additional terminologies mapping. Use WorldCat Identities in enrichment process.

    62. “Next Generation” Cataloging and Metadata Creation Pilot: Pilot wrap-up, Spring 2009. Complete statistical analysis and compile pilot partner evaluation results. Complete case studies of pilot partners. Share results with pilot partners and advisory board. Share results with library and publisher supply chain communities. Watch this space for pilot results: http://www.oclc.org/us/en/partnerships/material/nexgen/nextgencataloging.htm

    63. “Next Generation” Cataloging and Metadata Creation: Beyond the Pilot. “Productionize” the process so that it can be used routinely to ingest, create and enhance metadata in WorldCat and output enhanced data in MARC and ONIX. Add additional mappings between classification schema and terminologies.

    64. “Next Generation” Cataloging and Metadata Creation: Beyond the Pilot. Continue to refine enrichment of existing WorldCat records and the addition of records to WorldCat based on publisher data. Integrate WorldCat Identities into the process. Enhance the process based on response from library and publisher communities. Include mechanisms for ongoing delivery of record enhancements to libraries and publishers.

    65. Contact Information: Renee Register, register@oclc.org, 614-764-6107; Maureen Huss, hussm@oclc.org, 614-764-4327

    66. Questions?
