
Lecture 12 Why metadata?

CS 502 Computing Methods for Digital Libraries, Cornell University – Computer Science. Herbert Van de Sompel, herbertv@cs.cornell.edu. Lecture 12: Why metadata? Notes: Carl Lagoze on Wednesday; no lab on Friday, but Paul Ginsparg on 04/03; XML Schema & XSLT later.


Presentation Transcript


  1. CS 502 Computing Methods for Digital Libraries • Cornell University – Computer Science • Herbert Van de Sompel • herbertv@cs.cornell.edu • Lecture 12 Why metadata?

  2. Notes • Carl Lagoze on Wednesday • No Lab on Friday • But Paul Ginsparg on 04/03 • XML Schema & XSLT - later

  3. Content – Data – Metadata • Content refers to digital library materials viewed as information that is of interest to a user. • Data emphasizes the bits and bytes that a computer processes. • Metadata: data about data.

  4. Metadata – focus on description/discovery • data about data • origins in library cataloguing and A&I (abstracting and indexing) databases • now: an amplification of traditional bibliographic cataloguing practices in an electronic environment • now: any data used to aid the identification, description and location of networked electronic resources • actually, it is more than that

  5. Metadata - broader • descriptive: facilitating resource discovery and identification (record in OPAC system) • administrative: facilitating resource management within a collection (loan record in OPAC system) • structural: binding together the components of complex information objects (series title in record in OPAC system)
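To make the three categories concrete, here is a minimal sketch of one catalogued object carrying all three kinds of metadata side by side. The class and field names are hypothetical, chosen only to mirror the OPAC examples on the slide; they are not taken from any cataloguing standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DescriptiveMetadata:
    """Supports discovery and identification (cf. a record in an OPAC)."""
    title: str
    creator: str
    date: str


@dataclass
class AdministrativeMetadata:
    """Supports management within a collection (cf. a loan record)."""
    on_loan: bool = False
    borrower_id: Optional[str] = None


@dataclass
class StructuralMetadata:
    """Binds the components of a complex object together (cf. a series title)."""
    series_title: Optional[str] = None
    component_ids: List[str] = field(default_factory=list)


@dataclass
class CatalogEntry:
    """One object described by all three kinds of metadata (hypothetical model)."""
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata = field(default_factory=AdministrativeMetadata)
    structural: StructuralMetadata = field(default_factory=StructuralMetadata)


entry = CatalogEntry(
    DescriptiveMetadata(
        title="Campus strategies for libraries and electronic information",
        creator="Arms, Caroline R. (editor)",
        date="1990",
    )
)
# A circulation event changes administrative metadata only; the description is untouched.
entry.administrative.on_loan = True
```

Keeping the three parts separate reflects the slide's point that they serve different functions: a loan changes nothing about how the book is discovered.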

  6. Metadata – evolution [diagram: from descriptive/discovery metadata about library objects towards descriptive/discovery, administrative, and structural metadata about networked resources]

  7. Metadata • Traditionally stored separately from the objects it describes • For digital objects, sometimes embedded in the objects themselves (cf. KWF) • Usually a set of text fields • Textual metadata can also describe non-textual objects, e.g., software, images, music, …

  8. Metadata – why? Some methods of information discovery search descriptive metadata about the objects. Generally, it enables digital library services: • explicitly (discovery metadata) or implicitly (terms and conditions) • helps to impose order on chaos • enables automated discovery/manipulation of objects
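Searching descriptive metadata rather than the objects themselves can be sketched as a fielded search over attribute/value records. This is a minimal illustration, not a real discovery system: the `search` helper and the second record are hypothetical, and matching is a plain case-insensitive substring test. Note that the image is found purely through its textual metadata.

```python
from typing import Dict, List

Record = Dict[str, str]  # one metadata record: field name -> text value


def search(records: List[Record], field_name: str, term: str) -> List[Record]:
    """Return the records whose `field_name` contains `term` (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r.get(field_name, "").lower()]


records = [
    {"title": "Campus strategies for libraries and electronic information",
     "publisher": "Digital Press", "date": "1990"},
    # Hypothetical record: textual metadata describing a non-textual object.
    {"title": "Untitled photograph", "type": "Image"},
]

print([r["title"] for r in search(records, "title", "libraries")])
print([r["title"] for r in search(records, "type", "image")])
```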

  9. Metadata – generation (traditional) [diagram: an object is described, following cataloguing rules and consulting reference data, to produce a metadata record]

  10. Metadata – generation (traditional) • Advantage: • Human expertise leads to high-quality catalogs and indexes • Disadvantages: • Expensive ($50+ per record) • Time consuming • Requires cumbersome cataloguing rules • Slow to adapt to new formats and types of digital objects • Human cataloging and indexing is too expensive to apply to all but a small proportion of digital objects • => automatic generation of metadata
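One very simple form of automatic metadata generation is to harvest what a web page already carries: its `<title>` and any `<meta name="..." content="...">` tags. The sketch below uses only Python's standard `html.parser`; the class name, the sample page, and the choice of fields are assumptions for illustration. Real automatic generation also involves content analysis, which this does not attempt.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaTagExtractor(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, List[str]] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attributes.get("name"), attributes.get("content")
            if name and content is not None:
                self.fields.setdefault(name.lower(), []).append(content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.fields.setdefault("title", []).append(data.strip())


def extract_metadata(html: str) -> Dict[str, List[str]]:
    """Return a simple attribute/value record generated from an HTML page."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical page with Dublin Core-style names embedded in <meta> tags.
page = """<html><head>
<title>Why metadata?</title>
<meta name="DC.creator" content="Doe, Jane">
<meta name="keywords" content="metadata, cataloguing">
</head><body>...</body></html>"""

print(extract_metadata(page))
```

The output is only as good as what the page author embedded, which is why automatically generated records are usually cheaper but less reliable than catalogued ones.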

  11. Metadata – roots (library cataloguing) • Anglo-American Cataloguing Rules (AACR2): rules for what goes into each field of a catalog record • MARC format: an exchange format for catalog records • "MARC catalog": a catalog in MARC format, where the content of each field follows AACR2

  12. Citation: a monograph -- book! • Citation • Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.

  13. [Figure: a MARC record for this citation, labelling a MARC tag, a MARC field, a MARC subfield code, a MARC subfield, and a MARC indicator]

  14. [Figure: the same MARC record with its fields labelled: ISBN, title statement, imprint (location, publisher, year), collation, series title]

  15. [Figure: the raw MARC record, labelling the leader, the directory, the 001 field, and a field terminator]
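The leader, directory, and field terminators labelled above come from the ISO 2709 exchange structure that MARC records use. The sketch below builds and re-parses such a record for the citation on slide 12, assuming ASCII-only data (so string length equals octet length). The 001 control number and the indicator values are hypothetical, and the leader codes are illustrative rather than a validated MARC 21 leader.

```python
FT = "\x1e"  # field terminator
RT = "\x1d"  # record terminator
SD = "\x1f"  # subfield delimiter


def build_record(fields):
    """Serialize (tag, value) pairs into an ISO 2709-style string (ASCII assumed)."""
    directory, data = "", ""
    for tag, value in fields:
        chunk = value + FT
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(data):05d}"
        data += chunk
    directory += FT
    base = 24 + len(directory)          # base address: where the data fields begin
    total = base + len(data) + len(RT)  # record length, stored in leader[0:5]
    # Leader (24 chars): length, "nam" (status/type/level), two blanks,
    # "22" (indicator and subfield-code counts), base address, " a ", "4500".
    leader = f"{total:05d}nam  22{base:05d} a 4500"
    return leader + directory + data + RT


def parse_record(record):
    """Use the leader's base address and the directory to recover the fields."""
    leader = record[:24]
    base = int(leader[12:17])
    directory = record[24:base - 1]     # drop the directory's own terminator
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        # Slice out the field and drop its trailing field terminator.
        fields.append((tag, record[base + start:base + start + length - 1]))
    return leader, fields


def split_subfields(value):
    """Split a data field into its indicators and (code, data) subfields."""
    indicators, *parts = value.split(SD)
    return indicators, [(p[0], p[1:]) for p in parts if p]


citation = [
    ("001", "demo0001"),  # hypothetical control number
    ("245", "00" + SD + "aCampus strategies for libraries and electronic information /"
            + SD + "cCaroline R. Arms, editor."),
    ("260", "  " + SD + "aBedford, MA :" + SD + "bDigital Press," + SD + "c1990."),
]

record = build_record(citation)
leader, fields = parse_record(record)
assert fields == citation  # round trip through leader + directory
print(split_subfields(fields[2][1]))
```

The directory is what lets a program jump straight to any field without scanning, a design that dates from MARC's magnetic-tape origins.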

  16. MARC: the good news • A great achievement: • Developed in 1960s • Magnetic tape exchange format for printing catalog records • The dawn of computing: • mixed upper and lower case • variable length fields, • repeated fields • non-Roman scripts • 100(?) million records with standard content and format • Thousands of trained librarians (millions?)

  17. MARC: the bad news • A great problem: • Not designed for computer algorithms • One record per item (poor links between records) • Tied to traditional materials and traditional practices • Not Unicode • 100 million records at $50+/record • A classic legacy system!

  18. Metadata – simplicity/complexity • Variety of metadata formats for description/discovery: • basic, proprietary records used in global Internet search services; • simple attribute/value records such as the ROADS templates used in eLib subject services; • unqualified Dublin Core (15 elements only); • the more structured TEI and MARC formats; • qualified Dublin Core; • detailed formats such as CIMI and EAD, typically applied to archival material.
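At the simple end of this range, an unqualified Dublin Core record is a flat list of optional, repeatable elements. The sketch below serializes one with Python's `xml.etree.ElementTree`, using the Dublin Core element namespace and wrapping it in the `oai_dc` container element. It is a minimal illustration: it omits the XML Schema declarations a harvester would normally expect, and `to_simple_dc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The element set of unqualified (simple) Dublin Core, DCMES version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def to_simple_dc(record):
    """Serialize {element: [values]} as XML; every element is optional and repeatable."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_simple_dc({
    "title": ["Campus strategies for libraries and electronic information"],
    "contributor": ["Arms, Caroline R."],
    "publisher": ["Digital Press"],
    "date": ["1990"],
}))
```

Compared with the MARC record, there are no indicators, subfields, or content rules: simplicity is bought with precision (an editor can only be recorded as a generic contributor).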

  19. Metadata – one-size-fits-all/application-profiles • There is an evolution from a “one size fits all” concept for metadata towards: • the use of a specific format depending on the purpose; • several formats co-existing for the same object; • combining metadata elements from various formats; • Choice of format can depend on: • the functional purpose of the metadata – [description/discovery/location]; [administration]; [structuring] • level of detail required to fulfill the purpose • discipline/domain/audience of the objects that are described • legacy issues • interoperability requirements
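An application profile can be thought of as a declaration of which elements, drawn from which vocabularies, one application uses, and which of them are mandatory. The sketch below checks a record against such a declaration. The profile itself is hypothetical: `dc:title` and `dcterms:issued` are real Dublin Core terms, but `local:shelfmark` is an invented local element, and real profiles specify much more (cardinality, value encodings, usage notes).

```python
from typing import Dict, List

# Hypothetical profile mixing elements from different vocabularies.
PROFILE: Dict[str, Dict[str, bool]] = {
    "dc:title": {"required": True},
    "dc:creator": {"required": False},
    "dcterms:issued": {"required": True},
    "local:shelfmark": {"required": False},  # invented local element
}


def check_profile(record: Dict[str, List[str]],
                  profile: Dict[str, Dict[str, bool]] = PROFILE) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, rule in profile.items():
        if rule["required"] and not record.get(name):
            problems.append(f"missing required element {name}")
    for name in record:
        if name not in profile:
            problems.append(f"element {name} not in profile")
    return problems


print(check_profile({"dc:title": ["Campus strategies"], "dcterms:issued": ["1990"]}))
print(check_profile({"dc:title": ["Campus strategies"], "marc:245": ["..."]}))
```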

  20. Metadata – interoperability [diagram: communities whose metadata must interoperate: library, museums, commerce, geo, scientific data, home pages, Internet commons, whatever...]
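A common way to make metadata from different communities interoperate is a crosswalk: a mapping from the elements of one format to those of another. The sketch below maps a few MARC fields and subfields to Dublin Core elements. The mapping table is a deliberately partial, hypothetical one, loosely modeled on commonly published MARC-to-DC correspondences, and the trailing-punctuation stripping is a naive simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical, partial crosswalk: (MARC tag, subfield code) -> Dublin Core element.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("020", "a"): "identifier",
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("700", "a"): "contributor",
}

MarcField = Tuple[str, List[Tuple[str, str]]]  # (tag, [(subfield code, value)])


def marc_to_dc(fields: List[MarcField]) -> Dict[str, List[str]]:
    """Map MARC subfields to DC elements; unmapped subfields are silently dropped."""
    dc: Dict[str, List[str]] = {}
    for tag, subfields in fields:
        for code, value in subfields:
            element = CROSSWALK.get((tag, code))
            if element:
                # Naively strip trailing ISBD punctuation such as " /", " :", ",".
                dc.setdefault(element, []).append(value.rstrip(" /:;,."))
    return dc


sample = [
    ("245", [("a", "Campus strategies for libraries and electronic information /"),
             ("c", "Caroline R. Arms, editor.")]),
    ("260", [("a", "Bedford, MA :"), ("b", "Digital Press,"), ("c", "1990.")]),
]

print(marc_to_dc(sample))
```

The output has no place (260 $a) and no statement of responsibility (245 $c): crosswalking from a richer to a simpler format is lossy, which is one of the interoperability trade-offs the diagram alludes to.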

  21. Metadata – descriptive/other • There is an evolution towards the creation of standards for metadata formats not related to discovery: • preservation metadata [NedLib, CEDARS, …] (see the OCLC overview document: http://www.oclc.org/digitalpreservation/presmeta_wp.pdf) • “Data Dictionary for Technical Metadata for Digital Still Images” (http://www.niso.org/pdfs/DataDict.pdf) • book e-commerce [ONIX] • resource administration: • Circulation Interchange Protocol (NCIP) standard: see http://www.niso.org/drafts/Z3982v1.html • electronic resources (cf. Adam Chandler)
