CS 502 Computing Methods for Digital Libraries • Cornell University – Computer Science • Herbert Van de Sompel • herbertv@cs.cornell.edu • Lecture 12: Why metadata?
Notes • Carl Lagoze on Wednesday • No Lab on Friday • But Paul Ginsparg on 04/03 • XML Schema & XSLT - later
Content – Data – Metadata • Content refers to digital library materials as information that is of interest to a user. • Data emphasizes the bits and bytes to be processed by a computer. • Metadata: data about data.
Metadata – focus on description/discovery • data about data • origins in library cataloguing and in abstracting and indexing (A&I) databases • now: an amplification of traditional bibliographic cataloguing practices in an electronic environment • now: any data used to aid the identification, description and location of networked electronic resources • actually, it is more than that
Metadata - broader • descriptive: facilitating resource discovery and identification (record in OPAC system) • administrative: facilitating resource management within a collection (loan record in OPAC system) • structural: binding together the components of complex information objects (series title in record in OPAC system)
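The three categories can be pictured as separate groups of fields attached to the same object. The Python sketch below is illustrative only: the class name `ObjectMetadata`, the function `discovery_view`, the field names and all the values are made up for this example and do not come from any real OPAC schema.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    """Hypothetical container that keeps the three kinds of metadata apart."""
    descriptive: dict = field(default_factory=dict)      # discovery, identification
    administrative: dict = field(default_factory=dict)   # management within a collection
    structural: dict = field(default_factory=dict)       # links between parts of an object


# All values below are invented for illustration.
record = ObjectMetadata(
    descriptive={"title": "An Example Monograph", "creator": "A. Author"},
    administrative={"loan_status": "on loan", "due_date": "2002-04-03"},
    structural={"series_title": "Example Series", "volume": 3},
)


def discovery_view(md: ObjectMetadata) -> dict:
    """Return only the fields a search service would index."""
    return dict(md.descriptive)


print(discovery_view(record))
```

Keeping the groups apart makes it easy to expose descriptive fields to a search service while the loan record stays internal to the library system.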
Metadata – evolution • [Diagram: from descriptive/discovery metadata for library objects to descriptive/discovery, administrative and structural metadata for library objects and networked resources]
Metadata • Traditionally stored separately from the objects that it describes. • For digital objects, it is sometimes embedded in the objects (cf. KWF). • Usually the metadata is a set of text fields. • Textual metadata can be used to describe non-textual objects, e.g., software, images, music, …
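One familiar case of embedded metadata is the `<meta>` element in the head of an HTML page. The sketch below, which uses only Python's standard `html.parser` module, pulls such name/content pairs out of a page. The sample page, the class name `MetaTagExtractor` and the `DC.`-prefixed names are assumptions chosen for illustration; the slides do not prescribe any particular embedding convention.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # A name may repeat, so values are kept in a list.
            self.metadata.setdefault(name, []).append(content)


# Hypothetical page: the metadata travels inside the object it describes.
SAMPLE = """<html><head>
<title>Why metadata?</title>
<meta name="DC.title" content="Why metadata?">
<meta name="DC.creator" content="Herbert Van de Sompel">
<meta name="keywords" content="metadata, digital libraries">
</head><body>...</body></html>"""

parser = MetaTagExtractor()
parser.feed(SAMPLE)
embedded = parser.metadata
print(embedded)
```

The result is again a set of text fields, which matches the third bullet: the extracted record could be stored separately from the page if a service prefers that.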
Metadata – why? • Some methods of information discovery search descriptive metadata about the objects. • More generally, metadata enables digital library services: • explicitly (discovery metadata) or implicitly (terms and conditions) • helps to impose order on chaos • enables automated discovery/manipulation of objects
Metadata – generation (traditional) • [Diagram labels: object, cataloguing rules, reference data, metadata record]
Metadata – generation (traditional) • Advantage: human expertise leads to high-quality catalogs and indexes • Disadvantages: expensive ($50+ per record); time consuming; requires cumbersome cataloguing rules; slow to adapt to new formats and types of digital objects • Human cataloguing and indexing is too expensive to apply to all but a small proportion of digital objects • => automatic generation of metadata
Metadata – roots (library cataloguing) • Anglo-American Cataloguing Rules (AACR2): rules for what goes into each field of a catalog record • MARC format: an exchange format for catalog records • "MARC catalog": a catalog in MARC format, where the content of each field follows AACR2
Citation: a monograph -- book! • Citation • Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.
[Figure: annotated MARC record, with labels for MARC tag, MARC field, MARC indicator, MARC subfield code and MARC subfield]
[Figure: MARC record with labelled fields: ISBN • Title statement • Imprint (location, publisher, year) • Collation • Series title]
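The pieces labelled in these figures (tag, indicators, subfield codes, subfields) can be modelled in a few lines. The sketch below uses the MARC 21 bibliographic tags 020 (ISBN), 245 (title statement), 260 (imprint), 300 (physical description, i.e. collation) and 440 (series title). The 245 and 260 values are taken from the citation above; the 245 indicator values are an assumption, and the ISBN, collation and series fields are left out because the slides do not give their values. The class `DataField` and the `$`/`#` display convention are illustrative, not a library API.

```python
from dataclasses import dataclass


@dataclass
class DataField:
    """One MARC data field: a tag, two indicators and coded subfields."""
    tag: str
    indicators: str   # two characters; a blank is displayed as '#'
    subfields: list   # list of (subfield code, subfield value) pairs

    def display(self) -> str:
        subs = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators.replace(' ', '#')} {subs}"


# Names used on the slide for the tags discussed here.
FIELD_NAMES = {
    "020": "ISBN",
    "245": "Title statement",
    "260": "Imprint",
    "300": "Collation",
    "440": "Series title",
}

record = [
    # Indicator values "00" are assumed for illustration.
    DataField("245", "00", [
        ("a", "Campus strategies for libraries and electronic information /"),
        ("c", "Caroline R. Arms, editor."),
    ]),
    # Imprint: $a location, $b publisher, $c year.
    DataField("260", "  ", [
        ("a", "Bedford, MA :"),
        ("b", "Digital Press,"),
        ("c", "1990."),
    ]),
]

for f in record:
    print(f"{FIELD_NAMES[f.tag]:<16} {f.display()}")
```

Note how AACR2-style punctuation lives inside the subfield values, while MARC only supplies the container (tags, indicators, subfield codes).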
[Figure: raw MARC record, with labels for leader, directory, field terminator and 001 field]
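The labels in this figure refer to the physical layout of a MARC record in its exchange form: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit start offset), a field terminator, and then the fields, each ending in a field terminator. The sketch below builds a tiny record in that layout and reads it back. It is a simplified illustration, not a complete MARC implementation: the helper names, the control number `demo0001` and the 245 indicator values are assumptions, and character-encoding issues are ignored.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Assemble leader + directory + fields from (tag, content) pairs."""
    directory, data = b"", b""
    for tag, content in fields:
        body = content + FT
        # Directory entry: tag (3), field length (4), start offset (5).
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1        # where the field data begins
    total = base + len(data) + 1          # whole record, incl. terminator
    # Leader: record length in positions 0-4, base address in 12-16.
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    return leader + directory + FT + data + RT


def parse_record(raw):
    """Use the leader and directory to locate every field."""
    leader = raw[:24]
    record_length = int(leader[0:5].decode("ascii"))
    base = int(leader[12:17].decode("ascii"))
    directory = raw[24:base - 1]          # stop before the field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start: base + start + length - 1]
        fields.append((tag, body))
    return record_length, fields


def split_subfields(body):
    """Split a data field into its indicators and (code, value) subfields."""
    indicators = body[:2].decode("ascii")
    chunks = [c for c in body[2:].split(SF) if c]
    return indicators, [(c[:1].decode(), c[1:].decode()) for c in chunks]


raw = build_record([
    ("001", b"demo0001"),  # hypothetical control number
    ("245", b"00" + SF + b"aCampus strategies for libraries and electronic information /"
            + SF + b"cCaroline R. Arms, editor."),
    ("260", b"  " + SF + b"aBedford, MA :" + SF + b"bDigital Press," + SF + b"c1990."),
])
record_length, fields = parse_record(raw)
for tag, body in fields:
    print(tag, body if tag < "010" else split_subfields(body))
```

Every access goes through byte offsets stored in the directory, which is convenient for sequential tape processing but awkward for modern text tools.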
MARC: the good news • A great achievement: • Developed in 1960s • Magnetic tape exchange format for printing catalog records • The dawn of computing: • mixed upper and lower case • variable length fields, • repeated fields • non-Roman scripts • 100(?) million records with standard content and format • Thousands of trained librarians (millions?)
MARC: the bad news • A great problem: • Not designed for computer algorithms • One record per item (poor links between records) • Tied to traditional materials and traditional practices • Not Unicode • 100 million records at $50+/record • A classic legacy system!
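The last figure is worth multiplying out. The snippet below takes the two numbers from the slides (about 100 million records, at least $50 per record) and computes the implied lower bound on the investment; both inputs are the slides' own rough estimates, so the result is only an order of magnitude.

```python
records = 100_000_000      # "100(?) million records" (slide estimate)
cost_per_record = 50       # "$50+ per record", so this is a lower bound

lower_bound = records * cost_per_record
print(f"At least ${lower_bound:,}")   # At least $5,000,000,000
```

An installed base worth billions of dollars is what makes MARC a legacy system: too valuable to abandon and too costly to redo.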
Metadata – simplicity/complexity • Variety of metadata formats for description/discovery: • basic, proprietary records used in global internet search services; • simple attribute/value records such as the ROADS templates used in eLib subject services; • unqualified Dublin Core (15 elements only); • the more structured TEI and MARC formats; • qualified Dublin Core; • detailed formats such as CIMI and EAD, typically applied to archival material.
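To give a feel for the simple end of this range, the sketch below expresses the monograph cited earlier as an unqualified Dublin Core record in XML, using Python's standard `xml.etree.ElementTree`. The element names and the namespace URI are those of the Dublin Core Metadata Element Set 1.1. The wrapper element `metadata`, the function `to_dublin_core` and the decision to record the editor as `contributor` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Element names of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_dublin_core(values):
    """Build a flat record; every element is optional and repeatable."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, items in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for item in items:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = item
    return root


record = to_dublin_core({
    "title": ["Campus strategies for libraries and electronic information"],
    "contributor": ["Arms, Caroline R."],   # editor; the mapping is a judgment call
    "publisher": ["Digital Press"],
    "date": ["1990"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Compared with MARC, there are no indicators, subfields or byte offsets: the simplicity is what makes the format easy to generate, at the cost of detail such as the place of publication.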
Metadata – one-size-fits-all/application-profiles • There is an evolution from a “one size fits all” concept for metadata towards: • the use of a specific format depending on the purpose; • the co-existence of formats in relation to an object; • combining metadata elements from various formats. • Choice of format can depend on: • the functional purpose of the metadata: [description/discovery/location]; [administration]; [structuring] • level of detail required to fulfill the purpose • discipline/domain/audience of the objects that are described • legacy issues • interoperability requirements
Metadata – interoperability • [Diagram: communities whose metadata must interoperate: Commerce, Home Pages, Geo, Library, Internet Commons, Scientific Data, Museums, Whatever...]
Metadata – descriptive/other • There is an evolution towards the creation of standards for non-discovery related metadata formats: • Preservation metadata [NedLib, CEDARS, …] (see OCLC overview document: http://www.oclc.org/digitalpreservation/presmeta_wp.pdf) • "Data Dictionary for Technical Metadata for Digital Still Images" (http://www.niso.org/pdfs/DataDict.pdf) • book e-commerce [ONIX] • resource administration: • Circulation Interchange Protocol (NCIP) Standard – see http://www.niso.org/drafts/Z3982v1.html • Electronic resources (cf. Adam Chandler)