
A Library Science Perspective on Digitization

Bryan Heidorn, University of Arizona


Presentation Transcript


  1. A Library Science Perspective on Digitization
     Bryan Heidorn, University of Arizona

  2. Library-Museum Parallels
     • Intellectual Property Rights
     • Physical/Digital Objects Sharing
     • Descriptive Metadata Formats
     • Preservation Metadata
     • Transport Metadata Formats
     • Communication Protocols (not so much)
     • Similar Digitization Workflow
     • OCR Challenges

  3. Intellectual Property Rights
     • Expanded to 75 years in the US from 25
     • Academic publishing anomalies
     • Attribution required (for data, not so much)
     • Decoupling of data from text

  4. Online Computer Library Center (OCLC)
     • Collaborative automation of libraries, including copy cataloging
     • Started 1967
     • Catalogs 271 million items/year
     • 72,000 libraries in 170 countries and territories use OCLC services to locate, acquire, catalog, lend and preserve library materials.

  5. Descriptive Metadata Formats
     • MARC(XML) 21 Standard
     • METS
     • Dublin Core (Interchange Format only)

  6. Biodiversity Heritage Library Workflow
     Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries

  7. MARC 21 Standard
     • Formats: Bibliographic, Authority, Holdings, Classification, Community Information
     • Bibliographic Material Types:
       • Books (BK)
       • Continuing resources (CR)
       • Computer files (CF)
       • Maps (MP)
       • Music (MU)
       • Visual materials (VM)
       • Mixed materials (MX)
     http://www.loc.gov/marc/

  8. MARC Fields
     • 00X: Control Fields
     • 01X-09X: Numbers and Code Fields
     • Heading Fields - General Information
     • 1XX: Main Entry Fields
     • 20X-24X: Title and Title-Related Fields
     • 25X-28X: Edition, Imprint, Etc. Fields
     • 3XX: Physical Description, Etc. Fields
     • 4XX: Series Statement Fields
     • 5XX: Note Fields
     • 6XX: Subject Access Fields
     • 70X-75X: Added Entry Fields
     • 76X-78X: Linking Entry Fields
     • 80X-83X: Series Added Entry Fields
     • 841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
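The tag ranges on this slide can be turned into a small lookup. The following is only a sketch: the table and the function name `marc_field_group` are hypothetical, and the numeric bounds are a literal reading of the slide's "X" wildcards (for example, 20X-24X is taken as 200 to 249).

```python
# Hypothetical lookup table: (lowest tag, highest tag, group name),
# transcribed from the ranges listed on the slide.
MARC_FIELD_GROUPS = [
    (1, 9, "Control Fields"),
    (10, 99, "Numbers and Code Fields"),
    (100, 199, "Main Entry Fields"),
    (200, 249, "Title and Title-Related Fields"),
    (250, 289, "Edition, Imprint, Etc. Fields"),
    (300, 399, "Physical Description, Etc. Fields"),
    (400, 499, "Series Statement Fields"),
    (500, 599, "Note Fields"),
    (600, 699, "Subject Access Fields"),
    (700, 759, "Added Entry Fields"),
    (760, 789, "Linking Entry Fields"),
    (800, 839, "Series Added Entry Fields"),
    (841, 889, "Holdings, Location, Alternate Graphics, Etc. Fields"),
]


def marc_field_group(tag: str) -> str:
    """Return the slide's field-group name for a three-digit MARC tag."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a three-digit MARC tag: {tag!r}")
    number = int(tag)
    for low, high, name in MARC_FIELD_GROUPS:
        if low <= number <= high:
            return name
    # Tags outside the listed ranges are not covered by the slide.
    return "Not listed"


if __name__ == "__main__":
    for tag in ("001", "245", "650", "856"):
        print(tag, marc_field_group(tag))
```

A linear scan is enough here because the table has only thirteen rows.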

  9. MARC Book Example
     Leader/00-23  *****nam##22*****#a#4500
     001        <control number>
     003        <control number identifier>
     005        19920331092212.7
     007/00-01  ta
     008/00-39  820305s1991####nyu###########001#0#eng##
     020        ##$a0845348116 :$c$29.95 (£19.50 U.K.)
     020        ##$a0845348205 (pbk.)
     040        ##$a[organization code]$c[organization code]
     050        14$aPN1992.8.S4$bT47 1991
     082        04$a791.45/75/0973$219
     100        1#$aTerrace, Vincent,$d1948-
     245        10$aFifty years of television :$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.
     246        1#$a50 years of television
     260        ##$aNew York :$bCornwall Books,$cc1991.
     300        ##$a864 p. ;$c24 cm.
     500        ##$aIncludes index.
     650        #0$aTelevision pilot programs$zUnited States$vCatalogs.
     650        #0$aTelevision serials$zUnited States$vCatalogs.
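To make the layout of a data field concrete (a three-digit tag, two indicator characters, then `$`-delimited subfields), here is a minimal sketch that splits one line of the display notation used above. The function name `parse_marc_line` is hypothetical, and this is not how production MARC software reads records: real MARC 21 files use binary delimiters, not a printable `$`.

```python
import re

# A subfield delimiter in this display notation: "$" plus a one-character code.
SUBFIELD = re.compile(r"\$([a-z0-9])")


def parse_marc_line(line: str) -> dict:
    """Split a display-form data field into tag, indicators and subfields.

    Assumes the form "245 10$a...$b...". Control fields (00X) have no
    indicators or subfields and are not handled. A literal dollar sign in
    the data (as in the price in the 020 field above) is ambiguous in this
    notation and would be misread as a delimiter.
    """
    tag, rest = line.split(" ", 1)
    indicators, body = rest[:2], rest[2:]
    # re.split with a capture group yields [prefix, code, value, code, value, ...]
    parts = SUBFIELD.split(body)
    subfields = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


if __name__ == "__main__":
    title = parse_marc_line(
        "245 10$aFifty years of television :"
        "$ba guide to series and pilots, 1937-1988 /$cVincent Terrace."
    )
    print(title["tag"], title["indicators"])
    for code, value in title["subfields"]:
        print(f"  ${code} {value}")
```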

  10. Difference between Museum and Library
      • Full Darwin Core has parallels in MARC
      • Many more commercial and custom products
      • Larger installed base
      • Library entries somewhat more detailed
      • There is a MARC(XML) and MARC Lite
      • MARC differentiates among material types

  11. Digital Content Transport
      • METS: Metadata Encoding and Transmission Standard
      • The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language.

  12. Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries

  13. METS Components
      • METS Header
      • Descriptive Metadata
      • Administrative Metadata
      • File Section: lists all files containing content that make up the electronic versions of the digital object. <file> elements may be grouped within <fileGrp> elements, to provide for subdividing the files by object version.
      • Structural Map
      • Structural Links
      • Behavior
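As an illustration of how these components nest, the sketch below builds a skeletal METS document for a scanned book with Python's standard `xml.etree.ElementTree`. The function name `build_mets_skeleton`, the ID values, the `USE="master"` label and the TIFF MIME type are assumptions chosen for the example. The descriptive and administrative sections are left empty, the optional structural links and behavior sections are omitted, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(name: str) -> str:
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton(page_files):
    """Return a <mets> element with one <file> and one page <div> per scan."""
    root = ET.Element(q("mets"))
    ET.SubElement(root, q("metsHdr"))             # METS header
    ET.SubElement(root, q("dmdSec"), ID="DMD1")   # descriptive metadata (empty here)
    ET.SubElement(root, q("amdSec"), ID="AMD1")   # administrative metadata (empty here)
    file_sec = ET.SubElement(root, q("fileSec"))  # file section
    group = ET.SubElement(file_sec, q("fileGrp"), USE="master")
    struct_map = ET.SubElement(root, q("structMap"), TYPE="physical")
    book = ET.SubElement(struct_map, q("div"), TYPE="book")

    for number, href in enumerate(page_files, start=1):
        file_id = f"FILE{number:04d}"
        file_el = ET.SubElement(group, q("file"), ID=file_id, MIMETYPE="image/tiff")
        ET.SubElement(
            file_el, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href}
        )
        # The structural map points back at the file through its ID.
        page = ET.SubElement(book, q("div"), TYPE="page", ORDER=str(number))
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return root


if __name__ == "__main__":
    mets = build_mets_skeleton(["page0001.tif", "page0002.tif"])
    print(ET.tostring(mets, encoding="unicode"))
```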

  14. I/O
      • the Submission Information Package (SIP), which is sent from the information producer to the archive;
      • the Archival Information Package (AIP), which is the information package actually stored by the archive; and
      • the Dissemination Information Package (DIP), which is the information package transferred from the archive in response to a request by a consumer.
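The three package types describe a flow from producer to archive to consumer. A toy model of that flow might look like the following; every name here (`InformationPackage`, `ingest`, `disseminate`, the `archive_id` field) is hypothetical, and a real archive would do far more at each step, such as fixity checks and format migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationPackage:
    kind: str        # "SIP", "AIP" or "DIP"
    content: bytes   # the digital object itself
    metadata: dict   # descriptive and preservation metadata


def ingest(sip: InformationPackage, archive_id: str) -> InformationPackage:
    """Producer -> archive: turn a submitted package into a stored one."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    # The archive adds its own metadata to what the producer supplied.
    metadata = {**sip.metadata, "archive_id": archive_id}
    return InformationPackage("AIP", sip.content, metadata)


def disseminate(aip: InformationPackage, wanted_fields: set) -> InformationPackage:
    """Archive -> consumer: derive a package tailored to one request."""
    if aip.kind != "AIP":
        raise ValueError("only an AIP can be disseminated")
    metadata = {k: v for k, v in aip.metadata.items() if k in wanted_fields}
    return InformationPackage("DIP", aip.content, metadata)


if __name__ == "__main__":
    sip = InformationPackage("SIP", b"scanned pages", {"title": "Fifty years of television"})
    aip = ingest(sip, archive_id="example-0001")
    dip = disseminate(aip, wanted_fields={"title"})
    print(aip.kind, aip.metadata)
    print(dip.kind, dip.metadata)
```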

  15. Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries

  16. Open Archives Initiative Protocol for Metadata Harvesting
      • The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.

  17. OAI Verbs
      • GetRecord
      • Identify
      • ListIdentifiers
      • ListMetadataFormats
      • ListRecords
      • ListSets

  18. GetRecord
      • http://arXiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
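An OAI-PMH request is an ordinary HTTP URL whose query string carries the verb and its arguments. A minimal sketch of building such a URL with the standard library follows; the helper name `oai_request_url` is hypothetical, and the base URL is the one shown on the slide, which may no longer be a live endpoint.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL; no network access is performed."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # urlencode percent-encodes reserved characters such as ":" and "/"
    # in the identifier, so the result is equivalent to the URL on the
    # slide but not character-for-character identical.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(
        oai_request_url(
            "http://arXiv.org/oai2",
            "GetRecord",
            identifier="oai:arXiv.org:cs/0112017",
            metadataPrefix="oai_dc",
        )
    )
```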

  19. <?xml version="1.0" encoding="UTF-8"?>
      <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
        <responseDate>2002-02-08T08:55:46Z</responseDate>
        <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
        <GetRecord>
          <record>
            <header>
              <identifier>oai:arXiv.org:cs/0112017</identifier>
              <datestamp>2001-12-14</datestamp>
              <setSpec>cs</setSpec>
              <setSpec>math</setSpec>
            </header>
            <metadata>
              <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                         xmlns:dc="http://purl.org/dc/elements/1.1/"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
                <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
                <dc:creator>Dushay, Naomi</dc:creator>
                <dc:subject>Digital Libraries</dc:subject>
                <dc:description>With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users.</dc:description>
                <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
                <dc:date>2001-12-14</dc:date>
              </oai_dc:dc>
            </metadata>
          </record>
        </GetRecord>
      </OAI-PMH>
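A harvester mostly needs to pull a few Dublin Core values out of a response like this one. The sketch below does that with the standard library's `xml.etree.ElementTree`. The function name `summarize_record` is hypothetical, and `SAMPLE` is an abridged copy of the response above (XML declaration, schema locations and description fields dropped) so the block runs without network access.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the search paths below.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abridged copy of the GetRecord response shown on the slide.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def summarize_record(xml_text: str) -> dict:
    """Extract identifier, sets, title and creators from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    if record is None:
        raise ValueError("response contains no <record>")
    return {
        "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "sets": [el.text for el in record.findall("oai:header/oai:setSpec", NS)],
        "title": record.findtext(".//dc:title", namespaces=NS),
        "creators": [el.text for el in record.findall(".//dc:creator", NS)],
    }


if __name__ == "__main__":
    for key, value in summarize_record(SAMPLE).items():
        print(f"{key}: {value}")
```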

  20. Metadata Collection and Workflow (Macaw)

  21. Physical/Digital Objects Sharing
      • Books are both part of an edition and unique
      • 20th century books have standard front matter
      • LMS contained metadata only
      • Journals indexed by article
      • Most digital content is commercially owned and born digital
      • In 2011, author-publishing exceeded commercial
      • Born-analog digitization (Google Books and BHL)

  22. Governance
      • Libraries pay for OCLC
      • OCLC is participatory
      • Close collaboration with Library of Congress on standards
      • School system exists to train librarians
      • Libraries are being cut in academic, public and school sectors
