
Digitization with Millennium & CONTENTdm

Digitization with Millennium & CONTENTdm. Stuart Hunt IUG17 Anaheim May 2009. Overview. Background Digitisation Metadata Workflows Now. University of Warwick. Royal Charter 1965 Russell Group 16,000 FTE students 5000 staff. University Library. Approx 1.1 million volumes


Presentation Transcript


  1. Digitization with Millennium & CONTENTdm Stuart Hunt IUG17 Anaheim May 2009

  2. Overview • Background • Digitisation • Metadata • Workflows • Now

  3. University of Warwick • Royal Charter 1965 • Russell Group • 16,000 FTE students • 5000 staff

  4. University Library • Approx 1.1 million volumes • 170 staff (110 FTE) • Millennium 2003 • Approx 100,000 issues/renewals per yr • Approx 28,000 new books per yr • RLUK member • OCLC member

  5. Content • Marandet Collection • 4000+ French plays 1720 to 1900 • Acquired 1970s • Guide published 1979 • Bibliographic records in Millennium, RLUK, COPAC, & WorldCat • No IPR issues

  6. Projects • Revolutionary Drama (1789-1800) • 339 plays • Empire Period Drama (1801-1815) • 123 plays • JISC Digitisation Programme: Enriching Digital Resources • ‘Exposing Marandet’ • 1500 plays/75,000 pages

  7. Objectives • Cross-searching • Full-text searching • Integration with existing & future systems • Millennium • Web • Vertical search solution

  8. Options • Existing solutions • Millennium • In-house web publishing tool • Separate product • Digital collection management software • CONTENTdm • Solution would drive approach taken

  9. Digital production • Image files • TIFF & JPEG derivative • Full colour & greyscale • Outsourced • Text files/full-text transcripts • OCR quality initially not acceptable • Re-keying • Outsourced

  10. Media Management • Tried & tested solution • Quick & easy • Link digital content • D2D process simplified • Existing bibs • New bibs • Use existing authentication if required

  11. Media Management • No full-text searching • No cross-collection searching (unless in separate scope) • Tied to MARC metadata • Metadata enrichment difficult • Image file format • Not a total solution

  12. CONTENTdm • Full-text & cross-collection searching • Not tied to MARC metadata • Metadata enrichment simple • Local Windows server • Initial licence <50K images • Upgraded to unlimited licence 2008

  13. Local metadata context • Separate bibs • Print vs electronic • Describes what is • Supports better (future) FRBRisation • Ease of maintenance • Location & format based scoping • 793 for local added entry/uniform title • Collection name

  14. Metadata option 1 • Create metadata within CONTENTdm • Play-by-play • Metadata already present in Millennium

  15. Metadata option 1 • Assumes that metadata is already available • Not scalable • Poor use of resources • Does not allow data to work harder or smarter

  16. Metadata option 2 • Create metadata outside of Millennium • Metadata not already present in Millennium • Play-by-play • Harvest from CONTENTdm into Millennium via XML Harvester

  17. XML Harvester • Single configuration file • Needs to be edited for each separate resource • Uses XSLT not load table(s) • Major changes (e.g. harvest different schema) may need to be done by III

  18. Configuration file triggers • @XML_TYPE=DC (or MARCXML) • @OAI_FORMAT=oai_dc • @DBNAME=[Repository name] • @URL=[url for OAI-PMH] • @USEOAI=true (or false) • @OAISET=[Name of set] • @RECID_MARCTAG=001
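
The @URL, @OAI_FORMAT and @OAISET triggers correspond to the parts of a standard OAI-PMH request. As a minimal sketch (not the XML Harvester's own code), the following Python builds the kind of ListRecords URL such a configuration implies. The base URL and set name are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", oai_set=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url        -- the repository's OAI-PMH endpoint (cf. @URL)
    metadata_prefix -- the metadata format to request (cf. @OAI_FORMAT)
    oai_set         -- optional set name to restrict the harvest (cf. @OAISET)
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        params["set"] = oai_set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and set name, for illustration only.
url = build_list_records_url(
    "https://repository.example.org/oai", oai_set="example_set"
)
print(url)
```

ListRecords, metadataPrefix and set are defined by the OAI-PMH protocol itself; how the harvester combines its triggers internally is not shown on the slide.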

  19. XML Harvester

  20. Harvested metadata • Loaded through Data Exchange • Significant re-editing • Tags & indicators • Diacritics • Creating attached items or holdings records
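
One reason harvested records need re-editing is that simple Dublin Core carries only element names and values, with no MARC tags, indicators or subfield codes. The sketch below, which assumes an invented sample record, parses an oai_dc fragment with Python's standard library to show how little structure arrives.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record; real harvested records will differ.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example play title</dc:title>
  <dc:creator>Example, Author</dc:creator>
  <dc:date>1790</dc:date>
</oai_dc:dc>"""


def dc_elements(xml_text):
    """Return (element name, value) pairs for each dc: child element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for child in root:
        # ElementTree writes namespaced tags as "{namespace}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            pairs.append((name, (child.text or "").strip()))
    return pairs


for name, value in dc_elements(SAMPLE):
    print(name, "=", value)
```

Everything a MARC record would express through indicators or subfields (for example, a non-filing count on the title) has to be supplied after the load.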

  21. Harvested metadata

  22. Metadata option 3 • Batch load into CONTENTdm via delimited file from Create Lists • Cross-walk MARC21 to DC • Directory structure

  23. MARC to Simple DC crosswalk • Record# → dc:identifier • 008/07-10 → dc:language • 100 → dc:creator • 245 → dc:title • 260|ab → dc:publisher • 260|c → dc:date • 300 → dc:format • 5XX → dc:description • 6XX → dc:subject • 700 → dc:contributor • 700|t → dc:relation • 793 → dc:source
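
The crosswalk on this slide can be expressed as a lookup table. The following Python is a minimal sketch under stated assumptions: each exported record is a dictionary keyed by column label (a hypothetical layout, not the Create Lists format), the 5XX and 6XX rows are matched on the first digit of the tag, and the positional 008 row is left out so that the sketch deals only with whole-field columns.

```python
# Exact column labels taken from the slide's crosswalk.
EXACT = {
    "Record#": "dc:identifier",
    "100": "dc:creator",
    "245": "dc:title",
    "260|ab": "dc:publisher",
    "260|c": "dc:date",
    "300": "dc:format",
    "700": "dc:contributor",
    "700|t": "dc:relation",
    "793": "dc:source",
}

# Tag ranges from the slide: 5XX and 6XX, keyed by leading digit.
RANGES = {"5": "dc:description", "6": "dc:subject"}


def to_dc(record):
    """Map {column label: [values]} to {DC element: [values]}.

    Columns with no row in the crosswalk are dropped.
    """
    dc = {}
    for source, values in record.items():
        element = EXACT.get(source)
        if element is None and len(source) == 3 and source.isdigit():
            element = RANGES.get(source[0])
        if element is None:
            continue
        dc.setdefault(element, []).extend(values)
    return dc


# Invented sample data, for illustration only.
sample = {
    "Record#": ["b0000000"],
    "245": ["Example title"],
    "500": ["Note one"],
    "520": ["Note two"],
    "650": ["French drama"],
    "907": ["local data"],
}
dc_record = to_dc(sample)
print(dc_record)
```

Keeping the mapping as data rather than logic means a row can be added or changed without touching the function.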

  24. MARC – DC Crosswalk

  25. Additional DC elements • dc:rights • dc:type • Transcript mapped to dc:description

  26. Metadata workflow • Create separate bibs for e-versions • Export print records via Data Exchange • MarcEdit to remove extraneous tags (907, etc) • Insert 006, 007, 008/23, GMD, 533 • Re-import into Millennium as new bibs • [856 CONTENTdm reference URL added]
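
The derivation of an e-version bib from a print bib was done in MarcEdit. Purely as an illustration of the logic, the sketch below performs a reduced version of it in Python on a simplified record, a list of (tag, value) pairs, which is an assumption and not a real MARC structure. The choice of "s" for 008/23, the wording of the 533 note and the function name are also assumptions; the 006, 007 and GMD steps are omitted.

```python
# Tags to strip before re-import; the slide names 907 "etc".
LOCAL_TAGS = {"907"}


def derive_electronic_bib(fields, url):
    """Return a new field list describing the electronic version.

    fields -- list of (tag, value) pairs for the print record
    url    -- reference URL to record in an 856 field
    """
    out = []
    for tag, value in fields:
        if tag in LOCAL_TAGS:
            continue  # drop local, system-specific fields
        if tag == "008" and len(value) > 23:
            # Assumed value: "s" (electronic) in 008/23, form of item.
            value = value[:23] + "s" + value[24:]
        out.append((tag, value))
    out.append(("533", "Electronic reproduction."))  # assumed note text
    out.append(("856", url))
    # Stable sort keeps repeated fields in their original relative order.
    out.sort(key=lambda field: field[0])
    return out


# Invented print record with a dummy 40-character 008.
print_bib = [("008", "0" * 40), ("245", "Example title"), ("907", ".b00000000")]
e_bib = derive_electronic_bib(print_bib, "https://cdm.example.org/item/1")
for tag, value in e_bib:
    print(tag, value)
```

The function returns a new list and leaves the print record unchanged, which matches the slide's approach of keeping separate bibs for print and electronic versions.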

  27. Metadata workflow • Review file of newly loaded bibs exported from Create Lists • Cross-walked from MARC to DC • Additional DC elements added • Item level metadata added • Loaded to CDM as delimited files with directory structure

  28. Metadata in CONTENTdm • Compound objects • Document level • Page level • Less rich than document level • Hospitable to multiple schemas • Deliberate attempt to stay close to DC • Administrative metadata • Later feature

  29. Document level • AACR in DC wrapper • All descriptive metadata from bib (except LDR, 006, 007, 008, GMD) • Authority control (names, subjects, uniform titles) • Rights (dc:rights) • Identifier (.b number) • Mapped to DC for OAI harvesting

  30. Page level • Basic descriptive metadata (creator, title, publisher, date) • Rights (dc:rights) • Identifier (.b number) • Transcript (dc:description) • No OAI harvesting at page level • Local decision

  31. Access & availability • Availability across local → global continuum • Metadata contribution • Collection level descriptions • OAI • Collapse D2D

  32. Metadata in WorldCat • Local CDM server – not able to use Connexion Digital Import • Bug between WorldCat and CDM for compound objects • FRBRized display in worldcat.org potentially impedes discovery

  33. Now • ‘Exposing Marandet’ completes 9/2009 • Established service 4 collections • Ancien Régime Drama • Revolutionary Drama • Empire Period Drama • Restoration Drama • Integration with course delivery • Metadata enrichment to/from CÉSAR

  34. Links • http://go.warwick.ac.uk/fac/arts/french/marandet/ • http://www.jisc.ac.uk/whatwedo/programmes/digitisation/enrichingdigi/marandet.aspx • http://webcat.warwick.ac.uk • http://contentdm.warwick.ac.uk

  35. stuart.hunt@warwick.ac.uk
