
METS Case Study: The NYU Digital Library Team

  1. METS Case Study: The NYU Digital Library Team. METS Opening Day, 27 October 2003. Leslie Myrick

  2. Projects at NYU using METS • EAD Finding Aid Project • Tokyo Tribunal Proceedings • Afghanistan Digital Library • CRL Political Web Archiving Project • DRAM * • Hemispheric Institute * • REPO History Sign Project *

  3. Why METS? (1) METS was formulated to serve as each of the three OAIS information packages: • Submission Information Package (SIP) • Archival Information Package (AIP) • Dissemination Information Package (DIP)

  4. Why METS? (2) In other words, it’s a: • Transfer Syntax • Archival Syntax • Functional Syntax

  5. METS and Complex Digital Objects • Finding aid + images with multiple scans/versions • Page-turners for photo albums, documents and books: Edisto Album, Tokyo Tribunal brief, Afghanistan Digital Library • Multimedia/time-based media navigators: Hemispheric Institute; SMIL viewer • Web site navigator: CRL Political Communications Web Archiving Project
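
A page-turner of this kind rests on two METS sections: a fileSec holding every version of every page (slide 15 lists a master TIFF plus service and thumbnail JPEGs) and a structMap whose ordered divs point at those files. The Python below is a minimal sketch of that pattern using only the standard library; the USE labels (master, service, thumbnail), the ID scheme and the function name build_page_turner are illustrative assumptions, not NYU's actual conventions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_page_turner(pages):
    """pages: list of (label, {use: href}) pairs in reading order (hypothetical input shape)."""
    mets = ET.Element(q("mets"))
    file_sec = ET.SubElement(mets, q("fileSec"))  # fileSec must precede structMap
    struct_map = ET.SubElement(mets, q("structMap"), TYPE="physical")
    album = ET.SubElement(struct_map, q("div"), TYPE="album")
    groups = {}  # one fileGrp per version (USE value), created on first use
    for order, (label, versions) in enumerate(pages, start=1):
        page = ET.SubElement(album, q("div"), TYPE="page", ORDER=str(order), LABEL=label)
        for use, href in versions.items():
            if use not in groups:
                groups[use] = ET.SubElement(file_sec, q("fileGrp"), USE=use)
            file_id = f"F{order}-{use}"  # hypothetical ID scheme
            file_el = ET.SubElement(groups[use], q("file"), ID=file_id)
            ET.SubElement(file_el, q("FLocat"),
                          {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
            # Every version of the page hangs off the same page div.
            ET.SubElement(page, q("fptr"), FILEID=file_id)
    return mets
```

For example, build_page_turner([("p. 1", {"master": "taqw-fr001.tif", "service": "taqw-fr001s.jpg", "thumbnail": "taqw-fr001t.jpg"})]) produces one fileGrp per version and a single page div with three fptr pointers; a viewer can then step through the page divs by ORDER.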

  6. Using METS as a SIP • Berol Collection Finding Aid: in negotiations with the RLG Cultural Materials Project • METS will be bundled with objects; EAD

  7. METS as a Functional Syntax • METS is designed not only for transfer and archival management but also for giving access to, and navigation of, an object • METS + XSLT can create dynamic interfaces with links to resources and their metadata • METS files can be loaded into Oracle, indexed, and searched using context-aware queries
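
The slides generate these interfaces with XSLT (Saxon and XT appear in slide 10). As a hedged illustration of what such a stylesheet does, the Python sketch below resolves each structMap div's fptr/@FILEID against the fileSec and emits an HTML list of links. It is not an XSLT stylesheet; the function name and the choice to skip divs without a direct fptr child are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
HREF = "{http://www.w3.org/1999/xlink}href"


def mets_to_html_nav(mets_xml: str) -> str:
    """Render the structMap of a METS document as an HTML list of links."""
    root = ET.fromstring(mets_xml)
    # Index each file's FLocat URL by file ID so fptr/@FILEID can be resolved.
    urls = {}
    for f in root.iterfind(".//mets:fileSec//mets:file", NS):
        loc = f.find("mets:FLocat", NS)
        if loc is not None:
            urls[f.get("ID")] = loc.get(HREF)
    items = []
    # Walk structMap divs in document order (i.e. reading order).
    for div in root.iterfind(".//mets:structMap//mets:div", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is None:
            continue  # purely structural divs (e.g. the whole album) get no link
        url = urls.get(fptr.get("FILEID"), "#")
        label = div.get("LABEL") or div.get("ORDER") or "?"
        items.append(f'<li><a href="{escape(url)}">{escape(label)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Keeping the file lookup separate from the structMap walk mirrors how a stylesheet would typically use a key on file/@ID.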

  8. METS Plays Well With Others. We have: • EAD finding aids pointing to METS • METS pointing to finding aids and MARCXML records • METS pointing to and manipulating TEI
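
Pointing from METS to an external record such as a MARCXML record is done with an mdRef inside a dmdSec, rather than embedding the record with mdWrap. A minimal sketch, assuming the record is reachable at a URL; the function name, the DMD1 identifier and the example.org URL below are placeholders.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def marc_dmdsec(dmd_id: str, record_url: str) -> ET.Element:
    """Build a METS dmdSec that points to (rather than embeds) a MARCXML record."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", ID=dmd_id)
    ET.SubElement(dmd, f"{{{METS_NS}}}mdRef", {
        "LOCTYPE": "URL",          # the location is given as a URL
        "MDTYPE": "MARC",          # METS vocabulary value covering MARC/MARCXML
        "MIMETYPE": "text/xml",
        f"{{{XLINK_NS}}}href": record_url,
    })
    return dmd
```

Usage: ET.tostring(marc_dmdsec("DMD1", "http://example.org/marc/0001.xml")) serializes the section; divs in the structMap can then cite it through DMDID="DMD1".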

  9. METS and Extensions at NYU • MODS and DC for descriptive metadata • MIX for image technical metadata • textMD for text technical metadata • LC A/V prototype + smptetechMD + AES for audio/video technical metadata • Missing links: an overall preservation schema plug-in (PREMIS); a rights metadata schema

  10. Ingredients (so far) • Perl • MySQL and some Oracle • Tomcat • Servlets and JSP • Saxon and XT • XSLT

  11. Tools for Creation • zeroDB database: input via an interface as well as batch loading of metadata extracted by scripts, e.g. ImageMagick identify, arcscraper.pl • Outputs METS using Perl DBI

  12. Tools for Dissemination • Page-turners • Multimedia Viewers • Thumbnail Browsers

  13. Typical METS Creation Workflow • ImageMagick extraction of image metadata • Database input (batch and manual entry) of descriptive and technical metadata • Generation of METS using Perl DBI against MySQL
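
The three workflow steps can be sketched end to end in Python. This is only an analogue of the Perl/MySQL pipeline on the slide: an in-memory sqlite3 database stands in for MySQL, ElementTree stands in for the Perl DBI output script, and the image table layout and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def load_images(conn, rows):
    """Batch-load extracted technical metadata (stand-in for the MySQL input step)."""
    conn.execute("""CREATE TABLE IF NOT EXISTS image (
        file_id TEXT PRIMARY KEY, href TEXT, mime_type TEXT,
        width INTEGER, height INTEGER)""")
    conn.executemany("INSERT INTO image VALUES (?, ?, ?, ?, ?)", rows)


def generate_filesec(conn) -> ET.Element:
    """Query the database and emit a METS fileSec (stand-in for the Perl DBI step)."""
    file_sec = ET.Element(f"{{{METS_NS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    query = "SELECT file_id, href, mime_type FROM image ORDER BY file_id"
    for file_id, href, mime_type in conn.execute(query):
        f = ET.SubElement(grp, f"{{{METS_NS}}}file", ID=file_id, MIMETYPE=mime_type)
        ET.SubElement(f, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    return file_sec
```

Rows would normally come from the ImageMagick extraction step. Width and height are stored so that a later step could write them into a MIX techMD section, which this sketch does not do.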

  14. ImageMagick Verbose Dump
      Image: taqw_001s.jpg
      Format: JPEG (Joint Photographic Experts Group JFIF format)
      Geometry: 625x886
      Class: DirectClass
      Type: true color
      Depth: 8 bits-per-pixel component
      Colors: 33080
      Profile-color: 552 bytes
      Profile-iptc: 5636 bytes
      unknown: êëÿ
      Resolution: 100x100 pixels/inch
      Filesize: 210kb
      Interlace: None
      Background Color: white
      Border Color: #dfdfdf
      Matte Color: grey74
      Iterations: 0
      Compression: JPEG
      signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320f2226181d062822
      Tainted: False
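
Because the verbose dump is a series of "Key: Value" lines, turning it into fields for technical metadata is mostly string splitting. A minimal sketch, assuming one field per line as in the dump above; field names differ between ImageMagick releases, the function names are illustrative, and mapping the fields onto MIX elements is left out.

```python
def parse_identify_verbose(text: str) -> dict:
    """Turn 'Key: Value' lines from a verbose identify dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key not in fields:  # keep the first occurrence of each key
            fields[key] = value.strip()
    return fields


def image_dimensions(fields: dict) -> tuple[int, int]:
    """Read (width, height) from the Geometry field."""
    # Geometry looks like "625x886"; newer releases may append offsets ("+0+0").
    geometry = fields["Geometry"].split("+")[0]
    width, height = geometry.split("x")
    return int(width), int(height)
```

For a dump whose Geometry line reads 625x886, image_dimensions(parse_identify_verbose(dump)) returns (625, 886).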

  15. ImageMagick Non-Verbose Dump
      • taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06
      • taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01
      • taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01
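
The non-verbose form packs the core facts into one whitespace-separated line per file. The regular expression below is a sketch fitted only to the three sample lines on this slide; the group names are illustrative, and other ImageMagick versions may order or format the columns differently.

```python
import re

# file[scene] FORMAT WIDTHxHEIGHT CLASS DEPTH-bit SIZE ... (trailing timings ignored)
LINE = re.compile(
    r"^(?P<file>\S+?)(?:\[\d+\])?\s+(?P<format>\S+)\s+(?P<width>\d+)x(?P<height>\d+)\s+"
    r"(?P<cls>\S+)\s+(?P<depth>\d+)-bit\s+(?P<size>\S+)"
)


def parse_identify_line(line: str) -> dict:
    """Split one non-verbose identify line into named fields."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised identify line: {line!r}")
    fields = match.groupdict()
    for key in ("width", "height", "depth"):
        fields[key] = int(fields[key])
    return fields
```

For the first line on the slide, parse_identify_line gives width 6500 and height 6817; the optional "[n]" suffix on the JPEG lines is dropped from the file name.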

  16. Extracting METS from a DB • doWebArchive.cgi: MODS for the homepage; DC for individual pages; MIX for image technical metadata; textMD for web-page technical metadata
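
The slide implies a simple per-file rule for which metadata schemas doWebArchive.cgi emits. The Python function below restates that rule as a hedged sketch; it is not the original Perl CGI. MIX is written with the METS MDTYPE value NISOIMG, and textMD is represented by the hypothetical shorthand "OTHER:textMD", meaning MDTYPE="OTHER" with OTHERMDTYPE="textMD", since older METS schema versions had no dedicated value for it.

```python
def md_sections_for(mime_type: str, is_homepage: bool) -> list[tuple[str, str]]:
    """Return (METS section, MDTYPE) pairs to emit for one harvested file."""
    sections = []
    if mime_type.startswith("text/html"):
        # Descriptive: MODS for the site's homepage, Dublin Core for other pages.
        sections.append(("dmdSec", "MODS" if is_homepage else "DC"))
        # Technical metadata for the web page: textMD, via MDTYPE="OTHER".
        sections.append(("techMD", "OTHER:textMD"))
    elif mime_type.startswith("image/"):
        # Technical metadata for images: MIX, registered in METS as NISOIMG.
        sections.append(("techMD", "NISOIMG"))
    return sections  # other file types get no extension metadata in this sketch
```

Keeping the rule in one small function makes it easy to extend when further schemas (for example PREMIS or a rights schema) become available.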

  17. METS for Discovery • Dump METS files into Oracle as CLOBs • Create an Oracle interMedia index • XML-aware full-text search • Example: CRL Political Web Archiving Project
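
An XML-aware search differs from plain full-text search in that a query can be restricted to the text of particular elements. Oracle's interMedia/Oracle Text indexes do this inside the database through section searching (for example the WITHIN operator); the standard-library Python below only illustrates the idea over in-memory METS strings, and the function name and signature are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def search_within(docs: Dict[str, str], term: str, tag: Optional[str] = None) -> List[str]:
    """Return IDs of docs containing `term`.

    If `tag` (a namespace-qualified name such as '{uri}title') is given,
    only text inside elements with that name is searched.
    """
    term = term.lower()
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        elements = root.iter(tag) if tag else [root]
        # Join text nodes with spaces so adjacent elements do not fuse words.
        if any(term in " ".join(el.itertext()).lower() for el in elements):
            hits.append(doc_id)
    return hits
```

A search for a word anywhere in the file may match notes or technical metadata; restricting it to, say, the MODS title element returns only records whose title actually contains the word.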

  18. CRL Political Web Archive • Collaboration among Stanford, Cornell, Texas, NYU and the Internet Archive (IA) under the aegis of CRL and Mellon • Sub-Saharan Africa, Southeast Asia, Latin America, Western Europe • Testbed: 400 URLs; websites of radical groups and NGOs • Internet Archive .arc files

  19. Internet Archive .arc files • An .arc file is a 100 MB aggregate of harvested files, along with the HTTP headers and a crawler-generated header for each file • Fine as a simple SIP, but basically unmanageable as an AIP or DIP • At present, content is accessed by using byte offsets to pull it out of the aggregate file • Only searchable by URL (Wayback Machine)
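
Byte-offset access works because each record in an ARC version 1 file starts with a one-line header (URL, IP address, archive date, content type, archive length) whose last field gives the length of the content that follows. A minimal sketch, assuming an uncompressed version 1 file and an offset already looked up in an index; production .arc files are usually gzip-compressed per record, which this sketch ignores, and the function name is hypothetical.

```python
import io
from typing import BinaryIO, Dict, Tuple


def read_arc_record(arc: BinaryIO, offset: int) -> Tuple[Dict[str, object], bytes]:
    """Seek to a record's byte offset and return its parsed header and raw content."""
    arc.seek(offset)
    header_line = arc.readline().decode("latin-1").rstrip("\n")
    # v1 header: URL IP-address Archive-date Content-type Archive-length
    url, ip, date, mime, length = header_line.rsplit(" ", 4)
    content = arc.read(int(length))  # HTTP headers + body, as captured by the crawler
    header = {"url": url, "ip": ip, "date": date, "mime": mime, "length": int(length)}
    return header, content


if __name__ == "__main__":
    body = b"HTTP/1.0 200 OK\r\n\r\n<html>hi</html>"
    record = f"http://example.org/ 192.0.2.1 20031027000000 text/html {len(body)}\n".encode()
    buf = io.BytesIO(b"padding\n" + record + body + b"\n")
    print(read_arc_record(buf, len(b"padding\n"))[0]["url"])
```

Reading only the requested record is what makes single-page retrieval from a 100 MB aggregate practical, but it also explains why such files are hard to manage as AIPs or DIPs: nothing inside the file describes its contents beyond these per-record headers.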

  20. Automated extraction of metadata from text-based objects (e.g. web pages) • arcscraper.pl: descriptive and technical MD for each object • datscraper.pl: checksums, titles and links from each object • makeLinkTable.pl: creates link-to-object relationships
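
The per-page work described for datscraper.pl and makeLinkTable.pl (a checksum, the title and a table of outgoing links) can be approximated in standard-library Python. This is a sketch, not the original Perl: the choice of MD5, recording only <a href> links, and the class and function names are assumptions.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collect the <title> text and every <a href> target of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title, self.links, self._in_title = "", [], False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_page(url: str, content: bytes):
    """Return (checksum, title, link-table rows) for one harvested page."""
    checksum = hashlib.md5(content).hexdigest()  # assumed algorithm
    scanner = PageScanner()
    scanner.feed(content.decode("utf-8", errors="replace"))
    scanner.close()
    # Each row is a (source object, target object) relationship.
    rows = [(url, urljoin(url, href)) for href in scanner.links]
    return checksum, scanner.title.strip(), rows
```

Resolving relative links against the page's own URL turns every row into an absolute source/target pair, which can then be matched against other objects harvested into the archive.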

  21. Go to Videotape

  22. The Future? • Persistent Identifiers • Preservation Metadata Schema • Java development • Move from Oracle to Cheshire II
