Incompatible or Interoperable?

A METS bridge for a small gap between two digital preservation software packages. Lucas Mak, Metadata & Catalog Librarian (makw@msu.edu). Aaron Collie, Digital Curation Librarian (collie@msu.edu).


Presentation Transcript


  1. Incompatible or Interoperable? A METS bridge for a small gap between two digital preservation software packages
     Lucas Mak, Metadata & Catalog Librarian, makw@msu.edu
     Aaron Collie, Digital Curation Librarian, collie@msu.edu

  2. Outline
     • What we wanted
     • What we found
     • What we did
     [Diagram label: METS]

  3. We bridged a gap, but we didn’t close the bridge METS

  4. Fedora Commons input:
     • METS Fedora Extension
     • fedora-batch-ingest.sh
     • Datastreams!
     Archivematica output:
     • METS.xml
     • AIP
     • DIP
     [Diagram labels: METS 1.8; DIP; AIP; Serving 5 TB; Staging 12 TB; Dark Archive 84 TB]

  5. [Diagram labels: METS Fedora Ext. 1.1; ProQuest; Humans; XSL; METS 1.8; DIP; AIP; Staging 12 TB]

  6. [Diagram labels: METS Fedora Ext. 1.1; Persistent ID (PID)]

  7. Why?
     • We wanted to be able to control and systematize ingest at the microservice level
         • And we like the direction Archivematica is taking
     • We wanted to pipe technical and preservation metadata into Fedora Commons
         • This was the reason we got started
     • We haven’t contributed to the open source community, and we wanted something to learn on
         • We are thinking of it as professional development…

  8. Comparing Archivematica & Fedora METS
     • Different schemas
         • Archivematica: METS v. 1.8
           http://www.loc.gov/standards/mets/version18/mets.xsd
         • Fedora: Fedora METS 1.1
           http://fedora-commons.org/definitions/1/0/mets-fedora-ext1-1.xsd
     • Differences in structure, elements, attributes, & values allowed

  9. <structMap>
     • Archivematica
         • Physical structMap of the bag (i.e. directory structure)
     • Fedora: no <structMap> per v.1.1*
     • Solution:
         • Structure represented by <GROUPID> & <SEQ> attributes of <mets:file>
         • <SEQ> by page no. embedded in filename
             • Only physical arrangement is possible unless the file naming convention is changed to include logical info
         • <GROUPID> by file type/usage (e.g. preservation master, high/low resolution access copies)
     * <mets:structMap> is allowed in schema v.1.0 (used until Fedora 3.0)
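
A minimal sketch of how the SEQ and GROUPID values described on this slide might be derived from a filename. The naming convention (`<object>_<page>.<ext>`), the extension-to-group mapping, the group labels, and the function name `file_attributes` are all assumptions made for illustration; the slides do not give the actual convention.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: "<object>_<page>.<ext>", e.g. "etd001_0003.tif".
PAGE_PATTERN = re.compile(r"_(\d+)$")

# Hypothetical mapping from file extension to a usage group.
GROUP_BY_EXTENSION = {
    ".tif": "MASTER",
    ".jp2": "ACCESS_HIGH",
    ".jpg": "ACCESS_LOW",
}


def file_attributes(filename: str) -> dict:
    """Return GROUPID (and SEQ, when a page number is present) for one file."""
    path = PurePosixPath(filename)
    attrs = {"GROUPID": GROUP_BY_EXTENSION.get(path.suffix.lower(), "OTHER")}
    match = PAGE_PATTERN.search(path.stem)
    if match:
        # Drop zero padding so "0003" becomes "3".
        attrs["SEQ"] = str(int(match.group(1)))
    return attrs
```

Because SEQ comes only from the page number in the filename, a sketch like this can express physical order but not logical structure, which is the limitation the slide notes.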

  10. <fileSec>
     • Archivematica
         • Two file groups: “Original” & “Submission documentation”
         • Original: digital objects
         • Submission documentation: descriptive metadata XML files
     • Fedora
         • Datastreams to be ingested as files
         • Files of digital objects and others (e.g. Archivematica METS)
         • Descriptive metadata XML files are ingested as “inline XML datastreams”
     • Solution: copy all XML files in “Submission documentation” into separate <dmdSecFedora> elements

  11. <amdSec>
     • Archivematica: hierarchical structure
         <amdSec ID="amdSec1">
           <techMD ID="techMD1"/>
           …
           <digiProvMD ID="digiProvMD1"/>
         </amdSec>
         <amdSec ID="amdSec2">
           <techMD ID="techMD2"/>
           …
           <digiProvMD ID="digiProvMD2"/>
         </amdSec>
     • 1 digital file has 1 <amdSec>
     • All <techMD>, <rightsMD>, <sourceMD> and <digiProvMD> pertaining to the same file are nested under the same <amdSec>

  12. Fedora: flat structure
         <amdSec ID="tech1">
           <techMD ID="tech1.0"/>
         </amdSec>
         <amdSec ID="digiProv1">
           <digiProvMD ID="digiProv1.0"/>
         </amdSec>
         <amdSec ID="tech2">
           <techMD ID="tech2.0"/>
         </amdSec>
     • To accommodate inline XML datastream versioning
     • ID (syntax DSn.v) contains both:
         • the number of the inline datastream (n) and
         • the version number of the datastream (v)
     • Each individual <amdSec> serves as a container, and its ID indicates the datastream number
     • <techMD> and the like carry IDs that indicate the datastream version number

  13. <AMDID> attribute in <mets:file>
     • Archivematica
         • Points to one <amdSec> per file, which has <techMD>, <rightsMD>, <sourceMD>, and <digiProvMD> nested within
         • <mets:file ID="file1" AMDID="amdSec1"/>
     • Fedora
         • Points to multiple <amdSec> per file, each of which contains a <techMD>, <rightsMD>, <sourceMD>, or <digiProvMD>
         • <mets:file ID="file1" AMDID="tech1 rights1 source1 digiProv1"/>
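
The restructuring described on slides 11 to 13 can be sketched as follows. The slides indicate that the actual transformation was done with XSL; this Python version is only an illustration of the same idea. The function name `flatten_amd_secs`, the per-type counters used to number the new sections, and the fixed version suffix `.0` are assumptions, chosen to reproduce the ID pattern shown on slide 12.

```python
import xml.etree.ElementTree as ET
from collections import Counter

METS = "http://www.loc.gov/METS/"
MD_TYPES = ("techMD", "rightsMD", "sourceMD", "digiProvMD")


def flatten_amd_secs(root: ET.Element) -> None:
    """Replace each nested amdSec with one amdSec per metadata section, in place."""
    old_secs = root.findall(f"{{{METS}}}amdSec")
    if not old_secs:
        return
    insert_at = list(root).index(old_secs[0])
    counters = Counter()   # running datastream number per metadata type
    id_map = {}            # old amdSec ID -> list of new amdSec IDs
    new_secs = []
    for old in old_secs:
        new_ids = []
        for child in list(old):
            local_name = child.tag.split("}")[-1]
            if local_name not in MD_TYPES:
                continue
            prefix = local_name[:-2]              # "techMD" -> "tech"
            counters[prefix] += 1
            sec_id = f"{prefix}{counters[prefix]}"
            wrapper = ET.Element(f"{{{METS}}}amdSec", {"ID": sec_id})
            child.set("ID", f"{sec_id}.0")        # assumed: first version is 0
            wrapper.append(child)
            new_secs.append(wrapper)
            new_ids.append(sec_id)
        id_map[old.get("ID")] = new_ids
        root.remove(old)
    for offset, sec in enumerate(new_secs):
        root.insert(insert_at + offset, sec)
    # Re-point every file's AMDID at the new flat sections.
    for file_el in root.iter(f"{{{METS}}}file"):
        old_ids = file_el.get("AMDID", "").split()
        mapped = [new for old_id in old_ids for new in id_map.get(old_id, [old_id])]
        if mapped:
            file_el.set("AMDID", " ".join(mapped))
```

Applied to a document with `amdSec1` (one techMD, one digiProvMD) and `amdSec2` (one techMD), this yields sections `tech1`, `digiProv1`, and `tech2`, and a file that pointed at `amdSec1` ends up with `AMDID="tech1 digiProv1"`.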

  14. <dmdSec>
     • Archivematica
         • Only 1 Dublin Core record is allowed to describe the SIP
             • Constrained by the Archivematica workflow rather than the METS schema
         • Additional descriptive metadata XML records are included in the “Submission documentation” folder
     • Fedora
         • Fedora extension element: <dmdSecFedora>
         • Allowed MDTYPE: MARC, EAD, DC, NISOIMG, LC-AV, VRA, TEI Header, DDI, FGDC, & OTHER
     • Solution: copy XML files in the “Submission documentation” folder into separate <dmdSecFedora> elements
         • MODS has to be labeled as “OTHER”
         • Use the namespace URI to assign the correct “MDTYPE”
             • Does not work with TEI Header or EAD
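
A minimal sketch of assigning MDTYPE from the namespace URI of a record's root element, as this slide describes. The lookup table and the function name `mdtype_for` are assumptions for illustration; only the Dublin Core and MARCXML namespaces are listed, and anything not in the table (including MODS, and any record whose root element has no namespace) falls back to "OTHER".

```python
import xml.etree.ElementTree as ET

# Illustrative lookup table; extend it with whatever formats a collection uses.
MDTYPE_BY_NAMESPACE = {
    "http://purl.org/dc/elements/1.1/": "DC",
    "http://www.openarchives.org/OAI/2.0/oai_dc/": "DC",
    "http://www.loc.gov/MARC21/slim": "MARC",
}


def mdtype_for(xml_text: str) -> str:
    """Return the MDTYPE for a metadata record based on its root namespace."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # ElementTree writes qualified names as "{namespace}localname".
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    return MDTYPE_BY_NAMESPACE.get(namespace, "OTHER")
```

For example, `mdtype_for('<mods xmlns="http://www.loc.gov/mods/v3"/>')` returns `"OTHER"`, matching the slide's note that MODS has to be labeled as "OTHER".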

  15. <mets:metsHdr>
     • Archivematica
         • Does not use (optional in METS schema)
     • Fedora
         • <RECORDSTATUS> attribute to indicate whether the object is “active”, “inactive” or “deleted”
     • Solution: hard-coded with constant data
         <mets:metsHdr RECORDSTATUS="A">
           <mets:agent ROLE="IPOWNER" TYPE="ORGANIZATION">
             <mets:name>MSU Libraries Digital and Multimedia Center</mets:name>
           </mets:agent>
         </mets:metsHdr>

  16. <OWNERID> attribute in <mets:file>
     • Archivematica
         • Does not use (optional in METS schema)
     • Fedora
         • Indicates whether the file is “managed by Fedora internally”, “externally referenced”, or “redirected”
         • Optional according to the Fedora-METS schema
     • Solution: determine the value based on filename or file format
         • Archivematica adds a “checksum” to the filename of files generated during the preservation workflow

  17. Proposed Workflow
     [Diagram labels: Web Display; Staging Area 12 TB; METS; AIP; DIP; Dark Archive(s); Serving Share(s); 84 TB]

  18. What a bridge gets us:
     • Automatically extracts and captures technical & preservation metadata
     • Eases handling of complex objects with lots of metadata or parts
     • Maintains and manages separate AIP/DIP packages

  19. What a full integration might benefit from:
     • Archivematica A/DIP Content Model & Solution Pack
     • Integrated AIP management
         • Including dashboard GUI
         • Including JMS messaging
     • Integrated rebuilds from filesystem
         • Currently supported in Fedora Commons
         • On the roadmap for Archivematica
     • Automated ingest, improved handling

  20. Questions?
     • Lucas Mak (makw@msu.edu)
     • Aaron Collie (collie@msu.edu)
