
GPO’s Federal Digital System Metadata collection, use, and display

GPO’s Federal Digital System Metadata collection, use, and display. December 08, 2010. Scope of FDsys Content. Published Federal information products, regardless of format or medium, which are of public interest or educational value or produced using Federal funds.


Presentation Transcript


  1. GPO’s Federal Digital System: Metadata collection, use, and display. December 08, 2010

  2. Scope of FDsys Content • Published Federal information products, regardless of format or medium, which are of public interest or educational value or produced using Federal funds. • Excludes products that are administrative, operational, official use only, not of educational value, classified, constrained by privacy considerations, or self-sustaining.

  3. What is FDsys? • FDsys is a Content Management System • FDsys securely controls digital content throughout its lifecycle to ensure content integrity and authenticity • FDsys is a Preservation Repository • FDsys follows archival system standards to ensure long-term preservation and access of digital content • FDsys is an Advanced Search Engine • FDsys combines extensive metadata creation with modern search technology to ensure the highest quality search experience

  4. Designing a metadata approach • Top down approach: what are our goals and what information do we need to meet them? • Access: Make finding Federal government publications easy. Help users navigate the complex web of versions, issues, and related items. • Authenticity: Maintain content integrity and provenance. • Preservation: Ensure content is usable as information as technology changes. • FDsys collects, stores, uses, and shares metadata to support each goal

  5. FDsys & Access: our goal Government publications present unique challenges to findability • Regulatory and legislative processes make even seemingly simple questions complex • Most users are looking for a specific piece of information much lower than the item level. • Content often repeats in new versions or issues with minor changes, so full-text search alone returns lots of results to wade through

  6. FDsys & Access: collecting Descriptive metadata is collected in three ways. • Closest to the source: user interface at time of submission or any time after • Highest quality: bibliographic metadata created by information professionals via interface with our ILS (manual now, automated later) • Most automated: parsing data directly from the content

  7. Metadata editor

  8. High-Level Information Flow [diagram; labels: Raw Content, Extract Metadata, Group into Packages, Packages (Metadata, Content), Create MODS, Content Delivery, Search, Browse]

  9. Parsing Content Runs regular expressions to extract metadata • Regular Expression: (Public Law|Pub. L.|PL|P. L.) (1[0-9][0-9])-([0-9]+) • Content: Pub. L. 109-130 • MODS.xml <congress>109</congress> <number>130</number>
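The parsing step on this slide can be sketched in a few lines of Python. This is only an illustration, not GPO's actual parser: the function name `extract_public_law` is hypothetical, and the literal dots in the pattern are escaped here (the slide shows them unescaped, where `.` would match any character).

```python
import re

# Pattern from the slide, with literal dots escaped (an adjustment made for this sketch).
PUBLIC_LAW = re.compile(r"(Public Law|Pub\. L\.|PL|P\. L\.) (1[0-9][0-9])-([0-9]+)")


def extract_public_law(text):
    """Return congress and law number as MODS-style elements, or None if no citation is found."""
    match = PUBLIC_LAW.search(text)
    if match is None:
        return None
    congress, number = match.group(2), match.group(3)
    return f"<congress>{congress}</congress> <number>{number}</number>"


print(extract_public_law("Pub. L. 109-130"))
# <congress>109</congress> <number>130</number>
```

Note that the `1[0-9][0-9]` group only matches three-digit Congress numbers starting with 1, so citations from the 99th Congress or earlier would not match this particular expression.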

  10. FDsys & Access: storing Descriptive metadata is stored in XML using the Metadata Object Description Schema (MODS) • Element set is richer than Dublin Core • Hierarchy allows for rich description, especially of complex digital objects • <extension> for local elements
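As a rough sketch of what such a record might look like, the Python below builds a minimal MODS document with local elements under `<extension>`. The MODS namespace and the `titleInfo`, `title`, and `extension` elements come from the MODS schema; the function name `build_mods`, the sample title, and the placement of `congress` and `number` directly under `extension` are assumptions for illustration, not the actual FDsys layout.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def build_mods(title, congress, number):
    """Build a minimal MODS record with local elements under <extension>."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    # Local (non-MODS) elements live under <extension>; these names are illustrative.
    extension = ET.SubElement(mods, f"{{{MODS_NS}}}extension")
    ET.SubElement(extension, "congress").text = congress
    ET.SubElement(extension, "number").text = number
    return mods


record = build_mods("Public Law 109-130", "109", "130")
print(ET.tostring(record, encoding="unicode"))
```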

  11. MODS record – Federal Register

  12. FDsys & Access: using • Provide simple search with advanced results • Faceted searching: type “candy corn” into the search box, metadata allows you to restrict to an agency or a date range • Provide advanced search features so users can efficiently retrieve specific documents • Metadata allows quick retrieval by citation • Search metadata fields directly to retrieve specific results • “Related item” element in MODS allows us to build automated navigation between objects • Preserves context • article → issue • issue → next issue • Congressional Bill → U.S. Code • Federal Register → CFR • Makes government documents useable for the non-expert
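The faceted-search idea can be illustrated with a toy in-memory example. Everything here is made up for the sketch: the records, the agency names, and the `search` and `facet_counts` helpers are hypothetical and say nothing about how the FDsys search engine is implemented.

```python
from collections import Counter
from datetime import date

# Made-up records standing in for indexed documents and their metadata.
RECORDS = [
    {"title": "Notice A", "agency": "Agency A", "published": date(2010, 10, 1),
     "text": "Standards for candy corn labeling"},
    {"title": "Rule B", "agency": "Agency B", "published": date(2009, 10, 15),
     "text": "Candy corn import rules"},
    {"title": "Notice C", "agency": "Agency A", "published": date(2008, 3, 2),
     "text": "Corn subsidies"},
]


def search(records, query, agency=None, start=None, end=None):
    """Full-text match first, then narrow by metadata facets."""
    hits = [r for r in records if query.lower() in r["text"].lower()]
    if agency is not None:
        hits = [r for r in hits if r["agency"] == agency]
    if start is not None:
        hits = [r for r in hits if r["published"] >= start]
    if end is not None:
        hits = [r for r in hits if r["published"] <= end]
    return hits


def facet_counts(hits, field):
    """Count hits per facet value, as a results sidebar might display."""
    return Counter(r[field] for r in hits)


all_hits = search(RECORDS, "candy corn")
print(facet_counts(all_hits, "agency"))
narrowed = search(RECORDS, "candy corn", agency="Agency A", start=date(2010, 1, 1))
print([r["title"] for r in narrowed])
```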

  13. Faceted search using metadata

  14. FDsys & Access: sharing • Provide all descriptive metadata for an item or a granule (e.g., article) in MODS.xml • User-friendly display of commonly used elements

  15. User-friendly display

  16. Data Mapping (internal data storage → web application display) • 110 → 110th Congress (2007-2008) • V → Part V • FR → Federal Register • RULE → Rules and Regulations • congnum → Congress Number • 2006-02-01 → February 1st, 2006 • [2006-02-01;] → After February 1st, 2006 • accode=PLAW congnum=110 billnumber=1234 publishdate=2008-01-01 → Public and Private Laws. 110th Congress. H.R. 1234. January 1, 2008.
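A few of the mappings on this slide can be sketched as small formatting helpers. The function names are hypothetical, and the year calculation rests on the assumption that each Congress spans two calendar years starting from 1789 for the 1st Congress; the slide only shows the inputs and the displayed results.

```python
from datetime import date


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 110 -> '110th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def display_congress(congnum):
    """Map a stored Congress number to its display label."""
    start = 1789 + 2 * (congnum - 1)  # assumes two-year Congresses counted from 1789
    return f"{ordinal(congnum)} Congress ({start}-{start + 1})"


def display_date(iso_value):
    """Map a stored ISO date to a readable date."""
    parsed = date.fromisoformat(iso_value)
    return f"{parsed.strftime('%B')} {ordinal(parsed.day)}, {parsed.year}"


def display_range(stored):
    """Map an open-ended stored range such as '[2006-02-01;]' to display text."""
    start, _, end = stored.strip("[]").partition(";")
    if start and not end:
        return f"After {display_date(start)}"
    return stored  # other range shapes are not shown on the slide, so pass them through


print(display_congress(110))           # 110th Congress (2007-2008)
print(display_date("2006-02-01"))      # February 1st, 2006
print(display_range("[2006-02-01;]"))  # After February 1st, 2006
```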

  17. FDsys & Authenticity: our goal GPO and users need methods to verify that the content in the repository • Has not been maliciously or accidentally altered, • Has not been removed or added without authorization, and • Has been approved by, contributed by, or harvested from an official source.

  18. FDsys & Authenticity: collecting • The system takes a checksum of each file uploaded to the repository • A record is generated of the time, date, and user, for each event that occurs to the content
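A minimal sketch of the two collection steps, assuming SHA-256 as the checksum algorithm (the slides do not say which one FDsys uses). The `record_event` helper and the field names in the event dictionary are hypothetical and are not PREMIS element names.

```python
import hashlib
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the content (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def record_event(log, event_type, user, filename, data):
    """Append an event noting who did what to which file, when, and the file's checksum."""
    event = {
        "type": event_type,
        "user": user,
        "file": filename,
        "datetime": datetime.now(timezone.utc).isoformat(),
        "sha256": checksum(data),
    }
    log.append(event)
    return event


event_log = []
content = b"Example rendition bytes"
event = record_event(event_log, "ingest", "submitter-1", "content.pdf", content)
print(event["type"], event["user"], event["sha256"][:12])
```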

  19. Content Integrity and Authenticity

  20. Events Record

  21. Events Recorded

  22. FDsys & Authenticity: storing • Events and integrity information are created according to the PREMIS data dictionary and stored in XML according to the PREMIS schema

  23. FDsys & Authenticity: using • The system periodically regenerates the checksum for each content file in the system and compares it to the original value • Event logs serve as a provenance record • Tool for investigating unauthorized changes • Track new renditions created to preserve content back to the original submitted to GPO
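The periodic check described here amounts to recomputing each file's checksum and comparing it with the stored value. The sketch below assumes SHA-256 and a simple path-to-checksum manifest; both are illustrative choices, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Stream the file in chunks so large content files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Return the paths whose current checksum differs from the stored one."""
    return [path for path, stored in manifest.items() if file_checksum(path) != stored]


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "rendition.txt"
    target.write_bytes(b"original content")
    manifest = {str(target): file_checksum(target)}  # checksum taken at upload time

    clean = audit(manifest)            # nothing has changed yet
    target.write_bytes(b"altered content")
    altered = audit(manifest)          # the modified file is now flagged

print(clean, len(altered))
```

A flagged path would then be investigated alongside the event log, which records who touched the file and when.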

  24. FDsys & Authenticity: sharing • Checksums and event records for publicly available renditions are available on the website in XML • Full records for all renditions are stored in XML in the archive

  25. FDsys & Preservation: our goal Digital preservation processes will: • Safeguard digital content and relevant metadata • Reduce reliance on hardware and software to access content • Assess the condition and needs of collections of digital information • Meaningfully render content despite continually changing technology

  26. FDsys & Preservation: collecting • System discerns technical information about how content is represented (e.g., file format) • DROID recognizes file format, links to format registry • Preservation specialists enhance technical metadata as needed, in bulk or for an individual file • System creates structural metadata that describes where files are physically stored and how they relate to each other • Abstraction layer from CMS

  27. FDsys Package Metadata (e.g. METS, MODS, PREMIS) and content renditions (e.g. HTML, PDF, XML) for roughly one bound printed document. • One issue of the Federal Register • One issue of the Congressional Record • A single Congressional Bill • A single Congressional Committee Report • One volume of the Code of Federal Regulations • One title of the United States Code • The 9/11 Report

  28. FDsys Package Structure [diagram; labels: AIP, package folder, rendition folder-1, rendition folder-2, content files, aip.xml, mods.xml, premis.xml]

  29. FDsys Content Package

  30. Structural metadata in METS

  31. Technical metadata in PREMIS

  32. FDsys & Preservation: storing • Technical metadata stored in XML according to the PREMIS schema • Structural metadata stored in XML according to the METS schema

  33. FDsys & Preservation: using • Technical metadata used to assess when preservation intervention is needed • On an individual file or a group level • Structural metadata allows us to reconstruct the archive if content management system is destroyed
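To illustrate how structural metadata can support rebuilding an archive, the sketch below reads the file section of a METS document and lists where each rendition's files live. The METS and XLink namespaces and the `fileSec`, `fileGrp`, `file`, and `FLocat` elements are part of the METS schema; the sample document, its `USE` values, and its paths are invented for the example and do not reflect an actual FDsys package.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Invented sample; a real package's METS file would be far richer.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="rendition-1">
      <mets:file ID="f1">
        <mets:FLocat LOCTYPE="URL" xlink:href="rendition-1/content.pdf"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="rendition-2">
      <mets:file ID="f2">
        <mets:FLocat LOCTYPE="URL" xlink:href="rendition-2/content.html"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_files(mets_xml):
    """Map each file group to the file locations recorded in the METS fileSec."""
    root = ET.fromstring(mets_xml)
    href_attr = f"{{{NS['xlink']}}}href"
    inventory = {}
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        locations = group.findall("mets:file/mets:FLocat", NS)
        inventory[group.get("USE")] = [loc.get(href_attr) for loc in locations]
    return inventory


inventory = list_files(SAMPLE_METS)
print(inventory)
```

Because the inventory is derived from the XML alone, it does not depend on the content management system being available.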

  34. FDsys & Preservation: sharing • In addition to access at the search result level, content can also be downloaded at the item level as a Dissemination Information Package • Structural, descriptive, preservation, and authenticity metadata with the content • Technical metadata provided on website for all publicly available renditions

  35. For more information • Contact us by email Kate Zwaard, kzwaard@gpo.gov • Visit the FDsys website www.gpo.gov/fdsys
