Digital Preservation and Archiving at the Institute for Advanced Computer Studies, University of Maryland, College Park

Mike Smorul

Saurabh Channan

Overview
  • Digital Preservation Research
    • ADAPT Project and Components
    • Pilot Persistent Archive
  • Digital Library and Production Data Distribution
    • Global Land Cover Facility
  • Conclusion & Questions
A Digital Approach to Preservation Technology (ADAPT)
  • Premise:
    • Preservation of digital entities into self-describing objects
      • OAIS Information Packet model as a framework
    • Separation of management into three layers: bitstream, semantic, and access/discovery
    • Distributed and Secure Infrastructure
      • Automatic ingestion and replication
      • Policy-Driven Management of Preservation Processes
      • Global Format Registry
      • Separate Peer-to-Peer Deep Archive
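The idea of a self-describing object can be sketched as a small data structure. The following is a minimal Python sketch, assuming the standard OAIS split of an Information Package into Content Information (the bitstream plus its Representation Information) and Preservation Description Information (reference, context, provenance, fixity). The class names, the `format_id` field standing in for a format registry entry, and the mapping of fields onto the bitstream, semantic, and access/discovery layers are illustrative assumptions, not ADAPT's actual schema.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RepresentationInfo:
    # Semantic layer: how to interpret the bits. `format_id` is a hypothetical
    # key into a format registry such as the Global Format Registry.
    format_id: str
    description: str = ""


@dataclass
class PreservationDescription:
    # The four OAIS Preservation Description Information categories.
    reference: str                                          # persistent identifier
    context: str = ""                                       # links to related objects
    provenance: list[str] = field(default_factory=list)     # custody/processing history
    fixity: dict[str, str] = field(default_factory=dict)    # hash algorithm -> hex digest


@dataclass
class InformationPackage:
    content: bytes                                          # bitstream layer
    representation: RepresentationInfo                      # semantic layer
    pdi: PreservationDescription
    descriptive: dict[str, str] = field(default_factory=dict)  # access/discovery layer

    def verify_fixity(self) -> bool:
        """Recompute every recorded digest and compare it with the stored value."""
        return all(
            hashlib.new(algorithm, self.content).hexdigest() == expected
            for algorithm, expected in self.pdi.fixity.items()
        )


def make_package(identifier: str, data: bytes, format_id: str,
                 **descriptive: str) -> InformationPackage:
    """Wrap raw bytes in a self-describing package with a SHA-256 fixity value."""
    pdi = PreservationDescription(
        reference=identifier,
        provenance=["ingested"],  # illustrative first provenance event
        fixity={"sha256": hashlib.sha256(data).hexdigest()},
    )
    return InformationPackage(data, RepresentationInfo(format_id), pdi, dict(descriptive))


pkg = make_package("obj-0001", b"example bitstream", "example-format", title="Sample scene")
print(pkg.verify_fixity())  # True
```

Because the fixity digests travel inside the package, any holder of a copy (including a separate deep archive) can re-verify the bitstream without consulting an external database.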
ADAPT Components
  • Ingestion
    • Producer-Archive Workflow Network (PAWN)
  • Management of Preservation Processes
    • Lightweight Preservation Environment (LPE)
  • Access and Discovery
    • Grid Retrieval and Search Platform (GRASP)
    • EAP Collection browser
Overall Principles (PAWN)
  • Distributed, secure ingestion
  • OAIS based Information Packet creation
  • Use of web/grid technologies for platform independence
  • Minimal client-side requirements
  • Ease of integration with archive and data grid systems
  • Designed to satisfy data integrity requirements of scientific collections and digital preservation
Ingestion Workflow (PAWN)
  • Negotiation of Submission Agreement.
  • Workflow Initialization and Submission Information Packet (SIP) creation.
  • Transfer of SIPs to Data Grid site.
  • Validation of SIP transfer.
  • Organization of data into collections and transfer into Data Grid.
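The five workflow steps can be read as an ordered state machine in which data only reaches the archive after the transferred SIP has been validated. The sketch below assumes SHA-256 checksums as the validation mechanism and keeps everything in memory; `IngestionSession`, its methods, and the `Step` names are hypothetical, and `transfer` is a placeholder rather than a real data grid transfer.

```python
import hashlib
from enum import Enum, auto


class Step(Enum):
    # The five PAWN workflow stages, in the order listed on the slide.
    AGREEMENT = auto()
    SIP_CREATED = auto()
    TRANSFERRED = auto()
    VALIDATED = auto()
    ARCHIVED = auto()


ORDER = list(Step)


class IngestionSession:
    """Hypothetical driver that enforces the step order for one submission."""

    def __init__(self, agreement_id: str):
        # A session only exists once a submission agreement has been negotiated.
        self.agreement_id = agreement_id
        self.completed: list[Step] = [Step.AGREEMENT]
        self.manifest: dict[str, str] = {}

    def _advance(self, step: Step) -> None:
        if len(self.completed) == len(ORDER):
            raise RuntimeError("workflow already finished")
        expected = ORDER[len(self.completed)]
        if step is not expected:
            raise RuntimeError(f"expected {expected.name}, got {step.name}")
        self.completed.append(step)

    def create_sip(self, files: dict[str, bytes]) -> dict[str, str]:
        # Producer side: record a SHA-256 digest for every file placed in the SIP.
        self.manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
        self._advance(Step.SIP_CREATED)
        return self.manifest

    def transfer(self) -> None:
        # Placeholder: a real deployment would move the SIP to the data grid site here.
        self._advance(Step.TRANSFERRED)

    def validate(self, received: dict[str, bytes]) -> list[str]:
        # Archive side: return the names of files that are missing or whose digest differs.
        bad = [name for name, digest in self.manifest.items()
               if name not in received
               or hashlib.sha256(received[name]).hexdigest() != digest]
        if not bad:
            self._advance(Step.VALIDATED)
        return bad

    def archive(self) -> None:
        # Organizing into collections is only permitted after a clean validation.
        self._advance(Step.ARCHIVED)
```

Rejecting out-of-order steps in one place keeps a failed validation from silently reaching the data grid; the producer re-sends the flagged files and calls `validate` again.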
Target Collections (PAWN)
  • Digital Image Collection
    • Rich metadata in various formats
  • Web site crawling
    • Online and interactive content
  • GLCF Landsat data
    • Spatial and temporal metadata
    • Large quantity (over 15,000 objects)
Lightweight Preservation Environment (LPE)
  • The Lightweight Preservation Environment is an archival system based on a modular design using grid and web services.
  • The current implementation relies mostly on Globus technologies.
  • Primarily, we’ve focused on wrapping logic around those components.
Developed Components (LPE)
  • Data Manager (DM): Organizes data and queries between the user and the other components
  • Policy Manager (PM): Ensures that a minimum number of copies exist for any given file
  • Transformation Manager (TM): Executes specific transformations on a named file on a given storage node and returns the results
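The Policy Manager's rule (a minimum number of copies per file) reduces to a simple repair step. This Python sketch assumes an in-memory list of storage nodes and treats adding a filename to a node's set as a stand-in for an actual grid transfer; `StorageNode` and `replicate_to_minimum` are hypothetical names, and picking the first nodes that lack the file is an assumption (a real policy would likely weigh capacity or site diversity).

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    name: str
    files: set[str] = field(default_factory=set)


def replicate_to_minimum(filename: str, nodes: list[StorageNode], min_copies: int) -> list[str]:
    """Copy `filename` onto more nodes until at least `min_copies` hold it.

    Returns the names of nodes that received a new copy. Raises ValueError if no
    copy survives or if there are too few nodes to satisfy the policy.
    """
    holders = [n for n in nodes if filename in n.files]
    if not holders:
        raise ValueError(f"{filename} has no surviving copy to replicate from")
    candidates = [n for n in nodes if filename not in n.files]
    missing = min_copies - len(holders)
    if missing > len(candidates):
        raise ValueError(f"only {len(nodes)} nodes available, policy needs {min_copies}")
    added = []
    for node in candidates[:max(missing, 0)]:
        node.files.add(filename)  # stands in for an actual grid transfer
        added.append(node.name)
    return added


nodes = [StorageNode("a", {"scene.tif"}), StorageNode("b"), StorageNode("c")]
print(replicate_to_minimum("scene.tif", nodes, 2))  # ['b']
```

Running the check periodically makes it idempotent: once the policy is met, further calls copy nothing.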
Grid Retrieval and Search Platform (GRASP)
  • Based on concepts from the Earth Science Data Interface (ESDI), developed at the UMIACS GLCF.
  • Provides a graphical interface into data grid holdings.
  • Provides access to the entire GLCF holdings through the Storage Resource Broker (SRB).
GRASP Architecture
  • GRASP uses a data grid as an abstract storage repository.
  • Metadata in the grid is mined from the grid itself or from external sources and published into a browsable form.
    • Data grids may allow for platform-independent metadata but may not be optimal for access
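Because the data grid is treated as an abstract repository that may be slow to query directly, a browsable view can be built by harvesting metadata once into a local index. The sketch below assumes a minimal grid interface (`list_objects`, `get_metadata`) that is hypothetical and is not the SRB API; an in-memory stand-in makes it runnable, and values from external sources override those mined from the grid.

```python
from collections import defaultdict
from typing import Iterable, Optional, Protocol


class GridStore(Protocol):
    # Hypothetical minimal grid interface; not the Storage Resource Broker API.
    def list_objects(self) -> Iterable[str]: ...
    def get_metadata(self, path: str) -> dict[str, str]: ...


class InMemoryGrid:
    """Stand-in for a real data grid so the sketch runs on its own."""

    def __init__(self, objects: dict[str, dict[str, str]]):
        self._objects = objects

    def list_objects(self) -> list[str]:
        return list(self._objects)

    def get_metadata(self, path: str) -> dict[str, str]:
        return self._objects[path]


def build_browse_index(grid: GridStore, keys: list[str],
                       external: Optional[dict[str, dict[str, str]]] = None,
                       ) -> dict[str, dict[str, list[str]]]:
    """Harvest metadata once and index it as key -> value -> [object paths]."""
    external = external or {}
    index = {k: defaultdict(list) for k in keys}
    for path in grid.list_objects():
        meta = dict(grid.get_metadata(path))    # metadata mined from the grid itself
        meta.update(external.get(path, {}))     # external sources take precedence
        for k in keys:
            if k in meta:
                index[k][meta[k]].append(path)
    return {k: dict(v) for k, v in index.items()}


grid = InMemoryGrid({
    "scene1.tif": {"sensor": "Landsat", "year": "2000"},
    "scene2.hdf": {"sensor": "MODIS", "year": "2001"},
    "scene3.tif": {"sensor": "Landsat"},
})
index = build_browse_index(grid, ["sensor", "year"], {"scene3.tif": {"year": "2001"}})
print(index["year"]["2001"])  # ['scene2.hdf', 'scene3.tif']
```

The browse interface then reads only the index, so the grid is touched once per harvest rather than once per user click.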
Global Land Cover Facility
  • Mission:

“The GLCF Mission is to encourage the use of remotely sensed imagery, derived products and applications within a broad range of science communities in a manner that improves comprehension of the nature and causes of land cover change and its impact on the Earth.”

  • Goal:

“The GLCF Goal is to provide free access to an integrated collection of critical land cover and Earth science data through systems that are designed to maximize user outreach and that promote development of novel tools for ordering, visualizing and manipulating spatial data.”

Data Collections

The majority of the holdings are Landsat and MODIS data.

Data Distribution
  • Data at the GLCF
    • Approximately 5.1 TB compressed
    • Approximately 13 TB uncompressed
  • Anticipated Production Rate
    • Triple or quadruple current data holdings within the next two years
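A quick worked projection puts these figures in perspective. The sketch assumes the growth factor applies uniformly to all holdings and that the uncompressed-to-compressed ratio (13 / 5.1, roughly 2.55:1) stays constant, which the slide does not state; `project_holdings` is a hypothetical helper.

```python
def project_holdings(compressed_tb: float, uncompressed_tb: float,
                     growth_factors: tuple[int, ...] = (3, 4)):
    """Return the compression ratio and projected (compressed, uncompressed) sizes in TB."""
    ratio = uncompressed_tb / compressed_tb
    projections = {f: (compressed_tb * f, uncompressed_tb * f) for f in growth_factors}
    return ratio, projections


ratio, projections = project_holdings(5.1, 13.0)
print(f"compression ratio ~{ratio:.2f}:1")  # compression ratio ~2.55:1
for factor, (c, u) in projections.items():
    print(f"x{factor}: ~{c:.1f} TB compressed, ~{u:.0f} TB uncompressed")
# x3: ~15.3 TB compressed, ~39 TB uncompressed
# x4: ~20.4 TB compressed, ~52 TB uncompressed
```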
Data Discovery Applications
  • ESDI
  • Web Interface
  • User-friendly
    • Search
    • Retrieve
    • Discover
  • Scalable
  • Over 9 TB a month
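To relate the monthly volume to network load, the figure can be converted to an average transfer rate. This assumes decimal terabytes (10^12 bytes) and a 30-day month; `average_rate` is a hypothetical helper, and real peak demand would sit well above this average.

```python
SECONDS_PER_DAY = 86_400


def average_rate(tb_per_month: float, days_per_month: float = 30.0) -> tuple[float, float]:
    """Convert a monthly volume in decimal TB to average MB/s and Mbit/s."""
    bytes_per_second = tb_per_month * 1e12 / (days_per_month * SECONDS_PER_DAY)
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


mb_per_s, mbit_per_s = average_rate(9)
print(f"{mb_per_s:.2f} MB/s, {mbit_per_s:.1f} Mbit/s")  # 3.47 MB/s, 27.8 Mbit/s
```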
GLCF Archive

Scalable and Reliable

Participation Possibilities
  • PAWN ingestion component
    • Minimal geospatial metadata support is planned and can be expanded to support an NGDA endpoint
  • GRASP display component
    • Core components are solid; end-user interfaces need additional polishing
  • GLCF data holdings
    • Additional hardware would be required if additional data and access mechanisms (grid, etc.) are needed
  • Other possibilities include grid infrastructure, GSI security, a format registry, etc.