
National Geospatial Digital Archive

Greg Janée

University of California at Santa Barbara

A misadventure in preservation
  • 1976
    • Viking probes go to Mars
    • soil data is analyzed for evidence of life
  • 1999
    • USC neurobiologist Joseph Miller asks for data
    • NASA has data on tape!
  • But...
    • tapes coded “in a format so old that the programmers who knew it had died”

Greg Janée • DCC seminar • 2005-09-27

Paradox of preservation
  • Is the data valuable?
    • yes: had to travel to another planet to get it
  • Is the data being used?
    • no
    • perhaps never again
  • How much am I willing to pay for its preservation?
    • as close to zero as possible

Is it worth preserving?
  • Keith’s equation*:
    • (current value) = (intrinsic value) - (cost to use)
  • Greg’s equation:
    • item is worth preserving for time duration T if:
      • (intrinsic value) * Prob_T(usage) > (preservation costs over T) + (cost to use)

*apologies to Keith Johnson, Stanford libraries
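
As a rough worked illustration of Greg's equation, the check can be sketched as below. The function name `worth_preserving` and every number are invented for this example, and preservation cost is assumed to accrue at a flat yearly rate over T, which the slide does not state.

```python
def worth_preserving(intrinsic_value, prob_usage, yearly_cost, years, cost_to_use):
    """Return True if the expected benefit exceeds the total cost over the period."""
    expected_benefit = intrinsic_value * prob_usage
    # Assumption: preservation cost is a flat yearly rate summed over the period T.
    total_cost = yearly_cost * years + cost_to_use
    return expected_benefit > total_cost


# Hypothetical numbers in arbitrary cost units.
likely_used = worth_preserving(1_000_000, 0.05, 500, 25, 10_000)
rarely_used = worth_preserving(1_000_000, 0.01, 500, 25, 10_000)
print(likely_used, rarely_used)
```

With these made-up inputs, lowering the probability of use from 5% to 1% is enough to flip the decision, which is the pressure the paradox of preservation describes.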

Project genesis
  • NDIIPP
    • Library of Congress, 2000
    • $100M
    • http://www.digitalpreservation.gov/
  • NGDA
    • UCSB (MIL) & Stanford (Branner Library)
    • $2.6M, 3 years
    • geospatial data
    • http://www.ngda.org/

Project goal
  • “How can we preserve geospatial data on a national scale and make it available to future generations?”
  • No focus on a particular collection
  • Geospatial data
    • discrete chunks
    • relatively highly structured, well defined
    • but 90% of our work is generic

Idea #1
  • Archival has to be cheap & easy
    • must be distributed
    • little incentive, no funding
    • not sexy

NGDA approach
  • Compromise: define cheap archive
    • fundamental approach: preservation by co-archival of object semantics
    • ingest: one step up from crawling
    • web access
    • notable for what’s missing: discovery, usability
  • Foundation for additional functionality
    • e.g., migration
    • prototype archives will offer ADL, OAI access

Idea #2
  • Archival systems must be designed with their own demise in mind
    • archival objects will long outlive any system that manages them
    • system-level migrations will occur
    • at inopportune times


Typical repository architecture
  • [Diagram: a system containing handle resolvers, storage, and several databases; labeled "fragile"]


NGDA architecture
  • [Diagram: access (Web, ADL, OAI) and ingest (bulk loader); archival system (databases, caches, etc.); storage subsystem (standard, public data model)]

Post-NGDA architecture

  • [Diagram: only Web access and the storage subsystem (standard, public data model) remain]

Storage system requirements
  • Requirements:
    • associate UUIDs/RIDs with bitstreams
    • retrieve global/local bitstream by UUID/RID
    • determine (parent) UUID of any bitstream
    • list all UUIDs
  • Satisfied by:
    • any filesystem
    • any kind of UUIDs
      • tag:library.ucsb.edu,2005:identifier
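
To make the claim that any filesystem satisfies these requirements concrete, here is a minimal sketch. The class `FilesystemStore`, its method names, and the one-directory-per-UUID layout are assumptions made for illustration, not the NGDA design; it also assumes an RID names a bitstream relative to its parent UUID.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote, unquote


class FilesystemStore:
    """Toy store: one directory per UUID, one file per RID inside it."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _dir(self, uuid):
        # Percent-encode so identifiers such as tag: URIs are safe directory names.
        return self.root / quote(uuid, safe="")

    def put(self, uuid, rid, data):
        """Associate a bitstream with a (UUID, RID) pair."""
        directory = self._dir(uuid)
        directory.mkdir(exist_ok=True)
        path = directory / quote(rid, safe="")
        path.write_bytes(data)
        return path

    def get(self, uuid, rid):
        """Retrieve a bitstream by UUID and RID."""
        return (self._dir(uuid) / quote(rid, safe="")).read_bytes()

    def parent_uuid(self, path):
        """Determine the UUID that owns a stored bitstream."""
        return unquote(Path(path).parent.name)

    def list_uuids(self):
        """List all UUIDs in the store."""
        return sorted(unquote(p.name) for p in self.root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    store = FilesystemStore(tmp)
    uuid = "tag:library.ucsb.edu,2005:example-object"  # hypothetical identifier
    path = store.put(uuid, "manifest.xml", b"<manifest/>")
    print(store.get(uuid, "manifest.xml"))
    print(store.parent_uuid(path))
    print(store.list_uuids())
```

Because the mapping lives entirely in directory and file names, the objects stay readable with ordinary filesystem tools even if this code is lost.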


Archival objects
  • [Diagram labels: manifest, component, UUID, RID]

Archival object representation
  • Components are files
  • Manifest is an XML document
  • Other approaches
    • OAIS: archival information packages (AIPs)
    • XMLtape
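
A manifest of this kind might look roughly like the output of the sketch below. The element and attribute names (`manifest`, `component`, `uuid`, `rid`, `file`) and the file names are hypothetical; the slides do not give the actual NGDA manifest schema.

```python
import xml.etree.ElementTree as ET


def build_manifest(uuid, components):
    """Build an XML manifest listing each component file by its RID."""
    manifest = ET.Element("manifest", {"uuid": uuid})
    for rid, filename in components:
        ET.SubElement(manifest, "component", {"rid": rid, "file": filename})
    return ET.tostring(manifest, encoding="unicode")


xml_text = build_manifest(
    "tag:library.ucsb.edu,2005:example-object",  # hypothetical identifier
    [("data", "scene.tif"), ("metadata", "scene-metadata.xml")],
)
print(xml_text)
```

Keeping components as ordinary files and the manifest as a small XML document means either can be inspected without the archival system that produced them.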

Ingest
  • Ingest template defines
    • common structure of objects to be ingested
    • necessary validations
    • associations to other objects
      • assumes pre-loading of semantic definitions
    • policies, rights, etc.
  • Represents choke point
    • requires human evaluation

Format registry
  • We’re developing one
    • who isn’t?
  • Serves as archive of format specifications
  • How broadly to interpret “format”?
    • traditional file format
    • product
    • series, collection, arbitrary set


Format dependencies
  • Consider dependency graph induced by format specifications
  • Def: a format is recoverable if the format of its specification is recoverable
  • Axioms: plain text, HTML are recoverable
  • [Diagram: dependency graph over GIF, PDF, GeoTIFF, CSS, TIFF, plain text, and HTML; one label reads “desiccated version”]
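
The recursive definition of recoverability can be sketched as a walk over the dependency graph. The edges in `SPEC_FORMATS` are illustrative guesses, not the edges of the slide's diagram, and treating a circular dependency as unrecoverable is an assumption of this sketch.

```python
AXIOMS = {"plain text", "HTML"}

# Hypothetical edges: format -> formats its specification is written in or builds on.
SPEC_FORMATS = {
    "GeoTIFF": ["HTML", "TIFF"],
    "TIFF": ["PDF"],
    "PDF": ["PDF"],  # specification assumed available only in the format itself
    "GIF": ["plain text"],
    "CSS": ["HTML"],
}


def recoverable(fmt, graph=SPEC_FORMATS, axioms=AXIOMS, _visiting=None):
    """A format is recoverable if everything its specification depends on is."""
    if fmt in axioms:
        return True
    visiting = _visiting or set()
    if fmt in visiting or fmt not in graph:
        return False  # circular dependency, or no archived specification
    return all(recoverable(dep, graph, axioms, visiting | {fmt}) for dep in graph[fmt])


for name in ["GIF", "CSS", "PDF", "TIFF", "GeoTIFF"]:
    print(name, recoverable(name))
```

Under these invented edges, a single self-referential specification makes every format downstream of it unrecoverable, which is why the axioms need to be formats that can be read without further help.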

Challenges
  • Making ingest easy, easier, easier-er, ...
  • GIS formats
    • very complex: topology, layer, coverage, project
    • proprietary
  • MODIS
    • multiple petabytes
    • format (HDF) is not well-defined
    • moving to on-demand computation of products
    • lineage important
    • copious additional semantics

Misadventure, redux
  • What if there had been an NGDA-like solution?
    • format specification would have been archived
  • Limitations
    • data not necessarily immediately usable
    • format specification itself not necessarily viewable
  • But limitations can be addressed according to usage, available resources

Questions for you
  • Archival systems
    • definition? functionality?
  • Storage systems
    • definition? functionality?
  • Archival object representation
    • discrete files vs. AIPs?
  • GIS formats
    • “desiccated” form?
