A Platform for Auditable, Distributed, Asymmetric Archival Replication





Micah Altman, Associate Director, Harvard-MIT Data Center, Institute for Quantitative Social Science, Harvard University

Bryan Beecher, Director of Computing and Network Services, Inter-university Consortium for Political and Social Research, University of Michigan

Marc Maynard, Director of Technical Services, The Roper Center for Public Opinion Research, University of Connecticut

Jonathan Crabtree, Assistant Director for Archives and Information Technology, H.W. Odum Institute for Research in Social Science, University of North Carolina

CNI 2008 Fall Task Force Meeting

Our Story
  • Who are you guys?
  • What problem are you trying to solve?
  • What have you done?
  • Why do we care?

Data-PASS
  • Partnership devoted to identifying, acquiring, and preserving data at risk of being lost to the social science research community
  • Partners
    • ICPSR
    • Odum Institute
    • Harvard-MIT Data Center
    • Roper Center
    • National Archives

http://flickr.com/photos/phauly/35555985/

Data-PASS
  • Lots of little files (social science data)
    • ASCII data files
    • PDF technical documentation (codebooks)
    • Millions of ‘em
  • Archival storage
    • Was tape
    • Now disk

Before

After

Archival storage?

http://failblog.org/2008/02/08/floppy-fail/

Archival storage?
  • Remote disks
  • Grids
  • Clouds
  • With partners?

Why roll your own?
  • Policy-driven
  • Auditable
  • Asymmetric
  • Independence of each location

Syndicated Storage Platform (SSP)
  • Start with LOCKSS
  • Lots of Copies Keep Stuff Safe
  • But used in a closed network
    • Private LOCKSS Network (PLN)
    • A few of them out there
      • MetaArchive perhaps the best known
  • Biggest selling point was independence of each node in the PLN

PLNs
  • LOCKSS is really easy to set up
    • PLNs are more difficult
  • Other differences between traditional PLN and our needs
    • Our content isn’t harvestable via HTTP
    • Our PLN nodes are different sizes
    • Our trust model rules out a centralized authority controlling the network

SSP = Stone Soup Platform?
  • ICPSR and Odum set up a small PLN
  • HMDC provided a harvester and designed the schema
  • Odum built the Comparator
  • Roper is building the Invitor

PLN

Schema
  • Nodes
    • IP address
    • Storage commitment
  • AUs
    • Max size
    • # in the PLN
  • Lots more
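
To make the schema slide concrete, here is a minimal Python sketch of the two record types it lists. It is not the actual Data-PASS schema: the class and field names, the `name` and `au_id` labels, the byte units, and the reading of "# in the PLN" as "number of copies wanted in the PLN" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One PLN node (fields from the slide, names hypothetical)."""
    name: str                 # hypothetical label, not from the slides
    ip_address: str
    storage_commitment: int   # storage promised to the network (units assumed: bytes)


@dataclass
class ArchivalUnit:
    """One AU tracked by the schema."""
    au_id: str                # hypothetical identifier
    max_size: int             # upper bound on the AU's size (units assumed: bytes)
    copies_required: int      # assumed meaning of "# in the PLN"


@dataclass
class NetworkSchema:
    nodes: List[Node] = field(default_factory=list)
    aus: List[ArchivalUnit] = field(default_factory=list)

    def total_commitment(self) -> int:
        """Sum of what all nodes have promised."""
        return sum(n.storage_commitment for n in self.nodes)

    def total_demand(self) -> int:
        """Worst-case storage if every AU reaches its max size on every copy."""
        return sum(a.max_size * a.copies_required for a in self.aus)

    def is_feasible(self) -> bool:
        """Coarse check only: enough space in aggregate, ignoring per-node fit."""
        return self.total_demand() <= self.total_commitment()


# Two nodes of different sizes (asymmetric), using documentation-range IPs.
schema = NetworkSchema(
    nodes=[Node("node-a", "192.0.2.10", 500), Node("node-b", "192.0.2.11", 200)],
    aus=[ArchivalUnit("au-1", 100, 2), ArchivalUnit("au-2", 150, 2)],
)
print(schema.total_commitment(), schema.total_demand(), schema.is_feasible())
```

The aggregate check is deliberately coarse: it says nothing about whether each AU actually fits on a specific node, which matters when nodes are of different sizes.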

Comparator
  • diff for our SSP
  • Compares
    • Contents of the LOCKSS Cache Manager [sic]
    • Schema
  • Produces
    • List of differences between “what is” and “what should be”
    • Feeds into another tool for “fixing the PLN”
  • Machine-actionable output (XML)
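
The "what is" versus "what should be" comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Odum Comparator: the input shapes (a per-node set of AU ids standing in for the cache contents, and a required copy count per AU standing in for the schema) and the XML element and attribute names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def compare(actual, required):
    """Return (au_id, have, want) for every AU whose copy count is off.

    actual:   {node_name: set of AU ids the node currently holds}  ("what is")
    required: {au_id: number of copies the schema asks for}        ("what should be")
    """
    have = Counter(au for held in actual.values() for au in held)
    diffs = []
    for au_id in sorted(set(have) | set(required)):
        h, w = have.get(au_id, 0), required.get(au_id, 0)
        if h != w:
            diffs.append((au_id, h, w))
    return diffs


def to_xml(diffs):
    """Serialize the differences as XML (layout is hypothetical)."""
    root = ET.Element("differences")
    for au_id, h, w in diffs:
        ET.SubElement(root, "au", id=au_id, have=str(h), want=str(w))
    return ET.tostring(root, encoding="unicode")


actual = {"node-a": {"au-1", "au-2"}, "node-b": {"au-1"}, "node-c": {"au-3"}}
required = {"au-1": 2, "au-2": 2}

diffs = compare(actual, required)
print(diffs)          # [('au-2', 1, 2), ('au-3', 1, 0)]
print(to_xml(diffs))
```

Here `au-2` is under-replicated and `au-3` is held by a node although the schema does not list it; `au-1` matches and is omitted from the report.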

Invitor
  • Reads the report from the Comparator
  • Issues requests to PLN nodes to ADD or DROP an AU
    • Expectation is that PLN nodes always accept an ADD if they can
      • An offer they cannot refuse
  • Requests may be reviewed/approved by a human administrator (or not)
  • USENET news technology?
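
The Invitor's role, as described in the bullets above, can be sketched as a function that turns a difference report into ADD and DROP requests. Everything concrete here is an assumption: the XML layout, the `holdings` mapping, the alphabetical choice of target nodes, and the `approve` callback that stands in for optional human review. The sketch only plans requests; it sends nothing over the network and ignores node capacity.

```python
import xml.etree.ElementTree as ET


def plan_requests(report_xml, holdings, approve=None):
    """Turn a difference report into (action, node, au_id) requests.

    report_xml: XML with <au id=... have=... want=.../> children (hypothetical layout)
    holdings:   {node_name: set of AU ids the node currently holds}
    approve:    optional callable(request) -> bool, standing in for human review
    """
    requests = []
    for au in ET.fromstring(report_xml).findall("au"):
        au_id = au.get("id")
        delta = int(au.get("want")) - int(au.get("have"))
        holders = sorted(n for n, held in holdings.items() if au_id in held)
        others = sorted(n for n in holdings if n not in holders)
        if delta > 0:
            # Too few copies: invite nodes that do not yet hold the AU.
            targets, action = others[:delta], "ADD"
        else:
            # Too many copies (or none needed): ask current holders to drop it.
            targets, action = holders[:-delta], "DROP"
        requests.extend((action, node, au_id) for node in targets)
    if approve is not None:
        requests = [r for r in requests if approve(r)]
    return requests


report = (
    '<differences>'
    '<au id="au-2" have="1" want="2"/>'
    '<au id="au-3" have="1" want="0"/>'
    '</differences>'
)
holdings = {"node-a": {"au-1", "au-2"}, "node-b": {"au-1"}, "node-c": {"au-3"}}

print(plan_requests(report, holdings))
# [('ADD', 'node-b', 'au-2'), ('DROP', 'node-c', 'au-3')]

# An administrator who approves only ADD requests:
print(plan_requests(report, holdings, approve=lambda r: r[0] == "ADD"))
# [('ADD', 'node-b', 'au-2')]
```

Passing `approve=None` corresponds to the "(or not)" case on the slide, where requests go out without review.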

Summary
  • Data-PASS is a group of archives committed to preserving social science data
  • Exploring various technology options
  • One avenue is a custom LOCKSS deployment
    • Network schema
    • OAI data harvester
    • Comparison tool
    • Network update tool
