Overview of LOCKSS

jontae

Presentation Transcript


  1. Overview of LOCKSS

  2. Session Learning Objectives • Provide an overview of the LOCKSS architecture. • Describe the LOCKSS polling process. • Describe how LOCKSS private networks differ. • Provide a vocabulary of technical terms used frequently with LOCKSS networks.

  3. Architectural Components • Provider Sites (digital collections) • LOCKSS nodes (aka “peers”) • Plugins / Plugin Repository • Cache Manager • Title Database / Conspectus Database

  4. Provider Sites • Prepare a digital collection so that it is web accessible to the preservation nodes • Expose a “manifest” web page for each collection, according to LOCKSS specifications; the manifest page grants permission for LOCKSS to crawl and gives the starting point for the crawl • Provide information sufficient to create a LOCKSS plugin for the collection (or else create the plugin themselves and deposit it with the LOCKSS network)

  5. LOCKSS Peer Nodes • Data caches for harvested content • Caches are organized into archival units (AUs) • Nodes can select which AUs to crawl and preserve • At least 6 copies of an AU must exist for the polling process to work properly

  6. Plugins / Plugin Repository • Tell LOCKSS where, how and how often to crawl a provider site for AUs • Plugins are Java based • Distinct from core LOCKSS software

  7. Cache Manager • Distributed separately from LOCKSS • Can remotely inspect and manage the caches on the various peer nodes

  8. Title / Conspectus Databases • The Title database on each node describes and manages which AUs to preserve on that node • The Conspectus database, designed for the MetaArchive Project, provides more extensive metadata about the preserved digital collections and feeds the Title database with entries

  9. [Diagram: a plugin repository; Digital Collection 1 (DC1) and Digital Collection 2 (DC2), each exposed as a web site with a manifest page and divided into AUs (labels include AU 1, AU 2, AU 3, Source Code, SQL Dump); and private LOCKSS network nodes 1 to 9, labeled with the collections (DC1, DC2) they hold]

  10. The Polling Process

  11. Polling Process resulting in “landslide loss”, AU repair • Node 5 calls a poll on AU 1 of Digital Collection 2 (DC2-AU1) • Node 5 invites some recently encountered peers to vote (each node maintains a reference list of recently encountered peers); those invited are the “inner circle” for this opinion poll • An affirmative PollChallenge response allows an inner circle node to participate in the poll • A Poll Effort Proof (PollProof) is cryptographically derived and sent to the voters that responded affirmatively • Invited nodes create a fresh SHA1 digest of the AU • Node 9 nominates Nodes 7 and 8; Node 5 discovers new peers through this nomination process • Nominated Nodes 7 and 8 belong to the “outer circle” and can be invited to subsequent voting rounds by Node 5 • There is a “landslide” of valid, disagreeing votes against Node 5’s SHA1 digest of DC2-AU1 • Since agreeing votes are below threshold, Node 5 picks a random disagreeing voter from the inner circle and sends it an encrypted RepairRequest message; the repair is made • Once the repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair
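The digest comparison at the heart of the poll can be illustrated with a small sketch. It assumes an AU is just a mapping from URL to content bytes and that a vote "agrees" when two SHA1 digests match; the function name `au_digest`, the example URLs, and the way files are concatenated are all hypothetical and are not taken from the LOCKSS code base, which the slides do not describe at this level.

```python
import hashlib

def au_digest(files: dict[str, bytes]) -> str:
    """Return a SHA1 hex digest over every (url, content) pair of an AU.

    Hypothetical layout: URLs are visited in sorted order so that two
    nodes holding identical content always produce the same digest.
    """
    h = hashlib.sha1()
    for url in sorted(files):
        h.update(url.encode("utf-8"))
        h.update(b"\0")          # separator between URL and content
        h.update(files[url])
    return h.hexdigest()

# Illustrative copies of DC2-AU1 held by the poller and by two voters.
poller_copy = {"http://example.org/dc2/index.html": b"<html>damaged</html>"}
voter_a = {"http://example.org/dc2/index.html": b"<html>intact</html>"}
voter_b = dict(voter_a)

# A voter "agrees" with the poller only when the digests are identical.
agrees_a = au_digest(voter_a) == au_digest(poller_copy)
agrees_ab = au_digest(voter_a) == au_digest(voter_b)
```

Here the two voters agree with each other but not with the poller, which is the situation that leads to a repair in the scenario above.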

  12. Polling Refresh Timer • A peer sets a refresh timer for a given AU to determine the interval between successive polls • System parameter R is the mean for the possible random values generated for the refresh timer
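A minimal sketch of a randomized refresh timer with mean R follows. The slides only say that R is the mean of the possible random values; the uniform range from 0.5·R to 1.5·R used here is an assumption chosen for illustration, and `next_poll_delay` is a hypothetical name.

```python
import random

def next_poll_delay(mean_r: float, rng: random.Random) -> float:
    """Draw the delay until the next poll on an AU.

    Assumption: delays are uniform on [0.5 * R, 1.5 * R], which has
    mean R. The actual distribution is not given in the slides.
    """
    return rng.uniform(0.5 * mean_r, 1.5 * mean_r)

rng = random.Random(42)                   # seeded only to make the demo repeatable
R_DAYS = 90.0                             # hypothetical value for R
delay = next_poll_delay(R_DAYS, rng)      # somewhere between 45 and 135 days
```

Randomizing the interval keeps peers from polling in lockstep, while the mean R still controls how often each AU is audited on average.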

  13. System Parameter – ‘Quorum’ • Q = # of valid inner circle votes required to conclude a poll successfully • Q = 6 is the thoroughly tested value in use • If votes < Q, poller invites additional peers, or else aborts the opinion poll
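The quorum rule can be sketched as a small decision function. The names `quorum_step` and `uninvited_peers` are hypothetical; the only facts taken from the slide are Q = 6 and the choice between inviting more peers and aborting.

```python
Q = 6  # quorum value quoted on the slide

def quorum_step(valid_votes: int, uninvited_peers: int, quorum: int = Q) -> str:
    """Decide what the poller does once the current round of votes is in."""
    if valid_votes >= quorum:
        return "tally"            # enough valid inner circle votes to conclude
    if uninvited_peers > 0:
        return "invite_more"      # below quorum, but more peers can be asked
    return "abort"                # below quorum and nobody left to invite
```

For example, `quorum_step(4, 3)` asks for more voters, while `quorum_step(4, 0)` gives up on this poll.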

  14. Polling Outcome – ‘Landslide Win’ • The poller considers its current copy to have integrity • This is the only scenario in which an opinion poll concludes successfully • The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)

  15. Reference List Update • Happens only after a successful poll • Poller removes the inner circle peers who had valid votes in the last opinion poll • Culls peers it has not been able to contact for some time • Adds outer circle peers whose votes were valid and eventually agreeing
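The three update rules can be sketched as a single function over plain lists and sets of peer identifiers. This is an illustration of the bullets only, with hypothetical names; it does not reproduce any additional rules the real protocol may apply.

```python
def update_reference_list(reference, inner_valid_voters, unreachable, outer_agreeing):
    """Apply the reference list update after a successful poll.

    reference          -- current reference list (ordered peer ids)
    inner_valid_voters -- inner circle peers whose votes were valid
    unreachable        -- peers not contacted for some time
    outer_agreeing     -- outer circle peers with valid, agreeing votes
    """
    # Remove peers that just voted and cull unreachable ones.
    updated = [p for p in reference
               if p not in inner_valid_voters and p not in unreachable]
    # Add newly discovered outer circle peers, avoiding duplicates.
    for peer in outer_agreeing:
        if peer not in updated:
            updated.append(peer)
    return updated

new_list = update_reference_list(
    reference=["n1", "n2", "n3", "n4", "n6", "n9"],
    inner_valid_voters={"n1", "n2", "n4", "n9"},
    unreachable={"n3"},
    outer_agreeing=["n7", "n8"],
)
```

Removing the peers that just voted and adding nominated ones means the next poll on this AU draws on a different set of voters.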

  16. Polling Outcome - Inconclusive • D = maximum allowed number of “minority” votes • If agreeing votes > D and agreeing votes < total valid votes - D, the poll is inconclusive and raises an alarm • Human intervention is needed to determine whether nodes have been compromised • A peer that voted in agreement with a known bad copy is blacklisted if it can’t be identified or won’t cooperate
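Taken together with the landslide outcomes, the tally rule can be sketched as below. Only the inconclusive condition is stated on the slide; treating "at most D agreeing votes" as a landslide loss and "at most D disagreeing votes" as a landslide win is the complementary reading and is an assumption here, as are the function name and the example value D = 1.

```python
def poll_outcome(agreeing: int, total_valid: int, d: int) -> str:
    """Classify a concluded poll from the poller's point of view.

    Assumes total_valid > 2 * d, so the win and loss regions do not overlap.
    """
    if agreeing <= d:
        return "landslide_loss"   # almost everyone disagrees: repair the AU
    if agreeing >= total_valid - d:
        return "landslide_win"    # almost everyone agrees: copy has integrity
    return "inconclusive"         # D < agreeing < total - D: raise an alarm

# With 6 valid votes and a hypothetical D of 1:
outcomes = [poll_outcome(a, 6, 1) for a in range(7)]
```

With these values, 0 or 1 agreeing votes is a loss, 5 or 6 is a win, and anything from 2 to 4 is inconclusive and needs a human to look at it.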

  17. Further Details on Polling Process • Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS). http://www.eecs.harvard.edu/~mema/publications/TOCS2005.pdf • See also LOCKSS related publications at http://www.lockss.org/lockss/Publications

  18. The LOCKSS Private Network Difference • More flexible (not appliance based) • Can run on any operating system that supports Java • The LOCKSS Team maintains RPM packages for Linux installations • Peer node administrators have greater discretion in configuring access and customizing functionality, e.g. altering system parameters

  19. The LOCKSS Private Network Difference (cont.) • Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases • E.g. the MetaArchive Conspectus database

  20. Vocabulary • (Please refer to the workshop binder for terminology and definitions)

  21. Overview of LCAP version 3