UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department. LOCKSS: Lots of Copies Keeps Stuff Safe. CS 739 Distributed Systems. Andrea C. Arpaci-Dusseau. Preserving Peer Replicas By Rate-Limited Sampled Voting, Maniatis, Roussopoulos, Giuli, Rosenthal, Baker, Muliadi (Stanford) -- SOSP’03.
Motivation • Librarians: Responsibility to preserve important materials • Traditional approach: • Acquire lots of copies • Distribute around world • Lend or copy to provide access • Academic publishing is moving to Web • LOCKSS: Real system used by many libraries (1999) • How to apply techniques to digital preservation? • Strength: Real problem that people care about, real solution being used
Design Goals and Assumptions • Must be cheap to build and maintain • No RAID systems • Need not operate quickly • Want to prevent change, not expedite it • Must function properly for decades • No centralized control • Handle failures • Handle malicious attackers • Handle catastrophic random failures • How is this different from other P2P systems?
Design Principles • Cheap storage is unreliable • No long-term secrets • Can’t hold private keys for arbitrary time periods • Use inertia • Rate limit the amount of activity and change • Avoid third-party reputation • Malicious users can lie about good users • Attackers can “cash in” history of good behavior • Reduce predictability • Make it difficult for attackers to predict behavior of victims • Make intrusion detection intrinsic • Part of the system itself • Assume strong adversary • May want to change, suppress, or steal content
LOCKSS Overview • Libraries run persistent web caches • Collect by crawling journal web-sites • Distribute by acting as limited proxy cache • Preserve by cooperating with others to detect and repair damage • Peers vote on large archival units (AU’s) • AU == year’s run of a journal • Each peer holds different AU’s • If AU damaged, call increasingly specific partial polls
Opinion Poll Protocol • Terminology: • Loyal, malign, healthy, damaged peers • Goal: • High probability loyal peers are healthy (despite attacks by malign peers and failures) • Low probability even powerful adversary can damage significant proportion of loyal peers without detection • Overview • Poll initiator calls opinion polls on AU at a rate >> rate of random damage • Invites small subset of known peers (poll participants or voters) • Voter computes and returns digest of AU • Vote results for poll initiator: • Landslide win: Votes overwhelmingly agree with own version • Landslide loss: Repair AU by fetching copy of AU from peer • Inconclusive poll: Raise alarm for human attention • Who can benefit from the poll? What if voter disagrees?
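The "voter computes and returns digest of AU" step can be sketched as follows. This is a minimal illustration, not the LOCKSS implementation: the choice of SHA-256, the `nonce` parameter, and the function names `au_digest` and `votes_agree` are assumptions made for the example.

```python
import hashlib


def au_digest(nonce: bytes, au_content: bytes) -> str:
    """Return a hex digest of an archival unit (AU).

    Assumption: a per-poll nonce is hashed ahead of the content so that a
    digest from an earlier poll cannot simply be replayed. SHA-256 is an
    arbitrary choice for this sketch.
    """
    h = hashlib.sha256()
    h.update(nonce)
    h.update(au_content)
    return h.hexdigest()


def votes_agree(own_digest: str, voter_digest: str) -> bool:
    """A vote agrees with the initiator when both digests are identical."""
    return own_digest == voter_digest
```

An initiator would compute `au_digest` over its own replica with the same nonce and compare it with each returned vote; even a one-byte difference in the AU produces a disagreeing vote.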
Peer Lists per AU • Lists for every AU • Friends list: Peers have outside relationships with friends • Reference list: Peers encountered recently • Bootstrap: Init with friends list • Inner circle: Those invited to influence poll results • Outer circle: Nominated by inner circle
Poll Initiation • Poll initiation: (about every 3 months per AU) • Choose N random peers from ref list: Inner circle • Send Poll [Poll ID, Diffie-Hellman Public Key] • Wait for responses... • Voter from inner circle: Decide if want to participate • Why might a peer not participate? • Pick new DH public key, compute symmetric session key • How does Diffie-Hellman work? • A chooses secret a, sends g^a mod p • B chooses secret b, sends g^b mod p • Each computes secret (g^b mod p)^a mod p = (g^a mod p)^b mod p • Why encrypt messages? • Send back encrypted YES or NO to participate • Send PollChallenge [Poll ID, DH public key, {challenge, YES}]
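The Diffie-Hellman exchange described on this slide can be tried out with a few lines of Python. This is a toy sketch: the modulus `P`, generator `G`, and the step that hashes the shared secret into a session key are assumptions for illustration and are not suitable for real use.

```python
import hashlib
import secrets

# Toy parameters (assumption): a 127-bit Mersenne prime and a small base.
# Real deployments use much larger, standardized groups.
P = 2**127 - 1
G = 3


def dh_keypair() -> tuple[int, int]:
    """Pick a random secret x and return (x, g^x mod p)."""
    secret = secrets.randbelow(P - 3) + 2  # secret in [2, P-2]
    return secret, pow(G, secret, P)


def dh_session_key(own_secret: int, peer_public: int) -> bytes:
    """Compute (peer_public)^own_secret mod p and hash it into a key."""
    shared = pow(peer_public, own_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Initiator (A) and voter (B) each publish only g^a mod p and g^b mod p.
a, pub_a = dh_keypair()
b, pub_b = dh_keypair()
key_a = dh_session_key(a, pub_b)  # (g^b)^a mod p
key_b = dh_session_key(b, pub_a)  # (g^a)^b mod p
```

Both sides arrive at the same `key_a == key_b` without ever sending `a` or `b`, which is what lets a voter encrypt its YES or NO using fresh, short-lived keys rather than long-term secrets.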
Poll Effort • Initiator: Produce computational effort for voter • Why proof of computation by initiator needed? • Use memory-bound functions (MBF) with poll ID and challenge as input • Why are MBF good? • Send back PollProof [Poll ID, poll effort proof] • Even send this to voters who responded NO. Why? • Voter: Verifies result • Less computation needed to verify result than compute • Nominate outer circle peers (more later) • Randomly selected from reference list • Send Vote messages for AU • Also send proof of computational effort in rounds • Why proof of computation by voter needed? Why in rounds?
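The asymmetry "less computation needed to verify result than compute" can be illustrated with a simple hash puzzle. Note the substitution: this sketch uses a CPU-bound, hashcash-style puzzle, not a memory-bound function as LOCKSS does, and the names `prove_effort`, `verify_effort`, and `difficulty_bits` are hypothetical.

```python
import hashlib
from itertools import count


def _puzzle_value(poll_id: bytes, challenge: bytes, nonce: int) -> int:
    """Hash the poll ID, challenge, and candidate nonce into an integer."""
    data = poll_id + challenge + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def prove_effort(poll_id: bytes, challenge: bytes, difficulty_bits: int = 12) -> int:
    """Search for the first nonce whose hash has `difficulty_bits` leading zero bits.

    Expected cost is about 2**difficulty_bits hash evaluations.
    """
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _puzzle_value(poll_id, challenge, nonce) < target:
            return nonce


def verify_effort(poll_id: bytes, challenge: bytes, nonce: int,
                  difficulty_bits: int = 12) -> bool:
    """Check a proof with a single hash evaluation."""
    return _puzzle_value(poll_id, challenge, nonce) < (1 << (256 - difficulty_bits))
```

Binding the proof to the poll ID and the voter's challenge means the work cannot be precomputed or reused across polls. A memory-bound function aims for the same property while making the cost depend on memory latency, which varies less across machines than CPU speed.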
Vote Tabulation • Initiator: Tabulates valid votes from inner circle • Three cases: • Landslide loss: Agreeing votes <= D • Repair AU • Landslide win: Agreeing votes > V-D • Opinion poll concludes successfully; reschedule poll • Inconclusive: Raise alarm • Repair • Initiator picks disagreeing voter and requests repair • When is voter willing to supply content? • Retabulate results with new content
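The three tabulation cases map directly onto a small function. In this sketch, V is taken to be the number of valid inner-circle votes and D the threshold from the slide; the names `Outcome` and `tabulate`, and the choice to test the landslide-loss condition first, are assumptions.

```python
from enum import Enum


class Outcome(Enum):
    LANDSLIDE_WIN = "landslide win"    # conclude poll, reschedule
    LANDSLIDE_LOSS = "landslide loss"  # repair the AU
    INCONCLUSIVE = "inconclusive"      # raise alarm for a human


def tabulate(own_digest: str, valid_votes: list[str], d: int) -> Outcome:
    """Classify a poll from the initiator's point of view.

    valid_votes holds the digests returned by inner-circle voters (V of them).
    """
    v = len(valid_votes)
    agreeing = sum(1 for digest in valid_votes if digest == own_digest)
    if agreeing <= d:
        return Outcome.LANDSLIDE_LOSS
    if agreeing > v - d:
        return Outcome.LANDSLIDE_WIN
    return Outcome.INCONCLUSIVE
```

For example, with V = 10 and D = 2, nine agreeing votes give a landslide win, two give a landslide loss, and five leave the poll inconclusive. After a repair, the initiator would recompute its own digest and call `tabulate` again on the same votes.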
Outer Circle • What is the purpose of the outer circle? • Initiator: Picks same number from every nominator • Repeat same steps of protocol with outer circle • Why? • Differences? • Update reference list • What is a malign peer trying to do? • Who is removed? • Insert: Valid/agreeing outer circle peers and random friends • Why?
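The rule "picks same number from every nominator" limits how much any single inner-circle voter can shape the outer circle. A minimal sketch, assuming nominations arrive as a mapping from nominator to nominee list; `pick_outer_circle` and its parameters are hypothetical names.

```python
import random


def pick_outer_circle(nominations: dict[str, list[str]],
                      per_nominator: int,
                      exclude: set[str],
                      rng: random.Random) -> list[str]:
    """Choose up to `per_nominator` peers from each nominator's list.

    Peers in `exclude` (for example, the inner circle) and peers already
    chosen via another nominator are skipped, so a nominator with few
    eligible nominees may contribute fewer picks.
    """
    chosen: list[str] = []
    seen = set(exclude)
    for nominees in nominations.values():
        # Drop duplicates and ineligible peers while keeping list order.
        candidates = [p for p in dict.fromkeys(nominees) if p not in seen]
        picks = rng.sample(candidates, min(per_nominator, len(candidates)))
        chosen.extend(picks)
        seen.update(picks)
    return chosen
```

Because every nominator contributes the same quota, a malign voter that nominates a long list of malign peers gains no more outer-circle slots than a loyal voter does.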
Adversary Attacks • Assume powerful adversary • Total information awareness • Perfect work balancing • Perfect digital preservation • Local eavesdropping • Local spoofing • Stealth • Unconstrained identities • Exploitation of Common peer vulnerabilities • Complete parameter knowledge
Adversary Attacks • Stealth modification • Convince loyal peer that its AU is damaged • Replace protected content with bad version • Focus of paper • Nuisance • Raise alarms • Attrition • Make loyal peers waste computational resources so they can’t repair damage • Theft • Acquire published content from peers without fee • How does LOCKSS prevent? • Free-loading • Obtain services without supplying to others
Stealth Modification Attack • 1) Lurk phase • Increase foothold: malign peers in reference list (inner circle) • Wait until invited into circle • Act loyal • Nominate more malign peers • 2) Attack phase • When see poll is vulnerable (i.e., overwhelming majority of inner circle is malign), vote bad • Why is attacking successfully hard? • Rate limiting: Must wait for vulnerable polls to occur • Damaged loyal peers call and vote in polls using bad copy • Can be repaired or raise alarms (doesn’t act differently when don’t have majority) • Must expend effort calling polls too • Loyal peer only requests repair if voted in malign peer’s polls
Simulations • Environment • 1000 peers • Clusters of 30 peers; 80% for friends, 20% random • Call polls every 3 months on average • N (size of inner circle): 20, Q: 10 • How many false alarms with no adversaries? • 20 years, random damage at every peer: 5-10 years
Simulation: Lurking Time • How long must lurk for desired foothold ratio? • 10% malign; how many years for 40% ratio? 50%? • 30% malign; how many years for 50% ratio? 70%?
Simulation: Alarm Time • How long before attack detected (i.e., inconclusive poll alarm raised)?
Simulation: Damage to AU • How many bad replicas? How many years? • When is irrecoverable damage caused?
Simulation: Worst-case • How long should adversary lurk before attack?
Simulation: Benefit of Churn • What churn rates are best?
Conclusions • Interesting motivation • Real problem and deployed solution • Opinion Poll Protocol has many attractive properties • Uses problem domain to guide protocol • Inertia: Adversaries can’t influence poll timing • Friend list: Use outside relationships to influence trust • Attacking is very costly • Must lurk long period to increase foothold in inner circle • Must continually pay through proofs of computation (MBF) • Immediately removed from lists if disagree • Easy to set off alarms • If voting results are inconclusive, human notified