Systems Research

Barbara Liskov

October 2007


Replication

  • Goal: provide reliability and availability by storing information at several nodes


Single Server

[Diagram: clients connected to a single server]


Single Server

[Diagram: clients and a single server, with an X marking a failure]


Replicated Servers

[Diagram: clients and several replicated servers, with an X marking a failure]


Replication Issues

  • Semantics

  • What is being replicated

  • Failure assumptions


Issue 1: Semantics

One-copy consistency: the replicated system behaves as if there were a single copy

Or weaker

[Diagram: clients and servers]


Issue 2: Type of Operations

Only reads and writes

General operations, for example:

acct.deposit($$);

acct.withdraw($$$);


Replication protocols

  • Data replication

    • Quorums and voting

  • Operations

    • State machine replication

    • System performs a sequence of operations


Issue 3: Failure Assumptions

  • Network is asynchronous

    • Eventual delivery

  • Network is malicious

    • Corruption

    • Replay

    • Spoofing

    • Handled via cryptography

  • Nodes are failstop or Byzantine
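
As one illustration of "handled via cryptography", the sketch below authenticates each message with an HMAC and rejects stale sequence numbers. The shared key, the sender|seq|payload framing, and the names seal and Receiver are assumptions made for this example; the talk does not specify a mechanism.

```python
import hashlib
import hmac


def seal(key, sender, seq, payload):
    """Frame a message as sender|seq|payload and compute its MAC (hypothetical format)."""
    body = f"{sender}|{seq}|".encode() + payload
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body, tag


class Receiver:
    def __init__(self, key):
        self.key = key
        self.last_seq = {}   # sender -> highest sequence number accepted so far

    def accept(self, body, tag):
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False     # corrupted, or sent by someone without the key (spoofing)
        sender, seq, _payload = body.split(b"|", 2)
        seq = int(seq)
        if seq <= self.last_seq.get(sender, 0):
            return False     # replay of an old message
        self.last_seq[sender] = seq
        return True


key = b"shared-secret"       # assumed to be shared by sender and receiver
rx = Receiver(key)
body, tag = seal(key, "client1", 1, b"write A")
print(rx.accept(body, tag))          # True: fresh and authentic
print(rx.accept(body, tag))          # False: replay
print(rx.accept(body + b"!", tag))   # False: corrupted
```

This covers only the network threats on the slide. It does nothing about a node that holds the key and misbehaves, which is the Byzantine case.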


Failstop Failures

  • Nodes fail by crashing

    • A machine is either working correctly or it is doing nothing!

  • The assumption made in the 1980s


Failstop Failures

  • Requires 2f+1 replicas to tolerate f failures

    • Operations must intersect in at least one replica

    • In general we want availability for both reads and writes: a quorum of f+1 nodes is sufficient for each

    • Read and write quorums
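
The intersection claim can be checked by brute force. The sketch below enumerates every quorum of a given size among 2f+1 replicas and tests that each pair shares a replica. The helper names are made up for this example.

```python
from itertools import combinations


def quorums(n, size):
    """All subsets of `size` replicas out of replicas 0..n-1."""
    return [set(q) for q in combinations(range(n), size)]


def all_intersect(n, size):
    """True if every pair of quorums of the given size shares a replica."""
    qs = quorums(n, size)
    return all(a & b for a in qs for b in qs)


f = 2
n = 2 * f + 1                     # 5 replicas
print(all_intersect(n, f + 1))    # True: any two quorums of 3 out of 5 overlap
print(all_intersect(n, f))        # False: two quorums of 2 out of 5 can be disjoint
```

With f+1 out of 2f+1, a quorum is still reachable when f replicas have crashed, and a read quorum always contains at least one replica that saw the latest write.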


Quorums

[Diagram: clients send "write A" to three servers, each with its own state; an X marks a failure]


Quorums

[Diagram: the three servers' states are shown as A, A, and X]


Quorums

[Diagram: server states A, A, and X; clients send "write B" to the three servers; X marks failures]


Data Replication

  • R.H. Thomas, A majority consensus approach to concurrency control for multiple copy databases, ACM TODS, 1979

  • D.K. Gifford, Weighted voting for replicated data, SOSP 1979

  • H. Attiya, A. Bar-Noy, and D. Dolev, Sharing memory robustly in message-passing systems, JACM, Jan. 1995


Quorum Consensus

  • Each data item has a version number

    • A sequence of values

  • write(d, val, v#)

    • Waits for f+1 oks

  • read(d) returns (val, v#)

    • Waits for f+1 matching v#’s

    • Else does a write-back
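
A minimal single-client sketch of these read and write rules follows. It assumes the caller supplies increasing version numbers and that replicas fail only by being unreachable. The class and function names are invented for the example and do not come from the cited papers.

```python
class Replica:
    """Stores one (value, version) pair per data item."""

    def __init__(self):
        self.store = {}
        self.up = True

    def write(self, d, val, v):
        # Keep only the newest version; acknowledge in every case.
        if v > self.store.get(d, (None, 0))[1]:
            self.store[d] = (val, v)
        return "ok"

    def read(self, d):
        return self.store.get(d, (None, 0))


def quorum_write(replicas, f, d, val, v):
    """Send the write to every reachable replica; succeed once f+1 say ok."""
    oks = sum(1 for r in replicas if r.up and r.write(d, val, v) == "ok")
    return oks >= f + 1


def quorum_read(replicas, f, d):
    """Collect f+1 replies; if their versions differ, write back the newest."""
    replies = [r.read(d) for r in replicas if r.up][: f + 1]
    if len(replies) < f + 1:
        raise RuntimeError("no quorum available")
    val, v = max(replies, key=lambda reply: reply[1])
    if any(reply[1] != v for reply in replies):
        quorum_write(replicas, f, d, val, v)   # the write-back step
    return val, v


f = 1
replicas = [Replica() for _ in range(2 * f + 1)]
replicas[2].up = False                          # one replica misses the write
print(quorum_write(replicas, f, "x", "A", 1))   # True
replicas[2].up, replicas[0].up = True, False    # a different replica is now down
print(quorum_read(replicas, f, "x"))            # ('A', 1)
print(replicas[2].read("x"))                    # ('A', 1) after the write-back
```

The write-back brings a lagging replica up to date, so a later read that lands on a different quorum cannot return an older value.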


State Machine Replication

Replicas must execute operations in the same order

This implies replicas will have the same state, assuming

replicas start in the same state

operations are deterministic
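
The sketch below illustrates why order matters, reusing the deposit and withdraw operations from the earlier slide. The Account class and its rule of refusing overdrafts are assumptions chosen to make the operations order-sensitive.

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        # Deterministic rule: refuse a withdrawal that would overdraw.
        if amount <= self.balance:
            self.balance -= amount


def run(log, start=0):
    """Apply a log of (operation, amount) pairs to a fresh replica."""
    acct = Account(start)
    for op, amount in log:
        getattr(acct, op)(amount)
    return acct.balance


log = [("deposit", 50), ("withdraw", 80), ("deposit", 50)]
print([run(log) for _ in range(3)])   # [100, 100, 100]: same order, same state

reordered = [("deposit", 50), ("deposit", 50), ("withdraw", 80)]
print(run(reordered))                 # 20: a different order diverges
```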


Failstop Replication

  • B. Oki and B. Liskov, Viewstamped replication: a new primary copy method to support highly available distributed systems, PODC 1988

  • Thesis, May 1988

  • S. Ghemawat et al., Replication in the Harp file system, SOSP 1991

  • L. Lamport, The part-time parliament, TOCS 1998

  • L. Lamport, Paxos made simple, Nov. 2001


Approach

Use a primary

It orders the operations

Other replicas obey this order


Views

System moves through a sequence of views

Primary runs the protocol

Replicas watch the primary and do a view change if it fails


Normal Case

Client sends request to primary

Primary sends prepare message to all

Replicas receive prepare

Send prepare-ok message to the primary

Primary waits for f prepare-oks (together with the primary, f+1 of the 2f+1 replicas then know the operation)

Sends response to client
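
The message flow above can be sketched as a toy, single-process simulation. It leaves out view changes, the commit phase, and real networking. The class names, the message tuples, and the numbering of operations from 1 are assumptions for illustration, not the published protocol's data structures.

```python
class Backup:
    def __init__(self):
        self.log = []
        self.up = True

    def on_prepare(self, view, op_num, op):
        # Accept operations only in the order the primary assigned.
        if op_num == len(self.log) + 1:
            self.log.append(op)
            return ("prepare-ok", view, op_num)
        return None


class Primary:
    def __init__(self, backups, f, view=0):
        self.backups, self.f, self.view = backups, f, view
        self.log = []

    def handle_request(self, op):
        self.log.append(op)                  # the primary picks the order
        op_num = len(self.log)
        replies = [b.on_prepare(self.view, op_num, op)
                   for b in self.backups if b.up]
        oks = [r for r in replies if r is not None]
        if len(oks) < self.f:
            raise RuntimeError("fewer than f prepare-oks; cannot reply")
        return ("reply", op_num, op)


f = 1
backups = [Backup(), Backup()]               # 2f backups plus the primary = 2f+1
backups[1].up = False                        # one crashed backup is tolerated
primary = Primary(backups, f)
print(primary.handle_request("write A"))     # ('reply', 1, 'write A')
print(backups[0].log)                        # ['write A']
```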


Normal Case

  • A 2-phase protocol:

    • Prepare; commit

  • Only 3 message delays


Byzantine Failures

  • Nodes fail arbitrarily

    • They lie, they collude

  • Causes

    • Malicious attacks

    • Non-deterministic software errors


Quorums

3f+1 replicas are needed to survive f failures

A quorum is 2f+1 replicas

Ensures that any two quorums intersect in at least one non-faulty replica

The minimum in an asynchronous network
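
The arithmetic behind these numbers can be worked through in a few lines. The sketch assumes a client can wait for at most n - f replies, because f replicas may never answer. The function name is made up for this example.

```python
def tolerates(n, f):
    """Can n replicas survive f Byzantine failures using quorums of size n - f?"""
    q = n - f                  # most replies one can wait for if f replicas stay silent
    overlap = 2 * q - n        # smallest possible intersection of two such quorums
    return overlap >= f + 1    # at least one non-faulty replica in common


for f in (1, 2, 3):
    print(f, tolerates(3 * f + 1, f), tolerates(3 * f, f))
# 1 True False
# 2 True False
# 3 True False
```

With n = 3f+1 the quorum size n - f is 2f+1 and two quorums overlap in at least f+1 replicas. With only 3f replicas the overlap can be as small as f, and all of those replicas could be faulty.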


Quorums

[Diagram: clients send "write A" to four servers; three servers hold state A; an X marks a failure]


Quorums

[Diagram: clients send "write B" to four servers; server states are labeled A and B; an X marks a failure]


BFT

  • M. Castro and B. Liskov, Practical Byzantine fault tolerance and proactive recovery, ACM TOCS, 2002


Strategy

Primary runs the protocol in the normal case

Replicas watch the primary and do a view change if it fails

Key difference: replicas might lie

Solution: add a pre-prepare phase


Normal Case

Client sends request to primary

Primary sends pre-prepare message to all

Why not a prepare message?

Because the primary might be malicious


Normal Case

Client sends request to primary

Primary sends pre-prepare message to all

Replicas check the pre-prepare and if it is ok:

Send prepare messages to all


Normal Case

Replicas wait for 2f+1 matching prepares

Send commit message to all

Replicas wait for 2f+1 matching commits

Execute operation and send result to client
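
The three phases can be sketched as a toy simulation that follows the counts on the slides. It is heavily simplified: there are no signatures, no view changes, and no checkpoints, faulty replicas are modeled only as silent ones, and the primary sends a prepare like every other replica. All names here are assumptions for illustration, not the structures used in the published protocol.

```python
from collections import defaultdict


class Replica:
    def __init__(self, rid, f):
        self.rid, self.f = rid, f
        self.accepted = {}                 # seq -> request named in the pre-prepare
        self.prepares = defaultdict(set)   # (seq, request) -> ids that sent prepare
        self.commits = defaultdict(set)    # (seq, request) -> ids that sent commit
        self.sent_commit = set()
        self.executed = []

    def on_pre_prepare(self, seq, request):
        # "Check the pre-prepare": accept only one request per sequence number.
        if self.accepted.setdefault(seq, request) != request:
            return []
        return [("prepare", seq, request, self.rid)]

    def on_prepare(self, seq, request, sender):
        key = (seq, request)
        self.prepares[key].add(sender)
        if len(self.prepares[key]) >= 2 * self.f + 1 and key not in self.sent_commit:
            self.sent_commit.add(key)
            return [("commit", seq, request, self.rid)]
        return []

    def on_commit(self, seq, request, sender):
        key = (seq, request)
        self.commits[key].add(sender)
        if len(self.commits[key]) >= 2 * self.f + 1 and request not in self.executed:
            self.executed.append(request)  # result would be sent to the client here
        return []


def run(f, request, silent=()):
    """Deliver every message to every live replica until none remain."""
    replicas = [Replica(i, f) for i in range(3 * f + 1)]
    live = [r for r in replicas if r.rid not in silent]
    queue = []
    for r in live:                         # pre-prepare from the primary reaches all
        queue += r.on_pre_prepare(1, request)
    while queue:
        kind, seq, req, sender = queue.pop(0)
        for r in live:
            handler = r.on_prepare if kind == "prepare" else r.on_commit
            queue += handler(seq, req, sender)
    return [r.executed for r in replicas]


print(run(1, "op", silent={3}))      # [['op'], ['op'], ['op'], []]
print(run(1, "op", silent={2, 3}))   # [[], [], [], []]: more than f failures, no progress
```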


Follow-on Work

  • R. Rodrigues et al., BASE: using abstraction to improve fault tolerance, SOSP 2001

  • R. Kotla and M. Dahlin, High throughput Byzantine fault tolerance, DSN 2004

  • J. Li and D. Mazieres, Beyond one-third faulty replicas in Byzantine fault tolerant systems, NSDI 07

  • Abd-El-Malek et al., Fault-scalable Byzantine fault-tolerant services, SOSP 05

  • J. Cowling et al., HQ replication: a hybrid quorum protocol for Byzantine fault tolerance, OSDI 06


Papers in SOSP 07

  • Monday 1:30-3:30

    • Zyzzyva: Speculative Byzantine fault tolerance

    • Tolerating Byzantine faults in database systems using commit barrier scheduling

    • Low-overhead Byzantine fault-tolerant storage

    • Attested append-only memory: making adversaries stick to their word

  • Tuesday 11:00-12:00

    • PeerReview: practical accountability for distributed systems

