
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing

This lecture covers the results of the midterm exam, group communication systems, membership protocols, agreed and safe delivery, and checkpointing and recovery in secure and dependable computing.

pmckinzie

Presentation Transcript


  1. EEC 693/793 Special Topics in Electrical Engineering
     Secure and Dependable Computing
     Lecture 14
     Wenbing Zhao
     Department of Electrical and Computer Engineering
     Cleveland State University
     wenbing@ieee.org

  2. Outline
     • Midterm #2 result
     • Group communication systems
     • Membership protocols
     • Agreed and safe delivery
     • Checkpointing and recovery
     • Reference:
       • Reliable Distributed Systems, by K. P. Birman, Springer; Chapters 14-16
     EEC693: Secure & Dependable Computing

  3. Midterm #2 Result
     • High 98, low 79, mean 92.7
     • Average score per question: Q1 18.9, Q2 17.6, Q3 18.3, Q4 19.1, Q5 18.9

  4. Unreliable Failure Detection
     • Recall that failures are hard to distinguish from network delay
     • So we accept the risk of a mistake
     • If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q
     • Avoids “messages from the dead”
     • q must rejoin to participate in GMS again
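The idea on this slide can be sketched with a heartbeat timeout. This is only an illustration under stated assumptions: the lecture does not say how suspicion is raised, so the heartbeat scheme, the class name `HeartbeatDetector`, its methods, and the timeout value are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an unreliable failure detector driven by heartbeat timeouts.
final class HeartbeatDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();
    private final Set<String> excluded = new HashSet<>();

    HeartbeatDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat. Traffic from an excluded process is dropped
    // ("messages from the dead") until that process rejoins.
    boolean onHeartbeat(String process, long nowMillis) {
        if (excluded.contains(process)) {
            return false;
        }
        lastHeard.put(process, nowMillis);
        return true;
    }

    // Suspect every process that has been silent for longer than the timeout.
    // A slow but live process can be suspected by mistake; that risk is accepted.
    Set<String> suspects(long nowMillis) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> entry : lastHeard.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Cut the channel to a process that is being excluded from the view.
    void exclude(String process) {
        excluded.add(process);
        lastHeard.remove(process);
    }

    // An excluded process must rejoin before it is heard again.
    void rejoin(String process, long nowMillis) {
        excluded.remove(process);
        lastHeard.put(process, nowMillis);
    }
}
```

With a 100 ms timeout, a process last heard at time 0 is suspected at time 120 even if it is merely slow, which is exactly the mistake the slide says we accept.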

  5. Basic GMP
     • Someone reports that “q has failed”
     • Leader (process p) runs a 2-phase commit protocol
       • Announces a “proposed new GMS view”
       • Excludes q, or might add some members who are joining, or could do both at once
       • Waits until a majority of members of the current view have voted “ok”
       • Then commits the change

  6. GMP Example
     • Proposes new view: {p,r} [-q]
     • Needs majority consent: p itself, plus one more (“current” view had 3 members)
     • Can add members at the same time
     [Figure: message timeline for processes p, q, r. Labels: V0 = {p,q,r}; Proposed V1 = {p,r}; OK; Commit V1; V1 = {p,r}]
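The leader's decision in the two-phase view change can be sketched as follows. This is a minimal illustration, not the protocol's actual implementation: the class `ViewChange`, its method names, and the use of string process names are assumptions made for the example, and message passing is left out entirely.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the leader's side of a two-phase view change.
final class ViewChange {
    // A proposal may commit only when a majority of the CURRENT view voted OK.
    static boolean hasMajority(Set<String> currentView, Set<String> okVotes) {
        int votes = 0;
        for (String voter : okVotes) {
            if (currentView.contains(voter)) {
                votes++; // votes from non-members are ignored
            }
        }
        return votes > currentView.size() / 2;
    }

    // Returns the committed new view, or the unchanged current view when the
    // leader has not yet heard from a majority and must keep waiting.
    static Set<String> tryCommit(Set<String> currentView, Set<String> leaving,
                                 Set<String> joining, Set<String> okVotes) {
        if (!hasMajority(currentView, okVotes)) {
            return currentView;
        }
        Set<String> next = new LinkedHashSet<>(currentView);
        next.removeAll(leaving);  // exclude failed members
        next.addAll(joining);     // admit joiners in the same round
        return next;
    }
}
```

For the example on this slide, calling `tryCommit` with current view {p,q,r}, leaving {q}, no joiners, and OK votes from p and r yields {p,r}. With a vote from p alone (1 of 3) the view stays unchanged.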

  7. Special Concerns?
     • What if someone doesn’t respond?
     • p can tolerate failures of a minority of members of the current view
     • The first round of a new view change “overlaps” the commit of the previous one:
       • “Commit that q has left. Propose add s and drop r”
     • p must wait if it can’t contact a majority
     • Avoids risk of partitioning

  8. What If Leader Fails?
     • Here we do a 3-phase protocol
     • New leader identifies itself based on age ranking (oldest surviving process)
     • It runs an inquiry phase
       • “The adored leader has died. Did he say anything to you before passing away?”
       • Note that this causes participants to cut connections to the adored previous leader
     • Then it runs the normal 2-phase protocol but “terminates” any interrupted view changes the old leader had initiated

  9. GMP Example
     • New leader first sends an inquiry
     • Then proposes new view: {r,s} [-p]
     • Needs majority consent: q itself, plus one more (“current” view had 3 members)
     • Again, can add members at the same time
     [Figure: message timeline for processes p, q, r. Labels: V0 = {p,q,r}; Inquire [-p]; OK: nothing was pending; Proposed V1 = {r,s}; OK; Commit V1; V1 = {r,s}]
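The leader takeover described on the last two slides can be sketched in two small steps: pick the oldest surviving member, then use the inquiry replies to find view changes the old leader left unfinished. The class `LeaderTakeover`, its method names, and the representation of a pending proposal as a string are assumptions made for illustration; the lecture does not specify data structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of what a new leader does after the old leader fails.
final class LeaderTakeover {
    // viewByAge lists the current view oldest first; the new leader is the
    // oldest member that is not suspected to have failed.
    static Optional<String> nextLeader(List<String> viewByAge, Set<String> failed) {
        for (String member : viewByAge) {
            if (!failed.contains(member)) {
                return Optional.of(member);
            }
        }
        return Optional.empty();
    }

    // Inquiry phase: each reply carries the view change (if any) that the old
    // leader had proposed to that member. The new leader collects the distinct
    // pending proposals so it can terminate them before its own 2-phase round.
    static List<String> pendingProposals(Map<String, Optional<String>> replies) {
        List<String> pending = new ArrayList<>();
        for (Optional<String> reply : replies.values()) {
            if (reply.isPresent() && !pending.contains(reply.get())) {
                pending.add(reply.get());
            }
        }
        return pending;
    }
}
```

With the view ordered by age as [p, q, r] and p failed, `nextLeader` selects q, matching the example slide. An inquiry reply of `Optional.empty()` from r corresponds to "OK: nothing was pending".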

  10. Safe and Agreed Delivery
     • For totally ordered reliable multicast, there are two delivery policies
     • Safe delivery: a message is delivered only when all correct processes have received it
     • Agreed delivery: a message is delivered as long as it is the next message in total order

  11. Safe and Agreed Delivery
     • Safe delivery guarantees the uniformity of multicast:
       • If a message is delivered to any process, it is delivered to all correct processes
     • Agreed delivery does not:
       • It is possible that a message is delivered at one or more processes but is not delivered at some correct process
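The difference between the two policies comes down to the condition checked before a message is handed to the application. The sketch below is illustrative only: the class `DeliveryPolicy`, the sequence-number representation of the total order, and the use of per-member acknowledgments to learn who has received a message are assumptions, not details from the lecture. It also approximates "all correct processes" by the members of the current view.

```java
import java.util.Set;

// Hypothetical sketch contrasting the two delivery conditions.
final class DeliveryPolicy {
    // Agreed delivery: deliver as soon as the message is next in the total order.
    static boolean canDeliverAgreed(long seq, long nextExpectedSeq) {
        return seq == nextExpectedSeq;
    }

    // Safe delivery: the message must be next in the total order AND every
    // member of the current view must be known to have received it.
    static boolean canDeliverSafe(long seq, long nextExpectedSeq,
                                  Set<String> view, Set<String> receivedBy) {
        return seq == nextExpectedSeq && receivedBy.containsAll(view);
    }
}
```

If message 7 is next in order but only p and q of view {p,q,r} have acknowledged it, agreed delivery proceeds while safe delivery waits. A process that delivers under the agreed policy and then crashes may therefore have delivered a message that r never delivers.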

  12. Checkpointing
     • Checkpointing: the act of taking a snapshot of an entity so that we can restore it later
     • A replica is a process running in an operating system. The state of a process includes:
       • The process's memory, stack and registers
       • Threads
       • Open or mmap'ed files
       • Current working directory
       • Interprocess communication:
         • Semaphores, shared memory, pipes, sockets
       • Dynamically loaded libraries
       • …

  13. Checkpointing
     • Many tools are available to perform checkpointing transparently or semi-transparently
       • http://www.checkpointing.org/
       • Condor, libckpt, etc.
     • Checkpoints taken this way are generally not portable
     • Checkpoint size might be big

  14. Checkpointing of Application State
     • Sometimes it is more efficient to save and store the application state only
     • Checkpoints can be very portable and compact in size

         class Counter {
             int counter;

             Counter(int initVal) { counter = initVal; }

             void increment() { counter++; }
             void decrement() { counter--; }

             // Restore the application state from a checkpoint.
             void setState(int c) { counter = c; }

             // Take a checkpoint of the application state.
             int getState() { return counter; }
         }
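A short usage sketch of application-level checkpointing with the `Counter` class from this slide: the checkpoint is the single `int` returned by `getState()`, and a recovering replica is brought up to date with `setState()`. The `Counter` class is repeated so the block compiles on its own; the helper class `CounterCheckpointDemo` and its `restoreFrom` method are hypothetical names added for illustration.

```java
// Hypothetical demo of checkpoint and restore using the slide's Counter.
final class CounterCheckpointDemo {
    // Copy application state from a running replica to a recovering one.
    static Counter restoreFrom(Counter running) {
        int checkpoint = running.getState(); // compact and portable: one int
        Counter recovered = new Counter(0);
        recovered.setState(checkpoint);
        return recovered;
    }

    public static void main(String[] args) {
        Counter a = new Counter(5);
        a.increment();
        Counter b = restoreFrom(a);
        System.out.println(b.getState()); // prints 6
    }
}

// Counter as given on the slide.
class Counter {
    int counter;

    Counter(int initVal) { counter = initVal; }

    void increment() { counter++; }
    void decrement() { counter--; }
    void setState(int c) { counter = c; }
    int getState() { return counter; }
}
```

Because only the integer crosses the wire, the checkpoint does not depend on the process's memory layout, open files, or operating system, which is what makes it portable.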

  15. Logging
     • Logging of messages
       • Checkpointing in general is expensive
       • Logging of messages is cheaper, so we can checkpoint periodically or on demand and log all messages in between
     • Logging of other non-deterministic activities
       • Access order to shared data
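Combining a checkpoint with a message log can be sketched as a deterministic replay: start from the last checkpoint and reapply, in order, every message logged since it was taken. The class `MessageLogRecovery`, the `"inc"`/`"dec"` message vocabulary, and the integer state are assumptions chosen to match the counter example; they are not part of the lecture.

```java
import java.util.List;

// Hypothetical sketch: rebuild state from a checkpoint plus a message log.
final class MessageLogRecovery {
    // Apply one logged message to the state. Processing must be deterministic,
    // otherwise replay would not reproduce the original state.
    static int apply(int state, String message) {
        if (message.equals("inc")) {
            return state + 1;
        }
        if (message.equals("dec")) {
            return state - 1;
        }
        throw new IllegalArgumentException("unknown message: " + message);
    }

    // Restore the checkpoint, then replay the messages logged after it, in order.
    static int recover(int checkpoint, List<String> logSinceCheckpoint) {
        int state = checkpoint;
        for (String message : logSinceCheckpoint) {
            state = apply(state, message);
        }
        return state;
    }
}
```

Starting from a checkpoint of 10 and a log of inc, inc, dec, recovery yields 11. Taking a new checkpoint allows the log to be truncated, which is the trade-off between checkpoint cost and replay length.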

  16. Recovery
     • Roll-backward recovery
       • Used primarily by transaction processing
       • When a failure occurs, roll back using the most recent checkpoint (and retry)
     • Roll-forward recovery
       • Used primarily in space redundancy
       • To recover a repaired replica, transfer the state from a current replica to the recovering replica

  17. Roll-Forward Recovery
     • With replication in space, it is possible to recover from a fault while the system keeps making progress
     • Roll-forward recovery is made possible by
       • Checkpointing of replica state
       • Logging of incoming messages
       • Reliable, totally ordered group communication system

  18. Roll-Forward Recovery
     • We want to ensure that a newly admitted replica starts with a state consistent with the others
     • Steps of adding a new replica into a group (with on-demand checkpointing)
       • A recovered (or a new) replica joins a group
       • A join message is multicast in total order
       • On receiving the join message, an existing replica puts it into its incoming message queue, where it waits for processing
       • When the join message is at the head of the queue, a checkpoint is taken and transferred to the new replica

  19. Roll-Forward Recovery
     • The new replica starts queueing messages after it receives the join message (sent by itself)
     • When the checkpoint is received by the new replica, its state is restored using the received checkpoint (the checkpoint is delivered out of order!)
     • The queued messages are then delivered in order at the new replica
     • Other replicas do not stop and wait for the new replica
     • The steps for adding a new replica into a group with periodic checkpointing are similar
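The new replica's side of the join procedure can be sketched as a small state machine: ignore traffic before its own join message, queue messages between the join and the checkpoint, then restore and drain the queue. The class `JoiningReplica`, its callbacks, and the choice of an integer state with additive update messages are all hypothetical; the lecture does not define an interface.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a replica joining a group via roll-forward recovery.
final class JoiningReplica {
    private int state;
    private boolean joinSeen = false;  // our own join message was delivered
    private boolean restored = false;  // the checkpoint has been applied
    private final Queue<Integer> pending = new ArrayDeque<>();

    // Called when our own join message is delivered in total order.
    void onJoinDelivered() {
        joinSeen = true;
    }

    // Called for each totally ordered application message (here: a delta to add).
    void onMessage(int delta) {
        if (!joinSeen) {
            // Ordered before our join: the checkpoint taken at the join point
            // already reflects this message, so it is ignored.
            return;
        }
        if (!restored) {
            pending.add(delta); // queue until the checkpoint arrives
            return;
        }
        state += delta;
    }

    // The checkpoint arrives outside the total order. Restore it, then deliver
    // the queued messages in their original order.
    void onCheckpoint(int checkpoint) {
        state = checkpoint;
        restored = true;
        while (!pending.isEmpty()) {
            state += pending.remove();
        }
    }

    int getState() {
        return state;
    }
}
```

The existing replicas never block in this scheme. They take the checkpoint when the join message reaches the head of their queue and then continue processing, while the new replica catches up from its own queue.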

  20-23. Steps of Roll-Forward Recovery
     [Figures: four diagram-only slides; the diagrams are not reproduced in this transcript]
