
When Parallel met Distributed


Presentation Transcript


  1. When Parallel met Distributed Hagit Attiya CS, Technion

  2. My Qualifications… • 6 papers in SPAA… • & one paper in AWOC 1988! • About rings (w/ Snir) SPAA

  3. What’s Parallel Computing?

  4. What’s Distributed Computing?

  5. Top Keywords in SPAA (2003-2007) • load balancing (95/11%) • randomized algorithms (89/11%) • online algorithms (87/10%) • QoS (87/10%) • sensor networks (83/10%) • approximation algorithms (80/9%) • simulation (76/9%) • fault tolerance (75/9%) • wireless networks (74/9%) • performance evaluation (67/8%) • mobile networks (67/8%) • scheduling (64/8%) • algorithms (63/7%) • network (63/7%) • peer-to-peer (59/7%) • ad hoc networks (57/7%)

  6. Top Keywords in PODC (2003-2007) • fault tolerance (211/24%) • sensor networks (155/18%) • distributed algorithms (144/17%) • self-stabilization (126/14%) • randomized algorithms (104/12%) • dominating set (94/11%) • ad hoc networks (88/10%) • lower bounds (87/10%) • security (83/10%) • routing (82/9%) • scalability (81/9%) • shared memory (80/9%) • replication (77/9%) • reliability (77/9%) • distributed systems (77/9%) • mobile agents (75/9%)

  7. Let’s Compare • SPAA: load balancing (95/11%), randomized algorithms (89/11%), online algorithms (87/10%), QoS (87/10%), sensor networks (83/10%), approximation algorithms (80/9%), simulation (76/9%), fault tolerance (75/9%), wireless networks (74/9%), performance evaluation (67/8%), mobile networks (67/8%), scheduling (64/8%), algorithms (63/7%), network (63/7%), peer-to-peer (59/7%), ad hoc networks (57/7%) • PODC: fault tolerance (211/24%), sensor networks (155/18%), distributed algorithms (144/17%), self-stabilization (126/14%), randomized algorithms (104/12%), dominating set (94/11%), ad hoc networks (88/10%), lower bounds (87/10%), security (83/10%), routing (82/9%), scalability (81/9%), shared memory (80/9%), replication (77/9%), reliability (77/9%), distributed systems (77/9%), mobile agents (75/9%)

  8. Topics are Merging • It used to be… • Synchronous shared-memory → SPAA • Asynchronous message-passing → PODC • Nowadays… • The Network is a Computer: peer-2-peer systems, the grid, clusters • The Computer is a Network: network on chip, PRAM on chip

  9. What Parallel takes from Distributed? • Uncertainty due to asynchrony • Uncertainty due to scale • Uncertainty due to failures

  10. What Distributed takes from Parallel? • Simulations and reductions between models • And conversely, separations between models

  11. Case in Point: Simulating Shared Memory [Attiya, Bar-Noy, Dolev, PODC 1990] • Provide a single-writer multi-reader register in a message-passing system • Accessed by read and write operations [Figure: operations Read, Write(7), Write(0)]

  12. Atomicity (AKA Linearizability) [Figure: Write(7) and a Read returning 7; operations Read, Write(7), Write(0)]

  13. (Slight) Complication: Failures • For now, only crash failures • Processes just stop taking steps • Further complicated by asynchrony [Figure: operations Read, Write(7), Write(0)]

  14. Simulating Shared Memory w/ Failures • Requires a majority of nonfaulty processes • Otherwise, the system can be partitioned • A read will “miss” the latest write [Figure: operations Read, Write(7), Write(0)]
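
The majority requirement on this slide can be checked with a little arithmetic. The sketch below is an illustration, not part of the talk: it assumes n processes of which at most f crash, so an operation can wait for only n - f answers. Two such sets of responders are forced to share a process exactly when 2(n - f) > n, that is, when n > 2f. The function names are hypothetical.

```python
def responders_must_intersect(n: int, f: int) -> bool:
    """True if any two sets of n - f processes (out of n) share a member.

    Two sets of size n - f can be disjoint only when 2 * (n - f) <= n,
    which is the partition scenario in which a read misses the latest write.
    """
    return 2 * (n - f) > n


def max_tolerated_crashes(n: int) -> int:
    """Largest f for which any two (n - f)-sets are still forced to intersect."""
    return (n - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5):
        f = max_tolerated_crashes(n)
        print(n, f, responders_must_intersect(n, f), responders_must_intersect(n, f + 1))
```

With n = 4 and f = 2, for example, a writer could hear from one half of the system and a reader from the other half, so the check returns False.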

  15. Two Inspirations • Simulation of a PRAM on a synchronous interconnect (e.g., Ultracomputer) [Upfal, Wigderson, FOCS 1984] • Complete communication graph or a concentrator • No failures • Replicate data to reduce latency • Access a majority • The majority consensus approach to concurrency control [Thomas, TODS 1979] [Callouts: The abstraction; The algorithm]

  16. The Algorithm in a Nutshell • Each data item has a version number • A sequence of values • write(d, val, v#) • Waits for n-f oks • read(d) returns (val, v#) • Waits for n-f responses, picks the value with the largest v# • Does a write-back to ensure atomicity of reads
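
The steps on this slide can be sketched as a small in-memory simulation. This is a minimal illustration under stated assumptions, not the authors' code: message passing is replaced by direct method calls, a crashed replica simply never answers, and the hypothetical `order` argument stands in for message delays by fixing which replicas are contacted first. The names `Replica`, `Register`, and `_store` are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Tuple


@dataclass
class Replica:
    """One process's copy of the register: a value and its version number."""
    val: Any = 0
    version: int = 0
    crashed: bool = False

    def handle_write(self, val: Any, version: int) -> Optional[str]:
        if self.crashed:
            return None                 # a crashed process never answers
        if version > self.version:      # adopt only strictly newer values
            self.val, self.version = val, version
        return "ok"

    def handle_read(self) -> Optional[Tuple[Any, int]]:
        return None if self.crashed else (self.val, self.version)


class Register:
    """Single-writer multi-reader register over n replicas, at most f crashes."""

    def __init__(self, n: int, f: int) -> None:
        if n <= 2 * f:
            raise ValueError("requires a majority of nonfaulty processes")
        self.replicas = [Replica() for _ in range(n)]
        self.need = n - f               # number of answers to wait for
        self.version = 0                # incremented only by the single writer

    def _order(self, order: Optional[Sequence[int]]) -> Sequence[int]:
        return range(len(self.replicas)) if order is None else order

    def _store(self, val: Any, version: int, order: Sequence[int]) -> None:
        """Send (val, version) to replicas and wait for n - f oks."""
        oks = 0
        for i in order:
            if self.replicas[i].handle_write(val, version) == "ok":
                oks += 1
            if oks == self.need:
                return
        raise RuntimeError("fewer than n - f replicas answered")

    def write(self, val: Any, order: Optional[Sequence[int]] = None) -> None:
        self.version += 1
        self._store(val, self.version, self._order(order))

    def read(self, order: Optional[Sequence[int]] = None) -> Any:
        order = self._order(order)
        replies = []
        for i in order:
            reply = self.replicas[i].handle_read()
            if reply is not None:
                replies.append(reply)
            if len(replies) == self.need:
                break
        if len(replies) < self.need:
            raise RuntimeError("fewer than n - f replicas answered")
        val, version = max(replies, key=lambda r: r[1])  # largest v# wins
        self._store(val, version, order)                 # write-back
        return val


if __name__ == "__main__":
    reg = Register(n=3, f=1)
    reg.write(1, order=[0, 1])          # replica 2 is slow and misses the write
    reg.replicas[0].crashed = True      # then one up-to-date replica crashes
    print(reg.read(order=[2, 1, 0]))    # the stale replica answers first
```

In the usage at the bottom, the read hears from a stale replica and an up-to-date one, returns the value with the larger version number, and its write-back also brings the stale replica up to date.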

  17. The Algorithm in Action: Write [Figure: three replicas with values 0, 0, 0; writer A sends write 1 messages; one element is marked X]

  18. The Algorithm in Action: Write [Figure: replica values now 1, 1, 0; two ok replies return to writer A; one element is marked X]

  19. The Algorithm in Action: Read [Figure: replica values 1, 1, 0; three read requests; two elements are marked X; replies carry 1 and 0]

  20. Implications • Allows porting algorithms from shared memory to message-passing systems, e.g., • atomic snapshots • safe consensus • approximate agreement • randomized consensus • Made the message-passing model “obsolete” when studying computability [Borowsky, Gafni][Herlihy, Shavit][Mostefaoui, Rajsbaum, Raynal]…

  21. An Abstract View: Quorums [Gifford, SOSP 1979][Garcia-Molina, Barbara, JACM 1985] • read and write quorums • Any pair of write-write or write-read quorums has a large intersection
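
The intersection requirement can be checked mechanically for small systems. The sketch below is illustrative only: it represents a quorum as a frozenset of process ids and computes the smallest overlap over all write-write and write-read pairs. The helper names are hypothetical, and the brute-force enumeration is meant for small n.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List

Quorum = FrozenSet[int]


def majority_quorums(n: int) -> List[Quorum]:
    """All subsets of {0, ..., n-1} that contain a strict majority."""
    size = n // 2 + 1
    return [frozenset(c) for c in combinations(range(n), size)]


def min_intersection(read_qs: Iterable[Quorum], write_qs: Iterable[Quorum]) -> int:
    """Smallest overlap over every write-write and write-read pair of quorums."""
    reads, writes = list(read_qs), list(write_qs)
    return min(len(w & q) for w in writes for q in reads + writes)


if __name__ == "__main__":
    m5 = majority_quorums(5)
    print(min_intersection(m5, m5))

    # Read-one / write-all: every single process is a read quorum.
    reads = [frozenset([p]) for p in range(3)]
    writes = [frozenset(range(3))]
    print(min_intersection(reads, writes))
```

A result of at least 1 means every read quorum meets every write quorum, which is the property the register simulation relies on under crash failures.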

  22. Sharing with Quorums • Apply the previous algorithm with write and read quorums [Figure: operations Write(7), Read]

  23. More on Quorums • The simplest quorum system uses majority subsets • But can pick other quorum systems • When fewer processes fail • So as to optimize the load and availability of quorums [Naor, Wool, FOCS 1994] • Separation of concerns…
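
To make the load trade-off concrete, the sketch below compares majority quorums with a grid quorum system, a standard alternative that is not named on the slide and is used here only as an assumption for illustration. Processes are laid out in a k x k grid and a quorum is one full row plus one full column, so any two quorums intersect. The load figure computed here is the access probability of the busiest process when quorums are picked uniformly at random; any fixed strategy gives an upper bound on the load in the sense of Naor and Wool, which minimizes over strategies.

```python
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, List

Quorum = FrozenSet[int]


def majority_quorums(n: int) -> List[Quorum]:
    """All subsets of {0, ..., n-1} that contain a strict majority."""
    return [frozenset(c) for c in combinations(range(n), n // 2 + 1)]


def grid_quorums(k: int) -> List[Quorum]:
    """Quorums of a k x k grid: every cell of row r plus every cell of column c."""
    quorums = []
    for r in range(k):
        for c in range(k):
            row = {r * k + j for j in range(k)}
            col = {i * k + c for i in range(k)}
            quorums.append(frozenset(row | col))
    return quorums


def uniform_load(quorums: List[Quorum], n: int) -> Fraction:
    """Access probability of the busiest process under uniform quorum choice."""
    hits = [sum(1 for q in quorums if p in q) for p in range(n)]
    return Fraction(max(hits), len(quorums))


if __name__ == "__main__":
    n = 16
    print(uniform_load(grid_quorums(4), n), uniform_load(majority_quorums(n), n))
```

For 16 processes, each grid quorum has 7 members while each majority has 9, so the grid spreads work more thinly. The price is availability: the grid system cannot form a quorum once every row (or every column) contains a crashed process, which can happen with as few as 4 crashes here.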

  24. Even More Robust: Dynamic Changes RAMBO, e.g., [Lynch, Shvartsman 2002] • Participants can join or leave

  25. Even More Robust: Dynamic Changes RAMBO, e.g., [Lynch, Shvartsman 2002] • Participants can join or leave • Configuration: participants + set of read & write quorums • Emulate reads and writes using the quorums (ABD)

  26. RAMBO: Reconfiguration • Modify the set of participants and the quorums

  27. Reconfiguration • Modify the set of participants and the quorums • Need to agree on the new configuration • A safe consensus protocol • Implemented from “shared registers” • May take very long, and might never terminate • Full-fledged consensus is impossible in this setting

  28. Reconfiguration: Co-Existence • Reconfiguration proceeds concurrently with the quorum-based reading and writing algorithm • When in transition between configurations, use representative quorums from all configurations [Figure: operation Write(7)]
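
One way to read "representative quorums from all configurations" is that, during a transition, an operation keeps collecting answers until the responders include a full quorum of every configuration still in use. The sketch below illustrates only that check and is not RAMBO itself. As a simplifying assumption it models a configuration as a plain list of quorums, without separating read quorums from write quorums; the function name and process names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Quorum = FrozenSet[str]
Configuration = List[Quorum]


def covers_all_configurations(responders: Set[str],
                              active: Iterable[Configuration]) -> bool:
    """True once the responders contain a full quorum of every active configuration."""
    return all(any(q <= responders for q in config) for config in active)


if __name__ == "__main__":
    old = [frozenset("ab"), frozenset("bc"), frozenset("ac")]
    new = [frozenset("cd"), frozenset("de"), frozenset("ce")]

    print(covers_all_configurations({"a", "b"}, [old, new]))
    print(covers_all_configurations({"a", "b", "c", "d"}, [old, new]))
```

In the usage example, answers from a and b satisfy the old configuration only, so the operation keeps waiting. Once c and d have also answered, both configurations are covered.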

  29. Even More Robust: Byzantine Failures • Nodes fail arbitrarily • They lie, they collude • Causes • Malicious attacks • Non-deterministic software errors

  30. Byzantine Quorums [Malkhi, Reiter, STOC 1997] • 3f+1 replicas are needed to survive f failures • Any 2f+1 replicas form a quorum • Ensures intersection of size f+1 • Need many copies with same v# • Minimal in an asynchronous network • There are other quorum systems [Malkhi, Reiter, Wool 1997][Bazzi 1997]… • Optimizing load and availability
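
The counts on this slide follow from a short calculation: two quorums of size 2f+1 drawn from 3f+1 replicas overlap in at least 2(2f+1) - (3f+1) = f+1 replicas, so with at most f faulty replicas the overlap always contains at least one correct one. The sketch below, with hypothetical function names, computes these parameters and confirms the overlap by brute force for small f.

```python
from itertools import combinations
from typing import Tuple


def byzantine_parameters(f: int) -> Tuple[int, int, int]:
    """Return (replicas, quorum size, guaranteed overlap) for f Byzantine failures."""
    n = 3 * f + 1
    q = 2 * f + 1
    return n, q, 2 * q - n              # 2(2f+1) - (3f+1) = f + 1


def min_overlap_brute_force(n: int, q: int) -> int:
    """Smallest intersection over all pairs of q-subsets of n replicas."""
    quorums = [frozenset(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


if __name__ == "__main__":
    for f in (1, 2):
        n, q, overlap = byzantine_parameters(f)
        print(f, n, q, overlap, min_overlap_brute_force(n, q))
```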

  31. Application: Replicated Servers • Clients invoke operations; servers only respond to them • By the same protocol • Clients may crash

  32. Disk Paxos [Gafni, Lamport 2003] • A protocol for replicated servers in a storage area network (SAN) • Design a shared memory algorithm • Translate to a SAN algorithm using ABD • Optimized: e.g., remove v# (can be inferred from protocol messages)

  33. Replicating a Server [Figure: three servers, each with a State; clients send write A to each server; one element is marked X]

  34. Replicating a Server [Figure: server states A, A; one element is marked X]

  35. Replicating a Server [Figure: server states A, A, B; clients send write B to each server; two elements are marked X]

  36. Byzantine Servers? [Figure: four servers; states shown: A, A, A; clients send write A to each server; one element is marked X]

  37. Byzantine Servers: Quorums in Action [Figure: four servers; states shown: A, A, B, B, B; clients send write B to each server; one element is marked X]

  38. Moral: The Art of Abstraction • The right abstraction can lead to many crucial algorithms • Finding the right abstractions is key to designing good systems • Hide enough “under the hood” to give system designers good leverage • But not too much, so their implementation is efficient (or easily admits optimizations)

  39. More Context • Client failures • Optimizations • Reducing communication and reconfiguration • Improving the common case, without harming the worst case • Adaptations to new network technologies • Ad-hoc, mobile, sensor

  40. Thank you…
