
Programming Model and Protocols for Reconfigurable Distributed Systems


Presentation Transcript


  1. Programming Model and Protocols for Reconfigurable Distributed Systems COSMIN IONEL ARAD https://www.kth.se/profile/icarad/page/doctoral-thesis/ Doctoral Thesis Defense, 5th June 2013, KTH Royal Institute of Technology

  2. Presentation Overview • Context, Motivation, and Thesis Goals • Introduction & design philosophy • Distributed abstractions & P2P framework • Component execution & scheduling • Distributed systems experimentation • Development cycle: build, test, debug, deploy • Scalable & consistent key-value store • System architecture and testing • Scalability, elasticity, and performance evaluation • Conclusions

  3. Trend 1: Computer systems are increasingly distributed • For fault-tolerance • E.g.: replicated state machines • For scalability • E.g.: distributed databases • Due to inherent geographic distribution • E.g.: content distribution networks

  4. Trend 2: Distributed systems are increasingly complex connection management, location and routing, failure detection, recovery, data persistence, load balancing, scheduling, self-optimization, access control, monitoring, garbage collection, encryption, compression, concurrency control, topology maintenance, bootstrapping, ...

  5. Trend 3: Modern hardware is increasingly parallel • Multi-core and many-core processors • Concurrent/parallel software is needed to leverage hardware parallelism • Major software concurrency models • Message-passing concurrency • Data-flow concurrency, viewed as a special case • Shared-state concurrency

  6. Distributed Systems are still Hard… • … to implement, test, and debug • Sequential sorting is easy • Even for a first-year computer science student • Distributed consensus is hard • Even for an experienced practitioner having all the necessary expertise

  7. Experience from building Chubby, Google’s lock service, using Paxos “The fault-tolerance computing community has not developed the tools to make it easy to implement their algorithms. The fault-tolerance computing community has not paid enough attention to testing, a key ingredient for building fault-tolerant systems.” [Paxos Made Live] Tushar Deepak Chandra (Edsger W. Dijkstra Prize in Distributed Computing 2010)

  8. A call to action “It appears that the fault-tolerant distributed computing community has not developed the tools and know-how to close the gaps between theory and practice with the same vigor as for instance the compiler community. Our experience suggests that these gaps are non-trivial and that they merit attention by the research community.” [Paxos Made Live] Tushar Deepak Chandra (Edsger W. Dijkstra Prize in Distributed Computing 2010)

  9. Thesis Goals • Raise the level of abstraction in programming distributed systems • Make it easy to implement, test, debug, and evaluate distributed systems • Attempt to bridge the gap between the theory and the practice of fault-tolerant distributed computing

  10. We want to build distributed systems

  11. by composing distributed protocols

  12. implemented as reactive, concurrent components

  13. with asynchronous communication and message-passing concurrency [Diagram: a component stack with Application, Consensus, Broadcast, Failure Detector, Network, and Timer]

  14. Design principles • Tackle increasing system complexity through abstraction and hierarchical composition • Decouple components from each other • publish-subscribe component interaction • dynamic reconfiguration for always-on systems • Decouple component code from its executor • same code executed in different modes: production deployment, interactive stress testing, deterministic simulation for replay debugging

  15. Nested hierarchical composition • Model entire sub-systems as first-class composite components • Richer architectural patterns • Tackle system complexity • Hiding implementation details • Isolation • Natural fit for developing distributed systems • Virtual nodes • Model entire system: each node as a component
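A minimal sketch of the idea on this slide, in plain Java rather than the actual Kompics API: a composite component owns its subcomponents, hides how they are wired, and exposes the whole subtree (for example an entire virtual node) as a single component. The class names are illustrative only.

```java
// Self-contained sketch of nested hierarchical composition (plain Java,
// not the Kompics API): a composite "Node" hides its subcomponents.
import java.util.ArrayList;
import java.util.List;

interface Component {                       // anything that can be started
    void start();
}

class FailureDetector implements Component {
    public void start() { System.out.println("failure detector started"); }
}

class LeaderElector implements Component {
    public void start() { System.out.println("leader elector started"); }
}

// Callers see one component; the internal wiring stays hidden (isolation).
class Node implements Component {
    private final List<Component> children = new ArrayList<>();

    Node() {
        children.add(new FailureDetector());
        children.add(new LeaderElector());
    }

    public void start() {                   // starting the parent starts the subtree
        for (Component c : children) c.start();
    }
}

public class CompositionSketch {
    public static void main(String[] args) {
        new Node().start();                 // an entire (virtual) node as one component
    }
}
```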

  16. Message-passing concurrency • Compositional concurrency • Free from the idiosyncrasies of locks and threads • Easy to reason about • Many concurrency formalisms: the Actor model (1973), CSP (1978), CCS (1980), π-calculus (1992) • Easy to program • See the success of Erlang, Go, Rust, Akka, ... • Scales well on multi-core hardware • Almost all modern hardware

  17. Loose coupling • “Where ignorance is bliss, 'tis folly to be wise.” • Thomas Gray, Ode on a Distant Prospect of Eton College (1742) • Communication integrity • Law of Demeter • Publish-subscribe communication • Dynamic reconfiguration

  18. Design Philosophy • Nested hierarchical composition • Message-passing concurrency • Loose coupling • Multiple execution modes

  19. Component Model • Event • Port • Component • Channel • Handler • Subscription • Publication / Event trigger
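To make these primitives concrete, here is a compact self-contained sketch in plain Java (not the real Kompics API): events flow through ports, a component subscribes handlers to ports, and publishing ("triggering") an event on a port invokes every subscribed handler.

```java
// Sketch of the component-model primitives: Event, Port, Handler,
// Subscription, and Publication (trigger). Plain Java, illustrative only.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class Event {}                                    // base type for all events

class Port {                                      // a typed event channel endpoint
    private final List<Consumer<Event>> handlers = new ArrayList<>();

    void subscribe(Consumer<Event> handler) {     // subscription
        handlers.add(handler);
    }

    void trigger(Event e) {                       // publication / event trigger
        for (Consumer<Event> h : handlers) h.accept(e);
    }
}

class Tick extends Event {}

public class ComponentModelSketch {
    public static void main(String[] args) {
        Port timerPort = new Port();
        // A "component" is state plus handlers subscribed to its ports.
        timerPort.subscribe(e -> System.out.println("handler got " + e.getClass().getSimpleName()));
        timerPort.trigger(new Tick());            // prints: handler got Tick
    }
}
```

In the real framework delivery is asynchronous: triggered events are queued and handlers are executed later by a scheduler; the sketch calls handlers synchronously only to stay short.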

  20. A simple distributed system [Diagram: Process1 and Process2 each contain an Application component with handler<Ping> and handler<Pong>, connected through a Network port to a Network component with a handler<Message>]
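A hedged sketch of this Ping/Pong example in plain Java: each process has an Application with one handler per message type, and the "network" between the two processes is reduced to a direct callback (the real system would route messages through Network components with asynchronous delivery).

```java
// Self-contained Ping/Pong sketch (plain Java, not the thesis code).
import java.util.function.Consumer;

class Ping {}
class Pong {}

class Application {
    private final String name;
    Consumer<Object> network;                     // where outgoing messages go

    Application(String name) { this.name = name; }

    // handler<Ping>: reply with a Pong
    void handlePing(Ping p) {
        System.out.println(name + " got Ping, sending Pong");
        network.accept(new Pong());
    }

    // handler<Pong>: just log it
    void handlePong(Pong p) {
        System.out.println(name + " got Pong");
    }

    void deliver(Object msg) {                    // dispatch by message type
        if (msg instanceof Ping) handlePing((Ping) msg);
        else if (msg instanceof Pong) handlePong((Pong) msg);
    }
}

public class PingPongSketch {
    public static void main(String[] args) {
        Application p1 = new Application("Process1");
        Application p2 = new Application("Process2");
        p1.network = p2::deliver;                 // wire the two "Network" ends
        p2.network = p1::deliver;
        p1.network.accept(new Ping());            // Process1 pings Process2
    }
}
```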

  21. A Failure Detector Abstraction using a Network and a Timer Abstraction [Diagram: a Ping Failure Detector component provides the Eventually Perfect Failure Detector port (requests: StartMonitoring, StopMonitoring; indications: Suspect, Restore) and requires Network and Timer ports, connected to MyNetwork and MyTimer]
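The following is a rough sketch of the behavior behind this abstraction, not the thesis implementation: StartMonitoring/StopMonitoring come in as requests, Suspect/Restore go out as indications, and a timer-driven check suspects nodes that miss a ping deadline and restores them when a later reply arrives. The timeout value and method names are assumptions.

```java
// Eventually-perfect failure detector sketch (plain Java, illustrative only).
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PingFailureDetectorSketch {
    static final long TIMEOUT_MS = 1000;          // assumed ping deadline

    Map<String, Long> lastPong = new HashMap<>(); // node -> time of last reply
    Set<String> suspected = new HashSet<>();

    void startMonitoring(String node) { lastPong.put(node, System.currentTimeMillis()); }
    void stopMonitoring(String node)  { lastPong.remove(node); suspected.remove(node); }

    void onPong(String node) {                    // Network delivered a pong
        lastPong.put(node, System.currentTimeMillis());
        if (suspected.remove(node)) System.out.println("Restore " + node);
    }

    void onTimeout() {                            // Timer fired: check deadlines
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastPong.entrySet()) {
            if (now - e.getValue() > TIMEOUT_MS && suspected.add(e.getKey())) {
                System.out.println("Suspect " + e.getKey());
            }
            // a real implementation would also (re)send Ping messages here
        }
    }

    public static void main(String[] args) {
        PingFailureDetectorSketch fd = new PingFailureDetectorSketch();
        fd.startMonitoring("node-2");
        fd.lastPong.put("node-2", System.currentTimeMillis() - 5000); // simulate silence
        fd.onTimeout();                           // prints: Suspect node-2
        fd.onPong("node-2");                      // prints: Restore node-2
    }
}
```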

  22. A Leader Election Abstraction using a Failure Detector Abstraction [Diagram: an Ω Leader Elector component provides the Leader Election port (indication: Leader) and requires the Eventually Perfect Failure Detector port, provided by a Ping Failure Detector]

  23. A Reliable Broadcast Abstraction using a Best-Effort Broadcast Abstraction [Diagram: a Reliable Broadcast component provides a Broadcast port (request: RbBroadcast; indication: RbDeliver) and requires a Broadcast port (BebBroadcast / BebDeliver) provided by a Best-Effort Broadcast component, which requires Network]

  24. A Consensus Abstraction using a Broadcast, a Network, and a Leader Election Abstraction [Diagram: a Paxos Consensus component provides the Consensus port and requires Broadcast, Network, and Leader Election ports, provided by Best-Effort Broadcast, MyNetwork, and the Ω Leader Elector]

  25. A Shared Memory Abstraction [Diagram: an ABD component provides the Atomic Register port (requests: ReadRequest, WriteRequest; indications: ReadResponse, WriteResponse) and requires Broadcast and Network ports, provided by Best-Effort Broadcast and MyNetwork]

  26. A Replicated State Machine using a Total-Order Broadcast Abstraction [Diagram: a State Machine Replication component provides the Replicated State Machine port (Execute / Output) on top of a Total-Order Broadcast port (TobBroadcast / TobDeliver), implemented by Uniform Total-Order Broadcast using a Consensus port (Propose / Decide) provided by Paxos Consensus]
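A minimal sketch of the principle behind this slide: replicas that apply the same totally ordered command stream to a deterministic state machine end up in the same state. Total-order broadcast itself (built from consensus in the diagram) is stubbed out as a shared, already-decided list; plain Java, not the thesis code.

```java
// State machine replication sketch: same order + deterministic execution
// => identical replica state.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Replica {
    private final Map<String, Integer> state = new HashMap<>(); // the state machine

    void execute(String command) {                 // deterministic "Execute"
        String[] parts = command.split("=");       // commands like "x=1"
        state.put(parts[0], Integer.parseInt(parts[1]));
    }

    Map<String, Integer> state() { return state; }
}

public class SmrSketch {
    public static void main(String[] args) {
        // Stand-in for uniform total-order broadcast: one sequence seen by all.
        List<String> decidedOrder = new ArrayList<>(List.of("x=1", "y=2", "x=3"));

        Replica r1 = new Replica();
        Replica r2 = new Replica();
        for (String cmd : decidedOrder) { r1.execute(cmd); r2.execute(cmd); }

        // Both replicas end in the same state: x=3, y=2
        System.out.println(r1.state() + " == " + r2.state());
    }
}
```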

  27. Probabilistic Broadcast and Topology Maintenance Abstractions using a Peer Sampling Abstraction [Diagram: Epidemic Dissemination provides Probabilistic Broadcast and T-Man provides Topology; both require the Peer Sampling port provided by Cyclon Random Overlay, which requires Network and Timer]

  28. A Structured Overlay Network implements a Distributed Hash Table [Diagram: a Consistent Hashing Ring (Chord Periodic Stabilization) provides Ring Topology and a One-Hop Router provides Overlay Router; together they implement the Distributed Hash Table port, requiring Failure Detector (Ping Failure Detector), Peer Sampling (Cyclon Random Overlay), Network, and Timer]

  29. A Video on Demand Service using a Content Distribution Network and a Gradient Topology Overlay [Diagram: a Video On-Demand component uses Content Distribution Network and Gradient Topology abstractions; subcomponents include BitTorrent, Gradient Overlay, Centralized Tracker Client, Distributed Tracker, and Peer Exchange, over Tracker, Distributed Hash Table, Peer Sampling, Network, and Timer ports]

  30. Generic Bootstrap and Monitoring Services provided by the Kompics Peer-to-Peer Protocol Framework [Diagram: PeerMain, BootstrapServerMain, and MonitorServerMain processes, each composing a Peer, BootstrapServer, or MonitorServer component with MyWebServer, MyNetwork, and MyTimer over Web, Network, and Timer ports]

  31. Whole-System Repeatable Simulation [Diagram: the whole system runs on a Deterministic Simulation Scheduler, driven by an Experiment Scenario and a Network Model]

  32. Experiment scenario DSL • Define parameterized scenario events • Node failures, joins, system requests, operations • Define “stochastic processes” • Finite sequence of scenario events • Specify distribution of event inter-arrival times • Specify type and number of events in sequence • Specify distribution of each event parameter value • Scenario: composition of “stochastic processes”, sequential or parallel
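To illustrate the shape of such a DSL, here is a hypothetical sketch in plain Java (not the actual Kompics scenario DSL): a "stochastic process" generates a finite sequence of parameterized events with random inter-arrival times, and a scenario composes such processes, here sequentially. Event names, distributions, and the fixed seed are assumptions.

```java
// Hypothetical experiment-scenario sketch: stochastic processes of
// parameterized events with exponential inter-arrival times.
import java.util.Random;

public class ScenarioSketch {
    static final Random RNG = new Random(42);       // fixed seed => repeatable

    // exponential inter-arrival time with the given mean (milliseconds)
    static long exponential(double meanMs) {
        return (long) (-meanMs * Math.log(1 - RNG.nextDouble()));
    }

    // one stochastic process: 'count' events of one type, e.g. node joins
    static void stochasticProcess(String eventType, int count, double meanGapMs) {
        long time = 0;
        for (int i = 0; i < count; i++) {
            time += exponential(meanGapMs);
            int nodeId = RNG.nextInt(10_000);       // random event parameter
            System.out.printf("t=%dms %s(node=%d)%n", time, eventType, nodeId);
        }
    }

    public static void main(String[] args) {
        // Sequential composition: first let nodes join, then inject failures.
        stochasticProcess("join", 5, 100.0);
        stochasticProcess("fail", 2, 500.0);
    }
}
```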

  33. Local Interactive Stress Testing [Diagram: the whole system runs on a Work-Stealing Multi-Core Scheduler, driven by an Experiment Scenario and a Network Model]

  34. Execution Profiles • Distributed Production Deployment • One distributed system node per OS process • Multi-core component scheduler (work stealing) • Local / Distributed Stress Testing • Entire distributed system in one OS process • Interactive stress testing, multi-core scheduler • Local Repeatable Whole-System Simulation • Deterministic simulation component scheduler • Correctness testing, stepped / replay debugging

  35. Incremental Development & Testing • Define emulated network topologies • processes and their addresses: <id, IP, port> • properties of links between processes • latency (ms) • loss rate (%) • Define small-scale execution scenarios • the sequence of service requests initiated by each process in the distributed system • Experiment with various topologies / scenarios • Launch all processes locally on one machine
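A hypothetical sketch of what such an emulated topology definition could look like, using only the data the slide names (process <id, IP, port> and per-link latency and loss rate); the class and field names are illustrative, not the thesis API.

```java
// Hypothetical emulated-topology sketch: processes plus link properties.
import java.util.ArrayList;
import java.util.List;

record ProcessSpec(int id, String ip, int port) {}
record Link(int from, int to, int latencyMs, double lossRate) {}

public class TopologySketch {
    public static void main(String[] args) {
        List<ProcessSpec> processes = List.of(
            new ProcessSpec(1, "127.0.0.1", 22031),
            new ProcessSpec(2, "127.0.0.1", 22032),
            new ProcessSpec(3, "127.0.0.1", 22033));

        List<Link> links = new ArrayList<>();
        for (ProcessSpec a : processes) {           // fully connected topology
            for (ProcessSpec b : processes) {
                if (a.id() != b.id()) {
                    links.add(new Link(a.id(), b.id(), 50, 0.01)); // 50 ms, 1% loss
                }
            }
        }
        System.out.println(processes.size() + " processes, " + links.size() + " links");
    }
}
```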

  36. Distributed System Launcher

  37. The script of service requests of each process is shown in the launcher; after the Application completes the script, it can process further commands entered interactively.

  38. Programming in the Large • Events and ports are interfaces • service abstractions • packaged together as libraries • Components are implementations • provide or require interfaces • dependencies on provided / required interfaces expressed as library dependencies [Apache Maven] • multiple implementations for an interface in separate libraries • deploy-time composition

  39. Kompics Scala, by Lars Kroll

  40. Kompics Python, by Niklas Ekström

  41. Case study: A Scalable, Self-Managing Key-Value Store with Atomic Consistency and Partition Tolerance

  42. Key-Value Store? • Store.Put(key, value) → OK [write] • Store.Get(key) → value [read] • Example: Put(”www.sics.se”, ”193.10.64.51”) → OK; Get(”www.sics.se”) → ”193.10.64.51”
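The put/get interface from this slide, sketched as a plain in-memory map in Java; the real store partitions and replicates the data across nodes, which this sketch deliberately ignores.

```java
// Minimal key-value store interface sketch (single node, in memory).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeyValueStoreSketch {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public void put(String key, String value) { store.put(key, value); }  // write
    public String get(String key) { return store.get(key); }              // read

    public static void main(String[] args) {
        KeyValueStoreSketch kv = new KeyValueStoreSketch();
        kv.put("www.sics.se", "193.10.64.51");          // -> OK
        System.out.println(kv.get("www.sics.se"));      // -> 193.10.64.51
    }
}
```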

  43. Consistent Hashing • Incremental scalability • Self-organization • Simplicity [Examples shown: Dynamo, Project Voldemort]
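A minimal consistent-hashing sketch in Java: node identifiers are placed on a ring and each key is stored at the first node clockwise from the key's hash, which is what gives incremental scalability (adding or removing a node only moves the keys of one ring segment). The toy hash function is an assumption; real systems use a cryptographic hash and virtual nodes.

```java
// Consistent hashing ring sketch: keys map to the next node clockwise.
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashingSketch {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey())   // wrap around the ring
                              : tail.get(tail.firstKey());
    }

    private int hash(String s) {               // toy hash; use SHA-1/MD5 in practice
        return s.hashCode() & 0x7fffffff;
    }

    public static void main(String[] args) {
        ConsistentHashingSketch ch = new ConsistentHashingSketch();
        ch.addNode("node-A"); ch.addNode("node-B"); ch.addNode("node-C");
        System.out.println("www.sics.se -> " + ch.nodeFor("www.sics.se"));
    }
}
```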

  44. Single client, single server [Timeline: the client’s Put(X, 1) is acknowledged with Ack(X), and a subsequent Get(X) returns 1; the server’s state changes from X = 0 to X = 1]

  45. Multiple clients, multiple servers [Timeline: Client 1’s Put(X, 1) is acknowledged, yet Client 2’s first Get(X) returns 0 from a server that has not yet applied the write, and only a later Get(X) returns 1]

  46. Atomic Consistency Informally • put/get ops appear to occur instantaneously • Once a put(key, newValue) completes • new value immediately visible to all readers • each get returns the value of the last completed put • Once a get(key) returns a new value • no other get may return an older, stale value

  47. [Diagram: architecture of a CATS node. Components include the CATS Web Application, Load Balancer, Status Monitor, Operation Coordinator, One-Hop Router, Group Member, Reconfiguration Coordinator, Consistent Hashing Ring, Epidemic Dissemination, Cyclon Random Overlay, Ping Failure Detector, Bulk Data Transfer, Bootstrap Client, Persistent Storage (Local Store), and Garbage Collector, wired through Distributed Hash Table, Overlay Router, Ring Topology, Replication, Peer Sampling, Broadcast, Aggregation, Data Transfer, Bootstrap, Failure Detector, Status, Web, Network, and Timer ports]

  48. Simulation and Stress Testing [Diagram: CATS Simulation Main runs on a Simulation Scheduler and CATS Stress Testing Main runs on a Multi-core Scheduler; in both, a CATS Simulator / CATS Experiment component hosts many CATS Node components (each with Web and DHT ports) over shared Network and Timer components, driven by a Discrete-Event Simulator or a Generic Orchestrator with a Network Model and an Experiment Scenario]
