Introduction & Background Lakshmish Ramaswamy
Why Distributed Systems? • A collection of independent computers that appears to its users as a single coherent system • Reasons for distribution • Distributed (and mobile) users • Distributed data/information • Distributed organizations • Distributed resources • Enabling technology – Communications and networking
Distributed System Organization 1.1 A distributed system organized as middleware. Note that the middleware layer extends over multiple machines.
Design Goals • Enable controlled resource sharing • Transparency • Openness • Scalability • Performance • Failure resilience • Security & privacy
Examples of Distributed Systems • World Wide Web • Information dissemination • E-commerce • Distributed file systems • Distributed databases • Web farms • P2P file sharing systems • Ad-hoc networks • Sensor networks
Middleware • Layer on top of Network OS services • Hide heterogeneity • Doesn’t manage individual nodes • Provides complete set of services
Client Server Model • Earliest model • Simple • Still applicable in many scenarios • Server • Implements specific service • Client • Requests service • Models of communication • Connectionless • Connection-oriented
Clients and Servers 1.25 • General interaction between a client and a server.
Multitiered Architectures (2) 1-30 • An example of a server acting as a client.
Modern Architectures • Vertical Distribution: Different components on different machines • Horizontal Distribution: Each part operates on its own share of the complete data set • Hybrid: Incorporates features of both vertical and horizontal distribution 1-31 • An example of horizontal distribution of a Web service.
Peer-to-Peer Architectures • No distinction between client and server • Nodes can act both as client and server • Promotes interaction within social groups • Provides better scalability • File sharing has been the dominant application • Napster, Gnutella, Kazaa • Other applications are still in nascent stages • Decentralized protocols
Network Protocols 2-1 • Layers, interfaces, and protocols in the OSI model.
Functionalities of Layers • Physical: Standardizes signaling interfaces • Data link: Organizes bits to form frames, detects and corrects transmission errors • Network layer: Routing (Internet protocol [IP]) • Transport layer: Reliability (retransmission, ordering of packets) • Session layer: Dialog control and synchronization • Presentation layer: Formats of messages and records • Application layer: Specific to applications (HTTP, FTP)
Types of Communication • Persistence • Persistent communication – Message is stored until it is delivered to the receiver • Transient communication – Message is stored only while the sending and receiving processes are alive • Transport level protocols provide transient communication • Synchronicity • Asynchronous – Sender continues after sending message • Synchronous – Sender blocks until message is stored at receiver's local buffer, delivered to receiver, or processed by receiver
Message Oriented Transient Communication - Berkeley Sockets • Interface for transport layer • A communications end point • Communication pattern using TCP/IP sockets
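The TCP/IP socket communication pattern named on this slide can be sketched with Python's standard `socket` module. This is a minimal illustration, not a production server: the helper names `serve_once` and `echo_roundtrip`, the loopback address, and the upper-casing "service" are all assumptions made for the example.

```python
import socket
import threading

def serve_once(server_sock: socket.socket) -> None:
    """Server side: accept one connection and reply in upper case."""
    conn, _addr = server_sock.accept()   # blocks until a client connects
    with conn:
        data = conn.recv(1024)           # read the (small) request
        conn.sendall(data.upper())       # send the reply

def echo_roundtrip(message: bytes) -> bytes:
    # Server: socket -> bind -> listen
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Client: socket -> connect -> send -> receive -> close
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)        # assumes the reply fits in one read

    worker.join()
    server.close()
    return reply
```

Calling `echo_roundtrip(b"hello")` runs one connection-oriented request/reply exchange over the loopback interface.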
Processes & Threads • Virtual processors • Created by OS to execute a program • Process is a program in execution • Executed on one of the virtual processors • Operating systems ensure that processes are independent and transparent • Resource sharing is transparent • Creating processes is costly • Switching processes is costly too
Threads • Similar to a process • Perceived as execution of (a part of) a program • Information maintained for sharing the CPU is minimal • Context of a thread is captured by the CPU context • Maybe a little more information is needed for management (like locks) • Very little overhead • Thread switching is easy • Can provide performance gains
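As a small sketch of the points above, the following Python snippet starts several threads that share one counter and uses a lock as the "little more information" needed for management. The function name `count_with_threads` and its parameters are hypothetical, chosen only for this illustration.

```python
import threading

def count_with_threads(n_threads: int, increments: int) -> int:
    """Have n_threads threads each add `increments` to a shared counter."""
    counter = 0
    lock = threading.Lock()              # protects the shared counter

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:                   # only one thread updates at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # wait for every thread to finish
    return counter
```

All threads live in the same process and see the same `counter`, which is why sharing is cheap but also why the lock is needed.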
Names & Naming System • Required for identifying entities, locating them, communicating with them • Name can be resolved to the entity it refers to • Name is a string of bits used to refer to an entity • Entity can be a resource, user, data item, or process • Access Point – Host of another entity • Name of access point is its address • Naming system resolves names • Naming system in distributed systems can itself be distributed
Name Spaces • Organization of names, usually as a directed graph • Leaf node – Represents named entity • Directory node – Lists other names • A general naming graph with a single root node.
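Name resolution over such a graph can be sketched as a walk from the root, one label at a time. In this illustration the graph is simplified to a tree of nested dictionaries; the node labels, the `entity:` strings, and the function `resolve` are made up for the example.

```python
# Directory nodes are dicts (label -> child); leaf nodes are entity references.
naming_graph = {
    "home": {
        "alice": {"mbox": "entity:mailbox-of-alice"},
        "bob": {},
    },
    "keys": "entity:key-file",
}

def resolve(graph: dict, path: str):
    """Resolve a path name such as '/home/alice/mbox' starting at the root."""
    node = graph
    for label in path.strip("/").split("/"):
        if not isinstance(node, dict) or label not in node:
            raise KeyError(f"cannot resolve {label!r} in {path!r}")
        node = node[label]               # follow the edge with this label
    return node
```

Resolving `/home/alice/mbox` visits two directory nodes and ends at a leaf; resolving a name with no matching edge raises `KeyError`.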
Name Space Distribution • An example partitioning of the DNS name space, including Internet-accessible files, into three layers.
Importance of Clocks & Synchronization • Avoiding simultaneous access of resources • Processes may need to agree upon ordering of events • Synchronization & ordering are difficult in distributed setting • Notion of time is tricky in distributed setting • How to deal with clock drifts? • Logical clocks • Agreement with regard to ordering of events suffices • Happens-before relation
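One common way to realize logical clocks is Lamport's scheme, which the slide does not name explicitly, so treat the choice as an assumption. A minimal sketch, with the class and method names invented for this example:

```python
class LamportClock:
    """Logical clock: a counter that respects the happens-before relation."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receipt, jump past both the local clock and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

If event a happens before event b, the timestamp of a is smaller than that of b. The converse does not hold: a smaller timestamp alone does not prove that one event happened before the other.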
Mutual Exclusion • Ensuring consistency of data sometimes needs exclusive access to data • Critical regions for mutual exclusion • When a process wants to read/update shared data structures it first enters a critical region • Only one process allowed to be in the critical region • Coordinator-based centralized algorithm • Ricart and Agrawala’s algorithm • Token ring algorithm
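The coordinator-based centralized algorithm can be sketched as a single object that grants the critical region to one process at a time and queues the rest. Message passing is replaced by direct method calls here, and the names `Coordinator`, `request`, and `release` are assumptions for illustration.

```python
from collections import deque

class Coordinator:
    """Grants the critical region to at most one process at a time."""

    def __init__(self) -> None:
        self.holder = None               # process currently in the region
        self.waiting = deque()           # queued requests, first come first served

    def request(self, pid: str) -> bool:
        """Return True if access is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid: str):
        """Leave the region; return the next process granted access, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the critical region")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

The design is simple and fair, but the coordinator is a single point of failure, which is one motivation for the distributed and token-based alternatives listed on the slide.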
Transactions • Protects data and allows processes to access and modify multiple data items as a single atomic transaction • If a process backs out halfway, everything is restored to its prior state • Originated in business world • Parties free to negotiate and back off during negotiation • No backing off after the contract is signed • Initiator process announces the beginning of a transaction • Processes create, update, and delete entries • Initiator announces that it wants others to “commit” • Transaction made permanent if everyone agrees • Otherwise transaction is aborted and all entries are restored
Transaction Primitives • Examples of primitives for transactions.
ACID Properties of Transactions • Atomic – Happens indivisibly to the outside world • Consistent – Does not violate system constraints • Isolated – Concurrent transactions do not interfere with each other • Durable – Changes are permanent when a transaction commits
How to Implement Transactions? • Private workspace • When a process starts a transaction, it gets a private workspace of all files it needs to use • Operations only on private workspace • Private workspace is written back (ignored) on commit (abort) • Efficiency problems – copying everything is costly.
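A minimal sketch of the private workspace idea, using an in-memory dictionary in place of files. The class name `PrivateWorkspaceTransaction` and its methods are hypothetical; the full copy made at the start is exactly the efficiency problem the slide mentions.

```python
import copy

class PrivateWorkspaceTransaction:
    """All reads and writes go to a private copy until commit."""

    def __init__(self, store: dict) -> None:
        self._store = store
        self._workspace = copy.deepcopy(store)   # costly: copies everything

    def read(self, key):
        return self._workspace[key]

    def write(self, key, value) -> None:
        self._workspace[key] = value             # shared store is untouched

    def commit(self) -> None:
        """Write the private workspace back to the shared store."""
        self._store.clear()
        self._store.update(self._workspace)

    def abort(self) -> None:
        """Discard the private workspace; the shared store never changed."""
        self._workspace = {}
```

Because the shared store is only touched in `commit`, an abort needs no undo work at all.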
Distributed Transactions • A distributed transaction is a transaction in which the data is distributed • Two-phase commit protocol • Commit request phase • Coordinator sends a query-to-commit message to all nodes • Nodes place an entry into their undo and redo logs • Nodes send agreement/abort messages • Commit phase • Coordinator places an entry into log • Sends commit/abort messages to all nodes • Nodes send acknowledgements
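The two phases can be sketched as follows, with network messages replaced by method calls and with no timeouts or crash recovery. `Participant`, `two_phase_commit`, and the `can_commit` flag are assumptions made for this illustration, and a plain list stands in for the undo/redo logs.

```python
class Participant:
    """Hypothetical node taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.log = []                    # stands in for the undo/redo logs
        self.state = "init"

    def prepare(self) -> bool:
        """Commit request phase: log the request, then vote."""
        self.log.append("prepared")
        return self.can_commit

    def finish(self, decision: str) -> str:
        """Commit phase: apply the coordinator's decision and acknowledge."""
        self.log.append(decision)
        self.state = "committed" if decision == "commit" else "aborted"
        return "ack"

def two_phase_commit(participants, coordinator_log):
    # Phase 1: the coordinator queries every node and collects the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: the coordinator logs the decision, then informs every node.
    coordinator_log.append(decision)
    acks = [p.finish(decision) for p in participants]
    return decision, acks
```

A single "abort" vote is enough to abort the transaction at every node.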
Concurrency Control • Concurrent transactions are isolated • Final result should be the same as if the transactions were executed one after another in some order • Synchronization classification • Locking • Timestamps • Two phase locking – Growing & shrinking phases • Transaction acquires all locks before releasing any of them • Distributed 2PL • Coordinator manages all lock operations
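The growing and shrinking phases of two-phase locking can be sketched with a shared lock table that maps each data item to the transaction holding it. This is a single-process illustration with exclusive locks only; the class name `TwoPhaseLockingTransaction` and the lock-table layout are assumptions.

```python
class TwoPhaseLockingTransaction:
    """Enforces: no lock may be acquired after the first release."""

    def __init__(self, lock_table: dict, txn_id: str) -> None:
        self.lock_table = lock_table     # item -> id of the owning transaction
        self.txn_id = txn_id
        self.held = set()
        self.shrinking = False

    def acquire(self, item: str) -> bool:
        """Growing phase: take a lock, or return False if another txn holds it."""
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        owner = self.lock_table.get(item)
        if owner is not None and owner != self.txn_id:
            return False                 # caller must wait or abort
        self.lock_table[item] = self.txn_id
        self.held.add(item)
        return True

    def release(self, item: str) -> None:
        """Shrinking phase: starts with the first release."""
        self.shrinking = True
        self.held.remove(item)
        del self.lock_table[item]
```

In distributed 2PL with a single coordinator, the `lock_table` would live at the coordinator and each call would become a message.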
Replication • Two primary reasons • Improving reliability of system • Improving scalability and performance of system • Reliability • Resilience to failures • Protection against data corruption: Byzantine failures and quorum-based systems • Scalability • Scaling in numbers • Geographical scaling
Problems of Replication • Creating and maintaining replicas is not free • Multiple copies lead to consistency problems • What happens when one of the replicas gets modified? • Modifications have to be carried out at all replicas • How and when determines the cost of replication • WWW-based systems • Browser and client side caches • May lead to stale pages • TTL model, Update/Invalidate model
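The TTL model and the stale-page problem can be illustrated with a small cache that refetches an entry only after its time-to-live has expired. The class `TTLCache`, the `fetch` callback, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are assumptions for this sketch.

```python
class TTLCache:
    """Client-side cache: a copy is served until its time-to-live runs out."""

    def __init__(self, ttl_seconds: float, fetch) -> None:
        self.ttl = ttl_seconds
        self.fetch = fetch               # function that contacts the origin
        self._entries = {}               # key -> (value, time fetched)

    def get(self, key, now: float):
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]              # still fresh by TTL, possibly stale
        value = self.fetch(key)          # expired or missing: go to the origin
        self._entries[key] = (value, now)
        return value
```

If the origin changes a page before the TTL expires, the cache keeps serving the old copy until expiry. An update/invalidate model avoids that at the cost of extra messages from the origin.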
Consistency Models • Strict • Sequential • Linearizable • Causal • FIFO • Weak • Release • Entry
Fault Tolerance & Dependability • Availability • Ready to be used IMMEDIATELY • Reliability • Run continuously without FAILURE • Safety • When the system fails, nothing catastrophic happens • Maintainability • How easily a failed system can be repaired • Failures can be malicious or non-malicious
Failure Masking • Hiding failures from other processes • Fault tolerance by redundancy • Information redundancy – Error correcting codes • Temporal redundancy – Transactions • Physical redundancy – Multiple disks
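As a small example of information redundancy, the sketch below uses a triple-repetition code, one of the simplest error-correcting codes: each bit is sent three times and the receiver takes a majority vote. The function names `encode` and `decode` are chosen for this illustration only.

```python
def encode(bits):
    """Repeat every bit three times."""
    return [b for b in bits for _ in range(3)]

def decode(coded):
    """Majority vote over each group of three received bits."""
    return [
        1 if sum(coded[i:i + 3]) >= 2 else 0
        for i in range(0, len(coded), 3)
    ]
```

One flipped bit per group of three is masked from the receiver; two flips in the same group would be decoded incorrectly, which shows that redundancy tolerates only a bounded number of faults.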