Distributed Systems

Distributed Systems Docent: Vincent Naessens

Chapter 1: Introduction

Chapter 1: Introduction • 1.1. Definition of distributed system • 1.2. Goals • 1.3. Types of distributed systems

1.1. Definition • A distributed system is a collection of independent computers that appears to its users as a single coherent system

1.2. Goals • When building a distributed system? • Making resources accessible • Distribution transparency • Openness • Scalability

1.2.1. Making resources accessible • Examples are printers, computers, files, ... • Why sharing resources? • Economical reasons • Enabling collaboration (by means of groupware) • Collaborative editing • Teleconferencing systems • ... • New problems with increased connectivity • Security and privacy problems • Unwanted communication (junk mail...)

1.2.2. Distribution transparency

1.2.2. Distribution transparency • Limits to degree of transparency offered to users: • Requesting digital newspaper at 7 A.M. at the other side of the world • Attempting to mask a transient server failure without performance loss • Updating replica’s without performance loss • Distribution exposure is essential in context aware services • Printing jobs on a nearby printer instead of a free printer comprehensibility performance transparency

1.2.3. Openness • Offering services according to standard rules • Standaard rules for syntax and semantics • Service specification (i.e. syntax) through interfaces • IDL = Interface Definition Language • IDL captures the syntax of services • Semantics in natural language • Interoperability • Extent by which two implementations of the same interface can work together • Portability • Extent to which SW components can work on multiple implementations of the same IDL • Extensibility • Extent to which new components can be added to a system

1.2.3. Openness • What interfaces should be supported: • Interfaces seen to users and applications • Internal interfaces • Allows to build new components and plug them in • Example: new caching strategy/component

1.2.4. Scalability • 3 dimensions of scalability • SIZE: adding resources and users • LOCATION: distance between resources/users • ADMINISTRATION: manageable in complex settings • Scalability often leads to performance loss • reason: centralisation does not work anymore

1.2.4. Scalability • Characteristics of decentralized algorithms: • No machine has complete info about the system state. • Machines make decisions based only on local info. • Failure of one machine does not ruin the algorithm. • No global clock exists. • Geographic scalability problems (LAN  WAN) • Communication delays • Broadcasting technique can no longer be used • Administrative scalability problems • Conflicting policies related to payment, security...

1.2.4. Scalability • Scalability problems caused by: • Limited capacities of servers • Limited capacities of networks • Three techniques for scalability • Replication • Hiding communication latencies • distribution

1.2.4. Scalability • Hiding communication latencies • Avoid waiting during remote service requests • TECH_1: Synchronous comm.  Asynchronous comm. • Interrupt handling routines • TECH_2: Creating a seperate thread • Moving compution to the clients • done by Javascript and Java applets • Example: see next slide

1.2.4. Scalability • Hiding communication latencies

1.2.4. Scalability • Distribution • Splitting a component in smaller parts • Spreading the parts across the system • Example: Internet Domain Name System (DNS)

1.2.4. Scalability • Replication • GOAL_1: increasing availability • GOAL_2: increasing performance • Caching is special form of replication • Copy in proximity of the client • May lead to consistency problems • SOLUTION: Global synchronisation strategies

1.2.5. Pitfalls • False assumptions made by first time developer: • The network is reliable. • The network is secure. • The network is homogeneous. • The topology does not change. • Latency is zero. • Bandwidth is infinite. • Transport cost is zero. • There is one administrator.

1.3. Types of distributed systems • Distributed Computing Systems Cluster Computing systems - homogeneous

1.3. Types of distributed systems • Distributed Computing Systems • Different operating systems • Different admin. domains • Different networks • ... Grid Computing systems - heterogeneous MIDDLEWARE Creation of virtual organisations to fulfill tasks tasks (leads to service oriented architectures)

1.3. Types of distributed systems • Distributed Information Systems • Distributed transactions • client retrieves info from multiple servers in 1 request • Enterprise Application Integration (EAI) • Applications really communicate/collaborate

1.3. Types of distributed systems • Distributed Information Systems Transaction Processing Systems

1.3. Types of distributed systems • Distributed Information Systems • ACID properties of transactions: • Atomic: the transaction happens indivisibly • Consistent: the transaction holds system invariants • Isolated: Concurrent transactions do not interfere • Durable: Once a transaction commits, the changes are permanent. Transaction Processing Systems

1.3. Types of distributed systems • Distributed Information Systems Transaction Processing Systems

1.3. Types of distributed systems • Distributed Information Systems • Support for nested transactions: TP Monitor Transaction Processing Systems

1.3. Types of distributed systems • Distributed Information Systems • TP monitors are applications • Need to enable communication between applications • “interapplication communication” Enterprise Application Integration

1.3. Types of distributed systems • Distributed Information Systems • Examples: • RPC: doing local procedure call to remote procedure • RMI: doing local method call to remote object • Message-oriented middleware (MOM) • Solves reference problems • Solves “up-and-running” problems • PUBLISH/SUBSCRIBE systems Enterprise Application Integration

1.3. Types of distributed systems • Distributed Pervasive Systems • Mobile and embedded devices lead to instability • Requirements for pervasive systems • Embrace contextual changes • EX: Absence/presence of network connection • Encourage ad hoc composition • EX: used by different individuals • Recognize sharing as the default.

1.3. Types of distributed systems • Distributed Pervasive Systems • Where and how should monitored data be stored? • How can we prevent loss of crucial data? • What infrastructure is needed to generate and propagate alerts? • How can physicians provide online feedback? • How can extreme robustness of the monitoring system be realized? • What are the security issues and how can the proper policies be enforced? Example 1: ELECTRONIC HEALTH SYSTEMS

1.3. Types of distributed systems • Distributed Pervasive Systems Example 1: ELECTRONIC HEALTH SYSTEMS

1.3. Types of distributed systems • Distributed Pervasive Systems • How do we (dynamically) set up an efficient tree in a sensor network? • How does aggregation of results take place? • Can it be controlled? • What happens when network links fail? Example 2: SENSOR NETWORKS

1.3. Types of distributed systems • Distributed Pervasive Systems Example 2: SENSOR NETWORKS

Chapter 2: Architectures

Chapter 2: Architectures • 2.1. Architectural styles • 2.2. System architectures • 2.3. Archictectures versus middleware

2.1. Architectural styles • An architectural style consists of • Software components: modular – can be replaced • Connectors: mediate comm./coord./coop. • Four major architectural styles: • Layered architectures • Object-based architectures • Data-centered architectures • Communication between processes through a common repository (such as a distributed file system) • Event-based architectures (publish/subscribe) • Shared data spaces (combination of event-based and data-centered)

2.1. Architectural styles EX: client/server model EX: networking community

2.1. Architectural styles Both components do not to be active simultaneously  Decoupling in time Propagation of events (only processes that subscribe to events will handle them)  Referential decoupling

2.2. System architectures • Types of system architectures • Centralized architectures • Decentralized architectures • Hybrid architectures

2.2.1. Centralized architectures • Simple client server system • LAN: connectionless • In case of reliable network • If requests are idempotent • WAN: connections  TCP/IP

2.2.1. Centralized architectures • Application Layering The simplified organization of an Internet search engine into three different layers.

2.2.1. Centralized architectures • Multitiered Architectures • Two tier architectures • A client machine containing only the programs implementing (part of) the user-interface level • A server machine containing the rest(the programs implementing the processing and data level) • Three tier architectures

2.2.1. Centralized architectures • Two tier Architectures Correctness checks in forms at client side Banking application Thin clients/terminals Remote data repres. Local caching Fat clients

2.2.1. Centralized architectures • Three tier Architectures

2.2.2. Decentr. architectures • Vertical distribution (previous slides) • 1 machine per layer (or couple of layers) • Centralized architectures • Horizontal distribution (next slides) • distributing clients and servers • Peer-to-peer systems • Peers act as client and server  servents • Decentralized architectures • Concept over overlay network • Refers to communication links

2.2.2. Decentr. architectures • Structured peer-to-peer systems • Basics: distributed hash table • Large identifier space (128 bit or 160 bit) • Each node gets an identifier • Each data item gets an identifier • LOOKUP(data_item) returns network_address_of_node • Nodes keep list of id’s of other nodes • Example 1: Chord system • LOOKUP(k) returns network_address(succ(k)) • Easy membership management • Example 2: Content Addressable Network (CAN) • d-dimensional cartesian coordinate space • Easy node join – difficult node exit

2.2.2. Decentr. architectures • Chord System

2.2.2. Decentr. architectures • Content Addressable Network (CAN)

2.2.2. Decentr. architectures • Unstructured peer-to-peer systems • Each node keeps a list of c neighbours • Partial view on the network • Each node will update partial view using 2 threads: • Active thread • Passive thread • Exchange c/2+1 entries • [address, age] • store newly receive entries • Based on age and randomness • Data randomly stored in nodes • Flooding algoritm required when requesting an item

2.2.2. Decentr. architectures • Unstructured peer-to-peer systems

Distributed Systems