Distributed System Structures

Distributed System Structures
• Introduction
• Design Goals
• Distributed Operating Systems
• Network Operating Systems
• Middleware-Based Systems
• Client-Server Model
• Peer-to-Peer Computing Model
• Communication Protocols
• Sockets
• Remote Procedure/Method Calls
GMU – CS 571

Distributed Systems
• A distributed system is a collection of loosely coupled processors interconnected by a communication network.
• Implications:
  • No shared physical memory
  • Communication/coordination through message passing
  • No global clock
  • Difficulty of keeping track of global state with accuracy
  • Independent failure considerations

A Distributed System

Example Distributed System: Internet
Figure: intranets connected through ISPs and a backbone, including a satellite link. Legend: desktop computer, server, network link.

Example Distributed System: Intranet
Figure: a local area network with desktop computers, email servers, print and other servers, a Web server and a file server, connected to the rest of the Internet through a router/firewall.
• An intranet is a portion of the Internet that is separately administered and has a boundary that can be configured to enforce local security policies.

Why build Distributed Systems?
• Resource Sharing: expensive or rarely used hardware, large databases
• Computation Speedup: divide a computation into tasks that can execute concurrently; this can include using idle cycles elsewhere (SETI@home)
• Reliability via redundancy
• Communication: file transfer, mail, RPC (Remote Procedure Call)

Design Goals in Distributed Systems
• Overcoming Heterogeneity
• Security
• Concurrency
• Transparency
• Failure Handling
• Scalability

Scalability
• A system is described as scalable if it remains effective when there is a significant increase in the number of users and the number of resources.
• The Internet illustrates a distributed system that has absorbed a drastic increase in the number of computers and services:

Date         Computers     Web servers
1979, Dec.   188           0
1989, July   130,000       0
1999, July   56,218,000    5,560,866
2003, Jan.   171,638,297   35,424,956

Scalability (Cont.)
• If more users or resources need to be supported, we are often confronted with the limitations of
  • Centralized services (e.g. a single server for all users)
  • Centralized data (e.g. a single on-line telephone book)
  • Centralized algorithms (e.g. doing routing based on complete information)
• In decentralized (distributed) algorithms
  • No machine has complete information about the system state.
  • Machines make decisions based only on local information.
  • Failure of one machine does not ruin the algorithm.
  • There is no implicit assumption that a global clock exists.

Design Challenges for Scalability
• Avoiding performance bottlenecks through
  • Caching
  • Replication
  • Distribution
  • Use of distributed algorithms

Case Study in Scalability: Domain Name System (DNS)
• The first component of network communication is naming, i.e. the way systems in the network refer to each other.
• Processes on remote systems are identified by a <host-name, identifier> pair.
• A mechanism is needed to resolve the symbolic host name into a numerical host-id that describes the destination system to the networking hardware.
• In the Internet, the Domain Name System (DNS) specifies the naming structure of the hosts, as well as name-to-address resolution.

DNS and Name Resolution
• Generally, DNS resolves addresses by examining the host name components in reverse order.
• If the host name is flits.cs.vu.nl, then first the name server for the .nl domain will be contacted.
• Name resolution may proceed in either an iterative or a recursive fashion.
  • An iterative query expects the best answer the DNS server can provide immediately, without that server contacting other DNS servers.
  • In a recursive query, the contacted server queries other DNS servers on the client's behalf and returns the final answer.
• Local caches are usually kept at each name server to improve performance.

DNS: Scaling through distribution

OS Structures in Distributed Systems
• Operating systems for distributed systems can be roughly divided into two categories
  • Distributed Operating Systems: The OS essentially tries to maintain a single, global view of the resources it manages (tightly-coupled operating system)
  • Network Operating Systems: Collection of independent operating systems augmented by network services (loosely-coupled operating system)
• Modern distributed systems are mostly designed to provide a level of transparency between these two extremes, through the use of middleware.

Distributed Operating Systems
• Full transparency: users are not aware of the multiplicity of machines.
• Access to remote services is similar to access to local resources.

Distributed Operating Systems (Cont.)
• Each node has its own kernel for managing local resources (memory, local CPU, disk, …).
• The only means of communication among nodes is through message passing.
• Above each kernel is a common layer of software that implements the OS, supporting parallel and concurrent execution of various tasks.
• This layer may even provide a complete software implementation of shared memory (distributed shared memory).
• Additional facilities may include task assignment to processors, masking of hardware failures, transparent storage, general interprocess communication, and data/computation/process migration.

Network Operating Systems
• A NOS does not try to provide a single view of the distributed system.
• Users are aware of the multiplicity of the machines.

Network Operating Systems (Cont.)
• A NOS provides facilities that allow users to make use of services on other machines
  • Remote login (telnet, rlogin)
  • File transfer (ftp)
• Users need to explicitly log in to remote machines, or copy files from one machine to another.
• Multiple passwords and multiple access permissions are needed.
• On the other hand, adding or removing a machine is relatively simple.

Middleware-Based Systems
• Achieving full and efficient transparency with distributed operating systems is a major task.
• On the other hand, a higher level of abstraction is highly desired on top of network operating systems.

Middleware-Based Systems (Cont.)
• Each local system forming part of the underlying NOS provides local resource management in addition to simple communication.
• The concept of middleware was introduced due to the integration problems of various networked applications (distributed transactions and advanced communication facilities).
• Example middleware: Remote Procedure Calls, Remote Method Invocations, Distributed File Systems, Distributed Object Systems (CORBA)

Client-Server Model
• How to organize processes in a distributed environment?
• Thinking in terms of clients that request services from servers helps in understanding and managing the complexity.

Client-Server Model (Cont.)
• Servers may in turn be clients of other servers: vertical distribution (example: web crawlers at a search engine)
• A service may also be implemented as several server processes on separate host computers, interacting as necessary to provide the service to client processes: horizontal distribution
  • The servers may partition the set of objects on which the service is based and distribute them among themselves.
  • Replication may be used to increase performance and availability and to improve fault tolerance.

Client-Server Model (Cont.)
Figure: An example of horizontal distribution of a Web service

Peer-to-peer (P2P) systems
• As an alternative to the client-server model, interacting processes may act cooperatively as peers to perform a distributed activity or computation.
• Example: a distributed ‘whiteboard’ application allowing users on several computers to view and interactively modify a picture that is shared between them
• Middleware layers will perform event notification and group communication.
• P2P networks gained popularity in the late 1990s with file-sharing services (e.g. Napster, Gnutella).

P2P Systems (Cont.)
Figure: Peers 1 to N, each running the application and holding sharable objects

Communication Structure
The design of a communication network must address four basic issues:
• Naming and name resolution: How do two processes locate each other to communicate?
• Routing strategies: How are messages sent through the network?
• Connection strategies: How do two processes send a sequence of messages?
• Contention: The network is a shared resource, so how do we resolve conflicting demands for its use?

Communication Protocols
• The systems on a network must agree on a concrete set of rules and formats before undertaking a communication session.
• The rules are formalized in what are called protocols. Examples: FTP, HTTP, SMTP, telnet, …
• The definition of a protocol contains
  • A specification of the sequence of messages that must be exchanged
  • A specification of the format of the data in the messages

OSI Protocol Model
• The International Organization for Standardization (ISO) developed a reference model that identifies the various levels involved and points out which level performs which task (Open Systems Interconnection Reference Model, or OSI model).

OSI Protocol Model (Cont.)
• Each layer provides service to the one above it through a well-defined interface.
• On the sending side, each layer adds a header to the message passed by the layer above and passes it down to the layer below.
• On the receiving side, the message is passed upward, with each layer stripping off and examining its own header.

Layers in OSI Protocol Model
• Physical Layer: Mechanical and electrical network-interface connections, implemented in hardware; defines the means of transmitting raw bits rather than logical data packets.
• Data Link Layer: Framing, error detection and recovery; node-to-node (hop-to-hop) frame delivery on the same link.
• Network Layer: Host-to-host connections and packet routing (routers work at this layer); responsible for source-to-destination packet delivery, including routing through intermediate hosts.
• Transport Layer: End-to-end connection management, message partitioning into packets, packet ordering, flow and error control.

Layers in OSI Protocol Model (Cont.)
• Session Layer: Dialog and synchronization control for application entities (remote login, ftp, …); opening, closing, and managing a session between end-user application processes. A session is a dialogue between two or more communicating devices, or between a computer and a user.
• Presentation Layer: Data representation transformations to accommodate heterogeneity, encryption/decryption; responsible for the delivery and formatting of information to the application layer. It relieves the application layer of concern regarding syntactical differences in data representation within the end-user systems.
• Application Layer: Protocols designed for the specific requirements of different applications, often defining interfaces to services.

TCP/IP Protocols
• Dominant “internetworking” protocol suite, used in the Internet.
• Fewer layers than the OSI model; combines multiple functions at each layer → high efficiency (but more difficult to implement)
• Many application services and application-level protocols exist for TCP/IP, including the Web (HTTP), email (SMTP, POP), netnews (NNTP), file transfer (FTP) and Telnet.

ISO vs. TCP/IP Protocol Stacks

IP Layer
• Performs the routing function
• Provides datagram packet delivery service
  • No set-up is required
  • Packets belonging to the same message may follow different paths
  • Packets can be lost, duplicated, delayed or delivered out of order
• The IP layer
  • puts IP datagrams into network packets suitable for transmission in the underlying networks
  • may need to break the datagram into smaller packets
• Every IP packet contains the full network address of the source and destination hosts.

IP Address Structure

TCP and UDP (Transport Layer)
• Whereas IP supports communication between pairs of computers (identified by their IP addresses), TCP and UDP, as transport protocols, provide process-to-process communication.
• Port numbers are used for addressing messages to processes within a particular computer.
• UDP is almost a transport-level replica of IP.
• A UDP datagram
  • is encapsulated inside an IP packet
  • includes a short header indicating the source and destination port numbers, a length field and a checksum

TCP and UDP (Transport Layer) (Cont.)
• UDP provides “connectionless” service
  • no need for initial connection establishment
  • no guarantee of reliable delivery is provided
• TCP is “connection-oriented”
  • TCP layer software guarantees delivery of all the data presented by the sending process, in the correct order.
  • Before any data is transmitted, the sending and receiving processes must co-operate to establish a bi-directional communication channel.

Sockets
Figure: a client socket bound to any port sends a message to a server socket on an agreed port; the server host also has other ports. Internet address = 138.37.94.248 (client), Internet address = 138.37.88.249 (server).
• A socket is an endpoint for communication made up of an IP address concatenated with a port number.
• A pair of processes communicating over a network employ a pair of sockets.
• The server waits for incoming client requests by listening on a specified port. Once a request is received, the server accepts a connection from the client socket to complete the connection.

Socket programming
Goal: learn how to build client/server applications that communicate using sockets
Socket API
• introduced in BSD4.1 UNIX
• explicitly created, used, released by apps
• client/server paradigm
• two types of transport service via socket API:
  • unreliable datagram (UDP)
  • reliable, byte stream-oriented (TCP)
socket: a host-local, application-created/owned, OS-controlled interface (a “door”) into which an application process can both send and receive messages to/from another (remote or local) application process

Port Numbers
• Servers implementing specific services listen on well-known ports (all ports below 1024 are considered well-known).
  • Telnet server: port 23
  • FTP server: port 21
  • HTTP server: port 80
• When a client process initiates a request for a connection, it is assigned a port by the host computer.
• Berkeley Sockets Interface and X/Open Transport Interface are well-known socket implementations.

Trying out http (client side) for yourself
1. Telnet to your favorite Web server:
   telnet www.eurecom.fr 80
   Opens a TCP connection to port 80 (default http server port) at www.eurecom.fr. Anything typed in is sent to port 80 at www.eurecom.fr.
2. Type in a GET http request:
   GET /~ross/index.html HTTP/1.0
   By typing this in (hit carriage return twice), you send this minimal (but complete) GET request to the http server.
3. Look at the response message sent by the http server!

TCP/IP Sockets
• Sockets may use the TCP or UDP protocol when connecting hosts in the Internet.
• TCP requires a connection establishment phase and provides guaranteed delivery.
• UDP does not require a connection set-up phase, but provides only a best-effort delivery service.
• The communication primitives are slightly different in the two cases.

The programmer's conceptual view of a TCP/IP Internet

Socket programming with TCP
Client must contact server
• server process must first be running
• server must have created socket (door) that welcomes client’s contact
Client contacts server by:
• creating client-local TCP socket
• specifying IP address, port number of server process
• when client creates socket: client TCP establishes connection to server TCP
When contacted by client, server TCP creates new socket for server process to communicate with client
• allows server to talk with multiple clients
Application viewpoint: TCP provides reliable, in-order transfer of bytes (“pipe”) between client and server

Client-Server Communication with Sockets (TCP)

Berkeley Sockets API
Socket primitives for TCP/IP.

C server (TCP)

/* A simple server in the Internet domain using TCP */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Report the failed call via perror() and terminate */
static void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(void)
{
    int sockfd, newsockfd, portno, n;
    socklen_t clilen;
    char buffer[256];
    struct sockaddr_in serv_addr, cli_addr;

    /* Create socket at port 6789 */
    sockfd = socket(AF_INET, SOCK_STREAM, 0);      /* open a socket */
    if (sockfd < 0)
        error("ERROR opening socket");
    memset(&serv_addr, 0, sizeof(serv_addr));      /* zero-fill serv_addr */
    portno = 6789;
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(portno);            /* converts to network byte order */
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR on binding");

    /* Wait, on welcoming socket, for contact by client */
    listen(sockfd, 5);
    clilen = sizeof(cli_addr);
    newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
    if (newsockfd < 0)
        error("ERROR on accept");

    /* Read/write line from/to socket */
    memset(buffer, 0, sizeof(buffer));
    n = read(newsockfd, buffer, 255);
    if (n < 0)
        error("ERROR reading from socket");
    printf("Here is the message: %s\n", buffer);
    n = write(newsockfd, "I got your message", 18);
    if (n < 0)
        error("ERROR writing to socket");

    close(newsockfd);
    close(sockfd);
    return 0;
}

Java server (TCP)

import java.io.*;
import java.net.*;

class TCPServer {
    public static void main(String argv[]) throws Exception {
        String clientSentence;
        String capitalizedSentence;

        // Create socket at port 6789
        ServerSocket welcomeSocket = new ServerSocket(6789);

        while (true) {
            // Wait, on welcoming socket, for contact by client
            Socket connectionSocket = welcomeSocket.accept();

            // Create input/output streams, attached to socket
            BufferedReader inFromClient = new BufferedReader(
                new InputStreamReader(connectionSocket.getInputStream()));
            DataOutputStream outToClient =
                new DataOutputStream(connectionSocket.getOutputStream());

            // Read line from socket; readLine() returns null if the client closed early
            clientSentence = inFromClient.readLine();
            if (clientSentence != null) {
                capitalizedSentence = clientSentence.toUpperCase() + '\n';
                // Write line to socket
                outToClient.writeBytes(capitalizedSentence);
            }
            connectionSocket.close();
        }
    }
}

C client (TCP)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Report the failed call via perror() and terminate */
static void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char *argv[])
{
    int sockfd, portno, n;
    struct sockaddr_in serv_addr;
    struct hostent *server;
    char buffer[256];

    if (argc < 3) {
        fprintf(stderr, "usage: %s hostname port\n", argv[0]);
        exit(1);
    }

    /* Create client socket, connect to server */
    portno = atoi(argv[2]);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    server = gethostbyname(argv[1]);
    if (server == NULL) {
        fprintf(stderr, "ERROR, no such host\n");
        exit(1);
    }
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    /* copy the first resolved address (h_length bytes) into serv_addr */
    memcpy(&serv_addr.sin_addr.s_addr, server->h_addr_list[0], server->h_length);
    serv_addr.sin_port = htons(portno);
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR connecting");

    /* Send line to server */
    printf("Please enter the message: ");
    memset(buffer, 0, sizeof(buffer));
    fgets(buffer, 255, stdin);
    n = write(sockfd, buffer, strlen(buffer));
    if (n < 0)
        error("ERROR writing to socket");

    /* Read line from server */
    memset(buffer, 0, sizeof(buffer));
    n = read(sockfd, buffer, 255);
    if (n < 0)
        error("ERROR reading from socket");
    printf("%s\n", buffer);

    close(sockfd);
    return 0;
}
