
Distributed Systems

Lecture 1: Introduction to distributed systems


Presentation Transcript


  1. Distributed Systems – Lecture 1: Introduction to distributed systems

  2. Distributed systems
  • “A collection of (probably heterogeneous) automata whose distribution is transparent to the user so that the system appears as one local machine. This is in contrast to a network, where the user is aware that there are several machines, and their location, storage replication, load balancing and functionality is not transparent. Distributed systems usually use some kind of client-server organization.” – FOLDOC
  • “A Distributed System comprises several single components on different computers, which normally do not operate using shared memory and as a consequence communicate via the exchange of messages. The various components involved cooperate to achieve a common objective such as the performing of a business process.” – Schill & Springer
  • Main characteristics
    • Components: multiple spatially separated individual components, each possessing its own memory, cooperating towards a common objective
    • Resources: access to common resources (e.g., databases, file systems)
    • Communication: communication via messages
    • Infrastructure: heterogeneous hardware infrastructure & software middleware

  3. Distributed systems • These definitions say nothing about the internals of a distributed system • Design and implementation • Maintenance • Algorithmics (i.e., protocols) [Figures: Facebook’s social network graph among humans; the Internet color-coded by ISP.]

  4. A working definition • “A distributed system (DS) is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium.” • Terms • Entity = process on a device (PC, server, tablet, smartphone) • Communication medium = wired or wireless network • Course objective: • Design and implementation of distributed systems Source: https://courses.engr.illinois.edu/cs425/fa2013/lectures.html
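To make the working definition concrete, here is a minimal sketch (Python, not from the slides): two asynchronous entities exchange messages over a channel that may silently drop them. The drop probability, message format, and timeouts are illustrative assumptions.

```python
# Sketch of the working definition: two asynchronous, failure-prone entities
# (threads standing in for processes on different machines) that communicate
# only through an unreliable channel. The drop rate below is made up.
import queue
import random
import threading
import time

DROP_PROBABILITY = 0.3          # assumed loss rate of the "network"
channel = queue.Queue()         # stands in for the communication medium


def unreliable_send(msg):
    """Deliver msg unless the 'network' silently drops it."""
    if random.random() > DROP_PROBABILITY:
        channel.put(msg)


def sender():
    for seq in range(5):
        unreliable_send(("ping", seq))
        time.sleep(0.1)         # entities progress at their own pace (asynchrony)


def receiver():
    deadline = time.time() + 2.0
    while time.time() < deadline:
        try:
            msg = channel.get(timeout=0.5)
            print("received", msg)
        except queue.Empty:
            print("no message: sender slow, crashed, or message lost?")


threading.Thread(target=sender).start()
receiver()
```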

  5. The datacenter • The datacenter lies at the foundation of many DSs • Amazon Web Services, Google Cloud, Microsoft Azure • However, DSs can be composed of PCs too • P2P file sharing systems (e.g., Gnutella) [Figure: Facebook’s Forest City datacenter.]

  6. Example – Gnutella P2P • What are the entities and communication medium?

  7. Example – web domains • What are the entities and communication medium?

  8. The Internet • Used by many distributed systems • Vast collection of heterogeneous computer networks • ISPs – companies that provide services for accessing and using the Internet • Intranets – subnetworks operated by companies and organizations • Offer services unavailable to the public from the Internet • Can be the ISP’s core routers • Linked by backbones • High bandwidth network links

  9. Example – Intranet • What are the entities and communication medium?

  10. Parallel vs. distributed computing
  • Parallelism
    • Performing multiple tasks at the same time
    • True parallelism requires distribution over multiple processors/cores/machines
    • Can range from many-core to multiprocessor to many-computer, on shared or distributed memory
  • Concurrency
    • Computations with multiple threads
    • Can exploit hardware parallelism, but is inherently driven by software needs (i.e., reacting to different asynchronous events)
    • Concurrency becomes parallelism when the parallelism is true (one thread per processor/core/machine), not virtual (see the sketch below)
  • Distributed computing
    • Concerns where the computation physically resides
    • A distributed algorithm is executed on multiple CPUs connected by networks, buses, or any other data communication channel
    • Computers are connected by communication links and have distributed memories
    • Relies fundamentally on message passing
    • Distribution is usually part of the goal: if resources are geographically spread, then the system is inherently distributed
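As a rough illustration of the distinction above, this sketch contrasts concurrency (threads interleaving within one process) with true parallelism (separate worker processes); `cpu_task` and the job sizes are made-up examples.

```python
# Concurrency vs. parallelism (illustrative only).
# Threads give concurrency: interleaved progress, good for reacting to events
# and waiting on I/O. Separate processes give true parallelism on multiple cores.
import threading
import multiprocessing


def cpu_task(n):
    """A small CPU-bound job."""
    return sum(i * i for i in range(n))


def concurrent_version(jobs):
    # Threads share one address space; under CPython's GIL they interleave
    # rather than run CPU-bound work truly in parallel.
    threads = [threading.Thread(target=cpu_task, args=(n,)) for n in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def parallel_version(jobs):
    # One worker process per core: true parallelism on shared-nothing workers.
    with multiprocessing.Pool() as pool:
        return pool.map(cpu_task, jobs)


if __name__ == "__main__":
    jobs = [200_000] * 4
    concurrent_version(jobs)
    print(parallel_version(jobs)[:1])
```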

  11. Parallel vs. distributed computing • Is distributed computing a subset of parallel computing? • Not an easy answer • In favor • Distributed computing is parallel computing on geographically spread machines • distributed ⊂ parallel ⊂ concurrent computing • Against • They address different issues • Distributed computing is focused on issues related to computation and data distribution • Parallel computing does not address problems such as partial failures • Parallel computing focuses on tightly coupled applications

  12. Parallel vs. distributed systems Source: courses.washington.edu/css434/slides/w03w04/Fundamentals.ppt

  13. Reasons for DS
  • Inherently distributed applications: distributed DBs, worldwide airline reservation, banking systems
  • Information sharing among distributed users: CSCW or groupware
  • Resource sharing: sharing DBs/expensive hardware and controlling remote lab devices
  • Better cost-performance ratio / performance: emergence of Gbit networks and high-speed, cheap MPUs; effective for coarse-grained or embarrassingly parallel applications such as MapReduce (see the sketch below)
  • Reliability: non-stop operation (availability) and voting features
  • Scalability: loosely coupled connections and hot plug-in
  • Flexibility: reconfigure the system to meet users’ requirements
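Since the slide cites MapReduce as a coarse-grained, embarrassingly parallel workload, here is a toy map/reduce-style word count sketched with Python’s multiprocessing; it only illustrates the map-then-merge idea and is not Hadoop’s actual API.

```python
# Toy map/reduce-style word count: the "map" work is embarrassingly parallel,
# so it is farmed out to worker processes; the "reduce" step merges the
# partial counts into one result.
from collections import Counter
from multiprocessing import Pool


def map_phase(line):
    """Map: one input line -> partial word counts."""
    return Counter(line.split())


def reduce_phase(partial_counts):
    """Reduce: merge all partial counts into the final result."""
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return total


if __name__ == "__main__":
    lines = ["a distributed system is a collection of entities",
             "a collection of entities that exchange messages"]
    with Pool() as pool:
        print(reduce_phase(pool.map(map_phase, lines)))
```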

  14. DS layered architecture
  • Distributed-system protocols sit on top of networking protocols and are implemented via network “sockets” – the basic primitive that allows machines to send messages to each other (see the sketch below). TCP = Transmission Control Protocol, UDP = User Datagram Protocol.

  Application               Application layer protocol          Underlying transport protocol
  e-mail                    smtp [RFC 821]                      TCP
  remote terminal access    telnet [RFC 854]                    TCP
  Web                       http [RFC 2068]                     TCP
  file transfer             ftp [RFC 959]                       TCP
  streaming multimedia      proprietary (e.g., RealNetworks)    TCP or UDP
  remote file server        NFS                                 TCP or UDP
  Internet telephony        proprietary (e.g., Skype)           typically UDP

  Source: https://courses.engr.illinois.edu/cs425/fa2013/lectures.html
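The socket primitive mentioned in the slide can be illustrated with a minimal UDP exchange; the port number is an arbitrary placeholder and both endpoints run on one machine for simplicity.

```python
# Minimal use of the socket primitive: one UDP datagram from a sender to a
# receiver on the same machine. The port number is an arbitrary example.
import socket

PORT = 9999

# Receiver side: bind a UDP socket and wait for one datagram.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", PORT))
receiver.settimeout(2.0)          # UDP gives no delivery guarantee

# Sender side: fire a datagram at the receiver's address.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over the network", ("127.0.0.1", PORT))

try:
    data, addr = receiver.recvfrom(1024)
    print("got", data, "from", addr)
except socket.timeout:
    print("datagram lost (UDP makes no promises)")
finally:
    sender.close()
    receiver.close()
```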

  15. Main issues of DS
  • No global clock: no single global notion of the correct time (asynchrony)
  • Unpredictable failures of components: lack of response may be due to the failure of a network component, a network path being down, or a computer crash (failure-prone, unreliable) – see the probe sketch below
  • Highly variable bandwidth: from 16 Kbps (slow modems or Google Balloon) to Gbps (Internet2) to Tbps (between DCs of the same big company)
  • Large and variable latency: a few ms to several seconds
  • Large numbers of hosts: up to several million
  • Security and privacy: due to geographical and political spread
  • Interoperability: due to various standards and protocols
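A small sketch of why “no response” is ambiguous: a probe with a timeout cannot distinguish a crashed process from a slow network or an overloaded machine. The host, port, and timeout value are placeholders.

```python
# A timed-out probe is fundamentally ambiguous in an asynchronous,
# failure-prone system: "no answer" has many possible causes.
import socket


def probe(host, port, timeout=1.0):
    """Return True if host:port answered within the timeout, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Could be: process crashed, machine down, network partition,
        # or simply a reply that took longer than `timeout` seconds.
        return False


if __name__ == "__main__":
    print("alive?", probe("example.org", 80))
```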

  16. DS design goals
  • Heterogeneity – can the system handle a large variety of types of hardware and software (interoperability)?
  • Robustness – is the system resilient to hardware and software crashes and failures, and to the network dropping messages?
  • Availability – are data & services always available to clients?
  • Transparency – can the system hide its internal workings from users?
  • Concurrency – can the server handle multiple clients simultaneously? (see the sketch below)
  • Efficiency – is the service fast enough? Does it utilize 100% of all resources?
  • Scalability – can it handle 100 million nodes without degrading service? (nodes = clients and/or servers) How about 6 billion? More?
  • Security – can the system withstand hacker attacks?
  • Privacy – is the user data safely stored?
  • Openness – is the system extensible?
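For the concurrency goal, here is a minimal sketch of a server that handles many clients at once, using one thread per connection via Python’s standard socketserver module; the port and echo behaviour are arbitrary choices.

```python
# Sketch of the "concurrency" design goal: a server that handles many clients
# simultaneously by giving each connection its own thread.
import socketserver


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Each connected client is served by a separate thread.
        data = self.request.recv(1024)
        self.request.sendall(data)


if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("127.0.0.1", 8080), EchoHandler) as srv:
        srv.serve_forever()       # serves clients concurrently until killed
```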

  17. History of distributed computing

  18. DS system models • Minicomputer model • Workstation model • Workstation-server model • Processor-pool model • Cluster model • Grid computing

  19. Minicomputer Model • Extension of the time-sharing system • A user must log on to his/her home minicomputer • Thereafter, he/she can log on to a remote machine by telnet • Resource sharing • Databases • High-performance devices [Diagram: minicomputers connected over the ARPAnet]

  20. Workstation Model • Process migration • Users first log on to their personal workstation • If there are idle remote workstations, a heavy job may migrate to one of them • Problems: • How to find an idle workstation • How to migrate a job • What if a user logs on to the remote machine [Diagram: workstations connected by a 100 Gbps LAN]

  21. Workstation-Server Model • Client workstations • Diskless • Graphic/interactive applications processed locally • All file, print, http and even cycle-computation requests are sent to servers • Server minicomputers • Each minicomputer is dedicated to one or more types of services • Client-server model of communication • RPC (Remote Procedure Call) • RMI (Remote Method Invocation) • A client process calls a server process’s function (see the sketch below) • No process migration invoked • Example: NFS [Diagram: workstations on a 100 Gbps LAN, with minicomputers acting as file, http, and cycle servers]
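A minimal sketch of the RPC idea from this slide, using Python’s standard xmlrpc module (not RMI or any specific course framework); the port and the add() function are arbitrary examples.

```python
# Minimal RPC in the client-server style: the client calls what looks like a
# local function, and the call is shipped to the server for execution.

# --- server side ---
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 8000), allow_none=True)
server.register_function(add, "add")   # expose add() to remote callers
server.serve_forever()

# --- client side (run in a separate process) ---
# import xmlrpc.client
# proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
# print(proxy.add(2, 3))   # executed on the server, result sent back
```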

  22. Processor-Pool Model • Clients • They log in at one of the terminals (diskless workstations or X terminals) • All services are dispatched to servers • Servers • The necessary number of processors is allocated to each user from the pool • Better utilization but less interactivity [Diagram: terminals and a pool of servers 1..N on a 100 Gbps LAN]

  23. Cluster Model • Client • Follows a client-server model • Server • Consists of many PCs/workstations connected to a high-speed network • Puts more focus on performance: • Serves requests in parallel [Diagram: workstations on a 100 Gbps LAN in front of http servers 1..N; a master node and slaves 1..N connected by a 1 Gbps SAN]

  24. Grid Computing • Goal • Collect the computing power of supercomputers and clusters sparsely located over the nation and make it available as if it were the electric grid • Distributed supercomputing • Very large problems needing lots of CPU, memory, etc. • High-throughput computing • Harnessing many idle resources • On-demand computing • Remote resources integrated with local computation • Data-intensive computing • Using distributed data • Collaborative computing • Support communication among multiple parties [Diagram: workstations, minicomputers, clusters, and supercomputers linked by a high-speed information highway]

  25. Cloud Computing
  • Goal: on-demand virtualized access to hardware infrastructure • “pay per use” model for public clouds • “as a service” paradigm
  • Several models
    • Infrastructure as a Service – clients manage virtualized resources (Amazon EC2, Google Cloud)
    • Platform as a Service – clients have access to various platform services to develop, run, and manage applications without dealing with the infrastructure (Microsoft Azure)
    • Software as a Service – clients have access only to specific software tools (GMail, Dropbox)
    • Data as a Service – clients can access remotely stored data (Amazon Public Data Sets: sciences, economics)
    • …
  [Diagram: workstations accessing specific services, VMs, and a database over the Internet]

  26. What will you learn?
  • Real distributed systems
    • Cloud computing – lectures 2 and 3, all labs (Google Cloud)
    • Hadoop – MapReduce (lecture 2)
    • Key-value stores – lab 5
    • Apache Storm – lecture 14, lab 7
    • P2P systems – lecture 10
  • Classical problems
    • Failure detection (lecture 4)
    • Time and synchronization (lecture 5)
    • Global states (lecture 6)
    • Multicast (lecture 7)
    • Leader election (lecture 9)
    • Networking and routing (lecture 11)
    • Gossiping (lecture 13)
  • Concurrency
    • RPC and Web Services (lecture 8)
    • Replication control (lecture 12)

  27. Grading • Scientific paper analysis (20%) • Students will have to pick a paper from a top conference (published in the last 3 years) and present it • Lab assignments (80%) • Assignments given during lab hours • Documentation • Lecture slides, references inside the slides, Google Cloud, ScienceDirect, IEEE Xplore, ResearchGate.

  28. Next lecture • Introduction to cloud computing
