
Introduction


Presentation Transcript


  1. Introduction

  2. Readings • Lessons from Giant-Scale Services, Eric Brewer, IEEE Internet Computing '01.

  3. Data Centers • Companies like Amazon, Google, and eBay run data centers with tens of thousands of machines • The rate of growth is staggering

  4. Data Centers • A data center has a physical structure (racks of machines) and a logical structure (the one we just saw) • We must launch the needed applications on the machines • Monitor them and relaunch them if they crash • This poses optimization challenges • We will probably have multiple data centers • We must control the external DNS and tell it how to route requests • The answer could differ for different clients

  5. Data Centers • A data center has a physical structure characterized by: • Workstation-class nodes • Nodes connected by a dedicated, low-latency network • There may be multiple networks

  6. Anatomy of a Cluster-Based Service • Figure: clients reach the service over an IP network; a load manager (DNS round robin, layer-4/7 switches) directs requests to the servers (web, business logic, data store), which are connected by an internal network and a backplane

  7. Basic Model: Components • Clients (e.g., browsers) initiate queries to services • IP network: could be the Internet or a private network; provides access to the service • Load manager: provides a level of indirection between the service’s external name and the servers’ physical names • Servers: combine CPU, memory, and disks into an easy-to-replicate unit • Data store: a replicated or partitioned “database” spread across the servers’ disks or network-attached storage (e.g., an external DBMS or RAID) • Backplane: handles interserver traffic such as redirecting client queries (optional)

  8. Load Management • Round-robin DNS • Distributes different IP addresses for a single domain name among clients in a rotating fashion • Good balancing • Does not hide inactive servers, e.g., • A client holding a down node’s address will keep trying to use it until the DNS mapping expires • Could take several hours for this to happen (see the sketch below) • Vendors now sell “layer-4” switches to solve the problems with DNS
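A minimal sketch of round-robin DNS (in Python, with made-up addresses and TTL): the name server hands out the next address in rotation on each query, and nothing removes a dead server's address until clients' cached mappings expire.

```python
# Sketch of round-robin DNS for one domain name; addresses and TTL are
# illustrative. Note there is no health checking: a dead server's address
# keeps being handed out, and clients cache it for the full TTL.
from itertools import cycle

class RoundRobinDNS:
    def __init__(self, addresses, ttl_seconds=3600):
        self._rotation = cycle(addresses)
        self.ttl = ttl_seconds          # how long clients may cache the answer

    def resolve(self, name):
        return next(self._rotation), self.ttl

dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(dns.resolve("www.example.com"))   # ('10.0.0.1', 3600)
print(dns.resolve("www.example.com"))   # ('10.0.0.2', 3600)
```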

  9. Load Management • Layer-4 balancer: understands TCP connections • Differentiates based on port numbers • A web request is routed differently than an SMTP request • Layer-7 balancer: understands application-level information • Can parse URLs • Can look at the parameters of a GET • The balancers (switches) typically come in pairs so that failover is supported • Used to address the single point of failure
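A rough sketch of the difference, assuming made-up pool names and routing rules: a layer-4 switch sees only transport-level fields such as the destination port, while a layer-7 switch parses the request itself (e.g., the URL and parameters of a GET).

```python
# Hypothetical layer-4 vs layer-7 routing decisions (pool names invented).
from urllib.parse import urlparse, parse_qs

def layer4_route(dst_port: int) -> str:
    # Layer 4: only TCP/UDP header fields are visible (addresses, ports).
    return {80: "web-pool", 25: "smtp-pool"}.get(dst_port, "default-pool")

def layer7_route(url: str) -> str:
    # Layer 7: the switch can parse the URL and query parameters.
    parsed = urlparse(url)
    if parsed.path.startswith("/images/"):
        return "image-servers"
    if "session" in parse_qs(parsed.query):
        return "session-servers"
    return "web-pool"

print(layer4_route(25))                      # smtp-pool
print(layer7_route("/images/logo.png"))      # image-servers
print(layer7_route("/cart?session=42"))      # session-servers
```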

  10. High Availability • For high availability we should aim for something similar to other infrastructures, e.g., telephone, rail, or water systems • Two possible approaches: • Partitioning • Replication

  11. Partitioning • Partition the data so that groups of servers handle just a part of the inventory (or any other data) • The router needs to be able to extract keys from requests • Need for “deep packet inspection” • Hashing is one strategy for doing this: based on the key you then determine the server to handle the request (see the sketch below) • Example: Amazon
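A sketch of the simplest form of key-based routing, with invented server-group names: hash the extracted key and take it modulo the number of groups.

```python
# Sketch of hash-based partitioning: the router extracts a key from the
# request and hashes it to pick a server group (names are illustrative).
import hashlib

SERVER_GROUPS = ["inventory-1", "inventory-2", "inventory-3", "inventory-4"]

def route(item_key: str) -> str:
    digest = int(hashlib.md5(item_key.encode()).hexdigest(), 16)
    return SERVER_GROUPS[digest % len(SERVER_GROUPS)]

print(route("ISBN-0131103628"))   # the same key always maps to the same group
```

One drawback worth noting: with plain modulo hashing, adding or removing a group remaps almost every key, which is what consistent hashing (two slides ahead) avoids.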

  12. Amazon’s Architecture

  13. Partition – Consistent Hashing • Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring” • Each node on the ring handles a part of the output range • One can think of each node on the ring as a virtual node; multiple virtual nodes can be assigned to a physical node • “Routers” take a request and apply the hash function to determine where to send it (see the sketch below)
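A compact sketch of such a ring, assuming an MD5-based hash and 100 virtual nodes per physical node (arbitrary choices, not Amazon's actual parameters): a key is hashed onto the ring and assigned to the first virtual node found moving clockwise.

```python
# Sketch of consistent hashing with virtual nodes (parameters illustrative).
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes_per_node=100):
        # Each physical node owns many points ("virtual nodes") on the ring.
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes_per_node))
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Walk clockwise from the key's position to the next virtual node.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("customer:42"))
```

Because each physical node appears at many points on the ring, adding or removing a node only moves the keys adjacent to its virtual nodes instead of reshuffling everything.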

  14. Partition vs Replication • The advantage of partitioning is that the load of handling requests is spread out • This suggests that requests can be processed faster • Partitioning does not provide high availability, though • Replication is needed for that • Systems like Amazon use both replication and partitioning

  15. Replication • Each data item is replicated at N hosts • Amazon maintains a preference list: the list of nodes responsible for storing a particular key • Each key is assigned a coordinator node, which coordinates updates to the object (see the sketch below)
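One way to picture the preference list, sketched under the same consistent-hashing assumptions as before (node names and replica count are illustrative, not Dynamo's code): walk the ring clockwise from the key's position and collect the first N distinct physical nodes; the first of them acts as the coordinator.

```python
# Sketch: build a preference list of N replicas by walking the hash ring
# clockwise and skipping repeated physical nodes (all names illustrative).
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

NODES = ["node-a", "node-b", "node-c", "node-d"]
RING = sorted((_hash(f"{n}#{i}"), n) for n in NODES for i in range(100))
POINTS = [point for point, _ in RING]

def preference_list(key: str, n_replicas: int = 3):
    prefs, i = [], bisect.bisect(POINTS, _hash(key))
    while len(prefs) < n_replicas:
        node = RING[i % len(RING)][1]
        if node not in prefs:            # skip extra virtual nodes of a host
            prefs.append(node)
        i += 1
    return prefs                          # prefs[0] is the coordinator

print(preference_list("cart:alice"))      # e.g. ['node-c', 'node-a', 'node-d']
```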

  16. Online Evolution • Internet time implies constant change • Need acceptable quality: meet target MTBF, low MTTR, no cascading failures • Three approaches to managing upgrades: • Fast reboot: cluster at a time; minimize yield impact • Rolling upgrade: node at a time; versions must be compatible • Big flip: half the cluster at a time; reserved for complex changes • Either way: use a staging area, be prepared to revert
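The three strategies can be contrasted with a small sketch; the drain/upgrade/restore helpers here are hypothetical placeholders for "stop sending requests", "install the new version", and "put back into rotation".

```python
# Sketch of the three upgrade strategies (helper functions are placeholders).

def fast_reboot(cluster, upgrade):
    # Whole cluster at once: simplest, but yield briefly drops to zero.
    for node in cluster:
        upgrade(node)

def rolling_upgrade(cluster, upgrade, drain, restore):
    # One node at a time: old and new versions coexist and must interoperate.
    for node in cluster:
        drain(node)
        upgrade(node)
        restore(node)

def big_flip(cluster, upgrade, drain, restore):
    # Half the cluster at a time: capacity halves, but the live half never
    # mixes versions; reserved for complex, incompatible changes.
    half = len(cluster) // 2
    for group in (cluster[:half], cluster[half:]):
        for node in group:
            drain(node)
            upgrade(node)
        for node in group:
            restore(node)
```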

  17. Challenges • The preference list (replica group) has membership that changes dynamically • One cause of changing membership is failure • A failed node may return to the group • How do you make sure that routing works (you don’t want to send requests to something that is down)? • How do you make sure that each replica has the same version of an object?

  18. Challenges • Changes occur frequently! • In the OSDI paper on MapReduce (Google), the authors comment that during one experiment involving 2,000 nodes, sets of 80 machines kept dropping out • What monitoring mechanisms are needed to ensure that node changes are reported in a timely fashion? • What should the availability be? How fast should it be? All of this has implications for resource usage
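One common monitoring mechanism is heartbeat-based failure detection; a minimal sketch (timeout value and node names are illustrative) shows the basic trade-off: a short timeout reports changes quickly but raises false alarms and costs more monitoring traffic.

```python
# Sketch of a heartbeat-based failure detector (timeout value illustrative).
import time

class FailureDetector:
    def __init__(self, timeout_seconds=5.0):
        self.timeout = timeout_seconds
        self.last_heartbeat = {}        # node -> time we last heard from it

    def heartbeat(self, node: str) -> None:
        self.last_heartbeat[node] = time.monotonic()

    def suspects(self):
        # Nodes that have been silent longer than the timeout are suspected.
        now = time.monotonic()
        return [n for n, t in self.last_heartbeat.items()
                if now - t > self.timeout]

fd = FailureDetector(timeout_seconds=5.0)
fd.heartbeat("node-a")
fd.heartbeat("node-b")
# ... some time later, any node that has stopped pinging shows up here:
print(fd.suspects())
```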

  19. Challenges • Essentially there are lots of issues to be dealt with, including: • Data persistence, load balancing, membership, failure detection, failure recovery, replica synchronization, overload handling, request marshalling, request routing, system monitoring and alarming, etc.
