System Performance & Scalability

System Performance & Scalability i206 Fall 2010 John Chuang

http://bits.blogs.nytimes.com/2007/11/26/yahoos-cybermonday-meltdown/index.html

Computing Trends
• Multi-core CPUs
• Data centers
• Cloud computing
• What are the drivers?
  • Scalability, availability, cost-effectiveness

Lecture Outline
• Performance Metrics
• Availability
• Queuing theory
• M/M/1 queue
• Scalability
• M/M/m queue

What is Performance?
• Users want fast response time and high availability
• Managers want happy users, and many of them, while minimizing cost
• What are standard measures of system performance?

Performance Metrics
• Response time (seconds)
• Throughput (MIPS, Mbps, TPS, ...)
• Resource utilization (%)
• Availability (%)

Availability
Availability = MTTF / (MTTF + MTTR)
• Mean-time-to-failure (MTTF)
• Mean-time-to-recover (MTTR)
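A minimal sketch of the availability formula above. The function names and the MTTF/MTTR figures are hypothetical, chosen only to illustrate the arithmetic; they do not come from the lecture.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected downtime in a 365-day year for a given availability."""
    return (1.0 - avail) * 365 * 24


# Hypothetical system: fails on average every 999 hours, takes 1 hour to recover.
a = availability(mttf_hours=999.0, mttr_hours=1.0)
print(f"availability      = {a:.3%}")
print(f"expected downtime = {downtime_hours_per_year(a):.2f} hours/year")
```

Halving MTTR improves availability about as much as doubling MTTF, which is why recovery time is often the cheaper lever.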

Response Time
[Figure: timeline of a request between client, network, and server, with stages in order: formulate request, message latency, queuing time, processing time, message latency, interpret response]
Adapted from: David Messerschmitt

Queuing Theory
[Figure: queuing system with six labeled components]
1. Arrival Process
2. Service Time Distribution
3. Number of Servers
4. System Capacity
5. Customer Population
6. Service Discipline
Source: Raj Jain

Kendall's Notation (1953)
• A/B/c/k/N/D
  • A: arrival process
  • B: service time distribution
  • c: number of servers
  • k: system capacity
  • N: population size
  • D: service discipline
• Distributions (for A and B):
  • M: Markov (exponential, memoryless, random, Poisson)
  • D: deterministic
  • E: Erlang
  • H: hyper-exponential
  • G: general
• Service disciplines:
  • FCFS: first come first served
  • FCLS: first come last served
  • RR: round-robin
  • etc.

Example Systems
• M/M/1/∞/∞/FCFS (simplified as M/M/1)
  • Markovian (Poisson, memoryless) arrival
  • Markovian service time
  • 1 server
  • Infinite system capacity
  • Infinite customer population
  • First-come-first-served discipline
• Other examples:
  • M/M/1/k (finite capacity)
  • M/M/m (m servers)
  • G/D/1 (arbitrary arrival, deterministic service time)
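The shorthand rule above (omitted trailing fields default to infinite capacity, infinite population, and FCFS) can be sketched as a small parser. The `Kendall` type, the `parse_kendall` function, and the string `"inf"` standing in for infinity are illustrative assumptions, not part of the lecture.

```python
from typing import NamedTuple


class Kendall(NamedTuple):
    arrival: str      # A: arrival process
    service: str      # B: service time distribution
    servers: str      # c: number of servers (kept as text, e.g. "1" or "m")
    capacity: str     # k: system capacity
    population: str   # N: population size
    discipline: str   # D: service discipline


def parse_kendall(notation: str) -> Kendall:
    """Expand a shortened A/B/c[/k[/N[/D]]] string to all six fields."""
    fields = notation.split("/")
    if not 3 <= len(fields) <= 6:
        raise ValueError("expected between 3 and 6 '/'-separated fields")
    # Assumed defaults for omitted trailing fields: infinite, infinite, FCFS.
    defaults = ["inf", "inf", "FCFS"]
    fields += defaults[len(fields) - 3:]
    return Kendall(*fields)


print(parse_kendall("M/M/1"))
print(parse_kendall("M/M/1/k"))
```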

M/M/1 Queue
• Poisson arrival, with average arrival rate of $\lambda$ jobs/sec
• Exponentially distributed service times, with average service rate of $\mu$ jobs/sec
• Single server with infinite queue
• System utilization (hopefully < 1): $\rho = \lambda/\mu$
• Average number of jobs in system: $N = \sum_n n \cdot p_n = \rho/(1 - \rho)$
• System throughput (if $\rho < 1$): $X = \lambda$
• Average response time (from Little's Law): $R = N/X = 1/(\mu - \lambda)$
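The M/M/1 formulas above can be sketched as a single function. The name `mm1_metrics`, the dictionary keys, and the 30 and 50 jobs/sec figures are hypothetical and used only for illustration.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics for rates given in jobs per second."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (service rate positive)")
    utilization = arrival_rate / service_rate
    if utilization >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    jobs_in_system = utilization / (1 - utilization)
    throughput = arrival_rate  # holds only while utilization < 1
    # Little's Law gives R = N / X, which simplifies to 1 / (mu - lambda).
    response_time = 1 / (service_rate - arrival_rate)
    return {
        "utilization": utilization,
        "jobs_in_system": jobs_in_system,
        "throughput": throughput,
        "response_time": response_time,
    }


# Hypothetical load: 30 jobs/sec arriving at a server that handles 50 jobs/sec.
for name, value in mm1_metrics(30, 50).items():
    print(f"{name:>15}: {value:.4f}")
```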

Example: Web Server
• Web server receives 40 requests/second
• Web server can process 100 requests/second
• What is server utilization?
• At any given time, how many requests are at server (waiting plus being processed)?
• What is the mean total delay at server (waiting plus processing)?
• What happens when traffic rate doubles?

Example: Web Server
• $\lambda = 40$ requests/second
• $\mu = 100$ requests/second
• Utilization = $\rho = \lambda/\mu = 40/100 = 40\%$
• # of requests = $N = \rho/(1 - \rho) = 0.67$
• Average time spent at server = $R = N/X = 0.67/40 \approx 17$ ms

Example: Traffic Doubled
• $\lambda = 80$ requests/second
• $\mu = 100$ requests/second
• Utilization = $\rho = \lambda/\mu = 80/100 = 80\%$
• # of requests = $N = \rho/(1 - \rho) = 4$
• Average time spent at server = $R = N/X = 4/80 = 50$ ms (more than doubled!)

Approaching Congestion
• $\lambda = 99$ requests/second
• $\mu = 100$ requests/second
• Utilization = $\rho = \lambda/\mu = 99/100 = 99\%$
• # of requests = $N = \rho/(1 - \rho) = 99$
• Average time spent at server = $R = N/X = 99/99 = 1$ second!

Utilization Affects Performance
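A minimal sketch of the effect named in this slide title, assuming the M/M/1 response-time formula from earlier in the lecture and the 100 requests/second service rate of the web server example. The function name and the particular arrival rates swept are illustrative choices.

```python
def response_time_ms(arrival_rate: float, service_rate: float) -> float:
    """Mean M/M/1 response time in milliseconds (rates in requests/second)."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival rate < service rate")
    return 1000.0 / (service_rate - arrival_rate)


SERVICE_RATE = 100.0  # requests/second, as in the web server example

print("utilization  response time (ms)")
for arrival_rate in (10, 40, 80, 90, 95, 99):
    utilization = arrival_rate / SERVICE_RATE
    print(f"{utilization:>10.0%}  {response_time_ms(arrival_rate, SERVICE_RATE):>10.1f}")
```

Response time grows slowly at low utilization and blows up as utilization approaches 100%, since the denominator shrinks toward zero.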

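M/M/1/k Queue (Finite Capacity)
• $\rho = \lambda/\mu$
• $N = \dfrac{\rho}{1-\rho} - \dfrac{(k+1)\rho^{k+1}}{1-\rho^{k+1}}$
• $R = N/X = N/\lambda_{\text{eff}}$
  • where $\lambda_{\text{eff}} = \lambda(1 - P_k)$ = effective arrival rate
  • and $P_k = \dfrac{\rho^k(1-\rho)}{1-\rho^{k+1}}$ = probability of a full queue
• Loss rate = $\lambda - \lambda_{\text{eff}}$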
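The finite-capacity formulas above can be sketched as follows. The function name and dictionary keys are hypothetical. The branch for utilization exactly equal to 1 is not on the slide; it is added here as an assumption because the slide's expressions divide by zero there, and it uses the standard limiting values (full-queue probability 1/(k+1), mean jobs k/2).

```python
import math


def mm1k_metrics(arrival_rate: float, service_rate: float, k: int) -> dict:
    """Steady-state M/M/1/k metrics; k is the total system capacity."""
    if arrival_rate <= 0 or service_rate <= 0 or k < 1:
        raise ValueError("rates must be positive and k at least 1")
    rho = arrival_rate / service_rate
    if math.isclose(rho, 1.0):
        # Limiting case (not on the slide): all k+1 states are equally likely.
        p_full = 1.0 / (k + 1)
        jobs_in_system = k / 2.0
    else:
        p_full = rho**k * (1 - rho) / (1 - rho ** (k + 1))
        jobs_in_system = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    effective_rate = arrival_rate * (1 - p_full)  # jobs actually admitted
    return {
        "p_full": p_full,
        "jobs_in_system": jobs_in_system,
        "throughput": effective_rate,
        "response_time": jobs_in_system / effective_rate,
        "loss_rate": arrival_rate - effective_rate,
    }


# Hypothetical overload: 120 requests/sec offered to a 100 requests/sec server
# that holds at most 10 requests. Unlike M/M/1, the result stays finite.
for name, value in mm1k_metrics(120, 100, 10).items():
    print(f"{name:>15}: {value:.4f}")
```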

M/M/1/k Response Time

M/M/1/k Throughput

Lecture Outline
• Performance Metrics
• Availability
• Queuing theory
• M/M/1 queue
• Scalability
• M/M/m queue

Scalability
• The capability of a system to increase total throughput under an increased load when resources (typically hardware) are added
• Cost of additional resource
• Performance degradation under increased load

Scalability Example
• Original web server: can process $\mu$ requests/sec; accepts requests at $\lambda$/sec
• Now request rate increases to $10\lambda$/sec and web server is swamped ($\rho = 10\lambda/\mu$)!
• Need to add new hardware!

Which is better?
• Option 1: One big web server that can process $10\mu$ requests/sec
• Option 2: Ten web servers, each can process $\mu$ requests/sec; each accepts 10% of requests ($\lambda$/sec per server)
• Option 3: Ten web servers, each can process $\mu$ requests/sec; share single queue (load balancer) that accepts requests at $10\lambda$/sec

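[Figure: queue diagrams for the three options]
• Option 1: M/M/1 queue with big server (arrival rate $10\lambda$, service rate $10\mu$)
• Option 2: ten M/M/1 queues (each with arrival rate $\lambda$, service rate $\mu$)
• Option 3: M/M/10 queue (arrival rate $10\lambda$, ten servers each with service rate $\mu$)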

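M/M/m Queue (m Servers)
• $\rho = \lambda/(m\mu)$
• $N = m\rho + \dfrac{\rho\,\phi}{1-\rho}$
  • where $\phi = \dfrac{(m\rho)^m}{m!\,(1-\rho)}\,p_0$ is the probability that an arriving job has to queue
  • and $p_0 = \left[\sum_{n=0}^{m-1} \dfrac{(m\rho)^n}{n!} + \dfrac{(m\rho)^m}{m!\,(1-\rho)}\right]^{-1}$ is the probability of an empty system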
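A minimal sketch of the M/M/m mean-jobs formula above. Two auxiliary expressions are missing from this copy of the slide, so the code assumes the standard textbook M/M/m results: the probability of an empty system and the probability that an arriving job has to queue (the Erlang C formula). The function name, dictionary keys, and example rates are hypothetical.

```python
from math import factorial


def mmm_metrics(arrival_rate: float, service_rate: float, m: int) -> dict:
    """Steady-state M/M/m metrics; service_rate is per server."""
    if arrival_rate <= 0 or service_rate <= 0 or m < 1:
        raise ValueError("rates must be positive and m at least 1")
    rho = arrival_rate / (m * service_rate)  # per-server utilization
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    offered_load = m * rho
    # Assumed standard result: weight of the states where all m servers are busy.
    busy_term = offered_load**m / (factorial(m) * (1 - rho))
    p_empty = 1 / (sum(offered_load**n / factorial(n) for n in range(m)) + busy_term)
    p_queue = busy_term * p_empty  # Erlang C: an arrival finds all servers busy
    jobs_in_system = offered_load + rho * p_queue / (1 - rho)
    return {
        "utilization": rho,
        "p_queue": p_queue,
        "jobs_in_system": jobs_in_system,
        "response_time": jobs_in_system / arrival_rate,  # Little's Law
    }


# Hypothetical: 150 requests/sec shared by 2 servers of 100 requests/sec each.
for name, value in mmm_metrics(150, 100, 2).items():
    print(f"{name:>15}: {value:.4f}")
```

With m = 1 the expressions collapse to the M/M/1 results, which is a quick sanity check on the reconstruction.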

Which is Better?
$m = 10$; $\mu = 100$; $\lambda = 50$
• Remember: Scalability is not just about performance!
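A rough sketch comparing the mean response time of the three options with the parameters above. It assumes that 50 requests/sec is the original per-server arrival rate, so the total load is ten times that, as in the scalability example; the slide does not state this explicitly. The helper names are hypothetical, and the M/M/m helper uses the standard Erlang C result rather than anything shown in this copy of the slides.

```python
from math import factorial


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean M/M/1 response time in seconds."""
    return 1 / (service_rate - arrival_rate)


def mmm_response_time(arrival_rate: float, service_rate: float, m: int) -> float:
    """Mean M/M/m response time in seconds (service_rate is per server)."""
    rho = arrival_rate / (m * service_rate)
    load = m * rho
    busy_term = load**m / (factorial(m) * (1 - rho))
    p_empty = 1 / (sum(load**n / factorial(n) for n in range(m)) + busy_term)
    p_queue = busy_term * p_empty
    jobs_in_system = load + rho * p_queue / (1 - rho)
    return jobs_in_system / arrival_rate


SERVERS = 10
SERVICE_RATE = 100.0   # requests/sec per ordinary server
BASE_ARRIVALS = 50.0   # requests/sec; ASSUMED to be the original per-server load
total_arrivals = SERVERS * BASE_ARRIVALS

option1 = mm1_response_time(total_arrivals, SERVERS * SERVICE_RATE)  # one big server
option2 = mm1_response_time(BASE_ARRIVALS, SERVICE_RATE)             # ten separate queues
option3 = mmm_response_time(total_arrivals, SERVICE_RATE, SERVERS)   # one shared queue

for label, seconds in (("Option 1", option1), ("Option 2", option2), ("Option 3", option3)):
    print(f"{label}: {seconds * 1000:.2f} ms")
```

Under these assumptions the single big server has the lowest mean response time and the ten separate queues the highest, but the comparison leaves out cost and availability, which is the point of the slide's closing reminder.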
