System Performance & Scalability

System Performance & Scalability. i206 Fall 2010 John Chuang. http://bits.blogs.nytimes.com/2007/11/26/yahoos-cybermonday-meltdown/index.html. Computing Trends. Multi-core CPUs Data centers Cloud computing What are the drivers? scalability, availability, cost-effectiveness.

Presentation Transcript


  1. System Performance & Scalability i206 Fall 2010 John Chuang

  2. http://bits.blogs.nytimes.com/2007/11/26/yahoos-cybermonday-meltdown/index.html

  3. Computing Trends • Multi-core CPUs • Data centers • Cloud computing • What are the drivers? • scalability, availability, cost-effectiveness

  4. Lecture Outline • Performance Metrics • Availability • Queuing theory • M/M/1 queue • Scalability • M/M/m queue

  5. What is Performance? • Users want fast response time and high availability • Managers want happy users, and many of them, while minimizing cost • What are standard measures of system performance?

  6. Performance Metrics • Response time (seconds) • Throughput (MIPS, Mbps, TPS, ...) • Resource utilization (%) • Availability (%)

  7. Availability • Availability = MTTF / (MTTF + MTTR) • MTTF: mean time to failure • MTTR: mean time to recover
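
The availability formula can be sketched in a few lines of Python. This is only an illustration: the function name `availability` and the sample figures (1000 hours between failures, 1 hour to recover) are assumptions, not values from the lecture.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical system: fails every 1000 hours on average, recovers in 1 hour.
    a = availability(1000.0, 1.0)
    downtime_hours_per_year = (1.0 - a) * 365 * 24
    print(f"availability = {a:.4%}")
    print(f"expected downtime = {downtime_hours_per_year:.2f} hours/year")
```

Halving MTTR improves availability about as much as doubling MTTF, which is one reason fast recovery is often the cheaper lever.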

  8. Response Time • Client: formulate request • Network: message latency • Server: queuing time, processing time • Network: message latency • Client: interpret response • Adapted from: David Messerschmitt

  9. Queuing Theory • 1. Arrival Process • 2. Service Time Distribution • 3. Number of Servers • 4. System Capacity • 5. Customer Population • 6. Service Discipline • Source: Raj Jain

  10. Kendall’s Notation (1953) • A/B/c/k/N/D • A: arrival process • B: service time distribution • c: number of servers • k: system capacity • N: population size • D: service discipline • Distributions: M: Markov (exponential, memoryless, random, Poisson); D: deterministic; E: Erlang; H: hyper-exponential; G: general • Disciplines: FCFS: first come, first served; LCFS: last come, first served; RR: round-robin; etc.

  11. Example Systems • M/M/1/∞/∞/FCFS (simplified as M/M/1) • Markovian (Poisson, memoryless) arrival • Markovian service time • 1 server • Infinite system capacity • Infinite arrival stream (customer population) • First-come-first-served discipline • Other examples: • M/M/1/k (finite capacity) • M/M/m (m servers) • G/D/1 (arbitrary arrival, deterministic service time)
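
As a rough illustration of how the shorthand expands, the sketch below parses a Kendall string and fills in the defaults listed on the slide (infinite capacity, infinite population, FCFS). The names `KendallQueue` and `parse_kendall` are hypothetical, and the parser assumes a numeric server count, so a symbolic spec such as M/M/m is out of scope.

```python
import math
from typing import NamedTuple


class KendallQueue(NamedTuple):
    arrival: str        # A: arrival process
    service: str        # B: service time distribution
    servers: int        # c: number of servers
    capacity: float     # k: system capacity (math.inf if unbounded)
    population: float   # N: population size (math.inf if unbounded)
    discipline: str     # D: service discipline


def parse_kendall(spec: str) -> KendallQueue:
    """Expand 'A/B/c[/k[/N[/D]]]' into all six fields, applying the usual defaults."""
    parts = [p.strip() for p in spec.split("/")]
    if not 3 <= len(parts) <= 6:
        raise ValueError(f"expected 3 to 6 fields, got {len(parts)}")
    parts += [""] * (6 - len(parts))  # pad omitted trailing fields
    a, b, c, k, n, d = parts

    def size(field: str) -> float:
        # An omitted field or an explicit infinity means "unbounded".
        return math.inf if field in ("", "inf", "∞") else int(field)

    return KendallQueue(a, b, int(c), size(k), size(n), d or "FCFS")


if __name__ == "__main__":
    print(parse_kendall("M/M/1"))
    print(parse_kendall("M/M/1/5"))
```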

  12. M/M/1 Queue • Poisson arrivals, with average arrival rate of $\lambda$ jobs/sec • Exponential service times, with average service rate of $\mu$ jobs/sec • Single server with infinite queue • System utilization (hopefully < 1): $\rho = \lambda/\mu$ • Average number of jobs in system: $N = \sum_n n\,p_n = \rho/(1 - \rho)$ • System throughput (if $\rho < 1$): $X = \lambda$ • Average response time (from Little’s Law): $R = N/X = 1/(\mu - \lambda)$
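
The slide states the results without the intermediate steps. A short derivation follows, using the standard M/M/1 steady-state distribution for the number of jobs in the system (a textbook result, not shown on the slide), with arrival rate $\lambda$, service rate $\mu$, and utilization $\rho = \lambda/\mu < 1$:

```latex
\begin{align*}
p_n &= (1-\rho)\,\rho^{n}, \qquad n = 0, 1, 2, \dots \\
N &= \sum_{n=0}^{\infty} n\,p_n
   = (1-\rho)\sum_{n=0}^{\infty} n\,\rho^{n}
   = (1-\rho)\cdot\frac{\rho}{(1-\rho)^{2}}
   = \frac{\rho}{1-\rho} \\
R &= \frac{N}{X}
   = \frac{1}{\lambda}\cdot\frac{\lambda/\mu}{1-\lambda/\mu}
   = \frac{1}{\mu-\lambda}
\end{align*}
```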

  13. Example: Web Server • Web server receives 40 requests/second • Web server can process 100 requests/second • What is server utilization? • At any given time, how many requests are at server (waiting plus being processed)? • What is the mean total delay at server (waiting plus processing)? • What happens when traffic rate doubles?

  14. Example: Web Server • $\lambda = 40$ requests/second • $\mu = 100$ requests/second • Utilization: $\rho = \lambda/\mu = 40/100 = 40\%$ • Number of requests: $N = \rho/(1 - \rho) = 0.4/0.6 \approx 0.67$ • Average time spent at server: $R = N/X = 0.67/40 \approx 17$ ms

  15. Example: Traffic Doubled • $\lambda = 80$ requests/second • $\mu = 100$ requests/second • Utilization: $\rho = \lambda/\mu = 80/100 = 80\%$ • Number of requests: $N = \rho/(1 - \rho) = 0.8/0.2 = 4$ • Average time spent at server: $R = N/X = 4/80 = 50$ ms (more than doubled!)

  16. Approaching Congestion • $\lambda = 99$ requests/second • $\mu = 100$ requests/second • Utilization: $\rho = \lambda/\mu = 99/100 = 99\%$ • Number of requests: $N = \rho/(1 - \rho) = 0.99/0.01 = 99$ • Average time spent at server: $R = N/X = 99/99 = 1$ second!
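
The three worked examples can be reproduced with a small helper. This is a minimal sketch of the M/M/1 formulas from the slides; the names `MM1Metrics` and `mm1` are illustrative, and the stability check simply rejects a utilization of 1 or more, where the steady-state formulas no longer apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MM1Metrics:
    utilization: float         # rho = arrival rate / service rate
    mean_jobs: float           # N = rho / (1 - rho)
    throughput: float          # X = arrival rate (valid only when rho < 1)
    mean_response_time: float  # R = N / X, in seconds


def mm1(arrival_rate: float, service_rate: float) -> MM1Metrics:
    """Steady-state M/M/1 metrics; both rates are in jobs per second."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("queue is unstable when utilization >= 1")
    mean_jobs = rho / (1 - rho)
    throughput = arrival_rate
    return MM1Metrics(rho, mean_jobs, throughput, mean_jobs / throughput)


if __name__ == "__main__":
    # The three arrival rates from the slides, against a 100 requests/sec server.
    for lam in (40, 80, 99):
        m = mm1(lam, 100)
        print(f"arrival={lam:>3}/s  utilization={m.utilization:.0%}  "
              f"N={m.mean_jobs:.2f}  R={m.mean_response_time * 1000:.0f} ms")
```

Because response time grows as 1/(1 - utilization), going from 80% to 99% utilization costs far more than going from 40% to 80%.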

  17. Utilization Affects Performance

  18. M/M/1/k Queue (Finite Capacity) • $\rho = \lambda/\mu$ • $N = \rho/(1-\rho) - (k+1)\rho^{k+1}/(1-\rho^{k+1})$ • $R = N/X = N/\lambda_{\text{eff}}$ • where $\lambda_{\text{eff}} = \lambda(1 - P_k)$ = effective arrival rate • and $P_k = \rho^{k}(1-\rho)/(1-\rho^{k+1})$ = probability of a full queue • Loss rate $= \lambda - \lambda_{\text{eff}}$
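
The finite-capacity formulas translate directly into code. The sketch below uses hypothetical names (`MM1KMetrics`, `mm1k`). It also handles a utilization of exactly 1, where the slide's expressions divide by zero, by using their limiting values; that special case is an addition, not something the slide covers.

```python
from typing import NamedTuple


class MM1KMetrics(NamedTuple):
    p_full: float              # P_k, probability the system is full
    mean_jobs: float           # N, average number of jobs in the system
    effective_arrival: float   # arrival rate * (1 - P_k)
    mean_response_time: float  # R = N / effective arrival rate, in seconds
    loss_rate: float           # jobs/sec turned away


def mm1k(arrival_rate: float, service_rate: float, k: int) -> MM1KMetrics:
    """Steady-state M/M/1/k metrics; k is the total system capacity."""
    if arrival_rate <= 0 or service_rate <= 0 or k < 1:
        raise ValueError("rates must be positive and k must be at least 1")
    rho = arrival_rate / service_rate
    if rho == 1.0:
        # Limit as rho -> 1 (not on the slide): all k + 1 states are equally likely.
        p_full = 1.0 / (k + 1)
        mean_jobs = k / 2.0
    else:
        p_full = rho**k * (1 - rho) / (1 - rho ** (k + 1))
        mean_jobs = rho / (1 - rho) - (k + 1) * rho ** (k + 1) / (1 - rho ** (k + 1))
    lam_eff = arrival_rate * (1 - p_full)
    return MM1KMetrics(p_full, mean_jobs, lam_eff,
                       mean_jobs / lam_eff, arrival_rate - lam_eff)


if __name__ == "__main__":
    # Hypothetical case: the congested server from the earlier example, capped at 10 jobs.
    print(mm1k(99, 100, 10))
```

Unlike the infinite queue, this model stays well defined when the arrival rate exceeds the service rate: response time is bounded because excess requests are dropped rather than queued.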

  19. M/M/1/k Response Time

  20. M/M/1/k Throughput

  21. Lecture Outline • Performance Metrics • Availability • Queuing theory • M/M/1 queue • Scalability • M/M/m queue

  22. Scalability • The capability of a system to increase total throughput under an increased load when resources (typically hardware) are added • Cost of additional resource • Performance degradation under increased load

  23. Scalability Example • Original web server: can process $\mu$ requests/sec; accepts requests at $\lambda$/sec • Now request rate increases to $10\lambda$/sec and web server is swamped ($\rho = 10\lambda/\mu$)! • Need to add new hardware!

  24. Which is better? • Option 1: One big web server that can process $10\mu$ requests/sec • Option 2: Ten web servers, each can process $\mu$ requests/sec; each accepts 10% of requests ($\lambda$/sec per server) • Option 3: Ten web servers, each can process $\mu$ requests/sec; share single queue (load balancer) that accepts requests at $10\lambda$/sec

  25. Option 1: M/M/1 queue with big server (arrival rate $10\lambda$, service rate $10\mu$) • Option 2: ten M/M/1 queues (each with arrival rate $\lambda$ and service rate $\mu$) • Option 3: M/M/10 queue (arrival rate $10\lambda$, ten servers each with service rate $\mu$)

  26. M/M/m Queue (m Servers) • $\rho = \lambda/(m\mu)$ • $N = m\rho + \rho\phi/(1-\rho)$, where [two defining formulas not preserved in the transcript]
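
The transcript does not preserve the definitions that followed "where". For reference, the standard textbook M/M/m (Erlang C) results are given below, on the assumption that the slide used the same form; $\phi$ is read here as the probability that an arriving job has to wait, and $p_0$ as the probability of an empty system.

```latex
\begin{align*}
\rho &= \frac{\lambda}{m\mu} \\
p_0 &= \left[\sum_{n=0}^{m-1}\frac{(m\rho)^{n}}{n!}
        + \frac{(m\rho)^{m}}{m!\,(1-\rho)}\right]^{-1} \\
\phi &= \frac{(m\rho)^{m}}{m!\,(1-\rho)}\,p_0 \\
N &= m\rho + \frac{\rho\,\phi}{1-\rho}, \qquad R = \frac{N}{\lambda}
\end{align*}
```

For $m = 1$ these reduce to $p_0 = 1-\rho$, $\phi = \rho$, and $N = \rho/(1-\rho)$, matching the M/M/1 queue.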

  27. Which is Better? • $m = 10$; $\mu = 100$; $\lambda = 50$ • Remember: Scalability is not just about performance!
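
The comparison can be worked through numerically. The sketch below assumes the slide's parameters mean ten servers, a per-server service rate of 100 requests/sec, and an original arrival rate of 50 requests/sec (so 500 requests/sec in total after the tenfold increase). It uses the standard Erlang C formulas for the shared-queue option, and the function names are illustrative.

```python
from math import factorial


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """R = 1 / (service rate - arrival rate) for a stable M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable when utilization >= 1")
    return 1.0 / (service_rate - arrival_rate)


def mmm_response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean response time of an M/M/m queue (standard Erlang C result)."""
    rho = arrival_rate / (servers * service_rate)
    if rho >= 1:
        raise ValueError("queue is unstable when utilization >= 1")
    offered = servers * rho  # offered load, m * rho
    tail = offered**servers / (factorial(servers) * (1 - rho))
    p0 = 1.0 / (sum(offered**n / factorial(n) for n in range(servers)) + tail)
    phi = tail * p0  # probability that an arriving job has to wait
    mean_jobs = offered + rho * phi / (1 - rho)
    return mean_jobs / arrival_rate  # Little's Law: R = N / X


# Assumed reading of the slide's parameters (see the paragraph above).
m, mu, lam = 10, 100.0, 50.0
option1 = mm1_response_time(m * lam, m * mu)  # one server, ten times as fast
option2 = mm1_response_time(lam, mu)          # ten independent M/M/1 queues
option3 = mmm_response_time(m * lam, mu, m)   # ten servers behind one shared queue

if __name__ == "__main__":
    for name, r in (("Option 1", option1), ("Option 2", option2), ("Option 3", option3)):
        print(f"{name}: R = {r * 1000:.2f} ms")
```

Under these assumptions the single big server gives the lowest mean response time, the ten separate queues the highest, and the shared queue falls in between. Response time alone does not settle the choice, since cost and availability differ across the three options.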
