Metrics and Techniques for Evaluating the Performability of Internet Services

  1. Metrics and Techniques for Evaluating the Performability of Internet Services Pete Broadwell pbwell@cs.berkeley.edu

  2. Outline • Introduction to performability • Performability metrics for Internet services • Throughput-based metrics (Rutgers) • Latency-based metrics (ROC) • Analysis and future directions

  3. Motivation • Goal of ROC project: develop metrics to evaluate new recovery techniques • Problem: concept of availability assumes system is either “up” or “down” at a given time • Availability doesn’t capture system’s capacity to support degraded service • degraded performance during failures • reduced data quality during high load

  4. What is “performability”? • Combination of performance and dependability measures • Classical defn: probabilistic (model-based) measure of a system’s “ability to perform” in the presence of faults1 • Concept from traditional fault-tolerant systems community, ca. 1978 • Has since been applied to other areas, but still not in widespread use 1 J. F. Meyer, Performability Evaluation: Where It Is and What Lies Ahead, 1994

  5. Performability Example • Discrete-time Markov chain (DTMC) model of a RAID-5 disk array1 • D = number of data disks • pi(t) = probability that the system is in state i at time t • wi(t) = reward (disk I/O operations/sec) • μ = disk repair rate • λ = failure rate of a single disk drive 1 Hannu H. Kari, Ph.D. Thesis, Helsinki University of Technology, 1997
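
A minimal sketch (not from the slides) of how a Markov reward model like the one above produces a single performability number: iterate the state-occupancy probabilities pi(t) of a small DTMC over working/degraded/failed states and take the reward-weighted sum once it reaches steady state. All transition probabilities and reward values below are illustrative placeholders, not the figures from Kari's thesis.

```python
import numpy as np

# Illustrative DTMC reward model of a RAID-5 array (placeholder numbers).
# States: 0 = all disks up, 1 = one disk failed (degraded), 2 = array down.
D = 4                                   # number of data disks
p_fail, p_repair = 1e-4, 0.05           # per-step failure/repair probabilities
w = np.array([200.0, 120.0, 0.0])       # reward w_i: disk I/O operations/sec

# One-step transition probability matrix P (each row sums to 1).
P = np.array([
    [1 - (D + 1) * p_fail, (D + 1) * p_fail, 0.0],
    [p_repair, 1 - p_repair - D * p_fail, D * p_fail],
    [0.0, p_repair, 1 - p_repair],
])

pi = np.array([1.0, 0.0, 0.0])          # pi(0): start with all disks working
for _ in range(100_000):                # pi(t+1) = pi(t) P, run toward steady state
    pi = pi @ P

performability = float(pi @ w)          # expected reward rate in steady state
print(f"steady-state probabilities: {pi.round(6)}")
print(f"expected I/O operations/sec: {performability:.2f}")
```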

  6. Performability for Online Services: Rutgers Study • Rich Martin (UCB alum) et al. wanted to quantify tradeoffs between web server designs, using a single metric for both performance and availability • Approach: • Performed fault injection on PRESS, a locality-aware, cluster-based web server • Measured throughput of cluster during simulated faults and normal operation

  7. Degraded Service During a PRESS Component Fault • [Figure: throughput (requests/sec) over time across the phases FAILURE → DETECT → RESET (optional) → RECOVER → STABILIZE → REPAIR (human operator)]

  8. Calculation of Average Throughput, Given Faults • [Figure: throughput (requests/sec) over time, marking the normal, degraded, and average throughput levels]
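
A hedged sketch of the time-weighted averaging the figure illustrates: over one failure/repair cycle, the average throughput is each phase's throughput weighted by how long that phase lasts. The phase durations and rates below are invented for illustration, not measured PRESS data.

```python
# Time-weighted average throughput over one failure/repair cycle
# (all durations and rates are illustrative, not measured PRESS data).
phases = [                       # (seconds in phase, throughput in requests/sec)
    (6 * 3600.0, 3000.0),        # normal operation before the fault
    (30.0,       1800.0),        # fault present but not yet detected
    (120.0,      2000.0),        # reset + recover + stabilize on remaining nodes
    (3600.0,     2200.0),        # degraded service until human repair completes
]
total_time = sum(t for t, _ in phases)
avg_tput = sum(t * tput for t, tput in phases) / total_time
print(f"average throughput over the cycle: {avg_tput:.1f} requests/sec")
```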

  9. Behavior of a Performability Metric • Effect of improving degraded performance • [Figure: performability as a function of performance during faults]

  10. Behavior of a Performability Metric • Effect of improving component availability (shorter MTTR, longer MTTF) • Availability = MTTF / (MTTF + MTTR) • [Figure: performability as a function of MTTF and MTTR]

  11. Behavior of a Performability Metric • Effect of improving overall performance • [Figure: performability as a function of overall performance (includes normal operation)] • Most performability metrics scale linearly as component availability, degraded performance, and overall performance increase

  12. Results of Rutgers Study: Design Comparisons

  13. An Alternative Metric: Response Latency • Originally, performability metrics were meant to capture end-user experience1 • Latency better describes the experience of an end user of a web site • response time >8 sec = site abandonment = lost income $$2 • Throughput describes the raw processing ability of a service • best used to quantify expenses 1 J. F. Meyer, Performability Evaluation: Where It Is and What Lies Ahead, 1994 2 Zona Research and Keynote Systems, The Need for Speed II, 2001

  14. Effect of Component Failure on Response Latency • [Figure: response latency (sec) over time between FAILURE and REPAIR, with an abandonment region above 8 s and a possible annoyance region below it]

  15. Issues With Latency As a Performability Metric • Modeling concerns: • Human element: retries and abandonment • Queuing issues: buffering and timeouts • Unavailability of load balancer due to faults • Burstiness of workload • Latency is more accurately modeled at the service, rather than end-to-end1 • Alternate approach: evaluate an existing system 1 M. Merzbacher and D. Patterson, Measuring End-User Availability on the Web: Practical Experience, 2002

  16. Analysis • Queuing behavior may have a significant effect on latency-based performability evaluation • Long component MTTRs = longer waits, lower latency-based score • High performance in normal case = faster queue reduction after repair, higher latency-based score • More study is needed!
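
A rough illustration (not in the slides) of the queuing effect described above: requests that arrive faster than the degraded service rate pile up while a component is down, and the backlog drains after repair at the surplus of normal capacity over the offered load. All numbers are hypothetical.

```python
# Hypothetical queue arithmetic for one failure episode (no timeouts assumed).
arrival_rate = 1000.0      # requests/sec of offered load, assumed constant
degraded_rate = 600.0      # requests/sec served while a component is down
normal_rate = 1500.0       # requests/sec served when fully healthy
mttr_s = 30 * 60.0         # 30-minute repair time

# Backlog accumulated while degraded.
backlog = max(arrival_rate - degraded_rate, 0.0) * mttr_s

# Post-repair drain time: backlog divided by the spare capacity.
drain_time_s = backlog / (normal_rate - arrival_rate)
print(f"backlog at repair time: {backlog:.0f} requests")
print(f"time to drain the backlog: {drain_time_s / 60:.1f} minutes")
# A longer MTTR grows the backlog (and queueing delays) linearly, while higher
# normal-case capacity shrinks the drain time -- the two effects noted above.
```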

  17. Future Work • Further collaboration with Rutgers on collecting new measurements for latency-based performability analysis • Development of more realistic fault and workload models, and of other performability factors such as data quality • Research into methods for conducting automated performability evaluations of web services

  18. Metrics and Techniques for Evaluating the Performability of Internet Services Pete Broadwell pbwell@cs.berkeley.edu

  19. Back-of-the-Envelope Latency Calculations • Attempted to infer average request latency for PRESS servers from the Rutgers data set • Required many simplifying assumptions, relying upon knowledge of the PRESS server design • Hoped to expose areas in which throughput- and latency-based performability evaluations differ • Assumptions: • FIFO queuing with no timeouts or overflows • Independent faults, constant workload (also the case for the throughput-based model) • Current models do not capture the “completeness” of data returned to the user
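
One possible way (a sketch under the slide's assumptions of FIFO queuing, no timeouts or overflow, and a constant workload) to turn a per-second throughput trace into an average-latency estimate: track the queue length implied by the gap between arrivals and served requests, then apply Little's law (mean time in system = mean queue length / arrival rate). The capacity trace below is synthetic, not the Rutgers data set.

```python
# Synthetic back-of-the-envelope latency estimate from a throughput trace.
# Assumptions (per the slide): FIFO queue, no timeouts or overflow, and a
# constant arrival rate; the capacity trace is made up for illustration.
arrival_rate = 1000.0                           # requests/sec, assumed constant
capacity = [1500.0] * 600 + [600.0] * 1800 + [1500.0] * 3600  # serviceable req/s, per second

queue = 0.0
queue_samples = []
for cap in capacity:
    # Each second, arrivals join the queue and at most `cap` requests are served.
    queue = max(queue + arrival_rate - cap, 0.0)
    queue_samples.append(queue)

mean_queue = sum(queue_samples) / len(queue_samples)
mean_latency_s = mean_queue / arrival_rate      # Little's law: L = lambda * W
print(f"mean queue length: {mean_queue:.0f} requests")
print(f"estimated mean queueing delay: {mean_latency_s:.2f} seconds")
```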

  20. Comparison of Performability Metrics

  21. Rutgers calculations for long-term performability • Goal: a metric that scales linearly with both performance (throughput) and availability [MTTF / (MTTF + MTTR)] • Tn = normal throughput for the server • AI = ideal availability (0.99999) • Average throughput (AT) = Tn during normal operation + per-component throughput during failure • Average availability (AA) = AT / Tn • Performability = Tn × [log(AI) / log(AA)]
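
A sketch (with invented inputs) of the calculation this slide describes: compute the average throughput AT over failure/repair cycles, convert it to the average availability AA = AT / Tn, and scale the normal throughput Tn by log(AI) / log(AA).

```python
import math

# Sketch of the slide's long-term performability calculation (inputs invented).
Tn = 3000.0                  # normal throughput for the server (requests/sec)
AI = 0.99999                 # ideal availability
mttf_s = 14 * 24 * 3600.0    # component mean time to failure (seconds)
mttr_s = 2 * 3600.0          # component mean time to repair (seconds)
degraded_tput = 2200.0       # per-component throughput during failure

# Average throughput: Tn while healthy, degraded throughput while failed.
frac_failed = mttr_s / (mttf_s + mttr_s)
AT = (1 - frac_failed) * Tn + frac_failed * degraded_tput

AA = AT / Tn                                     # average availability
performability = Tn * (math.log(AI) / math.log(AA))
print(f"AT = {AT:.1f} req/s, AA = {AA:.6f}")
print(f"performability = {performability:.2f}")
```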

  22. Results of Rutgers study: performance comparison

  23. Results of Rutgers study: availability comparison

  24. Results of Rutgers study: performability comparison
