How do we define success? (or, benchmarks and metrics for RADS)


Presentation Transcript


1. How do we define success? (or, benchmarks and metrics for RADS)
Downstairs Group: Amr Awadallah, Aaron Brown, Arnold de Leon, Archana Ganapathi, Kim Keeton, Matthew Merzbacher, Divya Ramachandran, Wei Xu

2. Approaching the problem
• Yardstick for evaluating progress/success
  • standard: a predefined target that must be met (w/cost)
  • benchmark: a variable scale of “goodness”
• What aspects to measure?
  • utility of system to end user
  • adaptability
  • cost: capital cost, TCO, administrative cost, cost to end users
  • value: how does RADS improve value to service providers and end users?
• Proposed approach: vectors and weighting functions (a minimal sketch follows this slide)
  • collect vector of metrics: components of end-user & admin utility
  • if mapping available, weight components to compute value according to perspective of interest
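A minimal sketch of the proposed vector-and-weights approach, in Python: collect a vector of raw metrics and, when a mapping is available, weight the components to reduce the vector to a scalar value for one perspective of interest. The metric names and weight values below are illustrative assumptions, not part of the original proposal.

```python
# Minimal sketch of the "vectors and weighting functions" approach.
# Metric names and weights are illustrative assumptions only.

from typing import Dict

# Vector of raw metrics: components of end-user and admin utility.
metrics: Dict[str, float] = {
    "response_time_ms": 120.0,      # lower is better
    "goodput_req_per_s": 450.0,     # higher is better
    "admin_actions_per_week": 3.0,  # lower is better
}

# A weighting function for one perspective of interest
# (here, hypothetically, the service provider).
provider_weights: Dict[str, float] = {
    "response_time_ms": -0.01,
    "goodput_req_per_s": 0.002,
    "admin_actions_per_week": -0.5,
}

def value(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Reduce a metric vector to a single value score, given a mapping."""
    return sum(weights[name] * metrics[name] for name in weights)

print(f"provider value: {value(metrics, provider_weights):.2f}")
```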

3. Evaluation Process
• Define raw metrics
  • aspects of end-user utility
• Define mapping to value
  • weights for reducing utility vector
  • standards: sets of values representing targets
• Create the evaluation environment
  • requires a specific application context
• Develop a perturbation set
  • define bottom-up: what can go wrong?
  • categories: failures, security, workload, human, configuration
• Apply perturbation set repeatedly (see the sketch after this slide)
  • measure initial behavior and adaptability/learning curve
• Evaluate management interaction
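One way the repeated-perturbation step could be organized, sketched in Python. The Perturbation and measurement hooks are hypothetical placeholders; only the perturbation categories come from the slide.

```python
# Sketch of the evaluation loop: apply a perturbation set repeatedly and
# record behavior over time to capture the adaptability/learning curve.
# The Perturbation type and measurement hook are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Perturbation:
    name: str
    category: str             # failure, security, workload, human, configuration
    inject: Callable[[], None]

def evaluate(measure: Callable[[], Dict[str, float]],
             perturbations: List[Perturbation],
             rounds: int = 3) -> List[Dict[str, float]]:
    """Apply the perturbation set `rounds` times, measuring raw metrics
    after each round; successive rounds expose the learning curve."""
    history = [measure()]            # initial (unperturbed) behavior
    for _ in range(rounds):
        for p in perturbations:
            p.inject()
        history.append(measure())    # behavior after this round
    return history
```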

4. Some Possible Metrics
• End-user
  • response time, throughput (& stats, histograms)
  • action-weighted goodput (Gaw) (see the sketch after this slide)
  • correctness (relative to gold standard + consistency model)
  • completeness
  • restartability/preservation of context across failure
  • web-specific metrics: user-initiated aborts, clickthrough rate, coverage, abandonment rate
  • many of these require sophisticated user models & workloads
• Operations
  • complexity of interaction (goal-based?)
  • transparency of controls and system behavior
  • validation against human factors design principles
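A rough sketch of how an action-weighted goodput (Gaw) figure might be computed from a request log, assuming each successfully completed user action carries a weight reflecting its value; the log format, weights, and exact definition used in the RADS work are assumptions here.

```python
# Sketch of an action-weighted goodput (Gaw) style calculation: only
# successfully completed user actions count, each weighted by its value.
# Log format and weights are illustrative; the exact Gaw definition may differ.

from typing import List, Tuple

# (action_type, succeeded) pairs observed during a measurement interval.
log: List[Tuple[str, bool]] = [
    ("browse", True), ("add_to_cart", True),
    ("checkout", False), ("checkout", True),
]

# Hypothetical per-action weights reflecting end-user value.
weights = {"browse": 1.0, "add_to_cart": 2.0, "checkout": 5.0}

interval_s = 60.0  # length of the measurement interval in seconds

gaw = sum(weights[action] for action, ok in log if ok) / interval_s
print(f"action-weighted goodput: {gaw:.3f} weighted actions/s")
```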

5. PhD Theses* (*or chapters, at least)
• Modeling sophisticated end-users for workload generation
  • how do users respond to failure events? adapt goals upon unexpected events?
  • how to abstract into a workload generator? (a toy sketch follows this slide)
• Measuring administrative complexity and transparency
  • comprehensibility/predictability of automatic recovery actions
  • complexity, transparency of manual processes and controls
  • cleanliness of mental models
• Multi-level benchmarking
  • extrapolating component-level benchmark results to system
• Value and cost modeling
• Design and implementation of evaluation framework
• Validating the benchmark
  • is it representative of the real world?
  • assessing resilience of benchmark metrics to changes in environment and perturbations
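A toy sketch of the kind of failure-aware user model a workload generator might need: a simulated user who either retries or abandons the session after an error. The states and probabilities are invented for illustration only; real models would be far richer.

```python
# Toy sketch of a failure-aware user model for workload generation:
# after a failed request the simulated user either retries or abandons.
# Probabilities and session structure are invented for illustration.

import random

def simulate_session(fail_prob: float = 0.1,
                     retry_prob: float = 0.6,
                     max_requests: int = 20) -> str:
    """Return how the session ended: 'completed' or 'abandoned'."""
    for _ in range(max_requests):
        if random.random() < fail_prob:        # this request failed
            if random.random() >= retry_prob:  # user gives up
                return "abandoned"
            # otherwise the user retries on the next iteration
    return "completed"

if __name__ == "__main__":
    outcomes = [simulate_session() for _ in range(1000)]
    print("abandonment rate:", outcomes.count("abandoned") / len(outcomes))
```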
