How do we define success? (or, benchmarks and metrics for RADS)
Downstairs Group: Amr Awadallah, Aaron Brown, Arnold de Leon, Archana Ganapathi, Kim Keeton, Matthew Merzbacher, Divya Ramachandran, Wei Xu
Approaching the problem
• Yardstick for evaluating progress/success
  • standard: a predefined target that must be met (with an associated cost)
  • benchmark: a variable scale of "goodness"
• What aspects to measure?
  • utility of the system to the end user
  • adaptability
  • cost: capital cost, TCO, administrative cost, cost to end users
  • value: how does RADS improve value to service providers and end users?
• Proposed approach: vectors and weighting functions (see the sketch after this list)
  • collect a vector of metrics: the components of end-user and administrative utility
  • if a mapping is available, weight the components to compute value according to the perspective of interest
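A minimal sketch of the vector-and-weights idea in Python. The metric names, weight values, and linear reduction are illustrative assumptions, not something the slides specify:

```python
# Hypothetical raw metrics vector: each component is one aspect of
# end-user or administrative utility (names are illustrative).
metrics = {
    "response_time_ms": 180.0,    # lower is better
    "goodput_req_s": 950.0,       # higher is better
    "admin_actions_per_day": 4.0, # lower is better
}

# A weighting function maps the metric vector to a scalar "value"
# from one perspective of interest (here, an assumed end-user view).
# Negative weights penalize metrics where lower is better.
end_user_weights = {
    "response_time_ms": -0.01,
    "goodput_req_s": 0.001,
    "admin_actions_per_day": 0.0,
}

def value(metrics, weights):
    """Reduce a utility vector to scalar value via a linear weighting."""
    return sum(weights[k] * v for k, v in metrics.items())

print(value(metrics, end_user_weights))
```

Swapping in a different weight vector (say, one that weights administrative cost heavily) yields the value from the operator's perspective, which is the point of keeping the raw vector and the mapping separate.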
Evaluation Process
• Define raw metrics
  • aspects of end-user utility
• Define the mapping to value
  • weights for reducing the utility vector
  • standards: sets of values representing targets
• Create the evaluation environment
  • requires a specific application context
• Develop a perturbation set
  • defined bottom-up: what can go wrong?
  • categories: failures, security, workload, human, configuration
• Apply the perturbation set repeatedly (see the loop sketched after this list)
  • measure initial behavior and the adaptability/learning curve
• Evaluate management interaction
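A hedged sketch of the "apply the perturbation set repeatedly" step. The system-under-test interface (`kill_random_node`, `spike_load`, `corrupt_config`, `measure_metrics`, `restore_baseline`) is entirely hypothetical; the point is the repeated inject-measure-reset loop that exposes the adaptability/learning curve:

```python
import random

# Hypothetical injectors, one per perturbation category from the slides.
# Each takes the system under test and injects a single fault.
PERTURBATIONS = {
    "failure":       lambda s: s.kill_random_node(),
    "workload":      lambda s: s.spike_load(factor=5),
    "configuration": lambda s: s.corrupt_config(),
}

def run_benchmark(system, trials=10):
    """Apply the perturbation set repeatedly, recording the metric
    vector after each trial so the system's adaptability/learning
    curve can be examined over successive exposures."""
    curve = []
    for _ in range(trials):
        name, perturb = random.choice(list(PERTURBATIONS.items()))
        perturb(system)               # inject one fault
        curve.append((name, system.measure_metrics()))
        system.restore_baseline()     # reset before the next trial
    return curve
```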
Some Possible Metrics
• End-user
  • response time, throughput (plus statistics and histograms)
  • action-weighted goodput (Gaw), sketched after this list
  • correctness (relative to a gold standard plus a consistency model)
  • completeness
  • restartability/preservation of context across failures
  • web-specific metrics: user-initiated aborts, clickthrough rate, coverage, abandonment rate
  • many of these require sophisticated user models and workloads
• Operations
  • complexity of interaction (goal-based?)
  • transparency of controls and system behavior
  • validation against human-factors design principles
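The slides do not define Gaw precisely; one plausible reading is to weight each user action by its importance and count only actions that complete correctly. A sketch under that assumption:

```python
def action_weighted_goodput(actions, weights, duration_s):
    """Hypothetical Gaw: sum of importance weights over user actions
    that completed correctly, normalized by the measurement window.
    `actions` is a list of (action_type, completed_ok) pairs and
    `weights` maps action_type -> importance (assumed definitions)."""
    total = sum(weights[a] for a, ok in actions if ok)
    return total / duration_s

# Example: a checkout counts ten times as much as a page browse,
# and the failed checkout contributes nothing.
actions = [("browse", True), ("checkout", True), ("checkout", False)]
print(action_weighted_goodput(actions, {"browse": 1, "checkout": 10}, 60))
```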
PhD Theses* (*or chapters, at least)
• Modeling sophisticated end users for workload generation
  • how do users respond to failure events? how do they adapt their goals upon unexpected events?
  • how can this behavior be abstracted into a workload generator?
• Measuring administrative complexity and transparency
  • comprehensibility/predictability of automatic recovery actions
  • complexity and transparency of manual processes and controls
  • cleanliness of mental models
• Multi-level benchmarking
  • extrapolating component-level benchmark results to the full system
• Value and cost modeling
• Design and implementation of the evaluation framework
• Validating the benchmark
  • is it representative of the real world?
  • assessing the resilience of benchmark metrics to changes in the environment and perturbations