
SBD: Usability Evaluation

Chris North

cs3724: HCI

Scenario-Based Design

[Figure: the Scenario-Based Design framework. ANALYZE: analysis of stakeholders, field studies → claims about current practice → Problem scenarios. DESIGN: Activity scenarios → Information scenarios → Interaction scenarios, informed by metaphors, information technology, HCI theory, and guidelines. PROTOTYPE & EVALUATE: Usability specifications, formative evaluation, summative evaluation, with iterative analysis of usability claims and re-design.]

Evaluation
  • Formative vs. Summative
  • Analytic vs. Empirical
Usability Engineering

[Figure: iterative cycle of Reqs Analysis → Design → Develop → Evaluate, with many iterations.]

Usability Engineering

[Figure: the same cycle, annotated with formative evaluation (during the iterations) and summative evaluation (at the end).]

Usability Evaluation
  • Analytic Methods:
      • Usability inspection, Expert review
      • Heuristic Evaluation
      • Cognitive walk-through
      • GOMS analysis
  • Empirical Methods:
      • Usability Testing
        • Field or lab
        • Observation, problem identification
      • Controlled Experiment
        • Formal controlled scientific experiment
        • Comparisons, statistical analysis
User Interface Metrics
  • Ease of learning
      • learning time, …
  • Ease of use
      • performance time, error rates, …
  • User satisfaction
      • surveys…

Not “user friendly”
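A minimal sketch (in Python, with made-up session data) of how these metrics might be tallied after a test; the numbers and record layout are hypothetical, not from any real study:

```python
from statistics import mean

# (task_time_secs, error_count, satisfaction_1_to_7) per user -- hypothetical
sessions = [
    (142.0, 3, 5),
    (118.5, 1, 6),
    (205.2, 7, 3),
]

perf_times = [s[0] for s in sessions]
errors = [s[1] for s in sessions]
ratings = [s[2] for s in sessions]

print(f"mean task time:    {mean(perf_times):.1f} s")  # ease of use
print(f"mean errors/task:  {mean(errors):.1f}")        # ease of use
print(f"mean satisfaction: {mean(ratings):.1f} / 7")   # user satisfaction
```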

Usability Testing
  • Formative: helps guide design
  • Early in design process
      • once the architecture is finalized, it's too late!
  • A few users
  • Usability problems, incidents
  • Qualitative feedback from users
  • Quantitative usability specification
Usability Test Setup
  • Set of benchmark tasks
      • Easy to hard, specific to open-ended
      • Coverage of different UI features
      • E.g. “find the 5 most expensive houses for sale”
      • Different types: learnability vs. performance
  • Consent forms
      • Not needed unless video-taping user’s face (new rule)
  • Experimenters:
      • Facilitator: instructs the user
      • Observers: take notes, collect data, videotape the screen
      • Executor: runs the prototype if it is faked
  • Users
      • 3-5 users, quality not quantity
Usability Test Procedure
  • Goal: mimic real life
      • Do not cheat by showing them how to use the UI!
  • Initial instructions
      • “We are evaluating the system, not you.”
  • Repeat:
      • Give user a task
      • Ask user to “think aloud”
      • Observe, note mistakes and problems
      • Avoid interfering, hint only if completely stuck
  • Interview
      • Verbal feedback
      • Questionnaire
  • ~1 hour / user
Usability Lab
  • E.g. McBryde 102
Data
  • Note taking
      • E.g. “&%$#@ user keeps clicking on the wrong button…”
  • Verbal protocol: think aloud
      • E.g. user thinks that button does something else…
  • Rough quantitative measures
      • HCI metrics: e.g. task completion time, …
  • Interview feedback and surveys
  • Video-tape screen & mouse
  • Eye tracking, biometrics?
Analyze
  • Initial reaction:
      • “stupid user!”, “that’s developer X’s fault!”, “this sucks”
  • Mature reaction:
      • “how can we redesign UI to solve that usability problem?”
      • the user is always right
  • Identify usability problems
      • Learning issues: e.g. can’t figure out or didn’t notice feature
      • Performance issues: e.g. arduous, tiring to solve tasks
      • Subjective issues: e.g. annoying, ugly
  • Problem severity: critical vs. minor
Cost-Importance Analysis
  • Importance 1-5: (task effect, frequency)
      • 5 = critical, major impact on user, frequent occurrence
      • 3 = user can complete task, but with difficulty
      • 1 = minor problem, small speed bump, infrequent
  • Ratio = importance / cost
      • Sort by this ratio (see the sketch below)
      • 3 categories: Must fix, next version, ignored
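The sketch referenced above, in Python; the problems, importance scores, cost estimates, and category thresholds are all hypothetical, chosen only to illustrate the ranking:

```python
problems = [
    # (description, importance 1-5, estimated fix cost in person-days)
    ("zoom feature not discoverable", 5, 2.0),
    ("confusing icon labels",         3, 0.5),
    ("slow redraw on large maps",     4, 10.0),
    ("ugly splash screen",            1, 1.0),
]

# Ratio = importance / cost; sort so the best fixes come first.
ranked = sorted(problems, key=lambda p: p[1] / p[2], reverse=True)

for desc, imp, cost in ranked:
    ratio = imp / cost
    # Thresholds are arbitrary, for illustration only.
    bucket = ("must fix" if ratio >= 2
              else "next version" if ratio >= 0.5
              else "ignore")
    print(f"{ratio:5.2f}  {bucket:12s}  {desc}")
```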
Refine UI
  • Simple solutions vs. major redesigns
  • Solve problems in order of: importance/cost
  • Example:
      • Problem: user didn’t know he could zoom in to see more…
      • Potential solutions:
        • Better zoom button icon, tooltip
        • Add a zoom bar slider (like MOOsburg)
        • Icons for different zoom levels: boundaries, roads, buildings
        • NOT: more “help” documentation!!! You can do better.
  • Iterate
      • Test, refine, test, refine, test, refine, …
      • Until? Meets usability specification
Project: Usability Evaluation
  • Usability Evaluation:
      • >=3 users: Not (tainted) HCI students
      • Simple data collection (Biometrics optional!)
      • Exploit this opportunity to improve your design
  • Report:
      • Procedure (users, tasks, specs, data collection)
      • Usability problems identified, specs not met
      • Design modifications
Usability Test vs. Controlled Experiment
  • Usability test:
      • Formative: helps guide design
      • Single UI, early in design process
      • Few users
      • Usability problems, incidents
      • Qualitative feedback from users
  • Controlled experiment:
      • Summative: measure final result
      • Compare multiple UIs
      • Many users, strict protocol
      • Independent & dependent variables
      • Quantitative results, statistical significance
What is Science?
  • Measurement
  • Modeling
Scientific Method
  • Form Hypothesis
  • Collect data
  • Analyze
  • Accept/reject hypothesis
  • How to “prove” a hypothesis in science?
      • Easier to disprove things, by counterexample
      • Null hypothesis = opposite of hypothesis
      • Disprove null hypothesis
      • Hence, the hypothesis is "proved" (strictly: supported)
Empirical Experiment
  • Typical question:
      • Which visualization is better in which situations?

[Figure: Spotfire vs. TableLens.]

Cause and Effect
  • Goal: determine “cause and effect”
      • Cause = visualization tool (Spotfire vs. TableLens)
      • Effect = user performance time on task T
  • Procedure:
      • Vary cause
      • Measure effect
  • Problem: random variation
      • Cause = vis tool OR random variation?

[Figure: random variation intervenes between the real world and the collected data, leading to uncertain conclusions.]

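To see why this matters, here is a small simulated illustration (not real Spotfire/TableLens data): both "tools" draw task times from the very same distribution, yet the measured averages still differ, purely by chance.

```python
import random

random.seed(1)

def run_user(mean_secs=60.0, sd=15.0):
    """Simulated task time for one user; same distribution for both tools."""
    return random.gauss(mean_secs, sd)

spotfire = [run_user() for _ in range(20)]
tablelens = [run_user() for _ in range(20)]  # identical distribution!

diff = sum(spotfire) / len(spotfire) - sum(tablelens) / len(tablelens)
print(f"measured difference: {diff:.1f} s")  # nonzero, by random variation alone
```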
Stats to the Rescue
  • Goal:
      • Measured effect is unlikely to result from random variation
  • Hypothesis:
      • Cause = visualization tool (e.g. Spotfire ≠ TableLens)
  • Null hypothesis:
      • Visualization tool has no effect (e.g. Spotfire = TableLens)
      • Hence: Cause = random variation
  • Stats:
      • If the null hypothesis is true, then the measured effect occurs with probability < 5% (e.g. measured effect >> random variation)
  • Hence:
      • Null hypothesis unlikely to be true
      • Hence, hypothesis likely to be true
Variables
  • Independent Variables (what you vary) and treatments (the variable values); see the sketch after this list:
      • Visualization tool
          • Spotfire, TableLens, Excel
      • Task type
          • Find, count, pattern, compare
      • Data size (# of items)
          • 100, 1000, 1000000
  • Dependent Variables (what you measure)
      • User performance time
      • Errors
      • Subjective satisfaction (survey)
      • HCI metrics
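The sketch referenced above: a small Python illustration that enumerates the design as the cross product of the treatments listed on this slide (the treatment values come from the slide; everything else is illustrative):

```python
from itertools import product

vis_tools = ["Spotfire", "TableLens", "Excel"]
task_types = ["find", "count", "pattern", "compare"]
data_sizes = [100, 1000, 1000000]

# Every combination of treatments is one cell of the design.
cells = list(product(vis_tools, task_types, data_sizes))
print(f"{len(vis_tools)} x {len(task_types)} x {len(data_sizes)} design "
      f"= {len(cells)} cells")
for tool, task, size in cells[:3]:  # first few cells
    print(f"tool={tool}, task={task}, n_items={size}")
```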
Example: 2 x 3 design
  • n users per cell

[Table: rows = Ind Var 1: Vis. Tool; columns = Ind Var 2: Task Type; each cell holds the measured user performance times (dep var) for its n users.]

Groups
  • “Between subjects” variable
      • 1 group of users for each variable treatment
      • Group 1: 20 users, Spotfire
      • Group 2: 20 users, TableLens
      • Total: 40 users, 20 per cell
  • “Within-subjects” (repeated) variable
      • All users perform all treatments
      • Counter-balance order effects (see the sketch below)
      • Group 1: 20 users, Spotfire then TableLens
      • Group 2: 20 users, TableLens then Spotfire
      • Total: 40 users, 40 per cell
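The sketch referenced above: a minimal Python illustration of the within-subjects assignment, alternating the two presentation orders so they are counterbalanced (the user IDs are made up):

```python
users = [f"user{i:02d}" for i in range(1, 41)]  # 40 users

order_a = ["Spotfire", "TableLens"]   # group 1 order
order_b = ["TableLens", "Spotfire"]   # group 2 order

assignments = {}
for i, user in enumerate(users):
    # Alternate orders so each order gets 20 users.
    assignments[user] = order_a if i % 2 == 0 else order_b

print(assignments["user01"], assignments["user02"])
```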
Issues
  • Eliminate or measure extraneous factors
  • Randomization
  • Fairness
      • Identical procedures, …
  • Bias
  • User privacy, data security
  • IRB (Institutional Review Board)
Procedure
  • For each user:
      • Sign legal forms
      • Pre-Survey: demographics
      • Instructions
          • Do not reveal true purpose of experiment
      • Training runs
      • Actual runs
          • Give task
          • Measure performance
      • Post-Survey: subjective measures
  • Repeat for all n users
Data
  • Measured dependent variables
  • Spreadsheet: [screenshot of the collected data]
Step 1: Visualize it
  • Dig out interesting facts
  • Qualitative conclusions
  • Guide stats
  • Guide future experiments
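A plotting sketch of this step, assuming matplotlib is available; the times are simulated stand-ins for real measurements:

```python
import random
import matplotlib.pyplot as plt

random.seed(2)
spotfire = [random.gauss(65, 15) for _ in range(20)]
tablelens = [random.gauss(55, 15) for _ in range(20)]

# Plot every measured time per tool before running any statistics.
plt.boxplot([spotfire, tablelens])
plt.xticks([1, 2], ["Spotfire", "TableLens"])
plt.ylabel("task completion time (secs)")
plt.title("Show me the data, not just the averages")
plt.show()
```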
Step 2: Stats

[Table: average user performance times (dep var) by Ind Var 1: Vis. Tool and Ind Var 2: Task Type.]

TableLens better than Spotfire?
  • Problem with Averages: lossy
      • Compares only 2 numbers
      • What about the 40 data values? (Show me the data!)

[Chart: average performance time (secs), Spotfire vs. TableLens.]

The real picture
  • Need stats that compare all data

[Chart: all individual performance times (secs), Spotfire vs. TableLens.]

Statistics
  • t-test
      • Compares 1 dep var on 2 treatments of 1 ind var
  • ANOVA: Analysis of Variance
      • Compares 1 dep var on n treatments of m ind vars
  • Result:
      • p = probability that difference between treatments is random (null hypothesis)
      • “statistical significance” level
      • typical cut-off: p < 0.05
      • Hypothesis confidence = 1 - p (see the sketch below)
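The sketch referenced above: a minimal t-test in Python using scipy, with simulated performance times standing in for real measurements, checked against the p < 0.05 cut-off:

```python
import random
from scipy import stats

random.seed(3)
spotfire = [random.gauss(65, 12) for _ in range(20)]
tablelens = [random.gauss(52, 12) for _ in range(20)]

# Compare 1 dep var on 2 treatments of 1 ind var.
t, p = stats.ttest_ind(spotfire, tablelens)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("statistically significant: reject the null hypothesis")
else:
    print("no detectable difference: cannot reject the null hypothesis")

# For n treatments of m ind vars, an ANOVA is used instead;
# stats.f_oneway(a, b, c) is the one-way analogue.
```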
p < 0.05
  • Woohoo!
  • Found a “statistically significant” difference
  • Averages determine which is ‘better’
  • Conclusion:
      • Cause = visualization tool (e.g. Spotfire ≠ TableLens)
      • Vis Tool has an effect on user performance for task T …
      • “95% confident that TableLens better than Spotfire …”
      • NOT “TableLens beats Spotfire 95% of time”
      • 5% chance of being wrong!
      • Be careful about generalizing
p > 0.05
  • Hence, no difference?
      • Vis Tool has no effect on user performance for task T…?
      • Spotfire = TableLens ?
  • NOT!
      • Did not detect a difference, but could still be different
      • Potential real effect did not overcome random variation
      • Provides evidence for Spotfire = TableLens, but not proof
      • Boring, basically found nothing
  • How?
      • Not enough users (a power-analysis sketch follows)
      • Need better tasks, data, …
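One way to estimate "enough users" ahead of time is a power analysis. A sketch using statsmodels, where the medium effect size (d = 0.5) is a hypothetical guess rather than a measured value:

```python
from statsmodels.stats.power import TTestIndPower

# How many users per group to detect a medium effect (d = 0.5)
# with alpha = 0.05 and 80% power?
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"need about {n:.0f} users per group")  # roughly 64
```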
Data Mountain
  • Robertson, “Data Mountain” (Microsoft)
Data Mountain: Experiment
  • Data Mountain vs. IE favorites
  • 32 subjects
  • Organize 100 pages, then retrieve based on cues
  • Indep. Vars:
      • UI: Data mountain (old, new), IE
      • Cue: Title, Summary, Thumbnail, all 3
  • Dependent variables:
      • User performance time
      • Error rates: wrong pages, failed to find in 2 min
      • Subjective ratings
Data Mountain: Results
  • Spatial Memory!
  • Limited scalability?