Presentation Transcript


  1. Evaluation Methods Jeffrey Heer · 28 April 2009

  2. Project Abstracts
  • For final version (due online Fri 5/1 @ 7am)
  • Flesh out concrete details. What will you build? If running an experiment, what factors will you vary and what will you measure? What are your hypotheses and why? Provide rationale!
  • Need to add study recruitment plan and related work sections (see http://cs376/project.html).
  • Iterate more than once! Stop by office hours to discuss.

  3. What is Evaluation?
  Something you do at the end of a project to show it works… so you can publish it.
  Part of the design-build-evaluate iterative design cycle.
  A way a discipline validates the knowledge it creates.
  … a reason papers get rejected.

  4. Establishing Research Validity “Methods for establishing validity vary depending on the nature of the contribution. They may involve empirical work in the laboratory or the field, the description of rationales for design decisions and approaches, applications of analytical techniques, or ‘proof of concept’ system implementations” - CHI 2007 Website

  5. Evaluation Methods http://www.usabilitynet.org/tools/methods.htm

  6. What to evaluate?
  • Enable previously difficult/impossible tasks
  • Improve task performance or outcome
  • Modify/influence behavior
  • Improve ease-of-use, user satisfaction
  • User experience
  • Sell more widgets
  What is the motivating research goal?

  7. UbiFit [Consolvo et al]

  8. Momento [Carter et al]

  9. Evaluation Methods in HCI
  • Inspection (Walkthrough) Methods
  • Observation, User Studies
  • Experience Sampling
  • Interviews and Surveys
  • Usage Logging
  • Controlled Experimentation
  • Fieldwork, Ethnography
  • Mixed-Methods Approaches

  10. Proof by Demonstration
  • Prove feasibility by building prototype system
  • Demonstrate that the system enables task
  • Small user study may add little insight

  11. Inspection Methods
  Often called “discount usability techniques”
  “Expert review” of user interface design
  Heuristic Evaluation (Nielsen, useit.com/papers/heuristic):
  • Visibility of system status
  • Match between system and real world
  • User control and freedom
  • Consistency and standards
  • Error prevention
  • Recognition over recall
  • Flexibility and efficiency of use
  • Aesthetic and minimalist design
  • Help users recognize, diagnose, and recover from errors
  • Help and documentation

  12. How many evaluators?
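The slide's question is presumably answered by Nielsen and Landauer's cost-benefit model for heuristic evaluation (referenced on the previous slide), in which the expected proportion of usability problems found by n evaluators is roughly 1 - (1 - λ)^n with λ ≈ 0.31. A minimal sketch of that curve, assuming this standard model; the helper name is illustrative:

    # Sketch of the Nielsen & Landauer cost-benefit curve (assumed model):
    # expected fraction of usability problems found by n evaluators is
    # 1 - (1 - lam)^n, with lam ~= 0.31 reported for heuristic evaluation.
    def problems_found(n_evaluators, lam=0.31):
        """Illustrative helper: expected fraction of problems found."""
        return 1 - (1 - lam) ** n_evaluators

    for n in range(1, 11):
        print(f"{n:2d} evaluators -> {problems_found(n):.0%} of problems found")
    # Around 5 evaluators the curve already reaches roughly 85%,
    # which is why 3-5 evaluators is the usual recommendation.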

  13. Usability Testing
  Observe people interacting with prototype
  May include:
  • Providing tasks (e.g., easy, medium, hard)
  • Talk-aloud protocol (users’ verbal reports)
  • Usage logging
  • Pre/post-study surveys
  • NASA TLX: workload assessment survey (see the sketch below)
  • QUIS: Questionnaire for User Interaction Satisfaction
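As an illustration of how a workload survey such as NASA TLX is typically scored, here is a minimal sketch assuming the standard weighted procedure (six subscales rated 0-100, weights from 15 pairwise comparisons); the function name and sample data are hypothetical:

    # Minimal sketch of a weighted NASA TLX workload score (assumed standard
    # procedure: six subscales rated 0-100, weights are how often each scale
    # was chosen in the 15 pairwise comparisons, overall score = weighted mean).
    SCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

    def tlx_score(ratings, weights):
        """Illustrative helper: ratings are 0-100, weights must sum to 15."""
        assert sum(weights[s] for s in SCALES) == 15
        return sum(ratings[s] * weights[s] for s in SCALES) / 15.0

    ratings = {"mental": 70, "physical": 20, "temporal": 55,
               "performance": 40, "effort": 65, "frustration": 50}
    weights = {"mental": 5, "physical": 0, "temporal": 3,
               "performance": 2, "effort": 4, "frustration": 1}
    print(f"Overall workload: {tlx_score(ratings, weights):.1f} / 100")  # -> 60.3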

  14. Wizard-of-Oz Techniques

  15. Controlled Experiments What are the important concerns?

  16. Controlled Experiments
  Measure response of dependent variables to manipulation of independent variables.
  Within- or between-subjects design
  • Change independent variables within or across subjects
  • Randomization, replication, blocking
  • Learning effects
  Choice of measure and statistical tests
  • t-test, ANOVA, Chi-squared (χ²), non-parametric tests (see the sketch below)
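A minimal sketch of how such a comparison might be analyzed in Python with scipy, assuming a between-subjects design and made-up task-time data:

    # Illustrative only: between-subjects comparison of task completion time
    # (seconds) under two interface conditions, using an independent-samples
    # t-test. The data are made up for the example.
    from scipy import stats

    condition_a = [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.2, 10.9]
    condition_b = [14.2, 13.5, 15.1, 12.9, 14.8, 13.7, 15.4, 14.0]

    t, p = stats.ttest_ind(condition_a, condition_b)  # between-subjects
    print(f"t = {t:.2f}, p = {p:.4f}")

    # For a within-subjects design (same participants in both conditions),
    # a paired test would be the appropriate choice:
    # t, p = stats.ttest_rel(condition_a, condition_b)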

  17. Experimental Desiderata
  p-value: probability of obtaining the observed results by chance (under the null hypothesis)
  Type I Error: accept a spurious result
  • Bonferroni’s principle: if you run enough significance tests, you’ll eventually get lucky (see the sketch below)
  Type II Error: mistakenly reject a real result
  • Inappropriate measure or test?
  Statistical vs. practical significance
  • N = 1000, p < 0.001, avg dt = 0.12 sec.
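A minimal sketch of how a Bonferroni correction guards against this, with illustrative p-values:

    # Illustrative only: with m significance tests, compare each p-value
    # against alpha / m (equivalently, multiply each p by m) to hold the
    # family-wise Type I error rate at alpha.
    alpha = 0.05
    p_values = [0.004, 0.021, 0.049, 0.380]  # made-up results of m = 4 tests
    m = len(p_values)

    for p in p_values:
        corrected = min(p * m, 1.0)
        verdict = "significant" if corrected < alpha else "not significant"
        print(f"raw p = {p:.3f} -> corrected p = {corrected:.3f} ({verdict})")

    # p = 0.049 passes an uncorrected test but not the corrected one:
    # run enough tests and you will eventually "get lucky".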

  18. Internal Validity
  Internal validity: is a causal relation between two variables properly demonstrated?
  • Confounds: is there another factor at play?
  • Selection (bias): appropriate subject population?
  • Experimenter bias: researcher actions

  19. External Validity
  External validity: do the results generalize to other situations or populations?
  • Subjects: do subjects’ aptitudes interact with independent variables?
  • Situation: time, location, lighting, duration

  20. Ecological Validity
  The degree to which the methods, materials, and setting of the study approximate the real-life situation under investigation.
  • Flight simulator vs. flying a plane
  • Simulated community activity vs. open web

  21. Next Time… Distributed Cognition
  • Donald Norman, “The Power of Representation,” in Things That Make Us Smart, 1993, pp. 43-76.
  • David Kirsh and Paul Maglio, “On Distinguishing Epistemic from Pragmatic Action,” Cognitive Science, 1994, pp. 513-549.
