  1. ITCS 6010 VUI Evaluation

  2. Summative Evaluation • Evaluation of the interface after it has been developed. • Typically performed only once at the end of development. Rarely used in practice. • Not very formal. • Data is used in the next major release.

  3. Formative Evaluation • Evaluation of the interface as it is being developed. • Begins as soon as possible in the development cycle. • Typically, formative evaluation appears as part of prototyping. • Extremely formal and well organized.

  4. Formative Evaluation • Performed several times. • On average, 3 major evaluation cycles, each followed by iterative redesign, per released version. • The first major cycle produces the most data. • Subsequent cycles should produce less data, if you did it right.

  5. Formative Evaluation Data • Objective Data • Directly observed data. • The facts! • Subjective Data • Opinions, generally of the user. • Sometimes this is a hypothesis that leads to additional experiments.

  6. Formative Evaluation Data • Subjective data is critical for VUIs.

  7. Formative Evaluation Data • Quantitative Data • Numeric • Performance metrics, opinion ratings (Likert Scale) • Statistical analysis • Tells you that something is wrong. • Qualitative Data • Non-numeric • User opinions, views, or lists of problems/observations • Tells you what is wrong.
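
  To make the split concrete, here is a minimal Python sketch (the Likert ratings and comments are invented for illustration, not data from any study): the numbers flag that something is wrong, while the comments point at what is wrong.

  # Minimal sketch: summarizing invented Likert ratings next to qualitative notes.
  from statistics import mean, median, stdev

  likert_ratings = [2, 3, 2, 4, 1, 2, 3]   # 1 = very dissatisfied ... 5 = very satisfied
  qualitative_notes = [
      "Confirmation prompt is too long",
      "Caller did not realize the system was still listening",
  ]

  # Quantitative: a low central tendency tells you THAT something is wrong.
  print(f"n={len(likert_ratings)}  mean={mean(likert_ratings):.2f}  "
        f"median={median(likert_ratings)}  stdev={stdev(likert_ratings):.2f}")

  # Qualitative: the comments tell you WHAT is wrong.
  for note in qualitative_notes:
      print("-", note)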

  8. Formative Evaluation Data • Not all subjective data are qualitative. • Not all objective data are quantitative. • Quantitative Subjective Data • Likert Scale of how a user feels about something. • Qualitative Objective Data • Benchmark task performance measurements where the outcome is the expert’s opinion on how users performed.

  9. Steps in Formative Evaluation • State hypothesis and design the experiment. • Conduct the experiment. • Collect the data. • Analyze the data. • Draw your conclusions & establish hypotheses. • Redesign and do it again.

  10. Experiment Design • Subject selection • Who are your participants? • What are the characteristics of your participants? • What skills must the participants possess? • How many participants do you need (5, 8, 10, …)? • Do you need to pay them?

  11. Experiment Design • Task Development • What tasks do you want the subjects to perform using your interface? • What do you want to observe for each task? • What do you think will happen? • Benchmarks? • What determines success or failure?

  12. Experiment Design • Protocol & Procedures • What can you say to the user without contaminating the experiment? • What are all the necessary steps needed to eliminate bias? • You want every subject to undergo the same experiment. • Do you need consent forms (IRB)?

  13. Experiment Trials • Calculate Method Effectiveness • Sears, A. (1997). “Heuristic Walkthroughs: Finding the Problems Without the Noise,” International Journal of Human-Computer Interaction, 9(3), 213-234. • Follow protocol and procedures. • Don’t say “say” in your task instructions; telling participants what to say will bias or contaminate your experiment. • Pilot Study • Expect the unexpected.

  14. Experiment Trials • Pilot Study • An initial run of a study (e.g. an experiment, survey, or interview) for the purpose of verifying that the test itself is well-formulated. For instance, a colleague or friend can be asked to participate in a user test to check whether the test script is clear, the tasks are not too simple or too hard, and that the data collected can be meaningfully analyzed. • (see http://www.usabilityfirst.com/ )

  15. Experiment Trials – Pilot Study • Wizard of Oz • You play the “Wizard,” or system. • Users call in and interact with the Wizard, who pretends to be the system.
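
  A Wizard-of-Oz pilot needs little more than a script for the Wizard and a way to log each turn. The console loop below is a hedged, text-only sketch of that logging (the file name and prompts are hypothetical); a real VUI pilot would of course run over the phone or a microphone.

  # Hedged sketch: logging turns in a text-only Wizard-of-Oz pilot.
  # The "Wizard" types each system prompt by hand; an observer records the caller's reply.
  def run_wizard_session(log_file="woz_session.txt"):   # hypothetical file name
      turn = 1
      with open(log_file, "w") as log:
          while True:
              system_prompt = input(f"[wizard] system prompt for turn {turn} (blank line ends the call): ")
              if not system_prompt:
                  break
              caller_reply = input(f"[observer] what the caller said on turn {turn}: ")
              log.write(f"{turn}\tSYSTEM: {system_prompt}\tCALLER: {caller_reply}\n")
              turn += 1

  if __name__ == "__main__":
      run_wizard_session()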

  16. Data Collection • Collect more than enough data. • More is better! • Back up your data. • Secure your data.

  17. Data Analysis • Use more than one method. • All data lead to the same point. • Your different types of data should support each other. • Remember: • Quantitative data tells you something is wrong. • Qualitative data tells you what is wrong. • Experts tell you how to fix it.

  18. Measuring Method Effectiveness
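
  The deck does not reproduce the calculation on this slide. The sketch below follows the usual reading of Sears (1997): thoroughness is the share of real problems a method finds, validity is the share of reported problems that are real, and effectiveness is their product. Treat these exact definitions as an assumption and check them against the paper.

  # Hedged sketch of method-effectiveness measures in the spirit of Sears (1997).
  def thoroughness(real_found: int, real_total: int) -> float:
      # Share of the real usability problems that the method uncovered.
      return real_found / real_total

  def validity(real_found: int, reported: int) -> float:
      # Share of the reported problems that turned out to be real.
      return real_found / reported

  def effectiveness(real_found: int, real_total: int, reported: int) -> float:
      # Combined measure: a method is effective only if it is both thorough and valid.
      return thoroughness(real_found, real_total) * validity(real_found, reported)

  # Invented numbers: 12 real problems found, 20 real problems exist, 18 problems reported.
  print(round(effectiveness(12, 20, 18), 2))   # 0.6 * 0.67 -> 0.4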

  19. Redesign • Redesign should be supported by data findings. • Setup next experiment. • Sometimes it is best to keep the same experiment. • Sometimes you have to change the experiment. • Is there a flaw in the experiment or the interface?

  20. Formative Evaluation Methods • Usability Inspection Methods • Usability experts are used to inspect your system during formative evaluation. • Usability Testing Methods • Usability tests are conducted with real users under observation by experts. • Usability Inquiry Methods • Usability evaluators collect information about the user’s likes, dislikes and understanding of the interface.

  21. Usability Inspection Methods • Usability experts “inspect” your interfaces during formative evaluation. • Widely used in practice. • Often abused by developers who consider themselves to be usability experts.

  22. Usability Inspection Methods • Heuristic Evaluation • Cognitive Walkthroughs • Pluralistic Walkthroughs • Feature, Consistency & Standards Inspection

  23. Heuristic Evaluation: What is it? • Several evaluators independently evaluate the interface & come up with potential usability problems. • It is important that there be several of these evaluators and that the evaluations be done independently. • Nielsen's experience indicates that using around 5 evaluators usually uncovers about 75% of the overall usability problems.
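
  The 75% figure is consistent with the standard problem-discovery model, found(n) = 1 - (1 - p)^n, where p is the average share of problems a single evaluator finds. In the sketch below, p = 0.24 is an assumed value chosen only so the curve lands near the slide's figure at five evaluators; the real p varies from study to study.

  # Sketch of the problem-discovery curve: found(n) = 1 - (1 - p)**n.
  # p = 0.24 is an assumed per-evaluator detection rate chosen to match the ~75% claim.
  def share_found(p: float, n_evaluators: int) -> float:
      return 1 - (1 - p) ** n_evaluators

  for n in range(1, 9):
      print(n, f"{share_found(0.24, n):.0%}")
  # The curve rises quickly and then flattens (~24% with one evaluator, ~75% with five),
  # which is why adding evaluators beyond roughly five gives diminishing returns.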

  24. Heuristic Evaluation: How can I do it? • Obtain the services of 4, 5, or 6 usability experts. • Each expert will perform an independent evaluation. • Give the experts a heuristics inspection guide. • Collect the individual evaluations. • Bring the experts together and do a group heuristic evaluation. (Optional)
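
  The "Collect the individual evaluations" step usually amounts to merging each expert's independent problem list into one deduplicated set, while noting how many experts reported each problem. A small sketch with invented problems and heuristic labels:

  # Hedged sketch: merging independent heuristic-evaluation reports (invented data).
  expert_reports = {
      "expert_1": {("feedback", "No confirmation after the booking step"),
                   ("error recovery", "Caller cannot back up one step")},
      "expert_2": {("feedback", "No confirmation after the booking step"),
                   ("consistency", "Menu wording changes between prompts")},
      "expert_3": {("error recovery", "Caller cannot back up one step")},
  }

  combined = set().union(*expert_reports.values())
  for heuristic, problem in sorted(combined):
      hits = sum(1 for report in expert_reports.values() if (heuristic, problem) in report)
      print(f"[{heuristic}] {problem}  (reported by {hits} of {len(expert_reports)} experts)")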

  25. Cognitive Walkthroughs: What is it? • Cognitive walkthroughs involve one evaluator or a group of evaluators inspecting a user interface by going through a set of tasks and evaluating its understandability and ease of learning. • The input to the walkthrough also includes the user profile, especially the users' knowledge of the task domain and of the interface, and the task cases. • Based upon exploratory learning methods. • Exploration of the user interface.

  26. Cognitive Walkthroughs: What is it? • The evaluators may include • Human factors engineers • Software developers • People from marketing • Documentation, etc. • Best used in the design stage of development.

  27. Cognitive Walkthroughs: How can I do it? • During the walkthrough: • Illustrate the task and then ask a user to perform it. • Accept input from all participants: do not interrupt the demo. • After the walkthrough: • Make interface changes. • Plan the next evaluation.

  28. Pluralistic Walkthroughs: What is it? • During the design stage, a group of people: • Users • Developers • Usability Experts • Meet to perform a walkthrough.

  29. Pluralistic Walkthroughs: How can I do it? • The group meets and one person acts as coordinator. • A task is presented to the group. • Paper prototypes, screen shots, etc. are presented. • Each participant writes down comments on each interface. • After the demo, a discussion follows.

  30. Feature, Consistency & Standards Inspection: What is it? • Feature, Consistency & Standards are inspected by an expert.

  31. Feature, Consistency & Standards Inspection: How can I do it? • Feature Inspection • The expert is given use cases/scenarios and asked to inspect the system. • Consistency Inspection • The expert is asked to inspect consistency within your application. • Standards Inspection • The expert is asked to inspect compliance with standards. • Standards can be in-house, government, etc.

  32. Usability Testing Methods • Carrying out experiments to find out specific information about a design and/or product. • Basis comes from experimental psychology. • Uses statistical data methods • Quantitative and Qualitative

  33. Usability Testing Methods • During usability testing, users work on specific tasks using the interface/product and evaluators use the results to evaluate and modify the interface/product. • Widely used in practice, but often used inappropriately. • Often abused by developers who consider themselves to be usability experts. • Can be very expensive and time consuming.

  34. Usability Testing Methods • Performance Measurement • Thinking-aloud Protocol • Question-asking Protocol • Coaching Method

  35. Usability Testing Methods • Co-discovery Learning • Teaching Method • Retrospective Testing • Remote Testing

  36. Performance Measurement: What is it? • Used to collect quantitative data. • Typically, you will be looking for benchmark data. • Objectives MUST be quantifiable • 75% of users shall be able to complete the basic task in less than 30 minutes.

  37. Performance Measurement: How can I do it? • Define the goals that you expect users to achieve. • Quantify the goals, for example: • The time users take to complete a specific task. • The ratio between successful interactions and errors. • The time spent recovering from errors. • The number of user errors. • The number of commands or other features that were never used by the user. • The number of system features the user can remember during a debriefing after the test. • The proportion of users who say that they would prefer using the system over some specified competitor.
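
  Checking a quantified objective of the kind on the previous slide is then mechanical. The sketch below uses invented completion times to test whether 75% of users finished the basic task within 30 minutes.

  # Minimal sketch: testing the "75% of users within 30 minutes" objective (invented times).
  completion_minutes = [18, 25, 41, 22, 29, 35, 27, 19]

  within_limit = sum(1 for t in completion_minutes if t <= 30)
  rate = within_limit / len(completion_minutes)
  print(f"{rate:.0%} of users finished within 30 minutes "
        f"({'meets' if rate >= 0.75 else 'misses'} the 75% objective)")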

  38. Performance Measurement: How can I do it? • Get participants for the experiments • Conduct very controlled experiments • All variables must remain consistent across users • Problem with performance measurement • No qualitative data

  39. Thinking-aloud Protocol: What is it? • Technique where the participant is asked to vocalize his or her thoughts, feelings, and opinions while interacting with the product.

  40. Thinking-aloud Protocol: How can I do it? • Select the participants: who will be involved? • Select the tasks and design scenarios. • Ask the participant to perform a task using the software. • During the task, ask the user to vocalize: • Thoughts, opinions, feelings, etc.

  41. Thinking-aloud Protocol • Problem With Thinking-Aloud Protocol • Cognitive Overload • Can you walk & chew gum at the same time? • Asking the participants to do too much.

  42. Question-asking Protocol: What is it? • Similar to the Thinking-aloud protocol. • Instead of the participant spontaneously saying what they are thinking, the evaluator prompts the participant with questions while they use the system.

  43. Question-asking Protocol: How can I do it? • Select the participants: who will be involved? • Select the tasks and design scenarios. • Ask the participant to perform a task using the software.

  44. Question-asking Protocol: How can I do it? • During the task, ask the user questions about the product • Thoughts, opinions, feelings, etc. • Problem With the Question-asking Protocol • Cognitive Overload++ • Can you walk, chew gum & talk at the same time? • Asking the participants to do too much. • Added pressure when the evaluator asks questions. • Can be frustrating for novice users.

  45. Coaching Method: What is it? • A system expert sits with the participant and acts as a coach. • Expert answers the participant’s questions. • The evaluator observes their interaction.

  46. Coaching Method: How can I do it? • Select the participants: who will be involved? • Select the tasks and design scenarios. • Ask the participant to perform a task using the software in the presence of a coach/expert.

  47. Coaching Method: How can I do it? • During the task, the user will ask the expert questions about the product. • Problem With Coaching Method • In reality, there will not be a coach present. • This is good for creating a coaching system, but not for evaluating an interface.

  48. Co-Discovery Learning: What is it? • Two test users attempt to perform tasks together while being observed. • They are to help each other in the same manner as they would if they were working together to accomplish a common goal using the product. • They are encouraged to explain what they are thinking about while working on the tasks. • Thinking Aloud, but more natural because of partner.

  49. Co-Discovery Learning: How can I do it? • Select the participants: who will be involved? • Select the tasks and design scenarios. • Ask the participants to perform a task using the software.

  50. Co-Discovery Learning: How can I do it? • During the task, the users will help each other and voice their thoughts by talking to each other. • Problem With Co-Discovery Learning • Neither is an expert • The blind leading the blind.
