User Interface Evaluation



  1. User Interface Evaluation Dewan Tanvir Ahmed, PhD University of Ottawa dahmed@site.uottawa.ca

  2. Outline
  • Objectives of User Interface Evaluation
  • Evaluation Methods
  • A Preliminary Case Study: Hotel Reservations
  • Overview of Interface Evaluation Methods
  • Videotaped Evaluation
  • Experiments
  • Cognitive Walkthroughs
  • Summary

  3. Objectives of User Interface Evaluation
  • Gould and Lewis, 1985: the goal of evaluation is to provide feedback during software development, thus supporting an iterative development process.
  • Key objective of both UI design and evaluation: minimize malfunctions
  • Key reasons for focusing on evaluation:
    • Without evaluation, designers would be working “blindfolded”
    • Designers would not really know whether they are solving customers’ problems in the most productive way

  4. Objectives of User Interface Evaluation (cont’d)
  Questions answered by various evaluation techniques:
  • What is the user’s real task? – understanding the real world
    • Prevent later malfunctions by doing evaluation as part of requirements analysis
    • Inappropriate tasks/requirements are a major source of malfunctions which can be detected here
  • What problems do or might users experience with the UI?
    • Directly find malfunctions
  • Which of several alternative UIs is better? – comparing designs
    • Pick the version that leads to fewer malfunctions
  • Has the UI met usability targets? – engineering toward a target
    • Ensure that malfunction counts are sufficiently low
    • Product is as good as competitors’ products and as older versions
  • Does the UI conform to standards?
    • Leverage collective wisdom to reduce malfunctions

  5. Objectives of User Interface Evaluation (cont’d)
  • But in order to give feedback to designers... we must understand why a malfunction occurs
  • Malfunction analysis:
    • Determine why a malfunction occurs
    • Determine how to eliminate malfunctions

  6. From http://www.usability.uk.com/images/cartoons/cart5.htm

  7. When to Evaluate? • A product is evaluated during its entire life cycle.

  8. Evaluation Methods
  • Formative evaluation:
    • Used when designing and maintaining software that we are developing ourselves
    • Conducted during the development of a product in order to form or influence design decisions
  • Summative evaluation:
    • Used when judging a finished product developed by someone else
    • Conducted after the product is finished, to ensure that it possesses certain quality, meets certain standards, or satisfies certain requirements set by the sponsors or other agencies

  9. A Preliminary Case Study: Hotel Reservations
  • UI evaluation for Forte Travelodge, performed in a special usability lab
  • Aims:
    • Identify and eliminate malfunctions, hence making the system easier to use and avoiding business difficulties caused by these malfunctions
    • Develop improved training material and documentation, to prevent potential malfunctions by teaching users how to avoid them
  • Setup of the IBM usability lab:
    • Resembles a TV studio: microphones, video equipment, one-way mirror
    • Technicians and observers sit on one side; users sit on the other side in a realistic environment
    • User environment resembles a reception desk and is non-threatening

  10. A Preliminary Case Study: Hotel Reservations (cont’d)
  • Aspects of the system to be evaluated:
    • How quickly can a booking be made (while the operator is on the telephone)?
    • Is each screen productive to use?
    • Are help and error messages effective?
    • Can non-computer-literate operators use the system?
    • Is complexity minimized?
    • Are training and documentation effective?

  11. A Preliminary Case Study: Hotel Reservations (cont’d)
  • Procedure:
    • 15 common task scenarios developed, among others: basic registration, cancellation, request for a specific room, extension of an existing stay, etc.
    • Four days of testing with multiple users performing various sets of tasks
    • Users were told the evaluation is of the system, not of them
    • All actions were recorded; debriefing sessions held
    • Videos then analyzed for malfunctions: 62 identified
  • Priorities:
    • Navigation speed needs improvement
    • Screen titles and formats need tuning
    • Hard to refer to documentation
    • Physical difficulties with telephone headsets and furniture

  12. A Preliminary Case Study: Hotel Reservations (cont’d)
  • Results:
    • Higher productivity of booking staff: tasks completed more quickly, guest requirements better met
    • Training costs kept low
    • Morale kept high
    • More customers booked by phone: from 14,500 to 27,000 per week

  13. Overview of Interface Evaluation Methods
  • Three types of methods:
    • Passive evaluation, e.g. logs
    • Active evaluation, e.g. experiments
    • Predictive evaluation / usability inspections, e.g. heuristics
  • All types of methods are useful for optimal results; they are used in parallel
  • All attempt to prevent malfunctions
  • Before trying methods, do pilot studies¹ first
  ¹ A pilot experiment, also called a pilot study, is a small-scale preliminary study conducted before the main research, in order to check feasibility or to improve the design of the research.

  14. Passive Evaluation
  • Usage of software is monitored
  • Performed while prototyping, in alpha test and later
  • Does not actively seek malfunctions: it only finds them when they happen to occur, so infrequent (but possibly severe) malfunctions may be missed
  • Generally requires realistic use of a system
  • Users become frustrated with malfunctions

  15. Passive Evaluation – Gathering Information
  • Problem report monitoring
    • Users should have an easy way to register their frustrations / suggestions
    • Best if integrated with the software
  • Automatic software logs
    • Can gather much data about usage: command frequency, error frequency and pre-error patterns, undone operations (a sign of malfunctions)
    • Privacy is a concern
    • System must be designed for testability (DFT)
    • Logs can record just keystrokes and mouse clicks, or full details of the interaction; the latter makes accurate playback easier
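  The kind of automatic log described above needs very little code. Below is a minimal sketch in Python; the event names, log format, and file path are invented for illustration (the slides do not prescribe any), and a real system would hook into the UI toolkit's event loop.

```python
# Minimal interaction-logging sketch (hypothetical event names and format).
# Privacy is a concern, so a real deployment would also need user consent.
import json
import time
from collections import Counter

LOG_PATH = "ui_events.log"  # assumed location

def log_event(event_type: str, detail: str = "") -> None:
    """Append one timestamped UI event (command, keystroke, click, error)."""
    record = {"t": time.time(), "type": event_type, "detail": detail}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

def frequencies(event_type: str, path: str = LOG_PATH) -> Counter:
    """Count occurrences per detail string, e.g. command or error frequency."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if record["type"] == event_type:
                counts[record["detail"]] += 1
    return counts

# Example: an undo right after a command can be a sign of a malfunction.
log_event("command", "add_to_inventory")
log_event("command", "undo")
print(frequencies("command"))
```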

  16. Passive Evaluation – Gathering Information (cont’d)
  • Questionnaires / surveys
    • Useful for obtaining statistical data from large numbers of users
    • Proper statistical means are needed to analyze results
    • Gather subjective data about the importance of malfunctions:
      • Automated logs omit importance
      • Less frequent malfunctions may be more important
      • Users can prioritize needed improvements
    • Limit the number of questions
    • Very hard to phrase questions well
    • Questions can be closed- or open-ended
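  As one illustration of "proper statistical means", the sketch below summarizes closed-ended severity ratings with means and rough 95% confidence intervals. The questions, the 1–5 scale, and the data are all invented for the example:

```python
# Summarizing closed-ended questionnaire items (invented sample data).
# A 1 (minor) to 5 (severe) rating per malfunction lets users signal
# importance, which automated logs cannot capture.
from math import sqrt
from statistics import mean, stdev

ratings = {
    "navigation speed": [4, 5, 4, 3, 5, 4],
    "screen titles":    [2, 3, 2, 2, 3, 2],
}

for question, xs in ratings.items():
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    ci = 1.96 * s / sqrt(n)  # rough 95% confidence half-width
    print(f"{question}: mean {m:.2f} +/- {ci:.2f} (n={n})")
```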

  17. Active Evaluation
  • Actively study specific activities performed by users
  • Performed when prototyping and later
  Gathering information:
  • Experiments and usability engineering
    • Prove hypotheses about measurable attributes of one or more UIs, e.g. speed / learning / accuracy / frustration
    • In usability engineering, test against preset targets
    • Can be expensive; knowledge of statistics needed; hard to control for all variables
  • Observation sessions
    • Also called ‘interpretive evaluation’
    • Simple observation or cooperative evaluation
    • Described in detail later

  18. Predictive Evaluation
  • Points:
    • Possibly studies of the system by experts rather than users
    • Performed once the UI is specified, and later; useful even before a prototype is developed
    • Can eliminate many malfunctions before users ever see the software
    • Also called ‘usability inspections’
  • Examples:
    • Heuristic approach
    • Cognitive walkthrough

  19. Predictive Evaluation – Gathering Information
  • Heuristic evaluation
    • An informal usability inspection
    • Based on a UI design principle document: analyze whether each guideline is adhered to in the context of the task and users
    • Can also look at adherence to standards
    • Advantages: low cost, shallow learning curve, can generate effective evaluations without usability professionals
  • Cognitive walkthroughs
    • Step-by-step analysis of: the steps in the task being performed, the goals users form to perform these tasks, and how the system leads the user through the tasks

  20. Summary of Evaluation Techniques

  21. Videotaped Evaluation
  • A software engineer studies users who are actively using the user interface
  • The sessions are videotaped
    • To observe what problems users have, rather than to measure numbers
    • Can be done in the user’s environment
  • Activities of the user:
    • Performs pre-defined tasks, with or without detailed instructions on how to perform them
    • Preferably talks to herself as if she were alone in a room, yielding a ‘think-aloud protocol’
  • The process is called ‘co-operative evaluation’ when the software engineer and the user talk to each other

  22. Videotaped Evaluation (cont’d)
  • The importance of video:
    • Without it, ‘you see what you want to see’: you interpret what you see based on your mental model
    • In the ‘heat of the moment’ you miss many things
    • Minor details (e.g. body language) are captured
    • You can repeatedly analyze, looking for different problems
  • Tips for using video:
    • Several cameras are useful
    • Software is available to help analyze video by dividing it into segments and labeling the segments
    • Evaluation can be time consuming, so plan it carefully

  23. Steps for Videotaped Evaluation
  • Select 6 to 8 representative users per user class, e.g. client, salesperson, manager, accounts receivable
  • Invite them to individual sessions
    • Sessions should last 30-90 minutes; schedule 4-6 per day
  • If the system involves the user's clients in the interaction: have users bring important clients, or have staff pretend to be clients
  • Select facilitators/observers and note takers

  24. Steps for Videotaped Evaluation (cont’d)
  • Prepare tasks:
    • Select the most commonly used tasks plus a few less important tasks
    • Write task instructions for users
    • Estimate the time it will take to complete each task, plus extra time for discussion
  • Prepare a notebook or form for organizing notes
  • Set up and test equipment: hardware on which to run the system, audio or video recorder, software logs
  • Do a dry run (pilot study)!

  25. Steps for Videotaped Evaluation (cont’d)
  • At the start of an observation session:
    • Have the user sign an informed consent form: very important
    • Explain:
      • the nature of the project
      • anticipated user contributions and why the user's views are important
      • that the focus is on evaluating the user interface, not the user
      • that all notes, logs, etc., are confidential
      • that the user can withdraw at any time
      • the usage of devices
      • relax!
    • Get the user verbalizing as they perform each task (thinking aloud)
    • For co-operative evaluation, the software engineer also verbalizes
    • Appropriate questions to be posed by the observing software engineer:

  26. Steps for Videotaped Evaluation (cont’d)

  27. Steps for Videotaped Evaluation (cont’d)
  • Hold a wrap-up interview (de-briefing):
    • What were the most significant problems?
    • What was the most difficult to learn?
    • Etc.
  • Analyze the videotape to find malfunctions
  • Lab exercise: videotaped evaluation of a software product

  28. Experiments
  1. Pick a set of subjects (users)
    • A good mix, to avoid biases
    • A sufficient number, to get statistical significance and to avoid random happenings affecting results
  2. Pick variables to test
    • Independent: manipulated to produce different conditions
      • There should not be too many, they should not affect each other too much, and make sure there are no hidden variables
    • Dependent: measured values affected by the independent variables
  3. Develop a hypothesis
    • A prediction of the outcome; the aim of the experiment is to show it is correct
    • E.g. some change in an independent variable causes some change in a dependent variable

  29. Experiments (cont’d)
  4. Design experiments to test hypotheses
    • Create a null (inverse) hypothesis: a change in the independent variable causes no change in the dependent variable
      • The null hypothesis typically corresponds to a general or default position, like the presumption in most legal systems that a defendant is innocent
    • Aim to disprove the null hypothesis!
  5. Conduct experiments
  6. Statistically analyze results to draw conclusions
    • E.g. using ‘t-tests’
    • Conclusions will be correct within a margin of error 19 times out of 20
    • Example: the margin of error for a sample of 400 is approximately plus or minus five percentage points, 19 times out of 20
  7. Decide what action to take based on conclusions
  Def.: A t-test is a statistical tool used to determine whether a significant difference exists between the means of two distributions, or between the mean of one distribution and a target value.
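  To make steps 4-6 concrete, here is a small sketch that runs a two-sample t-test and reproduces the margin-of-error figure quoted above. The task-time data are invented, and SciPy is chosen for convenience; the slides name no particular tool.

```python
# Step 6 sketch: independent-samples t-test on invented task times.
from scipy import stats

# Task completion times in seconds for two UI versions (hypothetical data)
ui_a = [41.2, 38.5, 44.0, 39.9, 42.3, 40.1, 43.7, 38.8]
ui_b = [35.1, 33.8, 36.4, 34.2, 37.0, 33.5, 35.9, 34.8]

t, p = stats.ttest_ind(ui_a, ui_b)
# With the conventional 0.05 threshold ("19 times out of 20"),
# p < 0.05 lets us reject the null hypothesis of no difference.
print(f"t = {t:.2f}, p = {p:.4f}",
      "-> reject null hypothesis" if p < 0.05 else "-> inconclusive")

# The margin-of-error example from the slide: for a proportion near 0.5
# and a sample of 400, the 95% margin is about five percentage points.
n, p_hat = 400, 0.5
moe = 1.96 * (p_hat * (1 - p_hat) / n) ** 0.5
print(f"margin of error for n=400: +/- {moe:.3f}")  # ~0.049, about 5 points
```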

  30. Example: Text Selection Schemes
  • Early GUI research at Xerox on the Star Workstation: traditional experiments whose results were used to develop the Macintosh
  • Goal of study: evaluate how to select text using the mouse

  31. Example: Text Selection Schemes (cont’d)
  Steps:
  1. Subjects
    • Six groups of four; in each group, only two are experienced in mouse usage
  2. Variables
    • Independent: selection schemes
      • 6 strategically chosen patterns involving: which mouse button (if any) could be double/triple/quadruple-clicked to select a character/word/sentence, which mouse button could be dragged through text, and which mouse button could adjust the start/end of a selection
    • Dependent: selection time and selection errors
  3. Hypothesis
    • Some scheme is better than all others

  32. Example: Text Selection Schemes (cont’d)
  4. Detailed experiment design
    • Null hypothesis: no difference between schemes
    • Assign a selection scheme to each group and train the group in their scheme
    • Measure task time and errors; each subject repeats the task 6 times, for a total of 24 tests per scheme
  5. Conduct experiment
  6. Analysis
    • t-test used; scheme F found to be significantly better: point and draw through with the left mouse button, adjust with the middle mouse button
  7. Action
    • Try another combination similar to scheme F, where the left mouse button can be double-clicked
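  As a sketch of how such a multi-scheme comparison might be analyzed today: with six schemes, a one-way ANOVA followed by pairwise t-tests against the best scheme is a common pattern. The timing data below are invented and truncated (the slides only report that scheme F was significantly better):

```python
# Multi-scheme comparison sketch: ANOVA across schemes, then pairwise
# t-tests against the apparent best scheme. Data are invented; the real
# study had 24 measurements per scheme across six schemes A..F.
from scipy import stats

times = {  # selection times in seconds per scheme (hypothetical, shortened)
    "A": [2.9, 3.1, 3.4, 3.0, 3.2, 3.3],
    "B": [3.0, 3.2, 3.1, 3.5, 3.3, 3.4],
    "F": [2.1, 2.3, 2.0, 2.2, 2.4, 2.1],
}

f_stat, p = stats.f_oneway(*times.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

# If the ANOVA is significant, compare the best scheme pairwise:
for name, xs in times.items():
    if name != "F":
        t, p = stats.ttest_ind(times["F"], xs)
        print(f"F vs {name}: t = {t:.2f}, p = {p:.4f}")
```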

  33. Cognitive Walkthroughs
  • A form of predictive evaluation
  • Example of the underlying idea: can you count the number of windows in your grandparents’ home? Answering requires mentally walking through the house.
  • Detailed reviews based on psychological theory:
    • A new user must form goals to execute a task
    • Evaluate how well the system leads the user to form those goals, i.e. how well the system supports the user
  • Points to note:
    • The method is highly structured; forms are provided to guide the evaluator (a sketch of such a form follows below)
    • More time consuming than ordinary heuristic evaluation, but less time consuming than experiments
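  The slides say forms guide the evaluator but do not show one. Below is a hypothetical sketch of such a form as a small data structure, one record per atomic action; all field names are invented for illustration:

```python
# Hypothetical walkthrough form: one structured record per atomic action
# keeps the review systematic and findings comparable across evaluators.
from dataclasses import dataclass, field

@dataclass
class ActionReview:
    action: str                    # e.g. "Type the part number"
    required_goals: list[str]      # goal structure the user must hold
    correct_action_evident: bool   # will the user see what to do?
    response_interpretable: bool   # will the user understand the feedback?
    problems: list[str] = field(default_factory=list)

review = ActionReview(
    action="Hit <return> or click on 'Add'",
    required_goals=["Record a received item", "Process the transaction"],
    correct_action_evident=True,
    response_interpretable=False,
    problems=["'transaction accepted' may be read as 'goal reached'"],
)
print(review)
```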

  34. Cognitive Walkthrough – Steps
  • The following steps are practiced in cognitive walkthroughs:
    • The characteristics of typical users are identified, and sample tasks are developed that encompass the aspects of the design to be evaluated.
    • The designer and expert evaluators come together and do the analysis.
    • A walkthrough is performed for each task in the context of a typical scenario, trying to answer the following questions:
      • Is the correct action sufficiently evident to the user?
      • Will the user notice the correct action, e.g. can the user see the button or menu item?
      • Will the user interpret the system’s response correctly?

  35. Cognitive Walkthrough – Steps (cont’d)
  • After the walkthrough:
    • Identify what causes the problems found, i.e. why users face difficulties, so that design changes may be made.
    • The design is then revised to solve the problems identified.

  36. Example: Cognitive Walkthrough Steps
  • Choose a task to evaluate
  • Describe the task exactly:
    • First describe the task in one sentence, using simple language; the wording should be from a first-time user’s point of view, e.g. “Record a newly-received item in inventory.”
    • Describe the initial state of the system, e.g. “Main menu is displayed”
  • List the atomic actions needed to correctly perform the task, e.g.:
    1. Click on ‘add to inventory’ in the menu.
    2. If you don’t know the part number, hit ‘return’ to look up the part number, then go to action 4.
    3. Type the part number into the ‘part number’ field.
    4. Press tab.
    5. Type the number of items in the ‘Number’ field.
    6. Hit <return> or click on ‘Add’.
    7. If the system prints out a bar-code sticker, affix it to the new item.

  37. Example: Cognitive Walkthrough Steps (cont’d)
  • Describe the classes of users who may perform the task, e.g. Receiver: knows about inventory, but not yet about the system
  • Describe the ‘goal structure’ (or task structure) users would likely have in their minds before starting the task
    • High-level and system-independent; indent sub-goals/subtasks
    • Note: if there are actions for which the user has no goals, the system must stimulate the user to think of these goals by the time they must perform the actions
    • If different classes of user may have different goal structures, list these too, e.g.:
      Record a received item in inventory
        Start the inventory program
        Enter the item

  38. Example: Cognitive Walkthrough Steps (cont’d)
  • For each action specified in step 2c, do the following (I to IV):
    I. Write down the goal structure that the user would need to have in order to perform the action correctly, e.g. for action 4:
      Record a received item in inventory
        Record the number of items
          Press tab
          Enter the number
        Cause the system to process the transaction

  39. Example: Cognitive Walkthrough Steps (cont’d)
    II. Verify that the user will have the correct goal structure, given their initial goals and the system’s response to the previous action. Estimate the percentage of users who might have each of the following possible problems:
      • Failure to add goals
        • e.g. for action 2, the system must make it clearly visible that pressing return with nothing entered will invoke a lookup mechanism
      • Failure to drop goals
        • e.g. the user may have a goal to notify the person who ordered the parts; this would not be needed if the system performs this automatically
      • Addition of spurious goals
        • e.g. there may be a field marked ‘Description’; however, this only needs to be filled in if the type of item is not in the database

  40. Example: Cognitive Walkthrough Steps (cont’d)
      • No-progress impasse
        • e.g. after adding an item, the system might just clear the screen, ready for another entry; the user may think the transaction failed (i.e. the goal was not achieved)
      • Premature loss of goals
        • e.g. the user enters an item and hits “return”; a message ‘transaction accepted’ is printed (meaning the transaction has been started); the user powers off the computer thinking the goal is reached, and the system never gets around to printing the label

  41. Example: Cognitive Walkthrough Steps (cont’d)
    III. Verify that the actions match the goals. Possible problems:
      • Correct action doesn’t match the goal
        • e.g. the user wants to delete an item that was stolen; the correct action is to select ‘add to inventory’ and specify a negative number, but the system does not help the user match the goal to this action
      • Incorrect actions match goals
        • e.g. the user wants to add a new type of item to inventory (for which no items have yet been received); upon seeing ‘add to inventory’, the user selects this incorrect menu item
    IV. Verify that the user can physically perform the action. Possible problems:
      • Physical difficulties, e.g. recognizing an icon, or holding down shift-ctrl-alt-a to perform a command
      • Time-outs, i.e. the user runs out of time and the system gives up

  42. Summary
  • Objective of evaluation: minimize malfunctions
  • Key questions: What is the real task? What problems occur? Which design is better? Are usability targets met? Does it conform to standards?
  • Visibility and feedback
  • Formative vs. summative evaluation
  • Passive methods:
    • Problem reporting
    • Software logging
    • Questionnaires/surveys

  43. Summary (cont’d)
  • Active methods:
    • Traditional experiments
      • Investigate a single UI element
      • Pick subjects, independent and dependent variables, and hypotheses
      • Experimental designs: independent subjects, matched subjects (control for differences among subjects), repeated measures (reuse subjects)
    • Observation sessions (videotaped evaluation)
      • Study active use on realistic tasks
      • Think-aloud protocol on video
      • Co-operative evaluation involves dialogue

  44. Summary (cont’d)
  • Predictive evaluation: involve experts
  • Cognitive walkthroughs: goals and actions
    • Describe the task, actions, users, and goal structure
    • For each action, verify that users:
      ... add and drop goals as needed
      ... don’t add unneeded goals
      ... can tell when a goal is reached
      ... don’t drop needed goals
      ... can see what action to take
      ... are not misled into taking a wrong action
      ... have no physical difficulties with the action

  45. Thanks! Dewan Tanvir Ahmed
