
Evaluating Interfaces



  1. Evaluating Interfaces • Goals of evaluation • Lab versus field-based evaluation • Evaluation methods • Design-oriented • Implementation-oriented

  2. The goals of evaluation? • To ensure that the interface behaves as we expect and meets user needs • Assess the extent of its functionality • Assess its impact on the user • Identify specific problems • Assess the ‘usability’ of the interface

  3. Laboratory studies versus field studies • Laboratory studies • The user comes to the evaluator • A well-equipped laboratory may contain sophisticated recording facilities, two-way mirrors, instrumented computers etc. • Can control or deliberately manipulate the context of use • The only option for some dangerous or extreme interfaces • But cannot reproduce the natural working context of a user's environment, especially social interaction and contingencies • Difficult to evaluate long-term use

  4. Field studies • The evaluator goes to the user • Captures actual context • Captures real working practice and social interaction • Not possible for some applications • It can also be difficult to capture data • Cannot ‘prove’ specific hypotheses

  5. Iterations • [Lifecycle diagram: Requirements → Design → Implement → Test → Maintain, situated in the context of the user, task, environment and computers] • Different kinds of evaluation are appropriate at different stages of design • Early on: formative evaluation of the design – may only involve designers and other experts • Later on: evaluation of the implementation – detailed, rigorous and with end-users

  6. Evaluation Methods • Design-oriented evaluation methods: • Cognitive walkthrough • Heuristic/expert inspections • Theory and literature review • Implementation-oriented methods: • Observation • Controlled experiments • Query techniques – interviews and surveys

  7. Cognitive Walkthrough • A predictive technique in which designers and possibly experts simulate the user’s problem-solving process at each step of the human-computer dialogue • Originated in ‘code walkthrough’ from software engineering • Used mainly to consider ‘ease of learning’ issues – especially how users might learn by exploring the interface

  8. Cognitive Walkthrough – The Stages • Begins with • A detailed description of the prototype (e.g., menu layouts) • Description of typical tasks the user will perform • A written list of the actions required to complete the tasks with the prototype • An indication of who the users are and what kind of experience and knowledge they may have

  9. For each task, evaluators step through the necessary action sequences, imagining that they are a new user and asking the following questions: • Will the user know what to do next? • Can the user see how to do it? • Will they know that they have done the right thing? • It is vital to document the walkthrough • Who did what and when • Problems that arose and severity ratings • Possible solutions

  10. A short fragment of cognitive walkthrough • Evaluating the interface to a personal desktop photocopier • A design sketch shows a numeric keypad, a "Copy" button, and a push button on the back to turn on the power. • The specification says the machine automatically turns itself off after 5 minutes of inactivity. • The task is to copy a single page, and the user could be any office worker. • The actions the user needs to perform are to turn on the power, put the original on the machine, and press the "Copy" button. • Now tell a believable story about the user's motivation and interaction at each action … • From Philip Craiger's page at http://istsvr03.unomaha.edu/gui/cognitiv.htm

  11. The user wants to make a copy and knows that the machine has to be turned on. So they push the power button. Then they go on to the next action. But this story isn't very believable. We can agree that the user's general knowledge of office machines will make them think the machine needs to be turned on, just as they will know it should be plugged in. But why shouldn't they assume that the machine is already on? The interface description didn't specify a "power on" indicator. And the user's background knowledge is likely to suggest that the machine is normally on, like it is in most offices. Even if the user figures out that the machine is off, can they find the power switch? It's on the back, and if the machine is on the user's desk, they can't see it without getting up. The switch doesn't have any label, and it's not the kind of switch that usually turns on office equipment (a rocker switch is more common). The conclusion of this single-action story leaves something to be desired as well. Once the button is pushed, how does the user know the machine is on? Does a fan start up that they can hear? If nothing happens, they may decide this isn't the power switch and look for one somewhere else.

  12. Heuristic/Expert Inspections • Experts assess the usability of an interface guided by usability principles and guidelines (heuristics) • Jakob Nielsen suggests that five experts may be enough to uncover around 75% of usability problems • Best suited to early design and when there is some kind of representation of the system – e.g., storyboard • It's only as good as the experts – you need experts in both the problem domain and usability

  13. The Process of Heuristic Expert Inspections • Briefing session • Experts are all given an identical description of the product, its context of use and the goals of the evaluation • Evaluation period • Each expert spends several hours independently critiquing the interface • At least two passes through the interface, one for overall appreciation and others for detailed assessment • Debriefing session • Experts meet to compare findings, prioritise problems and propose solutions • They report/present their findings to decision makers and other stakeholders

  14. Theory and literature review • We have seen before that you can apply existing theory to evaluate a design • The Keystroke-Level Model • Fitts' law (see the sketch below) • HCI and experimental psychology already contain a wealth of knowledge about how people interact with computers • Scour the literature (ACM Digital Library, Google, Citeseer and others) • But think carefully about whether the results transfer
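
A minimal sketch of applying one such piece of theory – Fitts' law in its Shannon formulation, MT = a + b·log2(D/W + 1) – to compare two candidate target placements. The constants a and b, and the distances and widths, are illustrative assumptions only; in a real evaluation they would be fitted from measurements for the device and users in question.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds using the Shannon formulation of
    Fitts' law: MT = a + b * log2(D/W + 1).
    a and b are device- and user-specific constants; the defaults here are
    illustrative placeholders, not measured values."""
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# Compare two hypothetical designs: a small, distant button versus a
# larger, closer one. Lower predicted time suggests the faster design.
print(fitts_time(distance=400, width=20))   # ~0.76 s
print(fitts_time(distance=200, width=60))   # ~0.42 s
```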

  15. Observation • Observe users interacting with the interface – in the laboratory or field • Record interactions using • Pen and paper • Audio • Video • Computer logging • User notebooks and diaries • Think-aloud techniques

  16. Analysis • Illustrative fragments • Detailed transcription and coding • Post-task walkthroughs • Specialised analysis software can replay video alongside system data and help the analyst synchronise notes and data

  17. Savannah • An educational game for six players at a time • A virtual savannah is overlaid on an empty school playing field

  18. Studying Savannah • Six trials over three days • Two video recordings from the field • Game replay interface

  19. Impala Sequence

  20. The Impala Sequence Revealed • Elsa suddenly stops • Circular formation • Counting aloud • Nala and Elsa cannot see the impala • Replay shows them stopped on edge of locale • GPS drift carries them over the boundary • The boy who passed through attacked first

  21. Controlled Experiments

  22. Query Techniques • Elicit the user’s view of the system • Can address large numbers of users • Key techniques are: • Interviews • Surveys • Relatively simple and cheap to administer • But less detailed and less well suited to exploring alternative designs

  23. DECIDE: A Framework to Guide Evaluation (Preece et al., 2001) • Determine the overall goals that the evaluation addresses • Explore the specific questions to be answered • Choose the evaluation approach and specific techniques to answer these questions • Identify the practical issues that must be addressed • Decide how to deal with ethical issues • Evaluate, interpret and present the data

  24. Evaluation through questionnaires • A fixed set of written questions, usually with written answers • Advantages: • gives the user’s point of view – good for evaluating satisfaction • quick and cost-effective to administer and score, and so can deal with large numbers of users • user doesn’t have to be present • Disadvantages: • only tells you how the user perceives the system • Not good for some kinds of information • Things that are hard to remember (e.g., times and frequencies) • Things that involve status or are sensitive to disclose • Usually not very detailed • May suffer from bias

  25. Questions • Three types of questions • Factual – ask about observable information • Opinion – what the user thinks about something (outward facing) • Attitudes – how the user feels about something (inward facing). For example, do they feel efficient, do they like the system, do they feel in control? • Two general styles of question • Closed – the user chooses from among a set number of options – quick to complete and easy to summarise with statistics • Open – the user gives free-form answers – captures more information – slower to complete and harder to summarise statistically (may require coding) • Most questionnaires mix open and closed questions

  26. Options for closed questions • Example 5-point scale: strongly agree / agree / neutral / disagree / strongly disagree • Number of options matches number of possible responses • Likert scales capture strength of opinion • Odd number of options on a preference scale when there is the possibility of a neutral response • Granularity of the scales depends upon respondents’ expertise

  27. Questionnaire Analysis • Closed questions subject to statistics • Graphical representations (bar charts, pie charts) • Averages and measures of spread • Always look at the raw data • You can only make statistical inferences from carefully designed questionnaires • Open questions • Give general sense of feedback • May be coded and then statistically analysed
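
As a concrete illustration of analysing a closed question, here is a minimal sketch in Python; the responses are invented. Frequency counts show the raw data, while the mean, median and standard deviation give central tendency and spread. Likert responses are ordinal, so the counts and median are often the safer summaries.

```python
from collections import Counter
from statistics import mean, median, stdev

# Invented responses to one closed question on a 5-point Likert scale
# (1 = strongly disagree ... 5 = strongly agree).
responses = [4, 5, 3, 4, 2, 5, 4, 4, 3, 5]

print(Counter(sorted(responses)))           # frequency of each option (always look at the raw data)
print(mean(responses), median(responses))   # central tendency
print(stdev(responses))                     # measure of spread
```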

  28. Deploying questionnaires • Post • Interview • Email • As part of interface • Web – brings important advantages • Ease of deployment • Reliable data collection • Semi-automated analysis

  29. Designing questionnaires • What makes a good or bad questionnaire • Reliability - ability to give the same results when filled out by like-minded people in similar circumstances • Validity - the degree to which the questionnaire is actually measuring or collecting data about what you think it should • Clarity, length and difficulty • Designing a good questionnaire is surprisingly difficult – pilot, pilot, pilot!! • Statistically valid questionnaire design is a very specialised skill – use an existing one

  30. What to ask • Background questions on the users: • Name, age, gender • Experience with computers in general and this kind of interface in specific • Job responsibilities and other relevant information • Availability for further contact such as interview • Interface specific questions

  31. System Usability Scale (SUS) • I think I would like to use this system frequently • I found the system unnecessarily complex • I thought the system was easy to use • I think I would need the support of a technical person to be able to use this system • I found the various functions in this system were well integrated • I thought there was too much inconsistency in this system • I would imagine that most people would learn to use this system very quickly • I found the system very cumbersome to use • I felt very confident using the system • I needed to learn a lot of things before I could get going with this system

  32. Calculating a rating from SUS • For odd numbered questions, score = scale position - 1 • For even numbered questions, score = 5 - scale position • Multiply all scores by 2.5 (so each question counts 10 points) • Final score for an individual = sum of multiplied scores for all questions (out of 100)
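
The scoring rule above translates directly into code. A minimal sketch, assuming each response is recorded as its scale position from 1 (strongly disagree) to 5 (strongly agree), in SUS question order; the example responses are invented.

```python
def sus_score(responses):
    """System Usability Scale score from ten responses on a 1-5 scale.
    Odd-numbered (positively worded) items score position - 1;
    even-numbered (negatively worded) items score 5 - position;
    the multiplied scores sum to a value out of 100."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 questions")
    total = 0
    for i, position in enumerate(responses, start=1):
        total += (position - 1) if i % 2 == 1 else (5 - position)
    return total * 2.5

# Invented example: a fairly positive respondent.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```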

  33. Evaluation during active use • System refinement based on experience or in response to changes in users • interviews and focus group discussions • continuous user-performance data logging • frequent and infrequent error messages • analyse sequences of actions to suggest improvements or new actions • BUT respect people’s rights and consult them first! • User feedback mechanisms • on-line forms, email and bulletin boards • workshops and conferences

  34. Choosing participants • Usually we generalise findings based on a sample of participants • Need to select carefully to avoid sampling bias • Watch out for self-selection and selection based on convenience • Random sampling – every member of the target population has an equal chance to be selected • But how to get access? • Often need to advertise • Describe your sample in your write up
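
A minimal sketch of drawing a random sample from a pool of potential participants; the identifiers are placeholders. Note that this only randomises within the pool you could reach – it does not remove any self-selection in who volunteered in the first place.

```python
import random

# Placeholder identifiers for people who responded to the advert.
pool = ["p01", "p02", "p03", "p04", "p05", "p06", "p07", "p08", "p09", "p10"]

# random.sample gives each member of the pool an equal chance of selection.
participants = random.sample(pool, k=6)
print(participants)
```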

  35. How many users? • 5-12 as a rough rule of thumb • Nielsen and Landauer (1993) www.useit.com/alertbox/20000319.html
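
The rule of thumb rests on a simple model described in the Nielsen and Landauer work cited above: the proportion of problems found by n users is roughly 1 − (1 − λ)^n, where λ is the probability that a single user exposes a given problem (around 0.31 in their data, though it varies between projects). A minimal sketch, with λ as an assumed parameter:

```python
def proportion_found(n_users, problem_rate=0.31):
    """Expected share of usability problems uncovered by n_users, under the
    simple model 1 - (1 - problem_rate)**n_users. problem_rate is the chance
    that a single user reveals a given problem; 0.31 is an assumed typical
    value and differs from project to project."""
    return 1 - (1 - problem_rate) ** n_users

for n in (1, 3, 5, 12):
    print(n, round(proportion_found(n), 2))
# With problem_rate = 0.31, five users already uncover roughly 85% of the
# problems, which is where the 5-12 rule of thumb comes from.
```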

  36. Ethical Issues • Explain the purpose of the evaluation to participants, including how their data will be used and stored • Get their consent, preferably in writing. • Get parental consent for kids • Anonymise data: • As stored – use anonymous ids apart from in one name-id mapping table • As reported – in text and also in images • Do not include quotes that reveal identity • Gain approval from your ethics committee and/or professional body

  37. Example consent form “I state that I am {specific requirements} and wish to participate in a study being conducted by {name/s of researchers/ evaluators} at the {organisation name}. The purpose of the study is to {general study aims}. The procedures involve {generally what will happen}. I understand that I will be asked to {specific tasks being given}. I understand that all information collected in the study is confidential, and that my name will not be identified at any time. I understand that I am free to withdraw from participation at any time without penalty” Signature of participant and date

  38. Good practice • Inform users that it is the system under test, not them • Put users at ease • Do not criticise their performance/opinions • Ideally, you should reward or pay participants • It may be polite, and a good motivator, to make results available to participants

  39. Which method to choose • Design or implementation? • Laboratory or field studies? • Subjective or objective? • Qualitative or quantitative? • Performance or satisfaction? • Level of information provided? • Immediacy of response? • Intrusiveness? • Resources and cost?
