Evaluation

Evaluation a. Why / when b. Evaluation representations and techniques • User-based • (Expert) knowledge-based • Analytical • Norms and standards • Technical c. Summary

a. Why evaluate and test? Usability according to ISO 9241-11 • Effectiveness: does it work for prospective users? • Efficiency: how much (time, effort) does it cost them? • Satisfaction: their subjective reaction Evaluation improves the design • User-centered: is this website useful and usable for the intended users? • Cheapest way to repair errors: the earlier the better • Involving users and clients promotes acceptance of the product

Why evaluate and test early? Cost of fixing an error, by phase: • Analysis & Design: $1,000 • Implementation: $6,000 • Maintenance: $60,000 Source: Hawksmere, ISO seminar material
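To make the cost argument concrete, here is a minimal sketch that applies the three per-phase figures from the slide to a made-up project. The function name `total_fix_cost` and the error counts are hypothetical and serve only as illustration.

```python
# Cost of fixing one error per phase, as quoted on the slide (Hawksmere figures).
COST_PER_FIX = {
    "analysis_design": 1_000,
    "implementation": 6_000,
    "maintenance": 60_000,
}


def total_fix_cost(errors_fixed):
    """Sum the repair cost for a dict mapping phase name -> number of errors fixed."""
    return sum(COST_PER_FIX[phase] * count for phase, count in errors_fixed.items())


# Hypothetical project with 10 errors: mostly caught early versus mostly caught late.
early = total_fix_cost({"analysis_design": 8, "implementation": 2, "maintenance": 0})
late = total_fix_cost({"analysis_design": 0, "implementation": 2, "maintenance": 8})
print(early, late)
```

With these assumed counts, the same ten errors cost roughly 25 times more when most of them surface only during maintenance.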

When to evaluate? (Diagram relating evaluation techniques to project phases.) Phases: Discovery, Analysis, Elaboration, Construction, Transition, Maintenance. Labels in the diagram: Target Group Analysis, Focus Group Sessions, Concept Testing, Intermediate Usability Testing, Active Usability Testing, User Involvement, Remote Usability Testing, Expert Involvement, Expert Review, Surveys, Continuous Usability Evaluation.

When to evaluate? • Early in the design process: conceptual (goal, tasks, type of user, website concept, etc.); no website-specific tasks yet • Later: specific tasks are known, so they can be tested; too late to fix conceptual errors

b. Evaluation representations and techniques Evaluation is based on representations (models of the system): Formal representations - to be used by design team • CCT, ETAG, GOMS, NUAN, …. Representations for users, client, and expert colleagues • scenario • simulation and mock-up • interactive prototype

Evaluation in design phases • Scenario and simulation: claims analysis • Prototype: cognitive walkthrough • Prototype and implemented system: heuristic evaluation; objective observation (usability lab); subjective usability evaluation; mental representation and activity (hermeneutic techniques) • Implemented system: standards (ISO), performance measures

Types of evaluation techniques • User-based (the user) • Knowledge-based (experience and knowledge) • Analytical (statistical data) • Norms and standards • Technical (code, implementation): not elaborated here (“engineering expertise”)

1. User-based User-centered design: involve the user in the design. In several ways: • Interview (individual) • Focus group (8-10 participants) • Observation (individual)

1. User-based What do you evaluate? • Intermediate results: information architecture (e.g. card sorting), wireframes, graphic design, screenshots, etc. • Prototype: paper, interactive mock-up (e.g. clickable PowerPoint), working website

1. User-based, example: focus group

1. User-based: observation Types of observation: • Assignments with pre-selected tasks (in a usability laboratory): +/- Controlled environment + Specific procedure + Easy to record - User feels ‘watched’ - Users give up less quickly • “Normal” use (field study)

A typical usability lab (floor plan): Observation Room, Study Room, AV, mobile devices, dual display, DigiTV, one-way mirror, video camera mounted on ceiling, sound-proof walls

The observation room

The user room

1. User-based: observation Based on tasks that are typical for the user (cf. scenarios and flowcharts) • Qualitative: what kinds of problems does the user run into? Also opinions, comments, and remarks. • Is the task feasible? How long does the user take? • If it does not work the first time, where does the user look next? • Which words does the user not understand? • Which elements stand out immediately and which do not? • What does the user click on? • How is the scroll bar used? • Quantitative: usability metrics per task (time, number of errors, number of steps, number of tasks, etc.)
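The quantitative metrics per task can be aggregated with very little code. The sketch below is one possible way to do it; the record layout (`TaskObservation`), the `summarize` helper, and the sample values are assumptions for illustration, not part of the slides.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskObservation:
    """One participant's attempt at one test task (hypothetical record layout)."""
    participant: str
    task: str
    seconds: float
    errors: int
    steps: int
    completed: bool


def summarize(observations, task):
    """Aggregate the per-task metrics named on the slide: time, errors, steps."""
    rows = [o for o in observations if o.task == task]
    if not rows:
        raise ValueError(f"no observations for task {task!r}")
    return {
        "participants": len(rows),
        "completion_rate": sum(o.completed for o in rows) / len(rows),
        "mean_seconds": mean(o.seconds for o in rows),
        "mean_errors": mean(o.errors for o in rows),
        "mean_steps": mean(o.steps for o in rows),
    }


# Made-up observations for a single task.
observations = [
    TaskObservation("P1", "find price", 40.0, 0, 5, True),
    TaskObservation("P2", "find price", 80.0, 2, 9, False),
]
summary = summarize(observations, "find price")
print(summary)
```

Completion rate is added here as a simple effectiveness measure; the slide itself lists only time, errors, steps, and number of tasks.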

1. User-based: field study Types of observation: • Assignments with pre-selected tasks (in a usability laboratory): +/- Controlled environment + Specific procedure + Easy to record - User feels ‘watched’ - Users give up less quickly • “Normal” use (field study): + Natural setting and natural motivation +/- Unforeseen events + Freer course - Harder to record - Little room for observers

Test tasks • Tasks, not functionalities • GOOD: “Where can you buy the new Harry Potter book?” • WRONG: “Search the legislation section for the rent subsidy conditions in the housing regulations” • A question, not an instruction • E.g. (website): “How much does this product cost?” Not: “Find the product information” • Give the user freedom in how to carry out the task. • Tasks must be realistic and typical (cf. scenarios) • Tasks must reasonably ‘cover’ the product • Different aspects / parts / functionality • Usually 10-15 tasks (45 minutes)

2. Knowledge-based evaluation • Based on the knowledge and experience of designers • Cognitive walkthrough • Heuristic evaluations • Checklists

Expert evaluation: Cognitive walkthrough Definition: “finding usability problems in a user interface by having a small set of evaluators examine the interface and give an opinion for each step in the dialogue for a selected set of scenarios” Evaluators: user interface specialists, not from the design team

Cognitive walkthrough Specify scenarios for possible problematic interactions, at the level of single user and system actions Ask the evaluator to answer a small set of standard questions for each step Example question set: • what would a normal user do in this situation? • why (based on what information or knowledge)? • what would the user expect the system to do next?

Cognitive walkthrough Problems: • not possible to consider all possible scenarios • no information on recovery of errors • time aspect is not considered Benefits: • very early indications of problems of representation of information and of consistency

Cognitive walkthrough • Systematic method for walking through the site • Carry out a typical task on the site (or prototype) and check whether an “average” user could perform all the steps involved. • Can be carried out by one person (the designer)

Cognitive walkthrough • Consists of a number of steps: 1. Define the target group for the test 2. Create realistic scenarios 3. Walk through the scenarios with ‘the four questions’ 4. Analyze each scenario and propose design improvements • Steps 1 and 2 have already been done in the task analysis

Cognitive walkthrough The four questions for analyzing each step of the scenarios: • What does the user want to achieve as the next step in this situation? • What does the user think they need to do now? • Why does the user think this is the right action? • Which system response does the user expect?
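A walkthrough session amounts to asking the same four questions at every scenario step. The sketch below generates such a worksheet; the function name `walkthrough_form` and the example scenario steps are hypothetical, and the question texts are English renderings of the four questions on the slide.

```python
# The four walkthrough questions from the slide, in English.
FOUR_QUESTIONS = (
    "What does the user want to achieve as the next step in this situation?",
    "What does the user think they need to do now?",
    "Why does the user think this is the right action?",
    "Which system response does the user expect?",
)


def walkthrough_form(scenario_steps):
    """Return one (step, question) pair per question for every scenario step."""
    return [(step, question) for step in scenario_steps for question in FOUR_QUESTIONS]


# Hypothetical scenario for a web shop.
steps = ["Open the home page", "Search for the product"]
form = walkthrough_form(steps)
for step, question in form:
    print(f"[{step}] {question}")
```

The evaluator fills in an answer for each pair; steps where the answers diverge from what the design actually does are candidates for improvement.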

Heuristic evaluation • Heuristic = rule of thumb. • Heuristics guarantee basic usability in most cases • Based on specific aspects and principles: • E.g. functionality, dialogue, representation, … • Can be done by a usability specialist • Can be done with a group • Multiple people provide complementary insights

Heuristic Evaluation (Nielsen) • Visibility of system status • The system should always keep users informed about what is going on, through appropriate feedback within reasonable time. • Match between system and the real world • The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order. • User control and freedom • Users often choose system functions by mistake and need a clearly marked "emergency exit" to leave unwanted states without having to go through an extended dialogue.

Heuristic Evaluation • Consistency and standards • Users must not wonder whether different words, situations, or actions mean the same thing. • Error prevention • Even better than good error messages is a careful design which prevents a problem from occurring in the first place. • Recognition rather than recall • Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate. • Flexibility and efficiency of use • Accelerators - unseen by novices - may speed up interaction for experts so that systems can cater to both inexperienced and experienced users. Let users tailor frequent actions.

Heuristic Evaluation • Help users recognise, diagnose, and recover from errors • Express error messages in plain language (no codes), precisely indicate the problem, and constructively suggest a solution. • Help and documentation • Even though systems are best used without documentation, it may be necessary to provide help. This should not be too large, be easy to search, focused on user tasks, listing concrete steps to be carried out. • Aesthetic and minimalist design • Dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with relevant units of information and diminishes relative visibility.

Expert evaluation: Heuristic evaluation checklist Roe & Arnold

Heuristic evaluation • Make errors caused by system limitations self-explanatory

Heuristic evaluation

3. Analytical evaluation Quantitative, based on numbers • Questionnaires: • Send to as many people as possible • Subjective! • Hit logs: • Extended site meter • Page hits + transfer rates: which pages are visited most, and where do visitors come from and go to? • Interpretation is speculative
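The two hit-log questions (which pages are visited most, and from where visitors go where) reduce to counting pages and page-to-page transitions. This is a minimal sketch that assumes the log has already been split into per-visit page sequences; the `analyze_hits` name and the sample paths are hypothetical.

```python
from collections import Counter


def analyze_hits(sessions):
    """Count page hits and page-to-page transitions.

    sessions: list of visits, each a list of page paths in the order visited.
    """
    page_hits = Counter()
    transitions = Counter()
    for pages in sessions:
        page_hits.update(pages)
        # Pair each page with the page visited directly after it.
        transitions.update(zip(pages, pages[1:]))
    return page_hits, transitions


# Made-up visits to a small site.
sessions = [
    ["/", "/products", "/cart"],
    ["/", "/products"],
    ["/products", "/contact"],
]
page_hits, transitions = analyze_hits(sessions)
print(page_hits.most_common(1))
print(transitions.most_common(1))
```

The counts say what happened, not why; as the slide notes, interpreting them remains speculative.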

Subjective evaluation techniques Not less reliable than objective techniques. Examples: • SUMI: software usability • SMEQ: mental effort, ESPRIT MUSIC project • ISA: mental load, instantaneous self assessment

SUMI (licence needed) http://www.megataq.mcg.gla.ac.uk/sumi.html 50 statements on a software system • 5 sub-scales • for experienced users, in standard working conditions • diagnosis of usability problems requires at least 10 users Sub-scales: • Efficiency • Affect • Helpfulness • Control • Learnability Global score: perceived usability

SUMI Scoring through “stencils” standard scores, based on large samples of industrial product evaluation Reliable interpretation requires a sample of at least 10 users who “know” the product in normal context of use. diagnosed: • < 40 - action needed • > 55 - acceptable software • > 60 - good software for individual users or individual questions, see manual
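The diagnostic thresholds above can be written down as a small lookup. This sketch covers only the interpretation step, not SUMI scoring itself (which requires the licensed stencils); the function name is hypothetical, and scores from 40 to 55 are reported as unclassified because the slide gives no verdict for that range.

```python
def interpret_sumi_score(score):
    """Map a SUMI standard score to the verdicts listed on the slide."""
    if score < 40:
        return "action needed"
    if score > 60:
        return "good software"
    if score > 55:
        return "acceptable software"
    # The slide states no verdict for scores from 40 up to and including 55.
    return "no verdict stated"


for value in (35, 50, 58, 65):
    print(value, interpret_sumi_score(value))
```

The verdicts are only meaningful for a sample of at least 10 users who know the product in its normal context of use.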

Analytical evaluation: SUS System Usability Scale (SUS): measuring website usability. Digital Equipment Corporation, 1986 John Brooke: john.brooke@redhatch.co.uk A quick and valid tool, based on ISO 9241-11 and European Community ESPRIT project “MUSiC”

SUS Originally aiming at “software systems” To be used after users got to “know” the system in real life context. Later adapted to websites Validity: Correlates well with well established more time consuming general usability scales (e.g. SUMI)

SUS

SUS Scoring: Items 1, 3, 5, 7, 9: • strongly disagree = 0, etc., till • strongly agree = 4 Items 2, 4, 6, 8, 10: • strongly disagree = 4, etc., till • strongly agree = 0 Add scores, multiply total by 2.5: Total score range 0 – 100
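The scoring rule can be sketched in a few lines. The code assumes each answer is recorded on the usual 1 to 5 scale (1 = strongly disagree, 5 = strongly agree), which is then mapped to the 0 to 4 contributions described above; the function name `sus_score` is hypothetical.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten answers on a 1-5 agreement scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        if item % 2 == 1:
            # Items 1, 3, 5, 7, 9: strongly disagree = 0 ... strongly agree = 4
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10: strongly disagree = 4 ... strongly agree = 0
            total += 5 - response
    return total * 2.5


best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral = sus_score([3] * 10)
worst = sus_score([1, 5, 1, 5, 1, 5, 1, 5, 1, 5])
print(best, neutral, worst)
```

The reversed coding of the even items reflects that those statements are negatively worded, so disagreeing with them counts in the system's favour.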

SUS Reliability: At least 15 users that have used the website for some realistic tasks in “natural” conditions Will lead to repeatable results

SUS Examples of tasks • Task 1: Your digital camera uses SmartMedia cards. Find the least expensive external reader (USB) for your PC that will read them. • Task 2: You do lots of hiking. Find the least expensive personal GPS with map capability and at least 8 MB of memory.

For these websites: Finance.yahoo.com
