
IEEM 552 - Human-Computer Systems


Presentation Transcript


  1. IEEM 552 - Human-Computer Systems • Dr. Vincent Duffy - IEEM • Week 11 - Interpreting Experimental Results & Intro to Psychometrics • Apr. 20, 1999 • http://www-ieem.ust.hk/dfaculty/duffy/552 • email: vduffy@ust.hk

  2. QOTD 1: What do we hope to improve related to 'interface design'? • research has shown techniques to improve: • learning time • performance speed • error rates • user satisfaction • all related to usability

  3. How to interpret the results (related to Chapter 4, Eberts) • 4 typical dependent measures related to performance: • 2 measures related to quantity: speed; frequency of an event (how many ... can be found) • 2 measures related to quality: accuracy or number of errors; preference (particularly for marketing, as related to buying decisions)

  4. Hypothesis 1 (case 1): Repeated use of the tested interfaces leads to increased performance? • case 1: see data table on the next slide (a modified version of Table 5.1, p.96, Eberts) • suppose 'observation' = subject # observed • QOTD 1b: does this allow you to examine the research question? • no. why? it doesn't test learning within a subject, right?

  5. Table for case 1: Numerical example of time (sec.) for a number of users performing a task on each of three interfaces.

     Observation   Interface A   Interface B   Interface C
     User 1        24**          24            11
     User 2        19            25            14
     User 3        22            21            17
     Mean          21.67         23.33         14.00

     **suppose the mean = 24 for user 1 on interface A over n = 25 trials

  6. Hypothesis 1 (case 2): Repeated use of the tested interfaces leads to increased performance? QOTD 2a: What would you conclude about this? • case 2: see data table on the next slide (a modified version of Table 5.1, p.96, Eberts); suppose 'observation' = test # (or... the number of times the test is run on 1 subject) • QOTD 2b: does this data allow answering of the question? • yes.

  7. Table for case 2: Numerical example of time (sec.) for performing a task on each of three interfaces.

     Observation   Interface A   Interface B   Interface C
     1             24**          24            11
     2             19            25            14
     3             22            21            17
     Mean          21.67         23.33         14.00

     **suppose the mean = 24s for observation 1 (all users) on interface A with n = 50 users
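For readers who want to reproduce the column means above, here is a minimal numpy sketch (Python assumed; the array holds exactly the table's nine cell values):

```python
import numpy as np

# The three observations (rows) on each of the three interfaces (columns).
times = np.array([
    [24, 24, 11],   # observation 1
    [19, 25, 14],   # observation 2
    [22, 21, 17],   # observation 3
])
print(times.mean(axis=0))  # per-interface means: approx. 21.67, 23.33, 14.00
```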

  8. Yes. • Suppose we plot the data • see plot on the next slide (a modified version of figure 5.7, p.90, Eberts) • what is the result? • QOTD 2a continued: it depends on which interface is used, right? • learning occurs for interface A, but not for interface B

  9. Version 1 of modified figure 5.7. An example of different practice effects determined from a 'test-retest' experimental design. The important part of the graph is the interaction; subjects using interface A learn at a different rate than those on interface B (interface C is not included in this graph).
     [Plot: y-axis = time to complete task (high to low), x-axis = Test 1, Test 2, Test 3; one line each for Interface A and Interface B.]

  10. Version 2 of modified figure 5.7. An example of different practice effects determined from a 'test-retest' experimental design. The important part of the graph is the interaction; subjects using interface A learn at a faster rate than those on interface B (interface C is not included in this graph).
      [Plot: y-axis = performance (low to high), x-axis = Test 1, Test 2, Test 3; one line each for Interface A and Interface B.]

  11. Please note the difference between the previous two plots: version 1 plots time to complete the task (so learning shows as a decrease), while version 2 plots performance (so learning shows as an increase).

  12. Suppose Hypothesis 2: Direct manipulation interfaces are better for speed of task completion than command (or DOS) based interfaces (e.g., put a file into the trashcan; recall... Miller and Stanney). • What do you look for? • an effect... the effect of type of interface • Hypothesis 2, case 1 • see data table on the next slide (a modified version of data table 5.8, p.114) • QOTD 3: does this allow you to examine the research question?

  13. Table for hypothesis 2 (case 1): Numerical example of time (sec.) for performing a task on two of three interfaces (blocked by expertise level).

      Observation    Interface A   Interface B       Mean
                     (direct)      (command based)
      Expert         19**          21                17.00
      Intermediate   22            25                20.33
      Novice         24            24                21.67
      Mean           21.67         23.33             19.67

      **suppose the mean = 19s for expert (all users) on interface A with n = 100 users
      (The row and grand means include Interface C, shown in the expanded table on slide 18.)

  14. Suppose Hypothesis 2: Direct manipulation interfaces are better for speed of task completion than command based interfaces. • how to test the data in the plot on the next slide to see if there is a significant difference (between the performance of interface 1, command based, and interface 2, direct manipulation)? (a modified version of figure 4-2 in Eberts)

  15. Modified figure 4-2. In this experiment only one independent variable is manipulated. The results show a main effect: one interface is better than the other. What analysis could you use to verify this?
      [Plot: performance (low to high) for all users; x-axis = command-based vs. direct manipulation.]

  16. Suppose Hypothesis 2: Direct manipulation interfaces are better for speed of task completion than command based interfaces. • how to test with the data in the table on the next slide? (a modified version of data in table 5.8) • previously we plotted the data • also, quantitatively, we used a t-test • QOTD 4: can we use a t-test now? Case 1? Case 2? • why not? • more than 2 levels of the variable • use ANOVA - analysis of variance (a short code sketch follows the table on slide 18)

  17. Table for hypothesis 2 (case 1), repeated: Numerical example of time (sec.) for performing a task on two of three interfaces (blocked by expertise level).

      Observation    Interface A   Interface B       Mean
                     (direct)      (command based)
      Expert         19**          21                17.00
      Intermediate   22            25                20.33
      Novice         24            24                21.67
      Mean           21.67         23.33             19.67

      **suppose the mean = 19s for expert (all users) on interface A with n = 100 users

  18. Table for hypothesis 2 (case 1): Numerical example of time (sec.) for performing a task on each of three interfaces (blocked by expertise level).

      Observation    Interface A   Interface B       Interface C    Mean
                     (direct-PC)   (command based)   (direct-Mac)
      Expert         19**          21                11             17.00
      Intermediate   22            25                14             20.33
      Novice         24            24                17             21.67
      Mean           21.67         23.33             14.00          19.67

      **suppose the mean = 19s for expert (all users) on interface A with n = 100 users
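To make the t-test vs. ANOVA distinction from slide 16 concrete, here is a minimal Python sketch (scipy assumed available). It treats the three cell values per interface above as raw observations, a simplifying assumption since the footnote says each cell is itself a mean over many users:

```python
from scipy import stats

# Per-interface task times (sec.) taken from the table above.
interface_a = [19, 22, 24]  # direct (PC)
interface_b = [21, 25, 24]  # command based
interface_c = [11, 14, 17]  # direct (Mac)

# With only two levels of the factor, a t-test suffices (independent groups
# assumed here; a paired test would suit the blocked design even better):
t, p = stats.ttest_ind(interface_a, interface_b)
print(f"t-test (A vs B): t = {t:.2f}, p = {p:.3f}")

# With three or more levels, use a one-way ANOVA instead:
f, p = stats.f_oneway(interface_a, interface_b, interface_c)
print(f"one-way ANOVA (A, B, C): F = {f:.2f}, p = {p:.3f}")
```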

  19. For Hypothesis 2 (direct manipulation interfaces are better for speed of task completion than command based interfaces). QOTD 5 (case 1): How do you interpret this ANOVA?

      y = performance (speed or accuracy)

      variable (indep.)     F       p
      interface             high    <.05
      user level                    <.20

      This analysis can be used to see if there is an 'effect' based on interface or user level. It implies a relationship between condition (interface) and performance.

  20. QOTD 6 (case 2): How is this different? Suppose the ANOVA results show the following. What is the effect?

      y = performance (speed or accuracy)

      variable (indep.)       F       p
      interface               high    <.05
      user level                      <.20
      interface*userlevel     high    <.05

      Implies performance is related to the combination of the two variables (an interaction).

  21. QOTD 6 (case 2): Suppose the ANOVA results show the following. What is the effect?

      y = performance (speed or accuracy)

      variable (indep.)       F       p
      interface               high    <.05
      user level                      <.20
      interface*userlevel     high    <.05

      Cannot draw any conclusions about interface by itself (if there is an interaction).
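A hedged sketch of how one might obtain an ANOVA table like the ones above in Python (pandas and statsmodels assumed available). The replicate times below are invented purely for illustration, since a one-observation-per-cell design leaves no residual degrees of freedom for testing the interaction:

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Invented replicates: two task times (sec.) per interface x user-level cell.
cells = {
    ("A", "expert"): [19, 20], ("A", "intermediate"): [22, 23], ("A", "novice"): [24, 26],
    ("B", "expert"): [21, 22], ("B", "intermediate"): [25, 24], ("B", "novice"): [24, 25],
    ("C", "expert"): [11, 12], ("C", "intermediate"): [14, 13], ("C", "novice"): [17, 18],
}
rows = [
    {"interface": iface, "level": level, "time": t}
    for (iface, level), times in cells.items()
    for t in times
]
df = pd.DataFrame(rows)

# 'C(interface) * C(level)' expands to both main effects plus the
# interface-by-level interaction term, mirroring the table above.
model = ols("time ~ C(interface) * C(level)", data=df).fit()
print(anova_lm(model, typ=2))  # F and p for interface, level, interaction
```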

  22. Case 2: What if we plot it? The plot illustrates an interaction: the lines are not parallel.
      [Plot: performance vs. interface type (A, B); separate lines for novice (o) and expert (x) that are not parallel.]

  23. [Two example plots, side by side: the left labeled 'interaction' (non-parallel lines), the right labeled 'no interaction' (parallel lines).]
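A small matplotlib sketch (illustrative numbers, not course data) that reproduces the two patterns side by side: non-parallel lines on the left signal an interaction, parallel lines on the right do not:

```python
import matplotlib.pyplot as plt

x = [0, 1]  # interface type A, interface type B
fig, (ax_left, ax_right) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))

# Left panel: non-parallel (crossing) lines -> interaction.
ax_left.plot(x, [2, 6], "o-", label="novice")
ax_left.plot(x, [5, 3], "x-", label="expert")
ax_left.set_title("interaction")

# Right panel: parallel lines -> no interaction.
ax_right.plot(x, [2, 4], "o-", label="novice")
ax_right.plot(x, [5, 7], "x-", label="expert")
ax_right.set_title("no interaction")

for ax in (ax_left, ax_right):
    ax.set_xticks(x)
    ax.set_xticklabels(["interface A", "interface B"])
    ax.legend()
ax_left.set_ylabel("performance")
plt.tight_layout()
plt.show()
```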

  24. Hypothesis 2, case 3 (similar): suppose the results were as shown in Figure 4-3 • see figure 4-3 (p.71) • QOTD 7: What does this show? • a plot with an interaction • learning occurs for novices with direct manipulation, but not for experts • by looking at the plot, how can you tell if there is an interaction? • the lines are not parallel (parallel plot = no interaction)

  25. Modified figure 4.3. In this experiment two independent variables are manipulated. The results show an interaction between the two: performance is dependent on expertise level.
      [Plot: performance vs. interface (command-based, direct manipulation); the novice (o) and expert (x) lines are not parallel.]

  26. Recall the previous case (modified fig. 5.7) • Fig. 5.7 - recall what it tells us with regard to Hypothesis 1? (see figure on next slide) • does repeated use lead to increased performance? • it depends. • the results: learning when using interface A, not for interface B • note there is an interaction there too. • note: the plotted lines are not parallel

  27. Version 2 of modified figure 5.7 (repeated). An example of different practice effects determined from a 'test-retest' experimental design. The important part of the graph is the interaction; subjects using interface A learn at a faster rate than those on interface B (interface C is not included in this graph).
      [Plot: y-axis = performance (low to high), x-axis = Test 1, Test 2, Test 3; one line each for Interface A and Interface B.]

  28. After the break tonight • how to get the data and analyze it • for surveys/questionnaires • purpose and reliability • validity and factor loadings • types of questions • the term psychometrics • testing for user satisfaction and computer experience

  29. How to get the data (and analyze it), particularly data related to surveys and questionnaires • Chapter 4, Eberts • paper by Miller and Stanney (WCEQ) from Int. J. of Human-Computer Interaction (1997) • reference: Cody and Smith - chapter 10, factor analysis, p.150-162; chapter 11, psychometrics; reliability using Cronbach alpha, p.276-277

  30. questionnaires: purpose & reliability • used to capture subjective behavior such as evaluations, judgments, comparisons, beliefs & opinions • can be assessed for reliability, validity and factor loadings • reliability - the probability that a question will be answered the same way • 0 = no reliability, 1 = perfect reliability • usually try to see if there is consistency between items of a measure/variable • usually use Cronbach Alpha for reliability (see the sketch below)
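Since the slide names Cronbach's alpha as the usual reliability statistic, here is a minimal Python sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), applied to an invented respondents-by-items matrix:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: rows = respondents, columns = questionnaire items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five respondents answering four Likert items (illustrative data):
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # near 1 = consistent items
```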

  31. validity and factor loadings • validity - how well does the questionnaire measure what it is supposed to measure (accuracy) • factor loadings - different factors (or groupings of questions within the questionnaire) may be related to different aspects of the system • for example, one factor may be software related while a different one may be hardware related • you check how highly correlated the items are with one another - how highly they load on a group (or factor) - using factor analysis (see the sketch below)
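A hedged sketch of checking factor loadings (Python with scikit-learn assumed; the six items and two latent factors are invented for illustration, echoing the slide's software/hardware example):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 200
software = rng.normal(size=n)  # hypothetical latent "software" factor
hardware = rng.normal(size=n)  # hypothetical latent "hardware" factor

# Six items: q1-q3 driven by the software factor, q4-q6 by the hardware one.
items = np.column_stack(
    [software + 0.3 * rng.normal(size=n) for _ in range(3)]
    + [hardware + 0.3 * rng.normal(size=n) for _ in range(3)]
)

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
# Each row of components_ is a factor; large entries show which items load on it.
print(fa.components_.round(2))
```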

  32. types of questions • open ended questions - difficult to assess quantitatively • multiple choice questions - usually user 'attitudes' do not have correct or incorrect answers • rating scales, such as Likert scales • example - choose 1-5, circle the best answer: never (1), seldom (2), sometimes (3), generally (4), always (5) • always have a clear centerpoint - an odd number of choices (3, 5, 7); responses are typically normally distributed

  33. the term psychometrics • related to the study of the development of questionnaires as well as their analysis • how to score a test, perform item analyses, test reliability, interrater reliability • care must be taken not to ask biased questions • for example, 'Experts have suggested... do you approve of this?' • in HCI, questionnaires can be used to assess user satisfaction and computer literacy/expertise

  34. for testing user satisfaction • one popular scale: the QUIS scale developed by Chin et al. (1987-88) • 27 items, 9-point Likert scale • assesses: overall reaction to the software (6 items); evaluation of the characters on the screen (4 items); use of terms and information throughout the system (6 items); learning to operate the system (6 items); system capabilities such as speed (5 items)

  35. Homework • Wk10 - Read chapter 4 in Eberts, p.61-81, for next week. • Wk11 - Read chapter 5 in Eberts, p.82-125 (only the parts that were covered in lecture). • Week 12 - Read chapter 10, p.150-162, and chapter 11, p.276-277, in Cody and Smith. • Papers for wk12 & wk13 - images reserve • Week 12 - Read paper: Molnar, K.K. and Kletke, M.G., 1996, 'The impacts on user performance and satisfaction of a voice based front-end interface for a standard software tool', Int. J. of Human-Computer Studies, 45: 287-303. • Week 13 - Read paper: Chi, C.F. and Lin, F.T., 1998, 'A comparison of seven visual fatigue assessment techniques in three data acquisition VDT tasks', Human Factors, Vol. 40, no. 4, 577-590.

  36. In the coming weeks (tentatively) • Wk10, 13 Apr - 'mental workload' (meet in I.S. lab first) • Wk11, 20 Apr - 'interpreting results' • Wk12, 27 Apr - Visual C++ demo • Wk13, 4 May - intro to final project • Wk14, 11 May - 'working in groups' • Wk15, 18 May - project presentations • Wk16, 25 May - final exam, rm 2302, 6:45-9:30
