Understanding Hypothesis Testing in Computer Science User Studies

F, t, and p: Basic Statistics for Computer Scientists (aka knowing enough to be critical of user studies) • April 4, 2002 • Benjamin Lok

User Studies • Trying to identify phenomena or trends • Hypothesis • Blood pressure increases with age and weight • Smoking increases risk of cancer • Real objects in VEs improve performance • How might we investigate this?

Variables and Conditions • Hypothesis: Real objects in VEs improve performance • Independent Variable – the variable that is manipulated by the experimenter (VE type) • Dependent Variable – the variable that is measured and expected to change in response to the independent variable (performance) • Experimental conditions – the levels of the independent variable under which the situation of interest is created (here, each VE type)

Descriptive Statistics • Hypothesis: Real objects in VEs improve performance • Null hypothesis – assume real objects in VEs are the SAME as virtual objects in VEs • Innocent until proven guilty • Your job: gather enough evidence to reject it! • Alternative hypothesis – interacting with real objects is better than interacting with virtual objects

Raw Data • What does the mean tell us? Is that enough?

                           Small Pattern (seconds)          Large Pattern (seconds)
Condition                  Mean    S.D.    Min     Max      Mean    S.D.    Min     Max
Real Space (n=41)          16.81   6.34    8.77    47.37    37.24   8.99    23.90   57.20
Purely Virtual (n=13)      47.24   10.43   33.85   73.55    116.99  32.25   70.20   192.20
Hybrid (n=13)              31.68   5.65    20.20   39.25    86.83   26.80   56.65   153.85
Vis Faith Hybrid (n=14)    28.88   7.64    20.20   46.00    72.31   16.41   51.60   104.50

Variances • standard deviation – measure of dispersion (square root of the sum of squared deviations from the mean, divided by N, or by N − 1 when estimating from a sample)
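The definition of standard deviation above can be checked by hand. The following is a minimal sketch, assuming made-up timing values (the `times` list is hypothetical and is not data from the study); it computes both the divide-by-N form and the divide-by-(N − 1) form. The slides do not say which form the S.D. columns in the table use.

```python
import math
import statistics

# Hypothetical task-completion times in seconds (NOT data from the study).
times = [12.0, 15.5, 14.0, 20.5, 18.0]

mean = statistics.fmean(times)

# "Sum of squares": the sum of squared deviations from the mean.
sum_sq = sum((x - mean) ** 2 for x in times)

# Population form: divide by N.
sd_population = math.sqrt(sum_sq / len(times))

# Sample form: divide by N - 1.
sd_sample = math.sqrt(sum_sq / (len(times) - 1))

print(f"mean={mean:.2f}  sd_population={sd_population:.3f}  sd_sample={sd_sample:.3f}")
```

The two results should agree with `statistics.pstdev(times)` and `statistics.stdev(times)` from the Python standard library, which implement the same two forms.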

Hypothesis • We assumed the means are “equal” • But are they? Or is the difference due to chance?

t-test • t-test – statistical test used to determine whether two observed means are statistically different

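t-test • (rule of thumb) |t| > 1.96 corresponds to p < 0.05 (two-tailed) when samples are large; small samples need a larger |t| • Look at what contributes to t: the difference between the two means, the variability within each group, and the sample sizes • http://trochim.human.cornell.edu/kb/stat_t.htm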
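To see what contributes to t, here is a minimal sketch of the unequal-variance (Welch) t statistic computed from summary statistics alone. The function name `welch_t` is an illustrative choice, and the inputs are the Small Pattern means, standard deviations, and group sizes of the Purely Virtual and Hybrid rows of the timing table, assuming those S.D. values are sample standard deviations. This is not one of the comparisons reported in the slides, which compare each condition's difference from Real Space, so the resulting number is for illustration only.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an unequal-variance t-test from summary statistics."""
    # Squared standard error contributed by each group.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    # t grows with the mean difference and shrinks with variability.
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Purely Virtual vs. Hybrid, Small Pattern times (seconds), from the table.
t, df = welch_t(47.24, 10.43, 13, 31.68, 5.65, 13)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A larger difference in means, smaller standard deviations, or larger groups each push |t| upward. Converting t and df into a p value requires the t distribution, for example from a table or a statistics library.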

F statistic, p values • F statistic – assesses the extent to which the means of the experimental conditions differ more than would be expected by chance • t is related to the F statistic (with two conditions and a pooled variance, F = t²) • Look up the statistic in a table to get the p value, then compare it to α • α value – the accepted probability of making a Type I error (rejecting the null hypothesis when it is really true), commonly 0.05 • p value – the probability, calculated from the sampling distribution of the statistic, of observing a pattern of data at least this extreme if the null hypothesis were true (loosely, the chance that the result arose by chance alone)
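The relationship between t and F can be made concrete for the two-condition case. The sketch below uses made-up scores (`group_a` and `group_b` are hypothetical, not data from the study) and computes a one-way F statistic and an equal-variance (pooled) t statistic by hand. With exactly two groups and a pooled variance, F equals t squared; the unequal-variance t-test used elsewhere in these slides does not satisfy this identity exactly.

```python
import math
import statistics

# Hypothetical scores for two conditions (NOT data from the study).
group_a = [10.0, 12.0, 9.0, 11.0, 13.0]
group_b = [14.0, 15.0, 13.0, 16.0, 12.0]

n_a, n_b = len(group_a), len(group_b)
mean_a = statistics.fmean(group_a)
mean_b = statistics.fmean(group_b)
grand_mean = statistics.fmean(group_a + group_b)

# Variability inside each condition (within-group sum of squares).
ss_within = (sum((x - mean_a) ** 2 for x in group_a)
             + sum((x - mean_b) ** 2 for x in group_b))

# Variability of the condition means around the grand mean.
ss_between = (n_a * (mean_a - grand_mean) ** 2
              + n_b * (mean_b - grand_mean) ** 2)

df_between = 2 - 1          # number of conditions minus one
df_within = n_a + n_b - 2   # observations minus number of conditions

# F: between-condition variance relative to within-condition variance.
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Pooled-variance t statistic for the same two groups.
pooled_var = ss_within / df_within
t_stat = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```

With more than two conditions there is no single t to square, which is one reason F is the statistic reported for multi-condition designs.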

Let’s look at data

                            Small Pattern                      Large Pattern
Comparison                  t (unequal variance)   p value     t (unequal variance)   p value
PVE – RSE vs. VFHE – RSE    3.32                   0.0026**    4.39                   0.00016***
PVE – RSE vs. HE – RSE      2.81                   0.0094**    2.45                   0.021*
VFHE – RSE vs. HE – RSE     1.02                   0.32        2.01                   0.055+

Total Sense of Presence, between groups (t-test with unequal variance)

Comparison    t       p value
PVE – VFHE    1.10    0.28
PVE – HE      1.64    0.11
VFHE – HE     0.64    0.53

Total Sense of Presence score (scale from 0..6)

Condition                      Mean    S.D.    Min    Max
Purely VE                      3.21    2.19    0      6
Hybrid VE                      1.86    2.17    0      6
Visually Faithful Hybrid VE    2.36    1.94    0      6

Significance • What does it mean to be significant? • You have some confidence the result was not due to chance. • But there is a difference between statistical significance and meaningful significance • Always know: • sample sizes (n) • p value • variance/standard deviation • means
