S519: Evaluation of Information Systems

S519: Evaluation of Information Systems Social Statistics Inferential Statistics Chapter 16: reliability and validity

Last week

This week • What are reliability and validity?

Reliability and validity • Leathers, S. (2003). Parental visiting, conflicting allegiances, and emotional and behavioral problems among foster children. Family Relations, 52, 53-63

Your measurement • How do I know that the test, scale, instrument, etc., works every time I use it (reliability)? • How do I know that the test, scale, instrument, etc., measures what it is supposed to measure (validity)?

Scales of measurement • What is measurement? • The assignment of values to outcomes following a set of rules • There are four scales of measurement: • Nominal, ordinal, interval and ratio

Nominal level of measurement • An outcome can fit into one and only one class or category • E.g. gender, political affiliation • The least precise level of measurement • Categories should be mutually exclusive

The ordinal level of measurement • Things are ordered • E.g., a rank of candidates for a job

The interval level of measurement • Underlying continuum such that we can talk about how much more a higher performance is than a lesser one. • 10 words correct is twice as many as five words correct • 10 words correct is two more than eight correct, which in turn is three more than five correct

Ratio level of measurement • The presence of an absolute zero on the scale • In the physical and biological sciences, e.g., zero molecular movement, zero light • In social and behavioral sciences, a true zero is harder to establish

In sum • Any outcome can be assigned to one of the four scales of measurement • Scales of measurement have an order, from the least precise being nominal, to the most precise being ratio • The “higher up” the scale of measurement, the more precise the data being collected, and the more detailed and informative the data are
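As a rough illustration of why the higher scales are more informative, the sketch below maps each scale to the summary statistics conventionally considered meaningful for it (mode for nominal, median for ordinal, mean for interval and ratio). That mapping is a common rule of thumb assumed here, not something stated on the slides, and the function name `summarize` and the sample data are hypothetical.

```python
from statistics import mean, median_low, mode

# Scales ordered from least precise to most precise, as on the slide.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

def summarize(values, scale):
    """Return the summaries conventionally used at a given scale (assumed mapping)."""
    level = SCALES.index(scale)  # raises ValueError for an unknown scale name
    summary = {"mode": mode(values)}  # counting categories works at every level
    if level >= 1:
        # Ordering is meaningful from the ordinal level up; median_low
        # always returns an actual data value, never an average of two.
        summary["median"] = median_low(values)
    if level >= 2:
        # Differences are meaningful from the interval level up.
        summary["mean"] = mean(values)
    return summary

# Hypothetical data: party labels, job-candidate ranks, words spelled correctly.
nominal_summary = summarize(["Democrat", "Republican", "Democrat"], "nominal")
ordinal_summary = summarize([1, 2, 2, 3, 5], "ordinal")
interval_summary = summarize([5, 8, 8, 10], "interval")
print(nominal_summary, ordinal_summary, interval_summary)
```

Each step up the scale keeps the summaries of the level below and adds one more, which is one way to read "more detailed and informative".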


Reliability • Whether a test, or whatever you use as a measurement tool, measures something consistently

Observed score • Observed score = true score + error score
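The decomposition above can be explored with a small simulation. This is a minimal sketch that assumes the error score is random noise drawn from a normal distribution with mean zero; that distributional assumption, the function name `simulate_observed`, and the numbers used are illustrative and not part of the slides.

```python
import random
import statistics

def simulate_observed(true_score, error_sd, n_administrations, seed=0):
    """Observed score = true score + error score, repeated n times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_administrations)]

true_score = 80.0  # hypothetical true score of one test taker

# Same person measured 1000 times with a noisy and with a precise instrument.
noisy = simulate_observed(true_score, error_sd=10.0, n_administrations=1000)
precise = simulate_observed(true_score, error_sd=1.0, n_administrations=1000)

print("noisy:   mean", round(statistics.mean(noisy), 2),
      "spread", round(statistics.pstdev(noisy), 2))
print("precise: mean", round(statistics.mean(precise), 2),
      "spread", round(statistics.pstdev(precise), 2))
```

Under these assumptions both instruments average out near the true score, but the instrument with the smaller error score gives observed scores that vary far less from one administration to the next, which is what a more reliable measure looks like.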

Some ways to assess reliability • Test-retest reliability • Want to examine whether a test is reliable over time • Compute the Pearson correlation coefficient on scores from a test at Time 1 and Time 2
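The test-retest computation can be sketched as follows. The helper `pearson_r` is written out by hand so that the example needs only the standard library, and the Time 1 and Time 2 scores are made-up numbers for five hypothetical test takers, not data from the course.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and sums of squared deviations.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people at two points in time.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]

r = pearson_r(time1, time2)
print("test-retest reliability:", round(r, 2))
```

A coefficient close to 1 suggests that people keep roughly the same relative standing from Time 1 to Time 2, which is the sense in which the test is reliable over time.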

Validity • The property of an assessment tool that indicates that the tool does what it says it does • Content validity • Validate through domain experts • Criterion validity • Validate your criteria against existing tests or criteria • Literature support

More resources • Winning textbook: • http://www.psychstat.missouristate.edu/introbook/sbk00.htm • Good statistics course • http://surfstat.anu.edu.au/surfstat-home/surfstat-main.html • Statistical glossary • http://www.davidmlane.com/hyperstat/index.html
