
Reliability and Validity

Reliability and Validity. Introduction to Study Skills & Research Methods (HL10040). Dr James Betts. Lecture Outline:. Definition of Terms Types of Validity Threats to Validity Types of Reliability Threats to Reliability Introduction to Measurement Error. Commonly used terms…


Presentation Transcript


  1. Reliability and Validity Introduction to Study Skills & Research Methods (HL10040) Dr James Betts

  2. Lecture Outline: • Definition of Terms • Types of Validity • Threats to Validity • Types of Reliability • Threats to Reliability • Introduction to Measurement Error.

  3. Commonly used terms… “She has a valid point” “My car is unreliable” …in science… “The conclusion of the study was not valid” “The findings of the study were not reliable”.

  4. Validity “The soundness or appropriateness of a test or instrument in measuring what it is designed to measure” (Vincent 1999) Some definitions…

  5. Validity “Degree to which a test or instrument measures what it purports to measure” (Thomas & Nelson 1996) Some definitions…

  6. Reliability “…the degree to which a test or measure produces the same scores when applied in the same circumstances…” (Nelson 1997) Some definitions…

  7. Objectivity “…the degree to which different observers agree on measurements…” (Atkinson & Nevill 1998) Some definitions…

  8. Types of Experimental Validity. Internal: Is the experimenter measuring the effect of the independent variable on the dependent variable? External: Can the results be generalised to the wider population?

  9. Validity: Logical (Face, Content); Statistical, AKA Criterion (Concurrent, Predictive); Construct. Reliability: Consistency; Objectivity.

  10. Logical Validity: Face Validity. Infers that a test is valid by definition. It is clear that the test measures what it is supposed to. e.g. If you want to assess reaction time, measuring how long it takes an individual to react to a given stimulus would have face validity. Externally Valid?

  11. Logical Validity: Face Validity. Infers that a test is valid by definition. It is clear that the test measures what it is supposed to. Assessing face validity is therefore a subjective process. i.e. Would assessing 15 m sprint time be a valid means of assessing reaction time?

  12. Logical Validity: Content Validity. Infers that the test measures all aspects contributing to the variable of interest …also a subjective process. e.g. Who is the most physically fit? VO2 max test? Wingate test? 1 RM?

  13. Overall: A logically valid test simply appears to measure the right variable in its entirety?

  14. Statistical Validity: Concurrent Validity. Infers that the test produces similar results to a previously validated test. e.g. VO2 max: Incremental Treadmill Protocol with expired gas analysis versus Multi-Stage Fitness (Beep) Test.

  15. Statistical Validity: Predictive Validity. Infers that the test provides a valid reflection of future performance using a similar test. e.g. Can performance during test A be used to predict future performance in test B? http://www.youtube.com/watch?v=vdPQ3QxDZ1s

  16. Overall: A statistically valid test produces results that agree with other similar tests?

  17. Logical/Statistical Validity: Construct Validity. Infers not only that the test is measuring what it is supposed to, but also that it is capable of detecting what should exist, theoretically. Therefore relates to hypothetical or intangible constructs. e.g. Team Rivalry; Sportsmanship.

  18. Logical/Statistical Validity: Construct Validity. Infers not only that the test is measuring what it is supposed to, but also that it is capable of detecting what should exist, theoretically. Therefore relates to hypothetical or intangible constructs. This makes assessment difficult, i.e. if what should exist cannot be detected, this could mean: a) Test Invalid? b) Theory Incorrect? c) Sensitivity/Specificity Issues?

  19. Interesting Example: Breast Cancer. Data from Kerlikowske et al. (1996). Incidence: ~1 % (0.8 %) (i.e. a perfect test should return a positive result for approximately 1 in every 100 women tested). Sensitivity: ~90 % (87 %) (the mammogram is sensitive enough that approximately 90 in every 100 breast cancer patients will receive a positive result). Specificity: ~90 % (93 %) (the mammogram is specific enough that approximately 90 in every 100 healthy patients will receive a negative result).

  20. What is the probability that a patient receiving a positive result actually has breast cancer? Quick Test
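
One way to work through the Quick Test is Bayes' theorem: the probability of disease given a positive result is the share of all positive results that are true positives. The sketch below is only an illustration using the incidence, sensitivity and specificity figures quoted on the previous slide; the function and variable names are hypothetical and not part of the lecture.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Rounded figures from the slide: ~1 % incidence, ~90 % sensitivity, ~90 % specificity
ppv_rounded = positive_predictive_value(0.01, 0.90, 0.90)

# Figures given in brackets on the slide: 0.8 %, 87 %, 93 %
ppv_reported = positive_predictive_value(0.008, 0.87, 0.93)

print(f"Rounded figures:  {ppv_rounded:.1%}")
print(f"Reported figures: {ppv_reported:.1%}")
```

With either set of figures the answer comes out below 10 %, because the healthy group is so much larger than the patient group that its false positives outnumber the true positives.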

  21. Threats to Validity(and possible solutions?)

  22. Threats to Internal Validity: Maturation. Changes in the DV over time irrespective of the IV.

  23. Threats to Internal Validity: Maturation. e.g. One Group Pre-test Post-test: O1 T O2.

  24. Threats to Internal Validity: Maturation (possible solution). Time series: O1 O2 O3 T O4 O5 O6.

  25. Threats to Internal Validity: Maturation (possible solution). Pre-test Post-test Randomised Group Comparison (n.b. RCT): R, then Group 1: O1 T O2; Group 2: O3 P O4 (P = PLACEBO).

  26. Maturation (possible solution) Threats to Internal Validity Repeated measures designs can occasionally be an inappropriate solution, even when randomised and counterbalanced e.g. Muscle Damage (repeated bout effect) Vitamin Supplementation (wash-out period) In which case independent measures designs could be used.

  27. Threats to Internal Validity: History. Unplanned events between measurements.

  28. Threats to Internal Validity: History. O1 T O2, e.g. exercise? Therefore, solution = control extraneous variables!

  29. Threats to Internal/External Validity: Pre-testing. Interactive effects due to the pre-test (e.g. learning, sensitisation, etc.). Also influences External Validity.

  30. Threats to Internal/External Validity: Pre-testing. e.g. R, then Group 1: O1 T O2; Group 2: O3 P O4 (P = PLACEBO). Assessing muscle mass here could make them train harder in both trials… …but then respond better to the T than the P… …so it is actually T+O1 that is better than P, not T alone.

  31. Threats to Internal/External Validity: Pre-testing (possible solution). Solomon Four-Group Design: R, then Group 1: O1 T O2; Group 2: O3 P O4; Group 3: T O5; Group 4: P O6 (P = PLACEBO).

  32. Statistical Regression AKA regression to the mean An initial extreme score is likely to be followed by less extreme subsequent scores Threats to Internal Validity Sophomore Slump & SI ‘Cover Jinx’ e.g. Training has the greatest effect on untrained individuals. Therefore, solution = effective sampling.
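
Regression to the mean can be illustrated with a small simulation. The sketch below is a hypothetical example, not data from the lecture: it assumes each person has a stable true score plus independent random noise on each test, selects the most extreme 10 % on the first test, and re-tests them with no treatment at all.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed model: stable true score plus independent noise on each occasion
true_scores = [random.gauss(50, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Select the top 10 % on the first test only (an extreme initial score)
cutoff = sorted(test1)[int(0.9 * len(test1))]
selected = [i for i, score in enumerate(test1) if score >= cutoff]

overall_mean = statistics.mean(test1)
selected_test1 = statistics.mean([test1[i] for i in selected])
selected_test2 = statistics.mean([test2[i] for i in selected])

print(f"Population mean on test 1:      {overall_mean:.1f}")
print(f"Selected group mean on test 1:  {selected_test1:.1f}")
print(f"Selected group mean on test 2:  {selected_test2:.1f}")
```

The selected group scores closer to the population mean on the second test even though nothing was done to them, which is why an apparent change in a group chosen for extreme scores cannot be attributed to the treatment alone.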

  33. Instrumentation A difference in the way 2 comparable variables were measured Threats to Internal Validity e.g. Uncalibrated equipment Therefore, solution = calibrate!

  34. Selection Bias The groups for comparison are not equivalent Threats to Internal Validity

  35. Threats to Internal Validity: Selection Bias. Static Group Comparison: Group 1: T O1; Group 2: P Oa (P = PLACEBO). e.g. Groups not randomly assigned, i.e. Group T were resistance trained to start with.

  36. Threats to Internal Validity: Selection Bias (possible solution). Group 1: T O1; Group 2: P Oa (P = PLACEBO). Either: randomise group assignment, pre-test and post-test difference, or Repeated Measures Design.

  37. Experimental Mortality Missing Data due to subject drop-out Reduced n = reduced statistical Power Not only challenges quality of data gathered (Internal Validity) but also our ability to generalise (External Validity). Threats to Internal/External Validity Therefore, solution = recruit sufficient participants (young?)

  38. Inadequate description 5th characteristic of research… …should be replicable If nobody can replicate the methods of a given study, then it is irrefutable and therefore lacks external validity. Threats to External Validity Therefore, solution = comprehensive methodology

  39. Biased sampling Linked to statistical regression Sample does not reflect target population n ≠ N Threats to External Validity Results generalised across gender Therefore, solution = random sample (of target population).

  40. Hawthorne Effect DV is influenced by the fact that it is being recorded Threats to External Validity e.g. Fastest sprint when professor enters lab Therefore, solution = control the lab environment.

  41. Demand Characteristics Participants detect the purpose of the study and behave accordingly e.g. Sports Science students already know that the carbohydrate drink is supposedly superior Threats to External Validity Therefore, solution = double or single blinding. CHO H2O

  42. Operationalisation AKA Ecological Validity The DV must have some relevance in the ‘real world’ Threats to External Validity • e.g. • TTE has no Olympic equivalent Therefore, solution = choose your DV carefully.

  43. Reliability: Reliability is a pre-requisite of validity. e.g. Direct versus Indirect measures of VO2 max. Direct: Gold Standard (i.e. valid and reliable), Expensive, Complex. Indirect: Predictive, Cheap, Easy.

  44. Reliability: Valid and Reliable. Subject 1: 60, 60, 60 ml·kg⁻¹·min⁻¹; Subject 2: 55, 55, 55 ml·kg⁻¹·min⁻¹; Subject 3: 70, 70, 70 ml·kg⁻¹·min⁻¹.

  45. Reliability: Not Valid but Reliable. Subject 1: 60, 65, 65 ml·kg⁻¹·min⁻¹; Subject 2: 55, 60, 60 ml·kg⁻¹·min⁻¹; Subject 3: 70, 75, 75 ml·kg⁻¹·min⁻¹. 5 ml·kg⁻¹·min⁻¹ correction?

  46. Reliability: Not Valid and not Reliable. Subject 1: 60, 72, 57 ml·kg⁻¹·min⁻¹; Subject 2: 55, 61, 52 ml·kg⁻¹·min⁻¹; Subject 3: 70, 40, 84 ml·kg⁻¹·min⁻¹. i.e. a test can never be valid without being reliable?
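
The three scenarios on slides 44 to 46 can be compared numerically. The sketch below is an illustration only and rests on an assumption the slides do not state outright: that the first value in each row is the criterion (direct) VO2 max and the next two are repeated measurements from the test under evaluation. The summary statistics chosen here (mean absolute error and mean test-retest difference) are likewise hypothetical choices, not the lecture's method.

```python
# Assumption: (criterion, measurement 1, measurement 2) in ml/kg/min for each subject
slides = {
    "slide 44": [(60, 60, 60), (55, 55, 55), (70, 70, 70)],
    "slide 45": [(60, 65, 65), (55, 60, 60), (70, 75, 75)],
    "slide 46": [(60, 72, 57), (55, 61, 52), (70, 40, 84)],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def summarise(rows):
    # Validity proxy: how far measurements sit from the criterion value
    errors = [abs(m - criterion) for criterion, *measured in rows for m in measured]
    # Reliability proxy: how far the two repeated measurements sit from each other
    retest = [abs(first - second) for _, first, second in rows]
    return mean(errors), mean(retest)


summary = {name: summarise(rows) for name, rows in slides.items()}
for name, (error, retest) in summary.items():
    print(f"{name}: mean abs. error = {error:.1f}, mean test-retest difference = {retest:.1f}")
```

Under these assumptions slide 45 shows a constant error of 5 with no test-retest difference (reliable but not valid), whereas slide 46 shows large values on both measures.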

  47. Types of Reliability: Relative; Absolute; Rater reliability (Objectivity): Intrarater reliability, Interrater reliability.

  48. Relative Reliability: Relatively Reliable. Subject 1: 60, 63, 57 ml·kg⁻¹·min⁻¹; Subject 2: 55, 56, 48 ml·kg⁻¹·min⁻¹; Subject 3: 70, 65, 66 ml·kg⁻¹·min⁻¹. i.e. Individuals maintain position in the group.

  49. Absolute Reliability: Not Absolutely Reliable. Subject 1: 60, 63, 57 ml·kg⁻¹·min⁻¹; Subject 2: 55, 56, 48 ml·kg⁻¹·min⁻¹; Subject 3: 70, 65, 66 ml·kg⁻¹·min⁻¹. i.e. Test-Retest within individuals.
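
The distinction between relative and absolute reliability can be checked directly on the values from slides 48 and 49. This is a minimal sketch with hypothetical names: it treats the three values per subject as three trials, tests whether the rank order of subjects is the same in every trial (relative reliability), and reports each subject's range across trials (a simple stand-in for absolute reliability, not a formal statistic from the lecture).

```python
# VO2 max values (ml/kg/min) for the three trials shown on slides 48 and 49
trials = {
    "Subject 1": [60, 63, 57],
    "Subject 2": [55, 56, 48],
    "Subject 3": [70, 65, 66],
}


def rank_order(trial_index):
    """Subjects ordered from lowest to highest score in one trial."""
    return sorted(trials, key=lambda subject: trials[subject][trial_index])


orders = [rank_order(i) for i in range(3)]

# Relative reliability: individuals keep their position in the group
relatively_reliable = all(order == orders[0] for order in orders)

# Absolute reliability: scores should agree within each individual
within_subject_range = {s: max(v) - min(v) for s, v in trials.items()}

print("Same rank order in every trial:", relatively_reliable)
print("Range across trials per subject:", within_subject_range)
```

The rank order is identical in all three trials, yet every subject's score varies by several ml·kg⁻¹·min⁻¹ between trials, which matches the slides' verdicts of relatively reliable but not absolutely reliable.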
