
Validity/Reliability Matters


Presentation Transcript


  1. Validity/Reliability Matters. Really? Beverly Mitchell, Kennesaw State University

  2. Can a test be valid and not be reliable?

  3. Can a test be reliable and not be valid?

  4. Validity: Justifiable • Relevant • True to its purpose (consistently)

  5. Validity: Design Issues • Application Issues

  6. Validity: Design Issues • Application Issues

  7. Design: Creating the Instrument • 1 - Inference • 2 - Complexity

  8. Inference: Low ↔ High

  9. High Inference • To draw a conclusion • To guess, surmise • To suggest, hint

  10. Low Inference • Straightforward • Language = precise & targeted • Clear – no competing interpretations of words • No doubt as to what point is being made

  11. Inference: Low ↔ High

  12. Complexity: Low ↔ High

  13. High Complexity • Complicated • Composed of interrelated parts or sections • Developed with great care or with much detail

  14. Low Complexity • Simplistic • Plain • Unsophisticated

  15. Complexity: Low ↔ High

  16. How They Are Related [2x2 chart: Inference (Low to High) vs. Complexity (Low to High)]

  17. Designing the Instrument [2x2 chart: Inference vs. Complexity]

  18. Due “Yesterday”! [2x2 chart: Inference vs. Complexity]

  19. “Overachieving” [2x2 chart: Inference vs. Complexity]

  20. How Much Error Are You Willing to Risk? [2x2 chart: Inference vs. Complexity, with error regions marked]

  21. Compromise [2x2 chart: Inference vs. Complexity]

  22. Does the Observed Behavior = True Behavior? Observed Score ≠ True Score; the difference is error.
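The observed-versus-true-score idea on this slide is classical test theory: an observed score is modeled as the true score plus random error, and reliability is the share of observed-score variance that is true-score variance. A minimal simulation sketch (the sample size and variances below are illustrative assumptions, not figures from the presentation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: 500 candidates, true-score variance of 1.0,
# error variance of 0.5 (illustrative numbers only).
n = 500
true_scores = rng.normal(loc=70, scale=1.0, size=n)
error = rng.normal(loc=0.0, scale=np.sqrt(0.5), size=n)

observed = true_scores + error  # observed score = true score + error

# Reliability = var(true) / var(observed); with these settings it should
# land near 1.0 / (1.0 + 0.5) ≈ 0.67.
reliability = true_scores.var() / observed.var()
print(f"Estimated reliability: {reliability:.2f}")
```

The more error a design tolerates, the smaller this ratio becomes, which is the trade-off the preceding slides frame as how much error you are willing to risk.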

  23. Design: Creating the Instrument
  • 1 - Inference • 2 - Complexity
  • Easy to develop – question worthiness, guidance, single interpretation (low)
  • Time to develop – labor intensive, onerous, long (high)
  • General rubric – high
  • Qualitative analytic rubric – low

  24. Validity: Design Issues • Application Issues

  25. Application Issues • Designated Use • Limitations/Conditions

  26. Application Issues • Designated Use • Don’t borrow from your neighbor!

  27. Application Issues • Limitations/Conditions • One size does not fit all or apply to all circumstances

  28. Ways to Increase Probability for Accuracy
  • Compare language: standards & concepts (see the sketch after this list)
  • The concepts/expectations in the standards are apparent in the assessments – same depth and breadth
  • Good example of Content Validity
  • Behavior (performance) expected in the standard matches the performance expected in the assessment – i.e., knowledge of…demonstrating skill…
  • Identify key/critical items/concepts to evaluate
  • Give it away for analysis (many eyes)
  • Invite external “expert” review
  • Be receptive to feedback
  • Surveys from P-12 partners, candidates
  • Regular evaluation and analysis: revise, revise, revise
  • Awareness of design and application issues
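One way to act on the first bullet (comparing the language of standards and assessments) is a rough vocabulary-overlap screen. The function and example texts below are hypothetical illustrations, not the procedure used at KSU; real content-validity review still depends on expert judgment:

```python
import re

def key_terms(text: str) -> set[str]:
    """Lowercase the text and keep words of four or more letters as rough key terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}

# Hypothetical standard and assessment prompt (wording invented for illustration).
standard = ("The candidate demonstrates knowledge of subject matter "
            "and facilitates student learning.")
assessment_item = ("Rate how well the candidate demonstrates subject matter "
                   "knowledge while facilitating the lesson.")

shared = key_terms(standard) & key_terms(assessment_item)
missing = key_terms(standard) - key_terms(assessment_item)

print("Shared terms:", sorted(shared))
print("Standard terms not echoed in the item:", sorted(missing))
```

Because it matches only exact word forms ("facilitates" versus "facilitating", for example), a screen like this can flag items for a closer look but cannot replace the many-eyes review the slide recommends.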

  29. Ways to Increase Reliability
  • Begin with a valid instrument
  • Two reliability issues:
  • Reliability of the instrument: repeated use of the instrument by the same evaluators. If problematic: revise, re-think, abandon
  • Reliability of the scoring: performance rated the same by different evaluators, i.e., objectivity (see the agreement sketch after this list). If problematic: ensure qualifications of evaluators, check the rubric, check the language, minimize generalized concepts applied to all subject areas
  • Train evaluators frequently
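For the scoring-reliability bullet, inter-rater agreement can be quantified with percent agreement and a chance-corrected statistic such as Cohen's kappa. The ratings below are hypothetical, and the workshop itself may have used a different agreement measure:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of items on which the two raters gave the identical rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical performance-level ratings (1-3) given by two observers
# to the same ten lessons.
rater_a = [3, 2, 2, 1, 3, 2, 3, 1, 2, 2]
rater_b = [3, 2, 1, 1, 3, 2, 3, 2, 2, 2]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

Percent agreement alone overstates consistency when most ratings cluster at one level, which is why a chance-corrected coefficient is usually reported alongside it.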

  30. An Application: A KSU Workshop (Handouts Available) • Thirty experienced teachers participated in a daylong workshop to help us evaluate three student teaching observation rating forms.

  31. Three Instruments
  • Traditional Candidate Performance Instrument (CPI) Observation of Student Teaching: the observer is asked to indicate strengths, weaknesses, and areas for improvement in three broad outcomes (Subject Matter, Facilitation of Learning, and Collaborative Professional).
  • Modified CPI Observation of Student Teaching: the observer is asked to explicitly rate each proficiency within each outcome and then provide a narrative indicating any strengths, weaknesses, and suggestions for improvement.
  • Formative Analysis Class Keys: the observer is asked to rate 26 elements from the Georgia Department of Education’s Class Keys. No required narrative.

  32. Generally, we were interested in two areas:
  • Validity/Accuracy – Which instrument provides the best inference about the presence of the positive behaviors (proficiencies) we deem important?
  • Reliability/Consistency – Which instrument demonstrates the best inter-rater reliability?

  33. Study Design

  34. Reliability
  • The strongest inter-rater agreement was for the Modified CPI with performance-level ratings, followed by the Class Keys Formative Assessment instrument with performance-level ratings.
  • There was very little agreement among the behaviors noted in the Traditional CPI narratives, and no performance-level ratings were available; it is probably not a reliable instrument for rating student teaching behaviors.

  35. Validity
  • Both the Traditional CPI and the Modified CPI are explicitly aligned with institutional (and other) standards, but the Traditional CPI is a global assessment while the Modified CPI requires a rating and a narrative for each proficiency.
  • However, the Traditional CPI has not demonstrated reliability…
  • Participants were also asked to provide information about the language, clarity, and ease of use of all instruments.
