
Selecting Effective Early Reading Assessments






  1. Selecting Effective Early Reading Assessments Natalie Rathvon, Ph.D.

  2. What We’ll Cover • A research-based framework for selecting early reading assessments • Application of the framework to selected early reading instruments • Early reading assessment case examples • Resources for early reading assessment and intervention

  3. So many tests, so few guidelines . . . • Growing number of print and online tests that claim to assess or predict reading • Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) • Provides general guidelines, not specific criteria, for evaluating psychometric quality

  4. Myths about Early Reading Assessments • All claims that a reading measure is “scientifically based” are equally valid. • A valid and reliable measure is equally valid and reliable for all examinees. • All measures of the same reading component yield similar results for the same examinee.

  5. Does Tim (Grade 1) have a reading problem?

  6. Why does this happen? • Tests vary widely in their psychometric characteristics and overall soundness. • Early Reading Assessment: A Practitioner’s Handbook

  7. Early Reading Assessment Models • Traditional: “standard battery” (one size fits all); assumes reading problems arise from internal child deficits; designed to provide a categorical label for programming purposes • Component-based: targets domains related to the identified deficits; assumes most reading problems arise from experiential and/or instructional deficits; designed to provide information for guiding instruction

  8. 10 Key Reading Components • 4 cognitive-linguistic variables: phonological processing, rapid naming, orthographic processing, oral language • 6 literacy skills: print awareness, alphabet knowledge, single word reading, contextual reading, reading comprehension, written language

  9. Considerations in Selecting Early Reading Assessments • Technical adequacy: Psychometric soundness • Usability: Degree to which practitioners can actually use a measure in applied settings

  10. Five Key Technical Adequacy Characteristics • Norms • Test floors • Item gradients • Reliability • Validity

  11. How can we examine a test’s technical characteristics? • Test manuals? Tremendous variation in quality and quantity of the psychometric information provided • WJ III: 2 examiner manuals, separate 209-page technical manual • Dyslexia Early Screening Test: 7 pages in 45-page manual • Research literature? • Continuing stream of validation data

  12. Norms: How do we interpret performance? • Norm-referenced measures: Comparisons with age/grade peers • Criterion-referenced measures: Comparisons with pre-determined performance standards • Nonstandardized measures: Research norms or examiner judgment

  13. Evaluating the Adequacy of Norms • Are they representative? • Criteria: Should match a national or appropriate reference population • Are they recent? • Criteria: No more than 7-12 years old • Are subgroup and sample sizes large enough? • Criteria: At least 100 (subgroup size) & 1,000 (sample size)

  14. Evaluating Norms, II • Are norm table intervals small enough to reflect small changes in skill development and small differences among examinees? • Criteria: • No more than 6 months for students aged 7-11 (years-months) and younger • No more than 1 year for students aged 8-0 to 18
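
The norm criteria on slides 13 and 14 reduce to simple threshold checks, so they are easy to encode. Below is a minimal Python sketch; the function name, parameter names, and example values are illustrative assumptions, not part of any published instrument or scoring tool.

```python
# A minimal sketch of the norm-adequacy criteria from slides 13-14.
# Thresholds come from the slides; everything else is illustrative.

def norms_adequate(norm_age_years: int, subgroup_n: int, total_n: int,
                   interval_months: int, examinee_age_months: int) -> dict:
    """Return a pass/fail flag for each norm criterion."""
    # 6-month intervals through age 7-11; 1-year intervals for ages 8-0 to 18
    max_interval = 6 if examinee_age_months < 8 * 12 else 12
    return {
        "recent": norm_age_years <= 12,        # lenient end of the 7-12 year criterion
        "subgroup_n_ok": subgroup_n >= 100,
        "total_n_ok": total_n >= 1000,
        "interval_ok": interval_months <= max_interval,
    }

# EVT-style example: 9-year-old norms, 120 examinees per 6-month interval,
# total N = 2,725, evaluated for a 6-year-old examinee
print(norms_adequate(9, 120, 2725, interval_months=6, examinee_age_months=72))
```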

  15. Norms example 1: Expressive Vocabulary Test (AGS, 1997) • Date = 1995-1996 (age norms only) • Total norm group = 2,725 examinees • 5-0 to 6-11 group = 119-122 examinees tested per 6-month interval • Derived scores = 2-month increments • Because each 6-month cohort is spread across three 2-month score intervals, derived scores for the 5-0 to 6-11 age group are based on only 39-56 examinees.

  16. Norms example 2: TOWRE 8-year-old Grade 2 student

  17. Reliability: Are scores consistent and accurate? • Alternate-form: Form A vs. Form B • Internal consistency: Item A vs. Item B • Test-retest: Time A vs. Time B • Interscorer: Scorer A vs. Scorer B • Criteria: ≥ .80 for screening measures; ≥ .90 for diagnostic measures
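
Three of these coefficients (alternate-form, test-retest, interscorer) are simple correlations between two sets of scores; internal consistency is commonly estimated with Cronbach's alpha. Here is a hedged Python sketch of the alpha computation and the slide's thresholds; the function names and the toy item matrix are invented for illustration, not data from any actual test.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Internal-consistency reliability; rows = examinees, columns = items."""
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def reliability_ok(r: float, purpose: str) -> bool:
    """Apply the slide's criteria: >= .80 screening, >= .90 diagnostic."""
    return r >= (0.80 if purpose == "screening" else 0.90)

# Made-up 0/1 item responses for 5 examinees on a 4-item task
scores = np.array([[1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1],
                   [0, 0, 0, 1], [1, 1, 1, 0]])
alpha = cronbach_alpha(scores)
print(round(alpha, 2), reliability_ok(alpha, "diagnostic"))
```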

  18. Hidden Threat to Reliability • Examiner variance: Differences among assessors in administering tasks and recording responses • Especially likely on: • Live-voice tasks (phoneme blending) • Fluency-based tasks (rapid naming) • Tasks with complex administration or scoring systems (LAC–3)

  19. Reliability Example: TOWRE (PRO-ED, 1999) • Internal consistency = .93 and above • Alternate form = .90 and above • Test-retest = .90 and above for a study with examinees ages 6-9 (n = 29) • Interscorer = .99, based on agreement of 2 independent scorers with 30 completed protocols

  20. Test Floors: Can the Test Detect Poor Readers? • Test floor: Lowest possible standard score when a student answers 1 item correctly • Adequate floors: Permit identification of students with very weak skills • Inadequate floors: Overestimate students’ level of skills

  21. Test Floor Criteria • A subtest raw score of 1 should yield a standard score more than 2 standard deviations below the subtest mean. • SS of 3 or less for a subtest mean of 10 • SS of 69 or less for a subtest mean of 100
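
Because the floor criterion is pure arithmetic, it fits in a few lines. The sketch below assumes you have already looked up, in the norm tables, the standard score a raw score of 1 earns; all names are illustrative.

```python
# Floor criterion from this slide: a raw score of 1 should land more than
# 2 SDs below the subtest mean.

def floor_adequate(ss_at_raw_score_1: float, mean: float, sd: float) -> bool:
    return ss_at_raw_score_1 < mean - 2 * sd

print(floor_adequate(3, mean=10, sd=3))     # True: 3 is below 10 - 2(3) = 4
print(floor_adequate(72, mean=100, sd=15))  # False: 72 is within 2 SDs of 100
```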

  22. Which Tests Are Likely to Display Floor Effects? • “Cradle-to-grave” tests • Phonemic manipulation tasks (deletion, substitution, reversal) • Oral reading fluency tests • Pseudoword reading tests • Spelling tests • Reading comprehension tests

  23. Item Gradients: Can the Test Detect Small Differences? • Item gradient: Steepness with which standard scores change from one raw score unit to the next • Adequate gradient: Sensitive to small differences in performance • Steep gradient: Obscures differences among performance levels

  24. Item Gradient Criteria • 6 or more items between subtest floor and mean (M = 10) or • 10 or more items between subtest floor and mean (M = 100) • Caution: Item gradients should be evaluated in the context of test floors.
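
A hedged sketch of the gradient criterion, using the two cutoffs above; the function and example values are invented for illustration, and (per the caution on this slide) a passing gradient still needs to be read alongside the test's floor.

```python
# Item-gradient criterion from slide 24: count the raw-score items that
# separate the subtest floor from the mean-level score.

def gradient_adequate(items_floor_to_mean: int, subtest_mean: int) -> bool:
    required = 6 if subtest_mean == 10 else 10  # M = 10 scale vs. M = 100 scale
    return items_floor_to_mean >= required

print(gradient_adequate(4, subtest_mean=10))    # False: only 4 items span floor to mean
print(gradient_adequate(12, subtest_mean=100))  # True
```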

  25. Test Floors and Item Gradients: Special Cases • Screening tests • Critical issue is cutoff score accuracy, not floor/gradient violations • Tests not yielding standard scores • Deciles, percentiles, quartiles, stanines • Rasch-model tests • Preclude direct inspection of raw score-standard score relationships • WJ family: WJ III, WRMT-R/NU, WDRB

  26. Floor & Gradient Example: GORT-4 (PRO-ED, 2001) • Item gradients = adequate • Floors • Rate = inadequate below 8-0 for both forms • Accuracy = inadequate below 7-6 for Form A and below 8-0 for Form B • Comprehension = inadequate below 8-0 for Form A and below 9-0 for Form B • ORQ = inadequate below 6-6 for Form A and below 7-6 for Form B

  27. Validity: Are the Results Meaningful? • Content validity: Effectiveness in assessing the relevant domain • Criterion-related validity: Effectiveness in predicting performance now (concurrent validity) or later (predictive validity) • Construct validity: Effectiveness in measuring what the test is supposed to measure • Criteria: Evidence of all three types of validity for the target population
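
Criterion-related validity is typically reported as a correlation between the test under review and an established criterion measure given to the same examinees. The Python sketch below uses made-up scores purely to show the computation; no real test data are implied.

```python
import numpy as np

# Concurrent validity: both measures administered at (roughly) the same time.
new_test  = np.array([85, 92, 78, 110, 101, 95, 88, 120])
criterion = np.array([82, 95, 75, 108, 99, 97, 84, 118])

r = np.corrcoef(new_test, criterion)[0, 1]
print(f"concurrent validity coefficient r = {r:.2f}")
# Predictive validity uses the same statistic, but with the criterion
# administered later (e.g., end-of-year reading scores).
```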

  28. Validity Example: WJ III ACH • Content validity: Remarkably little content validity evidence • Criterion-related validity: Correlates .63 to .82 with the WIAT • WJ III Written Expression mean standard scores = more than 10 points higher than WIAT Written Expression mean standard scores

  29. WJ III ACH Validity Example, Cont. • Diagnostic utility = study with 48 students with ADHD, ages 6-17 • ADHD group scored significantly lower than norm group on 3 of 8 WJ III ACH tests (Oral Comprehension, Passage Comprehension, and Calculation)

  30. The Untold Story: Usability Considerations • Usability often has more influence on test selection and use than technical adequacy does. • Virtually no research exists on the impact of usability on test selection and use.

  31. Do these comments sound familiar? • “I know how to give it.” • “It doesn’t take long to give.” • “It’s easy to carry around.” • “I think I saw one in the storage closet.” • “I think that test kit has all the parts.”

  32. Key Practical Characteristics • Test construction • Administration • Accommodations and adaptations • Scores and scoring • Interpretation • Links to intervention

  33. Usability Example: DEST (PsyCorp, 1996) • Inexpensive ($130.00) • Has numerous stimulus materials to manage, increasing administration time • Letter Naming subtest: 4 cards for 12 items • Digit Naming subtest: 3 cards for 9 items • Requires calibrating a postural stability balance tester • Manual is not spiral bound, so it doesn’t lie flat during administration.

  34. Increasing the Effectiveness of Early Reading Assessments • Begin with measures that target domains directly related to the referral problem. • Supplement norm-referenced measures with criterion-referenced measures to ensure adequate coverage and increase instructionally relevant information. • Know the psychometric strengths and limitations of each measure you use.

  35. Increasing Effectiveness, II • Evaluate the presence of attentional, behavioral, and motivational problems. • Key predictors of response to intervention • The Unmotivated Child • Assess environmental and instructional variables.

  36. Instructional Disability?

  37. The Golden Rule of Assessment • The best designed assessment with the most reliable and valid measures administered by the best trained examiner won’t change a child’s reading trajectory . . . unless someone in the child’s life does something different. Effective School Interventions: Strategies for Enhancing Academic Achievement and Social Competence

  38. Early Reading Assessment and Intervention Resources • AERA, APA, & NCME. (1999). Standards for educational and psychological testing. Washington, DC: AERA. www.apa.org • Buros Institute of Mental Measurements. www.unl.edu/buros • Center for Equity and Excellence in Education Test Database. http://ceee.gwu.edu/standards_assessments/sa.htm • ERIC Clearinghouse on Assessment and Evaluation. http://www.ericae.net • Florida Center for Reading Research. http://www.fcrr.org

  39. More Resources • Rathvon, N. (2004). Early Reading Assessment: A Practitioner’s Handbook. New York: Guilford. www.guilford.com • Rathvon, N. (1999). Effective School Interventions: Strategies for Enhancing Academic Achievement and Social Competence. New York: Guilford. www.guilford.com • Rathvon, N. (1996). The Unmotivated Child: How to Help Your Underachiever Become a Successful Student. New York: Simon & Schuster. www.simonsays.com • Southwest Educational Development Laboratory. www.sedl.org/reading/rad

  40. Thank you!
