
The Challenge of Assessing Reading for Understanding to Inform Instruction


Presentation Transcript


  1. The Challenge of Assessing Reading for Understanding to Inform Instruction Barbara Foorman, Ph.D. Florida Center for Reading Research Florida State University

  2. Problem • What is reading comprehension and how is it measured? • Dissatisfaction with traditional “mean proficiency” approaches, which judge students’ achievement relative to a benchmark, has led to interest in basing accountability on individual academic growth. • Yet the challenge of using assessment to inform instruction remains.

  3. Simple View of Reading (Gough & Tunmer, 1986): Decoding (recognizing words in text and sounding them out phonemically) multiplied by Language Comprehension (the ability to understand language) equals Reading Comprehension (the ability to read and obtain meaning from what was read).
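In equation form, the Simple View is commonly written as

$$RC = D \times LC$$

where $D$ is decoding, $LC$ is linguistic (language) comprehension, and $RC$ is reading comprehension. The multiplicative relation captures the claim that print comprehension fails entirely if either component is absent: if $D = 0$ or $LC = 0$, then $RC = 0$.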

  4. Scarborough (2002)

  5. Constructs/Measures by Grade

  6. A heuristic for thinking about reading comprehension (Sweet & Snow, 2003): comprehension arises from the interaction of the READER (word recognition, vocabulary, background knowledge, strategy use, inference-making abilities, motivation), the TEXT (text structure, vocabulary, genre discourse, motivating features, print style and font), and the ACTIVITY (purpose, social relations), all embedded in a sociocultural context (school/classroom/peers/families, environment, cultural norms).

  7. Components of Reading Comprehension (Perfetti, 1999). [Diagram] Visual input is mapped through orthographic and phonological units to word identification; the lexicon supplies word representations (meaning, morphology, syntax); comprehension processes (the parser, meaning and form selection, inference-making) build a text representation and a situation model, drawing on the linguistic system (phonology, syntax, morphology) and general knowledge.

  8. Goal of RFU Assessment Grant ETS and FSU/FCRR are designing a new assessment system consisting of: • Purpose-driven, scenario-based, summative assessment (textbase + situation model; motivation; prior knowledge; text complexity). • Component skill measures to predict achievement trajectories and provide additional information about non-proficient readers.

  9. Current Approaches to Measuring Growth • TN uses EVAAS (Sanders, 2000): deviation from the mean level of growth. • Ohio uses achievement plus growth from the previous and current year: above, met, and below expected growth. • Colorado uses the Student Growth Percentile Model, based on conditional percentile ranks and quantile regression (Betebenner, 2008).
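To make the Colorado approach concrete, here is a minimal sketch of a student growth percentile in the spirit of Betebenner (2008): fit conditional quantiles of the current-year score given the prior-year score, then report the highest percentile whose prediction the student's actual score meets. The simulated data, variable names, and simple linear specification are illustrative assumptions (the operational model uses B-spline quantile regression), not the state's implementation.

```python
# A minimal sketch of a student growth percentile (SGP), assuming a simple
# linear quantile-regression specification; all data and names are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"prior": rng.normal(500, 50, 2000)})
df["current"] = 0.8 * df["prior"] + rng.normal(100, 30, 2000)

# Fit conditional quantiles of the current score given the prior score.
quantiles = np.arange(0.05, 1.00, 0.05)
fits = {q: smf.quantreg("current ~ prior", df).fit(q=q) for q in quantiles}

def growth_percentile(prior, current):
    """Highest percentile whose predicted current score the student meets."""
    met = [int(round(q * 100)) for q in quantiles
           if current >= fits[q].predict(pd.DataFrame({"prior": [prior]}))[0]]
    return max(met) if met else 5  # below the 5th conditional percentile

# A student whose current score exceeds most peers with the same prior score.
print(growth_percentile(prior=500, current=540))
```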

  10. Interim Assessments • Why not measure growth within a year? • Interim assessments sit midway between formative and summative assessments and can be aggregated above the classroom level. • Interim assessments are ideal for informing learning, instruction, and placement.

  11. Florida Assessments for Instruction in Reading • A K-2 assessment system administered to individual students 3 times a year, with electronic scoring, Adobe AIR version, and PMRN reports linked to instructional resources. • A 3-12 computer-based system where students take the assessments 3 times a year. Several tasks are adaptive. PMRN reports are available, linked to instructional resources. Printed toolkit available.

  12. The K-2 “Big Picture” Map

  13. K-2 Targeted Diagnostic Inventory (TDI)

  14. The K-2 “Score” Map

  15. Grades 3-12 Assessments Model: a Broad Screen/Progress Monitoring Tool (a reading comprehension task given 3 times a year); if necessary, a Targeted Diagnostic Inventory (Maze & Word Analysis tasks); Ongoing Progress Monitoring (as needed); and a Diagnostic Toolkit (as needed).

  16. Purpose of Each 3-12 Assessment • RC Screen: helps identify students who may not be able to meet the grade-level literacy standards at the end of the year, as assessed by the FCAT, without additional targeted literacy instruction. • Mazes: helps determine whether a student has more fundamental problems in text reading efficiency and low-level reading comprehension; relevant for students below a 6th-grade reading level. • Word Analysis: helps us learn more about a student's fundamental literacy skills, particularly those required to decode unfamiliar words and to read and write accurately.

  17. How is the student placed into the first passage/item?

  18. How is the student placed into subsequent passages? • Based on the difficulty of the questions the student answers correctly on the first passage, the student is then given a harder or easier passage as the next passage. • The difficulty of an item is determined using Item Response Theory (IRT); ability estimates are based on theta and its standard error (SE). • Because of this, a raw score of 7/9 for Student A and 7/9 for Student B on the same passage does not mean they will have the same converted scores.
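A hedged sketch of why identical raw scores can yield different converted scores under IRT pattern scoring: which items a student answers correctly matters, not just how many. The item parameters and the 2PL/EAP choices below are illustrative assumptions, not FAIR's actual calibration.

```python
# Illustrative IRT pattern scoring (2PL model, EAP ability estimate).
# Item parameters are invented; FAIR's operational calibration differs.
import numpy as np

# Discrimination (a) and difficulty (b) for 9 hypothetical passage items.
a = np.array([0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.6, 1.8, 2.0])
b = np.linspace(-2.0, 2.0, 9)  # easiest to hardest

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_theta(responses, a, b, grid=np.linspace(-4, 4, 161)):
    """Expected a posteriori theta under a standard normal prior."""
    posterior = np.exp(-0.5 * grid**2)  # prior, then multiply in likelihood
    for r, ai, bi in zip(responses, a, b):
        p = p_correct(grid, ai, bi)
        posterior *= p if r else (1.0 - p)
    return (grid * posterior).sum() / posterior.sum()

# Both students score 7/9 on the same passage but miss different items:
# A misses the two hardest items, B misses the two easiest.
student_a = [1, 1, 1, 1, 1, 1, 1, 0, 0]
student_b = [0, 0, 1, 1, 1, 1, 1, 1, 1]
theta_a, theta_b = eap_theta(student_a, a, b), eap_theta(student_b, a, b)
print(f"theta A: {theta_a:.2f}, theta B: {theta_b:.2f}")  # B's is higher
# The next passage would then be chosen to match each student's theta.
```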

  19. Florida Assessments for Instruction in Reading (FAIR): 3-12 Measures

  20. FAIR 3-12 Score Types

  21. Research Questions • How do FAIR RC scores correlate with the FCAT? • What is the value added of FAIR RC beyond prior FCAT? • Does FAIR RC significantly reduce identification errors above and beyond prior FCAT? • What is the value added of growth scores vs. difference scores? • What is the value added of growth plus prior FCAT?

  22. FL Comprehensive Assessment Test (FCAT): Grades 3-10 • FCAT Reading is a group-administered, criterion-referenced test consisting of 6-8 informational & literary passages, with 6-11 multiple-choice items per passage. • Four content clusters in 2009-10: 1) words & phrases in context; 2) main idea; 3) comparison/cause & effect; 4) reference & research. • Reliability up to .9; content & concurrent validity (FLDOE, 2001; Schatschneider et al., 2004).

  23. Participants: 951,893 students in FL public schools in the PMRN

  24. Analyses • Correlations of FCAT SSS with FAIR's FSP & SS. • Multiple regression of current FCAT on: prior FCAT; prior FCAT plus FAIR RCA; FAIR RCA alone. • Comparisons of negative predictive power (NPP) in predicting current FCAT with: 1) prior FCAT, or 2) prior FCAT + FAIR's RCA. • HLM comparisons of current FCAT: Bayesian growth in RCA vs. difference scores. • HLM comparisons of current FCAT: Bayesian growth with & without prior FCAT.
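As an illustration of the variance-explained comparison reported in Table 2, here is a minimal sketch of the two-step regression: current FCAT on prior FCAT alone, then with the FAIR reading comprehension score added, reading the gain in R² as FAIR's unique contribution. The data are simulated and the column names are assumptions, not the study data.

```python
# Sketch of the incremental-variance comparison: R^2 for prior FCAT alone
# vs. prior FCAT + FAIR RCA. Data are simulated; names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({"fcat_prior": rng.normal(300, 40, n)})
df["fair_rca"] = 0.6 * df["fcat_prior"] + rng.normal(0, 30, n)
df["fcat_current"] = (0.5 * df["fcat_prior"] + 0.3 * df["fair_rca"]
                      + rng.normal(0, 25, n))

m1 = smf.ols("fcat_current ~ fcat_prior", df).fit()
m2 = smf.ols("fcat_current ~ fcat_prior + fair_rca", df).fit()
print(f"R2, prior FCAT only:   {m1.rsquared:.3f}")
print(f"R2, prior FCAT + FAIR: {m2.rsquared:.3f}")
print(f"Unique variance added by FAIR: {m2.rsquared - m1.rsquared:.3f}")
```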

  25. Table 1: Correlations between the FCAT and both RC Screen & FSP

  26. Table 2: Estimates of Variance Explained by Prior FCAT and FCAT + RCA

  27. Table 3: Comparing Negative Predictive Power in Predicting Current FCAT with either Prior FCAT or Prior FCAT + FAIR’s RCA

  28. Table 4. HLM Estimates of FCAT Comparing R2 in Growth vs. Difference Score Models

  29. Table 5. HLM Estimates of FCAT Comparing R2 in Autoregressive, Growth, and Difference Score Models

  30. Improvements on “Mean Proficiency” approach • FAIR + Prior FCAT accounts for up to 7% unique variance; • Simple Difference approach to measuring FAIR growth accounts for up to 2-3% unique variance beyond prior FCAT + FAIR; • Improvements in prediction lead to: • Reduction in mis-identification of risk (from 14%-30% with prior FCAT to 2%-15% with prior + FAIR) • Better placement for reading intervention
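The misidentification figures above are one minus negative predictive power: of the students the screen predicts will pass the FCAT, the fraction who actually fail. A minimal sketch with invented counts:

```python
# Negative predictive power (NPP): among students predicted to pass (a
# negative screen for risk), the share who actually pass. The counts
# below are invented for illustration only.
def npp(predicted_pass_and_passed, predicted_pass_but_failed):
    return predicted_pass_and_passed / (
        predicted_pass_and_passed + predicted_pass_but_failed)

# Hypothetical: prior FCAT alone vs. prior FCAT + FAIR as the predictor.
for label, ok, missed in [("prior FCAT only", 700, 140),
                          ("prior FCAT + FAIR", 820, 60)]:
    rate = 1 - npp(ok, missed)
    print(f"{label}: misidentified {rate:.0%} of predicted passers")
```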

  31. Challenges to Implementing FAIR • How is progress monitored? (score types, RTI decision-making) • What is the value of an adaptive test? • Benchmark mania • Scaling professional development, making instructional resources available, and building local capacity

  32. FAIR 3-12 score types by measure (AP = assessment period; PM = progress monitoring; SS = standard score). Reading Comprehension: FSP, RCAS percentile & SS, and a student Lexile score. Mazes: percentile rank and adjusted Maze SS. Word Analysis: percentile rank and WAAS. Each measure reports both AP and PM scores.

  33. Value of Computer-Adaptive Tests • Provides a more reliable and quicker assessment of student ability than a traditional test, because it creates a unique test tailored to the individual student's ability. • Provides more reliable assessment particularly for students at the extremes of ability (extremely low or extremely high). • Grade-level percentiles are currently provided; Grade Equivalent scores will be provided next year.
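A hedged sketch of the precision claim: under IRT, the standard error of theta is one over the square root of test information, and an item is most informative when its difficulty matches the examinee's ability, so an adaptive test that targets items at the student outperforms a fixed grade-level form at the extremes. The 2PL information formula is standard; the item parameters are invented.

```python
# Why CAT is more precise at the extremes: SE(theta) = 1/sqrt(test info),
# and 2PL item information a^2 * p * (1-p) peaks where difficulty ~= theta.
# Item difficulties below are invented for illustration.
import numpy as np

def item_info_2pl(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

theta = 2.5                              # a very high-ability student
fixed = np.linspace(-1.0, 1.0, 20)       # fixed grade-level form
adaptive = np.full(20, theta)            # items targeted at the student
for name, b in [("fixed form", fixed), ("adaptive form", adaptive)]:
    se = 1.0 / np.sqrt(item_info_2pl(theta, 1.0, b).sum())
    print(f"{name}: SE(theta) = {se:.2f}")
```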

  34. Benchmark Conundrum • Benchmark tests rarely have enough items to be reliable at the benchmark level, and teaching to benchmarks (e.g., “the student will use context clues to determine meanings of unfamiliar words”) results in fragmented skills. • Teach to the standard(s) instead (e.g., “The student uses multiple strategies to develop grade-appropriate vocabulary”). • Assess at aggregate levels (e.g., Reporting Categories), if confirmatory factor analyses show the categories are valid.
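The reliability point can be made concrete with the Spearman-Brown formula for test length; the item counts below are hypothetical, not the FCAT's actual blueprint.

```python
# Spearman-Brown: reliability of a test lengthened/shortened by factor k.
# Splitting a 45-item test with reliability 0.90 across 9 benchmarks
# leaves ~5 items per benchmark score. Counts are hypothetical.
def spearman_brown(rho, k):
    return k * rho / (1.0 + (k - 1.0) * rho)

print(f"{spearman_brown(0.90, 5 / 45):.2f}")  # ~0.50 per-benchmark reliability
```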

  35. FCAT 2.0 Reporting Categories • Reporting Category 1: Vocabulary • Reporting Category 2: Reading Application • Reporting Category 3: Literary Analysis- Fiction/Nonfiction • Reporting Category 4: Informational Text/ Research Process

  36. FCAT 2.0: Benchmarks x Grade

  37. Possible Benchmark Solutions • Stop-gap: start each student with a grade-level passage and provide percent correct on the Reporting Categories, then continue to the current adaptive system to obtain a reliable, valid FSP and RCAS. • For the future: align FAIR to the Common Core; develop a grade-level, item-adaptive CAT that incorporates vocabulary depth/breadth. • Challenges: dimensionality; multidimensional IRT; testlet effects.

  38. Precision

  39. Kinds of Vocabulary Knowledge (word meanings in text) • Definitional • Usage/contextual • Definitional/contextual • Relational • Morphological

  40. Thank You Comments or Questions?
