Operational Data or Experimental Design? A Variety of Approaches to Examining the Validity of Test Accommodations

Cara Cahalan-Laitusis

  • Review types of evidence

  • Review current research designs

  • Pros/Cons for each approach

Types of Validity Evidence

  • Psychometric research

  • Experimental research

  • Survey research

  • Argument based approach

Psychometric Indicators (National Academy of Sciences, 1982)

  • Reliability

  • Factor Structure

  • Item functioning

  • Predicted Performance

  • Admission Decisions

Psychometric Evidence

  • Is the test as reliable when taken with and without accommodations? (Reliability)

  • Does the test (or test items) appear to measure the same construct for each group? (Validity)

  • Are test items of relatively equal difficulty for students with and without a disability who are matched on total test score? (Fairness/Validity)

Psychometric Evidence

  • Are completion rates relatively equal between students with and without a disability who are matched on total test score? (Fairness)

  • Is equal access provided to testing accommodations across different disability, racial/ethnic, language, gender, and socio-economic groups? (Fairness)

  • Do test scores under- or over-predict an alternate measure of performance (e.g., grades, teacher ratings, other test scores, post-graduate success) for students with disabilities? (Validity)
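
One way to check for under- or over-prediction is to fit a prediction line on a reference group and then look at the average residual for students with disabilities. The sketch below is illustrative only: the function names and the toy scores are hypothetical, not taken from the presentation, and a real study would use an actual criterion measure and much larger samples.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y predicted from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual(slope, intercept, x, y):
    """Average of (actual - predicted) criterion values for a group."""
    return sum(yi - (intercept + slope * xi) for xi, yi in zip(x, y)) / len(x)


# Hypothetical data: test scores (x) and a criterion such as grades (y).
reference_scores = [10, 20, 30, 40]
reference_grades = [2.0, 2.5, 3.0, 3.5]
focal_scores = [20, 30]
focal_grades = [2.2, 2.8]

slope, intercept = fit_line(reference_scores, reference_grades)
residual = mean_residual(slope, intercept, focal_scores, focal_grades)
# residual < 0: the test over-predicts the criterion for the focal group.
# residual > 0: the test under-predicts it.
print(round(residual, 3))
```

A mean residual near zero would suggest that the test predicts the criterion about equally well for both groups.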

Advantages of Operational Data

  • Cost effective

  • Quick results

  • Easy to replicate

  • Provides evidence of validity

  • Large sample size

  • Motivated test takers

Limitations of Operational Data

  • Disability and accommodation are confounded

  • Order effects cannot be controlled for

  • Sample size can be insufficient

  • Difficult to show reasons why data is not comparable between subgroups

  • Disability and accommodation codes are not always accurate

  • Approved accommodations may not be used

  • Disability category may be too broad

Types of Analyses

  • Correlations

  • Factor Analysis

  • Differential Item Functioning

  • Descriptive analyses

Relationship Among Content Areas

  • Correlation between content areas (e.g., reading and writing) can also assess a test's reliability.

    • Compare correlations among content areas by population (e.g., LD with read aloud vs. LD without an accommodation)

    • Does the accommodation alter the construct being measured? (e.g., correlations between reading and writing may be lower if read aloud is used for writing but not reading).

    • Is correlation significantly lower for one population? (difference of .10 or greater)
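
The comparison of correlations across populations can be sketched as follows. This is a minimal illustration, assuming two independent samples; the Fisher z test is an added assumption here (the slide only gives the .10 difference rule), and the correlations and sample sizes are hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Compare two independent correlations.

    Returns the raw difference, a Fisher z statistic, and a two-sided
    p-value based on the normal approximation.
    """
    z1 = math.atanh(r1)  # Fisher r-to-z transform
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return r1 - r2, z, p


# Hypothetical reading-writing correlations for two populations.
diff, z, p = compare_correlations(r1=0.70, n1=500, r2=0.55, n2=300)

# Rule of thumb from the slide: a difference of .10 or greater is notable.
meets_threshold = abs(diff) >= 0.10
print(round(diff, 2), round(z, 2), meets_threshold)
```

Reporting both the size of the difference and a significance test helps separate differences that are large from those that are merely detectable in big operational samples.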

Reliability

  • Examine internal consistency measures

    • with and without specific accommodations

    • with and without a disability

  • Examine test-retest reliability between different populations

    • with and without specific accommodations

    • with and without a disability
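
As one possible internal consistency measure, Cronbach's alpha can be computed separately for each group and compared. The sketch below assumes item-level scores are available as one list per test taker; the data and group names are hypothetical, and alpha is only one of several internal consistency indices that could be used.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-test-taker item-score lists."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 item scores (rows = test takers, columns = items).
with_accommodation = [
    [1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0],
    [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 0, 1],
]
without_accommodation = [
    [1, 1, 1, 1], [1, 1, 0, 1], [0, 0, 0, 0],
    [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0],
]

alpha_with = cronbach_alpha(with_accommodation)
alpha_without = cronbach_alpha(without_accommodation)
print(round(alpha_with, 3), round(alpha_without, 3))
```

The same function can be applied to groups defined by disability status rather than accommodation.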

Factor Structure

  • Types of questions

    • Is the number of factors invariant?

    • Are the factor loadings invariant for each of the groups?

    • Are the intercorrelations of the factors invariant for each of the groups?
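
A rough descriptive check on whether factor loadings look similar across groups is Tucker's congruence coefficient. This is an assumption added for illustration, not a method named in the presentation, and it is not a substitute for a formal invariance test such as multi-group confirmatory factor analysis. The loadings below are hypothetical.

```python
import math


def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    return numerator / denominator


# Hypothetical loadings of the same five items on one factor in two groups.
no_disability = [0.72, 0.65, 0.70, 0.58, 0.61]
ld_read_aloud = [0.70, 0.60, 0.68, 0.40, 0.59]

phi = congruence(no_disability, ld_read_aloud)
print(round(phi, 3))
```

Values close to 1 indicate that the pattern of loadings is similar in the two groups.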

Differential Item Functioning

  • DIF refers to a difference in item performance between two comparable groups of test takers

  • DIF exists if test takers who have the same underlying ability level are not equally likely to get an item correct

  • Some recent DIF studies on accommodations/disability

    • Bielinski, Thurlow, Ysseldyke, Freidebach, & Freidebach, 2001

    • Bolt, 2004

    • Barton & Finch, 2004

    • Cahalan-Laitusis, Cook, & Aicher, 2004

Issues Related to the Use of DIF Procedures for Students with Disabilities

  • Group characteristics

    • Definition of group membership

    • Differences between ability levels of reference and focal groups

  • The characteristics of the criterion

    • Unidimensional

    • Reliable

    • Same meaning across groups

Procedures/Sample

  • DIF Procedures (e.g., Mantel-Haenszel, logistic regression, DIF analysis paradigm, SIBTEST)

  • Reference/focal groups

    • Minimum of 100 per group; ETS uses a minimum of 300 for most operational tests

    • Select groups that are specific (e.g., LD with read aloud) rather than broad (e.g., all students with IEP or 504)
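
As an illustration of one of the procedures listed above, the Mantel-Haenszel approach can be sketched for a single item. The sketch assumes test takers have already been matched into total-score strata; the function name and counts are hypothetical. The delta-scale conversion (multiplying the log odds ratio by -2.35) follows the commonly used MH D-DIF convention.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and MH D-DIF for one item.

    Each stratum is a tuple of counts at one matched score level:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    odds_ratio = numerator / denominator
    # Negative values indicate the item is harder for the focal group.
    d_dif = -2.35 * math.log(odds_ratio)
    return odds_ratio, d_dif


# Hypothetical counts for three matched score levels.
item_strata = [(40, 20, 15, 15), (60, 15, 25, 10), (80, 5, 30, 4)]
odds_ratio, d_dif = mantel_haenszel_dif(item_strata)
print(round(odds_ratio, 2), round(d_dif, 2))
```

With small focal groups, sparse strata make the estimate unstable, which is one reason for the minimum sample sizes noted above.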

DIF with Hypotheses

  • Generate hypotheses on why items may function differently

  • Code items based on hypotheses

  • Compare DIF results with item coding

  • Examine DIF results to generate new hypotheses

Other Psychometric Research

  • DIF to examine fatigue on extended time

  • Item completion rates between groups matched on ability

  • Loglinear analysis to examine whether specific demographic subgroups (SES, race/ethnicity, geographic region, gender) use specific accommodations less than other groups.

Other Research Studies

  • Experimental Research

    • Differential Boost

  • Survey/Field Test Research

  • Argument-based Evidence

Advantages of Collecting Data

  • Disability and accommodation can be examined separately

  • Form and Order effects can be controlled

  • Sample can be specific (e.g., reading-based LD rather than all LD or LD with or without ADHD)

  • Opportunity to collect additional information

  • Reasons for differences can be tested

  • Data can be reused for psychometric analyses

Disadvantages

  • Cost of large data collection

  • Test takers may not be as motivated

  • More time consuming than psychometric research

  • Over-testing of students

Differential Boost (Fuchs & Fuchs, 1999)

  • Would students without disabilities benefit as much from the accommodation as students with disabilities?

    • If yes, then the accommodation is not valid.

    • If no, then the accommodation may be valid.
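
The differential boost question can be expressed as a difference of gains. The sketch below is a simplified illustration with hypothetical scores and function names; an actual study would test the group-by-condition interaction statistically rather than compare raw means.

```python
from statistics import mean


def differential_boost(swd_standard, swd_accommodated,
                       non_swd_standard, non_swd_accommodated):
    """Return (boost for students with disabilities,
    boost for students without disabilities, difference in boosts)."""
    boost_swd = mean(swd_accommodated) - mean(swd_standard)
    boost_non_swd = mean(non_swd_accommodated) - mean(non_swd_standard)
    return boost_swd, boost_non_swd, boost_swd - boost_non_swd


# Hypothetical scores under standard and accommodated conditions.
boost_swd, boost_non_swd, difference = differential_boost(
    swd_standard=[10, 12, 14],
    swd_accommodated=[15, 17, 19],
    non_swd_standard=[20, 22, 24],
    non_swd_accommodated=[21, 23, 25],
)
# A clearly positive difference means students with disabilities gained
# more from the accommodation, which is consistent with a valid accommodation.
print(boost_swd, boost_non_swd, difference)
```

A difference near zero would mean both groups benefit about equally, which under the logic above counts against the accommodation's validity.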

Differential Boost Design

Ways to reduce cost:

  • Decrease sample size

  • Randomly assign students to one of two conditions

  • Use operational test data for one of the two sessions
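
Random assignment to one of two conditions can be sketched as below. The student identifiers, condition labels, and seed are hypothetical; a fixed seed is used only so the assignment can be reproduced and audited.

```python
import random


def assign_conditions(student_ids, seed=0):
    """Randomly split students into two roughly equal condition groups."""
    rng = random.Random(seed)  # local generator, reproducible by seed
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "accommodated": sorted(shuffled[:half]),
        "standard": sorted(shuffled[half:]),
    }


# Hypothetical student identifiers.
students = [f"S{i:03d}" for i in range(1, 11)]
groups = assign_conditions(students, seed=42)
print(groups)
```

Assigning each student to a single condition halves the testing time per student compared with testing everyone under both conditions, at the cost of comparing different students across conditions.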

Additional data to collect:

  • Alternate measure of performance on construct being assessed

  • Teacher survey (ratings of student performance, history of accommodation use)

  • Student survey

  • Observational data (how student used accommodation)

  • Timing data

Additional Analyses

  • Differential Boost

    • by subgroups

    • controlling for ability level

  • Psychometric properties (e.g., DIF)

  • Predictive Validity (alternate performance measure required)

Field Testing Survey

  • How well does item type measure intended construct (e.g., reading comprehension, problem solving)?

  • Did you have enough time to complete this item type?

  • How clear were the directions (for this type of test question)?

Field Testing Survey

  • How would you improve this item type?

    • To make the directions clearer

    • To measure the intended construct

  • What specific accommodations would improve this item type?

  • Which presentation approach did the test takers prefer?

Additional Types of Surveys

  • How accommodation decisions are made

  • Expert opinion on how/if accommodation interferes with construct being measured

  • Information on how test scores with and without accommodations are interpreted

  • Correlation between use of accommodations in class and on standardized tests

Additional Research Designs

  • Think Aloud Studies or Cognitive Labs

  • Item Timing Studies

  • Scaffolded Accommodations

Argument-Based Validity

  • Clearly Define Construct Assessed

    • Evidence Centered Design

  • Decision Tree