Borrowing the Strength of Unidimensional Scaling to Produce Multidimensional Educational Effectiveness Profiles

Presentation at the 12th Annual

Maryland Assessment Conference

College Park, MD

October 18, 2012

Joseph A. Martineau

Ji Zeng

Michigan Department of Education


Background

  • Prior research showing that using unidimensional measures of multidimensional achievement constructs can distort value-added

    • Martineau, J. A. (2006). Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Value-Added Accountability. Journal of Educational and Behavioral Statistics, 31(1), 35-62.

    • Construct-irrelevant variance can become considerable in value-added measures when a construct is multidimensional but is modeled in value-added as unidimensional.

    • A common misunderstanding is that if the multiple constructs themselves are highly correlated, value-added will not be distorted.

    • The correct understanding is that value-added is undistorted only if value-added on the multiple constructs is highly correlated.
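The distinction in the last two bullets can be illustrated with a toy simulation (all numbers are hypothetical, not from the cited study): when value-added on two dimensions is essentially uncorrelated, a unidimensional composite tracks each dimension only moderately, so uneven effectiveness profiles are masked.

```python
import random
import statistics

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

random.seed(42)
N = 2000

# Hypothetical value-added for N programs on two dimensions whose
# value-added is essentially uncorrelated (purely illustrative data).
va_dim1 = [random.gauss(0, 1) for _ in range(N)]
va_dim2 = [random.gauss(0, 1) for _ in range(N)]

# A unidimensional composite score averages the two dimensions.
va_composite = [(a + b) / 2 for a, b in zip(va_dim1, va_dim2)]

# In theory the composite correlates only ~0.71 with each dimension, so a
# program strong on one dimension and weak on the other looks average.
```

A program one standard deviation above average on dimension 1 and one below on dimension 2 receives a composite value-added of roughly zero, which is exactly the distortion at issue.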


Background

  • Prior research showing that the choice of dimension/domain within construct changes value-added significantly

    • Lockwood, J. R., et al. (2007). The Sensitivity of Value-Added Teacher Effect Estimates to Different Mathematics Achievement Measures. Journal of Educational Measurement, 44(1), 47-67.

    • Depending on choices made in value-added modeling, the correlation between teacher value-added on Procedures and Problem Solving ranged from 0.01 to 0.46.

    • These surprisingly low correlations indicate that, at least in this situation, one needs to model value-added in both dimensions rather than unidimensionally.

    • This is the only work I am aware of to date that has inspected inter-construct value-added correlations.


Background

  • Prior research showing that commonly used factor analytic techniques underestimate the number of dimensions in a multidimensional construct

    • Zeng, J. (2010). Development of a Hybrid Method for Dimensionality Identification Incorporating an Angle-Based Approach. Unpublished doctoral dissertation, University of Michigan.

    • Common dimensionality identification procedures make the unwarranted assumption that all shared variance among indicator variables arises because the indicator variables measure the same construct (shared variance can also arise because the indicator variables are influenced by a common exogenous variable).

    • Because of this unwarranted assumption, commonly used dimensionality identification techniques underestimate the number of dimensions in a data set.


Background

  • Scaling constructs as multidimensional is a difficult task

    • Multidimensional Item Response Theory (MIRT) is time-consuming and costly to run

    • Replicating MIRT analyses can be challenging (there are multiple subjective decision points along the way)

    • Identifying the number of dimensions in MIRT can be challenging

    • Once the number of dimensions is identified, determining which items load on which dimensions in MIRT can also be challenging

      • The factor analysis techniques underlying MIRT are techniques for data reduction, not dimension identification


Background

  • Short of resolving the considerable difficulties in analytically identifying dimensions within a construct (and replicating such analyses), can another approach be used?

  • Propose using/trusting content experts’ identifications of dimensions within constructs (e.g., the divisions agreed upon by the writers of content standards) as the best currently available identification of dimensions, for example…

    • Within English language proficiency, producing reading, writing, listening, and speaking scales.

    • Within Mathematics, producing number & operations, algebra, geometry, measurement, and data analysis/statistics scales.


Background

  • However, separately scaling each dimension can also be difficult and costly compared to running a traditional unidimensional IRT calibration

    • Confirmatory MIRT

    • Bi-factor IRT model

    • Separate unidimensional calibration and year-to-year equating of each dimension score

  • Another option:

    • Unidimensionally calibrate the total score

    • Unidimensionally equate the total score from year to year

    • Use (fixed) item parameters from the unidimensional calibration to create the multiple dimension scores as specified by content experts

    • Use of this method needs to be investigated

  • Practical necessity for Smarter Balanced Assessment Consortium
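The "another option" above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the item parameters (hypothetical 3-PL values here) come from a single unidimensional calibration of the total test, and each domain score is then an EAP estimate computed over only that domain's items with those parameters held fixed.

```python
import math

# Hypothetical 3-PL item parameters (a, b, c) taken from a single
# unidimensional calibration of the *total* test; values are illustrative.
ITEM_PARAMS = {
    "r1": (1.2, -0.5, 0.20), "r2": (0.9, 0.3, 0.15),   # reading items
    "l1": (1.1, -1.0, 0.20), "l2": (1.4, 0.8, 0.25),   # listening items
}
DOMAINS = {"reading": ["r1", "r2"], "listening": ["l1", "l2"]}

def p_correct(theta, a, b, c):
    """3-PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def domain_eap(responses, domain):
    """EAP domain score using fixed parameters from the total-score
    calibration -- no separate per-domain recalibration is run."""
    grid = [q / 10.0 for q in range(-40, 41)]      # theta grid, -4..4
    num = den = 0.0
    for theta in grid:
        prior = math.exp(-0.5 * theta * theta)     # N(0,1) kernel
        like = 1.0
        for item in DOMAINS[domain]:               # this domain's items only
            a, b, c = ITEM_PARAMS[item]
            p = p_correct(theta, a, b, c)
            like *= p if responses[item] == 1 else 1.0 - p
        num += theta * prior * like
        den += prior * like
    return num / den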


Purpose

  • Investigate the feasibility and validity of relying on unidimensional total score calibration as a basis for creating multidimensional profile scores…

    • For reporting multidimensional student achievement scores

    • For reporting multidimensional value-added measures

  • Investigate the impact of separate versus fixed calibration of multidimensional achievement scores in terms of impact on…

    • Student achievement scores

    • Value-added scores

  • …as compared to the impact of other common decisions in scaling, outcome selection, and value-added modeling


Methods

  • Decisions Modeled in the Analyses

    • Psychometric decisions

      • Choice of psychometric model

        • 1-PL vs. 3-PL

        • PCM vs. GPCM

      • Estimation of sub-scores

        • Separate calibration for each dimension vs. fixed calibration based on unidimensional parameters

    • Choice of outcome metric

      • Which sub-score is modeled

    • Value-added modeling decisions

      • Inclusion of demographics in models

      • Number of pre-test covariates (for covariate adjustment models)
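For reference, the dichotomous response functions behind the first psychometric choice can be written compactly (a sketch only; the study used operational calibration software, and the PCM/GPCM are the polytomous analogues of the 1-PL/3-PL):

```python
import math

def p_1pl(theta, b):
    """1-PL (Rasch): probability of a correct response depends only on
    ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_3pl(theta, a, b, c):
    """3-PL: adds item discrimination a and pseudo-guessing c
    (1.7 is the conventional scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
```

Under the 3-PL, a very low-ability examinee still answers correctly with probability approximately c, which is one reason 1-PL and 3-PL scores (and hence value-added estimates built on them) can diverge.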


Methods

  • Outcomes

    • Correlations in student achievement metrics compared across each psychometric choice and outcome choice

    • Correlations in value-added modeling compared across each choice

    • Classification consistency in value-added compared across each choice for

      • Three-category classification decisions

        • Based on confidence intervals around point estimates, placing programs/schools into three categories: (1) above average, (2) statistically indistinguishable from average, and (3) below average

      • Four-category classification decisions

        • Based on sorting programs’/schools’ point estimates into quartiles, representing arbitrary cut points for classification
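The two classification rules can be sketched as follows (illustrative only; the function names and the 95% confidence level are assumptions, not taken from the study):

```python
import statistics

def three_category(estimate, se, z=1.96):
    """Statistical decision: classify relative to the average (defined
    here as value-added = 0) using a confidence interval."""
    lo, hi = estimate - z * se, estimate + z * se
    if lo > 0:
        return "above average"
    if hi < 0:
        return "below average"
    return "indistinguishable from average"

def four_category(estimates):
    """Arbitrary cut points: sort point estimates into quartiles (1-4)."""
    cuts = statistics.quantiles(estimates, n=4)  # 25th/50th/75th pctiles
    return [sum(e > c for c in cuts) + 1 for e in estimates]
```

Because the quartile cuts fall in the dense middle of the distribution, small shifts in point estimates flip 4-category labels easily; this foreshadows the classification-consistency findings reported later.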


Methods

  • Data

    • Michigan English Language Proficiency Assessment (ELPA)

    • Level III (Grades 3-5)

    • 3,391 students, each with 3 measurement occasions (10,173 total scores)

    • Measures

      • Total

      • Reading (domain)

      • Writing (domain)

      • Listening (domain)

      • Speaking (domain)

    • Calibrated the ELPA as a unidimensional measure using both the 1-PL/Partial Credit Model (PCM) and the 3-PL/Generalized Partial Credit Model (GPCM)

    • Created domain scores both from fixed parameters from unidimensional calibration and in separate calibrations for each domain


Methods

  • Data

    • Michigan Educational Assessment Program (MEAP) Mathematics

    • Grades 7 and 8 (not on a vertical scale)

    • Over 110,000 students per grade

    • Measures

      • Total (using items from the two domains)

      • Number & Operations (domain)

      • Algebra (domain)

    • Calibrated the MEAP Math tests as unidimensional measures using both 1-PL and 3-PL models

    • Created domain scores both from fixed parameters from unidimensional calibration and in separate calibrations for each domain


Methods

  • Value-added modeling the ELPA

    • 3-level HLM nesting test occasion within student within English language learner program to obtain program value-added
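The 3-level growth model can be sketched in equations (a sketch consistent with the slide's description, not necessarily the exact operational specification; all symbols are assumptions):

```latex
% Level 1: occasion t within student i within program j
Y_{tij} = \pi_{0ij} + \pi_{1ij}\,\mathrm{Time}_{tij} + e_{tij}
% Level 2: students within programs
\pi_{0ij} = \beta_{00j} + r_{0ij}, \qquad \pi_{1ij} = \beta_{10j} + r_{1ij}
% Level 3: programs
\beta_{00j} = \gamma_{000} + u_{00j}, \qquad \beta_{10j} = \gamma_{100} + u_{10j}
```

Under this specification the program-level growth deviation u_{10j} serves as the program value-added estimate; demographic covariates, when included, enter at the appropriate level.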


Methods

  • Value-added modeling the ELPA

    • VAMs were run in a fully-crossed design with…

      • All outcomes (R, W, L, S)

      • PCM- and GPCM-calibrated outcomes

      • Fixed and separately calibrated outcomes

      • With and without demographics in the VAMs

    • 32 real-data applications across design factors


Methods

  • Value-added modeling MEAP mathematics

    • 2-level HLM covarying grade-8 outcomes on grade-7 outcomes with students nested within schools
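The covariate-adjustment model can likewise be sketched (again an assumed specification consistent with the slide, not the exact operational model):

```latex
% Level 1: student i within school j (grade-8 outcome on grade-7 pre-test)
Y^{(8)}_{ij} = \beta_{0j} + \beta_{1}\,Y^{(7)}_{ij} + e_{ij}
% Level 2: schools
\beta_{0j} = \gamma_{00} + u_{0j}
```

Here u_{0j} is the school value-added estimate; demographic covariates and a second grade-7 pre-test covariate enter Level 1 in the corresponding design cells.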


Methods

  • Value-added modeling MEAP mathematics

    • VAMs were run in a fully-crossed design with…

      • Both outcomes (algebra and number & operations)

      • 1-PL and 3-PL calibrated outcomes

      • Fixed and separately calibrated outcomes

      • With and without demographics

      • With either one or two pre-test covariates

    • 32 real-data applications across design factors


Results

ELPA


Results: ELPA Student-Level Outcomes

  • Correlations across fixed vs. separate calibrations


Results: ELPA Student-Level Outcomes

  • Correlations across model choice (PCM vs. GPCM)


Results: ELPA Student-Level Outcomes

  • Correlations across content areas

Low to moderate inter-dimension correlations

However, a Rasch dimensionality analysis in WINSTEPS identified the total score as unidimensional


Results: ELPA Program District-Level Value-Added Outcomes

  • Impact of fixed versus separate calibration


Results: ELPA Program District-Level Value-Added Outcomes

  • Correlations between Listening and Reading VA

  • Min = 0.228, Max = 0.397

  • Mean = 0.322, SD = 0.037


Results: ELPA Program District-Level Value-Added Outcomes

  • Correlations between Listening and Writing VA

  • Min = 0.342, Max = 0.420

  • Mean = 0.373, SD = 0.019


Results: ELPA Program District-Level Value-Added Outcomes

  • Correlations between Listening and Speaking VA

  • Min = -0.005, Max = 0.108

  • Mean = 0.046, SD = 0.035


Results: ELPA Program District-Level Value-Added Outcomes

  • Correlations between Reading and Writing VA

  • Min = 0.335, Max = 0.491

  • Mean = 0.412, SD = 0.047


Results: ELPA Program District-Level Value-Added Outcomes

  • Correlations between Reading and Speaking VA

  • Min = 0.121, Max = 0.205

  • Mean = 0.151, SD = 0.026


Results: ELPA Program District-Level Value-Added Outcomes

  • Correlations between Speaking and Writing VA

  • Min = 0.150, Max = 0.246

  • Mean = 0.199, SD = 0.029


Results: ELPA Program District-Level Value-Added Outcomes

  • Impact of choice of psychometric model


Results: ELPA Program District-Level Value-Added Outcomes

  • Impact of Including/Not Including Demographics


Results

MEAP Mathematics


Results: MEAP Math Student-Level Outcomes

  • Correlations among variables based on psychometric decisions


Results: MEAP Math Student-Level Outcomes

  • Very high correlations based on fixed versus separate calibrations


Results: MEAP Math Student-Level Outcomes

  • Very high correlations based on fixed versus separate calibrations


Results: MEAP Math Student-Level Outcomes

  • Somewhat lower correlations based on 1-PL versus 3-PL calibrations


Results: MEAP Math Student-Level Outcomes

  • Moderate to high correlations across dimensions


Results: MEAP Mathematics School-Level Value-Added Outcomes

  • Impact of fixed versus separate calibration


Results: MEAP Mathematics School-Level Value-Added Outcomes

  • Impact of choice of outcome (Algebra vs. Number)


Results: MEAP Mathematics School-Level Value-Added Outcomes

  • Impact of choice of psychometric model


Results: MEAP Mathematics School-Level Value-Added Outcomes

  • Impact of Including/Not Including Demographics


Results: MEAP Mathematics School-Level Value-Added Outcomes

  • Impact of covarying on one vs. two pre-test scores


Conclusions

  • Practically important impacts on value-added metrics and value-added classifications

    • Choice of psychometric model

    • Including/not including demographics

    • Including/not including multiple pre-test values

  • Prohibitive impacts on value-added metrics and value-added classifications

    • Choice of outcome (i.e., domain within construct)

  • Practically negligible impacts on value-added metrics and value-added classifications

    • Separate versus fixed calibrations of domains within construct


Conclusions, continued…

  • Need to pay attention to modeling domains within constructs if constructs can reasonably be considered multidimensional

    • Of the common psychometric and statistical modeling decisions one can make, the choice of which subscore to use as an outcome is the most influential

    • Because subscores give different profiles of both student achievement and program/school value-added, each subscore should be modeled to the degree possible

  • 4-category (i.e., quartile) classifications on value-added are appreciably impacted by every psychometric and statistical modeling choice evaluated here, but 3-category classifications are not

    • Discourage more than three categories

    • However, Race to the Top (RTTT) requires at least four categories


Conclusions, continued…

  • The 3- vs. 4-category distinction is actually a proxy for

    • Statistical decision categories (3 categories)

    • Arbitrary cut-point categories (4 categories)

  • Can leverage unidimensional calibrations of multidimensional achievement scales to create multidimensional profiles of value-added

    • Except where using four categories of classifications


Limitations

  • Inductive reasoning

    • Results are likely to hold in similar circumstances

    • Still will need to investigate feasibility of using fixed parameters from unidimensional calibration for specific circumstances if those circumstances are high stakes

    • This is a proof of concept

  • PCM and GPCM models were run using different software (WINSTEPS vs. PARSCALE)


Contact Information

  • Joseph A. Martineau, Ph.D.

    • Executive Director

    • Bureau of Assessment & Accountability

    • Michigan Department of Education

    • [email protected]

  • Ji Zeng, Ph.D.

    • Psychometrician

    • Bureau of Assessment & Accountability

    • Michigan Department of Education

    • [email protected]

