Large-scale testing: Uses and abuses

Presentation Transcript

  1. Large-scale testing: Uses and abuses. Richard P. Phelps. Universidad Finis Terrae, Santiago, Chile. January 7, 2014.

  2. Large-scale testing: Uses and abuses. (1) Three types of large-scale tests; (2) Judging test quality; (3) A chronology of mistakes; (4) Economists misunderstand testing; (5) How SIMCE is affected.

  3. 1. Three types of tests: Achievement, Aptitude, Non-cognitive

  4. Achievement tests. Historically, these were larger versions of classroom tests. Around 1900, “scientific” achievement tests were developed (Germany & USA): J.M. Rice systematically analyzed test structures and effects; E.L. Thorndike developed scoring scales. SOURCE: Phelps, Standardized Testing Primer, 2007

  5. Achievement tests. Purpose: to measure how much you know and can recall. Developed using: content coverage analysis. How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades). Requires mastery of the content prior to the test. Fairness assumes that all test-takers have the same opportunity to learn the content. Coachable: specific content is known in advance. SOURCE: Phelps, Standardized Testing Primer, 2007

  6. Aptitude tests. 1890s: A. Binet & T. Simon (France) worked with pre-school children with mental disabilities, for whom an achievement test was not possible, and developed a content-free test of mental abilities (association, attention, memory, motor skills, reasoning). 1917: adapted by the U.S. Army to select and assign soldiers in World War I. 1930s: Harvard University president J. Conant wanted a new admission test to identify students from lower social classes with the potential to succeed at Harvard. Developed the first Scholastic Aptitude Test (SAT). SOURCE: Phelps, Standardized Testing Primer, 2007

  7. Aptitude tests. Purpose: to predict how much can be learned. Developed using: skills/job analysis. How validated: predictive validity, i.e., correlation with future activity (e.g., university or job evaluations). Content independent. Measures what the student does with the content provided and how the student applies skills & abilities developed over a lifetime. Not easily coachable: the content is either not known in advance; basic, broad, commonly known by all, and curriculum-free; or less dependent on the quality of schools. SOURCE: Phelps, Standardized Testing Primer, 2007

  8. Aptitude tests can identify: students who are bored in school but study what interests them on their own; students not well adapted to high school, but well adapted to university; students of high ability stuck in poor schools.

  9. Comparing Achievement & Aptitude tests

  10. Non-cognitive tests. More recently developed; they measure values, attitudes, and preferences. Types: integrity tests, career exploration, matchmaking, employment “fit”.

  11. Non-cognitive tests. Purpose: to identify “fit” with others or a situation. Developed using: surveys, personal interviews. How validated: success rate in future activities. Content is personal, not learned. “Faking” can be an issue (e.g., “honesty” tests).

  12. Comparing Achievement, Aptitude, & Non-Cognitive Tests

  13. 2. Judging test quality. Test reports can be “data dumps”. Three measures, in particular, are important: (1) predictive validity; (2) content coverage; (3) sub-group differences.

  14. Predictive validity (values from -1.0 to +1.0) measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion). A test with low predictive validity provides a university little information.

  15. A positive correlation between two measures Source: NIST, Engineering Statistics Handbook

  16. A negative correlation between two measures Source: NIST, Engineering Statistics Handbook

  17. No correlation between two measures Source: NIST, Engineering Statistics Handbook

  18. How is predictive capacity measured? Correlation coefficient, on a scale running from -1 through 0 to +1.
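
As a rough illustration of the correlation coefficient described in the slides above, the sketch below computes a Pearson correlation between admission-test scores and a later university outcome. The function name `pearson_r` and the sample numbers are hypothetical, chosen only for illustration; they are not data from the presentation or from the Pearson PSU report.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: admission scores and first-year university grades.
admission_scores = [480, 520, 550, 600, 640, 700]
first_year_grades = [4.1, 4.5, 4.3, 5.0, 5.4, 5.9]

predictive_validity = pearson_r(admission_scores, first_year_grades)
print(f"predictive validity (r) = {predictive_validity:.2f}")
```

A value near +1 corresponds to the positive-correlation scatterplot, a value near -1 to the negative one, and a value near 0 to the no-correlation case.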

  19. Predictive validities: SAT and PSU SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

  20. Predictive validities: SAT and PSU (faculty: Administracion) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

  21. Predictive validities: SAT and PSU (faculty: Arquitectura) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

  22. Predictive validities: SAT and PSU (faculty: Educacion) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013

  23. Predictive validities of the PSU (CTA v Pearson estimates) SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013; CTA

  24. Incremental predictive validities: (PAA + PCEs v PSU) SOURCE: S.A. Prado, Estudio de Validez Predictiva de la PSU y Comparacion con el Sistema PAA, Universidad de Chile

  25. Content coverage (values from 0% to 100%) measures how much of the content domain of a test has been taught in the schools. It is not fair to expect students to master content to which they have not been exposed, or to compare students who have been exposed with others who have not.
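
One simple way to read the 0% to 100% scale is as the share of tested topics that a school has actually taught. The sketch below assumes that reading; the function name `content_coverage` and the topic lists are hypothetical and are not taken from the PSU or any real curriculum.

```python
def content_coverage(tested_topics, taught_topics):
    """Percentage of tested topics that also appear among the taught topics."""
    tested = set(tested_topics)
    if not tested:
        raise ValueError("the test must cover at least one topic")
    covered = tested & set(taught_topics)
    return 100.0 * len(covered) / len(tested)


# Hypothetical topic lists for one test and two schools.
tested = ["algebra", "geometry", "probability", "functions"]
taught_school_a = ["algebra", "geometry", "probability", "functions", "statistics"]
taught_school_b = ["algebra", "geometry"]

coverage_a = content_coverage(tested, taught_school_a)  # 100.0
coverage_b = content_coverage(tested, taught_school_b)  # 50.0
print(coverage_a, coverage_b)
```

Under this reading, students at the second hypothetical school have seen only half of what the test asks, which is the fairness problem the slide describes.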

  26. 2 problems: There are 2 curricula and PSU covers only one

  27. Content coverage charts

  28. Subgroup differences. Differences in test scores among subgroups (e.g., gender, ethnic, school type) should be due only to differences in the attribute measured by the test and not to systematic biases in the test.
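
A common first step in examining subgroup differences is to express the score gap in standard-deviation units. The sketch below uses a standardized mean difference with a pooled standard deviation (often called Cohen's d). That choice is an assumption made for illustration, not a method named in the presentation, and the scores are hypothetical. The statistic only measures the size of a gap; it cannot show whether the gap comes from the attribute measured or from bias in the test.

```python
import math
import statistics


def standardized_mean_difference(group_a, group_b):
    """Mean difference between two groups in pooled-standard-deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("undefined when all scores within both groups are identical")
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores for two school types.
scores_type_1 = [610, 580, 650, 700, 590]
scores_type_2 = [540, 560, 600, 520, 580]

gap = standardized_mean_difference(scores_type_1, scores_type_2)
print(f"standardized gap = {gap:.2f}")
```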

  29. 3. A chronology of mistakes

  30. 3. A chronology of mistakes. 2000 (Comision Nuevo Curriculum de la Ensenanza Media y Pruebas del Sistema de Admision a la Educacion Superior). SOURCE: Informe Sometido en Consulta Previa a la Ministra de Educacion, November 2000

  31. Something on Atkinson?

  32. A chronology of mistakes (cont.) 2001 (World Bank & MINEDUC): …the Academic Aptitude Test for entry to the university system is under revision, together with the universities belonging to the Council of Rectors. This instrument of entry selection, needs also to be aligned with the new curriculum and may become an exit exam from the secondary education system. SOURCE: World Bank, Implementation Completion Report on a Loan in the Amount of $35 million to the Republic of Chile for Secondary Education, 2001

  33. A chronology of mistakes (cont.) 2005 (World Bank) …The new law adopted in May 2005 (Bulletin 3223-04) established a system of student loans available to all students achieving a threshold score in the University Admission Exam (PSU). …the new system does not impede students unable to provide collateral from financing their studies. The new system promises to improve equity further by increasing options for talented students from non-affluent families to access higher education. SOURCE: IMPLEMENTATION COMPLETION REPORT (TF-25378 SCL-44040 PPFB-P3360) ON A LOAN IN THE AMOUNT OF US$145.45 MILLION TO THE REPUBLIC OF CHILE FOR THE HIGHER EDUCATION IMPROVEMENT PROJECT, December 2005

  34. A chronology of mistakes (cont.) 2010 (OECD): Over time the government should consider replacing the university entry exam with a national school leaving exam as the prime criterion for entry into tertiary education institutions. This could establish a closer link between test results and the school that is responsible for them, making it easier to reach the goal that has been pursued with the introduction of the PSU. There is evidence that central curriculum based exit exams are strongly and positively related to student academic performance (Wößmann, 2005; Bishop, 2006). To allow students to show in more detail their knowledge and their ability to apply it, the school exit exam could be a bit more in-depth than the multiple-choice PSU, including verbal and nonverbal reasoning. SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN; OECD ECONOMICS DEPARTMENT WORKING PAPERS No. 784

  35. A chronology of mistakes (cont.) 2010 (OECD): The Catholic University of Chile and some partners have recently designed a complementary university entry exam and first evaluations revealed that this has the potential to reduce the socio-economic gap in university admission, while being a good predictor for later success at the university (Santelices, 2009). This suggests that it could be possible to develop adequate exams that make access to university easier for more disadvantaged children. SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN; OECD ECONOMICS DEPARTMENT WORKING PAPERS No. 784

  36. 4. Economists misunderstand testing

  37. Testing & Measurement PhD program (University of Massachusetts, USA, 2013-2014): EDUC 501 Classroom Assessment; EDUC 553 Construction, Validation, and Uses of Criterion-Referenced Tests; EDUC 555 Introduction to Statistics & Computer Analysis I; EDUC 632 Principles of Educational & Psychological Testing; EDUC 637 Non-Parametric Statistics Analysis; EDUC 656 Introduction to Statistical & Computer Analysis II; EDUC 661 Educational Research Methods I; EDUC 727 Scale and Instrument Development; EDUC 731 Structural Equation Modeling; EDUC 735 Advanced Theory & Practice of Testing I; EDUC 736 Advanced Theory & Practice of Testing II; EDUC 771 Application of Applied Multivariate Statistics I; EDUC 772 Application of Applied Multivariate Statistics II; EDUC 821 Advanced Validity Theory & Test Validation

  38. What economists do not seem to understand about testing - 1: Increasing an admission test’s correlation with high school work can decrease its correlation with university work.
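
The trade-off in the slide above can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: the stylized model (a general `ability` factor plus a school-specific `exposure` factor), the coefficients, and the function names. None of it is data or a model from the presentation. In the model, high school grades depend on both factors, university performance depends on ability only, and the admission test mixes the two factors according to a weight.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def simulate(weight_on_school_content, n=2000, seed=0):
    """Return (r with high school grades, r with university performance).

    weight_on_school_content: share of the test score driven by
    school-specific curriculum exposure rather than general ability.
    All coefficients below are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    w = weight_on_school_content
    test, high_school, university = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)    # general ability (hypothetical factor)
        exposure = rng.gauss(0, 1)   # curriculum exposure at the student's school
        high_school.append(0.5 * ability + 0.8 * exposure + rng.gauss(0, 0.3))
        university.append(ability + rng.gauss(0, 0.5))
        test.append((1 - w) * ability + w * exposure + rng.gauss(0, 0.3))
    return pearson_r(test, high_school), pearson_r(test, university)


# Compare a test weighted toward ability with one weighted toward school content.
r_hs_low, r_uni_low = simulate(0.2)
r_hs_high, r_uni_high = simulate(0.8)
print(f"weight 0.2: r(high school) = {r_hs_low:.2f}, r(university) = {r_uni_low:.2f}")
print(f"weight 0.8: r(high school) = {r_hs_high:.2f}, r(university) = {r_uni_high:.2f}")
```

Under these assumptions, shifting the test toward school-specific content raises its correlation with high school grades and lowers its correlation with university performance. How strongly this holds in practice depends on the real relationships among the measures, which the toy model does not estimate.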

  39. What economists do not seem to understand about testing - 2: Incentives aren’t all that matter in improving efficiency; also important: better information, classification, & allocation.

  40. What economists do not seem to understand about testing - 3: Incentives generally work best when applied to the actor responsible for the target behavior. Currently, students bear the consequences when schools do not teach the curriculum tested on the PSU.

  41. What economists do not seem to understand about testing - 4: Many useful and successful tests serve multiple purposes, but some purposes are compatible and some are not. The PSU has been expected to: measure the implementation of a new curriculum; incentivize high schools to implement the new curriculum; incentivize high school students to study more; predict success in university; ….