
Thoughts on reconciling international comparative studies and national assessments




  1. Thoughts on reconciling international comparative studies and national assessments. Harvey Goldstein, Centre for Multilevel Modelling, Graduate School of Education, University of Bristol

  2. A paradox? • An apparent fundamental paradox of comparative international achievement studies: • They strive to achieve cross-country test ‘equivalence’ • Unique country content (items) is down-weighted or excluded • This implies less relevance for each individual country compared to a national assessment designed to measure response to that country’s curriculum • This down-weighting can also result from translation problems, to which there may be no formal solution • Thus, the more ‘comparable’ the test, in the sense of the items performing similarly (difficulty, discrimination) across countries, the less relevant it may become for comparisons in terms of any given country’s curriculum, and therefore the less useful for policy purposes. • It would appear that no formal solution is possible, unless we look at the wider context exemplified by the claims made for PISA.

  3. PISA and policy • OECD suggests that PISA is useful for informing educational policy decisions and that governments (which fund OECD) can draw lessons from PISA comparisons. • PISA also states that it seeks to measure performance that is not tied to any particular curriculum, and this is consistent with its item selection procedure. • These two positions are reconcilable if (and only if) the aim of OECD is to move towards a common international curriculum. Encouraging governments to optimise their country’s performance on PISA tests then becomes an indirect attempt to move in this direction. • If successful, OECD would indeed become a kind of World Ministry of Education. • In fact it already promotes policies across countries that, it suggests, are based upon PISA results in terms of ‘skills’ etc.

  4. Relating the national to the international • Insofar as national assessments do reflect the diversity of national curricula, how can we further explore the actual relevance of PISA for each country? • Utilise curriculum ‘experts’ to describe the extent of the test’s relevance to each curriculum, to inform comparisons. • Worthwhile if systematic, but can be subjective and contestable. • Tends to describe the relevance of the items as designed rather than as they perform in practice. • If you could link national (locally relevant) to international assessments, you could then sensibly use the information to compare countries in terms of those components which might be comparable.

  5. Another perspective Suppose we took each PISA test item, or group of items, and deliberately redesigned it to be relevant to a given target country, for example by using a context familiar in that country, possibly changing its relative difficulty and also its relationships with other items (discrimination) or variables. Remember also that there are typically several dimensions underlying each PISA (and national assessment) scale. We could in principle do this experimentally for, say, a test of reading comprehension in PISA. While an interesting experiment, it seems unlikely to happen, not least because of the effort and expense involved. Nevertheless, it would throw interesting light on the inherent variability of the item responses, which may be imposing apparently ‘random’ fluctuations over time in country scores. Note that this is not the same as taking PISA items to use within a national assessment framework.

  6. Linking methods An alternative to studying items is to utilise existing national assessments by linking them to international studies. I look at different models for such linkage and their relative strengths and benefits. Simplest and most efficient is linking via students: a suitably large subset of students responding to both PISA and the national assessment. ISSUES: timing differences between testings; numbers of schools and students. ADVANTAGES: the individual relationships between items and scales can be studied in relation to difficulty and associations with other variables measured at individual and school level; PISA results can be related to assessments made at the same, prior and later ages.
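To make the student-level linkage concrete, here is a minimal sketch in Python, assuming anonymised student IDs shared between the two datasets; all IDs, field names and values are illustrative, not real PISA or NA data.

```python
# Hypothetical student-level records keyed by an anonymised ID.
pisa = {
    101: {"school": "A", "pisa_read": 480},
    102: {"school": "A", "pisa_read": 510},
    103: {"school": "B", "pisa_read": 530},
}
na = {
    101: {"na_read": 55, "gender": "F"},
    102: {"na_read": 61, "gender": "M"},
    104: {"na_read": 58, "gender": "M"},
}

# Inner join: keep only the students tested in both studies.
# This is the linked subset on which item- and scale-level
# relationships can be studied.
linked = {
    sid: {**pisa[sid], **na[sid]}
    for sid in pisa.keys() & na.keys()
}
for sid in sorted(linked):
    print(sid, linked[sid])
```

An inner join makes the timing and coverage issues visible directly: any student missing from either dataset drops out of the linked subset.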

  7. Linking by Schools • We may not be able to link at student level for practical reasons, e.g. confidentiality, though these might be overcome. • If not, the next best option is to use some of the same schools in PISA as in the national assessments (NA) – straightforward if all schools are in the NA. • Relationships can then be measured at school level, but this is limited: overall difficulties can be compared and school rankings compared, but student-level relationships cannot be inferred. • What analyses can we use, given that linking is available?

  8. Example I • Standardised PISA reading score (y) related to National Assessment (NA) reading attainment (x) and gender (z) at similar times: • y = α1 + β1x + γ1z (country 1) • y = α2 + β2x + γ2z (country 2) • So if γ1 or γ2 is non-zero, then PISA is differentially applicable to boys and girls after adjusting for NA attainment. • Additionally, if γ1 ≠ γ2, this implies that PISA gender differences are not the same across countries after adjusting within each country – we could then explore detailed responses and corresponding relationships at subtest level, and examine items to see whether this can be explained in terms of item wording, contexts etc. • We can study other factors, separately or in combination.
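The comparison in Example I can be illustrated on simulated data: fit y = α + βx + γz separately for each country and compare the estimated gender coefficients. This is a hypothetical sketch using ordinary least squares on made-up data (true γ set to 0 for one country and 0.3 for the other), not PISA's actual estimation procedure.

```python
import random

def ols3(rows):
    """OLS fit of y = a + b*x + g*z from (x, z, y) rows, solving the
    3x3 normal equations by Gauss-Jordan elimination with pivoting."""
    A = [[0.0] * 3 for _ in range(3)]   # X'X
    b = [0.0] * 3                       # X'y
    for x, z, y in rows:
        f = (1.0, x, z)                 # design row: intercept, x, z
        for i in range(3):
            b[i] += f[i] * y
            for j in range(3):
                A[i][j] += f[i] * f[j]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(3):
            if r != c:
                m = A[r][c] / A[c][c]
                A[r] = [ar - m * ac for ar, ac in zip(A[r], A[c])]
                b[r] -= m * b[c]
    return [b[i] / A[i][i] for i in range(3)]   # (a, b, g)

rng = random.Random(1)

def simulate(gamma, n=2000):
    """Made-up data: x = NA attainment (standardised), z = gender
    indicator, y = PISA score with gender effect gamma."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        z = float(rng.random() < 0.5)
        y = 0.8 * x + gamma * z + rng.gauss(0, 0.5)
        rows.append((x, z, y))
    return rows

for label, gamma in [("country 1", 0.0), ("country 2", 0.3)]:
    a, bcoef, g = ols3(simulate(gamma))
    print(f"{label}: estimated gender coefficient = {g:.2f}")
```

A clearly non-zero estimate for one country but not the other is the pattern the slide describes: PISA differentially applicable to boys and girls after adjusting for NA attainment, and differently so across countries.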

  9. Example II • We can also see whether the relative amount of between-school or between-student variation differs between the NA and PISA. If, for example, between-school variation is less for PISA than for the NA, this implies that the more generic ‘skills’ measured by PISA show less diversity across the system than responses to the actual curriculum. • This can then be studied further to see whether it holds for subgroups (gender, ethnic group etc.). • These approaches provide a robust empirical approach to the comparability problem.
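As an illustration of Example II, one can compare the between-school share of variance (the intraclass correlation) for the two assessments. The sketch below simulates made-up scores with a larger school effect for the NA than for PISA; the decomposition is a crude method-of-moments one, not the multilevel-model estimate one would use in practice, and all parameter values are invented.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)

def simulate_scores(school_sd, student_sd, n_schools=100, n_per=30):
    """Scores built from a school random effect plus student noise."""
    data = []
    for _ in range(n_schools):
        effect = rng.gauss(0, school_sd)
        data.append([effect + rng.gauss(0, student_sd)
                     for _ in range(n_per)])
    return data

def between_school_share(data):
    """Variance of school means relative to total variance
    (ignores the small upward bias from within-school noise)."""
    school_means = [mean(school) for school in data]
    all_scores = [y for school in data for y in school]
    return pvariance(school_means) / pvariance(all_scores)

# Hypothetical scenario: the NA tracks the taught curriculum, so
# schools differ more; PISA's generic 'skills' vary less between schools.
na = simulate_scores(school_sd=1.0, student_sd=1.0)
pisa = simulate_scores(school_sd=0.5, student_sd=1.0)
print(f"NA between-school share:   {between_school_share(na):.2f}")
print(f"PISA between-school share: {between_school_share(pisa):.2f}")
```

A lower share for PISA than for the NA, computed on real linked data, would be the empirical signature of the diversity argument on this slide.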

  10. Some implications for PISA, TIMSS, etc. • Full openness of the item/test pool to researchers and the public, so that proper comparisons can be made. This would be welcome anyway, since it is simply good practice. • Planned coordination of national with international assessments. • Resources to be made available for such analyses. • If PISA itself were longitudinal this would be even more interesting, but it does require extra resources.
