Using the IRT and Many-Facet Rasch Analysis for Test Improvement
Desislava Dimitrova, Dimitar Atanasov
New Bulgarian University
“ALIGNING TRAINING AND TESTING IN SUPPORT OF INTEROPERABILITY”
BILC Seminar, 10-15 October 2010, Varna
Outline
• Examination procedure
• Main concepts and observations
• Socio-cognitive framework for test validation (Cyril Weir, 2005) and criteria
• Scoring validity for the listening and reading parts of the test
• Scoring validity for the essay
Test structure
1. Listening paper: two tasks
• 15 MCQ
2. Reading paper: five tasks
• 6 items, matching response format
• 10 items, bank-cloze response format
• 10 items, open-cloze response format
• 16 items, short-answer response format
• 2 open-ended questions
• 5 MCQ
3. Essay: 180-220 words
Too much?
• The concept of communicative language ability (CEFR)
• The concept of test usefulness (Bachman)
• The concept of justifying the use of language assessments in the real world (Bachman)
• The concept of validity
• A code of practice (ALTE*, for example)
*Association of Language Testers in Europe
Statements
The NBU exam is high-stakes.
The NBU exam is criterion-oriented.
The NBU exam is ‘independent’.
Evidence for test validation had not been established, BUT there was a routine practice for test development and test administration.
The Socio-cognitive Framework for Test Validation, Cyril Weir (2005)
Test-taker characteristics and:
• Context validity
• Theory-based validity
• Scoring validity
• Consequential validity
• Criterion-related validity
“Before-the-test event”
• Context validity
• Theory-based validity
“After-the-test event”
• Scoring validity
• Consequential validity
• Criterion-related validity
Scoring validity for the listening and reading parts of the test is established by:
• Item analysis
• Internal consistency
• Error of measurement
• Marker reliability
Not just looking at them! Investigate, discuss, learn and make decisions!
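The first three checks listed above can be sketched with classical test statistics. This is a minimal illustration, not the analysis used for the NBU exam: the response matrix is hypothetical, and the choice of Cronbach's alpha for internal consistency and of SEM = SD × sqrt(1 − alpha) for the error of measurement is an assumption.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Hypothetical scored responses: rows are test takers, columns are items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

def item_difficulty(data):
    """Item analysis: proportion of test takers answering each item correctly."""
    return [mean(col) for col in zip(*data)]

def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

def item_discrimination(data):
    """Item analysis: correlation of each item with the total of the other items."""
    result = []
    for j, col in enumerate(zip(*data)):
        rest = [sum(row) - row[j] for row in data]
        result.append(pearson(col, rest))
    return result

def cronbach_alpha(data):
    """Internal consistency (Cronbach's alpha) from item and total-score variances."""
    k = len(data[0])
    sum_item_var = sum(pvariance(col) for col in zip(*data))
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum_item_var / total_var)

def standard_error_of_measurement(data):
    """Error of measurement: SD of total scores times sqrt(1 - reliability)."""
    totals = [sum(row) for row in data]
    return pstdev(totals) * sqrt(1 - cronbach_alpha(data))

print("difficulty:", item_difficulty(responses))
print("discrimination:", item_discrimination(responses))
print("alpha:", round(cronbach_alpha(responses), 3))
print("SEM:", round(standard_error_of_measurement(responses), 3))
```

With real data, items with very high or very low difficulty values, or with low or negative discrimination, are the ones to investigate and discuss first.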
Analysis
3-parameter IRT model
Advantages
• Item parameter estimates are independent of the group of examinees used
• Test taker ability estimates are independent of the particular set of items used
Degree of difficulty
To specify the discrimination
To specify the content
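For reference, the item response function of the 3-parameter logistic model can be sketched as follows. The item parameters are hypothetical, and the function omits the optional scaling constant D = 1.7 that some software applies; this is an illustration of the model's shape, not the estimation procedure used for the exam.

```python
from math import exp

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct answer under the 3-parameter logistic model.

    theta: test taker ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing parameter)
    """
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))

# Hypothetical item: moderately discriminating, slightly difficult,
# with a guessing parameter of 0.2.
a, b, c = 1.2, 0.5, 0.2

for theta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct_3pl(theta, a, b, c):.3f}")
```

At theta = b the probability is halfway between c and 1, and for very low abilities it approaches c rather than 0, which is what distinguishes the 3-parameter model from the 1- and 2-parameter models.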
Possible decisions
• Remedial procedures
• Classroom assessment
• Only certification decision
Scoring validity for writing is established by:
• Criteria/rating scale
• Rating procedures: rater training, standardization, rating conditions, rating, moderation, statistical analysis
• Raters
• Grading
Conclusion for the essay:
Good
• Two raters
• Analytic writing scale
• Rubrics and input
Negative
• The score depends on the raters
• No task-specific scale
• No standardization
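A first look at how far the score depends on the raters can be taken before any many-facet Rasch analysis. The sketch below is a simple descriptive check under stated assumptions: the two score lists are hypothetical, the scale is invented for illustration, and the statistics shown (mean difference, exact and adjacent agreement, correlation) are common choices rather than the procedure used at NBU.

```python
from math import sqrt
from statistics import mean

# Hypothetical essay scores from two raters for the same six scripts.
rater_a = [12, 15, 9, 18, 14, 11]
rater_b = [10, 14, 9, 16, 13, 8]

def mean_difference(x, y):
    """Average of (x - y); a positive value suggests the second rater is more severe."""
    return mean(a - b for a, b in zip(x, y))

def agreement_rate(x, y, tolerance=0):
    """Share of scripts where the two scores differ by at most `tolerance` points."""
    hits = sum(1 for a, b in zip(x, y) if abs(a - b) <= tolerance)
    return hits / len(x)

def pearson(x, y):
    """Pearson correlation between the two raters' scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print("mean difference (A - B):", mean_difference(rater_a, rater_b))
print("exact agreement:", round(agreement_rate(rater_a, rater_b), 2))
print("agreement within 1 point:", agreement_rate(rater_a, rater_b, tolerance=1))
print("correlation:", round(pearson(rater_a, rater_b), 3))
```

A high correlation combined with a non-zero mean difference would indicate that the raters rank the scripts similarly but differ in severity, which is the kind of rater effect a many-facet Rasch model is designed to estimate.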
It is now a fact that we will continue our work on:
• item writers’ training
• content and statistical specification of the items
• test review and test revision
Sharing:
• Investigation (small steps to “strong” validity)
• Comparison (language ability of the same population at the same level)
• Cooperation (in research projects)
Thank you
New Bulgarian University
www.nbu.bg