NATO BAT Testing: The First 200

NATO BAT Testing: The First 200. BILC Professional Seminar 6 October, 2009 Copenhagen, Denmark Dr. Elvira Swender, ACTFL. This Report. History of Benchmark Advisory Tests (BAT) 2009 Administration of BAT in 4-Skills BAT Scores Comparing National Scores to Benchmark Scores Observations.


Presentation Transcript


  1. NATO BAT Testing: The First 200 BILC Professional Seminar 6 October, 2009 Copenhagen, Denmark Dr. Elvira Swender, ACTFL

  2. This Report • History of Benchmark Advisory Tests (BAT) • 2009 Administration of BAT in 4-Skills • BAT Scores • Comparing National Scores to Benchmark Scores • Observations

  3. This Report • History of Benchmark Advisory Tests (BAT) • 2009 Administration of BAT • Combined BAT Scores • Comparing National Scores to Benchmark Scores • Observations

  4. Why Benchmark Testing? • To provide an external measure against which nations can compare their national STANAG test results • To promote relative parity of scale interpretation and application across national testing programs • To standardize what is tested and how it is tested

  5. BAT History • Launched as a volunteer, collaborative project • The BILC Test Working Group • 13 members from 8 nations • Contributions received from many other nations • The original goal was to develop a Reading test • Later awarded a competitive contract by ACT • December, 2006

  6. BAT History (cont’d) • ACTFL working with BILC Working Group • To develop tests in 4 skill modalities. • Reading and Listening tests piloted and validated • Speaking and Writing tests developed • Testers and raters trained and certified • Test administration and reporting protocols developed • 200 BAT 4-skills tests allocated under the contract • Tests administered and rated • Scores reported to Nations

  7. BAT Reading and Listening Tests • Internet-delivered and computer scored • Criterion-referenced tests • Allow for direct application of the STANAG Proficiency Scale • Each proficiency level is tested separately • Test takers take all items for Levels 1, 2, 3 • 20 texts at each level; one item with multiple choice responses per text • The proficiency rating is assigned based on two separate scores • “Floor” – sustained ability across a range of tasks and contexts specific to one level • “Ceiling” – non-sustained ability at the next higher proficiency level
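
The floor-and-ceiling idea on this slide can be illustrated with a minimal sketch. The slide gives the 20 items per level but not the cut score, so `SUSTAINED_CUT` below is a hypothetical value, and the function name `floor_and_ceiling` is illustrative only. The sketch reports the ceiling as a raw fraction and does not guess how the actual test converts the two scores into a final rating.

```python
from typing import Dict, Optional, Tuple

ITEMS_PER_LEVEL = 20   # from the slide: 20 texts per level, one item per text
SUSTAINED_CUT = 0.70   # hypothetical cut score; the slides do not state one


def floor_and_ceiling(correct: Dict[int, int]) -> Tuple[int, Optional[float]]:
    """Return (floor level, fraction correct at the next higher level).

    `correct` maps each tested level (1, 2, 3) to the number of items
    answered correctly at that level. The floor is the highest level at
    which performance is sustained, with every lower level sustained too.
    """
    floor = 0
    for level in (1, 2, 3):
        if correct[level] / ITEMS_PER_LEVEL >= SUSTAINED_CUT:
            floor = level
        else:
            break  # a level that is not sustained caps the floor
    next_level = floor + 1
    # The ceiling score is only defined if a higher level was tested.
    ceiling = correct[next_level] / ITEMS_PER_LEVEL if next_level in correct else None
    return floor, ceiling


print(floor_and_ceiling({1: 18, 2: 15, 3: 8}))
```

With the assumed cut score, a test taker who sustains Levels 1 and 2 but answers 8 of 20 Level 3 items correctly has a floor of 2 and a ceiling score of 0.4 at Level 3.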

  8. BAT Speaking Test • Telephonic Oral Proficiency Interview • Goal is to produce a speech sample that best demonstrates the speaker’s highest level of spoken language ability across the tasks and contexts for the level • Interview consists of • Standardized structure of “level checks” and “probes” • NATO-specific role-play situation • Conducted and rated by one certified BAT-S Tester • Independently second-rated by a separate certified tester or rater • Ratings must agree exactly • Level and plus level scores are assigned • Discrepancies are arbitrated

  9. BAT Writing Test • Internet-delivered • Open constructed response • Four multi-level prompts • Prompts target tasks and contexts of STANAG levels 1, 2, 3 • NATO-specific prompt • Rated by a minimum of two certified BAT-W Raters • Ratings must agree exactly • Level and plus level scores are assigned • Discrepancies are arbitrated
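
The double-rating rule shared by the Speaking and Writing tests (two independent ratings that must agree exactly, with discrepancies arbitrated) could be sketched as follows. The slides do not describe how arbitration is carried out, so the `arbitrate` callable is a hypothetical stand-in, and `final_rating` is an illustrative name rather than part of any BAT system.

```python
from typing import Callable


def final_rating(first: str, second: str,
                 arbitrate: Callable[[str, str], str]) -> str:
    """Return the agreed rating, or the arbitrated one on any discrepancy.

    Ratings are strings such as "1", "1+", "2", "2+", "3". Agreement must
    be exact: "2" and "2+" count as a discrepancy.
    """
    if first == second:
        return first
    # Hypothetical arbitration step; the real procedure is not specified.
    return arbitrate(first, second)


# Stub arbitrator for illustration only: it always decides on "2".
print(final_rating("2", "2", lambda a, b: "2"))
print(final_rating("2", "2+", lambda a, b: "2"))
```

Passing the arbitration step in as a function keeps the exact-agreement check separate from whatever arbitration procedure is actually used.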

  10. This Report • History of Benchmark Advisory Tests (BAT) • 2009 Administration of BAT battery • Combined BAT Scores • Comparing National Scores to Benchmark Scores • Observations

  11. 2009 BAT Administration • Allocation to 11 Nations • 8 Nations have completed testing • Testing began in May, 2009 • Tests administered by LTI, the ACTFL Testing Office

  12. 2009 BAT Administration • Each Nation has a customized client site • Request tests • View and print test schedules • Obtain test administration instructions, passwords, and test codes • Retrieve Ratings

  14. This Report • History of Benchmark Advisory Tests (BAT) • 2009 Administration of BAT • Combined BAT Scores • Comparing National Scores to Benchmark Scores • Observations

  15. Total Number of BAT Scores

  16. BAT Scores by Level Cumulative

  17. This Report • History of Benchmark Advisory Tests (BAT) • 2009 Administration of BAT • Combined BAT Scores • Comparing National Scores to Benchmark Scores • Observations

  18. Alignment of National Scores and BAT Scores (percent agreement, with number of scores in parentheses)

      Nation   Listening   Speaking   Reading    Writing
      Black    40% (5)     29% (7)    –          –
      White    64% (11)    56% (18)   92% (13)   39% (18)
      Red      89% (18)    83% (18)   83% (18)   50% (18)
      Blue     85% (20)    47% (19)   55% (20)   60% (20)
      Maroon   69% (16)    47% (15)   64% (14)   50% (18)
      Purple   8% (12)     –          54% (13)   –
      Yellow   24% (17)    0% (18)    33% (18)   0% (18)

  19. This Report • History of Benchmark Advisory Tests (BAT) • 2009 Administration of BAT • Combined BAT Scores • Comparing National Scores to Benchmark Scores • Observations

  20. Observations – Listening Scores • Exact agreement of BAT and National Scores is 58% • 69 of the 119 Listening scores agree exactly • When the scores disagree, the National score is HIGHER 88% of the time • In 8 cases (7%), the disagreement is across two levels (1 vs 3 and 2 vs 4)

  21. Observations – Speaking Scores • Exact agreement of BAT and National Scores is 46% • 53 of 115 Speaking scores agree exactly • When the scores disagree, the National score is HIGHER in all cases • In 6 cases (6%), the disagreement is across two levels (1 vs 3 and 2 vs 4)

  22. Observations – Reading Scores • Exact agreement of BAT and National Scores is 62% • 74 of 119 Reading scores agree exactly • When the scores disagree, the National score is HIGHER in 85% of the cases • In 2 cases, the disagreement is across two levels (1 vs 3)

  23. Observations – Writing Scores • Exact agreement of BAT and National Scores is 38% • 44 of 115 Writing scores agree exactly • When there is disagreement, the National score is HIGHER in all cases • In 15 cases, the disagreement is across two levels (1 vs 3 and 2 vs 4)
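
The exact-agreement percentages on the four Observations slides follow directly from the reported counts. A short sketch of the arithmetic, using only the figures on the slides (the variable and function names are illustrative):

```python
# (scores in exact agreement, total scores compared), as reported on the slides
agreements = {
    "Listening": (69, 119),
    "Speaking": (53, 115),
    "Reading": (74, 119),
    "Writing": (44, 115),
}


def exact_agreement_pct(agree: int, total: int) -> int:
    """Percent of paired scores that agree exactly, rounded to a whole number."""
    return round(100 * agree / total)


rates = {skill: exact_agreement_pct(a, n) for skill, (a, n) in agreements.items()}
for skill, pct in rates.items():
    print(f"{skill}: {pct}%")
```

The computed values reproduce the slide figures: 58% for Listening, 46% for Speaking, 62% for Reading, and 38% for Writing.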

  24. Accounting for Strictness or Leniency • Testing rehearsed rather than unrehearsed material • Performance vs proficiency • Inconsistencies in interpretation of the STANAG • When “plus” ratings are not used, the tendency to award the next higher level rating to a performance that is substantially better than a baseline performance

  25. For Receptive Skills • Compensatory cut score setting • Lack of alignment of author purpose, text type, and reader task at level • Inadequate item response alternatives

  26. For Productive Skills • Misalignment of test type and test purpose • Ex: list of discrete questions when goal is to measure spoken language proficiency • Inadequate tester/rater norming

  27. Plus Ratings • Within the Level 1 Range • 60% of ratings are 1 • 40% of ratings are 1+ • Within the Level 2 Range • 50% of ratings are 2 • 50% of ratings are 2+

  28. Profiles • Only 12 of 115 profiles (10%) were “flat” • 1 1 1 1 (8) • 2 2 2 2 (2) • 3 3 3 3 (2) • All remaining profiles are mixed
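
A "flat" profile here means the same rating in all four skills. A minimal sketch of that check, together with the share of flat profiles from the counts on the slide (the function name `is_flat` is illustrative, and the sample mixed profile is made up for the demonstration):

```python
from typing import Sequence


def is_flat(profile: Sequence[str]) -> bool:
    """True when every skill rating in the profile is identical."""
    return len(set(profile)) == 1


# Flat-profile counts reported on the slide, out of 115 profiles in total.
flat_counts = {"1 1 1 1": 8, "2 2 2 2": 2, "3 3 3 3": 2}
total_profiles = 115

flat_total = sum(flat_counts.values())
flat_pct = round(100 * flat_total / total_profiles)
print(flat_total, flat_pct)

print(is_flat(["2", "2", "2", "2"]))    # flat profile
print(is_flat(["2", "2+", "1+", "2"]))  # made-up mixed profile
```

The counts sum to 12 flat profiles, which is 10% of 115 after rounding, matching the slide.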

  29. We are all wondering. What will the future bring?

  30. Let’s hope it’s not the same kind of anxiety these early linguists experienced.

  31. Questions?

  32. Extra Slides

  33. Side-by-side BAT and National Test Scores

  34. BAT Scores by Level Reading

  35. BAT Scores by Level Listening

  36. BAT Scores by Level Speaking

  37. BAT Scores by Level Writing

  38. Comparing Scores by Level Reading

  39. Comparing Scores by Level Listening

  40. Comparing Scores by Level Speaking

  41. Comparing Scores by Level Writing

  42. Alignment of National Scores and BAT Scores
