Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale*

Julie J. Dubeau

Canadian Defence Academy

BILC CONFERENCE

SAN ANTONIO, TEXAS, May 20-24 2007

*This research was conducted as an MA Thesis, Carleton University, September 06

Presentation Outline
  • Context
  • Research Questions
  • Literature Review
  • Methodology
  • Results
    • Ratings
    • Raters
    • Scale use
  • Conclusion
NATO Language Testing Context
  • Standardized Language Profile (SLP) based on the NATO STANDARDIZATION AGREEMENT (NATO STANAG) 6001 Language Proficiency Levels
    • 26 NATO countries, 20 Partnership for Peace (PfP) countries
    • Interoperability is essential
Research Questions
  • The overarching research question was:

How comparable or consistent are ratings across NATO raters and countries?

Research Questions
  • Research questions pertaining to the ratings (RQ1)
  • Research questions pertaining to raters’ training and background (RQ2)
  • Research questions pertaining to the rating process and to the scale (RQ3)
Literature Review
  • Testing Constructs
    • What are we testing?
  • Rater Variance
    • How do raters vary?
Methodology
  • Design of study: Exploratory survey
  • Participants: Recruited at the BILC 2005 conference in Sofia
    • 103 raters from 18 countries and 2 NATO units
    • Control group
Methodology
  • Instrumentation, Procedure & Analysis
    • Rater data questionnaire
    • 2 Oral Proficiency Interviews (OPIs) A & B
    • Questionnaire accompanying each sample OPI
Methodology
  • Analysis
    • Rating comparisons (see the sketch below)
        • Original ratings
        • ‘Plus’ ratings
    • Rater comparisons
        • Training
        • Background
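The slides do not spell out how the ‘plus’ ratings were quantified for the comparisons. A minimal sketch of one plausible approach is shown below, assuming a hypothetical +0.5 adjustment for a ‘plus’ and invented ratings; it is for illustration only, not the scoring rule or data from the thesis.

# Hypothetical sketch: turn STANAG ratings such as "2" or "2+" into numeric
# "adjusted" scores so that original and 'plus' ratings can be compared
# against the group mean. The +0.5 mapping and the ratings themselves are
# assumptions for illustration.

def adjust(rating: str) -> float:
    """Map a rating like '2' to 2.0 and '2+' to 2.5."""
    if rating.endswith("+"):
        return float(rating[:-1]) + 0.5
    return float(rating)

ratings = ["2", "2+", "2", "3", "2+", "1+", "2"]   # invented ratings for one OPI sample
scores = [adjust(r) for r in ratings]
mean_score = sum(scores) / len(scores)

for rater, score in enumerate(scores, start=1):
    print(f"Rater {rater}: adjusted score {score:.1f}, "
          f"deviation from mean {score - mean_score:+.2f}")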
Methodology
  • Country to country comparisons
    • Within-country dispersion (see the sketch below)
  • Rating process
    • Rating factors
  • Rater/scale interaction
    • Scale user-friendliness
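The within-country dispersion comparison can be illustrated with a minimal sketch that groups ratings by country and reports each group’s mean and spread. The country labels and ratings below are invented.

# Sketch of a within-country dispersion check: group adjusted ratings by
# country and report each country's mean and standard deviation.
from statistics import mean, pstdev

ratings_by_country = {
    "Country 1": [2.0, 2.5, 2.0, 2.0],
    "Country 2": [1.5, 2.5, 3.0, 2.0],
    "Country 3": [2.0, 2.0, 2.5],
}

for country, scores in ratings_by_country.items():
    print(f"{country}: mean {mean(scores):.2f}, dispersion (SD) {pstdev(scores):.2f}")

pstdev gives the population standard deviation; statistics.stdev could be used instead for a sample estimate.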
Results RQ1 - Summary
  • Ratings: To compare OPI ratings across NATO countries and to explore the efficacy of ‘plus levels’ (plus ratings).
    • Some rater-to-rater differences
    • ‘Plus’ levels brought ratings closer to the mean
    • Some country-to-country differences
    • Greater ‘within-country’ dispersion
    • Low correlation between samples A & B
View of OPI Ratings, Sample A

[Stacked bar chart (adjusted scores with ‘pluses’): counts of Sample A ratings falling within the level 1, level 2, and level 3 ranges; the values shown are 60, 32, 10, and 1.]

All Countries’ Means for Sample A

[Chart: overall country mean for Sample A (scale 1.00 to 2.40), plotted for each of country numbers 1 through 20.]

Samples A & B
  • A Spearman rank-order correlation coefficient of ρ = .57
  • A Pearson product-moment correlation coefficient of r = .55

= low statistical correlation between the two sets of ratings (Samples A & B)

= little consistency from raters across the two samples (see the sketch below)
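A minimal sketch of how such coefficients can be computed, assuming one numeric score per rater for each sample; the values below are placeholders, not the study’s data.

# Sketch: rank-order (Spearman) and product-moment (Pearson) correlations
# between raters' scores on Samples A and B. Placeholder data only.
from scipy.stats import pearsonr, spearmanr

sample_a = [2.0, 2.5, 2.0, 3.0, 1.5, 2.5, 2.0, 2.5]
sample_b = [2.5, 2.0, 2.0, 2.5, 2.0, 3.0, 1.5, 2.0]

rho, _ = spearmanr(sample_a, sample_b)
r, _ = pearsonr(sample_a, sample_b)
print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}")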

Results RQ2 - Summary
  • Raters: To investigate rater training and scale training and see how (or if) they impacted the ratings, and to explore how various background characteristics affected them
    • Trained raters scored within the mean, especially for Sample B
    • Experienced raters did not do as well as scale-trained raters
    • Full-time raters were closer to the mean
    • ‘New’ NATO raters were closer to the mean
    • No difference in ratings between native-speaker (NS) and non-native-speaker (NNS) raters
Tester (Rater) Training

[Bar chart: amount of tester (rater) training reported; ‘substantial to lots’ 63.27%, ‘none to little’ 36.73%.]

Rating B and Tester Training Crosstabulation

                     Summary of Tester Trg
  Score B correct?   Little   Lots   Total
  Yes                    14     44      58
  No                     20     14      34
  Missing                 2      4       6
  Total                  36     62      98
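A crosstabulation like the one above could be reproduced from per-rater records. A minimal sketch with pandas is shown below; the records are invented, and only the table layout mirrors the slide.

# Sketch: building a "Rating B correct? x tester training" crosstab with
# pandas. Invented per-rater records for illustration.
import pandas as pd

raters = pd.DataFrame({
    "tester_training": ["Little", "Lots", "Lots", "Little", "Lots", "Little"],
    "rating_b_correct": ["Yes", "Yes", "No", "No", "Yes", "Missing"],
})

table = pd.crosstab(raters["rating_b_correct"], raters["tester_training"],
                    margins=True, margins_name="Total")
print(table)

Passing margins=True adds the row and column totals shown in the slide’s crosstabs.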

STANAG Scale Training

[Bar chart: amount of STANAG scale training reported; ‘none to little’ 60.0%, ‘substantial to lots’ 40.0%.]

Rating B and STANAG Training Crosstabulation

                     Summary of STANAG Trg
  Rating B correct?  Little   Lots   Total
  Yes                    28     29      57
  No                     24      8      32
  Missing                 5      1       6
  Total                  57     38      95

Years Experience

[Bar chart: raters’ years of testing experience in four bands (0 to 1 year, 2 to 3 years, 4 to 5 years, 5 years +); the bars are labelled 49.5%, 19.8%, 15.84%, and 14.85%.]

Rating B and 4 Yrs Experience Crosstabulation

                     Experience
  Rating B correct?  3 yrs or less   4 yrs or more   Total
  Yes                           26              34      60
  No                             6              29      35
  Missing                        3               3       6
  Total                         35              66     101

Results: Raters’ Background
  • Work in testing full-time?
      • Yes: 34 (33.0%)
      • No: 67 (65.0%)
      • Full-time testers were more reliable
    • 60% of raters were non-native speakers (NNS)
    • 53% were from ‘older’ NATO countries
‘Old’ & ‘New’ NATO Countries

               Rating B Correct?
  New NATO?    Yes   No   Other/Missing   Total
  Yes           27    6               4      37
  No            27   26               2      55
  Total         54   32               6      92

‘Old’ & ‘New’ NATO Countries

              Summary of Tester Trg
  New NATO?   Little   Lots   Total
  Yes              6     30      36
  No              23     28      51
  Total           29     58      87

Results RQ3 - Summary
  • Scale: To explore the ways in which raters used the various STANAG statements and rating factors to arrive at their ratings.
    • Rating process did not affect ratings significantly
    • Rating factors not equal everywhere
    • 3 main ‘types’ of raters emerged:
      • Evidence-based
      • Intuitive
      • Extra-contextual
Results
  • An ‘evidence-based’ rating for Sample B (level 2):

This candidate’s performance cannot be rated as 2+. Grammatical/structural control is inadequate and does not rise above (even occasionally) into the upper level. Mispronunciation detracts from the delivery and can be problematic. No evidence of well-controlled but extended discourse. No clear evidence of the use of even some complex structures that might raise the performance to the + level. Finally, there is no evidence that the performance rises and crosses into level 3. (Rater 36)

Results
  • An ‘intuitive’ rating for Sample A (level 1):

I would say that just about every single sentence in the interpretation of the level 2 speaking could be applied to this man. And because of that I would say that he is literally at the top of level 2. He is on the verge of level 3 literally. So I would automatically up him to a low 3. (Rater 1)

Results
  • An ‘extra-contextual’ rating for Sample A (level 1):

I wouldn’t give him a 2 plus but I would give him a 3 minus. I have to admit that I am basing that decision on the fact that by demonstrating he is a high 2 in every single aspect of the description of a level 2, I would give him a sort of vote of confidence that in any job abroad he might have a hard time at first but I think he could handle really working in the language. (Rater 1)

Results
  • An ‘extra-contextual’ rating for Sample A (level 1):

Yes! I would be happy to give him a 1+. Since we do not use ‘plus levels’ I am afraid that rating him as a clear 1 would disadvantage him and, for this reason, I would rather give him a very low 2. (Rater 20)

Results
  • An ‘extra-contextual’ rating for Sample A (level 1):

I got to question 7 and re-read the STANAG document and now I think ‘2’ is more appropriate.(Rater 95)

***

Level 3 is the basic level needed for officers in (my country). I think the candidate could perform the tasks required of him. He could easily be bulldozed by native speakers in a meeting, but would hold his own with non-native speakers. He makes mistakes that very rarely distort meaning and are rarely disturbing. (Rater 95)

Results
  • Control group:
    • Ratings comparable to those of the less-trained group of participants
    • Evidence-based ratings
Implications
  • Plus levels beneficial
  • Training uneven
  • Frequent re-training
  • Different grids
  • Institutional perspectives
Limitations & Future Research
  • OPIs new to some participants
  • Future research could:
    • Get participants to test
    • Investigate rating grids
    • Look at other skills
Conclusion

So, are we all on the same page?

YES! BUT…

  • Plus levels were instrumental in bridging the gap
  • Training was found to be key to reliability
  • More in-country norming should be the first step toward international benchmarking

Thank You! Questions?

Are We All On the Same Page? An Exploratory Study of OPI Ratings Across NATO Countries Using the NATO STANAG 6001 Scale

Julie J. Dubeau

Dubeau.JJ@forces.gc.ca

The full thesis is available on the CDA website

http://cda.mil.ca/dpd/engraph/services/lang/lang_e.asp

(A condensed article is also forthcoming)