Chapter 5

Chapter 5 Suggestions for Using Information-Exchange Tasks for Oral Testing

In this chapter we explore: • Four general criteria for designing language tests that can be applied to the design of oral tests • Washback effects • Suggestions for developing oral tests from information-exchange tasks • Evaluation criteria for oral tests

Four criteria for designing a good test • Carroll (1980) identifies four general criteria in foreign language testing: • Economy • Relevance • Acceptability • Comparability

Economy • By economy Carroll means obtaining the greatest amount of information about the learner’s language in as little time as possible and with a minimum of energy expended. • For a test to be economical, it should merely sample the material covered, not exhaust it. • An instructor can select from among the many items covered and infer or project something about the learner’s overall knowledge or ability.

Relevance • Relevance refers to the match between the course and curriculum goals and the tests. • For example, if you teach a course in conversational use of Italian, you would not want to give a formal composition as the final exam. • For a test to be relevant, it should reflect not simply what is taught but, more importantly, how it is taught.

Acceptability • Acceptability is a concept that takes the learners’ point of view into consideration. • It implies learners’ willingness to participate in the testing and their satisfaction that the test evaluates their progress. • For many learners, acceptability is tied to familiarity. If they are not familiar with a testing format or procedure, they may view it as unacceptable.

Comparability • Comparability is a concept that takes the institution’s point of view into consideration. • Test scores for learners who are taught the same material by the same method should be similar. • For example, those enrolled in the 9am section of Portuguese 102 should have test scores similar to the scores of learners enrolled in the 2pm section if the two sections have common goals, materials, syllabi, and methods.

Washback effects Krashen and Terrell (1983) made a statement that addresses the acceptability of a test. “[Testing] can be done in a way that will have a positive effect on the student’s progress. The key to effective testing is the realization that testing has a profound effect on what goes on in the classroom…”

Krashen and Terrell (1983) continued… “…Teachers are motivated to teach and students are motivated to study materials which will be covered on tests. Quite simply, if we want students to acquire a second language, we should give tests that promote the use of acquisition activities [in and out of the classroom]. In other words, our tests should motivate students to prepare for the tests by obtaining more comprehensible input and motivate teachers to supply it.” (Krashen & Terrell, 1983, p.165)

Washback effect • What and how you test has ramifications for what instructors do in the classroom, what learners expect instructors to do in the classroom, and what learners do outside the classroom. • Testing cannot be viewed as an isolated event; it must be an integral part of the teaching and learning enterprise.

The relevance of a test “Using an approach in the classroom which emphasizes the ability to exchange messages, and at the same time testing only the ability to apply grammar rules correctly, is an invitation to disaster.” (Krashen & Terrell, 1983, p.165)

Oral testing in classrooms: Adapting information-exchange tasks for use as oral tests and quizzes • Lee and VanPatten define “communicative burden as the responsibility of an individual test taker to initiate, respond, manage, and negotiate an oral event.” • The communicative burden of a group discussion is less than the communicative burden of an oral interview. • In a discussion, multiple participants share the communicative burden, each one assuming the responsibilities of initiating, responding, managing, and negotiating the event.

Communicative burden • The communicative burden of a test format becomes an issue when the teacher is considering whether to give an oral quiz or test. • One might decide that an oral quiz at the end of a lesson in the first semester should have a low communicative burden, whereas a quiz at the end of a lesson in the fourth semester should have a greater one. • There are a number of instructional decisions to make regarding oral testing, and these decisions depend on a variety of pedagogical and practical factors.

Washback effect • These decisions may well have a washback effect on instruction. • By knowing and being familiar with the characteristics of the test, instructors may incorporate activities into the classroom that they feel will lead to success on the test. • The type of test can influence both what instructors emphasize and the way in which they emphasize it.

Content of the oral quiz • The content of the oral quiz or test can have another kind of washback effect on instruction. • If the content of the oral test is overtly tied to classroom activities, the learners are provided a stronger motivation for participating in the activities. • Testing and teaching should be interrelated so that learners are responsible for what happens in class.

Demonstration • To demonstrate how teaching and testing can be interrelated, Lee and VanPatten convert four of the information-exchange tasks presented in Chapter 3 into test sections. One of these examples is illustrated here. • Recall the following activity from Chapter 3.

Compare your birthday experiences Step 1: Fill in the chart as you interview a classmate. Step 2: Now write a paragraph in which you compare and contrast your birthdays.

Test section on this activity • Phase 1: Warm up. Make the test taker feel comfortable. • Phase 2: Initial questioning. Who was your partner? When is that person’s birthday? When is your birthday? • Phase 3: Activity-related questions. Referring to the chart you made in class, tell me whether you and your partner have celebrated your birthdays in similar or different ways.

Two tests for evaluating spoken language • The first oral proficiency test is the Oral Proficiency Interview (OPI), which was developed by the American Council on the Teaching of Foreign Languages (ACTFL) in conjunction with the Educational Testing Service and several government agencies. • The other test is the Israeli National Oral Proficiency Test, developed by Elana Shohamy and her colleagues.

The Oral Proficiency Interview (OPI) • The ACTFL Oral Proficiency Interview has been likened to a face-to-face conversation because an “interviewer” converses with an interviewee. • The goal of the OPI is to obtain a sample of speech that can be rated using the ACTFL Proficiency Guidelines as the measure.

Guidelines • These guidelines comprise level-by-level (from Novice to Superior) descriptions of learner performance: • The content that a learner at a particular level might have mastered (e.g., simple greetings, health matters, family) • The functions the learner has mastered (e.g., narrating in the past, present, and future) • The accuracy present in the learner’s speech

Phases • The procedures used to elicit learner language during the OPI are termed phases. • Omaggio Hadley (1993, pp.456-58) describes each phase as follows:

Phase 1: Warm up • The warm-up portion of the interview is very brief and consists of greeting the interviewee, making him or her feel comfortable, and exchanging the social amenities that are normally used in everyday conversations. • Typically, the warm-up lasts less than three minutes.

Phase 2: Level check • This phase consists of establishing the highest level of proficiency at which the interviewee can sustain speaking performance. • This phase of the interview allows the person being tested to demonstrate his or her strengths. • It is designed to elicit a speech sample that is adequate to prove that the person can function accurately at the level hypothesized by the interviewer during the warm-up phase. • It also allows the interviewer to get a better idea of the actual proficiency level of the interviewee.

Phase 3: Probes • Probes are questions or tasks designed to elicit a language sample at one level of proficiency higher than the hypothesized level in order to establish a ceiling on the interviewee’s performance. • The probes may result in linguistic breakdown, the point at which the interviewee ceases to function accurately or cogently because the task is too difficult.

Phase 4: Wind-down • When a ratable sample has been obtained, the tester brings the interviewee back to the level at which he or she functions most comfortably for the last few minutes of the interview. • This last phase gives the tester one more opportunity to verify that his or her rating is indeed correct.

Single-format test • Each test giver follows the standard, prescribed phases. • OPI training ensures that raters carry out the interview uniformly and apply the ratings consistently. • The OPI is referred to as a “single-format test,” for it consists of only one task (an interview) and there are no other components to the test.

Two concepts • There are two important concepts that emerge from a consideration of testing. • Bias refers to situations in which elicitation and evaluation procedures are not the same for all test takers. The test giver is the variable in this scenario. • Inter-rater reliability refers to the desire to have all raters evaluate a test the same way. Given a set of criteria, all raters should apply them the same way.
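Inter-rater reliability can be made concrete with a small calculation. The sketch below is only an illustration: the slides do not say how agreement should be measured, so both the simple proportion of exact agreement and Cohen's kappa (a standard chance-corrected statistic, brought in here as an assumption) are shown, and the two lists of ratings are hypothetical, not data from any study cited in this chapter.

```python
from collections import Counter


def exact_agreement(rater_a, rater_b):
    """Proportion of test takers who received the same rating from both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of test takers")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = exact_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same level
    # if each assigned levels at their own observed rates, independently.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six interviews by two raters (illustration only).
rater_a = ["Novice", "Intermediate", "Intermediate", "Advanced", "Novice", "Superior"]
rater_b = ["Novice", "Intermediate", "Advanced", "Advanced", "Novice", "Superior"]

agreement = exact_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"exact agreement: {agreement:.2f}, kappa: {kappa:.2f}")
```

In this made-up case the raters disagree on one interview out of six; kappa is lower than the raw agreement because some matches would be expected by chance alone.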

Questions about OPI • Although useful for a variety of reasons, the OPI has been questioned because of its single-format nature. • Shohamy states that: “Viewing oral language as constituting a multiple of different speech styles and functions, (e.g., discussing, arguing, apologizing, interviewing, conversing, being interviewed, reporting, etc.) means that …”

Shohamy continued… “…being interviewed, the speech style and function tapped in an oral interview, represents only a single type of oral interaction. No doubt that it is an important speech style, but clearly, there are also other oral interactions which are equally important in real life situations.” (Shohamy, 1987, p.52)

The Israeli National Oral Proficiency Test • In a series of studies, Shohamy and her colleagues (Reyes, 1982; Shohamy, Reyes, & Bejerano, 1986) found that a learner’s performance on an oral interview was not a valid predictor of that learner’s performance on other oral tasks. • This test was introduced in Israel in 1986 as the national examination for students at the end of twelfth grade.

INOPT continued… • The Israeli National Oral Proficiency Test in English as a Foreign Language (INOPT), in contrast to the OPI, is multicomponential by design and, therefore, more comprehensive. • In addition to the oral interview, three other tasks are also used to evaluate test takers’ oral proficiency: role play, a reporting task, and group discussion.

Justification of the four formats • Each format elicits a different speech style, so that the test as a whole comprises a range of speech styles that reflect communicative language use in authentic situations. • Their research demonstrated that the test did discriminate well among various levels of oral proficiency.

Justification continued… • Their statistical analyses on the test allowed them to conclude that each section of the test was indeed different from the other sections. • They concluded that if the goal was to test various speech styles, then each would need to be tested via separate oral tests.

Descriptions • Shohamy and her fellow researchers offer the following descriptions of the four tests used in the INOPT. • You will undoubtedly notice that their Oral Interview and the ACTFL OPI are quite similar.

Shohamy’s tests • Test 1: Oral interview. The rationale underlying this test was to guide the test-takers into a dialogue with the tester. • Test 2: Role play. The rationale behind this test was to stimulate the test-taker to produce spontaneous speech-behavior within given roles eliciting specific speech functions. In it, the test-taker had to play one role, with the tester playing another, both partners in a dialogue.

Shohamy’s tests continued… • Test 3: Reporting test. The rationale underlying this test was to stimulate the test-taker into a monologue in the foreign language. The student read a newspaper article silently in Hebrew, and was asked to report its general content in English. • Test 4: Group discussion. The rationale underlying this test was to stimulate the test-takers into a spontaneous discussion of a controversial issue.

Evaluation criteria for tests of spoken language • The speech sample elicited via the OPI is judged against the ACTFL Proficiency Guidelines. • The following level descriptions are taken from Omaggio Hadley (1993, pp.502-504). • Novice: The Novice level is characterized by the ability to communicate minimally with learned material.

Level descriptions continued… • Intermediate: • Create with the language by combining and recombining learned elements, though primarily in a reactive mode • Initiate, minimally sustain, and close in a simple way basic communicative tasks • Ask and answer questions

Level descriptions continued… • Advanced: • Converse in a clearly participatory fashion • Initiate, sustain, and bring to closure a wide variety of communicative tasks • Satisfy the requirements of school and work situations • Narrate and describe with paragraph-length connected discourse

Level descriptions continued… • Superior: • Participate effectively in most formal and informal conversations on practical, social, professional, and abstract topics • Support opinions and hypothesize using native-like discourse strategies

Judging speech samples • The four speech samples elicited by the INOPT are each judged separately according to the following scale. • 4: Unintelligible • No language produced • No interaction possible • 5: Hardly intelligible • Very poor language produced • Only simplest, fragmentary interaction possible

Judging speech samples continued… • 6: Clearly intelligible • Simple language produced • Interaction possible • Not articulate • 7: Responsive in interaction • Slightly more sophisticated language produced • Consistent errors that do not interfere with fluency • Strong MT [mother tongue] interference (translated patterns, etc.)

Judging speech samples continued… • 8: Almost effortless in expression • Adequate in interaction • Errors: not consistent • 9: Facility of expression • Comfortable initiating in interaction • Sporadic mistakes • 10: No limitation whatsoever • Near-native (Shohamy et al., 1986, p.219)
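The scale above, applied separately to each of the four INOPT samples, can be sketched as a simple lookup. This is a minimal illustration, not the test's actual scoring procedure: the level labels come from the scale quoted above, while the names `INOPT_SCALE`, `TASKS`, and `describe_ratings` and the sample ratings are hypothetical. How the four ratings are combined into a final score is not stated in these slides, so the sketch deliberately keeps them separate.

```python
# Level labels taken from the scale quoted above (Shohamy et al., 1986).
INOPT_SCALE = {
    4: "Unintelligible",
    5: "Hardly intelligible",
    6: "Clearly intelligible",
    7: "Responsive in interaction",
    8: "Almost effortless in expression",
    9: "Facility of expression",
    10: "No limitation whatsoever",
}

# The four speech samples elicited by the test.
TASKS = ("oral interview", "role play", "reporting", "group discussion")


def describe_ratings(ratings):
    """Return a per-task profile, keeping each speech sample's rating separate."""
    missing = [task for task in TASKS if task not in ratings]
    if missing:
        raise ValueError(f"no rating given for: {', '.join(missing)}")
    profile = {}
    for task in TASKS:
        score = ratings[task]
        if score not in INOPT_SCALE:
            raise ValueError(f"{score} is not a level on the 4-10 scale")
        profile[task] = f"{score}: {INOPT_SCALE[score]}"
    return profile


# Hypothetical ratings for one test taker (illustration only).
sample_ratings = {
    "oral interview": 8,
    "role play": 6,
    "reporting": 7,
    "group discussion": 7,
}

profile = describe_ratings(sample_ratings)
for task, description in profile.items():
    print(f"{task}: {description}")
```

A profile like this makes visible the point raised earlier in the chapter: a learner's rating on the interview need not match the ratings on the other three tasks.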

Similarities between OPI and INOPT • Each contains some kind of interview. • Each uses holistic ratings (that is, a single final “score” for the entire test). • Bachman (1990, p.328) argues that proficiency is not a unitary ability, but rather a componential one because we can identify the pieces and constituent parts of oral proficiency.

Componential rating scales • If oral proficiency is not a unitary ability, then it should not be tested as such (Shohamy et al., 1986), and just as important, it should not be scored as such (Bachman, 1990). • Bachman proposes that tests of oral proficiency be evaluated using componential scoring criteria and provides the following criteria used in a test of oral proficiency he developed with a colleague (Bachman & Palmer, 1983). • The three scales assess grammatical, pragmatic, and sociolinguistic competence.

Sample of scale for grammatical competence

Sample of scale for pragmatic cohesion

Sample of scale for sociolinguistic competence

Summary of chapter 5 • Adapted classroom activities for testing situations • Examined two tests for evaluating spoken language • Suggested the use of tests that examine a variety of speech styles and functions via multiple formats
