An IPO Task Difficulty Matrix for Prototypical Tasks for Task-based Assessment

Sheila Shaoqian Luo, School of Foreign Languages, Beijing Normal University, Sept 22 2007

The presentation structure… • introduction • literature • the rationale of the research • research questions and research methods • studies: the evolution of the IPO TD matrix • findings • issues and suggestions for future research • implications

I Introduction: The Chinese National English Curriculum (CNEC, 2001) Characteristics: • Multidimensional curriculum + Humanistic approach • Focus on ability to use the language • Nine levels + Competence-based: Can-do statements • Promoting Task-Based Language Teaching (TBLT) • Lists of themes, functions, grammar and vocabulary

The CNEC Goals • Affect and attitudes • Learning strategies • Cultural awareness • Integrated ability for use • Language skills • Linguistic knowledge

II Literature: Language competence models • Canale and Swain’s model: linguistic competence; sociolinguistic competence; discourse competence (Canale, 1983); strategic competence • Bachman’s communicative competence model: (1) Organization: grammar; text; (2) Pragmatics: illocution; sociolinguistics • Skehan’s TBLA model: (1) to inform task selection (to predict the relative difficulty of each task); (2) to ensure the full range of candidates’ ability will be tapped; (3) to assist test developers in structuring test tasks and the conditions under which these tasks are performed in appropriate ways; (4) to inform development of rating scale descriptors; (5) to facilitate interpretation of test scores (which may differ according to tasks)

Task-based Performance and Language Testing: The Skehan Model V (2006) [Diagram; component labels: Rater; Scale/Criteria; Score; Performance; Interlocutors; Context of testing; Task characteristics; Task; Task conditions; Ability for Use; Underlying Competences; Consequences]

Language competence models: Assessment • Canale and Swain’s model: the components of the framework play compensatory roles • Bachman’s model: strategic competence plays the central role, orchestrating knowledge, language, context, assessment, planning, and execution; emphasizes the search for an underlying “structure-of-abilities” • Skehan’s model: task is the center in generalizing learners’ ability for use; goes beyond the role of strategic competence and draws into play “generalized processing capacities and the need to engage worthwhile language use” (Skehan, 1998a, p. 171).

Issues in language testing • What we give test takers to do • Unless tasks have known properties, we will not know if performance is the result of the candidate or the particular task • Without a knowledge of how tasks vary, we cannot know how broadly we have sampled someone’s language ability, cf. narrowness and therefore ungeneralisability • How we decide about candidate ability • Obviously underlying competences are important • We also need to probe how people activate these competences, through ability for use • Knowledge of this area will enable us to make more effective context-to-context generalisations and avoid the narrowness of context-bound performance testing (Skehan, Dec. 2006)

If tasks are a relevant unit for testing, the research problem is to try to systematically “develop more refined measures of task difficulty” (Skehan, 1998:80). The Problem of Difficulty • Traditional approaches • Give a series of test items • Calculate the pass proportion • Rank the items in difficulty (classical, IRT) • Blue Skies Solutions • Effects of different tasks on performance areas • Do construct validation research • Use a range of tasks when testing (Skehan, Dec. 2006)
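The traditional approach listed above (give a series of items, calculate the pass proportion, rank the items) can be illustrated with a minimal sketch. The response data, function names, and the choice of Python are assumptions made for illustration only; they are not taken from the research reported here, and the sketch covers only the classical pass-proportion index, not IRT.

```python
# Hypothetical response matrix (illustrative data only):
# rows = candidates, columns = items; 1 = pass, 0 = fail.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]


def pass_proportions(matrix):
    """Return the proportion of candidates who passed each item."""
    n_candidates = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[i] for row in matrix) / n_candidates for i in range(n_items)]


def rank_by_difficulty(proportions):
    """Return item indices ordered from hardest (lowest pass proportion) to easiest."""
    return sorted(range(len(proportions)), key=lambda i: proportions[i])


p_values = pass_proportions(responses)
hardest_first = rank_by_difficulty(p_values)
print(p_values)       # pass proportion per item
print(hardest_first)  # item indices, hardest first
```

A ranking of this kind describes how a particular group performed on particular items; it says nothing about why one task is harder than another, which is the gap the analytic scheme is meant to address.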

A more realistic solution: The present research • Use an analytic scheme to make estimates of task difficulty • Explore whether this analytic scheme can generate agreement between different raters • Explore whether this analytic scheme has a meaningful relationship to (a) performance ratings, and (b) discourse analysis measures

III Research Rationale: Defining the problem • Identification of valid, user-friendly sequencing criteria for tasks and test tasks is a pressing but old problem • Grading task difficulty and sequencing tasks both appear to be arbitrary processes not based on empirical evidence (Long & Crookes, 1992) • The Norris-Brown et al. matrix (1998; 2002; influenced by Skehan, 1996) offers one way of characterising test task difficulty, but lacks obvious connection to a Chinese secondary context.

Weaknesses in previous findings on task difficulty: they offered only moderate support for the proposed relationships between the combinations of cognitive factors with particular task types… (Elder et al., 2002) This research… • investigates the development and use of a prototype task difficulty scheme based on current frameworks for assessing task characteristics and difficulty, e.g. Skehan (1998), Norris et al. (1998), and Brown et al. (2002). Hypothesis: • There is a systematic relationship between task difficulty and hypothesized task complexity (see also Elder et al., 2002)

IV Research questions How can language ability in TBLT in mainland Chinese middle schools best be assessed? • Is the Brown et al. task difficulty framework appropriate to the mainland Chinese school context? If it is not, then what is an alternative framework? • Is it possible to have a task difficulty framework that can be generalized from context to context? • What are the teachers’ perceptions of task difficulty in a Chinese context? • What are the factors that are considered to affect task difficulty in this context?

Underlying abilities: (1) competence-oriented underlying abilities; (2) a structure made up of different interactive and inter-related components (Canale & Swain, 1980; Bachman, 1990); (3) different performances drawing upon these underlying abilities (Bachman, 1990); (4) sampling such underlying abilities in a comprehensive and systematic manner so as to provide the basis for generalizing to non-testing situations. • Predicting performance: the way abilities are actually used through tasks (factors that may affect performance) • Generalizing from context to context: to characterize features of context in order to identify what different contexts have in common and how knowledge of performance in one area could be the basis for predicting a learner’s performance in another area. • A processing approach: to establish a sampling frame for the range of performance conditions which operate so that generalizations can be made, in a principled way, to a range of processing conditions. (Table 1)

Research Design and Methodology • A hybrid of quantitative and qualitative analysis, applied in both deductive and inductive ways [Diagram: matrix and studies, linked deductively and inductively]: (1) a correlational analysis to explore the relationship between tasks and task difficulty components; and (2) a qualitative analysis of verbal self-reports and focus group interviews on the factors that affect task difficulty • Two research phases: (1) Phase one: Study One~Study Four (March~May 2004): application of the Norris-Brown et al. task difficulty matrix; (2) Phase two: Study Five~Study Ten (Oct 2004~2005): establishment and evolution of the IPO task difficulty matrix

Summary of research participants

Summary of research instruments

Research studies 1. Phase one: Applying Norris et al.’s task difficulty matrix, Study One~Four (March~May 2004) • A modified version of the Norris et al. (1998) task difficulty matrix (Table 1) was applied by 28 professional and experienced English teachers to investigate its transferability to mainland China • Results: • Impossible to rate task difficulty with pluses and minuses • Among fourteen tasks, raters agreed on three: Planning the weekend, Shopping in supermarket, Radio weather information (common general topics in daily life) • Tremendous disagreement between the Chinese teachers’ ratings and Norris et al.’s predicted difficulty level (Table 2).

Task difficulty matrix for prototypical tasks: ALP (Norris et al., 1998, p. 84)

Modified task difficulty matrix

Phase one: Conclusions • The Norris et al. (1998) and Brown et al. (2002) matrix could not be reliably employed • There was a discrepancy between Norris et al. and the Chinese teachers on the difficulty levels of tasks • There was agreement on general topics yet much disagreement on more cognitively demanding tasks • The Norris et al. tasks might not be appropriate, and an alternative framework for predicting task difficulty might be needed
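One way to put a number on a discrepancy between a predicted difficulty order and raters' judged order is a rank correlation. The sketch below uses Spearman's rho in its no-ties form, $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$. The ranks are invented for illustration, and this statistic is an assumption of the sketch, not necessarily the analysis used in Studies One to Four.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("Both rankings must cover the same tasks")
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks for six tasks (1 = easiest, 6 = hardest); illustrative only.
predicted_ranks = [1, 2, 3, 4, 5, 6]   # order predicted by a difficulty matrix
teacher_ranks = [3, 1, 6, 2, 5, 4]     # order implied by teachers' ratings

rho = spearman_rho(predicted_ranks, teacher_ranks)
print(round(rho, 2))
```

A value near 1 would indicate that raters order the tasks much as the matrix predicts; a low value, as in this invented example, would indicate the kind of disagreement described above.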

Phase Two: Establishing IPO task difficulty matrix (Studies Five~Ten; 2004~2005) 1 The IPO-CFS task difficulty scheme 2 CNEC-theme related tasks (Table 3) • 24 CNEC (2001) themes: Personal information; Family, friends and people around; Personal environments; Daily routines; School life; Interests and hobbies; Emotions; Interpersonal relationships; Plans and intentions; Festivals, holidays and celebrations; Shopping; Food and drink; Health and fitness; Weather; Entertainment and sports; Travel and transport; Language learning; Nature; The world and the environment; Popular science and modern technology; Topical issues; History and geography; Society; Literature and art

Input Processing Output Content Support (making oral/written expression more accurate and fluent)

Findings • Study 5: Correlation for the means of both teachers: .65 • However, the 2 sets of tasks generated variations in difficulty within one theme → leading to further research into task characteristics and requirements, and task analysis (Table 4) • Study 6 and 7: 24 CNEC tasks (Table 5) that vary in difficulty • IPO x extended CFMS (Table 6) • 2 self-reporters + rater comments: detailed verbal self-report data to examine mental processes during rating of the tasks and help refine the matrix. • Findings: (1) Encouraging correlations: all but one range from .52 to .83. The exceptional pair of .34 led to further data collection from both raters and students for the matrix reliability and validity. (2) The matrix is improving, but needs input from actual raters; inseparable Input, Processing, Output

Refining the IPO task difficulty matrix: Studies Eight~Ten • Raters: Professionals (10 + 5 + 9) • CNEC-theme related tasks (15 + 9) • IPO x Information, Language, Performance conditions, Support (ILPS) (Table 8) • Inter-rater correlations: (1) Study Eight correlation range: .69 to .92 (2) Study Nine correlation range: .62 to .91 (3) Study Ten correlation range: .75 to .87.
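A correlation range of the kind reported above can be obtained by correlating every pair of raters and taking the lowest and highest values. The sketch below is a minimal illustration under stated assumptions: the ratings are invented, the rater names are hypothetical, and Pearson's r is assumed (the slides do not state which coefficient was used).

```python
from itertools import combinations
from math import sqrt

# Hypothetical difficulty ratings: one list per rater, one value per task.
ratings = {
    "rater_a": [1, 2, 3, 4, 5],
    "rater_b": [2, 1, 4, 3, 5],
    "rater_c": [1, 3, 2, 5, 4],
}


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(ratings_by_rater):
    """Correlate every pair of raters; keys are (rater, rater) tuples."""
    return {
        (r1, r2): pearson(ratings_by_rater[r1], ratings_by_rater[r2])
        for r1, r2 in combinations(sorted(ratings_by_rater), 2)
    }


correlations = pairwise_correlations(ratings)
lowest = min(correlations.values())
highest = max(correlations.values())
print(f"correlation range: {lowest:.2f} to {highest:.2f}")
```

Reporting the range rather than a single average keeps an outlying pair visible, as with the exceptional .34 pair in the earlier studies.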

Fifteen prototypical tasks

IPO task difficulty matrix for task-based assessment (Table 9) Dimensions: INPUT, PROCESSING, OUTPUT Components: I Information: • Amount • Type: Familiar-unfamiliar; Personal-impersonal; Concrete-abstract; Unauthentic-authentic • Organization: Structured-unstructured • Operations: Retrieval vs. transformation; Reasoning II Language: Level of syntax; Level of vocabulary III Performance Conditions: Modality; Time pressure IV Support
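The layout of the matrix (three dimensions crossed with four components) can be pictured as a small grid data structure. The sketch below is illustrative only: the 1 to 4 rating scale, the identifier names, and the use of a simple mean as a summary are all assumptions of this sketch; the slides do not state how cell ratings are combined into an overall difficulty estimate.

```python
DIMENSIONS = ("input", "processing", "output")
COMPONENTS = ("information", "language", "performance_conditions", "support")


def empty_matrix():
    """Return a dimension x component grid with every cell unrated (None)."""
    return {d: {c: None for c in COMPONENTS} for d in DIMENSIONS}


def rate(matrix, dimension, component, level):
    """Record a rating for one cell; the 1-4 scale is an assumed convention."""
    if dimension not in DIMENSIONS or component not in COMPONENTS:
        raise KeyError(f"Unknown cell: {dimension} x {component}")
    if level not in (1, 2, 3, 4):
        raise ValueError("Level must be 1 (easiest) to 4 (hardest)")
    matrix[dimension][component] = level


def mean_rating(matrix):
    """Average of the rated cells (an assumed summary, not the study's method)."""
    rated = [v for row in matrix.values() for v in row.values() if v is not None]
    if not rated:
        raise ValueError("No cells have been rated")
    return sum(rated) / len(rated)


# Hypothetical ratings for one task.
task = empty_matrix()
rate(task, "input", "information", 2)
rate(task, "input", "language", 3)
rate(task, "processing", "information", 4)
rate(task, "output", "language", 3)
print(mean_rating(task))
```

Keeping unrated cells as `None` rather than zero avoids treating a component that does not apply to a task as if it were rated easy.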

Structured-Unstructured • Input information or task has a clear and tight organizational structure, e.g. clear narrative with beginning, middle, end. All or most elements of task are clearly connected to one another. • Input information or task has organizational structure, but this is fairly loose, so that some connections need to be made by the test-taker. • Input information or task is partly organized, with some sections which are structured and organized, but with other areas which need more active integration by the test-taker. • Information or task requires test-taker to bring organization to material which isn’t organized. Test-taker has to make the links which are necessary for the task to be done, or to organize the material which is involved.

A comparison between Brown et al.’s matrix and the IPO task difficulty matrix • Similarities (5): Primary research question; Similar purposes; similar design of matrix; an example of an assessment alternative; Sources • Differences (10): Test Objects; Task Themes; Task Focus; +(-)related to curriculum; Task Selection; Definitions/Labels; Characteristics; Layout; Rating System; Raters

Focus group interview summary

VI Implications: IPO-ILPS task difficulty matrix • Tasks and Task-based Assessment (1) Estimating task difficulty: to use learner performances on sampled tasks to predict future performances on tasks that are constituted by related difficulty components. (Norris et al., 1998:58) (2) Students with greater levels of underlying ability will be able to successfully complete tasks which come higher on such a scale of difficulty. (Skehan, 1998:184) • Language Teaching and Learning • may be useful for syllabus designers to develop and sequence pedagogic tasks in order of increasing task difficulty: to promote language proficiency and facilitate L2 development, the acquisition of new L2 knowledge, and restructuring of existing L2 representations (Robinson, 2001, p. 34). • may help language teachers and testers when they make decisions regarding classroom teaching and design, and regarding the task-based assessments appropriate for the testing inferences they must make in their own education settings.

VII Limitations • language assessment does not necessarily need to “subscribe to a single model of language test development and use”: teachers and students may be interested more “in specific aspects of performance more appropriately conceived of as task- or text-related competence” (Brown et al., 2002, p. 116). • the matrix and procedures developed and investigated here take a cognitive perspective; many factors that other perspectives would highlight are not explored. • the nature of the target language tasks that serve as the basis of the assessment instruments and procedures: task appropriateness in particular learning contexts + locally defined assessment needs.

VIII Issues and suggestions for future research • the IPO task difficulty matrix for TBA: to promote generalizability, more research is needed in different regions in EFL contexts • Tasks: both carefully sampled spoken and written tasks + calibrated test items for reading and listening. • the social practice (McNamara & Roever, 2006) of the task difficulty matrix • A more qualitative dimension in judging the difficulty level of a task would add a qualitative profile, mainly of task features, to the main outcome. • Role of strategies in determining the difficulty levels • To what extent does the IPO task difficulty matrix provide a basis for the assessment of various language activities and competences?

IX Conclusions • Tasks are an interesting basis for exploring language teaching (Skehan, 2006a) and language testing. • “We need to find more and find out how to make tasks work more effectively. We don’t know yet how this can be done, but we will never know if we don’t do research” (Skehan, 2006a). • Hopefully, Norris and Brown et al.’s (1998; 2002) studies and the studies reported in this thesis have provided useful information and instruments that will profitably contribute to this research area of task-based teaching and learning, and assessment.

Acknowledgments This is a presentation based on my Ph.D. research under the supervision of Professor Peter Skehan. My great gratitude goes to my supervisor, Professor Skehan. I also thank my committee members, Professor Jane Jackson and Professor David Coniam at the Chinese University of Hong Kong, who have contributed thoughtful and helpful suggestions to this study. My thanks go to the participants in the research.

Selected References
Bachman, L. F. (2002). Some reflections on task-based language performance assessment. Language Testing, 19(4), 453-476.
Brown, J. D., Hudson, T., Norris, J., & Bonk, W. J. (2002). An investigation of second language task-based performance assessments. Second Language Teaching & Curriculum Center, University of Hawai‘i at Manoa.
Coniam, D., & Falvey, P. (1999). Assessor training in a high-stakes test of speaking: The Hong Kong English language benchmarking initiative. Melbourne Papers in Language Testing, 8(2), 1-19.
del Pilar Garcia Mayo, M. (Ed.). (2007). Investigating tasks in formal language learning. Clevedon: Multilingual Matters.
Van den Branden, K. (Ed.). (2006). Task-based language education: From theory to practice (pp. 1-16). Cambridge: Cambridge University Press.
Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing, 19(4), 343-368.
Elder, C., Knoch, U., Barkhuizen, G., & von Randow, J. (2005). Individual feedback to enhance rater training: Does it work? Language Assessment Quarterly, 2(3), 175-196.
Ellis, R. (2003). Task-based language learning and teaching. Oxford: Oxford University Press.
Ellis, R., & Barkhuizen, G. (2005). Analyzing learner language. Oxford: Oxford University Press.
Iwashita, N., Elder, C., & McNamara, T. (2001). Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information-processing approach to task design. Language Learning, 51(3), 401-436.
Knoch, U., Read, J., & von Randow, J. (2006, June). Re-training writing raters online: How does it compare with face-to-face training? Paper presented at the 28th Annual Language Testing Research Colloquium of the International Language Testing Association, University of Melbourne, Australia (June 29, 2006).

Ministry of Education, China (2001). A pilot paper: The national English curriculum standards. Beijing: Beijing Normal University Press.
Norris, J. M., Brown, J. D., Hudson, T. D., & Bonk, W. (2002). Examinee abilities and task difficulty in task-based second language performance assessment. Language Testing, 19(4), 395-418.
Nunan, D. (1993). Task-based syllabus design: Selecting, grading, and sequencing tasks. In G. V. Crookes & S. M. Gass (Eds.), Tasks in a pedagogical context: Integrating theory and practice (pp. 55-68). Clevedon, Avon: Multilingual Matters.
Nunan, D. (2004). Task-based language teaching. Cambridge: Cambridge University Press.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics, 22(1), 27-57.
Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics, 17(1), 38-62.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Skehan, P. (1999). The influence of task structure and processing conditions on narrative retellings. Language Learning, 49(1), 93-120.
Skehan, P. (2001). Tasks and language performance assessment. In M. Bygate, P. Skehan & M. Swain (Eds.), Researching pedagogic tasks: Second language learning, teaching and testing (pp. 167-185). London: Longman.
Skehan, P. (2003). Task-based instruction. Language Teaching, 36(1), 1-14.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research, 1(3), 185-211.
Wolfe-Quintero, K., Inagaki, S., & Kim, H. (1998). Second language development in writing: Measures of fluency, accuracy & complexity. Second Language Teaching & Curriculum Center, Honolulu: University of Hawai‘i Press.

The Great Wall starts from where we stand: A long way to go…
