Processing Corpus-derived Multi-unit Sequences by L2 English Learners

Fei Fei, Second Language Studies Program, Michigan State University. May 19, Beijing

Purpose of the present study
Formulaic language use has long been one of the research foci in the study of second language acquisition. For L2 learners of intermediate and advanced proficiency, formulaic language is the biggest stumbling block to sounding nativelike (Wray, 2002). However, most studies of formulaic sequences focus on textual and descriptive aspects. Few investigate how L2 English learners process multi-unit sequences (MUS), and even fewer explore individual factors in that processing.

What is a formulaic sequence?
• Formulaic sequences: sequences stored and retrieved holistically from memory at the time of use.
• Issues:
  • Compositionality (e.g., Howarth, 1998; Wray, 2002)
  • Representation and production (e.g., Sinclair, 1991; N. Ellis, 1996)
  • Development in L2 (e.g., Wong-Fillmore, 1976)

Why corpus-derived multi-unit sequences?
Word strings or lexical bundles generated on the basis of frequency may not be stored holistically in the mind, and, conversely, formulaic sequences that are stored as wholes may not be identified by a given corpus analysis. However, Wray (2002, p. 25) described “frequency as a salient, perhaps even a determining, factor in the identification of formulaic sequences.” Numerous studies of formulaic sequences are based on corpus frequency (e.g., Sinclair & Renouf, 1988; DeCock, Granger, Leech & McEnery, 1998; Moon, 1998; Hunston & Francis, 2000). The more often a word string is needed, the more likely it is to be stored in prefabricated form to save processing effort. Once stored, it is more likely to be the preferred choice at the time of use.

What is a corpus-derived MUS?
In short, corpus-derived multi-unit sequences (MUS)
• are based on corpus frequency;
• may not be psycholinguistically valid;
• are either fully fixed in form or semi-preconstructed phrases;
• are a subset of formulaic sequences.

Target multi-unit sequences
• Schmitt et al. (2004):
  • Longman Grammar of Spoken and Written English
  • Lexical Phrases and Language Teaching
  • Hyland’s list
  • BNC (British National Corpus)
  • CANCODE (Cambridge and Nottingham Corpus of Discourse in English)
  • MICASE (Michigan Corpus of Academic Spoken English)
• Biber (2004): the T2K-SWAL Corpus (TOEFL 2000 Spoken and Written Academic Language Corpus)
• ANC: American National Corpus http://americannationalcorpus.org/frequency.html

Schmitt et al.’s (2004) study on processing MUS: textual attributes
• Frequency
• Length
• Transparency in terms of meaning and function

Individual variables in processing MUS: proficiency
• Research (Hinger & Spottl, 2002; Spottl & McCarthy, 2003, 2004; Schmitt et al., 2004) indicated that vocabulary size and language proficiency are two factors in cross-linguistic lexical operations.
• Spottl & McCarthy (2004), in their cross-linguistic study of formulaic sequences, argued that without a certain level of general language proficiency, noticing did not even take place, and word strings were completely ignored or simply avoided by learners. L2 language proficiency was defined as scores on a proficiency test (upper intermediate and advanced level).
• In Schmitt et al.’s (2004) study, the highest-level non-native speakers mostly demonstrated native-like performance.

Individual variables in processing MUS: working memory
• Working memory (WM) can be divided into two main components: one is phonological short-term memory (STM), and the other is storage and processing capacity, referred to as the Central Executive (CE).
• Previous studies showed that WM can affect:
  • L2 syntactic processing and development (e.g., Ellis & Sinclair, 1996; Ellis, 2001; Juffs, 2004);
  • L2 lexical processing and development (e.g., French, 2003; Papagno & Vallar, 1995);
  • L2 proficiency and aptitude (e.g., Kroll, Michael, Tokowicz, & Dufour, 2002; Payne & Whitney, 2002; Service & Kohonen, 1995).

Individual variables in processing MUS: working memory
• Myles et al. (1999) found that STM capacity can predict the ability to chunk. “Chunking”, in their study, was defined as the ability to remember set phrases in L2 and later use them appropriately.
• Roberts and Gibson (2001) found high correlations between sentence memory and complex span, and between sentence memory and N-back span. They argued that memory for sentences was not simply a result of linguistic experience; rather, an independent working memory component likely contributed to participants’ performance on sentence memory.

In sum
The present study seeks to test the role of proficiency and WM in L2 English learners’ processing of high-frequency multi-unit sequences. The influence of textual attributes of MUS is also addressed. The study may help explain the variance in L2 English learners’ formulaic language use.

Research questions
• What is the relationship among proficiency, WM, and participants’ processing of MUS?
• Do textual attributes of MUS affect how L2 learners process them?
• What are the linguistic features of learners’ reproduction of MUS?

Participants
• Thirty-two adult L2 English learners participated in the present study.
• They were graduate students recruited from a wide range of disciplines at a large Midwestern university in the United States. Their reported TOEFL scores ranged from 570 to 650.
• Participants' ages ranged from 21 to 38, with 10 to 14 years of formal English learning experience.
• All participants were native speakers of Chinese and had been living in the United States for less than 2 years.

Measuring the variables
• The Elicited Imitation (EI) test has been used as a measure of learners’ knowledge of precise grammatical factors (e.g., Hamayan et al., 1977; Gallimore and Tharp, 1981; Munnich et al., 1994), L2 competence (Baddeley et al., 1998; N. Ellis, 2001), and implicit knowledge (Erlam, 2006).
• The utterance elicited is argued to reflect the degree to which a test taker is able to assimilate the stimulus into an internal grammar (Munnich et al., 1994).
• “The basic idea is that if the stretches of language are long enough, it overloads working memory, and the person is forced to reconstruct the content of the dictation via their language resources, rather than repeating the dictation back from rote memory. One of those language resources is the inventory of formulaic sequences stored in memory.” (Schmitt et al., 2004)

Measuring the variables
• The Elicited Imitation (EI) test is available at http://distancelearning.llc.msu.edu/research/chunks/ (an assigned ID and password are required).
• Two factors in designing an EI test: sentence length (Bley-Vroman and Chaudron, 1994) and time pressure (R. Ellis, 2005).
• There were two tasks in the EI test:
  • Task 1 was a passage revised from Schmitt et al.’s (2004) study, which contained 25 target MUS.
  • Task 2 included 18 target MUS derived from the American National Corpus and the T2K-SWAL Corpus. They were embedded in 18 single sentences.
• Scoring:
  • complete reproduction = 2 points
  • attempted reproduction with missing lexis = 1 point
  • missing reproduction = 0 points
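The scoring scheme above can be sketched in a few lines of Python. This is only an illustration, assuming each target sequence has already been hand-rated into one of the three categories; the names `Reproduction`, `ei_total`, and `MAX_EI_SCORE` are hypothetical and do not come from the study.

```python
from enum import Enum


class Reproduction(Enum):
    """Rating categories from the scoring slide; values are the points awarded."""
    COMPLETE = 2   # complete reproduction
    ATTEMPTED = 1  # attempted reproduction with missing lexis
    MISSING = 0    # missing reproduction


# 25 target sequences in Task 1 plus 18 in Task 2, each worth at most 2 points.
N_TARGETS = 25 + 18
MAX_EI_SCORE = N_TARGETS * Reproduction.COMPLETE.value


def ei_total(ratings):
    """Sum the points for one participant's hand-assigned ratings (hypothetical helper)."""
    return sum(rating.value for rating in ratings)


# Example: three rated sequences from one participant.
example = [Reproduction.COMPLETE, Reproduction.ATTEMPTED, Reproduction.MISSING]
example_total = ei_total(example)
```

Under this scheme, a participant who reproduced all 43 target sequences completely would receive 86 points.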

Measuring the variables
• The Working Memory test included:
  • a reverse digit span task (15 items)
  • a word span task (15 items)
• Both span tasks were classical WM tasks, adapted and written by two researchers. Item length varied from 5 to 8 for both the reverse digit span and the word span tasks.

Summary of the variables
• Dependent variable: processing of MUS, as indicated by participants’ mean scores on the Elicited Imitation (EI) test
• Independent variables:
  • Individual factors
    • Language proficiency (TOEFL scores within 2 years)
    • Working memory
  • Textual attributes
    • Frequency
    • Length
    • Transparency in terms of meaning and function

Quantitative results
Intercorrelations Among Proficiency, WM and Dictation Scores
Note. ** Correlation significant at the 0.01 level (2-tailed). * Correlation significant at the 0.05 level (2-tailed).

Quantitative results
Results of Multiple Regression Analysis
Scores on EI test = -136.664 + 0.942 * word span + 0.191 * proficiency
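To make the regression equation concrete, the following Python sketch evaluates it for given predictor values. The coefficients are those reported on the slide; the function name `predict_ei_score` is hypothetical, and the sketch assumes that word span and proficiency are entered on the same scales used in the study (the slides do not state the word span scale).

```python
# Coefficients as reported on the slide.
INTERCEPT = -136.664
B_WORD_SPAN = 0.942
B_PROFICIENCY = 0.191


def predict_ei_score(word_span: float, proficiency: float) -> float:
    """Predicted EI score from the reported equation (hypothetical helper).

    word_span: word span score, on the study's scale (assumed, not stated on the slide)
    proficiency: TOEFL score
    """
    return INTERCEPT + B_WORD_SPAN * word_span + B_PROFICIENCY * proficiency


# Effect of a 10-point TOEFL difference with word span held constant.
toefl_gain = predict_ei_score(10, 610) - predict_ei_score(10, 600)
```

With word span held constant, a TOEFL score 10 points higher corresponds to a predicted EI score about 1.91 points higher (10 × 0.191).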

Quantitative results
Means, SD and t-tests of Textual Factors: Transparency, Length, Frequency
Note. * p < 0.05; ** p < 0.01.

Qualitative results
• Close examination of the transcribed data showed the following:
  • (a) Complementizers in clauses were generally not produced (e.g., “that” in MUS such as “make sure that” and “I understand that”).
  • (b) Participants reconstructed MUS in creative ways (e.g., “in a variety of” was produced as “in varieties of,” “have varieties of,” and “have various (colors)”).
  • (c) In many cases, semantically similar sequences were produced (e.g., “from the point of view” was replaced by phrases such as “as to,” “for,” and “in terms of”).
  • (d) There was L1 interference in reproduction (e.g., three participants used “day and night” rather than “night and day”).
• It is assumed that the participants may have retrieved more frequent or salient MUS within the same lexical framework (morpho-syntax interface).

Discussion
The primary purpose of the present study was to examine the impact of textual and individual factors on L2 English learners’ processing of corpus-derived MUS.

Discussion: WM and proficiency
• The finding that general proficiency played a role in processing MUS was consistent with previous studies (Spottl et al., 2002; Schmitt et al., 2002).
• However, when WM was taken into consideration, the results were mixed. The evidence indicated that different memory tasks functioned differently in the processing of MUS. Specifically, there was no significant relationship between reverse digit span and the performance scores.
• A significant correlation was found between word span and the performance scores. This finding was consistent with Roberts and Gibson’s (2003) view that STM as measured by simple word span may be a better indicator of individual differences in online processing.

Discussion: WM and proficiency
• The findings were also supported by Myles et al. (1999), who concluded that high-word-span learners can accumulate more chunks than low-span learners. The more chunks a learner has, the more comparisons he or she can carry out to establish cross-chunk analyses. The more frequently chunk-internal analyses are made, the easier it is to process chunks online.
• However, the results need to be treated with caution. This study investigated only a small number of MUS (43 in total).

Discussion: Textual attributes
• Significant differences were found only when MUS were categorized by degree of transparency in terms of meaning and function. There were no significant differences in processing when MUS were categorized by frequency or length.
• A plausible interpretation is that the results had to do with contextual information: the sentences in which the target sequences were embedded might have mitigated differences in frequency or length to a certain extent.

Conclusion
• Implications: the relationship between MUS and language proficiency
• Robinson (2002) stressed that “WM is only one of a complex set of cognitive factors that come together to account for learners’ performance.” In this study, two individual factors (proficiency and WM as measured by word span) accounted for 46.7% of the variance in scores on the EI test. Future studies might include other variables in order to achieve a better understanding of MUS processing.
• So, which variables should be chosen? Do we need a model?

Next steps
• Pausing as a significant indicator (R. Ellis)
• Using a Chinese EFL learner corpus
• A sample of 50 participants and a NS control group
• Data from stimulated recall for qualitative analysis
• The issue of scoring the EI test (Prof. Hansen)
• The issue of using an EI test for formulaic sequences will be addressed in a follow-up study.

THANK YOU 谢谢
