Sophie Reissner-Roubicek and Sue Wharton PAD & ELLTA 12 October 2011

EAP for statistics students: Issues in teaching, and findings from a learner corpus

ESAP for MORSE students Plan: to provide 10 hours per week of “English support” to the target student group • Who were the target group? • Expectations

Meeting their needs What were their needs? • According to Year 1 Head Tutor: • Pragmatic and sociocultural competence - beyond the boundaries of the classroom • According to Stats HoD, Deputy HoD and CAL: • Academic literacy - EAP and ESAP • Background research: • Journal of Statistics Education, etc. • Self-report • Pairwork survey question design task

Materials development • International Office Student Survey • Article on use of statistics in AL • BAWE corpus • Texts from Journal of Statistics Education • “Personal statements” database

What worked for them • Having time to ask questions about, discuss and practise things that came up • idiom, register, modality, pronunciation, etc. • Finding themselves in The Zone • Deconstructing the “constructive criticism” section of Stats Lab reports • Etc.

The learner corpus Plan: collect texts written by undergraduate statistics students Which text to choose? How to collect them? How to make them into a corpus?

Choosing texts • Discussion with statistics department teachers • Identification of ‘Stats Lab’ assignment where students are asked to: • Provide analyses of data in written form • Discuss the significance of these analyses • Justify their decisions • Constructively criticise their own analyses • Etc.

Collecting the corpus • An information session for students • A university consent form • The help of a research assistant • Co-operation from the Stats department

Sophie Reissner-Roubicek and Sue Wharton, University of Warwick 12 October 2011

Making texts into a corpus • Transcription into .txt documents • Choosing tags • Recording contextual information • Resulted in a corpus of 40 texts, 43,027 words in total. Shortest text: 523 words. Longest text: 1,977 words.

An example from a transcribed text: <STUDENT 1><ASSIGNMENT 1><GROUP 2> <TITLE: DATA 1: Health Expenditure & Life Expectancy> <ANSWER 1A. Diagram. Scatter plot (health expenditure per capita VS. life expectancy) is an appropriate diagram to present data, since we are interested in the correlation between the two tables. <GRAPH> The graph clearly shows a positive correlation, but no strong. First, there is an outlier <FORMULA>, which is unrepresentative. Obviously it is not a reliable data for our further prediction. In order to reduce the error of our prediction which generated by the outlier, we consider to ignore it in our later calculation. Second, more than 2/3 of data are lied between <FORMULA>. The Data is not well distributed symmetrically for x values and hence increasing the error of our prediction. We choose Simpler liner Regression Model to describe this group of data and use the best fit line to predict future values. <FEEDBACK L1: is there any difference if you include the outlier?><FORMULA><GRAPH>. Now we are able to use this equation to estimate the missing life expectancy value for Chile: <FORMULA>>
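A minimal sketch of how tags like those in the transcribed example might be processed, so that the header information can be read off and the student's own words separated from tutor feedback and graph/formula placeholders. The tag inventory is inferred only from the single example above, and the function names (read_header, student_prose) are hypothetical; this is not the tool the project used.

```python
import re

# Tag patterns inferred from the one transcribed example; the real tag set may differ.
HEADER_TAG = re.compile(r"<(STUDENT|ASSIGNMENT|GROUP) (\d+)>")
PLACEHOLDER = re.compile(r"<(?:GRAPH|FORMULA)>")
FEEDBACK = re.compile(r"<FEEDBACK [^<>]*>")


def read_header(text):
    """Return the numeric header tags as a dict, e.g. {'STUDENT': 1, ...}."""
    return {name: int(num) for name, num in HEADER_TAG.findall(text)}


def student_prose(text):
    """Remove header tags, tutor feedback and graph/formula placeholders."""
    for pattern in (HEADER_TAG, FEEDBACK, PLACEHOLDER):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the removed tags.
    return re.sub(r"\s+", " ", text).strip()


# Shortened extract of the example; learner errors are deliberately left as written.
sample = (
    "<STUDENT 1><ASSIGNMENT 1><GROUP 2> The graph clearly shows a positive "
    "correlation, but no strong. First, there is an outlier <FORMULA>, which "
    "is unrepresentative. <FEEDBACK L1: is there any difference if you "
    "include the outlier?><FORMULA><GRAPH>."
)

header = read_header(sample)
prose = student_prose(sample)
print(header)
print(prose)
```

Keeping feedback in a tag of its own, as the transcription does, means tutor comments can be excluded from word counts of student writing or pulled out separately for later study.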

An example of contextual information • Student Number: 1 • Nationality: […] • A Levels: […] • Week: 5 • Lab leader: 1 • Assignment: 1 • Group: 3 • Mark x/60: 49

Choosing a task to focus on • Study an OECD table on total health expenditure per capita and life expectancy at birth for member nations • Plot a graph of life expectancy vs. total health expenditure • Discuss what the graph shows • Use the graph to estimate a life expectancy for Chile

Why focus on data description? • Relatively under-researched, but frequently set in science/technology/mathematical subjects • BAWE: • categorises data description assignments under the genre family label ‘Exercise’ • categorises assignments produced for a Statistics course under the discipline of Mathematics • Of the 34 Mathematics assignments in BAWE, 15 are ‘Exercises’.

An analytical focus: Stance • A writer’s opinion or attitude towards a proposition that their sentence expresses • Allows writer positioning vis a vis not only the information expressed but also the construed readership • Consensus in EAP that it is challenging for NNES

Looking for types of stance • Bottom-up coding using NVivo 8 • Searching for apparent stance types in individual texts • Deciding to focus on certain content categories

Types of stance: final categories • Bare assertion: 119 instances over 39 texts • Hedged assertion: 81 instances over 32 texts • Vague assertion: 44 instances over 23 texts • Boosted assertion: 15 instances over 14 texts • Assertion with inclusive ‘we’: 14 instances over 12 texts
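As a rough illustration of how these counts compare, the sketch below works out each stance type's share of all coded instances and the proportion of texts in which it occurs. The counts are copied from the slide; the denominator of 40 texts is an assumption (that the coding covered the whole corpus), and the variable names are illustrative only.

```python
# (instances, number of texts containing the stance type), as reported on the slide.
stance_counts = {
    "bare assertion": (119, 39),
    "hedged assertion": (81, 32),
    "vague assertion": (44, 23),
    "boosted assertion": (15, 14),
    "assertion with inclusive 'we'": (14, 12),
}

# Assumption: the stance coding covered all 40 texts in the corpus.
CORPUS_TEXTS = 40

total_instances = sum(instances for instances, _ in stance_counts.values())
print(f"total coded instances: {total_instances}")

for label, (instances, texts) in stance_counts.items():
    share = instances / total_instances   # share of all coded instances
    coverage = texts / CORPUS_TEXTS       # proportion of texts using this type
    print(f"{label:32s} {share:6.1%} of instances, {coverage:6.1%} of texts")
```

The five categories sum to 273 coded instances, so bare assertion alone accounts for well over a third of them.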

Repertoire for expressing stance types Bare assertion: frequent language choices • is (57 times, including lemma forms), e.g. ‘there is an outlier’, ‘there are a few outliers’ • show (29 times, including lemma forms), e.g. ‘the scatter plot above shows that…’ • have (13 times), e.g. ‘it has the weak positive correlation’ • No other choices appear more than 5 times.

Hedged assertion: frequent language choices • may (11 instances), e.g. ‘life expectancy value for Chile may be near 73.3 years’ • could (5 instances) • indicate, as an alternative to show (6 times) • estimate (16 times), but it is in the task brief • possible (4 instances) • relatively (also 4) • maybe (1 instance) • might (2 instances)

Vague assertion: frequent language choices • about (14 times), between (7), most (7), half (5), more (4), over (4) • An apparent preference for overquantifying rather than underquantifying: most, more and over account for 15 occurrences between them • No examples of few, less, fewer, or under; below appears once.
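A first-pass frequency count of this kind could be sketched as follows. It only counts surface word forms from a fixed list, so it would overcount against hand coding (not every ‘more’ or ‘over’ expresses vague stance); the figures on these slides come from manual coding, not from a script like this. The marker list is taken from the slide, and count_markers is a hypothetical helper name.

```python
import re
from collections import Counter

# Vague-quantity markers listed on the slide.
VAGUE_MARKERS = ["about", "between", "most", "half", "more", "over"]


def count_markers(text, markers):
    """Count whole-word occurrences of each marker, ignoring case."""
    tokens = re.findall(r"[a-z]+", text.lower())
    wanted = set(markers)
    return Counter(token for token in tokens if token in wanted)


# Sentence from the transcribed student text, errors left as written.
sample = "Second, more than 2/3 of data are lied between <FORMULA>."
counts = count_markers(sample, VAGUE_MARKERS)
print(counts)
```

A count like this is best treated as a way of locating candidate instances, which a coder then checks in context.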

Inclusive we – frequent language choices A strong association with the metaphorical use of the verb see – 9 of the instances are some variation on the phrase ‘we can see’.

Pedagogic implications of this • The choice of stance type may well be systematic and based on insider knowledge; how can we exploit this? • The repertoire for each stance type is very narrow – how can we broaden it?

Goals for 2011-2012 include: Generally: • Raising awareness among Stats teaching staff • Includes lecturers, tutors, and “supervisors” • Developing student writing through a consciousness-raising approach • Corpus-based activities designed to promote “noticing” Specifically: • To expand the students’ stance repertoire Moving outwards: • To extend the analysis to the ‘constructive criticism’ section • To look at tutor feedback in the corpus section
