
The learner as corpus designer



Presentation Transcript


  1. The learner as corpus designer. Guy Aston, SSLMIT, University of Bologna, guy@sslmit.unibo.it

  2. … or the art of fruit salads

  3. Learner uses of corpora • Form-focussed (data-driven learning) • Meaning-focussed (learning the culture) • Skill-focussed (reading practice) • Browsing environment (serendipity) • Reference tool for other tasks (reading/writing aid)

  4. Why make your own corpus? • You can devise your own recipe • You know what’s in it • You learn how to do it • Can be fun • Can provide practice in language use

  5. The raw ingredients

  6. Devising your own recipe • Only the text-type(s) you want • Only the texts you want • The quantity you want … small and specialised is beautiful

  7. You know what’s in it • Top-down knowledge of corpus • Top-down knowledge of texts

  8. You learn how to do it • Can be a useful skill for many language workers • technical writers • translators • teachers • Can make you a more critical corpus user

  9. It can be fun • Provides a challenge • Gives a sense of achievement/satisfaction. Practice in language use • Design/construction/evaluation of corpora can be communicative activities

  10. Why use standard corpora? • Less effort • More reliable • Better packaging • You don’t want to learn to make your own

  11. Less effort

  12. More reliable • if it’s well designed • if it fits your needs

  13. Better packaging • Metatextual information • Annotation • Corpus-specific software

  14. You don’t want to learn to make your own?

  15. A compromise strategy: make your own subcorpus • assemble using the pre-prepared ingredients of a larger corpus or in other words… go to a (fruit) salad bar

  16. (Pick ’n’ mix with the BNC)

  17. You have a choice of • text-types • individual texts • selection by pre-determined criteria • selection by hand • … or both

  18. You know what went in • so top-down processing is easier. Little effort • in comparison with making your own

  19. Good packaging • Metatextual information • Linguistic annotation • Can use software designed for full corpus • Indexed

  20. You get to learn • what are(n’t) useful subcorpora • what are(n’t) useful design criteria • how to do it

  21. It can be fun • challenge / achievement / satisfaction. You can talk about its • design / construction / evaluation

  22. Talking about fruit salad BNC Sampler: KC2

  23. Talking about fruit salad BNC Sampler: KC2

  24. And now to details … the Sampler awaits!

  25. You can create subcorpora of • specific corpus texts • texts containing solutions to a query • encoded categories of texts • your own categories of texts • and compare them with • other subcorpora • the full corpus
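Slide 25 lists four ways of defining a subcorpus. The sketch below is tool-independent and does not reproduce the Sampler software's own selection commands; it assumes a hypothetical in-memory structure (a `Text` with an ID, a dictionary of category labels and a token list), and the text IDs, categories and tokens are invented. Your own categories (as on slide 38) would simply be extra keys in the same dictionary.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Text:
    """Hypothetical in-memory corpus text: an ID, category labels, and tokens."""
    text_id: str
    categories: dict
    tokens: list = field(default_factory=list)

def by_ids(corpus, ids):
    """Specific texts, chosen by hand (kept in corpus order)."""
    wanted = set(ids)
    return [t for t in corpus if t.text_id in wanted]

def by_query(corpus, pattern):
    """Texts containing at least one token that fully matches a regex query."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in corpus if any(rx.fullmatch(tok) for tok in t.tokens)]

def by_category(corpus, key, value):
    """Texts whose encoded or user-assigned category `key` equals `value`."""
    return [t for t in corpus if t.categories.get(key) == value]

def per_million(texts, word):
    """Frequency of `word` per million tokens in a set of texts, for comparisons."""
    n = sum(len(t.tokens) for t in texts)
    hits = sum(tok.lower() == word for t in texts for tok in t.tokens)
    return hits * 1_000_000 / n if n else 0.0

# Invented toy corpus; IDs and categories are hypothetical.
corpus = [
    Text("A01", {"mode": "monologue"}, ["well", "we", "shall", "see"]),
    Text("A02", {"mode": "dialogue"}, ["yeah", "all", "right"]),
    Text("A03", {"mode": "dialogue"}, ["oh", "yes", "right"]),
]
dialogue = by_category(corpus, "mode", "dialogue")
print([t.text_id for t in dialogue])   # ['A02', 'A03']
# Compare a subcorpus with the full corpus.
print(per_million(dialogue, "right"), per_million(corpus, "right"))
```

Normalising to a per-million rate is what makes a small subcorpus comparable with the full corpus or with another subcorpus of a different size.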

  26. Choosing specific texts • Text analysis: selecting

  27. Viewing the index

  28. Party policies (will/shall be + VVN)
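The query on slide 28 combines word forms with a part-of-speech tag. A minimal sketch over (word, tag) pairs, assuming the text is already tagged: VVN is the CLAWS tag for the past participle of a lexical verb, while the other tags in the invented sentence are illustrative only. The function checks just the forms will/shall and be plus the VVN tag on the third token.

```python
def will_shall_be_vvn(tagged):
    """Find 'will/shall be + VVN' in a list of (word, tag) pairs.

    Returns (index of will/shall, matched words) for each hit. Only the third
    token's tag is checked; VVN is the CLAWS past-participle tag.
    """
    hits = []
    for i in range(len(tagged) - 2):
        (w1, _), (w2, _), (w3, t3) = tagged[i], tagged[i + 1], tagged[i + 2]
        if w1.lower() in {"will", "shall"} and w2.lower() == "be" and t3 == "VVN":
            hits.append((i, f"{w1} {w2} {w3}"))
    return hits

# Invented example sentence; tags other than VVN are illustrative.
sentence = [("Taxes", "NN2"), ("will", "VM0"), ("be", "VBI"), ("cut", "VVN"),
            ("and", "CJC"), ("schools", "NN2"), ("shall", "VM0"), ("be", "VBI"),
            ("funded", "VVN")]
print(will_shall_be_vvn(sentence))   # [(1, 'will be cut'), (6, 'shall be funded')]
```

This strict three-token window misses variants such as 'll be or will also be + VVN; a fuller query would allow for contractions and intervening adverbs.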

  29. Or, to return to our fruit salad text …

  30. A bad language subcorpus: texts containing solutions to a query

  31. Choosing the bad language texts

  32. Collocates of f.*k.* (f_ words)

  33. Collocates of oh
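Slides 32 and 33 show collocate lists, but the slides do not state the window size or statistic used. As a rough sketch, assuming a plain token list and a symmetric window (`span`, an assumed parameter), raw collocate frequencies can be counted as below. The query is treated as a Python regular expression, which may not match the slides' query syntax exactly.

```python
import re
from collections import Counter

def collocates(tokens, node_pattern, span=4):
    """Raw frequencies of words within `span` tokens either side of every token
    that fully matches the regex `node_pattern` (case-insensitive).
    Other occurrences of the node inside the window are counted too."""
    node = re.compile(node_pattern, re.IGNORECASE)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if node.fullmatch(tok):
            lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j].lower()] += 1
    return counts

# Invented toy data.
tokens = "oh dear oh well it is all right oh yes".split()
print(collocates(tokens, r"oh", span=2).most_common(2))   # [('dear', 2), ('oh', 2)]

# Read as a regex, f.*k.* also matches words like "folk", so hits need checking.
print(collocates("the folk song".split(), r"f.*k.*", span=1))
```

Collocation tools usually rank candidates by an association measure such as mutual information or t-score rather than by raw co-occurrence counts; the counts above are the input such measures start from.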

  34. Making subcorpora using encoded categories • ‘context-governed’ spoken texts: monologue (17 texts), dialogue (29 texts)

  35. Monologue vs Dialogue • More frequent in monologue* • could • had • he • know • their • were • when • who • your • More frequent in dialogue* • 'll • 'm • any • no • pounds • right • yeah • yes (*ranked 20+ positions higher among the first 100 words of the frequency list)
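Slide 35's criterion (a word ranked 20 or more positions higher in one subcorpus's top 100 than in the other's) can be sketched as follows. Two details are assumptions: a word outside the other list's top 100 is given rank 101, and tied words are ranked in order of first occurrence, which is how `Counter.most_common` orders them. The toy token lists are invented, so the thresholds are scaled down.

```python
from collections import Counter

def rank_table(tokens, top=100):
    """Map each of the `top` most frequent words to its rank (1 = most frequent).
    Tied words are ranked in order of first occurrence (Counter.most_common)."""
    return {w: r for r, (w, _) in enumerate(Counter(tokens).most_common(top), start=1)}

def ranked_higher(tokens_a, tokens_b, top=100, min_gap=20):
    """Words in A's top list ranked at least `min_gap` places higher than in B.
    A word missing from B's top list is assumed to have rank top + 1."""
    ra, rb = rank_table(tokens_a, top), rank_table(tokens_b, top)
    return sorted(w for w, r in ra.items() if rb.get(w, top + 1) - r >= min_gap)

# Invented toy subcorpora, so `top` and `min_gap` are scaled down.
mono = ["know"] * 5 + ["when"] * 4 + ["the"] * 3 + ["right"] * 2 + ["yeah"]
dia = ["yeah"] * 5 + ["right"] * 4 + ["the"] * 3 + ["when"] * 2 + ["know"]
print(ranked_higher(mono, dia, top=5, min_gap=2))   # ['know', 'when']
print(ranked_higher(dia, mono, top=5, min_gap=2))   # ['right', 'yeah']
```

Comparing ranks rather than raw counts sidesteps the different sizes of the two subcorpora (17 vs 29 texts).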

  36. Investigating the differences • no occurrences of all right in monologue • when you’re / you’ll / you’d / you’ve are more common in monologue than when we’re / we’ll / we’d / we’ve; vice versa in dialogue
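A minimal sketch for counting the slide 36 patterns, assuming untokenised plain text in which contractions are written as single words. The BNC itself treats forms such as 'll and 'm as separate words (they appear as such on slide 35), so a query on the corpus would be written differently; the regex here is only an illustration, not the query used in the slides.

```python
import re
from collections import Counter

# "when" + you/we + a contracted 're, 'll, 'd or 've (straight or curly apostrophe).
WHEN_PRONOUN = re.compile(r"\bwhen (you|we)['’](?:re|ll|d|ve)\b", re.IGNORECASE)

def when_pronoun_counts(text):
    """Count the you/we variants of the pattern in untokenised text."""
    return Counter(m.group(1).lower() for m in WHEN_PRONOUN.finditer(text))

sample = "When you're ready, and when we've finished, when you’ll see."
counts = when_pronoun_counts(sample)
print(counts["you"], counts["we"])   # 2 1
```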

  37. you and we • Monologue: you 4253, we 2014 • Dialogue: you 6635, we 4949
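The slide gives raw counts only, and the sizes of the two subcorpora in words are not stated, so raw counts of you cannot be compared across modes. Within each mode, however, the you/we ratio follows directly from the table. A 2x2 chi-square test, added here purely for illustration (the slides do not use it), asks whether the you/we split differs between monologue and dialogue; it treats every occurrence as independent, which corpus data only approximately satisfy.

```python
# Raw counts from slide 37: rows are modes, columns are "you" and "we".
table = {"Monologue": {"you": 4253, "we": 2014},
         "Dialogue": {"you": 6635, "we": 4949}}

for mode, row in table.items():
    print(f"{mode}: you/we = {row['you'] / row['we']:.2f}")
# Monologue: you/we = 2.11
# Dialogue: you/we = 1.34

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

mono, dia = table["Monologue"], table["Dialogue"]
chi2 = chi_square_2x2(mono["you"], mono["we"], dia["you"], dia["we"])
print(f"chi-square = {chi2:.1f}")
```

With one degree of freedom, a value above 10.83 corresponds to p < 0.001; the statistic for these counts is far above that threshold, consistent with slide 36's observation that you dominates more strongly in monologue.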

  38. Subcorpora using your own categories: David Lee’s book genres • academic non-fiction (13 texts) • non-academic non-fiction (15 texts) • prose fiction (13 texts)

  39. Distinctive -ly adverbs of: • academic non-fiction • accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly • non-academic non-fiction • effectively, merely, normally, obviously, possibly, specially • prose fiction • carefully, quietly, slightly, slowly, softly, surely, truly
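The slides do not say how the -ly adverbs were judged distinctive. One simple, hypothetical approach compares per-million frequencies across the genre subcorpora and keeps words at least `ratio` times more frequent in one genre than in any other; `ratio` and the toy texts are assumptions. The -ly filter is purely orthographic, so a real study would rely on the corpus's POS annotation (adverb tags) instead.

```python
import re
from collections import Counter

LY = re.compile(r"[a-z]+ly")

def ly_per_million(tokens):
    """Per-million frequencies of lower-cased tokens ending in -ly.
    Purely orthographic: it also catches non-adverbs such as 'family' or 'ugly'."""
    counts = Counter(t.lower() for t in tokens if LY.fullmatch(t.lower()))
    return {w: c * 1_000_000 / len(tokens) for w, c in counts.items()}

def distinctive(genres, ratio=2.0):
    """For each genre, -ly words at least `ratio` times more frequent (per million)
    than in any other genre; words absent elsewhere always qualify."""
    freqs = {g: ly_per_million(toks) for g, toks in genres.items()}
    result = {}
    for g, fg in freqs.items():
        others = [f for h, f in freqs.items() if h != g]
        result[g] = sorted(
            w for w, v in fg.items()
            if v >= ratio * max((o.get(w, 0.0) for o in others), default=0.0)
        )
    return result

# Invented toy "genres"; real subcorpora would be whole books.
genres = {
    "academic": "this is largely and essentially true namely it is largely really so".split(),
    "fiction": "she walked slowly and quietly and slowly away really".split(),
}
print(distinctive(genres))
# {'academic': ['essentially', 'largely', 'namely'], 'fiction': ['quietly', 'slowly']}
```

Here really occurs in both toy genres at similar rates, so it is filtered out of both lists.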

  40. largely (academic non-fiction)

  41. it (academic non-fiction)

  42. To conclude …

  43. Working with subcorpora can allow • study/comparison of forms/meanings in particular texts/text-types • better-focussed reading practice • more appropriate reference tools for particular tasks • more focussed browsing

  44. Subcorpora • may not be representative (but nor is most language learning data) • are good for forming hypotheses to be tested more widely • will allow more interesting uses when extracted from a larger corpus

  45. Making your own provides • better preparation and motivation for corpus use • more critical awareness • lots to talk about

  46. Enjoy!
