
GDEX: Automatically finding good dictionary examples in a corpus


Presentation Transcript


  1. Kilgarriff: GDEX. GDEX: Automatically finding good dictionary examples in a corpus

  2. Users appreciate examples. Paper: space constraints. Electronic: no space constraints, so give lots of examples. Constraint: cost of selection and editing.

  3. Project: Macmillan English Dictionary. It already had 1000 collocation boxes, averaging 8 collocations per box. For the new electronic version, all 8000 collocations need examples: authentic, from a corpus.

  4. Old method: the lexicographer gets a concordance for the collocation, reads through until they find a good example, then cuts, pastes and edits.

  5. New method: the lexicographer gets a sorted concordance, with the 20 best examples in a spreadsheet. Less reading through: tick the first good one, then edit.

  6. What makes a good example? Readable: for EFL users. Informative: typical for the collocation; gives context which helps the user understand the target word/phrase.

  7. Readability: 70 years of research. Not just (or mainly) EFL: educational theory, teaching children to read, instruction manuals. Early work: US military. Publishing: people like newspapers and magazines that they find easy to read.

  8. Readability tests: Flesch Reading Ease test (1948), based on average sentence length and average word length; available in some word processing software. Many similar measures. Recent work: training data for different reading levels; language modelling. Target levels: US grades; now, increasingly, the Common European Framework.
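As a rough illustration of the kind of readability test named on slide 8, the sketch below computes the Flesch Reading Ease score, 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word). The standard formula measures word length in syllables; the syllable counter here is a crude vowel-group heuristic and is an assumption of this sketch, not part of the test or of the presentation.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (crude heuristic, an assumption)."""
    lower = word.lower()
    count = len(re.findall(r"[aeiouy]+", lower))
    # Treat a trailing silent 'e' as non-syllabic ("make" -> 1), but keep "-le" ("table" -> 2).
    if lower.endswith("e") and not lower.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Short sentences built from short words score high; long sentences with polysyllabic words score low, which is the behaviour a readability filter for learner examples would rely on.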

  9. GDEX: get the concordance for the collocation; score each sentence; sort; show the best ones to the lexicographer.
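The score-sort-show loop on slide 9 can be sketched in a few lines. This is a minimal outline, not the actual GDEX code: `best_examples` and its `score` argument are hypothetical names, and the default of 20 simply mirrors the "20 best examples" mentioned elsewhere in the talk.

```python
from typing import Callable, List, Tuple


def best_examples(
    concordance: List[str],
    score: Callable[[str], float],
    n: int = 20,
) -> List[Tuple[float, str]]:
    """Score every concordance line, sort best first, and keep the top n."""
    scored = [(score(sentence), sentence) for sentence in concordance]
    # sort() is stable, so lines with equal scores keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Any scoring function can be plugged in, so the ranking step stays independent of how the individual heuristics are defined or weighted.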

  10. GDEX heuristics: sentence length (10-26 words); mostly common words is good, rare words are bad; sentences start with a capital and end with one of .!?; no [, ], <, >, http, \; not much other punctuation or numbers; not too many capitals; typicality: a third collocate is a plus.
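The heuristics on slide 10 could be turned into per-sentence features along the following lines. This is a sketch under stated assumptions: the feature names, the `common_words` set (standing in for a real frequency list), and the way punctuation and capitals are normalised by word count are all illustrative choices, not details given in the presentation. The "third collocate" heuristic is left out because it needs collocation data.

```python
import re
from typing import Dict, Set

# Substrings the slide says a good example should not contain.
FORBIDDEN = ("[", "]", "<", ">", "http", "\\")


def heuristic_scores(sentence: str, common_words: Set[str]) -> Dict[str, float]:
    """Return one score in [0, 1] per heuristic; higher is better."""
    tokens = sentence.split()
    words = [re.sub(r"\W", "", t).lower() for t in tokens]
    words = [w for w in words if w]
    n = max(len(words), 1)
    # Punctuation other than . ! ? plus digits, counted per character.
    noise = len(re.findall(r"[^\w\s.!?]|\d", sentence))
    # Capitalised tokens after the first one (hypothetical proxy for "too many capitals").
    capitals = sum(t[:1].isupper() for t in tokens[1:])
    ends_ok = sentence.rstrip().endswith((".", "!", "?"))
    return {
        "length_ok": 1.0 if 10 <= len(tokens) <= 26 else 0.0,
        "common_word_ratio": sum(w in common_words for w in words) / n,
        "well_formed": 1.0 if sentence[:1].isupper() and ends_ok else 0.0,
        "no_forbidden": 0.0 if any(f in sentence for f in FORBIDDEN) else 1.0,
        "low_noise": 1.0 - min(1.0, noise / n),
        "few_capitals": 1.0 - min(1.0, capitals / n),
    }
```

Keeping each heuristic as a separate named score makes it straightforward to weight them differently later, rather than hard-coding a single pass/fail rule.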

  11. Weighting: for each sentence, score on each heuristic, weight the scores, and add them together to give a weighted score. How to set the weights? Two students manually judged 1000 “good examples”; the weights were set so that the system makes the same choices as the students.
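The weighted sum on slide 11 is simple to write down. The slides do not say how the weights were fitted to the students' judgements, so the selection step below (pick, from a list of candidate weightings, the one that agrees most often with the human labels) is purely an assumption for illustration; `agreement`, `pick_weights` and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]
Weights = Dict[str, float]


def weighted_score(features: Features, weights: Weights) -> float:
    """Add together each heuristic score multiplied by its weight."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def agreement(
    judged: Sequence[Tuple[Features, bool]], weights: Weights, threshold: float
) -> float:
    """Fraction of sentences where 'score >= threshold' matches the human label."""
    hits = sum(
        (weighted_score(features, weights) >= threshold) == is_good
        for features, is_good in judged
    )
    return hits / len(judged)


def pick_weights(
    judged: Sequence[Tuple[Features, bool]],
    candidates: List[Weights],
    threshold: float = 0.5,
) -> Weights:
    """Choose the candidate weighting that best reproduces the human choices."""
    return max(candidates, key=lambda w: agreement(judged, w, threshold))
```

With real data, the candidate list would typically be replaced by a proper fitting procedure; the point here is only that the weights are judged by how well the system's choices match the annotators'.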

  12. Was it successful? Did it save lexicographer time? Definitely (says the project manager). Rough guess at the average number of corpus lines to read until you find a good one: unsorted, 20; sorted, 5.

  13. Corpus choice: started with the BNC, but it was too old and did not have enough examples. If there are no good examples in the corpus, GDEX can’t help. Changed to UKWaC: 20 times bigger; from the web; contemporary. Better: most web junk filtered out; usually a good example in the top twenty.

  14. GDEX and TALC: TALC (Teaching and Language Corpora). Goal: bring corpora into language teaching. Usual problem: concordances are tough for learners to read. Way forward: GDEX examples, half way between dictionary and corpus.

  15. GDEX: models for use. More examples for dictionaries: speed up, as with MED, or fully automatic “more examples”. Corpus query tool: sort concordances, best first; now an option in the Sketch Engine. Automatic collocations dictionary: http://forbetterenglish.com

  16. Recent developments: configurable GDEX; for other languages; interface to help set up; commonest string; between ‘bare collocate’ and example.

