
SEMANTIC FREQUENCY:


johana

Presentation Transcript


  1. SEMANTIC FREQUENCY: A new look at word frequency counts

  2. What are the current problems with word frequency counts (particularly in corpus linguistics)?
     • Definition or construct of “word” (Gardner 2007)
     • Form vs. meaning (Read 2000)
     • Polysemy and homonymy: high-frequency words are the most polysemous (Ravin and Leacock 2000)
     • Idiosyncratic nature of word forms and meanings
     • Computer-generated lists often strip words of their meanings and count only forms

  3. Implications, specifically for English as a Second Language (ESL)
     • Word lists: which English are they representing?
     • Coverage: what percentage of the language does each word really represent?
     • Teachability and learnability: what are the psychological realities of teaching and learning vocabulary and meaning?

  4. RESEARCH QUESTIONS
     What difference do form-based vs. semantically based frequency counts make to:
     1. computer-generated word lists used for pedagogical purposes?
     2. estimates of word coverage of representative English texts, both written and spoken?
     3. estimates of lexical demands on language learners (the learning burden of a word)?

  5. Methodology of the Current Study
     Purpose: perform a semantic frequency count on a representative sample of lemmas to answer the research questions

  6. Methods
     1. Randomly selected 46 lemmas from the British National Corpus (BNC) with 1,500 or more total occurrences (rationale for using lemmas: see Lemmas vs. Word Families)
     2. Compiled an extensive (though not exhaustive) list of senses for each lemma from WordNet
        • Modified each sense list by conflating some polysemous senses and adding senses encountered while rating contexts from the BNC (the goal was to eliminate polysemous distinctions while preserving homonymous ones)
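
The sampling in step 1 can be sketched in Python. This is a minimal illustration, not the study's procedure: the frequency table below is invented (real counts would come from a BNC lemma list), and `sample_lemmas` is a hypothetical helper.

```python
import random

# Invented lemma counts for illustration only; these are NOT BNC figures.
# Keys are (lemma, part of speech) pairs, values are total occurrences.
LEMMA_FREQ = {
    ("fair", "adj"): 5000,
    ("work", "verb"): 20000,
    ("crane", "noun"): 900,
    ("range", "noun"): 8000,
    ("lion", "noun"): 1400,
    ("active", "adj"): 6000,
}

def sample_lemmas(freq, k, min_occurrences=1500, seed=None):
    """Randomly choose k lemmas whose total frequency meets the cutoff."""
    # Sort the eligible lemmas so a given seed always yields the same sample.
    eligible = sorted(lemma for lemma, n in freq.items() if n >= min_occurrences)
    rng = random.Random(seed)
    return rng.sample(eligible, k)

print(sample_lemmas(LEMMA_FREQ, k=3, seed=1))
```

With real data, `k=46` and the default 1,500-occurrence cutoff would mirror the selection criteria above; `random.sample` draws without replacement, so no lemma is picked twice.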

  7. Lemmas vs. Word Families
     • Lemmas include only a word and its inflections (usually within one part of speech)
       Ex. work / works / working / worked
     • Word families include a word, its inflections, and its transparent (closely related) derivations
       Ex. from the RANGE program (Nation): active / actively / activities / activity / inactive / inactivity / activist / activists / activism
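
To make the difference concrete, the Python sketch below counts the same tokens under a lemma grouping and under a word-family grouping. The two dictionaries are hand-built toys based on the examples above; a real count would use a lemmatizer or the RANGE family lists, and `count_units` is a hypothetical helper.

```python
from collections import Counter

# Toy lemma grouping: each headword plus its inflections only.
LEMMAS = {
    "work": "work", "works": "work", "working": "work", "worked": "work",
    "activity": "activity", "activities": "activity",
    "activist": "activist", "activists": "activist",
}

# Toy word-family grouping: inflections plus transparent derivations,
# using the RANGE-style "active" family from the slide.
FAMILIES = {form: "active" for form in [
    "active", "actively", "activities", "activity", "inactive",
    "inactivity", "activist", "activists", "activism",
]}
FAMILIES.update({form: "work" for form in ["work", "works", "working", "worked"]})

def count_units(tokens, grouping):
    """Tally tokens under the unit they map to; unmapped forms count as themselves."""
    return Counter(grouping.get(t.lower(), t.lower()) for t in tokens)

text = "Activists worked on activities that made inactive members active".split()
by_lemma = count_units(text, LEMMAS)
by_family = count_units(text, FAMILIES)
print(len(by_lemma), len(by_family))  # distinct units under each definition
```

The same nine tokens yield nine distinct lemmas but only six families, which is why the choice of unit changes both the frequency counts and the size of a list.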

  8. 46 Randomly Selected Lemmas

  9. Example senses from WordNet: FAIR (adjective)
     • 1. S: (adj) fair, just (free from favoritism or self-interest or bias or deception; conforming with established standards or rules) "a fair referee"; "fair deal"; "on a fair footing"; "a fair fight"; "by fair means or foul"
     • 2. S: (adj) fair, fairish, reasonable (not excessive or extreme) "a fairish income"; "reasonable prices"
     • 3. S: (adj) bonny, bonnie, comely, fair, sightly (very pleasing to the eye) "my bonny lass"; "there's a bonny bay beyond"; "a comely face"; "young fair maidens"
     • 4. S: (adj) fair ((of a baseball) hit between the foul lines) "he hit a fair ball over the third base bag"
     • 5. S: (adj) average, fair, mediocre, middling (lacking exceptional quality or ability) "a novel of average merit"; "only a fair performance of the sonata"; "in fair health"; "the caliber of the students has gone from mediocre to above average"; "the performance was middling at best"
     • 6. S: (adj) fair (attractively feminine) "the fair sex"
     • 7. S: (adj) clean, fair ((of a manuscript) having few alterations or corrections) "fair copy"; "a clean manuscript"
     • 8. S: (adj) honest, fair (gained or earned without cheating or stealing) "an honest wage"; "a fair penny"
     • 9. S: (adj) fair (free of clouds or rain) "today will be fair and warm"
     • 10. S: (adj) fair, fairish ((used of hair or skin) pale or light-colored) "a fair complexion"

  10. Semantic Relatedness Scale (Nagy and Anderson 1984), used for the conflation process

  11. Example of conflated senses: FAIR (adjective)
     • 1. S: (adj) fair, just (free from favoritism or self-interest or bias or deception; conforming with established standards or rules) "a fair referee"; "fair deal"; "on a fair footing"; "a fair fight"; "by fair means or foul"
          + S: (adj) honest, fair (gained or earned without cheating or stealing) "an honest wage"; "a fair penny"
          Note: related to definition #4 in the sense of conformity to rules; "fair enough" = just, all right, acceptable, good, fine; fair play (within the rules)
     • 2. S: (adj) fair, fairish, reasonable (not excessive or extreme) "a fairish income"; "reasonable prices"
          Note: can lean toward a large amount without reaching the extreme or becoming excessive
     • 3. S: (adj) bonny, bonnie, comely, fair, sightly (very pleasing to the eye) "my bonny lass"; "there's a bonny bay beyond"; "a comely face"; "young fair maidens"
          + S: (adj) fair (attractively feminine) "the fair sex"
          Note: related to definition 10 (CS8), depending on cultural norms
     • 4. S: (adj) fair ((of a baseball) hit between the foul lines) "he hit a fair ball over the third base bag"
          Note: somewhat related to sense 1 through conformity to rules
     • 5. S: (adj) average, fair, mediocre, middling (lacking exceptional quality or ability) "a novel of average merit"; "only a fair performance of the sonata"; "in fair health"; "the caliber of the students has gone from mediocre to above average"; "the performance was middling at best"
          Note: somewhat related to definition 2 in being in the middle rather than at the extremes
     • 6. S: (adj) clean, fair ((of a manuscript) having few alterations or corrections) "fair copy"; "a clean manuscript"
     • 7. S: (adj) fair (free of clouds or rain) "today will be fair and warm"
     • 8. S: (adj) fair, fairish ((used of hair or skin) pale or light-colored) "a fair complexion"
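
The conflation above amounts to a many-to-one mapping from WordNet sense numbers to conflated senses (CS). The Python sketch below encodes that mapping for FAIR (adj) as read from this slide (WordNet 1+8 and 3+6 merged, the rest kept separate) and re-tallies a set of sense tags; the tags are invented, and `conflate` is a hypothetical helper.

```python
from collections import Counter

# WordNet sense number -> conflated sense number for FAIR (adj), read off the
# slide: WordNet 1 and 8 merge into CS1, 3 and 6 into CS3; 10 becomes CS8.
FAIR_ADJ_CONFLATION = {1: 1, 8: 1, 2: 2, 3: 3, 6: 3, 4: 4, 5: 5, 7: 6, 9: 7, 10: 8}

def conflate(wordnet_tags, mapping):
    """Count contexts per conflated sense, given their WordNet sense tags."""
    return Counter(mapping[tag] for tag in wordnet_tags)

# Invented WordNet tags for ten contexts (not ratings from the study).
tags = [1, 8, 1, 2, 5, 3, 6, 9, 1, 10]
print(sorted(conflate(tags, FAIR_ADJ_CONFLATION).items()))
```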

  12. Literal vs. Figurative “the metaphorical use of lion (e.g., John is a lion) is likely to be treated as ‘the same word,’ while the concrete and metaphorical uses of crane (‘kind of bird’ and ‘machine for lifting heavy objects’) are more likely to be treated as independent words and therefore members of different lemmas. If it is difficult to group words meanings under headwords at the abstract level of the dictionary, it is much more difficult to assign words in texts unambiguously to their lemmas.” (Knowles and Mohd Don 2004: 70)

  13. Methods (cont’d)
     4. Rated (i.e., assigned a sense to) 100 spoken and 100 written contexts (200 in total) for each lemma
        • Each lemma was double-rated
        • A third rating was done whenever the two ratings disagreed
        • Sense counts for each lemma were tallied and converted to percentages in Excel
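
The rating workflow can be sketched in Python as a stand-in for the Excel tally. One assumption is labeled here and in the code: the slide does not say how a third rating settles a disagreement, so this sketch accepts it only when it matches one of the first two ratings (a two-out-of-three majority) and otherwise leaves the context unresolved. The functions and data are hypothetical.

```python
from collections import Counter

def adjudicate(rater_a, rater_b, rater_c):
    """Combine double ratings, consulting a third rating only where A and B disagree.

    rater_c maps context index -> sense. ASSUMPTION: a third rating settles a
    disagreement only if it matches rater A or B (2-of-3 majority).
    """
    final = []
    for i, (a, b) in enumerate(zip(rater_a, rater_b)):
        if a == b:
            final.append(a)            # the two raters agree
        elif rater_c.get(i) in (a, b):
            final.append(rater_c[i])   # third rating breaks the tie
        else:
            final.append(None)         # still unresolved
    return final

def sense_percentages(senses):
    """Percentage of resolved contexts assigned to each sense."""
    counts = Counter(s for s in senses if s is not None)
    total = sum(counts.values())
    return {sense: round(100 * n / total, 1) for sense, n in sorted(counts.items())}

# Invented ratings for five contexts; rater C only rated contexts 2 and 4.
final = adjudicate([1, 2, 1, 3, 1], [1, 2, 3, 3, 2], {2: 1, 4: 4})
print(final, sense_percentages(final))
```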

  14. Example Context Ratings

  15. RESULTS: Question #1
     What difference do form-based vs. semantically based frequency counts make to:
     1. computer-generated word lists used for pedagogical purposes?
     * How much homonymy exists in the lemmas?

  16. Breakdown of Sense Distributions
     [Table: LEMMA + POS, s1–s16 (senses 1 to 16); values not recoverable]

  17. Sense Distributions (cont’d)
     [Table: LEMMA + POS, s1–s16 (senses 1 to 16); values not recoverable]

  18. Effect on Word Lists?
     [Table: Lemma, Form-based Frequency, Semantically based Frequency; values not recoverable]
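
One way to see how a semantically based count reshapes a list is to split each lemma's form-based frequency by its sense proportions and re-rank the resulting items. The Python sketch below does that under stated assumptions: the frequencies and sense shares are invented (not the study's results), lemmas other than FAIR are hypothetical examples, and `semantic_list` is a hypothetical helper.

```python
# Invented form-based frequencies and sense shares; not results from the study.
FORM_FREQ = {"fair (adj)": 6000, "bank (n)": 9000, "table (n)": 7000}
SENSE_SHARE = {
    "fair (adj)": {"just": 0.5, "reasonable": 0.3, "weather": 0.2},
    "bank (n)": {"financial": 0.9, "river": 0.1},
    "table (n)": {"furniture": 1.0},
}

def semantic_list(form_freq, sense_share, min_share=0.0):
    """Split each lemma's count across its senses and rank the resulting items.

    Senses whose share falls below min_share are dropped, a hypothetical stand-in
    for deciding which senses are significant enough to list.
    """
    items = []
    for lemma, total in form_freq.items():
        for sense, share in sense_share[lemma].items():
            if share >= min_share:
                items.append((f"{lemma}: {sense}", round(total * share)))
    return sorted(items, key=lambda item: item[1], reverse=True)

for name, freq in semantic_list(FORM_FREQ, SENSE_SHARE):
    print(f"{freq:>6}  {name}")
```

In this toy case the three form-based entries become six sense-based entries, and a minor sense such as the river sense of BANK drops to the bottom of the list.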

  19. Conclusions
     • Homonymy in these 23 lemmas yields 58 distinct senses, more than doubling the number of potential high-frequency lemmas
     • High-frequency word lists would change in order, content, and size if every significant sense-based lemma were included
     • Vocabulary appears idiosyncratic: the remaining 23 lemmas showed little or no homonymy, suggesting that some high-frequency lemmas have predictable, stable meanings and are therefore likely to be easier to learn
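
The "more than doubling" in the first point follows directly from the two figures given:

```latex
\frac{58\ \text{senses}}{23\ \text{lemmas}} \approx 2.52\ \text{senses per lemma} > 2
```

so each of these 23 forms stands, on average, for about two and a half potential list entries.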

  20. RESULTS: Question #2
     What difference do form-based vs. semantically based frequency counts make to:
     2. estimates of word coverage of representative English texts, both written and spoken?
     * How much of the language does each lemma ACTUALLY cover?

  21. Word Coverage and Comprehension Thresholds
     • 95% coverage (knowing 95% of the running words) to comprehend a text or discourse
     • 98% coverage for pleasure reading
     (Nation 2006)
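
Coverage is the share of running words (tokens) in a text accounted for by the items a learner knows. The Python sketch below, using invented counts and a hypothetical `items_needed` helper, finds how many of the most frequent items are needed to reach a given threshold, and shows how splitting lemmas into senses raises that number.

```python
def items_needed(frequencies, threshold_pct):
    """Fewest top-frequency items whose tokens reach threshold_pct percent coverage."""
    counts = sorted(frequencies, reverse=True)
    total = sum(counts)
    covered = 0
    for rank, n in enumerate(counts, start=1):
        covered += n
        # Integer comparison avoids floating-point error at exact thresholds.
        if covered * 100 >= threshold_pct * total:
            return rank
    return len(counts)

# Invented token counts (100 tokens in total) for six form-based lemmas ...
by_form = [50, 30, 10, 5, 3, 2]
# ... and the same tokens after the two most frequent lemmas split into two senses each.
by_sense = [30, 20, 15, 15, 10, 5, 3, 2]

for pct in (95, 98):
    print(pct, items_needed(by_form, pct), items_needed(by_sense, pct))
```

Here reaching 95% coverage takes four items when counted by form but six when counted by sense (five versus seven at 98%).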

  22. 5 most homonymous lemmas

  23. Coverage

  24. Extrapolations

  25. Written vs. Spoken

  26. Writing vs. Speaking

  27. Conclusions
     • Homonymy creates more lemmas, which:
       a) decreases the coverage each lemma provides in the language
       b) increases the number of lemmas that must be learned to reach the 95% coverage threshold
     • Differences in lemma frequency between spoken and written language create a need for separate high-frequency word lists for these two major registers
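
The case for separate spoken and written lists can be illustrated by ranking the same items by register and comparing the top of each list. The Python sketch below uses invented per-million frequencies and hypothetical helpers (`top_n`, `register_only`); it is not the study's data.

```python
# Invented per-million frequencies by register; not data from the study.
SPOKEN = {"yeah": 9000, "think": 3500, "fair (adj): reasonable": 400, "fair (adj): just": 300}
WRITTEN = {"yeah": 200, "think": 1500, "fair (adj): reasonable": 250, "fair (adj): just": 600}

def top_n(freq, n):
    """The n most frequent items in one register."""
    ranked = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:n]]

def register_only(spoken, written, n):
    """Items in one register's top-n list but absent from the other's."""
    s, w = set(top_n(spoken, n)), set(top_n(written, n))
    return sorted(s - w), sorted(w - s)

print(register_only(SPOKEN, WRITTEN, n=2))
```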

  28. RESULTS: Question #3
     What difference do form-based vs. semantically based frequency counts make to:
     3. estimates of lexical demands on language learners (the learning burden of a word)?
     * How many words do teachers need to teach, and learners need to learn, in order to comprehend?

  29. Learning and Teaching Burdens
     • More words (or lemmas) must be known in order to communicate and comprehend texts
     • Accounting for the large amount of homonymy in high-frequency words increases the learning burden of some of those words for ESL learners (probable confusion, i.e. misuse or miscomprehension; learners must be able to use context to distinguish senses)
     • Distinguishing spoken from written language and vocabulary in learning and teaching increases both learning and teaching burdens

  30. Implications for ESL Teachers and Learners
     • Pay attention to high-frequency word counts and word lists: how they are created, how they are used, how words are defined, etc.
     • Pay attention to what counts as a “word”
     • Understand that the number of words needed to reach coverage thresholds has been underestimated
     • Be careful when using concordancing and other computer-generated information to teach and learn word meanings

  31. Implications (cont’d)
     • Be aware of the psychological realities of vocabulary items
     • Can learners make connections between words with similar roots, words with similar meanings, and words used metaphorically or symbolically (among others)?
     • Can learners connect context and meaning to recognize and understand sense distinctions?
