PowerPoint Slideshow about 'SEMANTIC FREQUENCY:' - johana

Presentation Transcript
SEMANTIC FREQUENCY:

A new look at word frequency counts

What are the current problems with word frequency counts (particularly in the field of corpus linguistics)?
  • Definition or construct of “word” – Gardner (2007)
  • Form vs. meaning (Read 2000)
  • Polysemy and homonymy: high-frequency words are the most polysemous (Ravin and Leacock 2000)
  • Idiosyncratic nature of word forms and meanings
  • Computer-generated lists often strip words of their meanings and count only forms
Implications: specifically related to the field of English as a Second Language
  • Word lists – what English is it representing?
  • Coverage – what percentage of the language does each word actually represent?
  • Teachability and learnability – what are the psychological realities of teaching and learning vocabulary and meaning?
RESEARCH QUESTIONS

What difference do form-based vs. semantically based frequency counts make to the following:

1. computer-generated word lists used for pedagogical purposes?

2. estimates of word coverage of representative English texts, both written and spoken?

3. estimates of lexical demands on language learners (the learning burden of a word)?

Methodology of the Current Study

Purpose: perform a semantic frequency count on a representative sample of lemmas to answer the research questions above

Methods

1. Randomly selected 46 lemmas from the BNC with 1,500 or more total occurrences (the rationale for using lemmas is outlined briefly below)

2. Found an extensive (though not exhaustive) list of senses for each lemma from WordNet

  • Modified the sense list by conflating some polysemous senses and adding senses encountered while rating contexts from the BNC (the goal was to eliminate polysemy while preserving homonymy)
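The sampling in step 1 can be sketched in Python. This is a minimal sketch, not the study's own procedure: it assumes a lemma frequency table (lemma plus part of speech mapped to total occurrences) has already been extracted from the BNC, and the lemmas and counts in `LEMMA_FREQS` are invented placeholders. The helper name `sample_lemmas` is hypothetical.

```python
import random

# Hypothetical frequency table: (lemma, part of speech) -> total occurrences.
# The counts are invented placeholders, NOT real BNC figures.
LEMMA_FREQS = {
    ("fair", "adj"): 9000,
    ("work", "verb"): 40000,
    ("bank", "noun"): 20000,
    ("run", "verb"): 35000,
    ("crane", "noun"): 900,
    ("lion", "noun"): 1400,
}

MIN_OCCURRENCES = 1500  # cut-off stated in step 1 of the Methods


def sample_lemmas(freqs, k, min_occurrences=MIN_OCCURRENCES, seed=None):
    """Randomly choose k distinct lemmas whose total frequency meets the cut-off."""
    # Sort the eligible lemmas so that a given seed always yields the same sample.
    eligible = sorted(lemma for lemma, n in freqs.items() if n >= min_occurrences)
    if k > len(eligible):
        raise ValueError(f"only {len(eligible)} lemmas meet the cut-off")
    rng = random.Random(seed)
    return rng.sample(eligible, k)


# The study sampled 46 lemmas; this toy table has only 4 eligible ones.
print(sample_lemmas(LEMMA_FREQS, 3, seed=1))
```

Seeding a private `random.Random` instance keeps the sample reproducible without touching the global random state.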
Lemmas vs. Word Families
  • Lemmas – include only a word and its inflections (usually within a single part of speech)
  • Ex. work / works / working / worked
  • Word families – include a word, its inflections, and its transparent (or closely related) derivations

Ex. from the RANGE program (Nation)

Active / Actively / Activities / Activity / Inactive / Inactivity / Activist / Activists / Activism
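To make the contrast concrete, the sketch below counts the same tokens under both units of counting. The lookup tables `LEMMA_OF` and `FAMILY_OF` and the helper `count_by` are hypothetical and hand-built from the examples above; a real study would derive them from a lemmatizer or from word-family lists such as those used by RANGE.

```python
from collections import Counter

# Hypothetical lookup tables built from the examples on this slide.
# Lemma: inflected forms map to their base form; other forms count as themselves.
LEMMA_OF = {
    "works": "work", "working": "work", "worked": "work",
    "activities": "activity", "activists": "activist",
}
# Word family: every form listed for ACTIVE in the RANGE example maps to one headword.
FAMILY_OF = {
    form: "active"
    for form in ["active", "actively", "activities", "activity", "inactive",
                 "inactivity", "activist", "activists", "activism"]
}


def count_by(tokens, headword_of):
    """Tally tokens under their headword; forms not in the table count as their own headword."""
    return Counter(headword_of.get(token, token) for token in tokens)


tokens = ["active", "activity", "activities", "inactive", "actively"]
print(count_by(tokens, LEMMA_OF))   # four separate lemmas
print(count_by(tokens, FAMILY_OF))  # one word family
```

The same five tokens yield four headwords when counted by lemma but a single headword when counted by word family, which is why the choice of unit affects both list size and coverage figures.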

Example senses from WordNet

FAIR – adjective

  • 1. S: (adj) fair, just (free from favoritism or self-interest or bias or deception; conforming with established standards or rules) "a fair referee"; "fair deal"; "on a fair footing"; "a fair fight"; "by fair means or foul"
  • 2. S: (adj) fair, fairish, reasonable (not excessive or extreme) "a fairish income"; "reasonable prices"
  • 3. S: (adj) bonny, bonnie, comely, fair, sightly (very pleasing to the eye) "my bonny lass"; "there's a bonny bay beyond"; "a comely face"; "young fair maidens"
  • 4. S: (adj) fair ((of a baseball) hit between the foul lines) "he hit a fair ball over the third base bag"
  • 5. S: (adj) average, fair, mediocre, middling (lacking exceptional quality or ability) "a novel of average merit"; "only a fair performance of the sonata"; "in fair health"; "the caliber of the students has gone from mediocre to above average"; "the performance was middling at best"
  • 6. S: (adj) fair (attractively feminine) "the fair sex"
  • 7. S: (adj) clean, fair ((of a manuscript) having few alterations or corrections) "fair copy"; "a clean manuscript"
  • 8. S: (adj) honest, fair (gained or earned without cheating or stealing) "an honest wage"; "a fair penny"
  • 9. S: (adj) fair (free of clouds or rain) "today will be fair and warm"
  • 10. S: (adj) fair, fairish ((used of hair or skin) pale or light-colored) "a fair complexion";
Example of conflated senses

FAIR – adjective

  • S: (adj) fair, just (free from favoritism or self-interest or bias or deception; conforming with established standards or rules) "a fair referee"; "fair deal"; "on a fair footing"; "a fair fight"; "by fair means or foul" + S: (adj) honest, fair (gained or earned without cheating or stealing) "an honest wage"; "a fair penny" – related to definition #4 in the sense of conformity to a rule; ‘fair enough’ = just, all right, acceptable, good, fine; fair play (within the rules)
  • S: (adj) fair, fairish, reasonable (not excessive or extreme) "a fairish income"; "reasonable prices" – can lean toward a large amount without reaching an extreme or excess
  • S: (adj) bonny, bonnie, comely, fair, sightly (very pleasing to the eye) "my bonny lass"; "there's a bonny bay beyond"; "a comely face"; "young fair maidens" + S: (adj) fair (attractively feminine) "the fair sex" – related to definition 10 (CS8), depending on cultural norms
  • S: (adj) fair ((of a baseball) hit between the foul lines) "he hit a fair ball over the third base bag" – somewhat related to 1 in its conformity to a rule
  • S: (adj) average, fair, mediocre, middling (lacking exceptional quality or ability) "a novel of average merit"; "only a fair performance of the sonata"; "in fair health"; "the caliber of the students has gone from mediocre to above average"; "the performance was middling at best" – somewhat related to definition 2 in being in the middle rather than at the extremes
  • S: (adj) clean, fair ((of a manuscript) having few alterations or corrections) "fair copy"; "a clean manuscript"
  • S: (adj) fair (free of clouds or rain) "today will be fair and warm"

  • S: (adj) fair, fairish ((used of hair or skin) pale or light-colored) "a fair complexion"
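The conflation above amounts to a many-to-one mapping from WordNet sense numbers to conflated senses, so ratings made against the original WordNet list can be re-tallied automatically. In this sketch the labels `CS1` to `CS8` are hypothetical names for the eight conflated senses, numbered in the order listed (the slide's own note uses "CS8" for the hair/skin sense); `conflate` is likewise an illustrative helper.

```python
from collections import Counter

# WordNet sense number for FAIR (adj) -> conflated sense, read off the two lists above.
FAIR_ADJ_CONFLATION = {
    1: "CS1", 8: "CS1",  # just / honest: conforming to rules
    2: "CS2",            # reasonable, not excessive
    3: "CS3", 6: "CS3",  # pleasing to the eye / attractively feminine
    4: "CS4",            # (baseball) hit between the foul lines
    5: "CS5",            # average, mediocre
    7: "CS6",            # (manuscript) few alterations
    9: "CS7",            # free of clouds or rain
    10: "CS8",           # (hair or skin) pale or light-colored
}


def conflate(wordnet_senses, mapping):
    """Re-tally a list of WordNet sense numbers under their conflated senses."""
    return Counter(mapping[sense] for sense in wordnet_senses)


print(conflate([1, 8, 3, 6, 10], FAIR_ADJ_CONFLATION))
```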

Literal vs. Figurative

“the metaphorical use of lion (e.g., John is a lion) is likely to be treated as ‘the same word,’ while the concrete and metaphorical uses of crane (‘kind of bird’ and ‘machine for lifting heavy objects’) are more likely to be treated as independent words and therefore members of different lemmas. If it is difficult to group words meanings under headwords at the abstract level of the dictionary, it is much more difficult to assign words in texts unambiguously to their lemmas.” (Knowles and Mohd Don 2004: 70)

Methods (cont’d)

4. Analyzed the contexts and rated (i.e., assigned a sense to) 100 spoken and 100 written contexts (200 total) for each lemma. Each lemma was double-rated.

  • A third rating was done when the first two ratings disagreed.
  • Senses for each lemma were tallied, and percentages were calculated in Excel.
RESULTS

QUESTION #1

What difference do form-based vs. semantically based frequency counts make to the following:

1. computer-generated word lists used for pedagogical purposes?

*How much homonymy exists in the lemmas?

Breakdown of Sense Distributions

[Table: distribution of senses s1 to s16 for each LEMMA + POS]

Sense Distributions (cont’d)

[Table: distribution of senses s1 to s16 for each LEMMA + POS]

Effect on Word Lists?

[Table: form-based vs. semantically based frequency for each lemma]

Conclusions
  • Homonymy in these 23 words yields 58 distinct senses, more than doubling the number of potential high-frequency lemmas
  • High-frequency word lists would change in order, content, and size (they would grow) in order to include all significant lemmas
  • Vocabulary appears idiosyncratic: 23 of the words had little or no homonymy, indicating that some high-frequency lemmas have meanings that are very predictable and constant, and are therefore likely to be learned more easily
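One way to see how a form-based list changes shape is to split each lemma's count across its rated senses and re-rank the result. This sketch is illustrative only: the figures are invented, the 5% significance cut-off (`min_share`) is a hypothetical parameter because the slides do not define "significant", and scaling a lemma's total by its sense share assumes the 200 rated contexts are representative of the corpus.

```python
def semantic_entries(lemma_freqs, sense_shares, min_share=0.05):
    """Split form-based lemma counts into sense-based entries and re-rank them.

    lemma_freqs:  lemma -> total occurrences (form-based count)
    sense_shares: lemma -> {sense: proportion of rated contexts}; lemmas with
                  no entry are treated as having a single sense ("all")
    min_share:    hypothetical cut-off for a "significant" sense
    Each sense's frequency is estimated as total occurrences x its share,
    which assumes the rated sample is representative of the whole corpus.
    """
    entries = []
    for lemma, freq in lemma_freqs.items():
        for sense, share in sense_shares.get(lemma, {"all": 1.0}).items():
            if share >= min_share:
                entries.append((f"{lemma}:{sense}", round(freq * share)))
    # Highest estimated frequency first; ties broken alphabetically.
    return sorted(entries, key=lambda entry: (-entry[1], entry[0]))


# Invented toy figures, not results from the study.
lemma_freqs = {"fair": 10000, "work": 8000}
sense_shares = {"fair": {"CS1": 0.5, "CS3": 0.3, "CS7": 0.18, "CS4": 0.02}}
for entry in semantic_entries(lemma_freqs, sense_shares):
    print(entry)
```

With these toy figures the two-item form-based list becomes a four-item sense-based list, and "fair", which outranks "work" by form, drops below it once its senses are counted separately.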
RESULTS

Question #2

What difference do form-based vs. semantically based frequency counts make to the following:

2. estimates of word coverage of representative English texts, both written and spoken?

*How much of the language does each lemma ACTUALLY cover?

Word Coverage and Comprehension Thresholds

95% coverage (known words in a text) needed to comprehend a text or discourse

98% coverage needed for pleasure reading

(Nation 2006)
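Coverage here is the proportion of running words (tokens) in a text that the reader knows. A worked version of the two thresholds, using a hypothetical 400-word text for the arithmetic:

```latex
\begin{align*}
\text{coverage} &= \frac{\text{known tokens in the text}}{\text{total tokens in the text}} \times 100\% \\
\text{coverage} \ge 95\% &\;\Rightarrow\; \text{unknown tokens} \le 0.05 \times 400 = 20 \\
\text{coverage} \ge 98\% &\;\Rightarrow\; \text{unknown tokens} \le 0.02 \times 400 = 8
\end{align*}
```

Put another way, 95% coverage allows about one unknown word in every 20 running words, and 98% about one in every 50.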

Conclusions
  • Homonymy creates more lemmas, which:

a) decreases the coverage that each lemma represents in the language

b) increases the number of lemmas that must be learned in order to reach the 95% comprehension threshold

  • Differences between the frequencies of lemmas in spoken and written language create a need for separate high-frequency word lists for these two major registers
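The effect described in (a) and (b) can be illustrated by counting how many items, learned in descending frequency order, are needed to reach a coverage threshold before and after lemmas are split into senses. This is a minimal sketch with invented counts and a hypothetical helper `items_to_threshold`; it assumes that knowing an item covers all of its own tokens and none of the tokens of its other senses. Running it separately on spoken and written counts would give register-specific figures.

```python
def items_to_threshold(counts, threshold=0.95):
    """Number of items, learned in descending frequency order, needed to reach the threshold.

    counts: item -> token count. Assumes knowing an item covers exactly its own tokens.
    """
    total = sum(counts.values())
    covered = 0
    for n_items, count in enumerate(sorted(counts.values(), reverse=True), start=1):
        covered += count
        if covered / total >= threshold:
            return n_items
    return len(counts)  # threshold not reachable (e.g. threshold > 1)


# Invented toy counts: the same 100 tokens, counted by form and by sense.
form_counts = {"a": 50, "b": 30, "c": 20}
sense_counts = {"a#1": 40, "a#2": 10, "b#1": 15, "b#2": 15, "c": 20}
print(items_to_threshold(form_counts))   # 3
print(items_to_threshold(sense_counts))  # 5
```

Splitting "a" and "b" into senses lowers the coverage each item provides (the top item drops from 50% to 40%), so more items must be learned to reach the same threshold.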
RESULTS

Question #3

What difference do form-based vs. semantically based frequency counts make to the following:

3. estimates of lexical demands on language learners (the learning burden of a word)?

*How many words do teachers need to teach and learners need to learn in order to comprehend?

Learning and Teaching Burdens
  • Increase in the number of words (or lemmas) to know in order to communicate and comprehend texts
  • The large amount of homonymy in high-frequency words increases the learning burden of some of these words for ESL learners (probable confusion: misuse or miscomprehension; contexts must allow learners to distinguish between senses)
  • Treating spoken and written language and vocabulary separately further increases learning and teaching burdens
Implications for ESL Teachers and Learners
  • Pay attention to high-frequency word counts and word lists: how they are created, how they are being used, how a word is defined, etc.
  • Pay attention to how to define a “word”
  • Understand that the number of words needed to reach coverage thresholds has been underestimated
  • Be careful about using concordancing and other computer-generated information for teaching and learning word meanings
Implications (cont’d)
  • Make sure to be aware of the psychological realities of vocabulary items
    • Can the learners make connections between words with similar roots, similar meanings, words used metaphorically and symbolically, etc.?
    • Can the learners make connections between context and meaning to recognize and understand distinctions?