The New General Service List: Celebrating 60 years of Vocabulary Learning

Dr. Charles Browne

Professor of Applied Linguistics

Meiji Gakuin University, Tokyo

browne@ltr.meijigakuin.ac.jp

The New General Service List: Celebrating 60 years of Vocabulary Learning

A few current Corpus Projects…
  • Business English Word List for NHK TV Show in Japan
  • EnglishCentral (a HUGE video corpus of authentic English)
  • New General Service List (CEC)
  • New Academic Word List (CEC)
  • TOEIC Vocabulary Study List (using past test materials)
EFL Vocabulary Learning in Japan…

[Figure: word-frequency chart (axis: Frequency) contrasting high-frequency words (the, of, and) with lower-frequency words (permission, chaos, torment, emigrate, abstain, digress, exasperate). The labels "HFW", "ace", "bid", and "sum" and the counts 600,000, 84,168, 42,024, 25,537, 23,371, 14,641, 5,000, 4,441, 2,566, and 2,289 appear in the chart, but their pairing is not recoverable from the transcript.]

  • The Negative Effect of “Test English”
  • PROBLEM: Students NEED to learn the first 5000 words of English to use English in the real world…
  • But entrance exams and high school textbooks force students to memorize hundreds of low-frequency words…
  • RESULT? High school students can’t deal with real world English because they don’t know hundreds of the most important high frequency words…
When reading or listening to a text, students will of course not know many words…

What percentage of words do you think must be known for them to be able to read easily?

50% ?

75% ?

85% ?

95% ?

75% Coverage: 1,000 high-frequency words

[ 19 missing words ]

…another possible problem with _____ _____ is how to _____ learner _____. Although research suggests that _____ are a very _____ way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ _____ of doing _____ _____. There is a _____ _____ in the _____ classroom of using games with a _____ purpose to increase and _____ learner _____ (Ersoz, 2000, Uberman, 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982).

85% Coverage: 2,000 high-frequency words

[ 13 missing words ]

…another possible problem with _____ _____ is how to _____ learner _____. Although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ method of doing _____ _____. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner _____ (Ersoz, 2000, Uberman, 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982).

95% Coverage: 5,000 high-frequency words

[ 4 missing words ]

…another possible problem with vocabulary _____ is how to sustain learner motivation. Although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the sole method of doing vocabulary review. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner motivation (Ersoz, 2000, Uberman, 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner affective filter (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982).

Vocabulary Thresholds:
  • Below 80%, reading comprehension is almost impossible (Hu & Nation, 2001)
  • 95% coverage is the point at which learners can read without the help of dictionaries (Laufer, 1989)
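
The coverage figures behind these thresholds are straightforward to compute: coverage is the share of running words (tokens) in a text that the learner already knows. The following is a minimal Python sketch, assuming a naive regex tokenizer and a made-up known-word set; the `coverage` function and both inputs are illustrative and do not come from the slides.

```python
import re

def coverage(text: str, known_words: set[str]) -> float:
    """Return the fraction of tokens in `text` that appear in `known_words`."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical learner who knows every word in the sentence except "question".
sample = "To be or not to be, that is the question."
known = {"to", "be", "or", "not", "that", "is", "the"}
print(f"{coverage(sample, known):.0%}")  # 90%
```

With 9 of the 10 tokens known, this learner sits between the 80% and 95% thresholds.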
Goals of the NGSL Project…
  • to update and greatly expand the size of the corpus used (273 million words) compared to the limited corpus behind the original GSL (about 2.5 million words), with the hope of increasing the generalizability and validity of the list
  • to create an NGSL of the most important high-frequency words useful for second language learners of English, which gives the highest possible coverage of English texts with the fewest words possible
  • to make an NGSL that is based on a clearer definition of what constitutes a word
  • to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original Interim version of the GSL)
Original GSL in a nutshell…
  • West’s 1953 GSL was actually a more fully developed version of Faucett’s 1936 “Interim Report on Vocabulary Selection” (sponsored by the Carnegie Corporation)
  • Contributors included many famous linguists such as Thorndike, Horn, Maki, Palmer and West
  • Based on a 2.5-million-word, hand-collected corpus (later increased to 5 million words)
  • Combined objective (frequency) and subjective (teacher intuition) criteria
  • Approximately 2200 words giving about 80% coverage in general texts
  • No systematic attempt to define what a word was:

“no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness” (West, 1953, p. viii)

General Service List (GSL) (West, 1953): http://jbauman.com/aboutgsl.html#1953
Academic Word List (AWL) (Coxhead, 2000): http://www.victoria.ac.nz/lals/resources/academicwordlist/
Getting AWL/GSL lists w/definitions & sound files…
  • I made a few GSL/AWL apps and have made all the content available for free to teachers and researchers. Please contact me if you need any of the following for the GSL or AWL:
  • Word lists
  • Parts of speech
  • Definitions in easy English
  • Definitions in Japanese
  • Sound files for pronunciation of words
  • browne@ltr.meijigakuin.ac.jp
Original GSL created in the 1930s… the 2.5-million-word corpus may have had too many agriculture and religion texts?

AGRICULTURE

  • plow
  • mill
  • spade
  • cultivator

SEA TRAVEL

  • sailor
  • oar
  • vessel
  • merchant

RELIGION

  • kingdom
  • god
  • devil
  • mercy
  • bless
  • fellowship
  • preach
  • sacred
  • worship
  • holy
  • pray
  • heaven
  • grace
  • pupil
  • church 
  • Lord

NOT AS IN USE?

  • telegraph
  • chimney
  • coal
  • cottage
  • gaiety
  • shilling
  • headdress
  • saucer
  • woolen
  • amongst
Starting Point for NGSL… Access to Cambridge’s more modern 2-billion-word corpus

CEC corpora used for preliminary analysis of NGSL

Corpus               Tokens
Newspaper       748,391,436
Academic        260,904,352
Learner          38,219,480
Fiction          37,792,168
Journals         37,478,577
Magazines        37,329,846
Non-Fiction      35,443,408
Radio            28,882,717
Spoken           27,934,806
Documents        19,017,236
TV               11,515,296
Total         1,282,909,322

Problems…
  • Newspaper subsection was too large and dominated the frequencies
  • Newspaper subsection in CEC had too much of a bias towards financial terms
  • Academic subcorpus of CEC not really related to the needs of General English for second language learners
Balancing the NGSL Corpus…

CEC corpora included in final analysis for NGSL

Corpus               Tokens
Learner          38,219,480
Fiction          37,792,168
Journals         37,478,577
Magazines        37,329,846
Non-Fiction      35,443,408
Radio            28,882,717
Spoken           27,934,806
Documents        19,017,236
TV               11,515,296
Total           273,613,534*

*The 273-million-word subsection used is about 100x larger than the original GSL corpus…
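
The total and the size comparison can be checked with a few lines of Python. This is only an arithmetic check: the token counts are copied from the figures above, the 2.5-million-word figure is the original GSL corpus size quoted earlier in the talk, and the variable names are illustrative.

```python
# Token counts for the sub-corpora kept in the final NGSL analysis.
final_subcorpora = {
    "Learner": 38_219_480,
    "Fiction": 37_792_168,
    "Journals": 37_478_577,
    "Magazines": 37_329_846,
    "Non-Fiction": 35_443_408,
    "Radio": 28_882_717,
    "Spoken": 27_934_806,
    "Documents": 19_017_236,
    "TV": 11_515_296,
}

total = sum(final_subcorpora.values())
gsl_corpus_size = 2_500_000  # original GSL corpus, as quoted in the slides

print(total)                              # 273613534
print(round(total / gsl_corpus_size, 1))  # 109.4
```

The ratio of about 109 is consistent with the rounded "100x" figure.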

Next steps…
  • Removed proper nouns
  • Removed numbers, days of the week, months of the year, etc.
  • Used statistical procedures to combine the frequencies from the various sub-corpora while adjusting for differences in their relative sizes
  • Had meetings with Paul Nation to review the list in relation to other frequency lists and to add or delete words as deemed appropriate
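
The slides do not say which statistical procedures were used to combine the sub-corpus frequencies. One common way to adjust for corpus size, shown here purely as an assumption, is to convert each raw count to a per-million rate and then average the rates. The sub-corpus sizes below come from the slides; the raw counts and the function names are invented for illustration.

```python
def per_million(count: int, corpus_size: int) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    return count / corpus_size * 1_000_000

def combined_frequency(counts: dict[str, int], sizes: dict[str, int]) -> float:
    """Average a word's per-million rate over all sub-corpora (assumed method)."""
    rates = [per_million(counts.get(name, 0), size) for name, size in sizes.items()]
    return sum(rates) / len(rates)

# Sub-corpus sizes from the slides; the raw counts are hypothetical.
sizes = {"Fiction": 37_792_168, "Spoken": 27_934_806, "TV": 11_515_296}
counts = {"Fiction": 3_000, "Spoken": 3_000, "TV": 3_000}

for name, size in sizes.items():
    print(name, round(per_million(counts[name], size), 1))
print("combined", round(combined_frequency(counts, sizes), 1))
```

The same raw count of 3,000 corresponds to a much higher rate in the small TV sub-corpus than in Fiction, which is why raw counts from sub-corpora of different sizes cannot simply be added together.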
Comparing the GSL and NGSL:

“To be or not to be, that is the question.”

  • 10 Tokens

to, to, be, be, or, not, that, is, the, question

  • 8 Types

to, be, or, not, that, is, the, question

  • 7 Lemmas

to, be, or, not, that, the, question
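
The three counts can be reproduced with a short Python sketch. It assumes a simple regex tokenizer and a one-entry lemma table (`LEMMAS`) that covers only this sentence; a real lemmatizer would need a full dictionary.

```python
import re

sentence = "To be or not to be, that is the question."

# Hypothetical mini lemma table: "is" is an inflected form of "be".
LEMMAS = {"is": "be"}

tokens = re.findall(r"[a-z]+", sentence.lower())  # every running word
types = set(tokens)                               # distinct word forms
lemmas = {LEMMAS.get(t, t) for t in types}        # distinct base forms

print(len(tokens), len(types), len(lemmas))  # 10 8 7
```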

Comparing the GSL and NGSL:

“To be or not to be, that is the question.”

Rank  Word      Tokens  Coverage
1     be        3       30%
2     to        2       20%
3     not       1       10%
3     or        1       10%
3     question  1       10%
3     that      1       10%
3     the       1       10%
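
The ranked table can be derived by counting tokens per lemma and dividing by the total number of tokens. This is a minimal sketch, again assuming a regex tokenizer and a one-entry lemma table (`LEMMAS`) that are not part of the slides.

```python
import re
from collections import Counter

sentence = "To be or not to be, that is the question."
LEMMAS = {"is": "be"}  # hypothetical mini lemma table

tokens = re.findall(r"[a-z]+", sentence.lower())
counts = Counter(LEMMAS.get(t, t) for t in tokens)

# Sort by descending token count, then alphabetically.
rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
for lemma, n in rows:
    # Tied lemmas share a rank: 1 + number of lemmas with a higher count.
    rank = 1 + sum(1 for c in counts.values() if c > n)
    print(rank, lemma, n, f"{n / len(tokens):.0%}")
```

Because "is" is counted under "be", that lemma alone accounts for 3 of the 10 tokens.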

Comparing the GSL and NGSL:

The assumption in Word Families is that if the headword is known, so are all derived forms…

ACCEPT

ACCEPTABILITY

ACCEPTABLE

UNACCEPTABLE

ACCEPTANCE

ACCEPTED

ACCEPTING

ACCEPTS
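
In counting terms, the word-family assumption means every derived form is credited to its headword. A minimal Python sketch follows; the family members are those listed above, while the lookup structure and the function name `count_by_family` are illustrative assumptions.

```python
from collections import Counter

# Family members as listed on the slide, keyed by headword.
WORD_FAMILIES = {
    "accept": [
        "accept", "acceptability", "acceptable", "unacceptable",
        "acceptance", "accepted", "accepting", "accepts",
    ],
}

# Invert the table so each form points to its headword.
HEADWORD_OF = {
    form: head for head, forms in WORD_FAMILIES.items() for form in forms
}

def count_by_family(tokens: list[str]) -> Counter:
    """Count tokens under their family headword; unlisted forms count as themselves."""
    return Counter(HEADWORD_OF.get(t.lower(), t.lower()) for t in tokens)

print(count_by_family(["Accepts", "unacceptable", "acceptance", "paint"]))
```

Here "accepts", "unacceptable", and "acceptance" all count toward ACCEPT, even though a learner who knows "accept" may not know "unacceptable".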

Comparing the GSL and NGSL:

THE WORD FAMILY APPROACH (Bauer and Nation, 1993)

Level 1

A different form is a different word. Capitalization is ignored.

Level 2

Regularly inflected words are part of the same family.

Level 3 (10 affixes)

-able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses

Level 4 (10 affixes)

-al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, in-, all with restricted uses.

Comparing the GSL and NGSL:

Level 5 (48 affixes)

-age (leakage), -al (arrival), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), anti- (anti-inflation), ante- (anteroom), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean).

Comparing the GSL and NGSL:

Level 6 (10 affixes)

-able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y

Level 7

Classical roots

Comparing the GSL and NGSL:
  • However, the GSL is not consistent in defining what to count as a word.
      • “no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness” (West, 1953, page viii)
      • To get some consistency, Bauman and Culligan (1995) grouped the original GSL headwords using Level 4 affixes. Then they ranked the words according to frequencies from the Brown Corpus.
      • Subsequently, Nation released a word list with the program Range that grouped words up to Level 6 affixes, and also included numbers, days of the week, months of the year, and metric units of measurement.
Comparing the GSL and NGSL:

NGSL: A Modified Lexeme Approach

  • All inflected forms for all parts of speech plus the plural of the gerund
  • Includes both British & American spellings
  • Examples
    • accept: accepts, accepted, accepting, acceptings
    • acceptable: acceptables
    • paint: paints, painted, painting, paintings
Comparing the GSL and NGSL: Apples and Oranges no longer…

When both lists are lemmatized, the NGSL provides far more coverage with far fewer words, one of the chief goals of this project…

List downloadable in many forms: www.newgeneralservicelist.org

  • Headword list
  • Lemmatized list
  • List with definitions in easy English
  • List with raw data (coming soon!)

New Cambridge Text Series Using NGSL (both in text and online)

Dr. Charles Browne

Professor of Applied Linguistics

Meiji Gakuin University, Tokyo

browne@ltr.meijigakuin.ac.jp

much more to come…

Thank you!

The New General Service List: Celebrating 60 years of Vocabulary Learning