Competitive grammar writing

Presentation Transcript


Competitive Grammar Writing

Jason Eisner (Johns Hopkins)
Noah A. Smith (Carnegie Mellon)


Tree structure

N = Noun
V = Verb
P = Preposition
D = Determiner
R = Adverb

[Figure: parse tree assigning these POS tags to the words of "The girl with the newt pin hates peas quite violently ."]


Tree structure

N = Noun
V = Verb
P = Preposition
D = Determiner
R = Adverb
NP = Noun phrase
VP = Verb phrase
PP = Prepositional phrase
S = Sentence

[Figure: full phrase-structure tree (S, NP, VP, and RP nodes above the POS tags) for "The girl with the newt pin hates peas quite violently ."]


Generative Story: PCFG

  • Given a set of symbols (phrase types)
  • Start with S at the root
  • Each symbol randomly generates 2 child symbols, or 1 word
  • Our job (maybe): Learn these probabilities

[Figure: the tree from the previous slide, generated top-down: starting at the root, S expands to NP VP with probability p(NP VP | S).]
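The generative story above can be sketched as a tiny top-down sampler. This is an illustrative sketch, not the course's actual sampling script; it uses the initial grammar given later in the deck, and the `Prep -> near` rule is an assumption, since the excerpt never lists a Prep terminal.

```python
import random

# Weighted rules from the initial grammar in the slides; each left-hand side
# maps to a list of (weight, right-hand-side) pairs.
GRAMMAR = {
    "S1":     [(1, ["NP", "VP", "."])],
    "VP":     [(1, ["VerbT", "NP"])],
    "NP":     [(20, ["Det", "N'"]), (1, ["Proper"])],
    "N'":     [(20, ["Noun"]), (1, ["N'", "PP"])],
    "PP":     [(1, ["Prep", "NP"])],
    "Noun":   [(1, ["castle"]), (1, ["king"])],
    "Proper": [(1, ["Arthur"]), (1, ["Guinevere"])],
    "Det":    [(1, ["a"]), (1, ["every"])],
    "VerbT":  [(1, ["covers"]), (1, ["rides"])],
    "Prep":   [(1, ["near"])],   # assumption: Prep vocabulary is not in the excerpt
}

def sample(symbol="S1"):
    """Expand `symbol` top-down, picking each rule with probability
    proportional to its weight (e.g. NP -> Det N' wins 20/21 of the time)."""
    if symbol not in GRAMMAR:            # a terminal word: emit it
        return [symbol]
    weights = [w for w, _ in GRAMMAR[symbol]]
    rhss = [rhs for _, rhs in GRAMMAR[symbol]]
    rhs = random.choices(rhss, weights=weights)[0]
    return [word for child in rhs for word in sample(child)]

print(" ".join(sample()))  # e.g. "every castle rides a king ."
```

Each call yields one sentence; running it repeatedly is exactly the precision test described later (sample 20 sentences, judge their grammaticality).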


Context-Freeness of Model

  • In a PCFG, the string generated under NP doesn’t depend on the context of the NP.
  • All NPs are interchangeable.

[Figure: the NP subtrees "The girl" and "the newt pin" highlighted within the parse of "The girl with the newt pin hates peas quite violently ."]


Inside vs. Outside

  • This NP is good because the “inside” string looks like a NP
  • and because the “outside” context looks like it expects a NP.
  • These work together in global inference, and could help train each other during learning (cf. Cucerzan & Yarowsky 2002).

[Figure, built up over several slides: the NP spanning "the newt pin" in the parse of "The girl with the newt pin hates peas quite violently ."; the "inside" string is the span under the NP, and the "outside" context is the rest of the tree.]


1. Welcome to the lab exercise!

  • Please form teams of ~3 people …

  • Programmers, get a linguist on your team

    • And vice-versa

  • Undergrads, get a grad student on your team

    • And vice-versa


2. Okay, team, please log in

  • The 3 of you should use adjacent workstations

  • Log in as individuals

  • Your secret team directory:

    cd …/03-turbulent-kiwi

    • You can all edit files there

    • Publicly readable & writeable

    • No one else knows the secret directory name


3. Now write a grammar of English

  • You have 2 hours.

What’s a grammar? Here’s one to start with:

  • 1  S1 → NP VP .
  • 1  VP → VerbT NP
  • 20  NP → Det N’
  • 1  NP → Proper
  • 20  N’ → Noun
  • 1  N’ → N’ PP
  • 1  PP → Prep NP

Plus initial terminal rules:

  • 1  Noun → castle
  • 1  Noun → king
  • 1  Proper → Arthur
  • 1  Proper → Guinevere
  • 1  Det → a
  • 1  Det → every
  • 1  VerbT → covers
  • 1  VerbT → rides
  • 1  Misc → that
  • 1  Misc → bloodier
  • 1  Misc → does

The leading numbers are rule weights: S1 rewrites as NP VP . with probability 1, while NP rewrites as Det N’ with probability 20/21 and as Proper with probability 1/21.

[Figure: sampling a derivation step by step, e.g. S1 → NP VP ., NP → Det N’, N’ → Noun, Det → every, Noun → castle; the slide’s bracketed example of the kind of nested structure the grammar can generate is "drinks [[Arthur [across the [coconut in the castle]]] [above another chalice]]".]
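The weights in the grammar above are unnormalized; each rule's probability comes from dividing its weight by the total weight of its left-hand side. A minimal sketch of that normalization (the rule list is copied from the slide):

```python
from collections import defaultdict
from fractions import Fraction

# Weighted rules as (weight, lhs, rhs) triples, from the initial grammar.
RULES = [
    (1, "S1", ("NP", "VP", ".")),
    (1, "VP", ("VerbT", "NP")),
    (20, "NP", ("Det", "N'")),
    (1, "NP", ("Proper",)),
    (20, "N'", ("Noun",)),
    (1, "N'", ("N'", "PP")),
    (1, "PP", ("Prep", "NP")),
]

def rule_probs(rules):
    """Normalize weights per left-hand side: p(rhs | lhs) = w / sum of lhs weights."""
    totals = defaultdict(int)
    for w, lhs, _ in rules:
        totals[lhs] += w
    return {(lhs, rhs): Fraction(w, totals[lhs]) for w, lhs, rhs in rules}

probs = rule_probs(RULES)
print(probs[("NP", ("Det", "N'"))])   # 20/21
print(probs[("NP", ("Proper",))])     # 1/21
```

This reproduces the 20/21 vs. 1/21 split shown on the slide, and the same normalization applies to the terminal rules.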


4. Okay – go!

“How will we be tested on this?”

5. Evaluation procedure

  • We’ll sample 20 random sentences from your PCFG.
  • Human judges will vote on whether each sentence is grammatical.
    • By the way, y’all will be the judges (double-blind).
  • You probably want to use the sampling script to keep testing your grammar along the way.


5. Evaluation procedure

  • We’ll sample 20 random sentences from your PCFG.
  • Human judges will vote on whether each sentence is grammatical.
  • You’re right: This only tests precision.
  • How about recall?

“Ok, we’re done! All our sentences are already grammatical.”


Development set

[Slide callout: “covered by initial grammar”]

You might want your grammar to generate …

  • Arthur is the king .

  • Arthur rides the horse near the castle .

  • riding to Camelot is hard .

  • do coconuts speak ?

  • what does Arthur ride ?

  • who does Arthur suggest she carry ?

  • why does England have a king ?

  • are they suggesting Arthur ride to Camelot ?

  • five strangers are at the Round Table .

  • Guinevere might have known .

  • Guinevere should be riding with Patsy .

  • it is Sir Lancelot who knows Zoot !

  • either Arthur knows or Patsy does .

  • neither Sir Lancelot nor Guinevere will speak of it .

We provide a file of 27 sample sentences illustrating a range of grammatical phenomena: questions, movement, (free) relatives, clefts, agreement, subcat frames, conjunctions, auxiliaries, gerunds, sentential subjects, appositives …


Development set (continued)

You might want your grammar to generate …

  • the Holy Grail was covered by a yellow fruit .

  • Zoot might have been carried by a swallow .

  • Arthur rode to Camelot and drank from his chalice .

  • they migrate precisely because they know they will grow .

  • do not speak !

  • Arthur will have been riding for eight nights .

  • Arthur , sixty inches , is a tiny king .

  • Arthur knows Patsy , the trusty servant .

  • Arthur and Guinevere migrate frequently .

  • he knows what they are covering with that story .

  • Arthur suggested that the castle be carried .

  • the king drank to the castle that was his home .

  • when the king drinks , Patsy drinks .



5’. Evaluation of recall

No OOVs allowed in the test set. Fixed vocabulary.
(How should we parse sentences with OOV words? = productivity!!)

What we could have done:
Cross-entropy on a similar, held-out test set:

  • every coconut of his that the swallow dropped sounded like a horse .


5’. Evaluation of recall

What we could have done:
Cross-entropy on a similar, held-out test set

What we’ll actually do, to heighten competition & creativity:
Test set comes from the participants!

You should try to generate sentences that your opponents can’t parse. Use the fixed vocabulary creatively (= productivity!!). In Boggle, you get points for finding words that your opponents don’t find.


Use the fixed vocabulary creatively.

Initial terminal rules:

  • 1  Noun → castle
  • 1  Noun → king
  • 1  Proper → Arthur
  • 1  Proper → Guinevere
  • 1  Det → a
  • 1  Det → every
  • 1  VerbT → covers
  • 1  VerbT → rides
  • 1  Misc → that
  • 1  Misc → bloodier
  • 1  Misc → does

The initial grammar sticks to 3rd-person singular transitive present-tense forms. All grammatical.

But we provide 183 Misc words (not accessible from the initial grammar) that you’re free to work into your grammar …


The 183 Misc words include pronouns (various cases), plurals, various verb forms, non-transitive verbs, adjectives (various forms), adverbs & negation, conjunctions & punctuation, wh-words, …


5’. Evaluation of recall

What we could have done (good for your class?):
Cross-entropy on a similar, held-out test set

What we actually did, to heighten competition & creativity:
Test set comes from the participants!

We’ll score your cross-entropy when you try to parse the sentences that the other teams generate. (Only the ones judged grammatical.)

  • You probably want to use the parsing script to keep testing your grammar along the way.

“What if my grammar can’t parse one of the test sentences?” 0 probability?? You get the infinite penalty. So don’t do that.
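The infinite penalty falls out of the definition of cross-entropy. A quick sketch with hypothetical sentence probabilities (the scoring formula, not the lab's actual script):

```python
import math

def cross_entropy_bits_per_word(sentence_probs, word_counts):
    """Cross-entropy of a test set under a model: -sum(log2 p) / total words.
    A single zero-probability sentence drives the score to infinity."""
    total_words = sum(word_counts)
    total_bits = 0.0
    for p in sentence_probs:
        if p == 0.0:
            return math.inf          # the "infinite penalty"
        total_bits -= math.log2(p)
    return total_bits / total_words

# Hypothetical probabilities a parser might assign to two test sentences:
print(cross_entropy_bits_per_word([2**-10, 2**-12], [4, 5]))  # (10+12)/9 bits/word
print(cross_entropy_bits_per_word([2**-10, 0.0], [4, 5]))     # inf
```

This is why the backoff grammar described next matters: it guarantees every string over the vocabulary gets nonzero probability.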


Use a backoff grammar: Bigram POS HMM

Initial backoff grammar:

  • S2 →
  • S2 → _Noun
  • S2 → _Misc
  • _Noun → Noun
  • _Noun → Noun _Noun
  • _Noun → Noun _Misc
  • _Misc → Misc
  • _Misc → Misc _Noun
  • _Misc → Misc _Misc

(etc.)

Here _Noun means “something that starts with a Noun,” _Misc “something that starts with a Misc,” and so on for _Verb, etc.

[Figure: right-branching derivation of “rides ’s ! swallow” (tags Verb, Misc, Punc, Noun), showing that the backoff grammar is equivalent to a bigram POS HMM.]
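The right-branching derivation in the figure can be generated mechanically for any tag sequence; each _X category emits one X and hands off to the next _Y, which is exactly a bigram transition. A small sketch (the `_Punc` and `_Verb` rules fall under the slide's "(etc.)"):

```python
def backoff_derivation(tags):
    """Spell out the right-branching S2 derivation for a POS-tag sequence.
    Each step _X -> X _Y corresponds to one bigram transition of the HMM."""
    if not tags:
        return "S2 ->"
    steps = ["S2 -> _" + tags[0]]
    for cur, nxt in zip(tags, tags[1:]):
        steps.append(f"_{cur} -> {cur} _{nxt}")
    steps.append(f"_{tags[-1]} -> {tags[-1]}")   # final state stops emitting
    return "\n".join(steps)

print(backoff_derivation(["Verb", "Misc", "Punc", "Noun"]))
```

For the figure's tag sequence this prints the derivation S2 → _Verb, _Verb → Verb _Misc, _Misc → Misc _Punc, _Punc → Punc _Noun, _Noun → Noun.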


Use a backoff grammar: Bigram POS HMM

Init. linguistic grammar:

  • S1 → NP VP .
  • VP → VerbT NP
  • NP → Det N’
  • NP → Proper
  • N’ → Noun
  • N’ → N’ PP
  • PP → Prep NP

Initial backoff grammar: the S2 rules above.


Use a backoff grammar: Bigram POS HMM

Initial master grammar, a mixture model:

  • START → S1  (the init. linguistic grammar)
  • START → S2  (the initial backoff grammar)

Choose these weights wisely!


6. Discussion

  • What did you do? How?

  • Was CFG expressive enough?

    • How would you improve the formalism?

    • Would it work for other languages?

  • How should one pick the weights?

    • And how could you build a better backoff grammar?

    • Is grammaticality well-defined? How is it related to probability?

  • What if you had 36 person-months to do it right?

    • What other tools or data do you need?

    • What would the resulting grammar be good for?

    • What evaluation metrics are most important?

[Slide callout: features, gapping]


7. Winners announced

[Results chart; callouts: “Helps to favor backoff grammar”; “Anyway, a lot of work!”]

  • Of course, no one finishes their ambitious plans.
  • Alternative: Allow 2 weeks (see paper) …


What did past teams do?

  • More fine-grained parts of speech

  • do-support for questions & negation

  • Movement using gapped categories

  • X-bar categories (following the initial grammar)

  • Singular/plural features

  • Pronoun case

  • Verb forms

  • Verb subcategorization; selectional restrictions (“location”)

  • Comparative vs. superlative adjectives

  • Appositives (must avoid double comma)

  • A bit of experimentation with weights

  • One successful attempt to game scoring system (ok with us!)


Why do we recommend this lesson?

  • Good opening activity

  • Introduces many topics – touchstone for later teaching

    • Grammaticality

      • Grammaticality judgments, formal grammars, parsers

      • Specific linguistic phenomena

      • Desperate need for features, morphology, gap-passing

    • Generative probability models: PCFGs and HMMs

      • Backoff, inside probability, random sampling, …

      • Recovering latent variables: Parse trees and POS taggings

    • Evaluation (sort of)

      • Annotation, precision, recall, cross-entropy, …

      • Manual parameter tuning

    • Why learning would be valuable, alongside expert knowledge

http://www.clsp.jhu.edu/grammar-writing


A final thought

[Slide callout: akin to programming languages]

  • The CS curriculum starts with programming

    • Accessible and hands-on

    • Necessary to motivate or understand much of CS

  • In CL, the equivalent is grammar writing

    • It was the traditional (pre-statistical) introduction

      • Our contributions: competitive game, statistics, finite-state backoff, reusable instructional materials

    • Much of CL work still centers around grammar formalisms

      • We design expressive formalisms for linguistic data

      • Solve linguistic problems within these formalisms

      • Enrich them with probabilities

      • Process them with algorithms

      • Learn them from data

      • Connect them to other modules in the pipeline

