William Gregory Sakas, Hunter College, Department of Computer Science; Graduate Center, PhD Programs in Computer Science and Linguistics

Introduction to Computational Natural Language Learning. Linguistics 79400 (Under: Topics in Natural Language Processing); Computer Science 83000 (Under: Topics in Artificial Intelligence). The Graduate School of the City University of New York, Fall 2001. William Gregory Sakas


Introduction to Computational Natural Language Learning

Linguistics 79400 (Under: Topics in Natural Language Processing)

Computer Science 83000 (Under: Topics in Artificial Intelligence)

The Graduate School of the City University of New York

Fall 2001

William Gregory Sakas

Hunter College, Department of Computer Science

Graduate Center, PhD Programs in Computer Science and Linguistics

The City University of New York

Feasibility: Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?

Clark (1994, in press), Niyogi and Berwick (1996), Lightfoot (1989) (degree-0), Sakas (2000), Tesar and Smolensky (1996) and many PAC results concerning induction of FSAs

Feasibility measure (Sakas and Fodor, 2001)

Near linear increase of the expected number of sentences consumed before a learner converges on the target grammar.

[Figure: plot with regions labeled "Infeasible" and "Feasible". Axis label: feature of the learning model to be tested as feasible or not feasible; this could be the number of parameters, average sentence length, etc.]

A Feasibility Case Study:

A three-parameter domain (Gibson and Wexler, 1994)

SV / VS - subject precedes verb / verb precedes subject

VO / OV - verb precedes object / object precedes verb

+V2 / -V2 - verb or aux must be in the second position in the sentence

Sentences are strings of the symbols: S, V, O1, O2, aux, adv

Example: "Allie will eat the birds" = S aux V O

Surprisingly, G&W’s simple 3-parameter domain presents nontrivial obstacles to several learning strategies, but the space is ultimately learnable (for example by a "blind-guess-learner" – see next slide).

Big question:

How will the learning process scale up in terms of feasibility as the number of parameters increases?

Two problems for most acquisition strategies:

1) Ambiguity

2) Size of the domain

"Size of the domain" feasibility result: A blind-guess learner succeeds only after consuming a number of sentences that grows exponentially with the number of parameters.

- Blind-guess learner (with oracle, no parsing, no error-driven constraint):
- 1) Consume a sentence from the linguistic environment.
- 2) Randomly select a setting for each parameter (i.e., randomly pick a grammar out of the domain).
- 3) If the settings indicate the target grammar, stop; otherwise go to 1.
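The blind-guess loop can be sketched as a small simulation. This is only an illustration under stated assumptions: grammars are modeled as tuples of binary parameter values, the oracle is a plain equality check, and the names `blind_guess_trials` and `average_trials` are hypothetical, not from the original slides.

```python
import random


def blind_guess_trials(n_params, target, rng):
    """Count sentences consumed until a random guess equals the target grammar."""
    trials = 0
    while True:
        trials += 1  # step 1: consume a sentence (its content is ignored)
        # step 2: randomly select a setting for each parameter
        guess = tuple(rng.randint(0, 1) for _ in range(n_params))
        # step 3: the oracle says whether the guess is the target grammar
        if guess == target:
            return trials


def average_trials(n_params, runs=2000, seed=0):
    """Average the number of sentences consumed over many simulated learners."""
    rng = random.Random(seed)
    target = tuple(rng.randint(0, 1) for _ in range(n_params))
    total = sum(blind_guess_trials(n_params, target, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    for n in (3, 5, 8):
        # The simulated average should land near 2 ** n.
        print(n, 2 ** n, round(average_trials(n), 1))
```

With small numbers of parameters the simulated averages should come out close to the number of grammars in the domain, which is the quantity worked out analytically on the following slides.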

A simple math result shows that the average number of trials required to achieve success* (in this case picking the target grammar) is:

1 / Probability of success.

So if we can figure out the probability of picking the target grammar at step 3, we can model how many inputs the learner will require on average.

* given that each trial has a possible outcome of either success or failure, with the same probability of success on every trial. For those technically minded, the number of trials up to the first success follows a geometric distribution.
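For readers who want the step behind the "1 / probability of success" result, here is a short derivation. It assumes independent trials with the same success probability p on each trial (a geometric distribution); the symbols p and T are introduced here for illustration and do not appear on the slides.

```latex
% T = number of trials up to and including the first success
\Pr(T = k) = (1 - p)^{k-1}\, p, \qquad k = 1, 2, 3, \dots

\mathbb{E}[T]
  = \sum_{k=1}^{\infty} k\,(1 - p)^{k-1}\, p
  = p \cdot \frac{1}{\bigl(1 - (1 - p)\bigr)^{2}}
  = \frac{1}{p}
```

The middle step uses the standard series $\sum_{k \ge 1} k\,x^{k-1} = 1/(1-x)^2$ for $|x| < 1$, with $x = 1 - p$.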

Probability of success == probability of picking the target grammar at step 3, and is calculated as:

$\Pr(\text{Success}) = 1 / \text{num of grammars}$

$= 1 / 2^{\text{num of parameters}}$

Applying the math result for number of trials on average:

$1 / (1 / 2^{\text{num of parameters}}) = 2^{\text{num of parameters}}$

If # parameters = 30

then # grammars = $2^{30}$ = 1,073,741,824

So, on average, over a billion sentences would need to be consumed by the blind-guess learner before attaining the target.
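The arithmetic above is easy to reproduce for other domain sizes. The helper name `expected_sentences` is hypothetical; it simply evaluates 2 to the power of the number of parameters, assuming binary parameters as in the slides.

```python
def expected_sentences(num_parameters):
    """Expected sentences consumed by the blind-guess learner: 2 ** (num of parameters)."""
    return 2 ** num_parameters


for n in (3, 10, 20, 30):
    print(f"{n:>2} parameters -> {expected_sentences(n):,} sentences on average")
```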

The search space is huge!

Cross-language ambiguity

Strings that appear in both lists below (for example S V, S V O and S AUX V) are ambiguous between the two languages.

SV VO -V2 (English-like)

S V

S V O

S V O1 O2

S AUX V

S AUX V O

S AUX V O1 O2

ADV S V

ADV S V O

ADV S V O1 O2

ADV S AUX V

ADV S AUX V O

ADV S AUX V O1 O2

SV OV +V2 (German-like)

S V

S V O

O V S

S V O2 O1

O1 V S O2

O2 V S O1

S AUX V

S AUX O V

O AUX S V

S AUX O2 O1 V

O1 AUX S O2 V

O2 AUX S O1 V

ADV V S

ADV V S O

ADV V S O2 O1

ADV AUX S V

ADV AUX S O V

ADV AUX S O2 O1 V
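The overlap between the two languages can be checked mechanically. The sketch below copies the two string lists from this slide into Python sets (the variable names are illustrative) and intersects them.

```python
# Strings licensed by the SV VO -V2 (English-like) grammar, as listed above.
english_like = {
    "S V", "S V O", "S V O1 O2",
    "S AUX V", "S AUX V O", "S AUX V O1 O2",
    "ADV S V", "ADV S V O", "ADV S V O1 O2",
    "ADV S AUX V", "ADV S AUX V O", "ADV S AUX V O1 O2",
}

# Strings licensed by the SV OV +V2 (German-like) grammar, as listed above.
german_like = {
    "S V", "S V O", "O V S", "S V O2 O1", "O1 V S O2", "O2 V S O1",
    "S AUX V", "S AUX O V", "O AUX S V",
    "S AUX O2 O1 V", "O1 AUX S O2 V", "O2 AUX S O1 V",
    "ADV V S", "ADV V S O", "ADV V S O2 O1",
    "ADV AUX S V", "ADV AUX S O V", "ADV AUX S O2 O1 V",
}

# A string in both sets does not tell the learner which grammar produced it.
ambiguous = sorted(english_like & german_like)
print(ambiguous)
```

A learner that hears only these shared strings has no evidence for choosing between the two grammars.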

P&P acquisition:

How to obtain informative feasibility results studying linguistically interesting domains with cognitively plausible learning algorithms?

Create an input space for a linguistically plausible domain -- simulations (Briscoe (2000), Elman (1990, 1991, 1996), Yang (200)).

But this won't work for large domains.

So, how to answer questions of feasibility as the number of grammars (exponentially) scales up?

Answer:

introduce some formal notions in order to abstract away from the specific linguistic content of the input data.

- A formal approach
- 1) Formalize the learning process and input space.
- 2) Use the formalization in a probabilistic model to empirically test the learner across a wide range of learning scenarios.
- Gives general data on the expected performance of acquisition algorithms. Can answer the question:
- Given learner L, if the input space exhibits characteristics x, y and z, is feasible learning possible?

The Triggering Learning Algorithm - TLA (Gibson and Wexler, 1994)

Searches the (huge) grammar space using local heuristics

repeat until convergence:

receive a string s from L(G_targ)

if it can be parsed by G_curr, do nothing (Error-driven)

otherwise, pick a grammar that differs by one parameter value from the current grammar (SVC, the Single Value Constraint)

if this grammar parses the sentence, make it the current grammar, otherwise do nothing (Greediness)
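The TLA loop can be sketched in Python as follows. This is a minimal illustration, not Gibson and Wexler's implementation: grammars are tuples of binary parameter values, and the `parses` function is a toy stand-in for a real parser in which a "sentence" is a dict naming the parameter values it requires. The toy domain, `tla_step` and `run_tla` are all assumptions made for the example, and the loop runs for a fixed number of sentences rather than detecting convergence.

```python
import random


def parses(grammar, sentence):
    """Toy parser: a sentence is a dict {parameter index: required value}."""
    return all(grammar[i] == value for i, value in sentence.items())


def tla_step(g_curr, sentence, rng):
    """One TLA update on a single input sentence."""
    if parses(g_curr, sentence):
        return g_curr  # Error-driven: no change unless the parse fails
    # SVC: the candidate differs from G_curr in exactly one parameter value
    i = rng.randrange(len(g_curr))
    g_new = g_curr[:i] + (1 - g_curr[i],) + g_curr[i + 1:]
    # Greediness: adopt the candidate only if it parses the sentence
    return g_new if parses(g_new, sentence) else g_curr


def run_tla(target, start, n_sentences, seed=0):
    """Feed the learner sentences from the toy target language."""
    rng = random.Random(seed)
    g_curr = start
    for _ in range(n_sentences):
        # Toy input: each sentence expresses one parameter of the target.
        i = rng.randrange(len(target))
        g_curr = tla_step(g_curr, {i: target[i]}, rng)
    return g_curr


if __name__ == "__main__":
    print(run_tla(target=(0, 1, 1), start=(1, 0, 0), n_sentences=500))
```

In this toy domain every sentence is unambiguous about the one parameter it expresses, so the learner can only move toward the target; the 3-parameter domain discussed earlier is harder precisely because its sentences are often ambiguous.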

TLA-minus (Sakas and Fodor, 2001): a simplified TLA used to introduce a way of studying the effects of cross-language ambiguity through some simple formulas

repeat until convergence:

receive a string s from L(G_targ)

if it can be parsed by G_curr, do nothing

otherwise, pick a grammar at random (not necessarily one parameter distant)

if this grammar parses the sentence, make it the current grammar, otherwise do nothing
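A sketch of the TLA-minus loop, under the same kind of toy assumptions as any small simulation: grammars are tuples of binary parameter values, `parses` is a hypothetical stand-in for a parser, and each toy sentence expresses a single parameter of the target. The function name `tla_minus_sentences` is illustrative. Convergence is measured from outside the learner by comparing with the target.

```python
import itertools
import random


def parses(grammar, sentence):
    """Toy parser: a sentence is a dict {parameter index: required value}."""
    return all(grammar[i] == value for i, value in sentence.items())


def tla_minus_sentences(target, start, rng, max_sentences=100_000):
    """Return how many sentences are consumed before G_curr equals the target."""
    grammars = list(itertools.product((0, 1), repeat=len(target)))
    g_curr = start
    for consumed in range(1, max_sentences + 1):
        i = rng.randrange(len(target))
        sentence = {i: target[i]}  # a string s from L(G_targ), toy version
        if not parses(g_curr, sentence):
            # Pick any grammar at random, not necessarily one parameter away.
            g_new = rng.choice(grammars)
            if parses(g_new, sentence):
                g_curr = g_new
        if g_curr == target:
            return consumed
    return None  # did not converge within max_sentences


if __name__ == "__main__":
    print(tla_minus_sentences((0, 1, 1), (1, 0, 0), random.Random(0)))
```

Unlike the TLA, the learner here may jump to a grammar that is further from the target, as long as that grammar parses the current sentence.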

Some variables for TLA-minus analysis:

n = number of parameters in the domain

r = number of relevant parameters in the domain

a = number of parameters expressed ambiguously by an arbitrary input sentence

An example of n and r :

Suppose there are 5 parameters in the domain under study and the learner is being exposed to a language for which either setting of parameter 5 (p5) is "correct."

In this case n = 5 and r = 4, and there are $2^{n} = 2^{5} = 32$ grammars in the domain, and

$2^{n-r} = 2^{5-4} = 2^{1} = 2$ grammars that the learner can consider a successful convergence.

Some variables for TLA-minus analysis (cont'd):

An example of a:

Suppose a sentence, s, is consumed by a learner where a successful parse can be acquired ONLY in the case that p1 and p2 are set to 0 and 1, respectively, but the settings for p3 and p4 don't matter (and of course, as we said previously, the setting of p5 doesn't matter for the entire target language).

In this case a = 2. That is, 2 parameters are ambiguous: the settings of p3 and p4.

Some formulas for TLA-minus analysis:

Recall:

# grammars that make up the cluster of legitimate target grammars

$= 2^{n-r}$

# combinations of settings for the a ambiguously expressed parameters

$= 2^{a}$

Number of grammars that license an arbitrary sentence s:

$= 2^{n-r} \cdot 2^{a}$

$= 2^{a+n-r}$

Probability that an arbitrary sentence s from the target language can be parsed by an arbitrary grammar:

= # grammars that can parse / # grammars in the domain

$= (2^{n-r} \cdot 2^{a}) / 2^{n}$

$= 2^{a-r}$

Some formulas for TLA-minus analysis (cont'd):

Recall: Number of grammars that license an arbitrary sentence s $= 2^{a+n-r}$

Probability that one of those grammars is in the target cluster of $2^{n-r}$ grammars:

= # grammars that make up the cluster divided by # grammars that can parse the current input

$= 2^{n-r} / 2^{a+n-r}$

$= 1 / 2^{a}$
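The formulas can be checked by brute force on the running example (n = 5, r = 4, a = 2). The sketch below enumerates all 32 grammars. The target values chosen for p3 and p4 are hypothetical, since the slides do not give them; p1 = 0 and p2 = 1 come from the example sentence s.

```python
from itertools import product

n, r, a = 5, 4, 2

# Relevant parameters p1..p4 of the target. p1 = 0 and p2 = 1 are required by
# the example sentence; the p3 and p4 values are assumed for illustration.
target_relevant = (0, 1, 1, 0)

grammars = list(product((0, 1), repeat=n))

# Target cluster: grammars agreeing with the target on all relevant parameters
# (p5 is irrelevant, so both of its settings count).
cluster = [g for g in grammars if g[:r] == target_relevant]

# Grammars that license s: only p1 and p2 are expressed unambiguously.
licensing = [g for g in grammars if g[0] == 0 and g[1] == 1]

p_parse = len(licensing) / len(grammars)
p_in_cluster = len([g for g in licensing if g in cluster]) / len(licensing)

print(len(cluster), 2 ** (n - r))      # size of the target cluster
print(len(licensing), 2 ** (a + n - r))  # grammars that license s
print(p_parse, 2 ** (a - r))           # probability an arbitrary grammar parses s
print(p_in_cluster, 1 / 2 ** a)        # probability a parsing grammar is in the cluster
```

Each printed pair should agree, which confirms that the closed forms match direct counting for this small domain.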

A tree of choices and outcomes for the TLA-minus

(Each branch is labeled with its probability, followed by the action taken and the outcome.)

- Sentence
  - G_curr can parse (probability ≈ $2^{a-r}$): retain G_curr and get another sentence.
  - G_curr can't parse (probability ≈ $1 - 2^{a-r}$): pick G'.
    - G' can parse (probability ≈ $2^{a-r}$):
      - G' is one of the grammars in the target cluster (probability $1/2^{a}$): G' == G_target and will be retained "in the limit." (This is due to the error-driven constraint.)
      - Otherwise: retain G' and get another sentence.
    - G' can't parse (probability ≈ $1 - 2^{a-r}$): forget about G'. Keep G_curr (this is Greediness) and get another sentence.
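One way to read the tree is to multiply the probabilities along the single path that ends in the target cluster: G_curr can't parse, G' can parse, and G' is in the target cluster. The sketch below does that and then applies the "1 / probability of success" result from earlier. It is a rough illustration only: it assumes every sentence has the same a, treats each sentence as an independent trial with the approximate probabilities shown in the tree, and is not presented as the formula from Sakas and Fodor (2001). The function names are hypothetical.

```python
def p_reach_target(a, r):
    """Approximate per-sentence probability of landing in the target cluster."""
    if not 0 <= a < r:
        raise ValueError("this sketch expects 0 <= a < r")
    p_parse = 2.0 ** (a - r)       # an arbitrary grammar can parse the sentence
    p_in_cluster = 1.0 / 2 ** a    # a parsing grammar is in the target cluster
    # Path: G_curr can't parse -> G' can parse -> G' is in the target cluster
    return (1 - p_parse) * p_parse * p_in_cluster


def expected_sentences(a, r):
    """Average sentences consumed, using the 1 / Pr(success) result."""
    return 1 / p_reach_target(a, r)


if __name__ == "__main__":
    # Running example from the slides: r = 4 relevant parameters, a = 2.
    print(p_reach_target(2, 4), expected_sentences(2, 4))
```

Under these assumptions the path probability simplifies to $(1 - 2^{a-r}) \cdot 2^{-r}$, so the estimate grows exponentially with the number of relevant parameters r.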