

Day 2: Pruning continued; begin competition models

Roger Levy

University of Edinburgh & University of California – San Diego


Today

  • Concept from probability theory: marginalization

  • Complete Jurafsky 1996: modeling online data

  • Begin competition models


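Marginalization

  • In many cases, a joint probability distribution will be more “basic” than the raw distribution of any member variable

  • Imagine two dice, X and Y, with a weak spring attached

  • No independence → joint more basic

  • The distribution of Y alone is recovered by summing the joint over every value of X: P(Y) = Σ_x P(X = x, Y)

  • The resulting distribution over Y is known as the marginal distribution

  • Calculating P(Y) is called marginalizing over X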
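
The dice example can be made concrete with a small sketch. The joint table here is invented for illustration: it assumes the spring makes each matching pair of faces twice as likely as any mismatched pair. That coupling strength, and the names `joint_table` and `marginalize_over_x`, are assumptions, not values from the lecture.

```python
from fractions import Fraction

FACES = range(1, 7)

def joint_table(match_weight=2):
    """Hypothetical joint P(X, Y) for two spring-coupled dice.

    Matching faces get weight `match_weight`; every other pair gets weight 1.
    The weights are an assumption made purely for illustration.
    """
    weights = {(x, y): (match_weight if x == y else 1) for x in FACES for y in FACES}
    total = sum(weights.values())
    return {pair: Fraction(w, total) for pair, w in weights.items()}

def marginalize_over_x(joint):
    """Return P(Y) by summing P(X = x, Y) over every value x."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_y

joint = joint_table()
p_y = marginalize_over_x(joint)
```

Under these assumed weights each die is fair on its own, yet the joint probability of a matching pair differs from the product of the two marginals, so the dice are not independent and the joint table carries information the marginals lack.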



Modeling online parsing

  • Does this sentence make sense?

    The complex houses married and single students and their families.

  • How about this one?

    The warehouse fires a dozen employees each year.

  • And this one?

    The warehouse fires destroyed all the buildings.

  • fires can be either a noun or a verb. So can houses:

    [NP The complex] [VP houses married and single students…].

  • These are garden path sentences

  • Originally taken as some of the strongest evidence for serial processing by the human parser

Frazier and Rayner 1987

Limited parallel parsing

  • Full-serial: keep only one incremental interpretation

  • Full-parallel: keep all incremental interpretations

  • Limited parallel: keep some but not all interpretations

  • In a limited parallel model, garden-path effects can arise from the discarding of a needed interpretation

[S [NP The complex] [VP houses…] …]


[S [NP The complex houses …] …]
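
The three regimes can be sketched as one function with a single parameter. Keeping the k most probable analyses is only one way to realize "some but not all"; the function names, the labels `analysis_a` to `analysis_c`, and the probabilities are all hypothetical.

```python
def keep_top_k(analyses, k=None):
    """Return the k most probable analyses; k=None keeps all of them."""
    ranked = sorted(analyses.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])

def garden_paths(kept, needed):
    """A garden path arises when the analysis that turns out to be needed was discarded."""
    return needed not in kept

# Hypothetical probabilities for three incremental analyses of one prefix.
analyses = {"analysis_a": 0.6, "analysis_b": 0.3, "analysis_c": 0.1}

full_serial = keep_top_k(analyses, k=1)       # one interpretation
limited_parallel = keep_top_k(analyses, k=2)  # some interpretations
full_parallel = keep_top_k(analyses)          # all interpretations
```

If the rest of the sentence requires `analysis_c`, `garden_paths(limited_parallel, "analysis_c")` is true while `garden_paths(full_parallel, "analysis_c")` is false.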


Modeling online parsing: garden paths

  • Pruning strategy for limited ranked-parallel processing

    • Each incremental analysis is ranked

    • Analyses falling below a threshold are discarded

    • In this framework, a model must characterize

      • The incremental analyses

      • The threshold for pruning

  • Jurafsky 1996: partial context-free parses as analyses

  • Probability ratio as pruning threshold

    • Ratio defined as P(I) : P(I_best)

  • (Gibson 1991: complexity ratio for pruning threshold)
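
A minimal sketch of ratio-based pruning, using the ratio P(I) : P(I_best) as defined above. The threshold value, the analysis labels, and the probabilities are assumptions chosen for illustration; the slides do not fix a threshold at this point.

```python
def prune_by_ratio(analyses, min_ratio):
    """Keep each analysis I whose ratio P(I) / P(I_best) is at least min_ratio.

    `analyses` maps a label to a positive probability.
    """
    best = max(analyses.values())
    return {name: p for name, p in analyses.items() if p / best >= min_ratio}

# Hypothetical prefix probabilities for three ranked analyses.
analyses = {"analysis_a": 1e-6, "analysis_b": 2e-7, "analysis_c": 1e-9}

# Assumed threshold: discard anything less than 1/100 as probable as the best analysis.
kept = prune_by_ratio(analyses, min_ratio=0.01)
```

Here `analysis_c` is about 1000 times less probable than the best analysis and is discarded, while `analysis_b` (5 times less probable) survives.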

Garden path models 1: N/V ambiguity

  • Each analysis is a partial PCFG tree

  • Tree prefix probability used for ranking of analysis

  • Partial rule probs marginalize over rule completions

[Figure annotation: these nodes are actually still undergoing expansion]

*implications for granularity of structural analysis
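
The idea that a partial rule probability marginalizes over rule completions can be sketched as follows. The toy grammar and its probabilities are invented for illustration and are not the grammar used by Jurafsky 1996; `partial_rule_prob` is a hypothetical helper name.

```python
# Hypothetical PCFG rules for NP, keyed by (left-hand side, right-hand side).
RULES = {
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("Det", "Adj", "N")): 0.3,
    ("NP", ("Det", "N", "N")): 0.1,
    ("NP", ("Pronoun",)): 0.1,
}

def partial_rule_prob(lhs, seen_children, rules=RULES):
    """Probability of `lhs -> seen_children ...`.

    Sums the probabilities of every rule for `lhs` whose right-hand side
    begins with the children seen so far, i.e. marginalizes over completions.
    """
    seen = tuple(seen_children)
    return sum(
        p
        for (left, rhs), p in rules.items()
        if left == lhs and rhs[: len(seen)] == seen
    )

p_det = partial_rule_prob("NP", ["Det"])         # sums the three Det-initial rules
p_det_n = partial_rule_prob("NP", ["Det", "N"])  # sums Det N and Det N N
```

A tree prefix probability would then multiply such partial rule probabilities for nodes still being expanded with full rule probabilities for completed nodes.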

N/V ambiguity (2)

  • Partial CF tree analysis of the complex houses…

  • Analysis of houses as verb has much lower probability than analysis as noun (> 250:1)

  • Hypothesis: the low-ranking alternative is discarded

N/V ambiguity (3)

  • Note that top-down vs. bottom-up questions are immediately implicated, in theory

  • Jurafsky includes the cost of generating the initial NP under the S

    • of course, it’s a small cost as P(S -> NP …) = 0.92

  • If parsing were bottom-up, that cost would not have been explicitly calculated yet

Garden path models II

  • The most famous garden-paths: reduced relative clauses (RRCs) versus main clauses (MCs)

  • From the valence + simple-constituency perspective, MC and RRC analyses differ in two places:

    The horse (that was) raced past the barn fell.

    [Figure: MC and RRC analyses. Surviving labels: “best intransitive:” (value not preserved) and “transitive valence: p=0.08”]
Garden path models II (2)

  • 82 : 1 probability ratio means that lower-probability analysis is discarded

  • In contrast, some RRCs do not induce garden paths:

    The bird found in the room died.

  • Here, found is preferentially transitive (0.62)

  • As a result, the probability ratio is much closer (≈ 4 : 1)

  • Conclusion within pruning theory: beam threshold is between 4 : 1 and 82 : 1

  • (granularity issue: when exactly does the probability cost of valence get paid? cf. the complex houses)

*Note also that Jurafsky does not treat found as having POS ambiguity
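
The reasoning behind the threshold bounds can be written out as a short sketch. The ratios are the ones reported in these slides (with "> 250 : 1" entered as 250 and "≈ 4 : 1" as 4); the function name and data layout are assumptions.

```python
# (sentence, reported ratio of best analysis to the needed alternative, garden path?)
observations = [
    ("the complex houses ...", 250, True),             # reported as > 250 : 1
    ("The horse raced past the barn fell.", 82, True),
    ("The bird found in the room died.", 4, False),    # reported as roughly 4 : 1
]

def threshold_bounds(observations):
    """Range of beam thresholds consistent with the observations.

    An alternative is pruned (yielding a garden path) when its ratio exceeds
    the threshold, so the threshold must lie above every ratio that did not
    cause a garden path and below every ratio that did.
    """
    lower = max(ratio for _, ratio, garden_path in observations if not garden_path)
    upper = min(ratio for _, ratio, garden_path in observations if garden_path)
    return lower, upper

lower, upper = threshold_bounds(observations)
```

With these three data points the bounds come out as 4 and 82, matching the conclusion stated on the slide; additional sentences with intermediate ratios would narrow the interval.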

Notes on the probabilistic model

  • Jurafsky 1996 is a product-of-experts (PoE) model

    • Expert 1: the constituency model

    • Expert 2: the valence model

  • PoEs are flexible and easy to define, but…

    • The Jurafsky 1996 model is actually deficient (loses probability mass), due to relative frequency estimation
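
One generic way to see how a product of separately estimated experts can lose probability mass is sketched here. This is a toy illustration, not a reconstruction of Jurafsky's estimation procedure: the two experts, the outcome labels, and all numbers are hypothetical.

```python
import math

def product_of_experts(*experts):
    """Multiply the experts' probabilities outcome by outcome, without renormalizing."""
    outcomes = experts[0].keys()
    return {o: math.prod(expert[o] for expert in experts) for o in outcomes}

def renormalize(scores):
    """Rescale scores so that they sum to 1."""
    total = sum(scores.values())
    return {o: s / total for o, s in scores.items()}

# Two hypothetical experts, each a proper distribution over the same three analyses.
constituency = {"analysis_a": 0.5, "analysis_b": 0.3, "analysis_c": 0.2}
valence = {"analysis_a": 0.6, "analysis_b": 0.3, "analysis_c": 0.1}

raw = product_of_experts(constituency, valence)
proper = renormalize(raw)
```

The raw products sum to well under 1, which is what "deficient" means here. Ratios between analyses are unaffected by renormalization, so ratio-based pruning decisions stay the same either way.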

Notes on the probabilistic model (2)

  • Jurafsky 1996 predated most work on lexicalized parsers (Collins 1999, Charniak 1997)

  • In a generative lexicalized parser, valence and constituency are often combined through decomposition & Markov assumptions, e.g.,

    [Equation not preserved in transcript; surviving annotation: “sometimes approximated as”]

  • The use of decomposition makes it easy to learn non-deficient models

Jurafsky 1996 & pruning: main points

  • Syntactic comprehension is probabilistic

  • Offline preferences explained by syntactic + valence probabilities

  • Online garden-path results explained by same model, when beam search/pruning is assumed

General issues

  • What is the granularity of incremental analysis?

    • In [NP the complex houses], complex could be an adjective (=the houses are complex)

    • complex could also be a noun (=the houses of the complex)

    • Should these be distinguished, or combined?

    • When does valence probability cost get paid?

  • What is the criterion for abandoning an analysis?

  • Should the number of maintained analyses affect processing difficulty as well?



General idea

  • Disambiguation: when different syntactic alternatives are available for a given partial input, each alternative receives support from multiple probabilistic information sources

  • Competition: the different alternatives compete with each other until one wins, and the duration of competition determines processing difficulty
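
The link between competition and processing time can be illustrated with a toy loop. The update rule (boost each alternative in proportion to its support, then renormalize), the stopping criterion of 0.95, and the support values are all assumptions made for this sketch; they are not the specific model presented in the lecture.

```python
def competition_cycles(support, criterion=0.95, max_cycles=1000):
    """Count update cycles until one alternative's activation reaches `criterion`.

    `support` holds the (positive) evidence for each alternative. On each cycle
    every activation is boosted in proportion to its support and the activations
    are renormalized to sum to 1. Returns `max_cycles` if no alternative wins.
    """
    total = sum(support)
    support = [s / total for s in support]
    activation = list(support)
    cycles = 0
    while max(activation) < criterion and cycles < max_cycles:
        boosted = [a * (1 + s) for a, s in zip(activation, support)]
        norm = sum(boosted)
        activation = [b / norm for b in boosted]
        cycles += 1
    return cycles

balanced = competition_cycles([0.55, 0.45])  # nearly equal support
skewed = competition_cycles([0.9, 0.1])      # one alternative strongly favored
```

Under these assumptions the nearly balanced case takes many more cycles to settle than the skewed case, which is the sense in which the duration of competition would predict processing difficulty.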

Origins of competition models

  • Parallel competition models of syntactic processing have their roots in lexical access research

  • Initial question: process of word recognition

    • are all meanings of a word simultaneously accessed?

    • or are only some (or one) meanings accessed?

  • Parallel vs. serial question, for lexical access

Origins of competition models (2)

  • Testing access models: priming studies show that subordinate (= less frequent) meanings are accessed as well as dominant (= more frequent) meanings

  • Also, lexical decision studies show that more frequent meanings are accessed more quickly

Origins of competition models (3)

  • Lexical ambiguity in reading: does the amount of time spent on a word reflect its degree of ambiguity?

  • Readers spend more time reading equibiased ambiguous words than non-equibiased ambiguous words (eye-tracking studies)

  • Different meanings compete with each other

Of course the pitcher was often forgotten…



Rayner and Duffy (1986); Duffy, Morris, and Rayner (1988)

Competition in syntactic processing

  • Can this idea of competition be applied to online syntactic comprehension?

  • If so, then multiple interpretations of a partial input should compete with one another and slow down reading

    • does this mean increased difficulty of comprehension?

    • [compare with other types of difficulty, e.g., memory overload]

Constraint types

  • Configurational bias: MV vs. RR

  • Thematic fit (initial NP to verb’s roles)

    • i.e., Plaus(verb,noun), ranging from

  • Bias of verb: simple past vs. past participle

    • i.e., P(past | verb)*

  • Support of by

    • i.e., P(MV | <verb,by>) [not conditioned on specific verb]

  • That these factors can affect processing in the MV/RR ambiguity is motivated by a variety of previous studies (MacDonald et al. 1993, Burgess et al. 1993, Trueswell et al. 1994 (cf. Ferreira & Clifton 1986), Trueswell 1996)

*technically not calculated this way, but this would be the rational reconstruction
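
As a rough sketch of how the four constraint types could jointly support the MV and RR alternatives, the function below takes a weighted average of per-constraint support. The equal weights, the averaging scheme, and every number are assumptions for illustration only; the slides do not specify how the constraints are combined or what values they take.

```python
def combined_support(constraints, weights=None):
    """Weighted average of per-constraint support for each alternative.

    `constraints` maps a constraint name to a dict of support per alternative;
    `weights` maps constraint names to weights (equal weights if omitted).
    """
    names = list(constraints)
    if weights is None:
        weights = {name: 1 / len(names) for name in names}
    alternatives = next(iter(constraints.values())).keys()
    return {
        alt: sum(weights[name] * constraints[name][alt] for name in names)
        for alt in alternatives
    }

# Hypothetical support values for one sentence prefix at the word "by".
constraints = {
    "configurational_bias": {"MV": 0.9, "RR": 0.1},
    "thematic_fit": {"MV": 0.3, "RR": 0.7},
    "verb_tense_bias": {"MV": 0.6, "RR": 0.4},
    "by_support": {"MV": 0.1, "RR": 0.9},
}

support = combined_support(constraints)
```

With these invented values the RR reading ends up slightly ahead of MV even though the configurational bias strongly favors MV, showing how several weaker sources can outweigh one strong source.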